SLIDES BY John Loucks
St. Edward’s University

© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.

Chapter 2, Part B
Descriptive Statistics: Tabular and Graphical Displays

• Summarizing Data for Two Variables Using Tables
• Summarizing Data for Two Variables Using Graphical Displays
• Data Visualization: Best Practices in Creating Effective Graphical Displays

Summarizing Data for Two Variables Using Tables

• Thus far we have focused on methods that summarize the data for one variable at a time.
• Often a manager is interested in tabular and graphical methods that help in understanding the relationship between two variables.
• Crosstabulation is a method for summarizing the data for two variables.

Crosstabulation

A crosstabulation is a tabular summary of data for two variables.

Crosstabulation can be used when:
• one variable is categorical and the other is quantitative,
• both variables are categorical, or
• both variables are quantitative.

The left and top margin labels define the classes for the two variables.

Crosstabulation
Example: Finger Lakes Homes

The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Price Range is the quantitative variable; Home Style is the categorical variable.

                              Home Style
Price Range    Colonial    Log    Split    A-Frame    Total
< $250,000        18         6      19        12        55
≥ $250,000        12        14      16         3        45
Total             30        20      35        15       100

Crosstabulation
Example: Finger Lakes Homes

Insights Gained from Preceding Crosstabulation
• The greatest number of homes (19) in the sample are a split-level style and priced at less than $250,000.
• Only three homes in the sample are an A-Frame style and priced at $250,000 or more.

Crosstabulation
Example: Finger Lakes Homes

• The Total column of the crosstabulation (55, 45) is the frequency distribution for the price range variable.
• The Total row of the crosstabulation (30, 20, 35, 15) is the frequency distribution for the home style variable.

Crosstabulation: Row or Column Percentages

Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

Crosstabulation: Row Percentages
Example: Finger Lakes Homes

                              Home Style
Price Range    Colonial    Log      Split    A-Frame    Total
< $250,000      32.73     10.91    34.55     21.82      100
≥ $250,000      26.67     31.11    35.56      6.67      100

Note: row totals are actually 100.01 due to rounding.

Example: (Colonial and ≥ $250K)/(All ≥ $250K) x 100 = (12/45) x 100 = 26.67
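
The margin totals and row percentages above can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, not part of the original slides; the dictionary layout and the function name `row_percentages` are assumptions made for illustration.

```python
# Finger Lakes Homes crosstabulation: counts taken from the table above.
STYLES = ["Colonial", "Log", "Split", "A-Frame"]
counts = {
    "< $250,000": [18, 6, 19, 12],
    ">= $250,000": [12, 14, 16, 3],
}


def row_percentages(table):
    """Return each cell as a percentage of its row total, rounded to 2 places."""
    result = {}
    for label, row in table.items():
        total = sum(row)
        result[label] = [round(100 * n / total, 2) for n in row]
    return result


# Margins: row totals give the frequency distribution for price range,
# column totals give the frequency distribution for home style.
row_totals = {label: sum(row) for label, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]

row_pct = row_percentages(counts)
for label, pcts in row_pct.items():
    print(label, dict(zip(STYLES, pcts)))
```

Dividing by the column totals instead of the row totals would give column percentages in the same way.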

Crosstabulation: Column Percentages
Example: Finger Lakes Homes

                         Home Style
Price Range    Colonial    Log      Split    A-Frame
< $250,000      60.00     30.00    54.29     80.00
≥ $250,000      40.00     70.00    45.71     20.00
Total          100       100      100       100

Example: (Colonial and ≥ $250K)/(All Colonial) x 100 = (12/30) x 100 = 40.00

Crosstabulation: Simpson’s Paradox

• Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.
• We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.
• In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.
• The reversal of conclusions based on aggregate and unaggregated data is called Simpson’s paradox.

Summarizing Data for Two Variables Using Graphical Displays

• In most cases, a graphical display is more useful than a table for recognizing patterns and trends.
• Displaying data in creative ways can lead to powerful insights.
• Scatter diagrams and trendlines are useful in exploring the relationship between two variables.

Scatter Diagram and Trendline

• A scatter diagram is a graphical presentation of the relationship between two quantitative variables.
• One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.
• The general pattern of the plotted points suggests the overall relationship between the variables.
• A trendline provides an approximation of the relationship.

Scatter Diagram
A Positive Relationship

[Figure: scatter diagram of y against x showing a positive relationship]

Scatter Diagram
A Negative Relationship

[Figure: scatter diagram of y against x showing a negative relationship]

Scatter Diagram
No Apparent Relationship

[Figure: scatter diagram of y against x showing no apparent relationship]

Scatter Diagram
Example: Panthers Football Team

The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.

x = Number of Interceptions    y = Number of Points Scored
            1                              14
            3                              24
            2                              18
            1                              17
            3                              30

Scatter Diagram and Trendline

[Figure: scatter diagram with trendline; horizontal axis: Number of Interceptions (x), 0 to 4; vertical axis: Number of Points Scored (y), 0 to 35]

Example: Panthers Football Team

Insights Gained from the Preceding Scatter Diagram
• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.
• Higher points scored are associated with a higher number of interceptions.
• The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
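
One way to put a number on the positive relationship in the Panthers data is to fit a straight line to the five points. The slides do not say how their trendline was produced, so the sketch below assumes an ordinary least-squares fit; the function name `least_squares_line` is hypothetical.

```python
# Panthers data from the example above.
interceptions = [1, 3, 2, 1, 3]   # x
points = [14, 24, 18, 17, 30]     # y


def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


slope, intercept = least_squares_line(interceptions, points)
print(f"points = {intercept:.2f} + {slope:.2f} * interceptions")
```

Under this assumption the slope works out to 5.75 and the intercept to 9.1. The positive slope agrees with the first insight, and the fact that the five points do not all fall on the fitted line agrees with the last.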

Side-by-Side Bar Chart

• A side-by-side bar chart is a graphical display for depicting multiple bar charts on the same display.
• Each cluster of bars represents one value of the first variable.
• Each bar within a cluster represents one value of the second variable.

Side-by-Side Bar Chart

[Figure: side-by-side bar chart, "Finger Lakes Homes"; horizontal axis: Home Style (Colonial, Log, Split-Level, A-Frame); vertical axis: Frequency; legend: < $250,000, ≥ $250,000]

Stacked Bar Chart

• A stacked bar chart is another way to display and compare two variables on the same display.
• It is a bar chart in which each bar is broken into rectangular segments of a different color.
• If percentage frequencies are displayed, all bars will be of the same height (or length), extending to the 100% mark.

Stacked Bar Chart

[Figure: stacked bar chart with frequencies displayed, "Finger Lakes Homes"; horizontal axis: Home Style (Colonial, Log, Split, A-Frame); vertical axis: Frequency; legend: < $250,000, ≥ $250,000]

Percentage Frequency Stacked Bar Chart

[Figure: stacked bar chart with percentage frequencies displayed, "Finger Lakes Homes"; horizontal axis: Home Style (Colonial, Log, Split, A-Frame); vertical axis: percentage, 0 to 100; legend: < $250,000, ≥ $250,000]
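
The segment heights in a percentage frequency stacked bar chart are the share of each price range within a home style. The sketch below computes them from the Finger Lakes counts and prints a rough text rendering; it is an illustration only, and the function name `stacked_segments` and the text-bar format are assumptions, not something taken from the slides.

```python
STYLES = ["Colonial", "Log", "Split", "A-Frame"]
under = [18, 6, 19, 12]      # homes priced under $250,000
at_least = [12, 14, 16, 3]   # homes priced at $250,000 or more


def stacked_segments(lower, upper):
    """Return (lower %, upper %) for each bar; each pair sums to 100."""
    segments = []
    for lo, hi in zip(lower, upper):
        total = lo + hi
        segments.append((100 * lo / total, 100 * hi / total))
    return segments


segments = stacked_segments(under, at_least)
for style, (lo_pct, hi_pct) in zip(STYLES, segments):
    # Crude text rendering: roughly one character per 5 percentage points.
    bar = "#" * round(lo_pct / 5) + "." * round(hi_pct / 5)
    print(f"{style:<9} {bar}  {lo_pct:.2f}% / {hi_pct:.2f}%")
```

Because every pair of segments sums to 100, all bars come out the same length, which is the property noted for percentage frequency stacked bar charts.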

Data Visualization: Best Practices in Creating Effective Graphical Displays

• Data visualization describes the use of graphical displays to summarize and present information about a data set.
• The goal is to communicate as effectively and clearly as possible the key information about the data.

Creating Effective Graphical Displays

Creating effective graphical displays is as much art as it is science. Here are some guidelines . . .
• Give the display a clear and concise title.
• Keep the display simple.
• Clearly label each axis and provide the units of measure.
• If colors are used, make sure they are distinct.
• If multiple colors or lines are used, provide a legend.

Choosing the Type of Graphical Display

Displays used to show the distribution of data:
• Bar Chart: to show the frequency distribution or relative frequency distribution for categorical data
• Pie Chart: to show the relative frequency or percent frequency for categorical data
• Dot Plot: to show the distribution for quantitative data over the entire range of the data
• Histogram: to show the frequency distribution for quantitative data over a set of class intervals
• Stem-and-Leaf Display: to show both the rank order and shape of the distribution for quantitative data

Choosing the Type of Graphical Display

Displays used to make comparisons:
• Side-by-Side Bar Chart: to compare two variables
• Stacked Bar Chart: to compare the relative frequency or percent frequency of two categorical variables

Choosing the Type of Graphical Display

Displays used to show relationships:
• Scatter Diagram: to show the relationship between two quantitative variables
• Trendline: to approximate the relationship of data in a scatter diagram

Data Dashboards

• A data dashboard is a widely used data visualization tool.
• It organizes and presents key performance indicators (KPIs) used to monitor an organization or process.
• It provides timely, summary information that is easy to read, understand, and interpret.

Some additional guidelines include . . .
• Minimize the need for screen scrolling.
• Avoid unnecessary use of color or 3D.
• Use borders between charts to improve readability.

Data Dashboard Example

[Figure: example of a data dashboard]

Tabular and Graphical Displays

Categorical Data
  Tabular Displays
  • Frequency Distribution
  • Relative Frequency Distribution
  • Percent Frequency Distribution
  • Crosstabulation
  Graphical Displays
  • Bar Chart
  • Pie Chart
  • Side-by-Side Bar Chart
  • Stacked Bar Chart

Quantitative Data
  Tabular Displays
  • Frequency Distribution
  • Relative Frequency Distribution
  • Percent Frequency Distribution
  • Cumulative Frequency Distribution
  • Cumulative Relative Frequency Distribution
  • Cumulative Percent Frequency Distribution
  • Crosstabulation
  Graphical Displays
  • Dot Plot
  • Histogram
  • Stem-and-Leaf Display
  • Scatter Diagram

End of Chapter 2, Part B