Chapter 1 1-1 Student Lecture Notes Organizing and Presenting Data Graphically Descriptive Statistics – Presenting Data with Graphs and Tables, Learning Objectives In this chapter you learn: Data in raw form are usually not easy to use for decision making To develop tables and charts for categorical data Some type of organization is needed Techniques reviewed here: To develop tables and charts for numerical data The principles of properly presenting graphs Table Graph Bar charts and pie charts Pareto diagram Ordered array Stem-and-leaf display Frequency distributions, histograms and polygons Cumulative distributions and ogives Contingency tables Scatter diagrams Chap 2-1 Chap 2-2 Descriptive Statistics – Graphs and Tables Organizing Numerical Data Organizing Numerical Data Tabulating and Graphing Univariate Numerical Data The Ordered Array and Stem-Leaf Display Frequency Distributions: Tables, Histograms, Polygons Cumulative Distributions: Tables, the Ogive Graphing Bivariate Numerical Data Tabulating and Graphing Univariate Categorical Data The Summary Table Bar and Pie Charts, the Pareto Diagram Numerical Data Ordered Array 21, 24, 24, 26, 27, Tabulating and Graphing Bivariate Categorical Data Contingency Tables Side by Side Bar Charts 27, 30, 32, 38, 41 Frequency Distributions and Cumulative Distributions Stem and Leaf Display 2 144677 3 028 Histograms 4 1 Graphical Excellence and Common Errors in Presenting Data Tables Chap 2-3 Statistics for Managers Using Microsoft Excel, 2/e 41, 24, 32, 26, 27, 27, 30, 24, 38, 21 Ogive Polygons Chap 2-4 © 1999 Prentice-Hall, Inc. Chapter 1 1-2 Student Lecture Notes Using other stem units Organizing Numerical Data (continued) Data in Raw Form (as Collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38 Data in Ordered Array from Smallest to Largest: Largest 21, 24, 24, 26, 27, 27, 30, 32, 38, 41 A sequence of data in rank order: Shows range (min to max) Provides some signals about variability within the range May help identify outliers (unusual observations) If the data set is large, the ordered array is less useful 2 Stem-and-Leaf Display: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves) Using the 100’s digit as the stem: The completed stem-and-leaf display: Round off the 10’s digit to form the leaves: 613 becomes stem 6 and leaves 1 Data: 613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047,1056, 1140, 1169, 1224 144677 3 028 4 1 (continued) Stem 6 Leaves 136 7 2258 8 346699 9 13368 10 356 11 47 12 2 Chap 2-5 Chap 2-6 Tabulating and Graphing Numerical Data Numerical Data Ordere d Array Stem and Leaf Display Tabulating Numerical Data: Frequency Distributions 41, 24, 32, 26, 27, 27, 30, 24, 38, 21 Frequency Distributions and Cumulative Distributions What is a Frequency Distribution? A frequency distribution is a list or a table … containing class groupings (ranges within which the data fall) ... and the corresponding frequencies with which data fall within each grouping or category Why Use a Frequency Distribution? Ogive 120 100 80 21, 24, 24, 26, 27, 27, 30, 32, 38, 41 60 2 144677 40 It is a way to summarize numerical data It condenses the raw data into a more useful form... It allows for a quick visual interpretation of the data 20 0 10 3 028 4 1 Histograms 20 30 40 50 60 Ogive 7 6 5 Tables 4 Polygons 3 2 1 0 10 20 30 40 50 60 Chap 2-7 Statistics for Managers Using Microsoft Excel, 2/e Chap 2-8 © 1999 Prentice-Hall, Inc. Chapter 1 1-3 Student Lecture Notes Tabulating Numerical Data: Frequency Distributions Class Intervals and Class Boundaries Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature Each class grouping has the same width Determine the width of each interval by Width of interval ≅ Sort Raw Data in Ascending Order 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 range number of desired class groupings Usually at least 5 but no more than 15 groupings Class boundaries never overlap Round up the interval width to get desirable endpoints Find Range: 58 - 12 = 46 Select Number of Classes: 5 (usually between 5 and 15) Compute Class Interval (Width): 10 (46/5 then round up) Determine Class Boundaries (Limits):10, 20, 30, 40, 50, 60 Compute Class Midpoints: Count Observations & Assign to Classes 15, 25, 35, 45, 55 Chap 2-9 Chap 2-10 Graphing Numerical Data: The Histogram Frequency Distributions, Relative Frequency Distributions and Percentage Distributions in Excel: Data in Ordered Array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 Class Freq Relative Freq Relative Freq (%) Cumulative Freq Cumulative Relative Freq Cumulative Relative Freq (%) 10 but under 20 3 0.15 15% 3 0.15 15% 20 but under 30 6 0.3 30% 9 0.45 45% 30 but under 40 5 0.25 25% 14 0.7 70% 40 but under 50 4 0.2 20% 18 0.9 90% 50 but under 60 2 0.1 10% 20 1 100% 20 100% Tools/Data Analysis/Histogram A graph of the data in a frequency distribution is called a histogram The class boundaries (or class midpoints) are shown on the horizontal axis the vertical axis is either frequency, relative frequency, or percentage Bars of the appropriate heights are used to represent the number of observations within each class Chap 2-11 Statistics for Managers Using Microsoft Excel, 2/e Chap 2-12 © 1999 Prentice-Hall, Inc. Chapter 1 1-4 Student Lecture Notes Graphing Numerical Data: The Frequency Polygon Histogram Example Class Midpoint Frequency 15 25 35 45 55 (No gaps between bars) 3 6 5 4 2 10 but less than 20 20 but less than 30 30 but less than 40 40 but less than 50 50 but less than 60 His togram : Daily High Te m pe rature 7 6 3 2 1 5 15 25 35 45 Class Midpoints 55 Frequency Polygon: Daily High Temperature 7 6 65 (In a percentage polygon the vertical axis would be defined to show the percentage of observations per class) 5 4 3 2 1 0 5 15 25 35 45 55 The Ogive (Cumulative % Polygon) 65 Class Midpoints Chap 2-13 Graphing Cumulative Frequencies: Chap 2-14 Tables and Charts for Univariate Categorical Data Categorical Data Ogive: Daily High Temperature 100 C um u lative Percen tag e Less than 10 10 but less than 20 20 but less than 30 30 but less than 40 40 but less than 50 50 but less than 60 3 6 5 4 2 4 Class Boundaries Lower Cumulative class boundar Percentage y 0 0 10 15 20 45 30 70 40 90 50 100 15 25 35 45 55 5 0 Class Class Midpoint Frequency Class Frequency 10 but less than 20 20 but less than 30 30 but less than 40 40 but less than 50 50 but less than 60 Frequency Class Graphing Data Tabulating Data 80 60 40 Summary Table 20 Bar Charts Pie Charts Pareto Diagram 0 10 10 20 20 30 30 40 40 50 50 60 60 Class Boundaries (Not Midpoints) Chap 2-15 Statistics for Managers Using Microsoft Excel, 2/e Chap 2-16 © 1999 Prentice-Hall, Inc. Chapter 1 1-5 Student Lecture Notes Graphing Univariate Categorical Data The Summary Table Summarize data by category Categorical Data Example: Current Investment Portfolio Investment Amount Percentage Type (in thousands $) (%) (Variables are Categorical) Stocks Bonds CD Savings 46.5 32.0 15.5 16.0 Graphing Data Tabulating Data The Summary Table 42.27 29.09 14.09 14.55 Pie Charts CD S aving s Bar Charts Bo nd s Total 110.0 100.0 Sto ck s 0 10 20 30 40 45 50 40 Pareto Diagram 12 0 10 0 35 30 80 25 60 20 15 40 10 20 5 Chap 2-17 0 0 S toc ks Bar Chart Example Stocks Bonds CD Savings Total Amount Percentage (in thousands $) (%) 46.5 32.0 15.5 16.0 110.0 100.0 Chap 2-18 CD Current Investment Portfolio Investment Type 42.27 29.09 14.09 14.55 S avin gs Pie Chart Example Current Investment Portfolio Investment Type B on ds Investor's Portfolio Savings CD Amount Percentage (in thousands $) (%) Stocks Bonds CD Savings 46.5 32.0 15.5 16.0 42.27 29.09 14.09 14.55 Total 110.0 100.0 Savings 15% Stocks 42% CD 14% Bonds Stocks 0 10 20 30 40 Bond s 29% 50 Amount in $1000's Chap 2-19 Statistics for Managers Using Microsoft Excel, 2/e Percentages are rounded to the nearest percent Chap 2-20 © 1999 Prentice-Hall, Inc. Chapter 1 1-6 Student Lecture Notes Tabulating and Graphing Multivariate Categorical Data Pareto Diagram Used to portray categorical data (nominal scale) A bar chart, where categories are shown in descending order of frequency A cumulative polygon is often shown in the same graph Used to separate the “vital few” from the “trivial many” Axis for bar chart shows % invested in each category 45% 100% 40% 90% 80% 35% 70% 30% 60% 25% Axis for line graph shows cumulative % invested 50% 20% 40% 15% 30% 10% 20% 5% 10% 0% Contingency Table for Investment Choices ($1000’s) Investment Category Investor B Investor C Total Stocks 46.5 55 27.5 129 Bonds CD Savings 32.0 15.5 16.0 44 20 28 19.0 13.5 7.0 95 49 51 Total 110.0 147 67.0 324 (Individual values could also be expressed as percentages of the overall total, percentages of the row totals, or percentages of the column totals) 0% Stocks Bonds Savings Chap 2-21 CD Chap 2-22 Tabulating and Graphing Multivariate Categorical Data Side-by-Side Chart Example (continued) Investor A Side-by-side bar charts Sales by quarter for three sales territories: East W e st North C omparing Investors 1st Qtr 2nd Qtr 3rd Qtr 4th Qtr 20.4 27.4 59 20.4 30.6 38.6 34.6 31.6 45.9 46.9 45 43.9 60 Savings 50 CD 40 Bonds East West North 30 Stock s 20 0 10 Investor A 20 30 40 Investor B 50 10 60 0 Investor C 1st Qtr 2nd Qtr Chap 2-23 Statistics for Managers Using Microsoft Excel, 2/e 3rd Qtr 4th Qtr Chap 2-24 © 1999 Prentice-Hall, Inc. Chapter 1 1-7 Student Lecture Notes Time Series Plot Example Scatter Diagram Scatter plot and correlation online scatter plot online and more 23 131 24 120 26 140 29 151 33 160 38 167 41 185 42 170 50 188 55 195 60 200 Scatter Diagrams are used to examine possible relationships between two numerical variables one variable is measured on the vertical axis and the other variable is measured on the horizontal axis A Time Series Plot is used to study patterns in the values of a variable over time The Time Series Plot: one variable is measured on the vertical axis and the time period is measured on the horizontal axis Year Number of Franchises 1996 43 1997 54 120 250 1998 60 100 200 1999 73 2000 82 2001 95 2002 107 2003 99 2004 95 Cost per Day vs. Production Volume 150 100 50 0 0 10 20 30 40 Volum e pe r Day 50 60 Number of Franchises, 1996-2004 Nu mb e r o f F ra n ch ise s Cost per day Cost per Day Volume per day 80 60 40 20 0 1994 70 1996 2002 2004 2006 Chap 2-26 Principles of Graphical Excellence Misusing Graphs and Ethical Issues Well-Designed Presentation of Data that Provides: Substance Statistics Design Communicate Complex Ideas with Clarity, Precision and Efficiency Gives the Largest Number of Ideas in the Most Efficient Manner Almost Always Involves Several Dimensions Telling the Truth about the Data Chap 2-27 Statistics for Managers Using Microsoft Excel, 2/e 2000 Ye ar Chap 2-25 Guidelines for good graphs: Do not distort the data Avoid unnecessary adornments (no “chart junk”) Use a scale for each axis on a two-dimensional graph The vertical axis scale should begin at zero Properly label all axes The graph should contain a title Use the simplest graph for a given set of data 1998 Chap 2-28 © 1999 Prentice-Hall, Inc. Chapter 1 1-8 Student Lecture Notes ‘Chart Junk’ Errors in Presenting Data Using ‘Chart Junk’ Bad Presentation No Relative Basis in Comparing Data between Groups Compressing the Vertical Axis No Zero Point on the Vertical Axis Good Presentation Minimum Wage Minimum Wage 1960: $1.00 4 $ 1970: $1.60 2 1980: $3.10 0 1960 1990: $3.80 1970 1980 1990 Chap 2-29 Chap 2-30 No Relative Basis Bad Presentation Good Presentation A’s received by students Freq. 300 200 30 % 100 10 0 0 FR SO JR SR Compressing Vertical Axis Bad Presentation A’s received by students Good Presentation Quarterly Sales 200 $ 50 $ Quarterly Sales 20 FR SO JR SR 100 25 0 0 Q1 Q2 Q3 Q4 Q1 Q2 Q3 Q4 FR = Freshmen, SO = Sophomore, JR = Junior, SR = Senior Chap 2-31 Statistics for Managers Using Microsoft Excel, 2/e Chap 2-32 © 1999 Prentice-Hall, Inc. Chapter 1 1-9 Student Lecture Notes No Zero Point on Vertical Axis Bad Presentation 45 $ Monthly Sales 45 42 39 36 42 39 Chapter Summary Organized Numerical Data Tabulated and Graphed Univariate Numerical Data Good Presentation Monthly Sales $ 36 J F M A M J 0 J F M A M J Graphing the first six months of sales The Ordered Array and Stem-Leaf Display Frequency Distributions: Tables, Histograms, Polygons Cumulative Distributions: Tables, the Ogive Graphed Bivariate Numerical Data Tabulated and Graphed Univariate Categorical Data The Summary Table Bar and Pie Charts, the Pareto Diagram Tabulated and Graphed Bivariate Categorical Data Contingency Tables Side by Side Charts Discussed Graphical Excellence and Common Errors in Presenting Data Chap 2-33 Chap 2-34 Chapter Summary Data in raw form are usually not easy to use for decision making -- Some type of organization is needed: ♦ Table ♦ Graph Techniques reviewed in this chapter: Bar charts, pie charts, and Pareto diagrams Ordered array and stem-and-leaf display Frequency distributions, histograms and polygons Cumulative distributions and ogives Contingency tables and side-by-side bar charts Scatter diagrams and time series plots Chap 2-35 Statistics for Managers Using Microsoft Excel, 2/e © 1999 Prentice-Hall, Inc.