Chapter 2
Presenting Data in Tables and Charts

Note:
• Sections 2.1 & 2.2 - examining data from 1 numerical variable.
• Section 2.3 - examining data from 2 numerical variables.
• Section 2.4 - examining data from 1 categorical variable (read).
• Section 2.5 - examining data from 2 categorical variables.

Section 2.1
Organizing Numerical Data
Examining One Numerical Variable

Ordered Array
• Array of data ordered from smallest to largest value
  – Makes it easier to see the extreme values and where the majority of values are located.

Using Excel
• Data | Sort
• Select the heading of the column you want to sort by first. Choose ascending or descending.
• Select the heading of the column you want to sort by second. Choose ascending or descending. Etc.
• Choose the appropriate button, “Header row” or “No header row”.

Stem & Leaf Display
• Shows how the data vary over a range of observations.
• Separates data according to leading digits (stems) and trailing digits (leaves).

Stem & Leaf Display
Data: 74 74.3 74.6 78.4 79.8 80.2 81.4 82.0 84.7 86.0 89.2
Stem unit: 1
  74 | 0 3 6
  75 |
  76 |
  77 |
  78 | 4
  79 | 8
  80 | 2
  81 | 4
  82 | 0
  83 |
  84 | 7
  85 |
  86 | 0
  87 |
  88 |
  89 | 2

Stem & Leaf Display
Data: 74 74.3 74.6 78.4 79.8 80.2 81.4 82.0 84.7 86.0 89.2
Stem unit: 10
  7 | 4 5 8
  8 | 0 0 1 5 9

Using PHStat
  7 | 4 4 5 8 10
  8 | 0 1 2 5 6 9
The leaf 10 at the end of the 7 row shows that the value (79.8) rounds to 80 but is in the 70s.

Using PHStat to create a Stem & Leaf Display
• PHStat | Descriptive Statistics | Stem-and-Leaf Display
• Enter the range of values.
• If the selection contains a heading, leave selected “First cell contains a label”.
• Select the Stem Unit.
• Enter a Title.

Section 2.2
Tables and Charts for Numerical Data
Examining One Numerical Variable

The Frequency Distribution
• Data are arranged into class groupings.
• Creating class groupings
  – Number of classes
    • Depends on the number of observations
    • Typically 5 <= class groupings < 15
  – Intervals should be the same width.
    Use the following:
    • Width of interval = Range / Number of class groupings
  – Avoid overlapping classes

Frequency Distribution (continued)
• Consists of the number of values that fall within the range of each interval.
• Advantage - Data characteristics can be approximated.
• Disadvantage - Individual values are lost due to the grouping.

Ex. Given the following data:
74 74.3 74.6 78.4 79.8 80.2 81.4 82.0 84.7 86.0 89.2
• Number of classes: let's choose 5.
• Width of interval = (89.2 - 74) / 5 = 3.04, approx. 3

Frequency Distribution
Interval    Frequency
74 - 77     3
77 - 80     2
80 - 83     3
83 - 86     1
86 - 89     1
89 - 92     1
Right boundary is not included.

Using PHStat to create a Frequency Distribution
• PHStat | Descriptive Statistics | Frequency Distribution
• Enter the variable cell range.
• Enter the bin cell range.
• If you selected the heading when selecting the data, leave selected “First cell in each range contains label”.
• Leave selected “Single Group Variable”.
• Enter a title of your choice.

Bin (Used for PHStat only)
• Contains the values that approximate the maximum value of each class.
• For example:
  – If your intervals are
    • -20.0 to -10.0
    • -10.0 to 0.0
    • 0.0 to 10.0
    • 10.0 to 20.0
  – Your bin values could be
    • -10.1
    • -0.1
    • 9.9
    • 19.9
If your data were recorded with 2 places after the decimal, your bin values would be:
-10.01 -0.01 9.99 19.99

Example
See the file Sec2.2.xls

Relative Frequency Distribution
• First create a Frequency Distribution.
• The values in the Relative Frequency Distribution are formed by dividing the frequency of each class by the total number of values.
• The Relative Frequency Distribution contains the proportion of values that fall within each class.
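
The class-width, frequency and relative frequency steps above can be sketched in code. The slides use Excel and PHStat; what follows is a minimal Python sketch instead, using the example data from this section. The function name frequency_distribution and its start and width parameters are illustrative assumptions, not PHStat features.

```python
import math

# Example data from this section
data = [74, 74.3, 74.6, 78.4, 79.8, 80.2, 81.4, 82.0, 84.7, 86.0, 89.2]

# Width of interval = Range / Number of class groupings (5 chosen here)
suggested_width = (max(data) - min(data)) / 5  # about 3.04, rounded to 3 below


def frequency_distribution(values, start, width):
    """Count values in equal-width classes; the right boundary is not included."""
    n_classes = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_classes
    for v in values:
        # Index of the class [start + i*width, start + (i+1)*width) holding v
        counts[int((v - start) // width)] += 1
    intervals = [(start + i * width, start + (i + 1) * width)
                 for i in range(n_classes)]
    return intervals, counts


intervals, freq = frequency_distribution(data, start=74, width=3)

# Relative frequency = class frequency / total number of values
rel_freq = [f / len(data) for f in freq]

for (lo, hi), f, r in zip(intervals, freq, rel_freq):
    print(f"{lo} - {hi}: frequency {f}, relative frequency {r:.4f}")
```

Because the width was rounded down from 3.04 to 3, the largest value (89.2) needs a sixth class, 89 - 92, even though 5 classes were chosen.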
Relative Frequency Distribution
Interval    Frequency    Relative Frequency
74 - 77     3            3/11 = .2727
77 - 80     2            2/11 = .1818
80 - 83     3            3/11 = .2727
83 - 86     1            1/11 = .0909
86 - 89     1            1/11 = .0909
89 - 92     1            1/11 = .0909
Total       11

Percentage Distribution
• First create a Relative Frequency Distribution.
• The values in the Percentage Distribution are formed by multiplying each proportion in the Rel. Freq. Dist. by 100.

Percentage Distribution
Interval    Freq.    Rel. Freq.    Percentage Freq.
0 - 74      0        0.00          0%
74 - 77     3        .2727         27.27%
77 - 80     2        .1818         18.18%
80 - 83     3        .2727         27.27%
83 - 86     1        .0909         9.09%
86 - 89     1        .0909         9.09%
89 - 92     1        .0909         9.09%
Total       11

Benefit of a Relative Frequency Distribution or Percentage Distribution
• Essential when comparing two sets of data consisting of a different number of values.
For example:
Study 1: 2 2 5 8 2
  The value 5 occurs 1 time out of 5. 1/5 = 0.2, or 20% of the time.
Study 2: 9 5 2 8 5 5 5 8 5 2 5 5
  The value 5 occurs 7 times out of 12. 7/12 = 0.583, or 58.3% of the time.

Cumulative Percentage Distribution
• Demonstrates the growth over the classes.

Cumulative Percentage Distribution
Each cumulative percentage is the percentage of values below the lower boundary of the class.
Interval    Rel. Fq.    Cumulative Dist.
0 - 74      0.00        0% = 0.0%
74 - 77     0.2727      0% = 0.0%
77 - 80     0.1818      27.27% = 27.27%
80 - 83     0.2727      27.27% + 18.18% = 45.45%
83 - 86     0.0909      27.27% + 18.18% + 27.27% = 72.72%
86 - 89     0.0909      27.27% + 18.18% + 27.27% + 9.09% = 81.81%
89 - 92     0.0909      27.27% + 18.18% + 27.27% + 9.09% + 9.09% = 90.9%
92 - 95     0.00        27.27% + 18.18% + 27.27% + 9.09% + 9.09% + 9.09% = 99.99%
Total       .9999

Cumulative Percentage Distribution
• Top of Pg. 56. SOLUTION From Table 2.5 ...
• Error

Using PHStat to create a Percentage or Cumulative Percentage Distribution
• These are automatically generated when you create a Frequency Distribution.

Class Midpoint
• Point halfway between the boundaries of each class.

Histogram
• Using a picture to demonstrate data.
• Describes the numerical data that has been grouped into a frequency, relative frequency, or percentage distribution.
• The random variable of interest is displayed along the horizontal axis (x-axis).
• The number, proportion or percentage of values per class is plotted along the vertical axis (y-axis).

[Figure: Histogram of frequency (y-axis, 0 to 3) by class interval (x-axis, 0 - 74 through 92 - 95).]

Frequency Polygon (same info as Histogram)
• Using a picture to demonstrate data.
• Describes the numerical data that has been grouped into a frequency, relative frequency, or percentage distribution.
• The random variable of interest is displayed along the horizontal axis (x-axis).
• The number, proportion or percentage of values per class is plotted along the vertical axis (y-axis).

[Figure: Polygon of Frequency (y-axis, 0 to 3.5) by class interval (x-axis, 0 - 74 through 92 - 95).]

Using PHStat to create a Histogram & Polygon
• PHStat | Descriptive Statistics | Histogram & Polygons
• Enter the Variable Cell Range.
• Enter the Bin Cell Range.
• Enter the Midpoints Cell Range.
• If the first row contains headings, leave selected “First cell in each range contains label”.
• Select “Multiple Groups - Unstacked”.
• Enter a title of your choice.
• Leave check boxes on default selection.

Section 2.3
• Graphing Bivariate Numerical Data
• Examining 2 numerical variables.

Scatter Diagram
• Used to demonstrate the relationship between two numerical variables.
• One numerical variable is plotted on the x-axis.
• The other numerical variable is plotted on the y-axis.
• Each pair of values results in a point on the x-y plane.

Example
Cholesterol Level    Meat Consumption in Ounces / Day
200                  24
176                  21
115                  8
100                  3
120                  3
199                  30
151                  26
100                  6
150                  15

Scatter Diagram of previous data:
[Figure: Scatter diagram with Cholesterol Level (0 to 250) on the x-axis and Meat Consumption in Ounces / Day (0 to 35) on the y-axis.]

Section 2.4
• Tables and charts for categorical data
• Covered in CSC 199 – Read

Section 2.5
• Tabulating and Graphing Bivariate Categorical Data
• Use a Contingency Table or a Side-By-Side Chart.
Contingency Table
• Also called a “Cross-Classification Table”
• Used to study the values from two categorical variables.

Example: A sample of 14 graduates was taken and each individual was asked:
1. What was your major?
2. What is your salary level?
   • <= $30,000
   • $30,000 - $50,000
   • >= $50,000

Degree        Salary Level
English       >= $50,000
Math          $30,000 - $50,000
Math          <= $30,000
English       $30,000 - $50,000
English       <= $30,000
Philosophy    $30,000 - $50,000
Philosophy    <= $30,000
English       >= $50,000
Philosophy    <= $30,000
Math          >= $50,000
Math          $30,000 - $50,000
Math          >= $50,000
Math          >= $50,000
English       $30,000 - $50,000

A count of the number of degrees within each salary range.
Degree         <= $30,000    $30,000 - $50,000    >= $50,000    Total
English        1             2                    2             5
Math           1             2                    3             6
Philosophy     2             1                    0             3
Grand Total    4             5                    5             14

Each value is divided by the total (14).

Percentages based on overall total
Degree         <= $30,000    $30,000 - $50,000    >= $50,000    Total
English        7.14%         14.29%               14.29%        35.71%
Math           7.14%         14.29%               21.43%        42.86%
Philosophy     14.29%        7.14%                0.0%          21.43%
Total          28.57%        35.71%               35.71%        100.00%

• 28.57% of all polled make $30,000 or under.
• 42.86% of all polled majored in math.
• 21.43% of all polled majored in math and make $50,000 or more.

Each value in the count table is divided by the total of its row.
Percentages based on row total
Degree         <= $30,000    $30,000 - $50,000    >= $50,000    Total
English        20.00%        40.00%               40.00%        100.00%
Math           16.67%        33.33%               50.00%        100.00%
Philosophy     66.67%        33.33%               0.0%          100.00%
Total          28.57%        35.71%               35.71%        100.00%

• Of those who majored in math, 50.00% make $50,000 or more.
• Of those who majored in philosophy, 66.67% make $30,000 or less.

Each value in the count table is divided by the total of its column.

Percentages based on column total
Degree         <= $30,000    $30,000 - $50,000    >= $50,000    Total
English        25.00%        40.00%               40.00%        35.71%
Math           25.00%        40.00%               60.00%        42.86%
Philosophy     50.00%        20.00%               0.0%          21.43%
Total          100.00%       100.00%              100.00%       100.00%

• Of those who make $30,000 or less, 50.00% majored in philosophy.
• Of those who make between $30,000 and $50,000, 20.00% majored in philosophy.

Side-By-Side Chart
• Visual display of bivariate categorical data.
• Used to detect relationships in the data.

Consider the following data:
                                       NC    SC    NE    IL
Percentage of Pop. that is literate    93    89    99    98
Percent of crime-related deaths        10    15    4     5

Side-By-Side Chart of the previous data
[Figure: Side-by-side bar chart of Crime Rate and Literacy Rate for IL, NE, SC and NC; horizontal axis from 0 to 150.]

See the following:
• Excel Handbook for Chapter 2
• Pg. 93 - 104
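
As a supplement to the contingency table example in Section 2.5, the three kinds of percentages (overall, row and column) can be reproduced from the 14 raw responses. The slides work in Excel; this is a minimal Python sketch instead, and the variable names and the pct helper are illustrative assumptions.

```python
from collections import Counter

LOW, MID, HIGH = "<= $30,000", "$30,000 - $50,000", ">= $50,000"

# (degree, salary level) pairs copied from the example in Section 2.5
responses = [
    ("English", HIGH), ("Math", MID), ("Math", LOW), ("English", MID),
    ("English", LOW), ("Philosophy", MID), ("Philosophy", LOW),
    ("English", HIGH), ("Philosophy", LOW), ("Math", HIGH),
    ("Math", MID), ("Math", HIGH), ("Math", HIGH), ("English", MID),
]

degrees = ["English", "Math", "Philosophy"]
levels = [LOW, MID, HIGH]

# Count table: one row per degree, one column per salary level
counts = Counter(responses)
table = [[counts[(d, s)] for s in levels] for d in degrees]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)


def pct(part, whole):
    """Percentage of whole, rounded to two decimal places."""
    return round(100 * part / whole, 2)


# Each count divided by the grand total, its row total, or its column total
overall = [[pct(c, grand_total) for c in row] for row in table]
by_row = [[pct(c, rt) for c in row] for row, rt in zip(table, row_totals)]
by_col = [[pct(c, ct) for c, ct in zip(row, col_totals)] for row in table]

for name, result in [("overall", overall), ("row", by_row), ("column", by_col)]:
    print(f"Percentages based on {name} total")
    for degree, row in zip(degrees, result):
        print(f"  {degree}: {row}")
```

The only difference between the three tables is the denominator, which is why the same count table produces three different readings of the data.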