CHAPTER 2: Visual Description of Data
to accompany Introduction to Business Statistics, fourth edition, by Ronald M. Weiers
Presentation by Priscilla Chaffe-Stengel and Donald N. Stengel
© 2002 The Wadsworth Group

Chapter 2 - Learning Objectives
• Convert raw data into a data array.
• Construct:
  – a frequency distribution.
  – a relative frequency distribution.
  – a cumulative relative frequency distribution.
• Construct a stem-and-leaf diagram.
• Visually represent data by using graphs and charts.

Chapter 2 - Key Terms
• Data array
  – An orderly presentation of data in either ascending or descending numerical order.
• Frequency distribution
  – A table that represents the data in classes and that shows the number of observations in each class.

Chapter 2 - Key Terms
• Frequency distribution
  – Class - The category
  – Frequency - Number of observations in each class
  – Class limits - Boundaries for each class
  – Class interval - Width of each class
  – Class mark - Midpoint of each class

Sturges’ Rule
• How to set the approximate number of classes to begin constructing a frequency distribution:
    \( k = 1 + 3.322 \log_{10} n \)
  where k = approximate number of classes to use and n = the number of observations in the data set.

How to Construct a Frequency Distribution
1. Number of classes: Choose an approximate number of classes for your data. Sturges’ rule can help.
2. Estimate the class interval: Divide the approximate number of classes (from Step 1) into the range of your data to find the approximate class interval, where the range is defined as the largest data value minus the smallest data value.
3. Determine the class interval: Round the estimate (from Step 2) to a convenient value.

How to Construct a Frequency Distribution, cont.
4.
Lower class limit: Determine the lower class limit for the first class by selecting a convenient number that is smaller than the lowest data value.
5. Class limits: Determine the other class limits by repeatedly adding the class width (from Step 3) to the prior class limit, starting with the lower class limit (from Step 4).
6. Define the classes: Use the sequence of class limits to define the classes.

Converting to a Relative Frequency Distribution
1. Retain the same classes defined in the frequency distribution.
2. Sum the total number of observations across all classes of the frequency distribution.
3. Divide the frequency for each class by the total number of observations, giving the proportion (equivalently, the percentage) of data values in each class.

Forming a Cumulative Relative Frequency Distribution
1. List the number of observations in the lowest class.
2. Add the frequency of the lowest class to the frequency of the second class. Record that cumulative sum for the second class.
3. For each remaining class, add the prior cumulative sum to the frequency for that class, so that the cumulative sum for the final class is the total number of observations in the data set.

Forming a Cumulative Relative Frequency Distribution, cont.
4. Divide the accumulated frequency for each class by the total number of observations, giving the percent of all observations that occurred up to and including that class.
• An alternative: Accrue the relative frequencies for each class instead of the raw frequencies. Then you don’t have to divide by the total to get percentages.

Example: Problem 2.53
• The average daily cost to community hospitals for patient stays during 1993 for each of the 50 U.S. states is given in the next table.
  – a) Arrange these into a data array.
  – b) Construct a stem-and-leaf display.
  – *) Approximately how many classes would be appropriate for these data?
    [*not in textbook]
  – c & d) Construct a frequency distribution. State interval width and class mark.
  – e) Construct a histogram, a relative frequency distribution, and a cumulative relative frequency distribution.

Problem 2.53 - The Data
AL   $775    HI    823    MA  1,036    NM  1,046    SD    506
AK  1,136    ID    659    MI    902    NY    784    TN    859
AZ  1,091    IL    917    MN    652    NC    763    TX  1,010
AR    678    IN    898    MS    555    ND    507    UT  1,081
CA  1,221    IA    612    MO    863    OH    940    VT    676
CO    961    KS    666    MT    482    OK    797    VA    830
CT  1,058    KY    703    NE    626    OR  1,052    WA  1,143
DE  1,024    LA    875    NV    900    PA    861    WV    701
FL    960    ME    738    NH    976    RI    885    WI    744
GA    775    MD    889    NJ    829    SC    838    WY    537

Problem 2.53 - (a) Data Array
CA  1,221    TX  1,010    RI    885    NY    784    KS    666
WA  1,143    NH    976    LA    875    AL    775    ID    659
AK  1,136    CO    961    MO    863    GA    775    MN    652
AZ  1,091    FL    960    PA    861    NC    763    NE    626
UT  1,081    OH    940    TN    859    WI    744    IA    612
CT  1,058    IL    917    SC    838    ME    738    MS    555
OR  1,052    MI    902    VA    830    KY    703    WY    537
NM  1,046    NV    900    NJ    829    WV    701    ND    507
MA  1,036    IN    898    HI    823    AR    678    SD    506
DE  1,024    MD    889    OK    797    VT    676    MT    482

Problem 2.53 - (b) The Stem-and-Leaf Display
Stem-and-Leaf Display   N = 50
Stem unit: 100 (each leaf is the last two digits of the value)
Count  Stem  Leaves
   1    12   21
   2    11   43, 36
   8    10   91, 81, 58, 52, 46, 36, 24, 10
   7     9   76, 61, 60, 40, 17, 02, 00
 (11)    8   98, 89, 85, 75, 63, 61, 59, 38, 30, 29, 23
   9     7   97, 84, 75, 75, 63, 44, 38, 03, 01
   7     6   78, 76, 66, 59, 52, 26, 12
   4     5   55, 37, 07, 06
   1     4   82
Range: $482 - $1,221

Problem 2.53 - Continued
• To approximate the number of classes we should use in creating the frequency distribution, use Sturges’ rule with n = 50:
    \( k = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 50 = 1 + 3.322(1.69897) = 1 + 5.644 = 6.644 \approx 7 \)
  Sturges’ rule suggests we use approximately 7 classes.

Constructing the Frequency Distribution
• Step 1. Number of classes
  – Sturges’ rule: approximately 7 classes.
    The range is: $1,221 – $482 = $739
    $739/7 ≈ $106 and $739/8 ≈ $92
• Steps 2 & 3.
The Class Interval
  – So, if we use 8 classes, we can make each class $100 wide.

Constructing the Frequency Distribution
• Step 4. The Lower Class Limit
  – If we start at $450, we can cover the range in 8 classes, each class $100 in width.
    The first class: $450 up to $550
• Steps 5 & 6. Setting Class Limits
    $450 up to $550
    $550 up to $650
    $650 up to $750
    $750 up to $850
    $850 up to $950
    $950 up to $1,050
    $1,050 up to $1,150
    $1,150 up to $1,250

Problem 2.53 - (c) & (d)
Average daily cost        Number   Mark
$450 – under $550            4     $500
$550 – under $650            3     $600
$650 – under $750            9     $700
$750 – under $850            9     $800
$850 – under $950           11     $900
$950 – under $1,050          7     $1,000
$1,050 – under $1,150        6     $1,100
$1,150 – under $1,250        1     $1,200
Interval width: $100

Problem 2.53 - (e) The Histogram
[Figure: histogram of average daily cost. Horizontal axis: class marks 500 to 1200. Vertical axis: frequency, 0 to 12.]

Problem 2.53 - The Relative Frequency Distribution
Average daily cost        Number   Rel. Freq.
$450 – under $550            4      4/50 = .08
$550 – under $650            3      3/50 = .06
$650 – under $750            9      9/50 = .18
$750 – under $850            9      9/50 = .18
$850 – under $950           11     11/50 = .22
$950 – under $1,050          7      7/50 = .14
$1,050 – under $1,150        6      6/50 = .12
$1,150 – under $1,250        1      1/50 = .02

Problem 2.53 - (e) The Percentage Polygon
[Figure: percentage polygon. Horizontal axis: 0 to 1400. Vertical axis: relative frequency, 0 to 0.25.]

Problem 2.53 - The Cumulative Frequency Distribution
Average daily cost        Number   Cum. Freq.
$450 – under $550            4         4
$550 – under $650            3         7
$650 – under $750            9        16
$750 – under $850            9        25
$850 – under $950           11        36
$950 – under $1,050          7        43
$1,050 – under $1,150        6        49
$1,150 – under $1,250        1        50

Problem 2.53 - The Cumulative Relative Frequency Distribution
Average daily cost        Cum. Freq.   Cum. Rel. Freq.
$450 – under $550             4         4/50 = .08
$550 – under $650             7         7/50 = .14
$650 – under $750            16        16/50 = .32
$750 – under $850            25        25/50 = .50
$850 – under $950            36        36/50 = .72
$950 – under $1,050          43        43/50 = .86
$1,050 – under $1,150        49        49/50 = .98
$1,150 – under $1,250        50        50/50 = 1.00

Problem 2.53 - (e) The Percentage Ogive (Less Than)
[Figure: less-than ogive. Horizontal axis: 0 to 1200. Vertical axis: 0 to 50.]

The Scatter Diagram
• A scatter diagram is a two-dimensional plot of data representing values of two quantitative variables.
  – x, the independent variable, on the horizontal axis
  – y, the dependent variable, on the vertical axis
• Four ways in which two variables can be related:
  1. Direct
  2. Inverse
  3. Curvilinear
  4. No relationship

An Example: Problem 2.38
• For 6 local offices of a large tax preparation firm, the following data describe x = service revenues and y = expenses for supplies, freight, postage, etc.
• Draw a scatter diagram representing the data. Does there appear to be any relationship between the variables? If so, is the relationship direct or inverse?

Problem 2.38, continued
[Figure: scatter plot with trend line. Horizontal axis: x = Service Revenue (thous), 200.0 to 600.0. Vertical axis: y = Office Expenses (thous), 15.0 to 25.0.]
• There appears to be a direct relationship between the service revenue and the office expenses incurred.
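
Recap: Problem 2.53 as a Computation
The tabulations in Problem 2.53 (Sturges’ rule, then the frequency, relative frequency, and cumulative relative frequency distributions) can also be checked by machine. The following is a minimal Python sketch, not part of the original presentation. The names `costs`, `sturges_classes`, and `frequency_table` are illustrative assumptions, and each class is treated as "lower limit up to, but not including, upper limit", matching the "$450 – under $550" wording on the slides.

```python
import math

# Average daily hospital costs (dollars) for the 50 states, Problem 2.53.
costs = [
    775, 1136, 1091, 678, 1221, 961, 1058, 1024, 960, 775,
    823, 659, 917, 898, 612, 666, 703, 875, 738, 889,
    1036, 902, 652, 555, 863, 482, 626, 900, 976, 829,
    1046, 784, 763, 507, 940, 797, 1052, 861, 885, 838,
    506, 859, 1010, 1081, 676, 830, 1143, 701, 744, 537,
]


def sturges_classes(n):
    """Approximate number of classes: k = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


def frequency_table(data, lower, width, num_classes):
    """Count observations in classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - lower) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} falls outside the defined classes")
        counts[index] += 1
    return counts


n = len(costs)
k = sturges_classes(n)  # a starting point only; the slides settle on 8 classes

# Classes chosen on the slides: start at $450, width $100, 8 classes.
lower, width, num_classes = 450, 100, 8
counts = frequency_table(costs, lower, width, num_classes)

# Relative frequency: each class count divided by the total.
rel = [c / n for c in counts]

# Cumulative frequency: running sum of the class counts.
cum = []
running = 0
for c in counts:
    running += c
    cum.append(running)

# Cumulative relative frequency: cumulative count divided by the total.
cum_rel = [c / n for c in cum]

print(f"Sturges' rule suggests k = {k:.3f} classes for n = {n}")
for i in range(num_classes):
    lo = lower + i * width
    hi = lo + width
    mark = lo + width / 2  # class mark = midpoint of the class
    print(f"${lo:>5,} - under ${hi:>5,}  mark ${mark:>7,.0f}  "
          f"freq {counts[i]:>2}  rel {rel[i]:.2f}  "
          f"cum {cum[i]:>2}  cum rel {cum_rel[i]:.2f}")
```

Running the sketch prints one row per class, which can be compared line by line with the frequency, relative frequency, and cumulative tables above. Changing `lower`, `width`, or `num_classes` shows how a different choice of class limits reshapes the distribution.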