12.1 Organizing and Presenting Data

Introduction

Statistics is a branch of mathematics that involves collecting, organizing, presenting, analyzing, and interpreting data for the purpose of drawing conclusions and making decisions. Statistics is divided into two categories:
■ Descriptive statistics
■ Inferential statistics

Descriptive statistics deals with organizing, presenting, and summarizing raw data to convey meaningful information.

Inferential statistics deals with analyzing a sample drawn from a larger population to develop meaningful inferences about the population based on the sample results.

Population refers to all possible individuals, objects, or measurements of interest. It is usually of large or infinite quantity. For example, the ages of all college students.

Sample refers to a set of data drawn from the population. It is a subset of the population; that is, a portion or part of it. For example, the ages of a representative sample of 200 college students.

The descriptive values for a population are called parameters and those for a sample are called statistics. Parameters are usually represented by Greek letters (μ, σ) and statistics are usually represented by lowercase English letters (x, s).

Normally, we may not have access to the whole population we are interested in investigating. Therefore, population parameters are often estimated from sample statistics. Sample statistics are calculated from the actual data observed or measured from the sample.

For example, assume that there are 40 students in a particular math class of a college. If 80% of the students passed an exam and the class itself is the population of interest, this 80% is referred to as a “parameter”, because it is based on the marks of all 40 students. However, if this class is selected as the representative math class of all the math classes in the college, then the 80% is referred to as a “statistic”, because it describes a sample of the population.
In this section, the types of data, levels of measurement, and various methods for organizing and presenting data using tables and graphs will be outlined.

Types of Variables and Levels of Measurement

Types of Variables

A collection of facts and information obtained in a study is known as data. The variables within a dataset may be numerical or non-numerical, and are classified as:
■ Quantitative variables
■ Qualitative variables

Quantitative Variables

Quantitative variables are data that are expressed using numbers and are known as numeric data. These data are further classified as continuous variables or discrete variables.

Continuous variables are obtained by measuring. Measurements of length, weight, time, temperature, etc., are examples of continuous variables. These can be measured in whole units, approximated or rounded whole units, fractions, or decimal numbers (with any number of decimal places).

Discrete variables are obtained by counting, or are data that can only take on specific values. The number of students in a class, number of chapters in a book, position in a race, shoe size, etc., are examples of discrete variables. Also, any quantitative data that does not belong to the continuous variable classification falls into the discrete variable classification.

Qualitative Variables

Qualitative variables are data that are expressed non-numerically and are known as non-numerical data. These data can be classified into categories and are also known as categorical data. Make and model of cars, colour, gender, etc., are examples of qualitative variables.

Variables
    Qualitative (Non-Numeric)
    Quantitative (Numeric)
        Continuous (By Measuring)
        Discrete (By Counting)

Levels of Measurement

Levels of measurement are rules that describe the properties of the numbers that are measured and the way in which they can be used to provide additional information on the data.
There are four levels of measurement: Nominal, Ordinal, Interval, and Ratio. The properties used to classify these levels of measurement are: order (rank), meaningful difference (interval between measurements), and meaningful zero point.

Levels of Measurement   Order (Rank)   Meaningful Difference   Meaningful Zero
Nominal                 No             No                      No
Ordinal                 Yes            No                      No
Interval                Yes            Yes                     No
Ratio                   Yes            Yes                     Yes

Chapter 12 | Basic Statistics and Probability

Levels of Measurement: Properties and Examples

Nominal
• Have no order, but numbers may be assigned as codes for referencing and differentiating purposes.
• The interval between measurements is not meaningful.
• No meaningful zero point.
• Qualitative data, usually classified using letters, symbols, or names.
Examples: Gender, Religion, Country of birth, Colour, etc.

Ordinal
• Have order by their relative position.
• The interval between measurements is not meaningful.
• No meaningful zero point.
• Qualitative data, usually classified using letters, symbols, or numbers.
Examples: High or low, Level of satisfaction, Rating of movies, GPA (A=4, B=3, C=2...), etc.

Interval
• Have order by their relative position.
• Meaningful intervals between measurements.
• No meaningful zero point. (The zero point is located arbitrarily.)
• Quantitative data, but measurements cannot be multiplied or divided.
Examples: Temperature, Dates, Years, Sea level, etc.

Ratio
• Have order by their relative position.
• Meaningful intervals between measurements.
• Meaningful zero point.
• Quantitative data, and measurements can be multiplied or divided.
Examples: Percent, Age, Weight, Speed, etc.

Statistical Representations of Data

Stem-and-Leaf Plot

A stem-and-leaf plot is one method of displaying data to show the spread of the data and where most of the data points lie. The method is simply a sorting technique that arranges the data from the lowest to the highest value; data arranged in this way is known as an array.
In this display, each number is rewritten so that the last digit (the units or ones digit) becomes the leaf and the other digits become the stem. The stems are written vertically and the leaves are written horizontally. A stem-and-leaf plot shows the exact value of every data point.

For example, for the two-digit number 38, the stem is the tens digit, 3 (written on the left side), and the leaf is the units digit, 8 (written on the right side), as shown in the stem-and-leaf plot below. For the three-digit number 156, the stem is 15 and the leaf is 6. For the one-digit number 8, the stem is 0 and the leaf is 8. For decimal numbers, the digits up to and including the decimal point form the stem and the decimal digits form the leaves.

Stem: all the digits except the rightmost digit (ones digit) of the number.
Leaf: the rightmost digit (ones digit) of the number.

Stem | Leaf
  3  | 6 8          (the leaf 8 on stem 3 represents the data value 38)
  4  | 1 2 7 8
  5  | 1 4 4 5 6    (this branch represents the data values 51, 54, 54, 55, and 56)

The following example illustrates the procedure for constructing a stem-and-leaf plot.

Example 12.1-a
Constructing a Stem-and-Leaf Plot

The marks on a Statistics exam for a sample of 40 students are as follows:

63 74 42 65 51 54 36 56 68 57
62 64 76 67 79 61 81 77 59 38
84 68 71 94 71 86 69 75 97 55
48 82 83 54 79 62 68 58 41 47

(i) Construct a stem-and-leaf plot to display the data in an array.
(ii) Use the stem-and-leaf plot to determine the number of students who scored:
    a. 70 marks or more
    b. less than 50 marks.

Solution

(i) Construct a stem-and-leaf plot to display the data in an array.

Step 1: Identify the lowest and highest stem of the data. Looking at the data, the lowest stem is 3 and the highest stem is 9.

Step 2: Use Step 1 to identify the range of the stems. The stems are the digits 3, 4, 5, 6, 7, 8, and 9. Draw a vertical line and write the stems in this order to the left of the line.

Step 3: Starting from the first data value, place the leaf of each number to the right of the vertical line on the corresponding stem, until the last data value is recorded. There is no need to use commas on the leaf side. For example,
• The first data value is 63. Therefore, the stem is 6 and the leaf is 3.
• The second data value is 74. Therefore, the stem is 7 and the leaf is 4.
• Continue until the last data value, 47, where the stem is 4 and the leaf is 7.

Stem | Leaf                   | # of data
  3  | 6 8                    |  2
  4  | 2 8 1 7                |  4
  5  | 1 4 6 7 9 5 4 8        |  8
  6  | 3 5 8 2 4 7 1 8 9 2 8  | 11
  7  | 4 6 9 7 1 1 5 9        |  8
  8  | 1 4 6 2 3              |  5
  9  | 4 7                    |  2
                         Total = 40

Step 4: Rearrange the leaves against each stem, from the smallest to the largest number, to display the numbers in an array.

Stem | Leaf                   | # of data
  3  | 6 8                    |  2
  4  | 1 2 7 8                |  4
  5  | 1 4 4 5 6 7 8 9        |  8
  6  | 1 2 2 3 4 5 7 8 8 8 9  | 11
  7  | 1 1 4 5 6 7 9 9        |  8
  8  | 1 2 3 4 6              |  5
  9  | 4 7                    |  2
                         Total = 40

Number of data less than 50 is 2 + 4 = 6.
Number of data 70 and above is 8 + 5 + 2 = 15.

(ii) a. Number of leaves against stem 7 = 8, against stem 8 = 5, and against stem 9 = 2.
        Therefore, the number of students who scored 70 marks or more is 15.
     b. Number of leaves against stem 4 = 4 and against stem 3 = 2.
        Therefore, the number of students who scored less than 50 marks is 6.

Example 12.1-b
Interpreting Data in a Stem-and-Leaf Plot

The following stem-and-leaf plot shows the number of CDs sold by a salesperson each day in the last 15 days.

Stem | Leaf
  0  | 6 8
  1  | 0 1 3 4
  2  | 6 8 9
  3  | 0 3 8 9
  4  | 1 4

Calculate the following:
(i) Number of CDs sold in the last 15 days.
(ii) Highest and lowest sales in a day in the last 15 days.
(iii) Number of days 30 or more CDs were sold in the last 15 days.
Solution

(i) Add all the data values in each row of the stem-and-leaf plot:

Sum of 1st row data:  6 + 8              =  14
Sum of 2nd row data:  10 + 11 + 13 + 14  =  48
Sum of 3rd row data:  26 + 28 + 29       =  83
Sum of 4th row data:  30 + 33 + 38 + 39  = 140
Sum of 5th row data:  41 + 44            =  85
                                 Total   = 370

Therefore, the number of CDs sold in the last 15 days = 370.

(ii) The highest data value is 44 and the lowest data value is 6.
Therefore, the highest sales in a day is 44 and the lowest sales in a day is 6.

(iii) Number of leaves against stem 3 = 4 and against stem 4 = 2.
Therefore, the number of days 30 or more CDs were sold = 4 + 2 = 6.

Tally Chart

A tally chart is another method of collecting and organizing data. A tally chart is used to keep count of the number of times a particular event or data value occurs. For each count, a tally mark “|”, a vertical line (or a slant line), is drawn in the row for that event or data value. Every fifth tally mark is drawn as a diagonal line (or a horizontal line) across the previous four tally marks, forming a group of five: “||||”. This helps to count the data in multiples of five.

For example, 12 counts of the same item is shown as “|||| |||| ||” (two groups of five and two = 12).

A tally chart helps to produce a frequency table. The number of times an event happens is known as the frequency (f). If the data is ranked using the stem-and-leaf method, then tallying may not be required to produce the frequency table.

Example 12.1-c
Constructing a Frequency Table Using a Tally Chart

The ages of 35 students in a class were recorded as follows:

18 19 18 20 19 17 18 18 20 19 21 20 17 19 20 18 19 17 19 21 19 20 19 18 20 22 19 21 19 20 19 21 19 21 22

Display the data using a tally chart to show the frequency distribution of the ages of students in the class.

Solution

Step 1: Draw 3 columns to represent age, tally, and frequency (f).

Step 2: Identify the lowest and highest data values. 17 is the lowest and 22 is the highest. Therefore, the first column will have 6 entries displaying ages from 17 to 22.

Step 3: Starting from the first data value, use tally marks in Column 2 to count the frequency of the ages in the dataset.

Step 4: Total the tally marks in each row to get the frequency distribution of the ages and enter it in Column 3.

Age   Tally           Frequency (f)
17    |||              3
18    |||| |           6
19    |||| |||| ||    12
20    |||| ||          7
21    ||||             5
22    ||               2
                Total = 35

Example 12.1-d
Interpreting a Tally Chart

The tally chart below shows the height of students (in cm) in a class.

Height (cm)        Tally             Frequency (f)
140 to under 150   |||
150 to under 160   |||| ||
160 to under 170   |||| |||| ||||
170 to under 180   |||| |||
180 to under 190   ||

(i) Complete the frequency column.
(ii) Identify the group with the highest frequency and the number of data in that group.
(iii) Calculate the total number of data above the group with the highest frequency and express it as a percent of the whole data.

Solution

(i)
Height (cm)        Tally             Frequency (f)
140 to under 150   |||                3
150 to under 160   |||| ||            7
160 to under 170   |||| |||| ||||    15
170 to under 180   |||| |||           8
180 to under 190   ||                 2
                              Total = 35

(ii) The group with the highest frequency is “160 to under 170” and the number of data in that group is 15.

(iii) The number of data above the group with the highest frequency = 8 + 2 = 10.
The percent of data above the group with the highest frequency = (10 / 35) × 100% = 28.6% (rounded to one decimal place).

Scatter Plot

A scatter plot is a graph showing pairs of numerical data with the independent variable on the horizontal axis and the dependent variable on the vertical axis. The independent variable is the variable selected for the study and the dependent variable is the variable observed or measured. In a scatter plot, a dot or a small circle represents each pair of measured data values, so that the plot illustrates the relationship between the two sets of data.
A scatter plot is especially useful when there is paired numerical data, where a single value of the independent variable may be paired with several values of the dependent variable.

The relationship between two variables is known as their correlation. If the variables are correlated, the data points will fall close to a line. If the points are spread evenly across the scatter plot with no upward or downward trend, the correlation is low or zero.

A correlation is Positive when the slope is positive; i.e., as one variable increases, the other variable increases, and vice versa.

A correlation is Negative when the slope is negative; i.e., as one variable increases, the other variable decreases, and vice versa.

[Figure: five sample scatter plots of Y vs. X showing Perfect Positive Correlation, Perfect Negative Correlation, No Correlation, Strong Positive Correlation, and Strong Negative Correlation.]

Example 12.1-e
Drawing a Scatter Plot of Weight (kg) vs. Height (cm)

Construct a scatter plot for the following data and comment on the correlation between the heights and weights of 10 students.

Student        1    2    3    4    5    6    7    8    9    10
Height (cm)    164  145  169  162  181  155  191  151  172  176
Weight (kg)    60   43   62   57   77   55   82   50   64   75

Solution

[Figure: Scatter Plot of Weight (kg) vs. Height (cm); X-axis: Height (cm), 140 to 200; Y-axis: Weight (kg), 40 to 80.]

The scatter plot shows a strong positive correlation between the heights and the weights of the students; i.e., as the height increases, the weight increases.

Example 12.1-f
Drawing a Scatter Plot of Items Sold (numbers) vs. Price ($)

Construct a scatter plot for the following data and comment on the correlation between the price and the number of items sold.

Price ($)               15  18  20  25  27  30  35
Items Sold (numbers)    48  40  36  28  24  18  6

Solution

[Figure: Scatter Plot of Items Sold (numbers) vs. Price ($); X-axis: Price ($), 5 to 35; Y-axis: Items Sold (numbers), 10 to 50.]

The scatter plot shows a strong negative correlation between the number of items sold and the price; i.e., as the price increases, the number of items sold decreases.

Line Graphs

Line graphs are most often used for representing continuous data. Line graphs are an important feature of mathematics. This is similar to the topic discussed in Chapter 9. The line graph should include the following:
■ Title of the graph, describing the purpose.
■ Labelled axes to show the variables and the units of measure used.
■ Position of origin: (0, 0).

Example 12.1-g
Drawing Multiple Line Graphs

The data showing the daily high and low temperature readings in Toronto for the period from September 15 to September 21 is provided below. Plot the two line graphs for the data.

Date       Temp (°C) High   Temp (°C) Low
Sep. 15    20               11
Sep. 16    15               10
Sep. 17    16               8
Sep. 18    24               16
Sep. 19    21               15
Sep. 20    20               14
Sep. 21    18               12

Solution

[Figure: Line Graph of Daily High and Low Temperature Readings (°C) from Sep. 15 to Sep. 21; X-axis: Day (Sep. 15 to Sep. 21); Y-axis: Temperature (°C), 5 to 30; one line for the daily high temperature readings and one line for the daily low temperature readings.]

Example 12.1-h
Interpreting a Line Graph

The line graph shown below illustrates the monthly sales (in millions of dollars) of a department store for the period from January to December 2014.

[Figure: Line Graph of Monthly Sales ($ Millions) for the Year 2014; X-axis: Month (Jan. to Dec.); Y-axis: Sales ($ Million), 1 to 7.]

(i) Calculate the total sales for the months of May, June, July, and August.
(ii) Which month had the lowest sales and what is the amount of sales for that month?
(iii) Which month had the highest sales and what is the amount of sales for that month?
Solution

(i)
Sales in May       5    Million
Sales in June      6    Million
Sales in July      5.5  Million
Sales in August    5    Million
Total =           21.5  Million

Therefore, the total sales for the months of May, June, July, and August were 21.5 million dollars.

(ii) The lowest sales amount was in January and the sales amount was 1.5 million dollars.

(iii) The highest sales amount was in October and the sales amount was 7 million dollars.

Pie Chart

Pie charts are usually used to summarize and show classes or groups of data in proportion to the whole dataset. The whole pie (circle) represents the total of all the values in the dataset, which is 100% and is equal to 360 degrees. The size of each sector represents the percent portion (or fraction) of each category of data. Pie charts are very often used in presenting poll results, expenditures, etc.

The pie chart is constructed by first converting each category or group into a percent of the whole and then multiplying this percent by 360 degrees to determine the number of degrees for the sector that represents the category in the pie chart.

[Figure: circle marked at 0° (0%), 90° (25%), 180° (50%), 270° (75%), and 360° (100%), with a 54° sector representing 15% of the whole.]

For example, 15% of the data is represented by a sector with an angle of 54 degrees (15% of 360° = 54°).

The interpretation of a pie chart is based on the fact that the largest ‘slice of pie’ relates to the largest proportion of the data and the smallest ‘slice’ to the smallest proportion. It is, therefore, easy to make comparisons between the relative sizes of data items.

Example 12.1-i
Drawing a Pie Chart for Given Data

Draw a pie chart representing the data using (i) percent and (ii) sector angles.

Item              Expense ($)
Housing           15,000
Meals             9,000
Transportation    8,000
Medicine          5,000
Miscellaneous     7,000
Savings           6,000
Total             $50,000

Solution

The expenses totalling $50,000 are 100% of the data and are represented by 360° in a circle (pie chart).

Calculate each listed expense as a percent of the total. For example, the housing expense of $15,000 is ($15,000 / $50,000) × 100% = 30%. Similarly, calculate the percent for all the remaining items and complete the 3rd column of the table, as shown below.

Calculate the angle that represents each sector in the pie chart by multiplying the calculated percent for each of the items by 360°. For example, the sector angle for the housing expense is 30% of 360° = 108°. Similarly, calculate the sector angle of all the remaining items and complete the 4th column of the table, as shown below.

Item              Expense    Percent   Sector Angle
Housing           $15,000    30%       108.0°
Meals             $9,000     18%       64.8°
Transportation    $8,000     16%       57.6°
Medicine          $5,000     10%       36.0°
Miscellaneous     $7,000     14%       50.4°
Savings           $6,000     12%       43.2°
Total             $50,000    100%      360°

(i) Construct the pie chart using the percent calculated for each of the items.

[Figure: Pie Chart Using Percents; Housing 30%, Meals 18%, Transportation 16%, Medicine 10%, Miscellaneous 14%, Savings 12%.]

(ii) Construct the pie chart using the sector angle calculated for each of the items.

[Figure: Pie Chart Using Sector Angles; Housing 108°, Meals 64.8°, Transportation 57.6°, Medicine 36°, Miscellaneous 50.4°, Savings 43.2°.]

Example 12.1-j
Interpreting a Pie Chart

The final grades of 40 students who passed a Math exam are represented in the pie chart below. Use the pie chart to complete the table.
[Figure: pie chart of final grades; A+ 20%, A 25%, B 30%, C 10%, D 15%.]

Grade    Number of Students    Percent    Sector Angle
A+                             20%
A                              25%
B                              30%
C                              10%
D                              15%
Total    40

Solution

Grade    Number of Students    Percent    Sector Angle
A+       20% of 40 = 8         20%        20% of 360° = 72°
A        25% of 40 = 10        25%        25% of 360° = 90°
B        30% of 40 = 12        30%        30% of 360° = 108°
C        10% of 40 = 4         10%        10% of 360° = 36°
D        15% of 40 = 6         15%        15% of 360° = 54°
Total    40                    100%       360°

Bar Chart

A bar chart is a graph that uses either horizontal or vertical bars to show comparisons among categories or class intervals of grouped data. The categories or class intervals are plotted on the X-axis. The distribution of the data in these categories, or the frequencies associated with the class intervals, is plotted on the Y-axis. The width of the base of the rectangle for each category or class interval should be equal. Classes should be set up without any overlap in the data.

Bar charts are easy to produce and easy to interpret. Each category or class interval is represented by a rectangle whose base corresponds to the category or class interval and whose length (or height) is proportional to the value it represents; that is, the quantity of data in that category or the frequency of that class interval.

A bar chart is also used to represent two or more sets of data having the same class intervals, side-by-side, on one graph. This allows the data values in these sets to be compared easily.

Example 12.1-k
Creating Bar Charts

Draw a vertical bar chart for the frequencies of grades obtained by students in a Math exam and use the chart to calculate the following:
(i) Total number of students graded.
(ii) Number of students obtaining a B grade or better.
Grade                 A+   A    B    C   D
Number of Students    8    10   12   4   6

Solution

[Figure: Bar Chart of Number of Students and their Grades; X-axis: Grade (A+, A, B, C, D); Y-axis: Frequency, 2 to 14; bar heights 8, 10, 12, 4, and 6.]

(i) Total number of students graded = 8 + 10 + 12 + 4 + 6 = 40
(ii) Number of students obtaining a B grade or better = 12 + 10 + 8 = 30

Example 12.1-l
Interpreting Bar Charts

The stacked bar chart below shows the number of cellphones sold by Store A and Store B from January to June.

[Figure: Bar Chart of Number of Cellphones Sold from January to June by Stores A and B; X-axis: Months (January to June); Y-axis: Number of Cellphones Sold, 10 to 100; bars for Store A and Store B.]

Use the bar chart to answer the following:
(i) What were the total cellphone sales by Stores A and B?
(ii) In which months did Store B sell more cellphones than Store A?
(iii) In which particular month did the sales in Store A exceed those of Store B by the greatest difference, and by how much?

Solution

(i) Total cellphone sales by Store A = 60 + 90 + 90 + 50 + 45 + 75 = 410
    Total cellphone sales by Store B = 70 + 50 + 80 + 70 + 65 + 55 = 390
(ii) The months in which Store B sold more cellphones than Store A are January, April, and May.
(iii) The month in which there was the greatest difference was February. Store A sold 90 – 50 = 40 more cellphones.

Histogram and Frequency Polygon

Histogram

A histogram is similar to a vertical bar chart in which the categories or class intervals are marked on a horizontal axis and the class frequencies are represented by the heights of the bars. However, in histograms, there should be no space between the rectangle of a class interval and the rectangle of an adjoining class interval. That is, the bars are drawn adjacent to each other.

Frequency Polygon

The frequency polygon is the line joining the midpoints of the tops of the bars of a histogram. An additional class interval is created at each end of the histogram so that the frequency polygon starts and ends on the X-axis, at the midpoints of these additional class intervals.

Example 12.1-m
Creating Histogram and Frequency Polygon

Draw a histogram and a frequency polygon for the distribution of age groups of 200 employees in a company, as shown below.

Age (Class Intervals)    Number of Employees (Frequency)
20 to under 30           35
30 to under 40           42
40 to under 50           64
50 to under 60           30
60 to under 70           24
70 to under 80           5
Total                    200

Solution

[Figure: Histogram and Frequency Polygon of Number of Employees vs. Age Group; X-axis: Age Group, from an additional class, through “20 to under 30” up to “70 to under 80”, to an additional class; Y-axis: Frequency, 10 to 100; the frequency polygon joins the midpoints of the tops of the bars.]

Note: Labels on the X-axis representing the class intervals can be in any of the following formats: 40-50, 50-60, 60-70…, or 40 to 50, 50 to 60…, or 40 to under 50, 50 to under 60…, or as mid-points of class intervals (45, 55, 65…).

Frequency Distributions

A frequency distribution is a method of summarizing large amounts of data without displaying each observed value. It groups the data into different class intervals and indicates the number of observations that fall into each class interval, known as the frequency, f. In a frequency distribution, the class widths of all intervals should be the same.

The smallest value that belongs to a class interval is called the lower class limit, and the largest value that belongs to the class interval is called the upper class limit. The class width refers to the difference between the upper class limit and the lower class limit.

For example, in the class interval “140 to under 150”, the lower class limit is 140 and the upper class limit is below 150.
The class width = 150 – 140 = 10

Using the data from Example 12.1-a (shown below), the steps in constructing a frequency distribution table are as follows:

Data:
63 74 42 65 51 54 36 56 68 57
62 64 76 67 79 61 81 77 59 38
84 68 71 94 71 86 69 75 97 55
48 82 83 54 79 62 68 58 41 47

Step 1: First, array the data using the stem-and-leaf method and determine the number of data, highest value, lowest value, and the range. The range is the difference between the highest and the lowest value in a dataset.

Stem | Leaf                   | # of data
  3  | 6 8                    |  2
  4  | 1 2 7 8                |  4
  5  | 1 4 4 5 6 7 8 9        |  8
  6  | 1 2 2 3 4 5 7 8 8 8 9  | 11
  7  | 1 1 4 5 6 7 9 9        |  8
  8  | 1 2 3 4 6              |  5
  9  | 4 7                    |  2
                         Total = 40

There are 40 data values; the highest value is 97 and the lowest value is 36. The range is 97 – 36 = 61.

Step 2: Determine the number of classes and the class interval of each class. (As a guideline, normally, the minimum number of classes is 5 and the maximum number of classes is 15.)

For 5 classes, the width of each class interval for the above data = Range / # of classes = 61 / 5 = 12.20
For 15 classes, the width of each class interval for the above data = Range / # of classes = 61 / 15 = 4.07

Therefore, a possible class width is between 4 and 13. Use a class width of 8.

The lowest class limit should accommodate the smallest value of the data, 36. The highest class limit should accommodate the largest value of the data, 97. Therefore, the class intervals are: “34 to under 42”, “42 to under 50”, “50 to under 58”, ... “90 to under 98”. That is, there are 8 classes and each class width is 8.

Step 3: Determine the class frequencies for each class using the stem-and-leaf plot and complete the frequency distribution.
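The counting in Step 3 can be cross-checked with a short Python sketch. This is only an illustration: the function name `class_frequencies` and its parameters are hypothetical names chosen here, and the sketch assumes whole-number data and equal-width classes of the form “lower to under upper”, as in Step 2.

```python
# Marks from Example 12.1-a (40 values).
marks = [
    63, 74, 42, 65, 51, 54, 36, 56, 68, 57,
    62, 64, 76, 67, 79, 61, 81, 77, 59, 38,
    84, 68, 71, 94, 71, 86, 69, 75, 97, 55,
    48, 82, 83, 54, 79, 62, 68, 58, 41, 47,
]


def class_frequencies(data, lowest_limit, width, num_classes):
    """Count how many values fall in each 'lower to under upper' class.

    Hypothetical helper for illustration; assumes integer data.
    """
    counts = [0] * num_classes
    for value in data:
        # Class 0 starts at lowest_limit, class 1 at lowest_limit + width, ...
        index = (value - lowest_limit) // width
        if index not in range(num_classes):
            raise ValueError(f"{value} is outside the class intervals")
        counts[index] += 1
    return counts


# Class width 8, starting at 34, with 8 classes (from Step 2).
frequencies = class_frequencies(marks, lowest_limit=34, width=8, num_classes=8)

for i, f in enumerate(frequencies):
    lower = 34 + 8 * i
    print(f"{lower} to under {lower + 8}: {f}")
print("Total =", sum(frequencies))
```

Each value is assigned to a class by integer division of its distance from the lowest class limit by the class width, which is why a value equal to an upper limit (for example, 42) falls into the next class.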
Class Interval     Frequency
34 to under 42     3
42 to under 50     3
50 to under 58     6
58 to under 66     8
66 to under 74     7
74 to under 82     7
82 to under 90     4
90 to under 98     2
Total              40

Example 12.1-n
Creating a Frequency Distribution Table

The number of cars sold each month for the last 2 years by a car dealer is as follows:

44 58 68 27 21 55
39 15 19 16 33 25
51 52 10 26 37 48
70 75 84 73 65 80

Group the data into 5 classes and create a frequency distribution table.

Solution

Array the data using the stem-and-leaf method, as follows:

Stem-and-Leaf          Data in Array
Stem | Leaf            Stem | Leaf
  1  | 5 9 6 0           1  | 0 5 6 9
  2  | 7 1 5 6           2  | 1 5 6 7
  3  | 9 3 7             3  | 3 7 9
  4  | 4 8               4  | 4 8
  5  | 8 5 1 2           5  | 1 2 5 8
  6  | 8 5               6  | 5 8
  7  | 0 5 3             7  | 0 3 5
  8  | 4 0               8  | 0 4

Number of classes = 5 (given)

There are 24 data values. The highest value is 84 and the lowest value is 10.

Range = 84 – 10 = 74

Width of each class = Range / # of classes = 74 / 5 = 14.8 (use 15 for ease of presentation)

Therefore, the class width = 15. Now, the lowest class interval will be “10 to under 25” and the highest class interval will be “70 to under 85”.

The following is the frequency distribution table:

Class Interval     Frequency
10 to under 25     5
25 to under 40     6
40 to under 55     4
55 to under 70     4
70 to under 85     5
Total              24

Relative Frequency Distribution and Percent Frequency Distribution

Relative Frequency Distribution

Relative frequency is the ratio of the frequency of a particular class interval to the total number of observations and is expressed as a decimal or a fraction. The sum of all the relative frequencies of a frequency distribution should be equal to one.

Percent Frequency Distribution

The percent frequency distribution is calculated by multiplying the relative frequency of each class interval by 100 and expressing it as a percent. The sum of all the percent frequencies of a frequency distribution should be equal to 100%.
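The two definitions above can be sketched in a few lines of Python. The class frequencies used are those of Example 12.1-o; the function name `relative_frequencies` is a hypothetical name chosen for this illustration, and exact fractions are used (an assumption of this sketch, not something the text requires) so that the sum comes out to exactly one.

```python
from fractions import Fraction

# Class frequencies from Example 12.1-o (classes "30 to under 40" ... "90 to under 100").
frequencies = [2, 4, 8, 11, 8, 5, 2]


def relative_frequencies(freqs):
    """Return each frequency divided by the total number of observations."""
    total = sum(freqs)
    return [Fraction(f, total) for f in freqs]


relative = relative_frequencies(frequencies)
# Percent frequency = relative frequency x 100.
percent = [float(r * 100) for r in relative]

for f, r, p in zip(frequencies, relative, percent):
    print(f"frequency {f}: relative {float(r)}, percent {p}%")

print("Sum of relative frequencies:", sum(relative))
print("Sum of percent frequencies:", sum(percent))
```

Checking that the relative frequencies sum to one (and the percent frequencies to 100%) is a quick way to catch a class that was miscounted or left out.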
Example 12.1-o
Creating Relative Frequency and Percent Frequency Distributions

Use the frequency distribution provided below to create the relative frequency distribution and the percent frequency distribution.

Class Interval      Frequency
30 to under 40      2
40 to under 50      4
50 to under 60      8
60 to under 70      11
70 to under 80      8
80 to under 90      5
90 to under 100     2
Total               40

Solution

Add two columns to the frequency table, one for the relative frequency distribution and one for the percent frequency distribution, as shown below.

Class Interval      Frequency   Relative Frequency   Percent Frequency
30 to under 40      2           0.05                 5%
40 to under 50      4           0.10                 10%
50 to under 60      8           0.20                 20%
60 to under 70      11          0.275                27.5%
70 to under 80      8           0.20                 20%
80 to under 90      5           0.125                12.5%
90 to under 100     2           0.05                 5%
Total               40          1                    100%

The relative frequency of any class is calculated by dividing the number of observations in that class by the total number of observations. For example, the relative frequency of class “30 to under 40” is 2 / 40 = 0.05. Similarly, calculate the relative frequency of the remaining class intervals and complete the “Relative Frequency” column.

The percent frequency is calculated by multiplying the relative frequency by 100. For example, the percent frequency of class “30 to under 40” is 0.05 × 100% = 5%. Similarly, calculate the percent frequency of the remaining class intervals and complete the “Percent Frequency” column.

Cumulative Frequency Distribution and Cumulative Frequency Curve (or Polygon)

The cumulative frequency at a given class interval is calculated by adding the frequency of that class interval to the frequencies of all the preceding class intervals. That is, it is the sum of the frequencies of all the class intervals up to and including the class interval in question. Simply put, it is the running total of the frequencies.
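The running total described above can be sketched in Python using the frequencies from Example 12.1-o. The variable names are illustrative only, and the cumulative percent shown is the cumulative frequency expressed as a percent of the total number of observations, as used in Example 12.1-p.

```python
from itertools import accumulate

# Class frequencies from Example 12.1-o.
frequencies = [2, 4, 8, 11, 8, 5, 2]

# Running total: each entry is the sum of all frequencies up to and
# including that class.
cumulative = list(accumulate(frequencies))

# The last running total equals the total number of observations.
total = cumulative[-1]
cumulative_percent = [100 * c / total for c in cumulative]

for f, c, p in zip(frequencies, cumulative, cumulative_percent):
    print(f"frequency {f}: cumulative {c}, cumulative percent {p}%")
```

The final cumulative frequency should always equal the total number of observations, and the final cumulative percent should be 100%.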
A curve showing the cumulative frequency plotted against the upper class boundary of the class interval is called a cumulative frequency curve. Example 12.1-p Creating Cumulative Frequency Distribution Use the frequency distribution in Example 12.1-o (also shown below) to create the relative cumulative frequency distribution and the cumulative percent frequency distribution. Also, draw the cumulative frequency curve. Class Interval 30 to under 40 2 40 to under 50 4 50 to under 60 8 60 to under 70 11 70 to under 80 8 80 to under 90 5 90 to under 100 2 Total Solution Frequency 40 Add two columns to the frequency table, one for the cumulative frequency distribution and one for the cumulative percent frequency distribution, as shown below. Class Interval Frequency Cumulative Frequency Cumulative Percent Frequency 30 to under 40 2 2 5% 40 to under 50 4 6 15% 50 to under 60 8 14 35% 60 to under 70 11 25 62.5% 70 to under 80 8 33 82.5% 80 to under 90 5 38 95% 90 to under 100 2 40 100% Total 40 Compute the cumulative frequency of any class by using the total value of the frequency up to and including that class. For example, the cumulative frequency of class “50 to under 60” is 2 + 4 + 8 = 14. Compute the percent cumulative frequency by dividing the cumulative frequency distribution of that class by the total number of observations and convert the answer to a percent. 14 For example, the cumulative percent distribution of class “50 to under 60” is 100%== 35% 35%. × 100% 40 12.1 Organizing and Presenting Data 444 Solution Y continued Graph of Cumulative Frequency and Cumulative Percent Frequency vs. Marks 100 % 40 35 Cumulative Frequency 25 50 % 20 15 Cumulative Percent 75 % 30 25 % 10 5 X 10 20 30 40 50 60 70 80 90 100 Marks 12.1 Exercises Answers to odd-numbered problems are available at the end of the textbook. 1. Identify the following variables as continuous or discrete: a. Height of students in a class b. Room temperature c. Net profit of a company d. 
Position in class

2. Identify the following variables as continuous or discrete: a. Number of seasons b. Weight of a person c. Time between the arrivals of two flights d. Air pressure

3. Identify the following variables as quantitative or qualitative: a. Marks on an exam b. Seasons of a year c. Amount of rainfall d. Mode

4. Identify the following variables as quantitative or qualitative: a. Height of a person b. Letter grade in an exam c. Median d. Model of a car

5. Identify the levels of measurement (nominal, ordinal, interval, or ratio) for the following measurements: a. Pant size b. Mean c. Annual salary of individuals d. Title of a person in a company

6. Identify the levels of measurement (nominal, ordinal, interval, or ratio) for the following measurements: a. Places of birth b. Ages of people c. Hours spent watching TV d. Temperature of ice in degrees C

7. Construct a stem-and-leaf plot to display the following data in an array:
   39 31 34 32 18 31 44 19 19
   13 37 43 29 25 21 38 27 28

8. Construct a stem-and-leaf plot to display the following data in an array:
   7 22 17 16 18 31 19 26 9
   5 20 16 37 32 13 25 11 35

9. Construct a stem-and-leaf plot to display the following data in an array:
   65 75 95 77 78 80 81 48 73 55 92 81
   51 52 59 45 69 88 85 84 64 82 70 97

10. Construct a stem-and-leaf plot to display the following data in an array:
   76 72 89 95 84 83 77 88 85 75 62 59
   78 58 97 66 52 87 97 92 80 71 91 65

11. The following data was collected from a sample survey of 40 first-year students who were asked to indicate their favourite subject among the four subjects, Statistics (S), Marketing (M), Accounting (A), and Finance Math (F).
   A M S M S M F A M S
   S S M F M S S A A A
   M M S F A M F A S S
   M F S S M S A A S F
   Organize the data in a frequency table using a tally chart.

12.
The following data was collected from a sample survey of 40 students who were asked to indicate the mode of transportation that they normally use to get to college during the summer term. Their choices were walking (W), bicycling (B), taking public transportation (P), and driving a car (C).
   C W W P B B W C C P
   P P B P C W C B B B
   B P W W P C W B W W
   C C P W C P C P B B
   Organize the data in a frequency table using a tally chart.

13. The following are the letter grades obtained by 200 students in Finance Math, in the business program of a college:

   Grade   Number of Students   Percent   Cumulative Percent   Angle   Cumulative Angle
   A+      24
   A       30
   B       36
   C       52
   D       42
   F       16
   Total   200

   a. Complete the above table for percent, cumulative percent, angle, and cumulative angle.
   b. Draw a pie chart using either the percent or the angle measure.

14. Victoria kept a record of the average number of hours she spent on different activities during the weekdays. The information is provided below:

   Activity   Number of Hours   Percent   Cumulative Percent   Angle   Cumulative Angle
   School     7.0
   Meals      1.0
   Homework   2.0
   Travel     2.5
   Sleep      8.0
   Other      3.5
   Total      24

   a. Complete the above table for percent, cumulative percent, angle, and cumulative angle.
   b. Draw a pie chart using either the percent or the angle measure.

15. A store’s monthly sales (in thousands of dollars) for last year were as follows:

   Month                 Jan.  Feb.  Mar.  Apr.  May  Jun.  Jul.  Aug.  Sep.  Oct.  Nov.  Dec.
   Sales ($ Thousands)   45    52    74    78    70   95    98    120   105   89    80    92

   Draw a line graph representing the data.

16. The number of houses sold by a developer for the period from 2006 to 2014 is provided below:

   Year                    2006  2007  2008  2009  2010  2011  2012  2013  2014
   Number of houses sold   82    110   130   145   90    75    128   160   180

   Draw a line graph representing the data.

17.
Use a scatter plot to determine the relationship (if any) between the price of an item and the number of items sold:

   Price ($)              60   61   62   64   66   68
   Number of items sold   190  182  176  116  104  87

18. Use a scatter plot to determine the relationship (if any) between age and income:

   Age (years)            21  27  35  41  46  52  56
   Income ($ Thousands)   38  51  53  64  72  76  80

19. The frequency distribution below was constructed from data collected from a sample of 100 professors at a college. Construct a histogram and a frequency polygon for the data.

   Years of Teaching   Frequency
   0 to under 5        10
   5 to under 10       14
   10 to under 15      39
   15 to under 20      24
   20 to under 25      13

20. The frequency distribution below was constructed from data collected from a sample of 30 students at a college. Construct a histogram and a frequency polygon for the data.

   Height (in cm)     Frequency
   155 to under 160   2
   160 to under 165   5
   165 to under 170   9
   170 to under 175   7
   175 to under 180   4
   180 to under 185   3

21. Use the frequency distribution in Problem 19 to compute the following: a. Relative frequency distribution. b. Percent frequency distribution.

22. Use the frequency distribution in Problem 20 to compute the following: a. Relative frequency distribution. b. Percent frequency distribution.

23. Use the frequency distribution in Problem 19 to compute the following: a. Cumulative frequency distribution. b. Cumulative percent frequency distribution. c. Cumulative frequency and cumulative percent curve.

24. Use the frequency distribution in Problem 20 to compute the following: a. Cumulative frequency distribution. b. Cumulative percent frequency distribution. c. Cumulative frequency and cumulative percent curve.