12.1 Organizing and Presenting Data

Introduction
Statistics is a branch of mathematics that provides procedures for collecting, organizing, presenting, analyzing, and interpreting data for the purpose of drawing conclusions and making decisions.
Statistics is divided into two categories:
■■ Descriptive statistics
■■ Inferential statistics
Descriptive statistics deals with the organizing, presenting, and summarizing of raw data to present
meaningful information.
Inferential statistics deals with the analysis of a sample drawn from a larger population to develop
meaningful inferences about the population based on sample results.
Population refers to all possible individuals, objects, or measurements of interest. A population is usually very large, and may even be infinite. For example, the ages of all college students.
Sample refers to a set of data drawn from the population. It is a subset of a population, meaning a
portion or part of the population. For example, the ages of a representative sample of 200 college
students.
[Figure: Population and Sample]
The descriptive values for a population are called parameters, and those for a sample are called statistics. Parameters are usually represented by Greek letters (e.g., μ, σ), and statistics are usually represented by lowercase English letters (e.g., x̄, s).
In practice, we often do not have access to the whole population we are interested in investigating. Therefore, population parameters are often estimated from sample statistics, which are calculated from the data actually observed or measured in the sample.
For example, assume that there are 40 students in a particular math class of a college. If 80% of the
students passed an exam, this 80% is referred to as a “parameter”, because it includes the marks of all
40 students. However, if this class is selected as the representative math class of all the math classes in
the college, then the 80% is referred to as a “statistic”, because it represents a sample of the population.
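The parameter/statistic distinction can be sketched in Python. This is a minimal illustration under stated assumptions: the population below is simulated (hypothetical ages between 17 and 40 for 10,000 college students, not real data), and the sample is 200 of those students chosen at random. The population mean plays the role of a parameter (μ), and the sample mean plays the role of a statistic (x̄) used to estimate it.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch gives the same numbers on every run

# Hypothetical population: simulated ages of 10,000 college students (not real data).
population = [random.randint(17, 40) for _ in range(10_000)]

# A sample is a subset of the population: 200 students chosen at random.
sample = random.sample(population, 200)

mu = statistics.mean(population)   # parameter: describes the whole population
x_bar = statistics.mean(sample)    # statistic: computed from the sample only

print(f"Population mean (parameter): {mu:.2f}")
print(f"Sample mean (statistic):     {x_bar:.2f}")
```

In practice only the sample would be available, so x̄ would be the estimate of the unknown μ.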
In this section, the types of data, levels of measurement, and various methods for organizing and
presenting data using tables and graphs will be outlined.
Types of Variables and Levels of Measurement
Types of Variables
A collection of facts and information obtained in a study is known as the data. The variables within a
dataset may be numerical or non-numerical, and classified as:
■■ Quantitative variables
■■ Qualitative variables
Quantitative Variables
Quantitative variables are data that are expressed using numbers and are known as numeric data.
These data are further classified as continuous variables or discrete variables.
Continuous variables are obtained by measuring. Measurements of length, weight, time, temperature,
etc., are examples of continuous variables. These can be measured in whole units, approximated or
rounded whole units, fractions, or decimal numbers (with any number of decimal places).
Discrete variables are obtained by counting, or are data that can only take on specific values. The number of students in a class, number of chapters in a book, position in a race, shoe size, etc., are examples of discrete variables. Also, any quantitative data that does not belong to the continuous variable classification falls into the discrete variable classification.
Qualitative Variables
Qualitative variables are data that are expressed non-numerically and are known as non-numerical
data. These data can be classified into categories and are also known as categorical data. Make and
model of cars, colour, gender, etc., are examples of qualitative variables.
Variables
    Qualitative (Non-Numeric)
    Quantitative (Numeric)
        Continuous (By Measuring)
        Discrete (By Counting)
Levels of Measurement
Levels of measurement are rules that describe the properties of numbers that are measured and the
way in which they can be used to provide additional information on the data. There are four levels of
measurement: Nominal, Ordinal, Interval, and Ratio.
The properties used to classify these levels of measurement are: order (rank), meaningful difference
(interval between measurements), and meaningful zero point.
Chapter 12 | Basic Statistics and Probability

Level of Measurement   Order (Rank)   Meaningful Difference   Meaningful Zero
Nominal                No             No                      No
Ordinal                Yes            No                      No
Interval               Yes            Yes                     No
Ratio                  Yes            Yes                     Yes
Levels of Measurement: Properties and Examples

Nominal
• Have no order, but numbers may be assigned as codes for referencing and differentiating purposes.
• The interval between measurements is not meaningful.
• No meaningful zero point.
• Qualitative data, usually classified using letters, symbols, or names.
Examples: Gender, Religion, Country of birth, Colour, etc.

Ordinal
• Have order by their relative position.
• The interval between measurements is not meaningful.
• No meaningful zero point.
• Qualitative data, usually classified using letters, symbols, or numbers.
Examples: High or low, Level of satisfaction, Rating of movies, GPA (A = 4, B = 3, C = 2, ...), etc.

Interval
• Have order by their relative position.
• Meaningful intervals between measurements.
• No meaningful zero point. (The zero point is located arbitrarily.)
• Quantitative data, but measurements cannot be multiplied or divided.
Examples: Temperature, Dates, Years, Sea level, etc.

Ratio
• Have order by their relative position.
• Meaningful intervals between measurements.
• Meaningful zero point.
• Quantitative data, and measurements can be multiplied or divided.
Examples: Percent, Age, Weight, Speed, etc.
Statistical Representations of Data
Stem-and-Leaf Plot
A stem-and-leaf plot is one method of displaying data that shows the spread of the data and where most of the data points lie. It is also a sorting technique: it arranges the data from the lowest to the highest value, an arrangement known as an array.
In this display, each number is rewritten so that its last digit (the unit or ones digit) becomes the leaf and the other digits become the stem. The stems are written vertically and the leaves are written horizontally. A stem-and-leaf plot preserves the exact values of the individual data.
For example, for the two-digit number 38, the stem is the tens digit, 3 (written on the left side), and the leaf is the unit digit, 8 (written on the right side), as shown in the stem-and-leaf plot below.
For a three-digit number, 156, the stem is 15 and the leaf is 6.
For a one-digit number, 8, the stem is 0 and the leaf is 8.
For decimal numbers, the digits before the decimal point form the stem and the digits after the decimal point form the leaves.
Stem | Leaf
   3 | 6 8
   4 | 1 2 7 8
   5 | 1 4 4 5 6
• The stem consists of all the digits except the right-most digit (ones digit) of the number.
• The leaf consists of the right-most digit (ones digit) of the number.
• The leaf 8 on stem 3 represents the data value 38.
• The branch on stem 5 represents the data 51, 54, 54, 55, and 56.
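The construction described above can be sketched in Python. This minimal sketch assumes non-negative whole numbers (decimal data would first need to be scaled, for example by 10); the function names `stem_and_leaf` and `show` are illustrative, not from any library. It rebuilds the plot shown above from the data 36, 38, 41, 42, 47, 48, 51, 54, 54, 55, and 56 given in mixed order.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return {stem: sorted leaves} for a list of non-negative whole numbers.

    The leaf is the ones digit and the stem is every digit to its left,
    so 38 -> stem 3, leaf 8; 156 -> stem 15, leaf 6; 8 -> stem 0, leaf 8.
    """
    leaves_by_stem = defaultdict(list)
    for value in data:
        stem, leaf = divmod(value, 10)  # split off the ones digit
        leaves_by_stem[stem].append(leaf)
    # List every stem from lowest to highest (even ones with no leaves),
    # and sort each stem's leaves so the plot shows the data as an array.
    stems = range(min(leaves_by_stem), max(leaves_by_stem) + 1)
    return {stem: sorted(leaves_by_stem[stem]) for stem in stems}


def show(plot):
    """Print the plot with the stems to the left of a vertical line."""
    print("Stem | Leaf")
    for stem, leaves in plot.items():
        print(f"{stem:>4} | {' '.join(map(str, leaves))}")


show(stem_and_leaf([38, 54, 41, 56, 36, 47, 51, 55, 42, 54, 48]))
```

Listing every stem in the range, including empty ones, mirrors the manual method of writing all stems from lowest to highest before placing any leaves.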
The following example illustrates the procedure for constructing a stem-and-leaf plot.
Example 12.1-a
Constructing a Stem-and-Leaf Plot
The marks on a Statistics exam for a sample of 40 students are as follows:
63  74  42  65  51  54  36  56  68  57
62  64  76  67  79  61  81  77  59  38
84  68  71  94  71  86  69  75  97  55
48  82  83  54  79  62  68  58  41  47
(i) Construct a stem-and-leaf plot to display the data in an array.
(ii) Use the stem-and-leaf plot to determine the number of students who scored:
a. 70 marks or more
b. less than 50 marks.
Solution
(i) Construct a stem-and-leaf plot to display the data in an array.
Step 1: Identify the lowest and highest stem of the data.
Looking at the data, the lowest stem is 3 and the highest stem is 9.
Step 2: Use Step 1 to identify the range of the stem. The stem will have the digits 3, 4, 5, 6, 7, 8, and 9. Draw a vertical line and write out the stems in this order to the left of the line.
Step 3: Starting from the 1st data value, place the leaf of each number to the right of the vertical line on the corresponding stem, until the last data value is recorded. There is no need to use commas on the leaf side.
For example,
• The first data value is 63. Therefore, the stem is 6 and the leaf is 3.
• The second data value is 74. Therefore, the stem is 7 and the leaf is 4.
• Continue until the last data value, 47, where the stem is 4 and the leaf is 7.

Stem | Leaf                    | # of data
   3 | 6 8                     | 2
   4 | 2 8 1 7                 | 4
   5 | 1 4 6 7 9 5 4 8         | 8
   6 | 3 5 8 2 4 7 1 8 9 2 8   | 11
   7 | 4 6 9 7 1 1 5 9         | 8
   8 | 1 4 6 2 3               | 5
   9 | 4 7                     | 2
                          Total = 40
Step 4: Rearrange the leaves against each stem, from the smallest to the largest number, to have the numbers displayed in an array.

Stem | Leaf                    | # of data
   3 | 6 8                     | 2
   4 | 1 2 7 8                 | 4
   5 | 1 4 4 5 6 7 8 9         | 8
   6 | 1 2 2 3 4 5 7 8 8 8 9   | 11
   7 | 1 1 4 5 6 7 9 9         | 8
   8 | 1 2 3 4 6               | 5
   9 | 4 7                     | 2
                          Total = 40

Number of data less than 50 is 2 + 4 = 6.
Number of data 70 and above is 8 + 5 + 2 = 15.
(ii) a. Number of leaves against stem 7 = 8, against stem 8 = 5, and against stem 9 = 2.
Therefore, the number of students who scored 70 marks or more is 8 + 5 + 2 = 15.
b. Number of leaves against stem 4 = 4 and against stem 3 = 2.
Therefore, the number of students who scored less than 50 marks is 2 + 4 = 6.
Example 12.1-b
Interpreting Data in a Stem-and-Leaf Plot
The following stem-and-leaf plot shows the number of CDs sold by a salesperson each day in the last
15 days.
Stem | Leaf
   0 | 6 8
   1 | 0 1 3 4
   2 | 6 8 9
   3 | 0 3 8 9
   4 | 1 4
Calculate the following:
(i) Number of CDs sold in the last 15 days.
(ii) Highest and lowest sales in a day in the last 15 days.
(iii) Number of days 30 or more CDs were sold in the last 15 days.
Solution
(i) Add all the data values in each row of the stem-and-leaf plot.
Sum of 1st row data = 6 + 8 = 14
Sum of 2nd row data = 10 + 11 + 13 + 14 = 48
Sum of 3rd row data = 26 + 28 + 29 = 83
Sum of 4th row data = 30 + 33 + 38 + 39 = 140
Sum of 5th row data = 41 + 44 = 85
Total = 370
Therefore, the number of CDs sold in the last 15 days = 370.
(ii) Highest data value is 44 and the lowest data value is 6.
Therefore, the highest sales in a day is 44 and the lowest sales in a day is 6.
(iii) Number of leaves against stem 3 = 4 and that against stem 4 = 2.
Therefore, the number of days 30 or more CDs were sold = 4 + 2 = 6.
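Reading a stem-and-leaf plot amounts to rebuilding each value as 10 × stem + leaf. A minimal Python sketch of the three calculations in Example 12.1-b, assuming the plot is stored as a dictionary from stem to leaves (a representation chosen here only for illustration):

```python
# Example 12.1-b stored as {stem: leaves} (CDs sold per day).
plot = {0: [6, 8], 1: [0, 1, 3, 4], 2: [6, 8, 9], 3: [0, 3, 8, 9], 4: [1, 4]}

# Rebuild each data value from its stem (tens) and leaf (ones).
values = [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]

total_sold = sum(values)                             # (i)
highest, lowest = max(values), min(values)           # (ii)
days_30_or_more = sum(1 for v in values if v >= 30)  # (iii)

print(len(values), total_sold, highest, lowest, days_30_or_more)  # 15 370 44 6 6
```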
Tally Chart
A tally chart is another method of collecting and organizing data. A tally chart is used to keep count
of the number of times a particular event or data occurs.
For each count, a tally mark “|” (a vertical or slanted line) is drawn in the row for that event or data value. The fifth tally mark is drawn as a diagonal line “/” (or a horizontal line) across the four previous tally marks: “ |||| ”. This helps to count the data in multiples of five.
For example, 12 counts of the same item are shown as “ |||| |||| ||” (two groups of five and two = 12). A tally chart helps to produce a frequency table. The number of times an event happens is known as the frequency (f). If the data have already been arranged in an array using the stem-and-leaf method, tallying may not be required to produce the frequency table.
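Tallying is a manual way of counting occurrences. The sketch below shows the same idea in Python: `collections.Counter` produces the frequency of each value, and a small helper, `tally` (a hypothetical name), draws the marks. Because a crossed-out group of five is hard to show in plain text, the helper writes each group of five as “||||/”. The observations are hypothetical and serve only as an illustration.

```python
from collections import Counter


def tally(count):
    """Return plain-text tally marks; "||||/" stands for four marks crossed by a fifth."""
    groups = ["||||/"] * (count // 5)
    if count % 5:
        groups.append("|" * (count % 5))
    return " ".join(groups)


# Hypothetical observations, for illustration only: favourite colours of 20 people.
observations = ["red"] * 12 + ["blue"] * 5 + ["green"] * 3
frequency = Counter(observations)  # frequency (f) of each distinct value

for value, f in frequency.items():
    print(f"{value:<7}{tally(f):<16}{f}")
```

For a count of 12 the helper returns “||||/ ||||/ ||”, the plain-text form of two groups of five and two.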
Example 12.1-c
Constructing a Frequency Table Using a Tally Chart
The ages of 35 students in a class were recorded as follows:
18, 19, 18, 20, 19, 17, 18, 18, 20, 19, 21, 20, 17, 19, 20, 18, 19, 17,
19, 21, 19, 20, 19, 18, 20, 22, 19, 21, 19, 20, 19, 21, 19, 21, 22
Display the data using a tally chart to show the frequency distribution of ages of students in the class.
Solution
Step 1: Draw 3 columns to represent age, tally, and frequency (f).
Step 2: Identify the lowest and highest data values.
17 is the lowest and 22 is the highest. Therefore, the first column will have 6 entries displaying ages from 17 to 22.
Step 3: Starting from the first data value, use tally marks in Column 2 to count the frequency of the ages in the dataset.
Step 4: Total the tally marks in each row to get the frequency distribution of the ages and enter it in Column 3.

Age    Tally            Frequency (f)
17     |||              3
18     |||| |           6
19     |||| |||| ||     12
20     |||| ||          7
21     ||||             5
22     ||               2
                        Total = 35

Example 12.1-d
Interpreting a Tally Chart
The tally chart below shows the height of students (in cm) in a class.
Height (cm)         Tally              Frequency (f)
140 to under 150    |||
150 to under 160    |||| ||
160 to under 170    |||| |||| ||||
170 to under 180    |||| |||
180 to under 190    ||
(i) Complete the frequency column.
(ii) Identify the group with the highest frequency and the number of data in that group.
(iii) Calculate the total number of data above the group with the highest frequency and express it as a
percent of the whole data.
Solution
(i)
Height (cm)         Tally              Frequency (f)
140 to under 150    |||                3
150 to under 160    |||| ||            7
160 to under 170    |||| |||| ||||     15
170 to under 180    |||| |||           8
180 to under 190    ||                 2
                                       Total = 35
(ii) The group with the highest frequency is “160 to under 170” and the number of data in that group is 15.
(iii) The number of data above the group with the highest frequency = 8 + 2 = 10.
The percent of data above the group with the highest frequency = (10 ÷ 35) × 100% ≈ 28.6%.
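Parts (ii) and (iii) can be checked with a short Python sketch. It assumes the frequencies are stored in table order, so the groups “above” the one with the highest frequency are simply the groups listed after it.

```python
# Completed frequency column from Example 12.1-d, in table order.
frequencies = {
    "140 to under 150": 3,
    "150 to under 160": 7,
    "160 to under 170": 15,
    "170 to under 180": 8,
    "180 to under 190": 2,
}

groups = list(frequencies)                               # dicts keep insertion order
highest_group = max(groups, key=frequencies.get)         # (ii)
later_groups = groups[groups.index(highest_group) + 1:]  # groups listed after it
above = sum(frequencies[g] for g in later_groups)        # (iii) 8 + 2
percent = above / sum(frequencies.values()) * 100

print(highest_group, frequencies[highest_group])  # 160 to under 170 15
print(above, round(percent, 1))                   # 10 28.6
```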
Scatter Plot
A scatter plot is a graph showing pairs of numerical data with the independent variable on a
horizontal axis and the dependent variable on a vertical axis. The independent variable is the
variable selected for the study and the dependent variable is the variable observed or measured.
In a scatter plot, a dot or a small circle represents a single data point for every pair of data measured, illustrating the relationship between the two sets of data. A scatter plot is especially useful for paired numerical data, where a single value of the independent variable may be paired with several values of the dependent variable.
The relationship between two variables is known as their correlation. If the variables are strongly correlated, the data points fall close to a straight line. If the points are scattered evenly across the plot with no upward or downward trend, the correlation is low or zero.
A correlation is positive when the slope is positive; i.e., as one variable increases, the other variable increases, and vice versa.
A correlation is negative when the slope is negative; i.e., as one variable increases, the other variable decreases, and vice versa.
[Figure: scatter plots illustrating Perfect Positive Correlation, Strong Positive Correlation, No Correlation, Strong Negative Correlation, and Perfect Negative Correlation]
Example 12.1-e
Drawing a Scatter Plot of Weight (kg) vs. Height (cm)
Construct a scatter plot for the following data and comment on the correlation between the heights
and weights of 10 students.
Student       1    2    3    4    5    6    7    8    9    10
Height (cm)   164  145  169  162  181  155  191  151  172  176
Weight (kg)   60   43   62   57   77   55   82   50   64   75
Solution
[Figure: Scatter Plot of Weight (kg) vs. Height (cm). X-axis: Height (cm), 140 to 200; Y-axis: Weight (kg), 40 to 80.]
The scatter plot shows a strong positive correlation between the heights and the weights of the
students; i.e., as the height increases, the weight increases.
Example 12.1-f
Drawing a Scatter Plot of Items Sold (numbers) vs. Price ($)
Construct a scatter plot for the following data and comment on the correlation between the price and
the number of items sold.
Price ($)              15   18   20   25   27   30   35
Items Sold (numbers)   48   40   36   28   24   18   6
Solution
[Figure: Scatter Plot of Items Sold (numbers) vs. Price ($). X-axis: Price ($); Y-axis: Items Sold (numbers).]
The scatter plot shows a strong negative correlation between the number of items sold and price; i.e., as
the price increases, the number of items sold decreases.
Line Graphs
Line graphs are widely used in mathematics, most often to represent continuous data. This topic is similar to the one discussed in Chapter 9.
The line graph should include the following:
■■ Title of the graph, describing the purpose.
■■ Labelled axes to show the variables and the units of measure used.
■■ Position of origin: (0, 0).
Example 12.1-g
Drawing Multiple Line Graphs
The data showing the daily high and low temperature readings in Toronto for the period from
September 15 to September 21 is provided below. Plot the two line graphs for the data.
Date       Temp (°C) High   Temp (°C) Low
Sep. 15    20               11
Sep. 16    15               10
Sep. 17    16               8
Sep. 18    24               16
Sep. 19    21               15
Sep. 20    20               14
Sep. 21    18               12
Solution
[Figure: Line Graph of Daily High and Low Temperature Readings (°C) from Sep. 15 to Sep. 21. X-axis: Day (Sep. 15 to Sep. 21); Y-axis: Temperature (°C). One line shows the daily high temperature readings and the other shows the daily low temperature readings.]
Example 12.1-h
Interpreting a Line Graph
The line graph shown below illustrates the monthly sales (in millions of dollars) of a department store
for the period from January to December 2014.
[Figure: Line Graph of Monthly Sales ($ Millions) for the Year 2014. X-axis: Month (Jan. to Dec.); Y-axis: Sales ($ Million).]
(i) Calculate the total sales for the months of May, June, July, and August.
(ii) Which month had the lowest sales and what is the amount of sales for that month?
(iii) Which month had the highest sales and what is the amount of sales for that month?
Solution
(i) Sales in May = 5 Million
Sales in June = 6 Million
Sales in July = 5.5 Million
Sales in August = 5 Million
Total = 21.5 Million
Therefore, the total sales for the months of May, June, July, and August were 21.5 Million dollars.
(ii) The lowest sales amount was in January and the sales amount was 1.5 Million dollars.
(iii) The highest sales amount was in October and the sales amount was 7 Million dollars.
Pie Chart
Pie charts are usually used to summarize and show classes or groups of data in proportion to the whole dataset. The whole pie (circle) represents the total of all the values in the dataset, which is 100% and corresponds to 360 degrees.
The size of each sector represents the percent portion (or
fraction) of each category of data. Pie charts are very often
used in presenting poll results, expenditures, etc.
The pie chart is constructed by first converting each category
or group into a percent of the whole and then multiplying
this by 360 degrees to determine the number of degrees for
the sector of the category being represented in the pie chart.
[Figure: a circle marked at 0° (0%), 90° (25%), 180° (50%), 270° (75%), and 360° (100%), with a 54° sector representing 15% of the whole]
For example, 15% of the data is represented by a sector with an angle of 54 degrees (15% of 360° = 54°).
The interpretation of a pie chart is based on the fact that the largest ‘slice of pie’ relates to the largest proportion of the data and the smallest ‘slice’ to the smallest proportion. It is, therefore, easy to compare the relative sizes of data items.
Example 12.1-i
Drawing a Pie Chart for Given Data
Draw a pie chart representing the data using (i) percent and (ii) sector angles.
Item             Expense ($)
Housing          15,000
Meals            9,000
Transportation   8,000
Medicine         5,000
Miscellaneous    7,000
Savings          6,000
Total            $50,000
Solution
The expenses, totalling $50,000, represent 100%, or 360° in a circle (pie chart).
Calculate each listed expense as a percent of the total.
For example, the housing expense of $15,000 is ($15,000 ÷ $50,000) × 100% = 30%.
Similarly, calculate the percent for all the remaining items and complete the 3rd column of the table,
as shown below.
Calculate the angle that represents each sector in the pie chart by multiplying the calculated percent
for each of the items by 360°.
For example, the sector angle for the housing expense is 30% of 360° = 108°.
Similarly, calculate the sector angle of all the remaining items and complete the 4th column of the
table, as shown below.
Item             Expense    Percent   Sector Angle
Housing          $15,000    30%       108.0°
Meals            $9,000     18%       64.8°
Transportation   $8,000     16%       57.6°
Medicine         $5,000     10%       36.0°
Miscellaneous    $7,000     14%       50.4°
Savings          $6,000     12%       43.2°
Total            $50,000    100%      360°
(i) Construct the pie chart using the percent calculated for each of the items.
[Figure: Pie Chart Using Percents. Housing 30%, Meals 18%, Transportation 16%, Miscellaneous 14%, Savings 12%, Medicine 10%.]
(ii) Construct the pie chart using the sector angle calculated for each of the items.
[Figure: Pie Chart Using Sector Angles. Housing 108°, Meals 64.8°, Transportation 57.6°, Miscellaneous 50.4°, Savings 43.2°, Medicine 36°.]
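The percent and sector-angle columns follow from two proportions of the total. A minimal Python sketch using the expenses from Example 12.1-i; it computes amount × 360 ÷ total directly rather than rounding the percent first.

```python
# Expenses from Example 12.1-i.
expenses = {
    "Housing": 15_000,
    "Meals": 9_000,
    "Transportation": 8_000,
    "Medicine": 5_000,
    "Miscellaneous": 7_000,
    "Savings": 6_000,
}
total = sum(expenses.values())  # the whole pie: 100%, or 360°

# For each item: (percent of the whole, sector angle in degrees).
table = {
    item: (amount * 100 / total, amount * 360 / total)
    for item, amount in expenses.items()
}

for item, (percent, angle) in table.items():
    print(f"{item:<15}{percent:>5.1f}%{angle:>7.1f}°")
```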
Example 12.1-j
Interpreting a Pie Chart
The final grades of 40 students who passed a Math exam are represented in the pie chart below. Use
the pie chart to complete the table.
[Figure: pie chart of the final grades. A+ 20%, A 25%, B 30%, C 10%, D 15%.]
Grade    Number of Students    Percent    Sector Angle
A+
A
B
C
D
Total
Solution
Grade    Number of Students    Percent    Sector Angle
A+       20% of 40 = 8         20%        20% of 360° = 72°
A        25% of 40 = 10        25%        25% of 360° = 90°
B        30% of 40 = 12        30%        30% of 360° = 108°
C        10% of 40 = 4         10%        10% of 360° = 36°
D        15% of 40 = 6         15%        15% of 360° = 54°
Total    40                    100%       360°
Bar Chart
A bar chart is a graph that uses either horizontal or vertical bars to show comparisons among categories or class intervals of grouped data. In a vertical bar chart, the categories or class intervals are plotted on the X-axis, and the quantities in these categories, or the frequencies associated with the class intervals, are plotted on the Y-axis.
The width of the base of the rectangle for each category or class interval should be equal. Classes should be set up without any overlap in the data.
Bar charts are easy to produce and easy to interpret. Each bar is a rectangle whose base corresponds to a category or class interval and whose length (or height) is proportional to the value it represents: the quantity of data in that category or the frequency of that class interval.
A bar chart can also show two or more sets of data with the same categories or class intervals side-by-side on one graph. This allows the data values in these sets to be compared easily.
Example 12.1-k
Creating Bar Charts
Draw a vertical bar chart for the frequencies of grades obtained by students in a Math exam and use
the chart to calculate the following:
(i) Total number of students graded.
(ii) Number of students obtaining a B grade or better.
Grade                A+   A    B    C   D
Number of Students   8    10   12   4   6
Solution
[Figure: Bar Chart of Number of Students and their Grades. X-axis: Grade; Y-axis: Frequency. Bar heights: A+ 8, A 10, B 12, C 4, D 6.]
(i) Total number of students graded = 8 + 10 + 12 + 4 + 6 = 40
(ii) Number of students obtaining a B grade or better = 12 + 10 + 8 = 30
Example 12.1-l
Interpreting Bar Charts
The bar chart below shows the number of cellphones sold by Store A and Store B from January to June.
[Figure: Bar Chart of Number of Cellphones Sold from January to June by Stores A and B. X-axis: Months (January to June); Y-axis: Number of Cellphones Sold (0 to 100); bars for Store A and Store B.]
Use the bar chart to answer the following:
(i) What were the total cellphone sales by Stores A and B?
(ii) In which months did Store B sell more cellphones than Store A?
(iii) In which month did the sales in Store A exceed those of Store B by the greatest difference, and by how many cellphones?
Solution
(i) Total cellphone sales by Store A = 60 + 90 + 90 + 50 + 45 + 75 = 410
Total cellphone sales by Store B = 70 + 50 + 80 + 70 + 65 + 55 = 390
(ii) The months in which Store B sold more cellphones than Store A are January, April, and May.
(iii) The month in which there was the greatest difference was February. Store A sold 90 – 50 = 40 more cellphones.
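The three answers can be checked with a short Python sketch. The monthly values are those read from the chart in the solution above; the variable names are illustrative.

```python
months = ["January", "February", "March", "April", "May", "June"]
store_a = [60, 90, 90, 50, 45, 75]  # values read from the chart
store_b = [70, 50, 80, 70, 65, 55]

total_a, total_b = sum(store_a), sum(store_b)                        # (i)
b_ahead = [m for m, a, b in zip(months, store_a, store_b) if b > a]  # (ii)

# (iii) Store A's lead over Store B each month; the largest value answers the question.
lead = {m: a - b for m, a, b in zip(months, store_a, store_b)}
best_month = max(lead, key=lead.get)

print(total_a, total_b)              # 410 390
print(b_ahead)                       # ['January', 'April', 'May']
print(best_month, lead[best_month])  # February 40
```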
Histogram and Frequency Polygon
Histogram
A histogram is similar to a vertical bar chart in which the categories or class intervals are marked
on a horizontal axis and the class frequencies are represented by the heights of the bars. However, in
histograms, there should be no space between the rectangle of a class interval and the rectangle of an
adjoining class interval. That is, the bars are drawn adjacent to each other.
Frequency Polygon
The frequency polygon is the line joining the midpoints of the tops of the bars of a histogram. An additional class interval with a frequency of zero is added at each end of the histogram so that the frequency polygon starts and ends on the X-axis, at the midpoints of these additional class intervals.
Example 12.1-m
Creating Histogram and Frequency Polygon
Draw a histogram and a frequency polygon for the distribution of age groups of 200 employees in a
company group as shown below.
Age (Class Intervals)    Number of Employees (Frequency)
20 to under 30           35
30 to under 40           42
40 to under 50           64
50 to under 60           30
60 to under 70           24
70 to under 80           5
Total                    200
Solution
[Figure: Histogram and Frequency Polygon of Number of Employees vs. Age Group. X-axis: Age Group, from an additional class before “20 to under 30” to an additional class after “70 to under 80”; Y-axis: Frequency. The frequency polygon joins the midpoints of the tops of the bars.]
Note: Labels on the X-axis representing the class intervals can be in any of the following formats: 40-50, 50-60, 60-70, …; or 40 to 50, 50 to 60, …; or 40 to under 50, 50 to under 60, …; or as midpoints of the class intervals (45, 55, 65, …).
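The vertices of the frequency polygon can be computed rather than read off the graph. This Python sketch, using the class intervals and frequencies from Example 12.1-m, adds an empty class at each end (as described above) and places each vertex at a class midpoint.

```python
# Class intervals (lower limit, upper limit) and frequencies from Example 12.1-m.
classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)]
frequencies = [35, 42, 64, 30, 24, 5]

width = classes[0][1] - classes[0][0]  # every class has the same width (10)

# Add an empty class at each end so the polygon starts and ends on the X-axis.
first_low, last_high = classes[0][0], classes[-1][1]
extended = [(first_low - width, first_low), *classes, (last_high, last_high + width)]
extended_freq = [0, *frequencies, 0]

# Each vertex: (class midpoint, class frequency).
vertices = [((low + high) / 2, f) for (low, high), f in zip(extended, extended_freq)]
print(vertices)
```

Joining these points in order gives the frequency polygon drawn over the histogram.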
Frequency Distributions
A frequency distribution is a method to summarize large amounts of data without displaying each
value of the observation. It groups the data into different class intervals and indicates the number
of observations that fall into the given class interval, known as the frequency, f. In a frequency
distribution, the class widths of all intervals should be the same.
The smallest value that belongs to a class interval is called the lower class limit, and the largest value
that belongs to the class interval is called the upper class limit. The class width refers to the difference
between the upper class limit and the lower class limit.
For example, in the class interval “140 to under 150”, the lower class limit is 140 and the upper class limit is below 150.
The class width = 150 – 140 = 10
Using the data from Example 12.1-a (shown below), the steps in constructing a frequency distribution
table are as follows:
Data:
63  74  42  65  51  54  36  56  68  57
62  64  76  67  79  61  81  77  59  38
84  68  71  94  71  86  69  75  97  55
48  82  83  54  79  62  68  58  41  47
Step 1: First, array the data using the stem-and-leaf method and determine the number of data values, the highest value, the lowest value, and the range. (The range is the difference between the highest and the lowest value in a dataset.)

Stem | Leaf                    | # of data
   3 | 6 8                     | 2
   4 | 1 2 7 8                 | 4
   5 | 1 4 4 5 6 7 8 9         | 8
   6 | 1 2 2 3 4 5 7 8 8 8 9   | 11
   7 | 1 1 4 5 6 7 9 9         | 8
   8 | 1 2 3 4 6               | 5
   9 | 4 7                     | 2
                          Total = 40

There are 40 data values; the highest value is 97 and the lowest value is 36. The range is 97 – 36 = 61.
Step 2: Determine the number of classes and the width of each class interval.
(As a guideline, the number of classes is normally between 5 and 15.)
For 5 classes, the width of each class interval for the above data = Range ÷ # of classes = 61 ÷ 5 = 12.20
For 15 classes, the width of each class interval for the above data = Range ÷ # of classes = 61 ÷ 15 = 4.07
Therefore, a possible class width is between 4 and 13. Here, a class width of 8 is used.
The lowest class limit should accommodate the smallest value of the data, 36.
The highest class limit should accommodate the largest value of the data, 97.
Therefore, the class intervals are: “34 to under 42”, “42 to under 50”, “50 to under 58”, ..., “90 to under 98”.
That is, there are 8 classes and the width of each class interval is 8.
Step 3: Determine the class frequencies for each class using the stem-and-leaf plot and complete the frequency distribution.

Class Interval     Frequency
34 to under 42     3
42 to under 50     3
50 to under 58     6
58 to under 66     8
66 to under 74     7
74 to under 82     7
82 to under 90     4
90 to under 98     2
Total              40
Example 12.1-n
Creating a Frequency Distribution Table
The number of cars sold each month for the last 2 years by a car dealer is as follows:
44  58  68  27  21  55
39  15  19  16  33  25
51  52  10  26  37  48
70  75  84  73  65  80
Group the data into 5 classes and create a frequency distribution table.
Solution
Array the data using the stem-and-leaf method, as follows:
Stem-and-Leaf            Data in Array
Stem | Leaf              Stem | Leaf
   1 | 5 9 6 0              1 | 0 5 6 9
   2 | 7 1 5 6              2 | 1 5 6 7
   3 | 9 3 7                3 | 3 7 9
   4 | 4 8                  4 | 4 8
   5 | 8 5 1 2              5 | 1 2 5 8
   6 | 8 5                  6 | 5 8
   7 | 0 5 3                7 | 0 3 5
   8 | 4 0                  8 | 0 4
Number of classes = 5 (given)
There are 24 data values. The highest value is 84 and the lowest value is 10.
Range = 84 – 10 = 74
Width of each class = Range ÷ # of classes = 74 ÷ 5 = 14.8 (use 15 for ease of presentation)
Therefore, the class width = 15. The lowest class interval will be “10 to under 25” and the highest class interval will be “70 to under 85”.
The following is the frequency distribution table:
Class Interval     Frequency
10 to under 25     5
25 to under 40     6
40 to under 55     4
55 to under 70     4
70 to under 85     5
Total              24
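The three steps used for the exam marks and for Example 12.1-n can be sketched in Python. This is a minimal illustration, assuming whole-number data and a starting class limit no larger than the smallest value; `frequency_distribution` is a hypothetical helper, and the starting limits (34 and 10) and widths (8 and 15) are the choices made in the text, passed in rather than derived.

```python
import math


def frequency_distribution(data, start, width):
    """Group whole numbers into classes 'start to under start + width', and so on.

    Assumes start <= min(data); classes are added until max(data) is covered.
    """
    n_classes = (max(data) - start) // width + 1
    counts = [0] * n_classes
    for value in data:
        counts[(value - start) // width] += 1  # index of the class holding value
    return [(start + i * width, start + (i + 1) * width, f)
            for i, f in enumerate(counts)]


marks = [63, 74, 42, 65, 51, 54, 36, 56, 68, 57,
         62, 64, 76, 67, 79, 61, 81, 77, 59, 38,
         84, 68, 71, 94, 71, 86, 69, 75, 97, 55,
         48, 82, 83, 54, 79, 62, 68, 58, 41, 47]
cars = [44, 58, 68, 27, 21, 55,
        39, 15, 19, 16, 33, 25,
        51, 52, 10, 26, 37, 48,
        70, 75, 84, 73, 65, 80]

# Step 2 guideline: width = range / number of classes (normally 5 to 15 classes).
marks_range = max(marks) - min(marks)                # 97 - 36 = 61
print(round(marks_range / 5, 2), round(marks_range / 15, 2))
cars_width = math.ceil((max(cars) - min(cars)) / 5)  # 74 / 5 = 14.8, rounded up to 15

for low, high, f in frequency_distribution(marks, 34, 8):
    print(f"{low} to under {high}: {f}")
for low, high, f in frequency_distribution(cars, 10, cars_width):
    print(f"{low} to under {high}: {f}")
```

Rounding the width up with `math.ceil` matches the choice of 15 in Example 12.1-n; in general the width is a judgment call within the guideline range, as the choice of 8 for the exam marks shows.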
Relative Frequency Distribution and Percent Frequency Distribution
Relative Frequency Distribution
Relative frequency is the ratio of the frequency of a particular class interval to the total number of observations and is expressed as a decimal or a fraction. The sum of all relative frequencies of a frequency distribution should be equal to one.
Percent Frequency Distribution
The percent frequency distribution is calculated by multiplying the relative frequency of each class interval by 100 and expressing it as a percent. The sum of all percent frequencies of a frequency distribution should be equal to 100%.
Example 12.1-o
Creating Relative Frequency and Percent Frequency Distributions
Use the frequency distribution provided below to create the relative frequency distribution and the
percent frequency distribution.
Class Interval     Frequency
30 to under 40     2
40 to under 50     4
50 to under 60     8
60 to under 70     11
70 to under 80     8
80 to under 90     5
90 to under 100    2
Total              40
Solution
Add two columns to the frequency table, one for the relative frequency distribution and one for the percent frequency distribution, as shown below.

Class Interval     Frequency   Relative Frequency   Percent Frequency
30 to under 40     2           0.05                 5%
40 to under 50     4           0.10                 10%
50 to under 60     8           0.20                 20%
60 to under 70     11          0.275                27.5%
70 to under 80     8           0.20                 20%
80 to under 90     5           0.125                12.5%
90 to under 100    2           0.05                 5%
Total              40          1                    100%
The relative frequency of any class is calculated by dividing the number of observations in that class by the total number of observations.
For example, the relative frequency of class “30 to under 40” is 2 ÷ 40 = 0.05.
Similarly, calculate the relative frequency of the remaining class intervals and complete the “Relative Frequency” column.
The percent frequency distribution is calculated by multiplying the relative frequency by 100.
For example, the percent frequency of class “30 to under 40” is 0.05 × 100% = 5%.
Similarly, calculate the percent frequency of the remaining class intervals and complete the “Percent Frequency” column.
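A minimal Python sketch of the two new columns in Example 12.1-o. Dividing each frequency by the total gives the relative frequency; `fractions.Fraction` is used only to confirm that the relative frequencies add to exactly 1 without floating-point rounding.

```python
from fractions import Fraction

classes = ["30 to under 40", "40 to under 50", "50 to under 60", "60 to under 70",
           "70 to under 80", "80 to under 90", "90 to under 100"]
frequencies = [2, 4, 8, 11, 8, 5, 2]
n = sum(frequencies)  # total number of observations (40)

relative = [f / n for f in frequencies]       # relative frequency
percent = [f * 100 / n for f in frequencies]  # percent frequency

for label, f, r, p in zip(classes, frequencies, relative, percent):
    print(f"{label:<17}{f:>3}{r:>8.3f}{p:>7.1f}%")

# Exact arithmetic confirms the relative frequencies add to 1 (that is, 100%).
print(sum(Fraction(f, n) for f in frequencies))  # 1
```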
Cumulative Frequency Distribution and Cumulative Frequency Curve (or Polygon)
The cumulative frequency at a given class interval is calculated by adding the frequency of that class interval to the frequencies of all the preceding class intervals. Simply put, it is the running total of the frequencies.
A curve showing the cumulative frequency plotted against the upper class boundary of the class
interval is called a cumulative frequency curve.
Example 12.1-p
Creating Cumulative Frequency Distribution
Use the frequency distribution in Example 12.1-o (also shown below) to create the cumulative frequency
distribution and the cumulative percent frequency distribution. Also, draw the cumulative frequency curve.
Class Interval       Frequency
30 to under 40           2
40 to under 50           4
50 to under 60           8
60 to under 70          11
70 to under 80           8
80 to under 90           5
90 to under 100          2
Total                   40
Solution
Add two columns to the frequency table, one for the cumulative frequency distribution and one for the
cumulative percent frequency distribution, as shown below.
Class Interval     Frequency   Cumulative Frequency   Cumulative Percent Frequency
30 to under 40         2               2                        5%
40 to under 50         4               6                       15%
50 to under 60         8              14                       35%
60 to under 70        11              25                       62.5%
70 to under 80         8              33                       82.5%
80 to under 90         5              38                       95%
90 to under 100        2              40                      100%
Total                 40
Compute the cumulative frequency of any class by adding the frequencies of all classes up to and
including that class. For example, the cumulative frequency of class “50 to under 60” is 2 + 4 + 8 = 14.
Compute the cumulative percent frequency by dividing the cumulative frequency of that class by the
total number of observations and converting the answer to a percent.
For example, the cumulative percent frequency of class “50 to under 60” is $\frac{14}{40} \times 100\% = 35\%$.
[Figure: Graph of Cumulative Frequency and Cumulative Percent Frequency vs. Marks. Horizontal axis: Marks; left vertical axis: Cumulative Frequency; right vertical axis: Cumulative Percent.]
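The cumulative percent frequency column of Example 12.1-p can likewise be reproduced with a small Python function. The name cumulative_percent is a hypothetical helper, not from the text; it reuses the running total from itertools.accumulate and then applies the percent conversion described in the solution.

```python
from itertools import accumulate


def cumulative_percent(frequencies):
    """Return the cumulative percent frequency of each class, in class order."""
    total = sum(frequencies)  # total number of observations
    # Cumulative percent = (cumulative frequency / total) x 100%.
    return [cf * 100 / total for cf in accumulate(frequencies)]


freqs = [2, 4, 8, 11, 8, 5, 2]      # frequencies from Example 12.1-p
result = cumulative_percent(freqs)
print(result)     # [5.0, 15.0, 35.0, 62.5, 82.5, 95.0, 100.0]
print(result[2])  # 35.0, the value for class "50 to under 60"
```

Multiplying by 100 before dividing keeps the numerator a whole number, so for this data each result is the exact table value.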
12.1 Exercises
Answers to odd-numbered problems are available at the end of the textbook.
1. Identify the following variables as continuous or discrete:
a. Height of students in a class
b. Room temperature
c. Net profit of a company
d. Position in class
2. Identify the following variables as continuous or discrete:
a. Number of seasons
b. Weight of a person
c. Time between the arrivals of two flights
d. Air pressure
3. Identify the following variables as quantitative or qualitative:
a. Marks on an exam
b. Seasons of a year
c. Amount of rainfall
d. Mode
4. Identify the following variables as quantitative or qualitative:
a. Height of a person
b. Letter grade in an exam
c. Median
d. Model of a car
5. Identify the level of measurement (nominal, ordinal, interval, or ratio) of each of the following:
a. Pant size
b. Mean
c. Annual salary of individuals
d. Title of a person in a company
6. Identify the level of measurement (nominal, ordinal, interval, or ratio) of each of the following:
a. Places of birth
b. Ages of people
c. Hours spent watching TV
d. Temperature of ice in degrees C
7. Construct a stem-and-leaf plot to display the following data in an array:
39  31  34  32  18  31  44  19  19
13  37  43  29  25  21  38  27  28
8. Construct a stem-and-leaf plot to display the following data in an array:
7   22  17  16  18  31  19  26  9
5   20  16  37  32  13  25  11  35
9. Construct a stem-and-leaf plot to display the following data in an array:
65  75  95  77  78  80  81  48  73  55  92  81
51  52  59  45  69  88  85  84  64  82  70  97
10. Construct a stem-and-leaf plot to display the following data in an array:
76  72  89  95  84  83  77  88  85  75  62  59
78  58  97  66  52  87  97  92  80  71  91  65
11. The following data was collected from a sample survey of 40 first-year students who were asked to indicate their
favourite subject among the four subjects, Statistics (S), Marketing (M), Accounting (A), and Finance Math (F).
A  M  S  M  S  M  F  A  M  S
S  S  M  F  M  S  S  A  A  A
M  M  S  F  A  M  F  A  S  S
M  F  S  S  M  S  A  A  S  F
Organize the data in a frequency table using a tally chart.
12. The following data was collected from a sample survey of 40 students who were asked to indicate the mode of
transportation that they normally use to get to college during the summer term. Their choices were walking (W),
bicycling (B), taking public transportation (P), and driving a car (C).
C  W  W  P  B  B  W  C  C  P
P  P  B  P  C  W  C  B  B  B
B  P  W  W  P  C  W  B  W  W
C  C  P  W  C  P  C  P  B  B
Organize the data in a frequency table using a tally chart.
13. The following are the letter grades obtained by 200 students in Finance Math, in the business program of a college:
Grade   Number of Students   Percent   Cumulative Percent   Angle   Cumulative Angle
A+              24
A               30
B               36
C               52
D               42
F               16
Total          200
a. Complete the above table for percent, cumulative percent, angle, and cumulative angle.
b. Draw a pie chart using either the percent or the angle measure.
14. Victoria kept a record of the average number of hours she spent on different activities on weekdays. The
information is provided below:
Activity   Number of Hours   Percent   Cumulative Percent   Angle   Cumulative Angle
School           7.0
Meals            1.0
Homework         2.0
Travel           2.5
Sleep            8.0
Other            3.5
Total           24
a. Complete the above table for percent, cumulative percent, angle, and cumulative angle.
b. Draw a pie chart using either the percent or the angle measure.
15. A store’s monthly sales (in thousands of dollars) for last year were as follows:
Month                 Jan.  Feb.  Mar.  Apr.  May   Jun.  Jul.  Aug.  Sep.  Oct.  Nov.  Dec.
Sales ($ Thousands)   45    52    74    78    70    95    98    120   105   89    80    92
Draw a line graph representing the data.
16. The number of houses sold by a developer for the period from 2006 to 2014 is provided below:
Year                    2006  2007  2008  2009  2010  2011  2012  2013  2014
Number of houses sold   82    110   130   145   90    75    128   160   180
Draw a line graph representing the data.
17. Use a scatter plot to determine the relationship (if any) between the price of an item and the number of items sold:
Price ($)              60    61    62    64    66    68
Number of items sold   190   182   176   116   104   87
18. Use a scatter plot to determine the relationship (if any) between age and income:
Age (years)            21   27   35   41   46   52   56
Income ($ Thousands)   38   51   53   64   72   76   80
19. The frequency distribution below was constructed from data collected from a sample of 100 professors at a college.
Construct a histogram and a frequency polygon for the data.
Years of Teaching   Frequency
0 to under 5            10
5 to under 10           14
10 to under 15          39
15 to under 20          24
20 to under 25          13
20. The frequency distribution below was constructed from data collected from a sample of 30 students at a college.
Construct a histogram and a frequency polygon for the data.
Height (in cm)      Frequency
155 to under 160        2
160 to under 165        5
165 to under 170        9
170 to under 175        7
175 to under 180        4
180 to under 185        3
21. Use the frequency distribution in Problem 19 to compute the following:
a. Relative frequency distribution.
b. Percent frequency distribution.
22. Use the frequency distribution in Problem 20 to compute the following:
a. Relative frequency distribution.
b. Percent frequency distribution.
23. Use the frequency distribution in Problem 19 to compute the following:
a. Cumulative frequency distribution.
b. Cumulative percent frequency distribution.
c. Cumulative frequency and cumulative percent curve.
24. Use the frequency distribution in Problem 20 to compute the following:
a. Cumulative frequency distribution.
b. Cumulative percent frequency distribution.
c. Cumulative frequency and cumulative percent curve.