Chapter 1
Core: Data analysis
Displaying and describing data distributions
Cambridge Senior Maths AC/VCE
ISBN 978-1-316-61622-2
© Jones et al. 2016
Further Mathematics 3&4
Photocopying is restricted under law and this material must not be transferred to another party.
Cambridge University Press
Updated November 2017
1A Classifying data
Data and variables
Statistics is a science concerned with understanding the world through data.
Some data
The data in the table below were collected from a group of university students.
Height  Weight  Age      Sex         Fitness level       Pulse rate
(cm)    (kg)    (years)  (M male,    (1 high, 2 medium,  (beats/min)
                         F female)   3 low)
173     57      18       M           2                   86
179     58      19       M           2                   82
167     62      18       M           1                   96
195     84      18       F           1                   71
173     64      18       M           3                   90
184     74      22       F           3                   78
175     60      19       F           3                   88
140     50      34       M           3                   70
Source: http://cambridge.edu.au/redirect/?id=6102. Used with permission.
Variables
In a dataset, we call the qualities or quantities about which we record information
variables.
An important first step in analysing any set of data is to identify the variables involved, their
units of measurement (where appropriate) and the values they take.
In the dataset above, there are six variables:
height (in centimetres)
weight (in kilograms)
age (in years)
sex (M = male, F = female)
fitness level (1 = high, 2 = medium, 3 = low)
pulse rate (beats/minute).
Types of variables
Variables come in two general types, categorical and numerical:
Categorical variables
Categorical variables represent characteristics or qualities of people or things – for
example, a person’s eye colour, sex, or fitness level.
Data generated by a categorical variable can be used to organise individuals into one of
several groups or categories that characterise this quality or attribute.
For example, an ‘F’ in the Sex column indicates that the student is a female, while a ‘3’ in
the Fitness level column indicates that their fitness level is low.
Categorical variables come in two types: nominal and ordinal.
• Nominal variables
Nominal variables have data values that can be used to group individuals
according to a particular characteristic.
The variable sex is an example of a nominal variable.
The data values for the variable sex, for example M or F, can be used to group students
according to their sex. It is called a nominal variable because the data values name the
group to which the students belong, in this case, the group called ‘males’ or the group
called ‘females’.
• Ordinal variables
Ordinal variables have data values that can be used to both group and order
individuals according to a particular characteristic.
The variable fitness level is an example of an ordinal variable. The data generated by
this variable contains two pieces of information. First, each data value can be used to
group the students by fitness level. Second, it allows us to logically order these groups
according to their fitness level – in this case, as ‘low’, ‘medium’ or ‘high’.
Numerical variables
Numerical variables are used to represent quantities, things that we can count or
measure.
For example, a ‘179’ in the Height column indicates that the person is 179 cm tall,
while an ‘82’ in the Pulse rate column indicates that they have a pulse rate of 82
beats/minute.
Numerical variables come in two types: discrete and continuous.
• Discrete variables
Discrete variables represent quantities that are counted.
The number of mobile phones in a house is an example. Counting leads to discrete
data values such as 0, 1, 2, 3, . . . There can be nothing in between.
As a guide, discrete variables arise when we ask the question ‘How many?’
• Continuous variables
Continuous variables represent quantities that are measured rather than counted.
Thus, even though we might record a person’s height as 179 cm, their height could
be any value from 178.5 cm up to (but not including) 179.5 cm. We have just rounded
to 179 cm for convenience, or to match the accuracy of the measuring device.
As a guide, continuous variables arise when we ask the question ‘How much?’
Comparing numerical and categorical variables
The interrelationship between categorical (nominal and ordinal) and numerical variables
(discrete and continuous) is displayed in the diagram below.
Variable
  Categorical variable
    Nominal variable (e.g. eye colour)
    Ordinal variable (e.g. house number)
  Numerical variable
    Discrete variable (e.g. number of cars in a car park)
    Continuous variable (e.g. weight)
Numerical or categorical?
Deciding whether data are numerical or categorical is not an entirely trivial exercise. Two
things that can help your decision-making are:
1 Numerical data can always be used to perform arithmetic computations. This is not the
case with categorical data. For example, it makes sense to calculate the average weight of
a group of individuals, but not the average house number in a street. This is a good test to
apply when in doubt.
2 It is not the variable name alone that determines whether data are numerical or
categorical; it is also the way the data are recorded. For example, if the data for variable
weight are recorded in kilograms, they are numerical. However, if the data are recorded as
‘underweight’, ‘normal weight’, ‘overweight’, they are categorical.
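The arithmetic test in point 1 can be tried on the student data from the start of this section. The following is a minimal sketch in Python (the language and the names students, mean_weight and fitness_counts are our own choices, not part of the course materials). It averages weight, which is numerical, but only counts the fitness level codes, because those numbers are labels for groups.

```python
from collections import Counter
from statistics import mean

# Rows copied from the student table:
# (height_cm, weight_kg, age_years, sex, fitness_level, pulse_rate)
students = [
    (173, 57, 18, "M", 2, 86),
    (179, 58, 19, "M", 2, 82),
    (167, 62, 18, "M", 1, 96),
    (195, 84, 18, "F", 1, 71),
    (173, 64, 18, "M", 3, 90),
    (184, 74, 22, "F", 3, 78),
    (175, 60, 19, "F", 3, 88),
    (140, 50, 34, "M", 3, 70),
]

# Weight is numerical: averaging the recorded values is meaningful.
weights = [row[1] for row in students]
mean_weight = mean(weights)

# Fitness level is recorded as 1, 2 or 3, but the numbers only label
# ordered groups, so we count how often each label occurs instead.
fitness_counts = Counter(row[4] for row in students)

print(f"Mean weight: {mean_weight} kg")
for level in sorted(fitness_counts):
    print(f"Fitness level {level}: {fitness_counts[level]} students")
```

Python would happily compute an average of the fitness codes as well, which is why the decision has to be made by the person analysing the data and not by the software.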
Exercise 1A
Basic ideas
1
a What is a categorical variable? Give an example.
b What is a numerical variable? Give an example.
2
There are two types of categorical variables. Name them and give an example of each.
3
There are two types of numerical variables. Name them and give an example of each.
Types of variables: categorical or numerical
4
Classify each of the following variables (in italics) as categorical or numerical when
recording information about:
a time (in minutes) spent exercising each day
b number of frogs in a pond
c bank account numbers
d height (short, average, tall)
e time spent playing computer games (hours)
f number of people in a bus
g eye colour (brown, blue, green)
h post code.
Categorical variables: nominal or ordinal
5
Classify the categorical variables identified below (in italics) as nominal or ordinal.
a The colour of a pencil
b The different types of animals in a zoo
c The floor levels in a building (0, 1, 2, 3 . . . )
d The speed of a car (on or below the speed limit, above the speed limit)
e Shoe size (6, 8, 10, . . . )
f Family names
Numerical variables: discrete or continuous
6
Classify the numerical variables identified below (in italics) as discrete or continuous.
a The number of pages in a book
b The cost (in dollars) to fill the tank of a car with petrol
c The volume of petrol (in litres) used to fill the tank of a car
d The speed of a car in km/h
e The number of people at a football match
f The air temperature in degrees Celsius
1B Displaying and describing the distributions of categorical variables
The frequency table
With a large number of data values, it is difficult to identify any patterns or trends in the raw
data.
For example, the set of categorical data below, listing the sex (M = male, F = female) of
60 individuals, is hard to make sense of in its raw form.
F F M F M F F F F M F F
M M M F M M M F M F M F
F M M M F M F M F M F M
F M F M F M F M M M F M
F F F F M F M M M F F F
F F M M F M F F M F M M
To help make sense of the data, we first need to organise them into a more manageable form.
The statistical tool we use for this purpose is the frequency table.
The frequency table
A frequency table is a listing of the values a variable takes in a dataset, along with how
often (frequently) each value occurs.
Frequency can be recorded as a:
number: the number of times a value occurs, or
percentage: the percentage of times a value occurs (percentage frequency):
\[ \text{per cent} = \frac{\text{count}}{\text{total count}} \times 100\% \]
Example 1
Frequency table for a categorical variable
The sex of 11 preschool children is as shown (F = female, M = male):
F M M F F M F F F M M
Construct a frequency table (including percentage frequencies) to display the data.
Solution
1 Set up a table as shown. The variable sex has
two categories: ‘Male’ and ‘Female’.
2 Count up the number of females (6) and males
(5). Record this in the ‘Number’ column.
3 Add the counts to find the total count, 11 (6 + 5).
Record this in the ‘Number’ column opposite
‘Total’.
         Frequency
Sex      Number  Percentage
Female   6       54.5
Male     5       45.5
Total    11      100.0
4 Convert the frequencies into percentage frequencies. Record these in the ‘Percentage’
column. For example:
\[ \text{percentage of females} = \frac{6}{11} \times 100\% = 54.5\% \]
5 Finally, total the percentages and record.
Note: There are two things to note in constructing the frequency table in Example 1.
1 The variable sex is nominal, so in setting up this frequency table the order in which we have listed the
categories ‘Female’ and ‘Male’ is quite arbitrary. However, if the variable was ordinal, say year level,
with possible values ‘Year 10’, ‘Year 11’ and ‘Year 12’, it would make sense to group the data values
in that order.
2 The Total should always equal the total number of observations – in this case, 11. The percentages
should add to 100%. However, if percentages are rounded to one decimal place a total of 99.9 or
100.1 is sometimes obtained. This is due to rounding error. Totalling the count and percentages helps
check on your tallying and percentaging.
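The steps of Example 1 can also be carried out by a short program. The sketch below is one possible way to do it in Python, not a required method; the names data, counts and table are illustrative. It tallies each category, converts the counts to percentages rounded to one decimal place, and totals both columns as a check.

```python
from collections import Counter

# Sex of the 11 preschool children from Example 1 (F = female, M = male).
data = "F M M F F M F F F M M".split()

# Steps 2 and 3: count each category and find the total count.
counts = Counter(data)
total = sum(counts.values())

# Step 4: per cent = count / total count x 100, rounded to one decimal place.
table = {value: (n, round(n / total * 100, 1)) for value, n in counts.items()}

# Step 5: print the table and total the columns as a check.
print(f"{'Sex':<6}{'Number':>8}{'Percentage':>12}")
for value, (n, pct) in table.items():
    print(f"{value:<6}{n:>8}{pct:>12.1f}")
percentage_total = round(sum(pct for _, pct in table.values()), 1)
print(f"{'Total':<6}{total:>8}{percentage_total:>12.1f}")
```

Because each percentage is rounded before the percentages are added, the printed percentage total can differ slightly from 100.0 for other datasets, exactly as described in the second note above.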
How has forming a frequency table helped?
The process of forming a frequency table for a categorical variable:
displays the data in a compact form
tells us something about the way the data values are distributed (the pattern of the
data).
The bar chart
Once categorical data have been organised into a frequency table, it is common practice to
display the information graphically to help identify any features that stand out in the data.
The statistical graph we use for this purpose is the bar chart.
The bar chart represents the key information in a frequency table as a picture. The bar chart
is specifically designed to display categorical data.
In a bar chart:
frequency (or percentage frequency) is shown on the vertical axis
the variable being displayed is plotted on the horizontal axis
the height of the bar (column) gives the frequency (count or percentage)
the bars are drawn with gaps to show that each value is a separate category
there is one bar for each category.
Example 2
Constructing a bar chart from a frequency table
The climate type of 23 countries is classified
as ‘cold’, ‘mild’ or ‘hot’. The results are
summarised in the table opposite.
Construct a frequency bar chart to display this
information.
               Frequency
Climate type   Number  Percentage
Cold           3       13.0
Mild           14      60.9
Hot            6       26.1
Total          23      100.0
Solution
a The data enable us to both group the countries
by climate type and put these groups in some sort
of natural order according to the ‘warmth’ of the
different climate types. The variable is ordinal.
b 1 Label the horizontal axis with the variable name,
‘Climate type’. Mark the scale off into three equal
intervals and label them ‘Cold’, ‘Mild’ and ‘Hot’.
2 Label the vertical axis ‘Frequency’. Scale
allowing for the maximum frequency, 14. Fifteen
would be appropriate. Mark the scale off in fives.
3 For each climate type, draw a bar. There are gaps
between the bars to show that the categories are
separate. The height of the bar is made equal to
the frequency (given in the ‘Number’ column).
a Ordinal
b [Bar chart: Frequency (vertical axis, 0 to 15 in steps of 5) against Climate type (Cold, Mild, Hot)]
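The idea behind the bar chart, one bar per category with its length equal to the frequency, can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name text_bar_chart is hypothetical, and the bars are drawn sideways with '#' characters rather than as vertical columns.

```python
# Frequencies from the 'Number' column of the table in Example 2.
climate_counts = {"Cold": 3, "Mild": 14, "Hot": 6}


def text_bar_chart(counts):
    """Return one line per category; the bar length equals the frequency."""
    width = max(len(label) for label in counts)
    return [f"{label:<{width}} | {'#' * n} {n}" for label, n in counts.items()]


chart = text_bar_chart(climate_counts)
for line in chart:
    print(line)
```

The categories are kept in the order Cold, Mild, Hot because the variable is ordinal. Each bar sits on its own line, which plays the same role as the gaps between the columns of a drawn bar chart.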
Stacked or segmented bar charts
A variation on the standard bar chart is the segmented or stacked bar chart. It is a compact
display that is particularly useful when comparing two or more categorical variables.
In a segmented bar chart, the bars are stacked one on top of another to give a single bar
with several parts or segments. The lengths of the segments are determined by the frequencies.
The height of the bar gives the total frequency. A legend is required to identify which
segment represents which category (see opposite). The segmented bar chart opposite was formed
from the climate data used in Example 2. In a percentage segmented bar chart, the lengths of
each segment in the bar are determined by the percentages. When this is done, the height of
the bar is 100.
[Segmented bar chart: Frequency (0 to 25) for the climate data; legend ‘Climate’: Hot, Mild, Cold]
Example 3
Constructing a percentage segmented bar chart from a frequency table
The climate type of 23 countries is classified as
‘cold’, ‘mild’ or ‘hot’.
Construct a percentage frequency segmented
bar chart to display this information.
               Frequency
Climate type   Number  Percentage
Cold           3       13.0
Mild           14      60.9
Hot            6       26.1
Total          23      100.0
Solution
1 In a segmented bar chart, the horizontal axis has no label.
2 Label the vertical axis ‘Percentage’. Scale allowing for the maximum of 100 (%). Mark the
scale in tens.
3 Draw a single bar of height 100. Divide the bar into three by inserting dividing lines at
13% and 73.9% (13% + 60.9%).
4 The bottom segment represents the countries with a cold climate. The middle segment
represents the countries with a mild climate. The top segment represents the countries with
a hot climate. Shade (or colour) the segments differently.
5 Insert a legend to identify each shaded segment by climate type.
[Percentage segmented bar chart: Percentage (0 to 100 in steps of 10); legend ‘Climate’: Hot, Mild, Cold]
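The dividing lines in a percentage segmented bar chart are cumulative percentages: each line sits at the running total of all the segments below it. The Python sketch below (variable names are our own) works them out from the counts in the frequency table, stacking the segments from the bottom in the order cold, mild, hot.

```python
# Counts from the frequency table, listed from the bottom segment upwards.
climate_counts = {"Cold": 3, "Mild": 14, "Hot": 6}
total = sum(climate_counts.values())

# The top of each segment is the running count so far, as a percentage of the total.
boundaries = []
running = 0
for label, n in climate_counts.items():
    running += n
    boundaries.append((label, round(running / total * 100, 1)))

for label, top in boundaries:
    print(f"Top of the '{label}' segment: {top}%")
```

Working from the running counts, rather than adding percentages that have already been rounded, keeps the top of the last segment at exactly 100%.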
The mode
One of the features of a dataset that is quickly revealed with a frequency table or a bar
chart is the mode or modal category.
The mode is the most frequently occurring value or category.
In a bar chart, the mode is given by the category with the tallest bar or longest segment.
For the bar charts above, the modal category is clearly ‘mild’. That is, for the countries
considered, the most frequently occurring climate type is ‘mild’.
Modes are particularly important in ‘popularity’ polls, for example, in answering questions
such as ‘Which is the most watched TV station between 6:00 p.m. and 8:00 p.m.?’ or ‘When
is the time a supermarket is in peak demand: morning, afternoon or night?’
Note, however, that the mode is only of real interest when a single category stands out
from the others.
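Finding the mode amounts to counting each category and picking out the largest count. The sketch below shows one way to do this in Python; the function name modal_categories is hypothetical. It returns every category that shares the highest frequency, so a result with more than one entry signals that no single category stands out.

```python
from collections import Counter


def modal_categories(values):
    """Return all categories that occur with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, n in counts.items() if n == highest]


# Climate data from Example 2: 3 cold, 14 mild and 6 hot countries.
climate = ["Cold"] * 3 + ["Mild"] * 14 + ["Hot"] * 6
print(modal_categories(climate))

# A made-up dataset in which two categories tie for the highest frequency.
tied = ["A", "B", "A", "B", "C"]
print(modal_categories(tied))
```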
Answering statistical questions involving categorical variables
A statistical question is a question that depends on data for its answer.
Statistical questions that are of most interest when working with a single categorical variable
are of these forms:
Is there a dominant category into which a significant percentage of individuals fall or
are the individuals relatively evenly spread across all of the categories? For example, are
the shoppers in a department store predominantly male or female, or are there roughly
equal numbers of males and females?
How many and/or what percentage of individuals fall into each category? For example,
what percentage of visitors to a national park are ‘day-trippers’ and what percentage
of visitors are staying overnight?
A short written report is the standard way to answer these questions.
The following guidelines are designed to help you to produce such a report.
Some guidelines for writing a report describing the distribution of
a categorical variable
Briefly summarise the context in which the data were collected including the number
of individuals involved in the study.
If there is a clear modal category, ensure that it is mentioned.
Include frequencies or percentages in the report. Percentages are preferred.
If there are a lot of categories, it is not necessary to mention every category, but the
modal category should always be mentioned.
Example 4
Describing the distribution of a categorical variable in its context
In an investigation of the variation of climate type
across countries, the climate types of 23 countries
were classified as ‘cold’, ‘mild’ or ‘hot’. The data
are displayed in a frequency table to show the
percentages.
Use the information in the frequency table to write
a concise report on the distribution of climate types
across these 23 countries.
               Frequency
Climate type   Number  %
Cold           3       13.0
Mild           14      60.9
Hot            6       26.1
Total          23      100.0
Solution
Report
The climate types of 23 countries were classified as being ‘cold’, ‘mild’ or ‘hot’. The
majority of the countries, 60.9%, were found to have a mild climate. Of the remaining
countries, 26.1% were found to have a hot climate, while 13.0% were found to have a
cold climate.
Exercise 1B
Constructing frequency tables from raw data
1
a In a frequency table, what is the mode?
b Identify the mode in the following datasets.
i Grades: A A C B A B B B B D C
ii Shoe size: 8 9 9 10 8 8 7 9 8 10 12 8 10
2
The following data identify the state of residence of a group of people, where
1 = Victoria, 2 = South Australia and 3 = Western Australia.
2 1 1 1 3 1 3 1 1 3 3
a Is the variable state of residence categorical or numerical?
b Form a frequency table (with both numbers and percentages) to show the
distribution of state of residence for this group of people. Use the table in
Example 1 as a model.
c Construct a bar chart using Example 2 as a model.
3
The size (S = small, M = medium, L = large) of 20 cars was recorded as follows.
S S L M M M L S S M M S L S M M M S S M
a Is the variable size in this context numerical or categorical?
b Form a frequency table (with both numbers and percentages) to show the
distribution of size for these cars. Use the table in Example 1 as a model.
c Construct a percentage bar chart.
Constructing a percentage segmented bar chart from a frequency table
4
The table shows the frequency distribution of the place of birth for 500 Australians.
Place of birth   Percentage
Australia        78.3
Overseas         21.8
Total            100.1
a Is place of birth an ordinal or a nominal variable?
b Display the data in the form of a percentage segmented bar chart.
5
The table records the number of new cars sold in Australia during the first quarter of
1 year, categorised by type of vehicle (private, commercial).
                  Frequency
Type of vehicle   Number    Percentage
Private           132 736   ____
Commercial        49 109    ____
Total             ____      ____
a Is type of vehicle an ordinal or a nominal variable?
b Copy and complete the table giving the percentages correct to the nearest whole
number.
c Display the data in the form of a percentage segmented bar chart.
Analysing frequency tables and writing reports
6
The table shows the frequency distribution of school type for a number of schools. The
table is incomplete.
              Frequency
School type   Number  Percentage
Catholic      4       20
Government    11      ____
Independent   5       25
Total         ____    100
a Write down the information missing from the table.
b How many schools are categorised as ‘independent’?
c How many schools are there in total?
d What percentage of schools are categorised as ‘government’?
e Use the information in the frequency table to complete the following report
describing the distribution of school type for these schools.
Report
____ schools were classified according to school type. The majority of these schools,
____%, were found to be ____. Of the remaining schools, ____ were ____ while 20% were ____.
7
Twenty-two students were asked the question, ‘How often do you play sport?’, with the
possible responses: ‘regularly’, ‘sometimes’ or ‘rarely’. The distribution of responses is
summarised in the frequency table.
              Frequency
Plays sport   Number  Percentage
Regularly     5       22.7
Sometimes     10      ____
Rarely        ____    31.8
Total         22      ____
a Write down the information missing from the table.
b Use the information in the frequency table to complete the report below describing
the distribution of student responses to the question, ‘How often do you play sport?’
Report
When ____ students were asked the question, ‘How often do you play sport?’, the dominant
response was ‘Sometimes’, given by ____% of the students. Of the remaining students, ____% of
the students responded that they played sport ____ while ____% said that they played
sport ____.
8
The table shows the frequency distribution of the eye colour of 11 preschool children.
Use the information in the table to write a brief report describing the frequency
distribution of eye colour.
             Frequency
Eye colour   Number  Percentage
Brown        6       54.5
Hazel        2       18.2
Blue         3       27.3
Total        11      100.0
1C Displaying and describing the distributions of numerical variables
The grouped frequency distribution
When looking at ways of organising and displaying numerical data, we are faced with the
problem of how to deal with continuous variables that can take a large range of values – for
example, age (0–100+). Listing all possible ages would be tedious and produce a large and
unwieldy frequency table or graphical display.
To solve this problem, we group the data into a small number of convenient intervals. We
then organise the data into a frequency table using these data intervals. We call this sort of
table a grouped frequency table.
Example 5
Constructing a grouped frequency table
The data below give the average hours worked per week in 23 countries.
35.0 48.0 45.0 43.0 38.2 50.0 39.8 40.7 40.0 50.0 35.4 38.8
40.2 45.0 45.0 40.0 43.0 48.8 43.3 53.1 35.6 44.1 34.8
Form a grouped frequency table with five intervals.
Solution
1 Set up a table as shown. Use five
intervals: 30.0–34.9, 35.0–39.9, . . . ,
50.0–54.9.
2 List these intervals, in ascending order,
under ‘Average hours worked’.
3 Count the number of countries whose
average working hours fall into each of
the intervals.
Record these values in the ‘Number’
column.
Average                 Frequency
hours worked    Number  Percentage
30.0–34.9       1       4.3
35.0–39.9       6       26.1
40.0–44.9       8       34.8
45.0–49.9       5       21.7
50.0–54.9       3       13.0
Total           23      99.9
4 Convert the counts into percentages and record in the ‘Percentage’ column.
5 Total the number and percentage columns. The percentages may not total exactly 100%
because of rounding.
Notes:
1 The intervals in this example are of width five. For example, the interval 35.0–39.9 is an interval of
width 5.0 because it contains all values from 34.9500 to 39.9499.
2 The intervals are deliberately constructed so that they do not overlap.
3 There are no hard and fast rules for the number of intervals we use when grouping data but, usually,
between five and fifteen intervals are used. Usually, the smaller the number of data values, the smaller
the number of intervals. Here we have chosen to use five intervals.
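The grouping in Example 5 can be checked with a short program. The following Python sketch is one possible approach; the names start, width and n_intervals are our own. It places each value in its interval by finding how many whole interval widths it lies above the start of the first interval.

```python
# Average hours worked per week in 23 countries (data from Example 5).
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

start, width, n_intervals = 30.0, 5.0, 5  # intervals 30.0-34.9, ..., 50.0-54.9

# Count how many values fall into each non-overlapping interval.
counts = [0] * n_intervals
for h in hours:
    index = int((h - start) // width)  # 0 for 30.0-34.9, 1 for 35.0-39.9, ...
    counts[index] += 1

# Convert the counts to percentages, rounded to one decimal place.
percentages = [round(c / len(hours) * 100, 1) for c in counts]

for i, (c, p) in enumerate(zip(counts, percentages)):
    low = start + i * width
    print(f"{low:.1f}-{low + width - 0.1:.1f}  {c:>2}  {p:>5.1f}")
print(f"Total      {sum(counts):>2}  {round(sum(percentages), 1):>5.1f}")
```

The rounded percentages total 99.9, not 100.0, which is the rounding effect mentioned in step 5 of the solution.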
How has forming a frequency table helped?
The process of forming a frequency table for a numerical variable:
orders the data
displays the data in a compact form
tells us how the data values are distributed across the categories
helps us identify the mode (the most frequently occurring value or interval).
The histogram and its construction
The histogram is a graphical display of the information in the grouped frequency table.
Constructing a histogram from a frequency table
In a frequency histogram:
frequency (count or per cent) is shown on the vertical axis
the values of the variable being displayed are plotted on the horizontal axis
each bar in a histogram corresponds to a data interval
the height of the bar gives the frequency (or the percentage frequency).
Example 6
Constructing a histogram from a frequency table
Construct a histogram for the frequency
table opposite.
Average hours worked   Frequency
30.0–34.9              1
35.0–39.9              6
40.0–44.9              8
45.0–49.9              5
50.0–54.9              3
Total                  23
Solution
1 Label the horizontal axis with the variable name, ‘Average hours worked’. Mark the
scale using the start of each interval: 30, 35, . . .
2 Label the vertical axis ‘Frequency’. Scale allowing for the maximum frequency, 8.
3 Finally, for each interval draw a bar, making the height equal to the frequency.
[Histogram: Frequency (0 to 9) against Average hours worked (25 to 60 in steps of 5)]
Constructing a histogram from raw data
It is relatively quick to construct a histogram from a frequency table. However, if you have
only raw data (as you mostly do), it is a very slow process because you have to construct the
frequency table first. Fortunately, a CAS calculator will do this for you.
How to construct a histogram using the TI-Nspire CAS
Display the following set of 27 marks in the form of a histogram.
16 11 4 25 15 7 14 13 14 12 15 13 16
15 12 18 22 17 18 23 15 13 17 18 22 23 14
Steps
1 Start a new document by pressing ctrl + N (or home>New Document). If prompted to save an
existing document, move cursor to No and press enter.
2 Select Add Lists & Spreadsheet.
Enter the data into a list named marks.
a Move the cursor to the name space of column A and type in marks as the list name.
Press enter.
b Move the cursor down to row 1, type in the first data value and press enter. Continue
until all the data have been entered. Press enter after each entry.
3 Statistical graphing is done through the Data & Statistics application. Press ctrl + I
(or ctrl [key]) and select Add Data & Statistics.
a Press tab enter (or click on the Click to add variable box on the x-axis) to show the
list of variables. Select marks.
Press enter to paste marks to that axis.
b A dot plot is displayed as the default. To change the plot to a histogram, press
menu>Plot Type>Histogram. Your screen should now look like that shown opposite. This
histogram has a column (or bin) width of 2 and a starting point of 3.
4 Data analysis
a Move the cursor over any column; a [icon] will appear and the column data will be
displayed as shown opposite.
b To view other column data values, move the cursor to another column.
Note: If you click on a column, it will be selected.
Hint: If you accidentally move a column or data point, ctrl + esc enter will undo the move.
5 Change the histogram column (bin) width to 4 and the starting point to 2.
a Press ctrl + menu to get the contextual menu as shown (below left).
Hint: Pressing ctrl + menu enter with the cursor on the histogram gives you a contextual menu
that relates only to histograms. You can access the commands through menu>Plot Properties.
b Select Bin Settings>Equal Bin Width.
c In the settings menu (below right) change the Width to 4 and the Starting Point
(Alignment) to 2 as shown. Press enter.
d A new histogram is displayed with column width of 4 and a starting point of
2 but it no longer fits the window (below left). To solve this problem, press
ctrl + menu>Zoom>Zoom-Data and enter to obtain the histogram as shown
below right.
6 To change the frequency axis to a percentage axis, press ctrl + enter>Scale>Percent
and then press enter.
How to construct a histogram using the ClassPad
Display the following set of 27 marks in the form of a histogram.
16 11 4 25 15 7 14 13 14 12 15 13 16
15 12 18 22 17 18 23 15 13 17 18 22 23 14
Steps
1 From the application menu screen, locate the built-in Statistics application.
Tap [icon] to open.
Tapping [icon] from the icon panel (just below the touch screen) will display the
application menu if it is not already visible.
2 Enter the data into a list named
marks.
To name the list:
a Highlight the heading of the first list by tapping it.
b Press [key] on the front of the calculator and tap the [icon] tab.
c To enter the data, type the word marks and press [key]. Tap [icon] and [icon] to
return to the list screen.
d Type in each data value and press [key] or [key] (which is found on the cursor
button on the front of the calculator) to move down to the next cell.
The screen should look like the one shown above right.
3 Set up the calculator to plot a
statistical graph.
a Tap [icon] from the toolbar. This opens the Set StatGraphs dialog box.
b Complete the dialog box as
given below.
Draw: select On.
Type: select Histogram ( ).
XList: select main\marks ( ).
Freq: leave as 1.
c Tap Set to confirm your selections.
Note: To make sure only this graph is drawn, select SetGraph from the menu bar at the top and
confirm that there is a tick only beside StatGraph1 and no others.
4 To plot the graph:
a Tap [icon] in the toolbar.
b Complete the Set Interval dialog box as follows.
HStart: type 2 (i.e. the starting point of the first interval)
HStep: type 4 (i.e. the interval width).
Tap OK to display the histogram.
Note: The screen is split into two halves, with the graph displayed in the bottom half, as shown
above. Tapping [icon] from the icon panel allows the graph to fill the entire screen. Tap [icon]
again to return to half-screen size.
5 Tapping [icon] from the toolbar places a marker (+) at the top of the first column of the
histogram (see opposite) and tells us that:
a the first interval begins at 2 (xc = 2)
b for this interval, the frequency is 1 (Fc = 1).
To find the frequencies and starting points of the other intervals, use the cursor key arrow
([key]) to move from interval to interval.
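The calculator's interval counts can be checked by hand or with a short program. The sketch below is only an illustration, not part of the calculator procedure: it uses the 27 marks from this example with a starting point of 2 and a width of 4, and the function name histogram_counts is our own.

```python
import math
from collections import Counter

# The 27 marks listed at the start of this ClassPad example.
marks = [16, 11, 4, 25, 15, 7, 14, 13, 14, 12, 15, 13, 16, 15,
         12, 18, 22, 17, 18, 23, 15, 13, 17, 18, 22, 23, 14]


def histogram_counts(values, start, width):
    """Count how many values fall in each interval [left, left + width)."""
    counts = Counter()
    for v in values:
        index = math.floor((v - start) / width)  # which interval v belongs to
        counts[start + index * width] += 1       # key = left edge of that interval
    return dict(sorted(counts.items()))


counts = histogram_counts(marks, start=2, width=4)
for left, freq in counts.items():
    print(f"{left} to <{left + 4}: {freq}")
```

The first line printed corresponds to the first column of the histogram: it starts at 2 and has a frequency of 1, as the marker on the calculator screen reports.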
What to look for in a histogram
A histogram provides a graphical display of a data distribution. For example, the histogram
opposite displays the distribution of test marks for a group of 32 students.
[Figure: histogram of test marks; horizontal axis Marks (10 to 100), vertical axis Frequency]
The purpose of constructing a histogram is to help understand the key features of the data
distribution. These features are its:
shape and outliers
centre
spread.
Shape and outliers
How are the data distributed? Is the histogram peaked? That is, do some data values tend to
occur much more frequently than others, or is the histogram relatively flat, showing that all
values in the distribution occur with approximately the same frequency?
Symmetric distributions
If a histogram is single-peaked, does the histogram region tail off evenly on either side of the
peak? If so, the distribution is said to be symmetric (see histogram 1).
[Figure: Histogram 1 (single peak, with lower tail and upper tail labelled) and Histogram 2
(two peaks); vertical axes Frequency]
A single-peaked symmetric distribution is characteristic of the data that derive from
measuring variables such as intelligence test scores, weights of oranges, or any other data for
which the values vary evenly around some central value.
The double-peaked distribution (histogram 2) is symmetric about the dip between the two
peaks. A histogram that has two distinct peaks indicates a bimodal (two modes) distribution.
A bimodal distribution often indicates that the data have come from two different
populations. For example, if we were studying the distance the discus is thrown by Olympic-level
discus throwers, we would expect a bimodal distribution if both male and female
throwers were included in the study.
Skewed distributions
Sometimes a histogram tails off primarily in one direction. If a histogram tails off to the
right, we say that it is positively skewed (histogram 3). The distribution of salaries of
workers in a large organisation tends to be positively skewed. Most workers earn a similar
salary with some variation above or below this amount, but a few earn more and even fewer,
such as the senior manager, earn even more. The distribution of house prices also tends to be
positively skewed.
[Figure: Histogram 3 (peak with a long upper tail, +ve skew) and Histogram 4 (peak with a long
lower tail, −ve skew); vertical axes Frequency]
If a histogram tails off to the left, we say that it is negatively skewed (histogram 4). The
distribution of age at death tends to be negatively skewed. Most people die in old age, a few
in middle age and fewer still in childhood.
Outliers
Outliers are any data values that stand out from the main body of data. These are data values
that are atypically high or low. See, for example, histogram 5, which shows an outlier. In
this case it is a data value that is atypically low compared to the rest of the data values.
Sports data often contain outliers. For example, the heights of the players in a football
side vary, but do so within a limited range. One exception is the ‘knock’ ruckman, who may be
exceptionally tall and well outside the normal range of variation.
[Figure: Histogram 5, with an outlier labelled apart from the main body of data; vertical axis
Frequency]
In statistical terms, the exceptionally tall ruckman is an outlier, because his height does not
fit in the range of heights that might be regarded as typical for the team. Outliers can also
indicate errors made collecting or processing data – for example, a person’s age recorded
as 365.
Centre
Histograms 6 to 8 display the distribution of test scores for three different classes taking
the same subject. They are identical in shape, but differ in where they are located along the
axis. In statistical terms we say that the distributions are ‘centred’ at different points
along the axis. But what do we mean by the centre of a distribution?
[Figure: Histograms 6 to 8; horizontal axis from 50 to 150, vertical axis Frequency]
This is an issue we will return to in more detail in the next chapter. For the present we will
take the centre to be the middle of the distribution. You might know of this point as the
median.
The middle of a symmetric distribution is reasonably easy to locate by eye. Looking at
histograms 6 to 8, it would be reasonable to say that the centre or middle of each distribution
lies roughly halfway between the extremes; half the observations would lie above this point
and half below. Thus we might estimate that histogram 6 (yellow) is centred at about 60,
histogram 7 (light blue) at about 100, and histogram 8 (dark blue) at about 140.
For skewed distributions, it is more difficult to estimate the middle of a distribution by eye.
The middle is not halfway between the extremes because, in a skewed distribution, the
scores tend to bunch up at one end.
However, if we imagine a cardboard cut-out of the histogram, the midpoint lies on the line
that divides the histogram into two equal areas (Histogram 9).
Using this method, we would estimate the centre of the distribution to lie somewhere between
35 and 40, but closer to 35, so we might opt for 37. However, remember that this is only an
estimate.
[Figure: Histogram 9, with a line that divides the area of the histogram in half; horizontal
axis from 15 to 50, vertical axis Frequency]
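The equal-areas idea can also be turned into a calculation. The sketch below is a rough illustration only: the column heights are hypothetical (the heights in Histogram 9 are not listed in the text), the function name equal_area_point is our own, and it assumes that values are spread evenly within each column.

```python
def equal_area_point(left_edges, width, counts):
    """Return the x-value that splits the total histogram area in half.

    Assumes values are spread evenly within each column.
    """
    half = sum(counts) / 2
    running = 0
    for left, count in zip(left_edges, counts):
        if count > 0 and running + count >= half:
            # Move into this column just far enough to reach half the area.
            return left + width * (half - running) / count
        running += count
    raise ValueError("histogram has no data")


# Hypothetical column heights for a skewed histogram covering 15 to 50.
left_edges = [15, 20, 25, 30, 35, 40, 45]
counts = [1, 1, 2, 3, 5, 4, 1]

centre = equal_area_point(left_edges, width=5, counts=counts)
print(round(centre, 1))
```

With these made-up heights the dividing line falls between 35 and 40, closer to 35, which is the kind of answer an estimate by eye should give.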
Spread
If the histogram is single-peaked, is it narrow? This would indicate that most of the data
values in the distribution are tightly clustered in a small region. Or is the peak broad? This
would indicate that the data values are more widely spread out. Histograms 10 and 11 are
both single-peaked. Histogram 10 has a broad peak, indicating that the data values are not
very tightly clustered about the centre of the distribution. In contrast, histogram 11 has a
narrow peak, indicating that the data values are tightly clustered around the centre of the
distribution.
[Figure: Histogram 10 (wide central region) and Histogram 11 (narrow central region);
horizontal axes from 2 to 22, vertical axes Frequency]
But what do we mean by the spread of a distribution? We will return to this in more detail
later. For a histogram we will take it to be the maximum range of the distribution.
Range
Range = largest value − smallest value
For example, histogram 10 has a spread (maximum range) of 22 (22 – 0) units. This is
considerably greater than the spread of histogram 11 which has a spread of 12 (18 – 6) units.
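As a quick check of the arithmetic, the range rule can be written as a one-line function. This is only a sketch: the function name value_range is our own, and the end points are the ones quoted above for histograms 10 and 11.

```python
def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# End points quoted in the text: histogram 10 runs from 0 to 22,
# histogram 11 runs from 6 to 18.
spread_10 = value_range([0, 22])
spread_11 = value_range([6, 18])

print(spread_10, spread_11)
```

The same function works on a full list of data values, in which case it gives the actual range rather than the maximum range read from a histogram.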
Example 7
Describing a histogram in terms of shape, centre and spread
The histogram opposite shows the distribution of the number of phones per 1000 people in
85 countries.
a Describe its shape and note outliers (if any).
b Locate the centre of the distribution.
c Estimate the spread of the distribution.
[Figure: histogram; horizontal axis Number of phones (per 1000 people), marked 170, 340, 510,
680, 850, 1020; vertical axis Frequency]
Solution
a Shape and outliers
The distribution is positively skewed.
There are no outliers.
b Centre: Count up the frequencies from either end to find the middle interval.
The distribution is centred between 170 and 340 phones per 1000 people.
c Spread: Use the maximum range to estimate the spread.
Spread = 1020 − 0
= 1020 phones/1000 people
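The counting in part b can be sketched in code. The column frequencies below are hypothetical, because the exact heights cannot be read from the text; they were chosen only to add to 85 and to be positively skewed. The function name middle_interval is our own.

```python
def middle_interval(left_edges, width, counts):
    """Return (left, right) for the interval containing the middle value.

    For an even number of values, this reports the interval that holds
    the upper of the two middle values.
    """
    total = sum(counts)
    middle_position = (total + 1) / 2   # e.g. the 43rd of 85 values
    running = 0
    for left, count in zip(left_edges, counts):
        running += count                # frequencies counted up so far
        if running >= middle_position:
            return (left, left + width)
    raise ValueError("histogram has no data")


# Hypothetical frequencies for six intervals of width 170 (total 85).
left_edges = [0, 170, 340, 510, 680, 850]
counts = [30, 20, 15, 10, 5, 5]

print(middle_interval(left_edges, width=170, counts=counts))
```

With these assumed frequencies the 43rd value lies in the second column, so the centre is located between 170 and 340, in line with the solution above.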
Using a histogram to describe the distribution of a numerical
variable in the context of its data
If you were using the histogram above to describe the distribution in a form suitable for a
statistical report, you might write as follows.
Report
For these 85 countries, the distribution of the number of phones per 1000 people is
positively skewed. The centre of the distribution lies between 170 and 340 phones/1000
people. The spread of the distribution is 1020 phones/1000 people. There are no
outliers.
Exercise 1C
Constructing a histogram from a frequency table
1 Construct a histogram to display the information in the frequency table opposite.
Use the histogram in Example 6 as a model. Label axes and mark scales.
Population density    Frequency
0–199                 11
200–399               4
400–599               4
600–799               2
800–999               1
Total                 22
Reading information from a histogram
2 The histogram opposite displays the distribution of the number of words in 30 randomly
selected sentences.
[Figure: histogram; horizontal axis Number of words in sentence (5 to 30), vertical axis
Percentage]
a What percentage of these sentences contained:
i 5–9 words?
ii 25–29 words?
iii 10–19 words?
iv fewer than 15 words?
Write answers correct to the nearest per cent.
b How many of these sentences contained:
i 20–24 words?
ii more than 25 words?
c What is the modal interval?
3 The histogram opposite displays the distribution of the batting averages of cricketers
playing for a district team.
[Figure: histogram; horizontal axis Batting average (0 to 55), vertical axis Frequency]
a How many players have their averages recorded in this histogram?
b How many of these cricketers had a batting average:
i 20 or more?
ii less than 15?
iii at least 20 but less than 30?
iv of 45?
c What percentage of these cricketers had a batting average:
i 50 or more?
ii at least 20 but less than 40?
Constructing a histogram from raw data using a CAS calculator
4 The pulse rates of 23 students are given below.
86 82 96 71 90 78 68 71
70 78 69 77 64 80 83 78
68 88 88 70 76 86 74
a Use a CAS calculator to construct a histogram so that the first column starts at 63
and the column width is two.
b i What is the starting point of the third column?
ii What is the ‘count’ for the third column? What are the actual data values?
c Redraw the histogram so that the column width is five and the first column starts
at 60.
d For this histogram, what is the count in the interval ‘65 to <70’?
5 The numbers of children in the families of 25 VCE students are listed below.
1 6 2 5 5 3 4 1 2 7 3 4 5
3 1 3 2 1 4 4 3 9 4 3 3
a Use a CAS calculator to construct a histogram so that the column width is one and
the first column starts at 0.5.
b What is the starting point for the fourth column and what is the count?
c Redraw the histogram so that the column width is two and the first column starts
at 0.
d i What is the count in the interval from 6 to less than 8?
ii What actual data value(s) does this interval include?
Determining the shape, centre and spread from a histogram
6 Identify each of the following histograms as approximately symmetric, positively
skewed or negatively skewed, and mark the following.
i The mode (if there is a clear mode)
ii Any potential outliers
iii The approximate location of the centre
[Figure: panels a to d, Histogram A to Histogram D; vertical axes Frequency]
[Figure: panels e and f, Histogram E and Histogram F; vertical axes Frequency]
7 These three histograms show the marks obtained by a group of students in three subjects.
[Figure: histograms for Subject A, Subject B and Subject C; horizontal axis Marks (2 to 46),
vertical axis Frequency (0 to 10)]
a Is each of the distributions approximately symmetric or skewed?
b Are there any clear outliers?
c Determine the interval containing the central mark for each of the three subjects.
d In which subject was the spread of marks the least?
Use the maximum range to estimate the spread.
e In which subject did the marks vary most? Use the range to estimate the spread.
Describing a histogram in the context of its data
8 The histogram opposite shows the distribution of pulse rate for 28 students.
Use the histogram to complete the report below describing the distribution of pulse rate in
terms of shape, centre, spread and outliers (if any).
[Figure: histogram; horizontal axis Pulse rate (beats per minute), from 60 to 115; vertical
axis Frequency (count)]
Report
For the ____ students, the distribution of pulse rates is ____ with an outlier. The centre of
the distribution lies between ____ beats per minute and the spread of the distribution is
____ beats per minute. The outlier lies somewhere between ____ beats per minute.
9 The histogram opposite shows the distribution of travel times (in minutes) for 42 journeys
from an outer suburban station to the city.
Use the histogram to write a brief report describing the distribution of travel times in
terms of shape, centre, spread and outliers (if any).
[Figure: histogram; horizontal axis Travel time (minutes), from 55 to 95; vertical axis
Frequency]
1D Using a log scale to display data
Many numerical variables that we deal with in statistics have values that range over
several orders of magnitude. For example, the populations of countries range from a few
thousand to hundreds of thousands, to millions, to hundreds of millions to just over 1 billion.
Constructing a histogram that effectively locates every country on the plot is impossible.
One way to solve this problem is to use a scale that spreads out the countries with small
populations and ‘pulls in’ the countries with huge populations.
A scale that will do this is called a logarithmic scale (or, more commonly, a log scale).
However, before you learn to apply log scales, you will have to learn something about
logarithms.
A brief introduction to logarithms to the base 10 and their
interpretation
Consider the numbers:
0.01, 0.1, 1, 10, 100, 1000, 10 000, 100 000, 1 000 000
Such numbers can be written more compactly as:
10^−2, 10^−1, 10^0, 10^1, 10^2, 10^3, 10^4, 10^5, 10^6
In fact, if we make it clear we are only talking about powers of 10, we can merely
write down the powers:
−2, −1, 0, 1, 2, 3, 4, 5, 6
These powers are called the logarithms of the numbers or ‘logs’ for short.
When we use logarithms to write numbers as powers of 10, we say we are working with
logarithms to the base 10. We can indicate this by writing log10.
Note: We could also use logarithms to write numbers as powers of two, for example, 8 = 2^3, or powers
of 5 – for example, 625 = 5^4. In these cases we would be working with logarithms to the base 2 and 5
respectively. Only base 10 logarithms are required for this course.
Properties of logs to the base 10
1 If a number is greater than one, its log to the base 10 is greater than zero.
2 If a number is greater than zero but less than one, its log to the base 10 is negative.
3 If the number is zero, then its log is undefined.
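These properties can be checked with any tool that has a base 10 log function. The short sketch below uses Python's math.log10; the function name describe_log is our own. It also treats negative numbers as having no log, which is true for real numbers although it is not one of the three properties listed above.

```python
import math


def describe_log(x):
    """Say whether log10(x) is positive, negative, zero or undefined."""
    if x <= 0:
        # Zero has no log; neither do negative numbers.
        return "undefined"
    value = math.log10(x)
    if value > 0:
        return "positive"
    if value < 0:
        return "negative"
    return "zero"


for number in [250, 0.5, 1, 0]:
    print(number, describe_log(number))
```

The case of 1 is worth noting: 1 = 10^0, so its log is exactly zero, which sits on the boundary between properties 1 and 2.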
Why use logs?
The set of numbers
0.01, 0.1, 1, 10, 100, 1000, 10 000, 100 000, 1 000 000
ranges from 0.01 to 1 million.
Thus, if we wanted to plot these numbers on a scale, the first eight numbers would cluster
together at one end of the scale, while the ninth (1 million) would be located at the far end
of the scale.
[Figure: number line labelled Number, marked 0, 300 000, 600 000, 900 000]
By contrast, if we plot the logs of these numbers, they are evenly spread along the scale.
We use this idea to display a set of data whose values range over several orders of
magnitude. Rather than plot the data values themselves, we plot the logs of their data values.
[Figure: number line labelled Log number, marked −2 to 6]
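The contrast between the two scales can be made concrete with a few lines of code. This is an illustrative sketch only: it places each number on an ordinary scale running from 0 to 1 000 000 (an assumption made for the illustration) and then lists the logs and the gaps between them.

```python
import math

numbers = [0.01, 0.1, 1, 10, 100, 1000, 10_000, 100_000, 1_000_000]

# Position of each number as a fraction of an ordinary scale from 0 to 1 000 000.
linear_positions = [x / 1_000_000 for x in numbers]

# Logs to the base 10, rounded to remove any floating-point noise.
logs = [round(math.log10(x), 6) for x in numbers]

# Gaps between neighbouring logs: equal gaps mean even spacing on the log scale.
gaps = [round(b - a, 6) for a, b in zip(logs, logs[1:])]

print(linear_positions)
print(logs)
print(gaps)
```

On the ordinary scale, every number except 1 000 000 sits within the first tenth of the axis, while on the log scale each number is exactly one unit from its neighbours.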
For example, the histogram below displays the body weights (in kg) of a number of animal
species. Because the animals represented in this dataset have weights ranging from around
1 kg to 90 tonnes (a dinosaur), most of the data are bunched up at one end of the scale and
much detail is missing. The distribution of weights is highly positively skewed, with an
outlier.
[Figure: histogram; horizontal axis Bodywt (0 to 90 000), vertical axis Percentage (0% to 80%)]
However, when a log scale is used, their weights are much more evenly spread along the scale.
The distribution is now approximately symmetric, with no outliers, and the histogram is
considerably more informative.
We can now see that the percentage of animals with weights between 10 and 100 kg is similar
to the percentage of animals with weights between 100 and 1000 kg.
[Figure: histogram; horizontal axis log bodyweight (−2 to 6), vertical axis Percentage
(0% to 28%)]
Note: In drawing this conclusion, you need to remember that log 10 = 1, log 100 = 2, and so on.
Working with logs
To construct and interpret a log data plot, like the one above, you need to be able to:
1 Work out the log for any number. So far we have only done this for numbers such as 10,
100, 1000 that are exact powers of 10; for example, 100 = 10^2, so log 100 = 2.
2 Work backwards from a log to the number it represents. This is easy to do in your head
for logs of numbers that are exact powers of 10 – for example, if the log of a number is 3
then the number is 10^3 = 1000. But it is not a sensible approach for numbers that are not
exact powers of 10.
Your CAS calculator is the key to completing both of these tasks in practice.
Example 8
Using a CAS calculator to find logs
a Find the log of 45, correct to two significant figures.
b Find the number whose log is 2.7125, correct to the nearest whole number.
Solution
a Open a calculator screen, type log (45) and press enter. Write down the answer correct to
two significant figures.
log 45 = 1.65 . . .
= 1.7 (to 2 sig. figs)
b If the log of a number is 2.7125, then the number is 10^2.7125.
Enter the expression 10^2.7125 and press enter. Write down the answer correct to the nearest
whole number.
10^2.7125 = 515.82 . . .
= 516 (to the nearest whole number)
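The same two calculations can be reproduced away from the calculator. The sketch below uses Python's math.log10 for part a and the power operator for part b; the helper round_sig, which rounds to a given number of significant figures, is our own and is not a built-in function.

```python
import math


def round_sig(x, figures):
    """Round x to the given number of significant figures."""
    if x == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(x)))  # position of the leading digit
    return round(x, figures - 1 - exponent)


log_45 = math.log10(45)   # part a: the log of 45
number = 10 ** 2.7125     # part b: the number whose log is 2.7125

print(round_sig(log_45, 2))
print(round(number))
```

Part b simply undoes part a's operation: raising 10 to the power of a log returns the original number.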
Analysing data displays with a log scale
Now that you know how to work out the log of any number and convert logs back to
numbers, you can analyse a data plot using a log scale.
Example 9
Interpreting a histogram with a log scale
The histogram shows the distribution of the weights of 27 animal species plotted on a log
scale.
[Figure: histogram; horizontal axis log bodyweight (−2 to 6), vertical axis Percentage
(0% to 28%)]
a What body weight (in kg) is represented by the number 4 on the log scale?
b How many of these animals have body weights more than 10 000 kg?
c The weight of a cat is 3.3 kg. Use your calculator to determine the log of its weight
correct to two significant figures.
d Determine the weight (in kg) whose log weight is 3.4 (the elephant). Write your
answer correct to the nearest whole number.
Solution
a If the log of a number is 4 then the number is 10^4 = 10 000.
10^4 = 10 000 kg
b On the log scale, 10 000 is shown as 4. Thus, the number of animals with a weight greater
than 10 000 kg corresponds to the number of animals with a log weight greater than 4.
This can be determined from the histogram, which shows there are two animals with log
weights greater than 4.
Two animals
c The weight of a cat is 3.3 kg. Use your calculator to find log 3.3. Write the answer
correct to two significant figures.
Cat: log 3.3 = 0.518...
= 0.52 (to 2 sig. figs)
d The log weight of an elephant is 3.4. Determine its weight in kg by using your calculator
to evaluate 10^3.4. Write the answer correct to the nearest whole number.
Elephant: 10^3.4 = 2511.88...
= 2512 kg
Constructing a histogram with a log scale
The task of constructing a histogram is also a CAS calculator task.
Using a TI-Nspire CAS to construct a histogram with a log scale
The weights of 27 animal species (in kg) are recorded below.
1.4     470     36      28      1.0     12 000  2600    190     520
10      3.3     530     210     62      6700    9400    6.8     35
0.12    0.023   2.5     56      100     52      87 000  0.12    190
Construct a histogram to display the distribution:
a of the body weights of these 27 animals and describe its shape
b of the log body weights of these animals and describe its shape.
Steps
1 a Start a new document by pressing ctrl + N.
b Select Add Lists & Spreadsheet.
Enter the data into a column named ‘weight’.
2 a Press ctrl + I and select Add Data & Statistics.
Click on the Click to add variable on the x-axis and select the variable ‘weight’. A dot
plot is displayed.
b Plot a histogram using menu>Plot Type>Histogram.
c Describe the shape of the distribution.
Shape: positively skewed with outliers
3 a Return to the Lists & Spreadsheet screen.
b Name another column ‘logweight’.
c Move the cursor to the grey cell below the ‘logweight’ heading. Type in = log(weight).
Press enter to calculate the values of logweight.
4 a Plot a histogram using a log scale. That is, plot the variable ‘logweight’.
Note: Use menu>Plot Properties>Histogram Properties>Bin Settings>Equal Bin Width and set the
column width (bin) to 1 and alignment (start point) to −2 and use
menu>Window/Zoom>Zoom-Data to rescale.
b Describe the shape of the distribution.
Shape: approximately symmetric
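The column counts behind both histograms can be checked without a calculator. The sketch below is an illustration only, with a function name of our own choosing: it counts the 27 weights in intervals of width 5000 starting at 0, then counts their logs in intervals of width 1 starting at −2 (the bin settings used above).

```python
import math
from collections import Counter

# The 27 animal body weights (in kg) from this example.
weights = [1.4, 470, 36, 28, 1.0, 12_000, 2600, 190, 520,
           10, 3.3, 530, 210, 62, 6700, 9400, 6.8, 35,
           0.12, 0.023, 2.5, 56, 100, 52, 87_000, 0.12, 190]


def histogram_counts(values, start, width):
    """Count how many values fall in each interval [left, left + width)."""
    counts = Counter()
    for v in values:
        index = math.floor((v - start) / width)  # which interval v belongs to
        counts[start + index * width] += 1       # key = left edge of that interval
    return dict(sorted(counts.items()))


# Ordinary scale: intervals of width 5000 kg starting at 0.
raw_counts = histogram_counts(weights, start=0, width=5000)

# Log scale: take log10 of each weight, then use intervals of width 1 from -2.
log_weights = [math.log10(w) for w in weights]
log_counts = histogram_counts(log_weights, start=-2, width=1)

print(raw_counts)
print(log_counts)
```

On the ordinary scale nearly all of the weights land in the first column, whereas the log counts rise to a peak in the middle and fall away on both sides, which is the approximately symmetric shape described above.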
Using a ClassPad to construct a histogram with a log scale
The weights of 27 animal species (in kg) are recorded below.
1.4     470     36      28      1.0     12 000  2600    190     520
10      3.3     530     210     62      6700    9400    6.8     35
0.12    0.023   2.5     56      100     52      87 000  0.12    190
Construct a histogram to display the distribution:
a of the body weights of these 27 animals and describe its shape
b of the log body weights of these animals and describe its shape.
Steps
1 In the statistics application [icon] enter the data into a column named ‘weight’ as shown.
2 Plot a histogram of the data.
a Tap [icon] from the toolbar.
b Complete the dialog box.
Draw: select On.
Type: select Histogram ( )
XList: select main\weight( ).
Freq: leave as 1.
Tap Set to confirm your selections.
c Tap [icon] in the toolbar.
d Complete the Set Interval dialog box as follows:
HStart: 0
HStep: 5000
Describe the shape of the distribution.
Shape: positively skewed with outliers
3 a Return to the data entry screen.
b Name another column ‘lwt’, short for log(weight).
c Tap in the calculation cell at the bottom of this column.
Type log(weight) and tap [key].
4 Plot a histogram to display the distribution of weights on a log scale. That is, plot the
variable lwt.
a Tap [icon] from the toolbar.
b Complete the dialog box.
Draw: select On.
Type: select Histogram ( ).
XList: select main\lwt ( ).
Freq: leave as 1.
Tap Set to confirm your selections.
c Tap [icon] in the toolbar.
d Complete the Set Interval dialog box as follows:
HStart: type -2
HStep: type 1
Tap OK to display the histogram.
Describe the shape of the distribution.
Shape: approximately symmetric
Exercise 1D
Determining logs from numbers
1 Using a CAS calculator, find the logs of the following numbers correct to one decimal
place.
a 2.5    b 25    c 250    d 2500
e 0.5    f 0.05    g 0.005    h 0.0005
Determining numbers from logs
2 Find the numbers whose logs are:
a −2.5    b −1.5    c −0.5    d 0
Write your decimal answers correct to two significant figures.
Constructing a histogram with a log scale
3 The brain weights of the same 27 animal species (in g) are recorded below.
465     423     120     115     5.50    50.0    4600    419     655
115     25.6    680     406     1320    5712    70.0    179     56.0
1.00    0.40    12.1    175     157     440     155     3.00    180
a Construct a histogram to display the distribution of brain weights and comment on
its shape.
b Construct a histogram to display the log of the brain weights and note the shape of
the distribution.
Interpreting a histogram with a log scale
4 The histogram opposite shows the distribution of brain weights (in g) of 27 animal species
plotted on a log scale.
[Figure: histogram; horizontal axis log weight (−2 to 6), vertical axis Frequency (0 to 9)]
a The brain weight (in g) of a mouse is 0.4 g. What value would be plotted on the log scale?
b The brain weight (in g) of an African elephant is 5712 g. What is the log of this brain
weight (to two significant figures)?
c What brain weight (in g) is represented by the number 2 on the log scale?
d What brain weight (in g) is represented by the number –1 on the log scale?
e Use the histogram to determine the number of these animals with brain weights:
i over 1000 g
ii between 1 and 100 g
iii over 1 g.
Chapter 1 review
Key ideas and chapter summary
Univariate data
Univariate data are generated when each observation involves recording information about a
single variable, for example a dataset containing the heights of the children in a preschool.
Types of variables
Variables can be classified as numerical or categorical.
Categorical variables
Categorical variables are used to represent characteristics of individuals. Categorical
variables come in two types: nominal and ordinal. Nominal variables generate data values that
can only be used by name, e.g. eye colour. Ordinal variables generate data values that can be
used to both name and order, e.g. house number.
Numerical variables
Numerical variables are used to represent quantities. Numerical variables come in two types:
discrete and continuous. Discrete variables represent quantities that are counted – e.g. the
number of cars in a car park. Continuous variables represent quantities that are measured
rather than counted – for example, weights in kg.
Frequency table
A frequency table lists the values a variable takes, along with how often (frequently) each
value occurs. Frequency can be recorded as:
the number of times a value occurs – e.g. the number of females in the dataset is 32
the percentage of times a value occurs – e.g. the percentage of females in the dataset
is 45.5%.
Bar chart
Bar charts are used to display the frequency distribution of categorical data.
Describing distributions of categorical variables
For a small number of categories, the distribution of a categorical variable is described in
terms of the dominant category (if any), the order of occurrence of each category, and its
relative importance.
Histogram
A histogram is used to display the frequency distribution of a numerical variable. It is
suitable for medium- to large-sized datasets.
Mode, modal category
The mode (or modal interval) is the value of a variable (or the interval of values) that
occurs most frequently.
Describing the distribution of a numerical variable
The distribution of a numerical variable can be described in terms of:
shape: symmetric or skewed (positive or negative)
outliers: values that appear to stand out
centre: the midpoint of the distribution (median)
spread: one measure is the range of values covered
(range = largest value – smallest value).
Log scales
Log scales can be used to transform a skewed histogram to symmetry.
Skills check
Having completed this chapter, you should be able to:
differentiate between categorical data and numerical data
differentiate between nominal and ordinal categorical data
differentiate between discrete and continuous numerical data
interpret the information contained in a frequency table
identify and interpret the mode
construct a bar chart, segmented bar chart or histogram from a frequency table
read and interpret a histogram with a log scale.
Multiple-choice questions
The following information relates to Questions 1 and 2.
A survey collected information about the number of cars owned by a family and the car size
(small, medium, large).
1
The variables number of cars owned and car size (small, medium, large) are:
A both categorical variables
B both numerical variables
C a categorical and a numerical variable respectively
D a numerical and a categorical variable respectively
E a nominal and a discrete variable respectively
2
The variables head diameter (in cm) and sex (male, female) are:
A both categorical variables
B both numerical variables
C an ordinal and a nominal variable respectively
D a discrete and a nominal variable respectively
E a continuous and a nominal variable respectively
The following information relates to Questions 3 and 4.
The percentage segmented bar chart shows the distribution of hair colour for 200 students.
[Percentage segmented bar chart: Percentage (0 to 100); segments for hair colour: Other, Red, Black, Brown, Blonde]
3
The number of students with brown hair is closest to:
A 4
B 34
C 57
D 72
E 114
4
The most common hair colour is:
A black
B blonde
C brown
D red
Questions 5 to 8 relate to the two-way frequency table below.
A group of 189 healthy middle-aged adults were asked whether or not they were currently on a diet. Their responses by sex are summarised in the two-way frequency table below.
Diet     Sex
         Male   Female   Total
Yes      31     45       76
No       47     66       113
Total    78     111      189
5
The total number of females in the group is:
A 76
B 78
C 111
D 113
E 189
6
The number of males who said they were on a diet is:
A 31
B 45
C 47
D 66
E 78
7
The percentage of females not on a diet is closest to:
A 39.7%
B 41.5%
C 59.5%
D 60.3%
E 66.0%
8
The percentage of people on a diet who were male is:
A 39.7%
B 40.8%
C 41.5%
D 58.4%
E 76.0%
Questions 9 to 13 relate to the histogram shown below.
The histogram opposite displays the test scores of a class of students.
[Histogram: Frequency (0 to 6) against Test score (6 to 28, in intervals of width 2)]
9
The number of students is:
A 6
B 18
C 20
D 21
E 22
10
The number of students in the class who obtained a test score less than 14 is:
A 4
B 10
C 14
D 16
E 28
11
The histogram is best described as:
A negatively skewed
B negatively skewed with an outlier
C approximately symmetric
D approximately symmetric with an outlier
E positively skewed
12
The centre of the distribution lies in the interval:
A 8–10
B 10–12
C 12–14
D 14–16
E 18–20
13
The spread of the students’ marks is closest to:
A 8
B 10
C 12
D 20
E 22
14
log₁₀ 100 equals:
A 0
B 1
C 2
D 3
E 100
15
Find the number whose log is 2.314; give the answer to the nearest whole number.
A 2
B 21
C 206
D 231
E 20606
The following information relates to Questions 16 and 17.
The percentage histogram opposite displays the distribution of the log of the annual per capita CO2 emissions (in tonnes) for 192 countries in 2011.
[Percentage histogram: Percentage (0% to 32%) against log CO2 (−1.0 to 2.0, in intervals of width 0.5)]
16
Australia’s per capita CO2 emissions in 2011 were 16.8 tonnes. In which column of the histogram would Australia be located?
A −0.5 to <0
B 0 to <0.5
C 0.5 to <1
D 1 to <1.5
E 1.5 to <2
17
The percentage of countries with per capita CO2 emissions of under 10 tonnes is closest to:
A 14%
B 17%
C 31%
D 69%
E 88%
Extended-response questions
1
One hundred and twenty-one students were asked to identify their preferred leisure activity. The results of the survey are displayed in a bar chart.
[Bar chart: Percentage (0 to 30) against Preferred leisure activity (Sport, TV, Music, Movies, Reading, Other)]
a What percentage of students nominated watching TV as their preferred leisure activity?
b What percentage of students in total nominated either going to the movies or reading as their preferred leisure activity?
c What is the most popular leisure activity for these students? How many rated this activity as their preferred activity?
2
A group of 52 teenagers were asked, ‘Do you agree that the use of marijuana should be legalised?’ Their responses are summarised in the table.
Legalise      Frequency
              Number   Percentage
Agree         18
Disagree      26
Don’t know    8
Total         52
a Construct a properly labelled and scaled frequency bar chart for the data.
b Complete the table by calculating the percentages, to one decimal place.
c Use the percentages to construct a percentage segmented bar chart for the data.
d Use the frequency table to help you complete the following report.
Report
In response to the question, ‘Do you agree that the use of marijuana should be legalised?’, 50% of the 52 students ______. Of the remaining students, ______% agreed, while ______% said that they ______.
3
Students were asked how much they spent on entertainment each month. The results are displayed in the histogram. Use the histogram to answer the following questions.
[Histogram: Frequency (0 to 10) against Amount ($) (90 to 140)]
a How many students:
i were surveyed?
ii spent $100–105 per month?
b What is the mode?
c How many students spent $110 or more per month?
d What percentage spent less than $100 per month?
e
i Name the shape of the distribution displayed by the histogram.
ii Locate the interval containing the centre of the distribution.
iii Determine the spread of the distribution using the range.
4
The distribution of the waiting times of 34 cars stopped by a traffic light is shown in the histogram. Use the histogram to write a report on the distribution of waiting times in terms of shape, centre, spread and outliers.
[Histogram: Frequency (0 to 10) against Waiting time (seconds) (5 to 55)]