Brighter Thinking
A Level Further
Mathematics for AQA
Statistics Student Book (AS/A Level)
Stephen Ward, Paul Fannon, Vesna Kadelburg and Ben Woolley
Contents
Introduction
How to use this resource
1 Discrete random variables
1: Average and spread of a discrete random variable
2: Expectation and variance of transformations of discrete random variables
3: The discrete uniform distribution
2 Poisson distribution
1: Using the Poisson model
2: Using the Poisson distribution in hypothesis tests
3 Chi-squared tests
1: Contingency tables
2: Yates’ correction
4 Continuous distributions
1: Continuous random variables
2: Expectation and variance of continuous random variables
3: Expectation and variance of functions of a random variable
4: Sums of independent random variables
5: Linear combinations of normal variables
6: Cumulative distribution functions
7: Piecewise-defined probability density functions
8: Rectangular distribution
9: Exponential distribution
10: Combining discrete and continuous random variables
Focus on … Proof 1
Focus on … Problem solving 1
Focus on … Modelling 1
5 Further hypothesis testing
1: t-tests
2: Errors in hypothesis testing
6 Confidence intervals
1: Confidence intervals
2: Confidence intervals for the mean when the population variance is unknown
Focus on … Proof 2
Focus on … Problem solving 2
Focus on … Modelling 2
Cross-topic review exercise
AS Level Practice paper
A Level Practice paper
Formulae
Answers
Worked solutions for chapter exercises
1 Discrete random variables
2 Poisson distribution
3 Chi-squared tests
4 Continuous distributions
5 Further hypothesis testing
6 Confidence intervals
Worked solutions for cross-topic review exercises
Cross-topic review exercises
Acknowledgements
Introduction
You have probably been told that mathematics is very useful, yet it can often seem like a lot of techniques
that just have to be learnt to answer examination questions. You are now getting to the point where you
will start to see where some of these techniques can be applied in solving real problems. However, as well
as seeing how maths can be useful, we hope that anyone working through this book will realise that it can
also be incredibly frustrating, surprising and ultimately beautiful.
The book is woven around three key themes from the new curriculum.
Proof
Maths is valued because it trains you to think logically and communicate precisely. At a high level, maths is
far less concerned about answers and more about the clear communication of ideas. It is not about being
neat – although that might help! It is about creating a coherent argument that other people can easily
follow but find difficult to refute. Have you ever tried looking at your own work? If you cannot follow it
yourself it is unlikely anybody else will be able to understand it. In maths we communicate using a variety
of means – feel free to use combinations of diagrams, words and algebra to aid your argument. And once
you have attempted a proof, try presenting it to your peers. Look critically (but positively) at some other
people’s attempts. It is only through having your own attempts evaluated and trying to find flaws in other
proofs that you will develop sophisticated mathematical thinking. This is why we have included lots of
common errors in our Work it out boxes – just in case your friends don’t make any mistakes!
Problem solving
Maths is valued because it trains you to look at situations in unusual, creative ways, to persevere and to
evaluate solutions along the way. We have been heavily influenced by the great mathematician and maths
educator George Pólya, who believed that students were not just born with problem-solving skills – they
developed them by seeing problems being solved and reflecting on their solutions before trying similar
problems. You may not realise it but good mathematicians spend most of their time being stuck. You need
to spend some time on problems you can’t do, trying out different possibilities. If after a while you have not
cracked it, then look at the solution and try a similar problem. Don’t be disheartened if you cannot get it
immediately – in fact, the longer you spend puzzling over a problem the more you will learn from the
solution. You may never need to integrate a rational function in the future, but we firmly believe that the
problem-solving skills you will develop by trying it can be applied to many other situations.
Modelling
Maths is valued because it helps us solve real-world problems. However, maths describes ideal situations
and the real world is messy! Modelling is about deciding on the important features needed to describe the
essence of a situation and turning them into a mathematical form, then using that to make predictions,
compare to reality and possibly improve the model. In many situations the technical maths is actually the
easy part – especially with modern technology. Deciding which features of reality to include or ignore and
anticipating the consequences of these decisions is the hard part. Yet it is amazing how some fairly drastic
assumptions – such as pretending a car is a single point or that people’s votes are independent – can result
in models that are surprisingly accurate.
More than anything else, this book is about making links – links between the different chapters, the topics
covered and the themes above, links to other subjects and links to the real world. We hope that you will
grow to see maths as one great complex but beautiful web of interlinking ideas.
Maths is about so much more than examinations, but we hope that if you take on board these ideas (and
do plenty of practice!) you will find maths examinations a much more approachable and possibly even
enjoyable experience. However, always remember that the results of what you write down in a few hours
by yourself in silence under exam conditions are not the only measure you should consider when judging
your mathematical ability – it is only one variable in a much more complicated mathematical model!
How to use this resource
Throughout this resource you will notice particular features that are designed to aid your learning. This
section provides a brief overview of these features.
In this chapter you will learn how to:
predict the mean, mode, median and variance of a discrete random variable
understand how a linear transformation of a variable changes the mean and variance
prove and use the formulae for expectation and variance of a special distribution called the uniform
distribution
recognise when it is appropriate to use a uniform distribution.
If you are following the A Level course, you will also learn how to:
calculate the mean of a discrete random variable after a non-linear transformation.
Learning objectives
A short summary of the content that you will learn in each chapter.
WORKED EXAMPLE
The left-hand side shows you how to set out your working. The right-hand side explains the more
difficult steps and helps you understand why a particular method was chosen.
PROOF
Step-by-step walkthroughs of standard proofs and methods of proof.
WORK IT OUT
Can you identify the correct solution and find the mistakes in the two incorrect solutions?
A Level
Mathematics
Student Book 1,
Chapter 21
You should know how to use the rules of
probability.
1 Two events A and B are
independent. If P(A)=0.4
and P(B)=0.3, find
P(A AND B).
A Level
Mathematics
Student Book 1,
Chapter 21
You should know how to find probabilities of
discrete random variables.
2 P(X=x)=kx for x=1,2, 3.
Find the value of k.
A Level
Mathematics
Student Book 1,
Chapter 20
You should know how to find the mean, variance
and standard deviation of data, including
familiarity with formulae involving sigma notation.
3 Find the variance of 2, 5
and 8.
Before you start
Points you should know from your previous learning and questions to check that you're ready to start the
chapter.
Key point
A summary of the most important methods, facts and formulae.
Common error
Specific mistakes that are often made. These typically appear next to the point in the Worked
example where the error could occur.
Tip
Useful guidance, including on ways of calculating or checking and use of technology.
Each chapter ends with a Checklist of learning and understanding and a Mixed practice exercise, which
includes past paper questions marked with an icon.
In between chapters, you will find extra sections that bring together topics in a more synoptic way.
FOCUS ON…
Unique sections relating to the preceding chapters that develop your skills in proof, problem-solving and
modelling.
CROSS-TOPIC REVIEW EXERCISE
Questions covering topics from across the preceding chapters, testing your ability to apply what you
have learned.
Key terms are picked out in colour within chapters. You can hover over these terms to view their definitions,
or find them in the Glossary tab. Towards the end of the resource you will find practice paper questions,
short answers to all questions and worked solutions.
Rewind
Reminders of where to find useful information from earlier in your study.
Fast forward
Links to topics that you may cover in greater detail later in your study.
Focus on…
Links to problem-solving, modelling or proof exercises that relate to the topic currently being
studied.
Did you know?
Interesting or historical information and links with other subjects to improve your awareness
about how mathematics contributes to society.
Colour coding of exercises
The questions in the exercises are designed to provide careful progression, ranging from basic fluency to
practice questions. They are uniquely colour-coded, as shown below.
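1 A sequence is defined by $u_n = 2 \times 3^{n-1}$. Use the principle of mathematical induction to prove that
$u_1 + u_2 + \dots + u_n = 3^n - 1$.
2 Show that $1^2 + 2^2 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6}$
3 Show that $1^3 + 2^3 + \dots + n^3 = \frac{n^2(n+1)^2}{4}$
4 Prove by induction that $\frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \dots + \frac{1}{n(n+1)} = \frac{n}{n+1}$
5 Prove by induction that $\frac{1}{1 \times 3} + \frac{1}{3 \times 5} + \frac{1}{5 \times 7} + \dots + \frac{1}{(2n-1) \times (2n+1)} = \frac{n}{2n+1}$
6 Prove that $1 \times 1! + 2 \times 2! + 3 \times 3! + \dots + n \times n! = (n+1)! - 1$
7 Use the principle of mathematical induction to show that
$1^2 - 2^2 + 3^2 - 4^2 + \dots + (-1)^{n-1} n^2 = (-1)^{n-1} \frac{n(n+1)}{2}$.
8 Prove that $(n+1) + (n+2) + (n+3) + \dots + (2n) = \frac{1}{2} n(3n+1)$
9 Prove using induction that $\sin\theta + \sin 3\theta + \dots + \sin(2n-1)\theta = \frac{\sin^2 n\theta}{\sin\theta}$, $n \in \mathbb{Z}^+$
10 Prove that $\sum_{k=1}^{n} k \, 2^k = (n-1) 2^{n+1} + 2$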
Black – practice questions which come in several parts, each with subparts i and ii. You only need
attempt subpart i at first; subpart ii is essentially the same question, which you can use for further
practice if you got part i wrong, for homework, or when you revisit the exercise during revision.
Yellow – designed to encourage reflection and discussion
Green – practice questions at a basic level
Blue – practice questions at an intermediate level
Red – practice questions at an advanced level
Purple – challenging questions that apply the concept of the current chapter across other areas of
maths.
Icons are also used to indicate content that is for A Level students only, questions that require a
calculator and non-calculator questions.
1 Discrete random variables
In this chapter you will learn how to:
predict the mean, mode, median and variance of a discrete random variable
understand how a linear transformation of a variable changes the mean and
variance
prove and use the formulae for expectation and variance of a special distribution
called the uniform distribution
recognise when it is appropriate to use a uniform distribution.
If you are following the A Level course, you will also learn how to:
calculate the mean of a discrete random variable after a non-linear transformation.
Before you start…
A Level Mathematics Student Book 1, Chapter 21
You should know how to use the rules of probability.
1 Two events $A$ and $B$ are independent. If $P(A) = 0.4$ and $P(B) = 0.3$, find $P(A \text{ AND } B)$.
A Level Mathematics Student Book 1, Chapter 21
You should know how to find probabilities of discrete random variables.
2 $P(X = x) = kx$ for $x = 1, 2, 3$. Find the value of $k$.
A Level Mathematics Student Book 1, Chapter 20
You should know how to find the mean, variance and standard deviation of data, including
familiarity with formulae involving sigma notation.
3 Find the variance of $2$, $5$ and $8$.
A Level Further Mathematics Student Book 1, Chapter 11
You should know how to calculate sums of powers of integers.
4 Find and simplify an expression for …
What are discrete random variables?
A random variable is a variable that can change every time it is observed – such as the outcome when
you roll a dice. A discrete random variable can only take certain values. In A Level Mathematics Student
Book 1, Chapter 21, you covered the probability distributions of discrete random variables – a table or
rule giving a list of all possible outcomes along with their probabilities.
Tip
Discrete variables don’t have to take integer values. However, the possible distinct values can be
listed, though the list may be infinite. For example, if the variable is the standard UK shoe size of a
random adult member of the public, it takes only a list of separate values and is a discrete random
variable. If the variable is the exact foot length of a random adult member of the public (in cm), it
takes values in an interval and is a continuous random variable.
Many real-life situations follow probability distributions – such as the velocity of a molecule in a waterfall or
the amount of tax paid by an individual. It is extremely difficult to make a prediction about a single
observation, but it turns out that you can predict remarkably accurately the overall behaviour of many
millions of observations. In this chapter you will see how you can predict the mean and variance of a
discrete random variable.
Section 1: Average and spread of a discrete random variable
The most commonly used measure of the average of a random variable is the expectation. It is a value
representing the mean result if the variable were to be measured an infinite number of times.
Tip
The expectation of a random variable does not need to be a value that the variable can actually
take.
Key point 1.1
The expectation of a discrete random variable $X$ is written $E(X)$ and calculated as
$E(X) = \sum_i x_i p_i$
where $x_i$ is each possible value that $X$ can take and $p_i$ is the associated probability.
Tip
The subscript $i$ in the formula in Key point 1.1 is just a counter referring to each possible value
$x_i$ and its associated probability $p_i$.
You do not need to be able to prove this result, but you might find it helpful to see this proof.
PROOF 1
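The mean of $n$ pieces of discrete data is
$\bar{x} = \frac{1}{n} \sum_i f_i x_i$
Start from the definition of the mean, where $f_i$ is the number of times the value $x_i$ occurs.
$\bar{x} = \sum_i \frac{f_i}{n} x_i$
Since $n$ is constant you can take it into the sum.
If $n$ is large, $\frac{f_i}{n}$ will tend towards the probability of $x_i$ happening, therefore
$\bar{x} \to \sum_i x_i p_i$.
When the sample size tends to infinity, the sample mean becomes the true population mean, $E(X)$.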
WORKED EXAMPLE 1.1
The random variable has a probability distribution as shown in the table. Calculate its expectation.
Use the values from the distribution in the
formula in Key point 1.1.
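The calculation described in Key point 1.1 can be sketched in Python. This is only an illustration: the distribution below is a made-up example (it is not the table from Worked example 1.1), and the names `expectation` and `example_dist` are assumptions chosen for this sketch.

```python
from fractions import Fraction

def expectation(dist):
    """Return the sum of x_i * p_i for a dict mapping each value to its probability."""
    if sum(dist.values()) != 1:
        raise ValueError("the probabilities must add up to 1")
    return sum(x * p for x, p in dist.items())

# Hypothetical distribution, used only to illustrate the formula.
example_dist = {
    1: Fraction(1, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
    4: Fraction(2, 10),
}

print(expectation(example_dist))  # 27/10
```

Notice that the result, 27/10, is not one of the values that this variable can take.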
As well as knowing the expected average, you may also be interested in how far away from the average
you can expect an outcome to be. The variance of a random variable is a value representing the
degree of variation that would be seen if the variable were to be repeatedly measured an infinite number
of times. It is a measure of how spread out the variable is.
Fast forward
You will see in Section 2 how to find expectations of other functions of $X$.
Key point 1.2
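The variance of a discrete random variable $X$ is written $\operatorname{Var}(X)$ and calculated as
$\operatorname{Var}(X) = E(X^2) - [E(X)]^2$
where $E(X^2) = \sum_i x_i^2 p_i$.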
Did you know?
Standard deviation – the square root of variance – is a much more meaningful representation of
the spread of a variable. So why is variance used at all? The answer is purely to do with
mathematical elegance. It turns out that the algebra of variance is far neater than the algebra of
standard deviations.
The quantity $E(X^2)$ is the expected value of $X^2$, read as ‘the mean of the squares’. This variance formula is
often read as ‘the mean of the squares minus the square of the mean’.
WORKED EXAMPLE 1.2
Calculate the variance for the probability distribution in Worked example 1.1.
From Worked example 1.1:
Use the values from the distribution in the formula in Key point 1.2.
Tip
Many calculators can simplify this process. You normally have to treat the values of the
random variable as data and the probabilities as the frequency.
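If you want to check a calculator result, the same process can be sketched in Python. The distribution is again a made-up example rather than the one from Worked example 1.1, and the function name `mean_and_variance` is an assumption for this sketch. Exact fractions are used so that no rounding happens before the final step.

```python
from fractions import Fraction
from math import sqrt

def mean_and_variance(dist):
    """Return (mean, variance) for a dict mapping each value to its probability."""
    mean = sum(x * p for x, p in dist.items())                  # the mean
    mean_of_squares = sum(x * x * p for x, p in dist.items())   # the mean of the squares
    return mean, mean_of_squares - mean ** 2

# Hypothetical distribution, used only to illustrate the formula.
example_dist = {
    1: Fraction(1, 10),
    2: Fraction(3, 10),
    3: Fraction(4, 10),
    4: Fraction(2, 10),
}

mean, variance = mean_and_variance(example_dist)
print(mean)      # 27/10
print(variance)  # 81/100
standard_deviation = sqrt(variance)  # square root of the variance, as a float
```

Here the mean of the squares is 81/10 and the square of the mean is 729/100, so the variance is 81/100.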
Two other less commonly used measures of average are the mode and the median. For data, the mode is
the most common result and this extends to variables.
Key point 1.3
The mode of a discrete random variable $X$ is the value of $x$ associated with the largest
probability.
For data, the median is the value that has half the data values below it and half above it. You can interpret
this in terms of probabilities.
Key point 1.4
The median, $M$, of a discrete random variable $X$ is any value that has
$P(X \le M) \ge \frac{1}{2}$ and $P(X \ge M) \ge \frac{1}{2}$.
If there are two possible values, you have to find their mean.
When there are two possible values and you have to take their mean, the median will take a value different
from any observed value of the random variable.
WORKED EXAMPLE 1.3
For the distribution in Worked example 1.1 find:
a the mode
b the median.
a The largest probability is
modes:
so there are two
and .
b
You can create a table of $P(X \le x)$.
Look for the first value of $x$ for which $P(X \le x)$ is greater than or equal to $0.5$.
You could also check that $P(X \ge x) \ge 0.5$ for this value, but this is not necessary here.
So the median is .
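The rules in Key points 1.3 and 1.4 can be turned into a short Python sketch. Both distributions below are hypothetical examples (neither is the table from Worked example 1.1), and the function names `modes` and `median` are assumptions. The sketch assumes every listed value has a probability greater than zero.

```python
from fractions import Fraction

def modes(dist):
    """Return every value that has the largest probability."""
    largest = max(dist.values())
    return sorted(x for x, p in dist.items() if p == largest)

def median(dist):
    """Return the median using the two conditions in Key point 1.4."""
    half = Fraction(1, 2)
    candidates = []
    for m in sorted(dist):
        at_most = sum(p for x, p in dist.items() if x <= m)    # P(X <= m)
        at_least = sum(p for x, p in dist.items() if x >= m)   # P(X >= m)
        if at_most >= half and at_least >= half:
            candidates.append(m)
    # One candidate is the median itself; two candidates are averaged.
    return Fraction(sum(candidates), len(candidates))

# Hypothetical distributions, used only for illustration.
skewed = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(4, 10), 4: Fraction(2, 10)}
flat = {x: Fraction(1, 4) for x in (1, 2, 3, 4)}

print(modes(skewed), median(skewed))  # [3] 3
print(modes(flat), median(flat))      # [1, 2, 3, 4] 5/2
```

In the second distribution both 2 and 3 satisfy the two conditions, so the median is their mean, 5/2, which is not a value the variable can take.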
A probability distribution can also be described by a function.
WORKED EXAMPLE 1.4
is a random variable that can take values
and
where
a Find the value of .
b Find the expected mean of
.
c Find the standard deviation of
.
Use the fact that the total of all the probabilities must be $1$.
a
b
Use Key point 1.1.
c
To find the standard deviation you first need to find the variance, which means you need to find
the mean of the squares and use Key point 1.2.
So
Although you only write down three significant figures in the working, make sure you use the full
accuracy from your calculator to find the final answer.
WORK IT OUT 1.1
Find the variance of the random variable defined by this distribution.
Which is the correct solution? Identify the errors made in the incorrect solutions.
A
B
C
EXERCISE 1A
1
Calculate the expectation, mode, median, variance and standard deviation of each of these discrete
random variables.
a
i
ii
b
i
ii
c
i
ii
d
ii
2
,
i
,
A discrete random variable
is given by
for
.
a Show that
b Find
3
4
.
.
A discrete random variable
a Find the values of
and .
b Find the median of
.
A discrete random variable
has the probability distribution shown and
has its probability given by
, where
a Show that
.
.
b Find the exact value of
5
.
.
The probability distribution of a discrete random variable
,
is defined by
.
a Find the value of .
b Find
.
c Find the standard deviation of
6
.
A fair six-sided dice, with sides numbered
is thrown. Find the mean and variance of the
score.
7
The table shows the probability distribution of a discrete random variable
a Given that
, find the values of
b Find the standard deviation of
8
.
and of .
.
A biased dice with four faces is used in a game. A player pays counters to roll the dice. The table
shows the possible scores on the dice, the probability of each score and the number of counters the
player receives in return for each score.
Score
Probability
Number of counters player
receives
Find the value of
9
in order for the player to get an expected profit of
Two fair dice labelled with the values
to
counters per roll.
are thrown. The random variable
between the larger and the smaller score, or zero if they are the same.
a Copy and complete this table to show the probability distribution of
b Find
c Find
.
.
d Find the median of
e Find
.
.
.
is the difference
10 a In a game a player pays an entrance fee of £ . He then selects one number from
or and
rolls three fair four-sided dice, numbered to . If his chosen number appears on all three dice he
wins four times the entrance fee. If his number appears on exactly two of the dice he wins three
times the entrance fee. If his number appears on exactly one dice he wins £ . If his number does
not appear on any of the dice he wins nothing.
Copy and complete the probability table.
Profit £
Probability
b The game organiser wants to make a profit over many plays of the game. Given that he must
charge a whole number of pence, what is the minimum amount the organiser must charge?
11 Viewers are asked to rate a new film on a three-point scale. Their marks are modelled by the random
variable as shown.
a The mean, median and mode of
are all equal. Find the variance of .
b Two independent viewers of the film are both asked their opinion.
i What is the probability that their total score is more than ?
ii Show that the expectation of their total score is .
12 The number of books borrowed by each person who visits a library is modelled by the random variable
.
a Find the mean of
.
b Show that the expectation of
is larger than the median of
c Show that the standard deviation of
d
.
is less than the median of
.
people visited the library during an audit period. The numbers of books they borrowed are
independent of each other. Find:
i the probability that exactly three people borrow no books
ii the expected number of people who borrow no books.
Section 2: Expectation and variance of transformations of discrete
random variables
Linear transformations
You might have noticed a link between parts a and b of question 1 in Exercise 1A. The distributions were
very similar but in part b all the values were multiplied by a constant. All the averages and the standard
deviations were also multiplied by that constant but the variances were multiplied by its square. This is an
example of a transformation.
The most common type of transformation is a linear transformation. This is where the new variable
is found from the old variable by multiplying by a constant and/or adding on a constant. You might do
this, for example, if you change the units of measurement. This kind of change is also known as ‘linear
coding’.
If you know the original mean and variance and how the data were transformed, you can use a shortcut to
find the mean and variance of the new data.
Key point 1.5
If $X$ is a random variable and $Y$ is a new random variable such that $Y = aX + b$, then:
$E(Y) = aE(X) + b$
$\operatorname{Var}(Y) = a^2 \operatorname{Var}(X)$
Fast forward
You will prove Key point 1.5 after you have developed a little more theory.
This means that the standard deviation of $Y$ is $|a|$ times the standard deviation of $X$. This makes sense
as multiplying the data by $a$ does change how spread out they are, but adding on $b$ does not change the
spread.
WORKED EXAMPLE 1.5
A random variable
. Find:
has expectation
and variance
.
is a transformation of
given by
a the expectation of
b the standard deviation of
.
a
This is just a direct application of Key point 1.5.
b
To find the standard deviation you first need to find the variance of the transformed variable
using Key point 1.5.
Common error
It is easy to get confused with the minus sign in the transformations in Worked example 1.5.
Remember that both variances and standard deviations are always positive.
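The effect of a linear transformation, including one with a negative multiplier, can be checked with a short Python sketch. The distribution and the constants `a = -2` and `b = 5` are hypothetical values chosen for illustration (they are not the values from Worked example 1.5), and the function names are assumptions. The sketch builds the distribution of the transformed variable directly and compares it with the shortcut.

```python
from fractions import Fraction

def mean_and_variance(dist):
    """Return (mean, variance) for a dict mapping each value to its probability."""
    mean = sum(x * p for x, p in dist.items())
    mean_of_squares = sum(x * x * p for x, p in dist.items())
    return mean, mean_of_squares - mean ** 2

def linear_transform(dist, a, b):
    """Distribution of a*X + b; a must be non-zero so the new values stay distinct."""
    return {a * x + b: p for x, p in dist.items()}

# Hypothetical distribution and constants, used only for illustration.
dist_x = {1: Fraction(1, 10), 2: Fraction(3, 10), 3: Fraction(4, 10), 4: Fraction(2, 10)}
a, b = -2, 5

mean_x, var_x = mean_and_variance(dist_x)
mean_y, var_y = mean_and_variance(linear_transform(dist_x, a, b))

print(mean_y, a * mean_x + b)   # -2/5 -2/5
print(var_y, a ** 2 * var_x)    # 81/25 81/25
```

Although the multiplier is negative, the variance is multiplied by its square, so it stays positive.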
Non-linear transformations
You can also apply non-linear transformations to a random variable. When you do this
there is no shortcut to finding the mean and variance of the transformed variable. You need to adapt
Key point 1.1.
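Consider the discrete random variable that is the outcome on a fair six-sided dice. If you apply a function to
each outcome, you can construct the probability distribution of the transformed variable: the probability of
the transformed variable taking a particular transformed value is just the same as the probability of the
original variable taking the corresponding original value (provided no two outcomes give the same
transformed value).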
Key point 1.6
If $X$ is a discrete random variable and $g$ is a function applied to $X$, then $g(X)$ is a discrete
random variable with expectation
$E(g(X)) = \sum_i g(x_i) p_i$
WORKED EXAMPLE 1.6
The discrete random variable
If
has the distribution shown in the table.
°, find:
a
b
.
a
Apply Key point 1.6
b
To find
you need
which is
.
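Key point 1.6 can be illustrated with a fair six-sided dice. In this Python sketch the functions applied to the score (squaring and taking the reciprocal) are examples chosen for illustration, not necessarily the ones used in the text, and the name `expectation_of` is an assumption.

```python
from fractions import Fraction

def expectation_of(g, dist):
    """Return the sum of g(x_i) * p_i for a dict mapping each value to its probability."""
    return sum(g(x) * p for x, p in dist.items())

# Score on a fair six-sided dice: each of 1 to 6 has probability 1/6.
fair_dice = {x: Fraction(1, 6) for x in range(1, 7)}

mean = expectation_of(lambda x: x, fair_dice)
mean_of_squares = expectation_of(lambda x: x * x, fair_dice)
mean_of_reciprocals = expectation_of(lambda x: Fraction(1, x), fair_dice)

print(mean)                 # 7/2
print(mean_of_squares)      # 91/6
print(mean_of_reciprocals)  # 49/120
```

The mean of the squares, 91/6, is not the square of the mean, 49/4. This is why there is no shortcut for non-linear transformations.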
You can use Key point 1.6 to prove Key point 1.5.
PROOF 2
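Let $Y = aX + b$, and write $p_i$ for the probability that $X$ takes the value $x_i$. Then:
$E(Y) = \sum_i (a x_i + b) p_i$
Apply Key point 1.6 to the function $g(x) = ax + b$.
$= a \sum_i x_i p_i + b \sum_i p_i$
You can separate out a sum into its different terms, taking out constant factors.
$= aE(X) + b$
Use the fact that $\sum_i p_i = 1$ for any probability distribution and the definition of expectation from
Key point 1.1. You have now established the first part of Key point 1.5.
Considering $E(Y^2)$ to get to the variance:
$E(Y^2) = \sum_i (a x_i + b)^2 p_i = \sum_i (a^2 x_i^2 + 2ab x_i + b^2) p_i$
Apply Key point 1.6 to the function $g(x) = (ax + b)^2$ and expand the brackets.
$= a^2 \sum_i x_i^2 p_i + 2ab \sum_i x_i p_i + b^2 \sum_i p_i$
You can separate out a sum into its different terms, taking out constant factors.
$= a^2 E(X^2) + 2ab E(X) + b^2$
Use the fact that $\sum_i p_i = 1$ for any probability distribution and the definitions of $E(X)$ and $E(X^2)$.
Using the definition of variance from Key point 1.2:
$\operatorname{Var}(Y) = E(Y^2) - [E(Y)]^2 = a^2 E(X^2) + 2ab E(X) + b^2 - (aE(X) + b)^2 = a^2 E(X^2) - a^2 [E(X)]^2$
Expand the brackets and lots of terms cancel!
$= a^2 (E(X^2) - [E(X)]^2) = a^2 \operatorname{Var}(X)$
Taking out a factor of $a^2$ leaves the expression for $\operatorname{Var}(X)$ from Key point 1.2. This completes the proof.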
EXERCISE 1B
1
and
a
. Find
and
if:
i
ii
b
i
ii
c
i
ii
d
i
ii
e
i
ii
2
.
The discrete random variable
Find
a
i
ii
b
i
ii
c
i
ii
d
i
and
if:
follows this distribution:
.
ii
3
Stephen goes on a
mile bike ride every weekend. The distance until he stops for a picnic is
modelled by , where
and
.
is the distance remaining after his picnic. Find
4
The rule for converting between degrees Celsius
and
.
and degrees Fahrenheit
When a bread oven is operating it has expected temperature
.
is:
with standard deviation
Find the expected temperature and standard deviation in degrees Fahrenheit.
5
The random variable
has expectation
and variance . If
, find the values of
and so that the expectation of is zero and the standard deviation is .
6
is a discrete random variable where
and
such that
. Find
and the standard deviation of
7
is a discrete random variable satisfying
for
.
is a transformation of
.
.
Find:
a the value of
b
c
d
e
8
.
The discrete random variable
has a distribution given by
for
.
a Find, in terms of
and ,
b Hence find, in terms of
9
.
and ,
.
A discrete random variable
has equal expectation and standard deviation. is a
transformation of
such that
. Prove that it is only possible for the expectation of
to equal the variance of
if
10 The St Petersburg Paradox describes a game where a fair coin is tossed repeatedly until a head
is found. You win
pounds if the first head occurs on the
toss. How much should you pay to
play this game?
Section 3: The discrete uniform distribution
You have already met some special distributions that occur so often that they are named. For example, the
binomial and the normal distributions. Another very common distribution is the discrete uniform
distribution.
This is a distribution in which all the whole numbers from $1$ to $n$ are equally likely and it is given the symbol
$U(n)$. For example, $U(6)$ gives the distribution of the outcomes on a fair six-sided dice.
Key point 1.7
If a random variable $X$ follows a discrete uniform distribution $U(n)$, then $P(X = x) = \frac{1}{n}$ for
$x = 1, 2, \dots, n$.
If you identify a random variable as following a uniform distribution you can immediately write down the
expectation and variance.
Key point 1.8
If a random variable $X$ follows a discrete uniform distribution $U(n)$, then $E(X) = \frac{n+1}{2}$ and
$\operatorname{Var}(X) = \frac{n^2 - 1}{12}$.
Rewind
You met the rules for working with indices in A Level Mathematics Student Book 1, Chapter 2.
You can prove the result in Key point 1.8 by using your knowledge of sums of powers of integers.
PROOF 3
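If $X \sim U(n)$ then $P(X = i) = \frac{1}{n}$ for $i = 1, 2, \dots, n$.
$i$ denotes the possible values of $X$, which are $1, 2, \dots, n$.
$E(X) = \sum_{i=1}^{n} i \times \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} i$
$\frac{1}{n}$ is a constant so you can take it out of the sum.
$= \frac{1}{n} \times \frac{n(n+1)}{2} = \frac{n+1}{2}$
Use the result for the sum of the first $n$ positive integers: $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$.
$E(X^2) = \sum_{i=1}^{n} i^2 \times \frac{1}{n} = \frac{1}{n} \sum_{i=1}^{n} i^2$
All the values of $X$ need to be squared.
$= \frac{1}{n} \times \frac{n(n+1)(2n+1)}{6} = \frac{(n+1)(2n+1)}{6}$
Use the result for $\sum_{i=1}^{n} i^2$.
$\operatorname{Var}(X) = E(X^2) - [E(X)]^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{(n+1)(n-1)}{12} = \frac{n^2 - 1}{12}$
Use the formula for variance.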
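The formulae for the expectation and variance of the discrete uniform distribution on the whole numbers from 1 to n, namely (n + 1)/2 and (n² − 1)/12, can also be checked numerically. The following Python sketch computes both quantities directly from the probabilities and compares them with the formulae for a range of values of n. The function name `uniform_mean_and_variance` is an assumption, and a numerical check like this supports the result but does not replace the proof.

```python
from fractions import Fraction

def uniform_mean_and_variance(n):
    """Mean and variance when 1, 2, ..., n are equally likely, computed from first principles."""
    p = Fraction(1, n)
    mean = sum(x * p for x in range(1, n + 1))
    mean_of_squares = sum(x * x * p for x in range(1, n + 1))
    return mean, mean_of_squares - mean ** 2

# Compare with (n + 1)/2 and (n^2 - 1)/12 for n = 1 to 50.
for n in range(1, 51):
    mean, variance = uniform_mean_and_variance(n)
    assert mean == Fraction(n + 1, 2)
    assert variance == Fraction(n * n - 1, 12)

print(uniform_mean_and_variance(6))  # fair six-sided dice: mean 7/2, variance 35/12
```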
In Section 2 you saw how to find the expectation and variance of a linear transformation of a discrete
random variable. You can find the expectation and variance of a linear transformation of a discrete uniform
distribution in the same way.
WORKED EXAMPLE 1.7
The discrete random variable
the variance of
is equally likely to take any even value from
where
The values of
where
So
are
.
is a linear transformation of
Apply Key point 1.8.
Apply Key point 1.5.
to
inclusive. Find
.
. You can write these as
.
,
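The method used in this worked example, writing a variable with equally spaced values as a linear transformation of a discrete uniform variable, can be sketched in Python. The values here are hypothetical (the even numbers from 2 to 20, so the variable is twice a uniform variable on 1 to 10) and are not necessarily those in the worked example. The function name is also an assumption.

```python
from fractions import Fraction

def uniform_mean_and_variance(n):
    """Key point 1.8: mean and variance when 1, 2, ..., n are equally likely."""
    return Fraction(n + 1, 2), Fraction(n * n - 1, 12)

# Hypothetical example: the even values 2, 4, ..., 20 are twice the values 1, 2, ..., 10.
n = 10
mean_y, var_y = uniform_mean_and_variance(n)
mean_x = 2 * mean_y     # Key point 1.5 with a = 2, b = 0
var_x = 2 ** 2 * var_y

# Direct check from the ten equally likely even values.
p = Fraction(1, n)
even_values = range(2, 2 * n + 1, 2)
direct_mean = sum(v * p for v in even_values)
direct_var = sum(v * v * p for v in even_values) - direct_mean ** 2

print(mean_x, var_x)            # 11 33
print(direct_mean, direct_var)  # 11 33
```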
EXERCISE 1C
1
Find the mean and variance of these distributions.
a
i
ii
b
i
ii
2
A fair spinner has sides labelled
. Find the expected mean and standard deviation of the
results of the spinner.
3
A fair dice has sides labelled
of throwing the dice.
4
a The random variable
. Find the expectation and standard deviation of the outcome
is equally likely to take any integer value between
can be written as
where
b Hence find the variance of
5
A string of
and . Show that this
.
.
Christmas lights starts with a plug then contains a light every
from the plug.
One light is broken. Assuming all bulbs are equally likely to break, what are the expected mean and
variance of the distance of the broken light from the plug?
6
7
The random variable
is equally likely to take the value of any odd number between
inclusive. Find the variance of
.
The discrete random variable
variance of .
takes values
8
and
9
A random number,
, is chosen from the fractions
Prove that
but
10
and
. Find the expectation and
. Find .
. Prove that
.
is always divisible by
.
Checklist of learning and understanding
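The expectation of a discrete random variable $X$ is written $E(X)$ and calculated as
$E(X) = \sum_i x_i p_i$.
The variance of a discrete random variable $X$ is written $\operatorname{Var}(X)$ and calculated as
$\operatorname{Var}(X) = E(X^2) - [E(X)]^2$ where $E(X^2) = \sum_i x_i^2 p_i$.
The mode of $X$ is the value of $x$ associated with the largest probability.
The median, $M$, is any value which has $P(X \le M) \ge \frac{1}{2}$ and $P(X \ge M) \ge \frac{1}{2}$.
If there are two possible values, you have to find their mean.
If $Y = aX + b$, then:
$E(Y) = aE(X) + b$
$\operatorname{Var}(Y) = a^2 \operatorname{Var}(X)$
A discrete uniform distribution models situations in which all discrete outcomes are equally
likely.
If $X \sim U(n)$, then $P(X = x) = \frac{1}{n}$ for $x = 1, 2, \dots, n$ and $E(X) = \frac{n+1}{2}$ and
$\operatorname{Var}(X) = \frac{n^2 - 1}{12}$.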
Mixed practice 1
1
A discrete random variable has
options.
and
. Find
. Choose from these
A
B
C
D
2
A discrete random variable
has a distribution defined by
for
. Find
. Choose from these options.
A
B
C
D
3
A drawer contains three white socks and five black socks. Two socks are drawn without
replacement.
is the number of black socks drawn.
a Find the probability distribution of
b Find
4
.
.
A fair six-sided dice is thrown once. The random variable
is calculated as half the result if
the dice shows an even number, or one higher than the result if the dice shows an odd
number.
a Write down a table representing the probability distribution of
b Find
c Find
.
.
d Find the mode of
e Find the median of
5
6
a
.
.
.
. Find the expectation and variance of
.
b
is the discrete random variable that is equally likely to take any integer value between
and .
Find
and
.
c
is the discrete random variable that is equally likely to take any even value between
and .
Find
and
.
The random variable
follows this distribution:
a Write down the median of
b If
, find the values of
c Hence find
7
and .
and show that
.
is a discrete random variable with
standard deviation of
8
.
and
.
. Find
The random variable
has expectation
and variance
. If
and so that the expectation of is
and the standard deviation is
9
and the
.
is a discrete random variable that can take the value
a If
, find the standard deviation of
b
. Find
and
10 A fair dice is thrown until a
, find the values of
.
or .
.
.
has been thrown or three throws have been made.
is the
discrete random variable representing the number of throws made.
a Write down, in tabular form, the distribution of
b Find
.
.
c Find the median of
.
d The number of points awarded in the game,
, is given by
. Find the variance of
.
11 a A four-sided dice labelled with the values to is rolled twice. Write down, in a table, the
probability distribution of , the sum of the two rolls.
b Find
and
.
c A four-sided dice is rolled once and the score,
variance of .
12 The discrete random variable
is the variance of . Find
13
follows the
, is twice the result. Find the mean and
distribution.
is the expectation of
and
.
is a discrete random variable satisfying
for
.
Find, in terms of :
a
b
c
d
.
14 A discrete random variable has
. Find
in terms of
15 In a card game a pack of
standard playing cards is used. The cards are dealt one at a time
until the Queen of Spades (a unique card in the pack) is revealed.
a What are the expected mean and standard deviation of the number of cards until the
Queen of Spades is revealed?
b In the game the player scores
points if the Queen of Spades is the th card revealed.
Find the expected number of points scored.
16 A box contains a large number of pea pods. The number of peas in a pod can be modelled by
the random variable . The probability distribution of
is shown here:
or fewer
or more
a Two pods are picked randomly from the box. Find the probability that the number of peas in
each pod is at most .
b It is given that
.
i Determine the values of
and .
ii Hence show that
.
iii Some children play a game with the pods, randomly picking a pod and scoring points
depending on the number of peas in the pod. For each pod picked, the number of points
scored,
, is found by doubling the number of peas in the pod and then subtracting .
Find the mean and the standard deviation of
.
[© AQA 2014]
17 In a computer game, players try to collect five treasures. The number of treasures that Isaac
collects in one play of the game is represented by the discrete random variable .
The probability distribution of
a
i Show that
is defined by
.
ii Calculate the value of
iii Show that
.
.
iv Find the probability that Isaac collects more than
treasures.
b The number of points that Isaac scores for collecting treasures is
Calculate the mean and the standard deviation of
where
.
.
[© AQA 2014]
2 Poisson distribution
In this chapter you will learn how to:
use the conditions required for a Poisson distribution to model a situation
use the Poisson formula and calculate Poisson probabilities
calculate the mean, variance and standard deviation of a Poisson variable
use the distribution of the sum of independent Poisson distributions
carry out a hypothesis test of a population mean from a single observation from a
Poisson distribution.
Before you start…
A Level
Mathematics
Student Book 1,
Chapter 21
You should know how to
work with the binomial
distribution.
1 Given that
A Level
Mathematics
Student Book 2,
Chapter 20
You should know how to
work with conditional
probability.
2 Given that
find
.
Chapter 1
You should know how to find
the expectation and variance
of discrete random variables.
3 Find
A Level
Mathematics
Student Book 1,
Chapter 22
You should know how to
carry out hypothesis tests on
the binomial distribution.
4 A coin is tossed
times and tails are
observed. Use a two-tailed test to
determine at the
significance level if
this coin is biased.
and
, find
and
.
,
for this distribution:
What is the Poisson distribution?
When you are waiting for a bus there are two possible outcomes – at any given moment the bus either
arrives or it doesn’t. You can try modelling this situation, using a binomial distribution, but it is not clear
what an individual trial is. Instead you have an average rate of success – the number of buses that arrive in
a fixed time period.
There are many situations in which you know the average rate of events within a given space or time, in
contexts ranging from commercial, such as the number of calls through a telephone exchange per minute,
to biological, such as the number of clover plants seen per square metre in a pasture. If the events can be
considered independent of each other (so that the probability of each event is not affected by what has
already been seen), the number of events in a fixed space or time interval can be modelled by the Poisson
distribution.
Section 1: Using the Poisson model
The Poisson distribution is commonly used when these conditions hold:
the events occur singly (one at a time)
the events are independent of each other
the average rate of events (conventionally called lambda, $\lambda$) is constant.
If these conditions are satisfied then the discrete random variable ‘number of events, $X$’, follows the
Poisson distribution with mean $\lambda$. You write this as $X \sim \mathrm{Po}(\lambda)$.
Tip
If a question mentions average rate of success, or events occurring at a constant rate, you
should use the Poisson distribution.
If you can identify a fixed number of trials, you should use the binomial distribution.
The Poisson distribution can also be a useful approximate model for discrete random variables in other
situations. However, if the stated conditions are not met this can only be established by looking empirically
at data.
Once you have identified that a situation follows a Poisson distribution, you can use facts about the
probability of a certain number of events, the expected number of events and the expected variance.
Key point 2.1
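If a random variable $X$ follows a Poisson distribution, $X \sim \mathrm{Po}(\lambda)$, then:
$P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$ for $x = 0, 1, 2, \ldots$
$E(X) = \lambda$
$\mathrm{Var}(X) = \lambda$
These formulae will be given in your formula book.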
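As a check on these formulae, the following Python sketch evaluates Poisson probabilities directly and confirms numerically that they sum to 1 and that the mean and variance are both equal to the rate. The function name `poisson_pmf` and the rate 2.4 are illustrative assumptions, not values from the text, and the infinite sums are truncated at 60 terms.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


lam = 2.4  # hypothetical rate, chosen only for illustration

# Truncate the infinite sums; terms beyond x = 59 are negligible for this rate.
probs = [poisson_pmf(x, lam) for x in range(60)]

total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
variance = sum(x**2 * p for x, p in enumerate(probs)) - mean**2

print("total probability:", round(total, 6))
print("mean:", round(mean, 6))
print("variance:", round(variance, 6))
```

The mean does not need to be a whole number, which is why a non-integer rate is used here.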
Common error
Remember that
, not .
Notice that the values of the mean and variance are equal for the Poisson distribution. This is something
to look out for when deciding whether data are likely to fit a Poisson model, although in itself it is not sufficient
to decide – there are other distributions with this feature.
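One way to carry out this check on a data set is to compare the sample mean with the sample variance. The sketch below uses Python's `statistics` module on a hypothetical list of counts (the data are invented for illustration). Similar values make a Poisson model plausible but do not prove it.

```python
import statistics

# Hypothetical counts of events in nine equal intervals (invented data).
counts = [2, 0, 3, 1, 4, 2, 1, 3, 2]

sample_mean = statistics.mean(counts)          # sum of the counts divided by n
sample_variance = statistics.variance(counts)  # sample variance, divides by n - 1

print("mean:", sample_mean)
print("variance:", sample_variance)
```

Whether you divide by $n$ or $n - 1$ makes little difference to this informal comparison when the sample is reasonably large.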
A typical Poisson distribution, the
distribution, is shown here:
[Figure: bar chart of probability $p$ (vertical axis, 0 to 0.4) against $x$ (horizontal axis, 0 to 6).]
Notice that:
the mean rate does not have to be a whole number
the distribution is not symmetric
the graph, in theory, should continue on to infinite values of $x$, but the probabilities of very large values
of $x$ get very small.
WORKED EXAMPLE 2.1
Recordable accidents occur in a factory at an average rate of every year, independently of each
other. Find the probability that in a given year exactly recordable accidents occurred.
Let
be the number of recordable
accidents in a year:
Define the random variable.
Give the probability distribution.
Write down the probability required, and calculate
the answer.
The Poisson distribution is scalable: the mean is proportional to the length of the interval. For example, if
the number of butterflies seen on a flower in a given number of minutes follows a Poisson distribution with
mean $\lambda$, then the number seen in twice as many minutes follows a Poisson distribution with mean $2\lambda$,
the number seen in half as many minutes follows a Poisson distribution with mean $\frac{\lambda}{2}$, and so on.
Tip
Learn how to use your calculator to find Poisson probabilities, $P(X = x)$, and cumulative
probabilities, $P(X \le x)$.
WORKED EXAMPLE 2.2
If there are, on average,
buses per hour arriving at a bus stop, find the probability that more
than buses arrive in
minutes.
Let
be the number of buses in
minutes:
Define the random variable.
Give the probability distribution.
Write down the probability required. To use your
calculator you must relate this probability to
.
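The two steps in this example, scaling the rate to the time interval and rewriting ‘more than’ as the complement of a cumulative probability, can be sketched in Python. The numbers used here (6 per hour, 20 minutes, more than 3) are hypothetical stand-ins, not the values from the example, and `poisson_cdf` is a helper defined only for this illustration.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(k + 1))


rate_per_hour = 6.0  # hypothetical average rate
minutes = 20         # hypothetical length of the interval
threshold = 3        # hypothetical: we want P(more than 3 events)

# Scale the mean to the interval of interest.
lam = rate_per_hour * minutes / 60

# P(X > k) = 1 - P(X <= k)
p_more_than = 1 - poisson_cdf(threshold, lam)

print("mean for the interval:", lam)
print("P(X > 3):", round(p_more_than, 4))
```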
The scalability of the Poisson distribution is a consequence of a more general result. If two independent
variables both follow a Poisson distribution then so does their sum.
Key point 2.2
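If independent random variables $X$ and $Y$ follow Poisson distributions such that $X \sim \mathrm{Po}(\lambda)$
and $Y \sim \mathrm{Po}(\mu)$, and $Z = X + Y$, then $Z \sim \mathrm{Po}(\lambda + \mu)$.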
Although you do not need to know the proof of the result in Key point 2.2, it does show an interesting link
with the binomial expansion.
PROOF 4
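Let $X \sim \mathrm{Po}(\lambda)$ and $Y \sim \mathrm{Po}(\mu)$ be independent, and let $Z = X + Y$.
Consider all the different ways in which $Z$ can take the value $z$. If $X = 0$ then $Y = z$. If $X = 1$
then $Y = z - 1$, etc.
$$P(Z = z) = P(X = 0)P(Y = z) + P(X = 1)P(Y = z - 1) + \ldots + P(X = z)P(Y = 0)$$
Rewrite in sigma notation to keep the expression shorter.
$$P(Z = z) = \sum_{k=0}^{z} P(X = k)P(Y = z - k)$$
Use the formula for the Poisson distribution.
$$= \sum_{k=0}^{z} \frac{e^{-\lambda}\lambda^{k}}{k!} \times \frac{e^{-\mu}\mu^{z-k}}{(z-k)!}$$
You can take out factors of $e^{-\lambda}$ and $e^{-\mu}$ from the sum since they are constants.
$$= e^{-\lambda}e^{-\mu} \sum_{k=0}^{z} \frac{\lambda^{k}\mu^{z-k}}{k!\,(z-k)!}$$
You are close to having a binomial coefficient. Multiply by $z!$ in the sum to get to this, but then you
have to divide by $z!$ too.
$$= \frac{e^{-(\lambda+\mu)}}{z!} \sum_{k=0}^{z} \frac{z!}{k!\,(z-k)!}\,\lambda^{k}\mu^{z-k}$$
Replace the factorials with a binomial coefficient.
$$= \frac{e^{-(\lambda+\mu)}}{z!} \sum_{k=0}^{z} \binom{z}{k}\lambda^{k}\mu^{z-k}$$
You can recognise the sum as a binomial expansion.
$$= \frac{e^{-(\lambda+\mu)}(\lambda+\mu)^{z}}{z!}$$
This is a Poisson distribution with mean $\lambda + \mu$.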
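The result can also be checked numerically. The sketch below adds up the probabilities of every way two independent Poisson variables can total $z$ and compares the answer with the Poisson probability for the combined mean. The means 1.5 and 2.5 and the function names are assumptions made for this illustration.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def convolved_pmf(z: int, lam: float, mu: float) -> float:
    """Return P(X + Y = z) for independent X ~ Po(lam), Y ~ Po(mu),
    by summing over every split X = k, Y = z - k."""
    return sum(poisson_pmf(k, lam) * poisson_pmf(z - k, mu) for k in range(z + 1))


lam, mu = 1.5, 2.5  # hypothetical means

for z in range(6):
    by_sum = convolved_pmf(z, lam, mu)
    direct = poisson_pmf(z, lam + mu)
    print(z, round(by_sum, 6), round(direct, 6))
```

The two columns of probabilities agree, as the proof predicts.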
WORKED EXAMPLE 2.3
Hywel receives an average of emails and texts each hour. These are the only types of
message he receives.
a Assuming that the numbers of emails and texts follow independent Poisson distributions, find the
probability that he receives more than messages in an hour.
b Explain why the assumption that the emails and texts form independent Poisson distributions is
unlikely to be true.
a
Use Key point 2.2 to combine the two
Poisson distributions.
You need to write the required probability in
terms of a cumulative probability to use the
calculator function.
b The rate of arrival of messages is unlikely to
be constant – there will probably be more at
some times of the day than at others. Within
each distribution messages are not likely to
be independent as they may occur as part of
a conversation. The two distributions are
also probably not independent of each other,
as times when more emails arrive might be
similar to times when more texts might
arrive.
Common error
Sometimes people think that the mean rate in a Poisson distribution has to be a whole number.
This is not the case.
WORK IT OUT 2.1
The number of errors in a computer code is believed to follow a Poisson distribution with a mean
of
errors per
lines of code. Find the probability that there are more than errors in
lines
of code. Which is the correct solution? Identify the errors made in the incorrect solutions.
A
If
is the number of errors in
lines, then
.
B
If
is the number of errors in
lines, then
.
More than
errors in
lines is equivalent to more than
error in
lines, so you need
C
EXERCISE 2A
1
State the distribution of the variable in each of these situations.
a Cars pass under a motorway bridge at an average rate of per second period.
i The number of cars passing under the bridge in one minute.
ii The number of cars passing under the bridge in seconds.
b Leaks occur in water pipes at an average rate of per kilometre.
i The number of leaks in .
ii The number of leaks in .
c worms are found on average in a area of a garden.
i The number of worms found in a area of garden.
ii The number of worms found in a by area of garden.
2
Calculate these probabilities.
a If :
i
ii
b If
:
i
ii
c If
:
i
ii
d If
:
i
ii
e If
:
i
ii
3
A random variable follows a Poisson distribution with mean . Copy and complete this table
of probabilities, giving the results to three significant figures.
4
From a particular observatory, shooting stars are observed in the night sky at an average rate of
one every five minutes. Assuming that this rate is constant and that shooting stars occur (and
are observed) independently of each other, what is the probability that more than are seen
over a period of one hour?
5
When examining blood from a healthy individual, under a microscope, a haematologist knows
he should see on average four white blood cells in each high power field. Find the probability
that blood from a healthy individual will show:
a seven white blood cells in a single high power field
b a total of white blood cells in six high power fields, selected independently.
6
A wire manufacturer is looking for flaws. Experience suggests that there are on average
flaws per metre in the wire.
a Determine the probability that there is exactly one flaw in one metre of the wire.
b Determine the probability that there is at least one flaw in metres of the wire.
7
The random variable has a Poisson distribution with mean . Calculate:
a
b
c
d
8
The number of eagles observed in a forest in one day follows a Poisson distribution with mean
.
a Find the probability that more than three eagles will be observed on a given day.
b Given that at least one eagle is observed on a particular day, find the probability that exactly
two eagles are seen that day.
9
The random variable follows a Poisson distribution. Given that , find:
a the mean of the distribution
b .
10 Let be a random variable with a Poisson distribution, such that . Use technology
to estimate , giving your answer to three significant figures.
11 The number of emails Sarah receives per day follows a Poisson distribution with mean . Let
be the number of emails received in one day and the number of emails received in a seven-day
week.
a Calculate and .
b Find the probability that Sarah receives emails every day in a seven-day week.
c Explain why this is not the same as .
12 The number of mistakes a teacher makes while marking homework has a Poisson distribution
with a mean of
errors per piece of homework.
a Find the probability that there are at least two marking errors in a randomly chosen piece of
homework.
b Find the most likely number of marking errors occurring in a piece of homework. Justify your
answer.
c Find the probability that in a class of students fewer than half of them have errors in their
marking.
13 A car company has two limousines that it hires out by the day. The number of requests per day
has a Poisson distribution with mean
requests per day.
a Find the probability that neither limousine is hired on any given day.
b Find the probability that some requests have to be denied on any given day.
c If each limousine is to be used equally, on how many days in a period of days would you
expect a particular limousine to be in use?
14 The random variable follows a Poisson distribution with mean . Given that
, find the exact value of .
15 The random variable follows a Poisson distribution with mean .
a Show that .
b Given that , find the value of such that .
Section 2: Using the Poisson distribution in hypothesis tests
If it is known that a variable follows a Poisson distribution you can use data to make inferences about the
value of the mean. To do this you use a hypothesis test. First, you need to work out the $p$-value – the
probability of getting the observed result or a more extreme one, assuming that the null hypothesis is true. You
can then compare this to the significance level to determine whether or not to reject the null hypothesis.
For a one-tailed test, compare the calculated probability to the significance level directly. For a two-tailed
test, you usually find the probability of one tail and compare it to half of the significance level.
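The procedure can be sketched in Python as follows. The null-hypothesis mean of 5, the observation of 10 and the 5% significance level are hypothetical values chosen for illustration, and the helper functions are not from any particular library. The sketch rejects the null hypothesis when the tail probability is less than the comparison value.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Po(lam); gives 0 when k < 0."""
    return sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(k + 1))


def upper_tail(observed: int, lam: float) -> float:
    """Return P(X >= observed), the observed value or more extreme (larger)."""
    return 1 - poisson_cdf(observed - 1, lam)


def lower_tail(observed: int, lam: float) -> float:
    """Return P(X <= observed), the observed value or more extreme (smaller)."""
    return poisson_cdf(observed, lam)


lam0 = 5.0     # hypothetical mean under the null hypothesis
observed = 10  # hypothetical single observation
alpha = 0.05   # hypothetical significance level

tail = upper_tail(observed, lam0)

# One-tailed test (looking for an increase): compare with alpha directly.
reject_one_tailed = tail < alpha

# Two-tailed test (looking for a change): compare one tail with alpha / 2.
reject_two_tailed = tail < alpha / 2

print("upper tail probability:", round(tail, 4))
print("reject in one-tailed test:", reject_one_tailed)
print("reject in two-tailed test:", reject_two_tailed)
```

With these illustrative numbers the same observation is significant in a one-tailed test but not in a two-tailed test, which shows why the choice of alternative hypothesis matters.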
WORKED EXAMPLE 2.4
The number of telephone calls received by a company follows a Poisson distribution. Over long
experience it is thought that the mean is calls per hour. After a redesign of their website it is
found that they got
calls in an hour. Test at the
significance level if this provides significant
evidence of a change in the mean number of calls per hour.
It is a two-tailed test because you are looking for a
change in either direction.
If
Calculate the probability of the observed outcome or
more extreme, assuming that $H_0$ is true.
This is more than
.
so do not reject
Compare the upper tail to half of the significance level,
since this is a two-tailed test. If you want the $p$-value,
double the probability to get a $p$-value of
There is insufficient evidence to
suggest that the mean number of
calls has changed from per hour.
Write a conclusion within the context of the question.
WORK IT OUT 2.2
is the random variable ‘number of absences per day in a school’. It is thought to follow a
Poisson distribution with mean . Following a change in the registration system, the number of
absences over five days was . Test at the
significance level if the change in the registration
system has affected the average rate of absences. Which is the correct solution? Identify the
errors made in the incorrect solutions.
A
,
Under
,
.
If there are
absences over five days, this is a rate of eight per day, so you need
. This is more than
so you cannot reject
. The average rate is
absences per day.
B
,
Let
be the number of absences in five days. Under
,
.
. Since this is a two-tailed test you must double this to get a $p$-value of
. This is less than the significance level so you can reject
. There is evidence at the
significance level that the average rate has changed from
absences per day.
C
,
.
so reject
.
EXERCISE 2B
1
Conduct these hypothesis tests based on the given observation. You can assume that the data follow
a Poisson distribution. Use the significance level.
a
i
ii
b
i
ii
c
i
ii
d
i
ii
2
Find the critical region (the set of values for which the null hypothesis is rejected) at the
significance level if:
a
i
ii
b
i
ii
c
i
ii
3
.
It is known that a sample of radium emits
alpha particles per millisecond. A second sample of the
same size and shape emits alpha particles in a millisecond. Test at the
significance level whether
this sample has the same emission rate as radium.
4
a Over a long period it is believed that the average number of cars travelling past a traffic light
follows a Poisson distribution, with
cars per minute. After some roadworks, it is thought that the
number of cars passing is lower. In a one minute observation only cars pass the traffic light. Find
the $p$-value of this observation and hence decide at the significance level if the roadworks have
caused a decrease in traffic levels.
b Suggest two reasons why a Poisson distribution might not be appropriate.
5
a The number, , of accidents per month on a road is studied. The mean number of accidents per
month is
with standard deviation
. Explain why this supports the suggestion that the number
of accidents follows a Poisson distribution.
b Assume that
does indeed follow a Poisson distribution. It is thought that adding a speed camera
will reduce the average number of accidents from
. In the month after the camera was added
there were accidents. Test at the
significance level if this is evidence of a reduction in the
average number of accidents.
6
The numbers of mistakes in nine pieces of a student’s homework are shown:
a Estimate the mean and standard deviation of the number of mistakes, based upon these data.
b Hence explain why the Poisson distribution is a plausible model.
c After a study skills session the student produced a piece of work with mistakes. You can assume
that the number of mistakes does follow a Poisson distribution. Test at the
significance level if
the mean number of mistakes is lower than the value found in part a.
7
The number of bees visiting a flower is thought to follow a Poisson distribution with mean per
minute.
a Describe in context two conditions that must be met for the Poisson distribution to be an
appropriate model for the arrival of bees.
b After a new hedge has been planted it is thought that the number of bees arriving will increase. In
minutes
bees visit the flower. Test at the
significance level if there is evidence that the
number of bees has increased.
8
The number of leaks in a pipe is known to follow a Poisson distribution with mean
leaks per km.
After the water pressure was changed, an inspection of
of pipe revealed
leaks. Has there been
a change in the mean number of leaks? Test, using the
significance level.
9
It is known from long experience that earthquakes occur in a particular town once every four months.
Environmentalists believe that a change in the way oil is extracted from a well will increase the
number of earthquakes. They monitor the activity for one year and six earthquakes occur.
a Test at the significance level whether the number of earthquakes has increased from the
long-term trend, stating your $p$-value.
b They continue to monitor earthquake activity and the following year six earthquakes also occur.
Test at the significance level whether the number of earthquakes has increased from the
long-term trend, stating your $p$-value.
10 The discrete random variable
follows
. A single observation is used to test
against
. What is the smallest value of for which
will be rejected at the
significance level
when the observation is
?
Checklist of learning and understanding
The Poisson distribution is commonly used when these conditions hold:
the events occur singly (one at a time)
the events are independent of each other
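the average rate of events (conventionally called $\lambda$) is constant.
If $X \sim \mathrm{Po}(\lambda)$, then:
$P(X = x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$ for $x = 0, 1, 2, \ldots$
$E(X) = \lambda$
$\mathrm{Var}(X) = \lambda$
If $X \sim \mathrm{Po}(\lambda)$, $Y \sim \mathrm{Po}(\mu)$ and $Z = X + Y$, where $X$ and $Y$ are independent, then
$Z \sim \mathrm{Po}(\lambda + \mu)$.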
You can use the Poisson distribution to conduct a hypothesis test to see if it suggests that the
mean rate has changed.
Mixed practice 2
1
The number of complaints in a shop in any hour while it is open follows a Poisson distribution
with mean
per hour. Find the probability that in a three-hour shift there are fewer than
complaints, giving your answer to three significant figures. Choose from these options.
A
B
C
D
2
A random variable follows a Poisson distribution with standard deviation . Find to
three significant figures. Choose from these options.
A
B
C
D
3
The random variable is the number of robins that visit a bird table each hour. The random
variable is the number of thrushes that visit a bird table each hour. These are the only types
of bird that visit the table.
It is believed that
and
.
is the random variable ‘Number of birds visiting the table each hour’.
a Stating a necessary assumption, write down the distribution of
.
b Find the probability that no birds visit the table in one hour.
c Find
4
is the random variable ‘number of burgers ordered per hour in a restaurant’. It is thought
that
.
a Write down two conditions required for the Poisson distribution to model data.
b Find
c During a ‘happy hour’ special offer the number of burgers sold increased to . Test at the
significance level whether the special offer has increased the average rate of burgers
ordered from
.
5
Salah is sowing flower seeds in his garden. He scatters seeds randomly so that the number of
seeds falling on any particular region is a random variable with a Poisson distribution, with
mean value proportional to the area. He intends to sow fifty thousand seeds over an area of
.
a Calculate the expected number of seeds falling on a region.
b Calculate the probability that a given area receives no seeds.
6
a If write down and .
b Hence find where is the expected standard deviation of .
7
Seven observations of the random variable , the number of power surges per day in a power
cable, are shown:
a Estimate the mean and standard deviation of
, based upon these observations.
b Use your answer to part a to explain why the Poisson distribution is a plausible model for
.
c When a new brand of cable is used it is observed that there are power surges in five
days. Does this suggest that the new brand has a different average rate of power surges to
your answer in part a? Use a significance level.
8
A receptionist at a hotel answers on average phone calls a day.
a Find the probability that on a particular day she will answer more than phone calls.
b Find the probability that she will answer more than phone calls every day during a
five-day week.
9
During the month of August in Bangalore, India, there are on average
rainy days.
a Find the probability that there are fewer than seven rainy days during the month of August
in a particular year.
b Find the probability that, in ten consecutive years, exactly five have fewer than seven rainy
days in August.
10 The random variable
follows a Poisson distribution. Given that
, find:
a the mean of the distribution
b
.
11 a Given that and find the value of such that .
b . Find the possible values of .
c If and , express in terms of .
12 A geyser erupts randomly. The eruptions at any given time are independent of one another
and can be modelled, using a Poisson distribution with mean
per day.
a Determine the probability that there will be exactly one eruption between and .
b Determine the probability that there are more than eruptions during one day.
c Determine the probability that there are no eruptions in the minutes Naomi spends
watching the geyser.
d Find the probability that the first eruption of a day occurs between and .
e If each eruption produces litres of water, find the expected volume of water produced
in a week.
f Determine the probability that there will be at least one eruption in at least six out of the
eight hours the geyser is open for public viewing.
g Given that there is at least one eruption in an hour, find the probability that there is exactly
one eruption.
13 In a particular town, rainstorms occur at an average rate of two per week and can be
modelled, using a Poisson distribution.
a What is the probability of at least eight rainstorms occurring during a particular four-week
period?
b Given that the probability of at least one rainstorm occurring in a period of complete
weeks is greater than , find the least possible value of .
14 Patients arrive at random at an emergency room in a hospital at the rate of per hour
throughout the day.
a Find the probability that exactly four patients will arrive at the emergency room between
and .
b Given that fewer than patients arrive in one hour, find the probability that more than
arrive.
15 It is thought that
. A single observation of
takes the value . Does this provide
evidence at the
significance level that the average rate has decreased? Support your
answer by writing down the $p$-value of the observation.
16 Based on long experience a gardener knows that birds tend to arrive in his garden at an
average rate of
per hour.
a State two assumptions required to model the birds’ arrival, using a Poisson distribution. Are
these reasonable assumptions?
b If these assumptions do hold, find the probability of observing more than
birds in an
hour.
The gardener plants some new flowers. He wants to know if this changes the birds’ behaviour.
c If is the true average rate of arrival of birds after the new flowers have been planted,
write down suitable null and alternative hypotheses for answering the gardener’s question.
d If
birds are observed in an hour, what is the conclusion of the test at
significance?
17 A water company believes that pipes have
leaks per km, following a Poisson distribution.
After increasing water pressure they are concerned that there are more leaks. They find
leaks in a
section of pipe. Does this provide significant evidence at the
significance
level to suggest that the mean number of leaks has increased?
18 A shop has four copies of the magazine Ballroom Dancing delivered each week. Any unsold
copies are returned. The demand for the magazine follows a Poisson distribution with mean
requests per week.
a Calculate the probability that the shop cannot meet the demand in a given week.
b Find the most probable number of magazines sold in one week.
c Find the expected number of magazines sold in one week.
d Determine the smallest number of copies of the magazine that should be ordered each
week to ensure that the demand is met with a probability of at least
.
19 Annette is a senior typist and makes an average of mistakes per letter. Bruno is a trainee
typist and makes an average of mistakes per letter. Assume that the number of mistakes
made by any typist follows a Poisson distribution.
a Calculate the probability that on a particular letter:
i Annette makes exactly three mistakes
ii Bruno makes exactly three mistakes.
b Annette types
of all the letters.
i Find the probability that a randomly chosen letter contains exactly three mistakes.
ii Given that a letter contains exactly three mistakes, find the probability that it was typed
by Annette.
c Annette and Bruno type one letter each. Given that the two letters contain a total of three
mistakes, find the probability that Annette made more mistakes than Bruno.
20 The number of worms in a square metre in a forest satisfies the distribution
. A scientist
samples many square-metre areas but only records areas where some worms are observed.
What is the mean value of her observations?
21 Mohammed is offered a week’s trial with a view to being permanently employed to service
bicycles in Robyn’s bicycle shop.
The number of bicycles brought in to be serviced can be modelled by a Poisson distribution
with mean
per day.
a Find the probability that, on Mohammed’s first day, the number of bicycles brought in to be
serviced is:
i
or fewer
ii more than
iii exactly .
b Before starting work, Mohammed told his mother that he hoped that, during his first week (
days), the number of bicycles brought in to be serviced would be:
at least , otherwise Robyn might decide that there was not enough work to justify
permanently employing him
not more than , so that he would not have to work too hard.
Find the probability that Mohammed’s hopes will be met.
[© AQA 2011]
22 At a Roman site, coins are found at an average rate of coin per . Assume that the
number of coins found can be modelled by a Poisson distribution.
a Determine the probability that, in an area of :
i at most coins are found
ii exactly coins are found.
b Determine the probability that more than coins are found in an area of .
c Bronze brooches are less common than coins at this site, and are found at an average rate
of brooch per
. The number of these brooches found is independent of the number of
coins found. Assume that the number of bronze brooches found can also be modelled by a
Poisson distribution.
i Determine the probability that the total number of coins and bronze brooches found in
an area of
is at least .
ii Sometimes, Romans buried a hoard of several coins together. They did not usually bury
several bronze brooches together. State, with a reason, which of
the number of coins found or
the number of bronze brooches found
is likely to be better modelled by a Poisson distribution.
[© AQA 2013]
3 Chi-squared tests
In this chapter you will learn how to:
check if two variables are dependent.
If you are following the A Level course, you will also learn how to:
use Yates’ correction, a way of improving the method for checking if two variables
are dependent.
Before you start…
A Level Mathematics
Student Book 1, Chapter
22
You should know how to
conduct hypothesis tests.
1 A coin is tossed
times and
heads are observed. Does this
provide evidence (at a
significance level) that the coin is
biased towards heads?
A Level Mathematics
Student Book 1, Chapter
21
You should know how to
calculate probabilities for
independent events.
2 The probability of Andrew scoring
a goal is
and the probability of
Helen scoring a goal is
. Given
that these outcomes are
independent, what is the
probability that they both score a
goal?
A Level Mathematics
Student Book 2, Chapter 3
You should know how to
evaluate expressions
including the modulus
function.
3 Evaluate
.
Independent?
One common question you can ask in a statistical situation is whether or not two variables are dependent –
for example, do future earnings depend on A Level choices? In this chapter you will look at a statistical test
to answer this type of question.
Did you know?
You might have already met a test to see if two variables are correlated. This is related to
independence, but it is not quite the same. For example, the scatter graph shows the results of a
psychology experiment where people are asked to estimate the size of an angle, and the time
taken for them to do so is measured.
The two variables are not correlated (there is no linear trend) but they are not independent –
people who spend longer making the estimate seem to have more tightly clustered estimates.
[Figure: scatter graph of estimated angle (degrees) against time (s).]
It turns out that if two variables are independent then they will definitely be uncorrelated, but the
reverse is not true. You can write:
independent $\Rightarrow$ uncorrelated
Section 1: Contingency tables
In this section you will try to design a hypothesis test that decides whether two variables are dependent:
$H_0$: The two variables are independent.
$H_1$: The two variables are dependent.
Tip
Choose $H_0$ to be that the variables are independent because you can use that to calculate
expected values. You cannot use the fact that two variables are dependent to calculate expected
values unless you are given more information about what that dependence is.
To describe the two variables you use contingency tables that list how often each combination of
variables occurs. For example, this table illustrates the results of a survey of young families. The observed
value in cell $i$ is called $O_i$.
Number of children
or more
or fewer
Number
of
bedrooms
or more
This is a
contingency table. Notice that each cell contains actual frequencies rather than probabilities
or proportions.
You are going to need a way of measuring how far this is away from the numbers you would expect if the
two variables were independent. To do this you look at the totals.
Number of children
or more
Total
or fewer
Number
of
bedrooms
or more
Total
Based on the sample, the probability of having two children is and the probability of having two
bedrooms or fewer is . If the two variables are independent, the probability of both occurring is the product of
these probabilities, so the probability of two children and two bedrooms is . In a sample of size
you would then expect there to be families with two children and two bedrooms.
The expected frequency in cell $i$ is called $E_i$.
Tip
Expected frequencies do not have to be whole numbers.
Key point 3.1
In a contingency table, the expected frequency in cell $i$ is
$E_i = \dfrac{\text{row total} \times \text{column total}}{\text{grand total}}$
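The calculation of expected frequencies from the row and column totals can be sketched in Python. The 2 by 3 table of observed frequencies below is hypothetical (it is not the survey data from the text), and `expected_frequencies` is a helper written only for this illustration.

```python
def expected_frequencies(observed):
    """Return the table of expected frequencies under independence:
    (row total x column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed frequencies (rows and columns are two categorical variables).
observed = [
    [12, 18, 10],
    [8, 22, 30],
]

expected = expected_frequencies(observed)

for row in expected:
    print(row)

# Useful check: the row totals of the expected table match the observed row totals.
print([sum(row) for row in expected], [sum(row) for row in observed])
```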
You can create another contingency table containing all the expected frequencies.
Number of children
or more
or fewer
Number
of
bedrooms
or more
There are several possible measures of the difference between observed and expected values. The
measure you need to know is called chi-squared ($\chi^2$).
Tip
Notice that the row totals and the column totals are the same as for the original data. This is a
useful check.
Key point 3.2
The chi-squared value that gives the difference between the observed values, $O_i$, and the
expected values, $E_i$, is
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$
For the given data,
Large values indicate a big difference between observed and expected data. Is this value large enough to
conclude that number of bedrooms and number of children are not independent? To decide, you need to
know the distribution of $\chi^2$ to see how likely the observed value is. This distribution has a single
parameter that depends on the number of cells in the table and that, for historical reasons, is called the
degrees of freedom, often given the symbol $\nu$ (lowercase Greek letter ‘nu’) or DF.
Key point 3.3
In an $m$ by $n$ contingency table, the number of degrees of freedom is
$\nu = (m - 1)(n - 1)$
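The calculation of the chi-squared value and the degrees of freedom can be sketched in a few lines of Python. This is only an illustration: the observed frequencies and the function name `chi_squared` are made up, and the expected frequencies are taken to be (row total × column total) / grand total.

```python
# Hypothetical observed frequencies (2 rows, 3 columns); not data from the text.
observed = [
    [12, 18, 10],
    [8, 22, 30],
]


def chi_squared(table):
    """Return (chi-squared value, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            total += (o - e) ** 2 / e                        # this cell's contribution
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)      # (m - 1)(n - 1)
    return total, dof


chi2_value, dof = chi_squared(observed)
print(round(chi2_value, 3), dof)
```

For this invented 2 by 3 table there are $(2 - 1)(3 - 1) = 2$ degrees of freedom.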
If the null hypothesis is true (that the variables are independent), $\chi^2$ approximately follows the
chi-squared distribution with $\nu$ degrees of freedom, written as the $\chi^2_\nu$ distribution. However, this approximation is only
valid if all the expected frequencies in the contingency table are greater than 5.
Key point 3.4
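If the null hypothesis is true and the expected value $E_i > 5$ for all $i$, then the chi-squared value is approximately distributed as $\chi^2_\nu$:
$\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_\nu$
This will be given in your formula book.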
Tip
Only expected values need to be above 5. Observed values are irrelevant.
In the survey results, not all of the expected values are above 5. When this happens you need to combine
some rows or columns in a way that is sensible in context. The most obvious way with the example given is
to combine the ‘2 children’ group with the ‘3 or more children’ group.
Number of children
or more
or fewer
Number of
bedrooms
or more
You can then create the new contingency table of expected values.
Tip
You can find the expected values by adding up the corresponding expected values from the
original table. You don’t have to recalculate the frequencies, using Key point 3.1.
Number of children
or more
or fewer
Number of
bedrooms
or more
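The shortcut in the Tip can be checked numerically. In this sketch the observed frequencies are hypothetical and the helper names are invented; it combines the last two columns and confirms that adding the original expected values gives the same result as recalculating them from the combined observed table.

```python
# Hypothetical observed frequencies; the last two columns are sparse.
observed = [
    [20, 30, 6, 4],
    [30, 20, 4, 6],
]


def expected_frequencies(table):
    """(row total x column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def merge_last_two_columns(table):
    """Add the final column into the one before it."""
    return [row[:-2] + [row[-2] + row[-1]] for row in table]


expected = expected_frequencies(observed)
# The approximation needs every expected frequency to be greater than 5.
needs_combining = any(e <= 5 for row in expected for e in row)

merged_expected = merge_last_two_columns(expected)                      # add expected values
recalculated = expected_frequencies(merge_last_two_columns(observed))   # recalculate from totals

print(needs_combining)
print(merged_expected)
print(recalculated)
```

Both routes give the same table because the combined column total is just the sum of the two original column totals.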
You can find the contributions of each cell to the total chi-squared value.
Number of children
or more
or fewer
Number of
bedrooms
or more
Totalling these contributions gives the calculated value of $\chi^2$, with $\nu = 4$. You can compare this value to critical
values given in the formula book.
The highlighted value, 9.488, gives the critical value for a test at the 5% significance level with four
degrees of freedom. The column is headed 0.95 because 95% of chi-squared values with 4 degrees of
freedom are below this value, therefore 5% are higher. The calculated value of $\chi^2$ is higher than 9.488, so
you reject the null hypothesis and conclude that number of bedrooms and number of children are
dependent variables.
The contingency table showing the contributions of each cell to the $\chi^2$ sum shows some cells have a
much larger value of $\frac{(O_i - E_i)^2}{E_i}$ than others.
You can use this to analyse which combinations of the variables are very different from the expected
frequencies. This can give you further insight into what is happening in the situation being investigated. In
the example you can see that two or more children in houses with four or more bedrooms makes the
largest contribution to the $\chi^2$ value. You could interpret this as meaning that large families preferring large
houses is a big factor in why number of children and number of bedrooms are dependent.
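A minimal Python sketch of the whole procedure, using made-up frequencies rather than the survey data, is shown here. The dictionary holds the 5% critical values for one to four degrees of freedom; the names `contributions` and `CRITICAL_5_PERCENT` are invented for this illustration.

```python
# Critical values of chi-squared at the 5% significance level,
# for 1 to 4 degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

# Hypothetical 3 by 3 table of observed frequencies; not data from the text.
observed = [
    [30, 20, 10],
    [20, 20, 20],
    [10, 20, 30],
]


def contributions(table):
    """Return the table of (O - E)^2 / E values, one per cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        result.append([])
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            result[i].append((o - e) ** 2 / e)
    return result


cells = contributions(observed)
chi2_value = sum(sum(row) for row in cells)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
reject_h0 = chi2_value > CRITICAL_5_PERCENT[dof]

# Largest single contribution, with its (row, column) position.
largest = max((value, i, j) for i, row in enumerate(cells) for j, value in enumerate(row))

print(chi2_value, dof, reject_h0)
print(largest)
```

Looking at `cells` rather than only at the total is what lets you say which combinations differ most from what independence would predict.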
Tip
Some calculators allow you to do the chi-squared test automatically and provide you with the
$p$-value. This alternative approach is acceptable.
WORKED EXAMPLE 3.1
Determine at the
significance level whether or not the colour of a car sold by a dealership is
independent of the gender of the purchaser.
Gender
Male
Female
Total
Blue
Red
Colour
Green
Silver
Total
Set up hypotheses.
$H_0$: Gender and car colour are independent.
$H_1$: Gender and car colour are not independent.
The expected values are:
Male
Blue
Red
Green
Female
Find expected values, using Key point 3.1. Check that
all the expected frequency values are above 5, which
they are in this case. Also check that the row and
column totals match the row and column totals in the
original table of observed values.
Silver
Find the chi-squared value, using Key point 3.2.
Find the number of degrees of freedom, using Key point 3.3.
Use the formula book to find the critical value.
The critical value is more than the calculated value of $\chi^2$.
Therefore do not reject $H_0$; the data set is consistent with gender and car colour being independent.
Remember that $\chi^2$ is a measure of distance between observed and expected values, so your
calculated distance is less than the critical distance.
Worked example 3.2 shows how to deal with combining groups.
WORKED EXAMPLE 3.2
This contingency table shows the favourite sports played by different age groups in a sample at a
sports centre.
Age
Soccer
Favourite
sport
Basketball
Swimming
Tennis
a Test at the
significance level whether preferred sport and age are independent, showing the
contributions of each cell.
b Interpret your results in context.
a Set up hypotheses.
$H_0$: Age and preferred sport are independent.
$H_1$: Age and preferred sport are not independent.
The expected values are:
Soccer
Find expected values, using Key point 3.1.
Check that all the expected values are above 5,
which they are not in this case. Notice that the
row and column totals are the same as those in
the observed data table.
Basketball
Swimming
Tennis
Several cells in the
range have a
frequency less than 5, so combining this
column with the
column:
Soccer
Basketball
Swimming
Tennis
And the corresponding observed values
are:
Soccer
Basketball
The most obvious choice is to combine the
and
groups.
Swimming
Tennis
Use Key point 3.2.
The contributions of each cell are:
Soccer
Basketball
Swimming
Tennis
Compare $\chi^2$ with the critical value from the formula book and conclude.
The calculated value of $\chi^2$ is greater than the critical value.
Therefore reject $H_0$; preferred sport and age are dependent.
b Use the contributions of each cell to find the most important factors.
The main contributions to $\chi^2$ come from basketball and swimming. It appears
swimming is more popular than would be expected amongst younger children,
whilst basketball is more popular than would be expected amongst older children.
You might have to construct a contingency table from given information.
WORKED EXAMPLE 3.3
A biologist claims that the mobility of fish is dependent on their breeding ground. A sample of
fish was taken, with equal numbers from each of the two breeding grounds, Ellesmere and Duxbury,
studied. A test was used to classify the fish as sedentary, normal or highly mobile. In Ellesmere half
of the fish were classified as highly mobile and one-fifth as normal. In Duxbury one quarter of the
fish were classified as normal and
of the Duxbury fish were classified as sedentary.
Test the biologist’s claim at the
significance level.
First write the null and alternative hypotheses.
$H_0$: Mobility and breeding ground are independent.
$H_1$: Mobility and breeding ground are dependent.
Observed values:
Sedentary
Normal
Highly
mobile
Ellesmere
Duxbury
Create a contingency table. There are
fish in
each location. Turn the given proportions in
each location into frequencies and use the fact
that each row adds up to
to complete the
table.
Expected values:
Sedentary
Ellesmere
Duxbury
Normal
Highly
mobile
You can use Key point 3.1 to calculate the
expected values. All expected values are larger
than 5 so no combining is required.
Find the value of $\chi^2$, using Key point 3.2.
Find the number of degrees of freedom, using Key point 3.3.
The calculated value of $\chi^2$ is less than the critical value, so do not
reject $H_0$. There is no significant evidence
that mobility depends upon breeding ground.
EXERCISE 3A
1
Test these contingency tables to see if the two variables are dependent at the
significance level.
State carefully the number of degrees of freedom and the value of $\chi^2$. In part b, combine
suitable columns to make all expected frequencies greater than 5.
a
i
Exam grade
or
,
or
or
Mr Archer
Teacher
Ms Baker
Mrs Chui
ii
Time working
hours
hours
hours
Male
Gender
Female
b
i
Age
Social
media
followers
ii
Cost
Red
Colour Green
Blue
2
A Physics teacher wants to investigate whether or not there is any association between the
Physics grade her students get and the Mathematics course they study. She collects data for a
random sample of
students over several years. The results are given in this table.
or lower
Further Maths
Maths AS or
or
or
No Maths
a State the null and alternative hypotheses.
b Calculate the expected frequencies.
c Calculate the value of $\chi^2$ and write down the number of degrees of freedom.
d Test at the
significance level whether the Physics grade is independent of the Mathematics
course studied. Show clearly how you arrived at your conclusion. Interpret your results in
context.
3
A random sample of
books was taken from a library. One third of the books were fiction and
the rest were non-fiction. The reading level of each book was assessed as elementary, moderate
or advanced.
of the non-fiction books were classified as advanced and
were classified as
elementary. One quarter of the fiction books were classified as elementary. There were the same
number of moderate fiction and moderate non-fiction books.
Conduct an appropriate test to determine, at the
significance level, if there is evidence to
suggest that reading level depends on whether a book is fiction or non-fiction.
4
James wanted to know whether people being late, early or on time to school depends upon their
mode of transport. This partially filled contingency table shows his results, based on asking
students.
Early
On time
Late
Total
Walk
Car
Other
Total
a Copy and complete the contingency table.
b Calculate the value of $\chi^2$ for this data.
c Conduct an appropriate test at the
significance level to answer James’ question.
d What assumptions does James have to make in conducting this test?
5
The owner of a beauty salon wants to find out whether there is any association between the
number of times in a year people visit the salon and the amount of money they spend on each
visit. He collects these data for a random sample of clients.
Number of visits
Amount spent
per visit £
Is there evidence, at the
level of significance, that there is some association between the
number of visits and the amount of money spent? Interpret your result in context.
6
A drugs manufacturer claims that the speed of recovery from a certain illness is higher for people
who take a higher dose of their new drug. They provide these data for a sample of
patients.
No drug taken
days
days
days
Single dose
Double dose
days
Test whether there is evidence for the manufacturer’s claim at the
level of significance.
Interpret your results in context.
7
A company is investigating their gender equality policies. As a part of this investigation they
collect data on salaries, to the nearest pound, for a random sample of
employees, as shown in
this table.
Male
Female
£
£
£
£
£
a Assuming salary is independent of gender, calculate the corresponding expected frequencies.
b Carry out a suitable test to determine whether salary is independent of gender. State and
justify your conclusion at the
level of significance.
8
a Prove that
, where
.
b A
contingency table has
. Find the largest possible sample size.
c Find the largest sample size that will produce a significant result in the chi-squared test at the
significance level, assuming that all cells have an expected frequency of at least 5.
9
A researcher believes that these percentages are the true proportions of people voting for
different political parties based on their gender:
Male
Female
Party A
Party B
Party C
a Show that gender and voting intention are dependent.
b Show that if a sample of size
follows these proportions, it will not provide a significant result
in the chi-squared test at
significance.
c Calculate an estimate of the smallest sample size required to find significant evidence that
gender and preferred political party are dependent, using a
significance level.
d Explain why your answer to part c is only an estimate.
10 This contingency table has some blank spaces.
First factor
A
Second factor
B
C
Total
Total
a Copy the table and fill in the blanks.
b Hence explain why it can be said that this contingency table has
degrees of freedom.
11 Explain why the formula for chi-squared contains:
a squaring before summing
b dividing by the expected value.
Section 2: Yates’ correction
It turns out that when the number of degrees of freedom $\nu = 1$ (i.e. a $2 \times 2$ contingency table) then
the approximation that $\sum \frac{(O_i - E_i)^2}{E_i} \sim \chi^2_\nu$ is not very good. To improve upon this you use an alternative
formula, called Yates’ correction.
Key point 3.5
Yates’ correction: $\chi^2_{\text{Yates}} = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i}$
Rewind
You met the modulus function, $|x|$, in A Level Mathematics Student Book 2, Chapter 3.
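To see the effect of Yates’ correction, this Python sketch compares the corrected and uncorrected values for a $2 \times 2$ table. The frequencies are hypothetical and the function name `chi_squared_2x2` is invented for illustration.

```python
# Hypothetical 2 by 2 table of observed frequencies; not data from the text.
observed = [
    [15, 5],
    [10, 20],
]


def chi_squared_2x2(table, yates=True):
    """Chi-squared value for a 2 by 2 table, optionally with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    correction = 0.5 if yates else 0.0
    total = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            # Subtract 0.5 from |O - E| before squaring when correcting.
            total += (abs(o - e) - correction) ** 2 / e
    return total


corrected = chi_squared_2x2(observed, yates=True)
uncorrected = chi_squared_2x2(observed, yates=False)
print(round(corrected, 3), round(uncorrected, 3))
```

In this example the correction reduces the calculated value, which makes the test less likely to reject the null hypothesis.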
WORKED EXAMPLE 3.4
This contingency table shows the results of
people in a driving test, along with their gender.
Gender
Male
Female
Result
Pass
Fail
Test at the
significance level if the outcome of the test is independent of gender.
Set up hypotheses.
$H_0$: Gender and result are independent.
$H_1$: Gender and result are not independent.
The expected values are:
Gender
Male
Result
Female
Find the expected values, using
Key point 3.1.
Pass
Fail
Use Key point 3.5.
so the critical value is
.
You cannot reject the null hypothesis; the test
outcome is independent of gender.
WORK IT OUT
3.1
Test at the
significance level if there is any association between teacher and test result:
Teacher
Mr A
Total
Mrs B
Pass
Result
Fail
Total
Which is the correct solution? Identify the errors made in the incorrect solutions.
A
If there is no association then each cell will be the same, so the expected values are:
Teacher
Mr
Mrs
Pass
Result
Fail
So
The critical value when
is
, so you can reject $H_0$; the result does not
; the result does not
depend on the teacher.
B
$H_0$: The result depends on the teacher.
$H_1$: The result does not depend on the teacher.
The expected values are:
Teacher
Mr
Mrs
Pass
Result
Fail
Using Yates’ correction:
The critical value is
, which is more than the calculated value, so do not reject $H_0$;
the result does depend on the teacher.
C
$H_0$: Teacher and result are independent.
$H_1$: Teacher and result are dependent.
The expected values are:
Teacher
Mr
Mrs
Pass
Result
Fail
Using Yates’ correction:
The critical value is
, which is more than the calculated value, so do not reject $H_0$;
the result and the teacher are independent.
Sometimes Yates’ correction only becomes clear after you combine rows and columns of a
contingency table.
WORKED EXAMPLE 3.5
This contingency table shows the location and ownership status of a random sample of
houses.
Owned outright
Owned with
mortgage
Rented
Urban
Rural
Does this sample provide evidence at the
significance level that ownership status depends on location?
First write the null and alternative hypotheses.
$H_0$: Ownership status and location are independent.
$H_1$: Ownership status and location are dependent.
Expected values:
Use Key point 3.1 to find the expected
values.
Owned
outright
Owned
with
mortgage
Rented
Urban
Rural
Since there are two cells with an expectation below 5, you must combine rows or columns.
The most reasonable combination here is the two types of owned category.
Combining the two owned categories gives observed data:
Owned
Rented
Urban
Rural
The expected data is:
Owned
Rented
Urban
Rural
You do not need to use Key point 3.1 again. You can just add the expected values of the
appropriate cells in the previous table.
Since there is now only one degree of freedom, it is appropriate to use Yates’ correction.
The critical value is
, which is greater than the observed value, so you do not reject $H_0$. There is no significant evidence
that ownership status depends on location.
EXERCISE 3B
1
Use Yates’ correction to test these contingency tables for evidence of association, using
significance.
a
i
ii
b
i
ii
2
Gregor Mendel, the founder of modern genetics, carefully observed peas and found these
results:
Wrinkled
Round
Yellow
Green
Show that the round or wrinkled appearance of the pea is independent of the colour at the
level of significance.
Did you know?
These results are actually suspiciously close to being perfect – some people believe
Mendel faked his results. However, it is not possible to conduct a hypothesis test to
check this. How is statistics used to check for authenticity in results? In particular, how
is Benford’s law used to check tax returns?
3
A scientist wanted to find out if a colleague could tell whether tea or milk was put in the cup first
when tea was prepared for her. The results are shown here.
Tea first
Milk first
Likes
Dislikes
Determine, at the
level of significance, whether the colleague’s enjoyment is independent
of whether tea or milk is added first.
Did you know?
‘The lady tasting tea’ was one of the experiments reported by eminent statistician
Ronald Fisher in his
book, The Design of Experiments. He used a variant on the chi-squared test, called Fisher’s exact test.
4
This table shows the number of books in libraries in rural and urban locations.
Number of books
to
Rural
Urban
Conduct a test at the
significance level to determine if the number of books differs between
rural and urban libraries.
5
These data show the number of murders each year and the amount spent on horror films in the
cinema across the last
years in the UK.
Amount spent on horror films in million £
to
Number of
murders
Test at the
significance level to see if there is an association between the amount spent on
horror films and the number of murders each year. Does this provide evidence that watching
horror films encourages people to commit murder?
6
In
admissions to the six largest departments in Berkeley, a university in California, followed
this pattern.
Accepted
Rejected
Male
Female
a Conduct a chi-squared test at
significance to show that acceptance patterns depend on
gender. Is a higher percentage of men or women admitted? Is this evidence of bias?
b Data from six departments are shown here.
Men
Department
Admitted
Women
Rejected
Admitted
Rejected
Total
Total
Conduct a test at the
significance level to determine if acceptance patterns vary in
different departments. In how many departments is the proportion of men admitted higher
than the proportion of women admitted?
Did you know?
This effect is called Simpson’s Paradox. You have to be very careful when using
statistics to support arguments!
7
Explain why the null hypothesis in a chi-squared test cannot be ‘The two variables are
dependent’.
Checklist of learning and understanding
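The $\chi^2$ distribution provides a very important method for deciding if two variables are
independent.
If the variables are independent, you use the formula $E_i = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$
to find the expected values in each cell.
The test statistic used is $\chi^2$.
$\nu$ is the number of degrees of freedom, calculated as (rows $- 1$) $\times$ (columns $- 1$).
If $\nu > 1$, then $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$.
If $\nu = 1$, then you use an alternative formula, called Yates’ correction: $\chi^2_{\text{Yates}} = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i}$.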
Mixed practice 3
1
What is the number of degrees of freedom for a
contingency table?
Choose from these options.
A
B
C
D More information needed.
2
A
contingency table has all expected frequencies larger than
and a chi-squared value of
. What is the largest range of values of
for which there is evidence that the two factors are
not independent at
significance?
Choose from these options.
A
B
C
D
3
The area manager of a bank obtained information on
randomly selected loans made by
the bank during the previous two years.
The loan outcomes were categorised as ‘Satisfactory’ or as a ‘Bad debt’.
The loan recipient types were categorised as ‘Individual’, ‘Small business’ or ‘Large business’.
Recipient
Individual
Outcome
Small business
Large business
Satisfactory
Bad debt
Using a $\chi^2$ distribution and the
level of significance, test whether the outcome of a loan is
independent of the type of recipient.
Interpret your conclusion in the context of the question.
[© AQA 2013]
4
This contingency table shows the data on hair colour and eye colour for a sample of
children.
Eye colour
Blue
Hair colour
Green
Brown
Brown
Blonde
a Assuming that hair colour and eye colour are independent, calculate the expected
frequencies.
b Calculate the value of the $\chi^2$ statistic for this data and state the number of degrees of
freedom.
c Perform a suitable hypothesis test at the
level of significance to decide whether hair
colour and eye colour are independent. State your hypotheses and your conclusion clearly.
5
A nurse thinks that she has noticed that more boys are born at certain times of the year. She
records the data for babies born in her hospital in one year.
Spring
Summer
Autumn
Winter
Boy
Girl
Test at the
significance level whether her data gives evidence for any association between
the gender of the baby and the time of the year. You must show all your working clearly.
6
Find the value of the appropriate chi-squared test statistic (to three significant figures) for this
contingency table.
Choose from these options.
A
B
C
D
7
A large estate agency would like all the properties that it handles to be sold within three
months. A manager wants to know whether the type of property affects the time taken to sell
it. The data for a random sample of properties sold are tabulated here.
Type of property
Flat
Terraced
Semidetached
Detached
Total
Sold within three months
Sold in more than three months
Total
a Conduct a $\chi^2$ test, at the
level of significance, to determine whether there is an
association between the type of property and the time taken to sell it. Explain why it is
necessary to combine two columns before carrying out this test.
b The manager plans to spend extra money on advertising for one type of property in an
attempt to increase the number sold within three months. Explain why the manager might
choose:
i terraced properties
ii flats.
[© AQA 2013]
8
Fiona, a lecturer in a school of engineering, believes that there is an association between the
class of degree obtained by her students and the grades they had achieved in A Level
Mathematics.
In order to investigate her belief, she collected the relevant data on the performances of a
random sample of
recent graduates who had achieved grades or in A level
Mathematics. These data are tabulated here.
Class of degree
Total
A Level grade
Total
a Conduct a $\chi^2$ test, at the
level of significance, to determine whether Fiona’s belief is
justified.
b Make two comments on the degree performance of those students in the sample who
achieved a grade in Level Mathematics.
[© AQA 2012]
9
An organisation kept details of sideswipe accidents involving heavy goods vehicles (HGVs)
during 2006.
The type of each sideswipe accident was recorded as changing lane to the left, changing lane
to the right or overtaking moving vehicle.
The HGV involved was identified as either British registered (right-hand drive) or foreign
registered (left-hand drive).
The table summarises details for a random sample of
sideswipe accidents.
Type of sideswipe accident
Changing
lane to
the left
Changing
lane
to the right
Total
Overtaking
moving
vehicle
British registered HGV
Foreign registered HGV
Total
a i Investigate, at the
significance level, whether the type of sideswipe accident is
independent of whether the HGV involved was British registered or foreign registered.
ii Describe any differences found in the type of sideswipe accident between British
registered and foreign registered HGVs.
b A further random sample of
serious HGV accidents was investigated. It was found that
of these involved drivers who were
years of age or younger. Of these
accidents,
resulted in prosecution for a driving offence. Of the other accidents, which involved drivers
over the age of
years,
resulted in prosecution for a driving offence.
i Form a $2 \times 2$ contingency table from this information.
ii Carry out a $\chi^2$ test, at the
significance level, to investigate whether the age of the
driver is independent of whether a prosecution for a driving offence resulted.
Interpret your conclusion in context.
[© AQA 2011]
10 The director of a large company wants to know whether there is any association between the
ages of her staff and the departments they work in. The table shows the data for a sample of
employees.
Accounts
Personnel
Marketing
Communications
Perform a suitable test at the
level of significance to decide whether there is any
association between age and department.
11 The table shows the experience of a bank over a long period of the types of loan that they
give and whether they are repaid or defaulted (i.e. not repaid).
Repaid
Defaulted
Personal
Mortgage
Business
a Show that whether or not a loan gets repaid depends on the type of loan.
b A statistician wants to sample
loans at random. Show that
would not show
dependence between the two values, using
significance.
c Find the smallest whole number
for which the sample would be expected to show
dependence at
significance. You can assume that all expected values are above 5.
d State an additional assumption required for your calculations in parts b and c.
12 Research was carried out to investigate for a possible connection between weekly alcohol
consumption and development of Type 2 diabetes. In the research report, it was stated that a
sample of
women, aged between
and , was studied and that
of these women went
on to develop Type diabetes.
The women were categorised according to their average level of weekly alcohol
consumption. This was measured, in grams of alcohol per week, as ‘less than ’, ‘between
and ’ or ‘more than 30’.
The results are summarised in the table.
Type 2 diabetes
developed
Yes
No
Less than
Average level of weekly alcohol
consumption
Between
and
More than
a Test, at the
level of significance, whether the development of Type
diabetes is
independent of the average level of weekly alcohol consumption.
b A medical reviewer for a newspaper read the report and then stated that people should
increase their weekly alcohol consumption in order to decrease their chance of developing
Type diabetes.
Make two comments on his statement, referring to both the study and the sources of
association, if any, identified when carrying out the test in part a.
c In fact,
women were involved in the research but the frequencies in the resulting
contingency table had been divided by
in order to make the calculation simpler.
The test in part a was therefore repeated using the correct frequencies.
For this test, state:
i the critical value
ii the value of the test statistic
iii the conclusion.
[© AQA 2010]
4 Continuous distributions
In this chapter you will learn how to:
describe probabilities of continuous random variables
calculate expected statistics of continuous random variables and functions of
continuous random variables
find the median, mode and quartiles of continuous random variables
find the expected statistics of the sum of two continuous random variables
work with the sum of two normally distributed random variables.
If you are following the A Level course, you will also learn how to:
convert between probability density function, $f(x)$, and cumulative probability
function, $F(x)$
use distributions of random variables that are part discrete and part continuous
use two new probability distributions – the rectangular and the exponential
use the cumulative distribution function to find the distribution of the function of a
random variable.
Before you start …
Chapter 1
You should know how to calculate
expectations and variances for
discrete distributions.
1 Find
, given that
has the distribution:
A Level Mathematics
Student Book 1,
Chapter 20
You should know the meaning of the
statistical measures covered in AS
Level Mathematics
2 Find the interquartile range
of:
A Level Mathematics Student Book 1, Chapter 14
You should know how to integrate all functions from AS Level Mathematics.
3 Find
A Level Mathematics Student Book 2, Chapters 9 and 11
You should know how to integrate all functions from A Level Mathematics.
4 Find
A Level Mathematics Student Book 2, Chapter 20
You should know how to use the rules of probability including conditional probability.
5 Given that
and
, find
.
A Level Mathematics Student Book 2, Chapter 21
You should be able to perform calculations with a normal distribution.
6 Given that
,
find
.
From discrete to continuous
In Chapter 1 you saw that being able to describe random variables allowed you to make predictions about
their properties. However, a major limitation was that the methods in chapter 1 only applied to discrete
variables. In reality, many variables you are interested in, such as height, weight and time, are continuous
variables. In this chapter you will extend the methods from chapter 1 to work with continuous random
variables.
Section 1: Continuous random variables
Consider these data for the masses of several bags of rice labelled ‘
Mass
’.
Frequency
Not all of the data in category
has a mass of exactly
. A bag with mass
or
would be included in this category. It is impossible to list all the different possible actual masses, and it is
impossible to measure the mass absolutely accurately. When you collect continuous data, you have to put
it into groups. This means that you cannot talk about the probability of a single value of a continuous
random variable (CRV). You can only talk about the probability of the CRV being in a specified interval.
[Figure: number line for $x$ from 5.05 to 5.15, with the value 5.0879546 marked]
A useful way of representing probabilities of a CRV is as an area under a graph. The probability of a single
value would correspond to the ‘area’ of a vertical line, which would be zero. However, you can find the probability
of the CRV lying in any interval by integration.
The function which you have to integrate is called the probability density function (PDF), and it is often
denoted $f(x)$. The defining feature of $f(x)$ is that the area between two values is the probability of the CRV
falling between those two values.
Tip
For a continuous random variable, it does not matter whether you use strict inequalities $(a < X < b)$
or inclusive inequalities $(a \leq X \leq b)$.
Key point 4.1
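For a continuous random variable $X$ with probability density function $f(x)$:
$P(a < X < b) = \int_a^b f(x)\,\mathrm{d}x$
[Figure: graph of $y = f(x)$ with the area under the curve between $x = a$ and $x = b$ shaded to represent $P(a < X < b)$]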
As with discrete probabilities, the total probability over all cases must equal 1. Also, no probability can ever
be negative. This provides two requirements for a function to be a probability density function.
Key point 4.2
For $f(x)$ to be a probability density function, it must satisfy:
$f(x) \geq 0$, for all $x$
$\int_{-\infty}^{\infty} f(x)\,\mathrm{d}x = 1$
Tip
The limits $-\infty$ and $\infty$ represent the fact that, in theory, a continuous random variable can take
any real value. In practice, the limits of the integral are set to the lowest and the highest value
the variable can take.
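The two requirements in Key point 4.2, and the interval probability in Key point 4.1, can be checked numerically. The sketch below assumes a made-up density, $f(x) = 3x^2$ for $0 \leq x \leq 1$ and zero otherwise, and uses a simple Simpson’s rule helper named `integrate`; neither comes from the text.

```python
def pdf(x):
    """Hypothetical density: f(x) = 3x^2 on [0, 1], zero elsewhere."""
    return 3 * x ** 2 if 0 <= x <= 1 else 0.0


def integrate(f, a, b, n=1000):
    """Approximate the integral of f from a to b by composite Simpson's rule."""
    if n % 2:
        n += 1  # Simpson's rule needs an even number of strips
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Requirement 1: the density is never negative (checked on a grid of points).
non_negative = all(pdf(i / 100) >= 0 for i in range(-50, 151))

# Requirement 2: the total area is 1, integrating over the values X can take.
total_area = integrate(pdf, 0, 1)

# Probability of X lying between 0.5 and 1.
prob = integrate(pdf, 0.5, 1)

print(non_negative, round(total_area, 6), round(prob, 6))
```

The limits 0 and 1 are used instead of $-\infty$ and $\infty$ because the density is zero outside that interval.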
WORKED EXAMPLE 4.1
The continuous random variable shown has probability density function:
f(x)
O
1
x
a Find the value of .
b Find the probability of $X$ being between
and
.
a The total area is 1. The limits are
and
because the PDF is only non-zero between
and
.
b Use the formula in Key point 4.1 and substitute the value of
found in part a.
EXERCISE 4A
1
For each of these distributions find the possible values of the unknown parameter .
a
i
ii
b
i
ii
c
i
ii
d
i
ii
e
i
ii
f
i
ii
g
i
ii
h
i
ii
2
In each part, a continuous random variable
has the given probability density function.
a
i Find
ii Find
b
i Find
ii Find
c
i Find
ii Find
3
.
In each part, a continuous random variable
a
i Find
if
ii Find
if
i Find
if
ii Find
if
b
has the given probability density function.
c
4
i Find
if
ii Find
if
A model predicts that the angle,
, an alpha particle is deflected by a nucleus is modelled by
the PDF
a Find the value of the constant
b
alpha particles are fired at a nucleus. Assuming that the model is correct, estimate the
number of alpha particles deflected by less than
5
.
The probability density function of finding a seed at a distance
. The minimum distance seed is found from the tree is
being found more than
6
A random variable
from a tree is proportional to
. Find the probability of a seed
from the tree.
has PDF
Find the exact value of
7
Given that the continuous random variable
has probability density function
, find the interquartile range of
8
9
A continuous random variable
a Find
in terms of
if
b Find
in terms of
if
The continuous random variable
.
has probability density function
has probability density function
otherwise. The probability of two independent observations of
Find the values of
10
for
both being above
and
is
.
and of .
The continuous random variable
has probability density function
. Find
has probability density function given by
for
.
11 The continuous random variable
. Prove that there is only one possible value of , and state its value.
Section 2: Expectation and variance of continuous random variables
The expressions for expectation and variance of continuous random variables all involve integration.
Key point 4.3
The expectation and variance of a continuous random variable are:
$E(X) = \int x f(x)\,dx$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \int x^2 f(x)\,dx - [E(X)]^2$
Tip
You might notice that the expressions for $E(X)$ and $\mathrm{Var}(X)$ look similar to those for discrete
random variables, but with integration instead of summation signs. This is because there is a link
between sums and integrals.
You need to evaluate these integrals over the whole domain of the probability density function.
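As an illustration of evaluating $E(X) = \int x f(x)\,dx$ and $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$ over the whole domain, here is a short sketch. The density $f(x) = 3x^2$ on $0 \leq x \leq 1$ is a hypothetical choice, and the Simpson's rule helper stands in for a calculator.

```python
import math


def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Hypothetical pdf (an assumption for illustration): f(x) = 3x^2 on [0, 1], zero elsewhere.
LOW, HIGH = 0.0, 1.0


def pdf(x):
    return 3 * x ** 2


mean = simpson(lambda x: x * pdf(x), LOW, HIGH)           # E(X)
mean_sq = simpson(lambda x: x ** 2 * pdf(x), LOW, HIGH)   # E(X^2)
variance = mean_sq - mean ** 2                            # Var(X)
sd = math.sqrt(variance)

print(mean, variance, sd)
```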
WORKED EXAMPLE 4.2
A continuous random variable has pdf:
Find $E(X)$ and the standard deviation of $X$.
You can do the definite integration on your calculator.
To find the standard deviation you must first find $\mathrm{Var}(X)$, which requires you to find $E(X^2)$.
It is also possible to find the median and mode for a continuous distribution.
The defining feature of the median is that half of the data should be below this value and half above. You
can interpret this in terms of probability.
Key point 4.4
If you represent the median of a continuous random variable with PDF $f(x)$ by $m$, then it satisfies
$\int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$.
The mode is the value of $x$ at the maximum value of $f(x)$.
Common error
Don't forget to look at the end points of the function when finding the mode.
You can use similar ideas to find the quartiles (or any other percentile). For example, if $Q_1$ is the lower
quartile and $Q_3$ is the upper quartile then
$\int_{-\infty}^{Q_1} f(x)\,dx = 0.25$ and $\int_{-\infty}^{Q_3} f(x)\,dx = 0.75$.
Although the lower limit is written as minus infinity, in practice it starts from the lowest value for which the
probability density function is defined.
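The median and quartile conditions can be solved numerically when the equation is awkward. The sketch below assumes a hypothetical density $f(x) = 2x$ on $0 \leq x \leq 1$, integrates from the lowest value of the domain, and uses bisection to find where the accumulated area reaches the required proportion. It also locates the mode on a grid that includes the end points.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Hypothetical pdf (an assumption for illustration): f(x) = 2x on [0, 1].
LOW, HIGH = 0.0, 1.0


def pdf(x):
    return 2 * x


def area_below(x):
    """Probability of being below x, integrating from the lowest value of the domain."""
    return simpson(pdf, LOW, x)


def quantile(p, tol=1e-10):
    """Find x with area_below(x) = p by bisection (the area increases with x)."""
    lo, hi = LOW, HIGH
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if area_below(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


median = quantile(0.5)
lower_quartile = quantile(0.25)
upper_quartile = quantile(0.75)

# Mode: largest value of f on a grid that includes both end points.
grid = [LOW + i * (HIGH - LOW) / 1000 for i in range(1001)]
mode = max(grid, key=pdf)

print(median, lower_quartile, upper_quartile, mode)
```

For this density the maximum is at the end point $x = 1$, where the derivative is not zero, so a search for stationary points alone would miss it.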
WORKED EXAMPLE 4.3
Find the median and mode of the random variable
For median
with probability density function
:
Use the formula in Key point 4.4 with the lower limit as .
Using your calculator:
This is a cubic equation. Use your calculator to solve it.
For the mode, check for a maximum point. This could be where the
derivative is zero or at an end point.
Hence the mode is .
The largest of these three numbers is
.
EXERCISE 4B
1
Find
function.
a
i
ii
b
i
ii
c
i
, the median of
, the mode of
and
if
has the given probability density
ii
d
i
ii
2
a Given that
, find
if:
i
ii
b Given that
, find
if:
i
ii
3
The continuous random variable
a Find the expected mean of
b Find
4
has pdf
.
.
A continuous random variable
has pdf
a Find the value of the constant .
b Find
5
.
Consider the function
a Show that, for all values of , the function
b The random variable
6
has probability density function
. Find
in terms of .
is a continuous random variable with probability density function
a Show that
b Given that
7
satisfies the conditions to be a PDF.
Given that
.
, find the exact value of .
, is a probability distribution, find
.
and prove that
Section 3: Expectation and variance of functions of a random variable
Linear transformations
Suppose the average height of students in a class was
and their standard deviation was
. If they
all stood on their
-high chairs then the new average height would be
, but the range, and any
other measure of variability, would not change, so the standard deviation would still be
. In other
words, if you add a constant on to a variable, you add the same constant on to the expectation, but the
variance does not change:
$E(X + c) = E(X) + c$, $\mathrm{Var}(X + c) = \mathrm{Var}(X)$
Rewind
You have already met this idea for discrete random variables in Key point 2.5. In this chapter
you extend it to continuous random variables.
If, instead, each student were given a magical growing potion that doubled their heights, the new average
height would be
. However, the range, along with any other measure of variability, would have
doubled, so the new standard deviation would be
. This means that their variance would change from
to
. In other words, if you multiply a variable by a constant, you multiply the expectation by
the constant and multiply the variance by the constant squared:
$E(aX) = aE(X)$, $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
Common error
It is important to know that this only works for the structure
function. So, for example,
cannot be simplified to
or
, which is called a linear
.
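Key point 4.5
For a random variable $X$ with expectation $E(X)$ and variance $\mathrm{Var}(X)$, if $a$ and $b$ are constants, then
$E(aX + b) = aE(X) + b$, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$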
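The linear transformation rule, $E(aX + b) = aE(X) + b$ and $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$, can be checked exactly on a small example. This sketch assumes a hypothetical three-value distribution and the constants $a = -2$, $b = 10$, and uses exact fractions so that the two sides can be compared without rounding.

```python
from fractions import Fraction


def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) computed directly from the definition E[(X - mean)^2]."""
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())


# Hypothetical distribution of X (an assumption for illustration).
dist_x = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Y = aX + b with a negative coefficient.
a, b = -2, 10
dist_y = {a * x + b: p for x, p in dist_x.items()}

print(mean(dist_x), variance(dist_x))
print(mean(dist_y), variance(dist_y))
```

The variance of $Y$ is multiplied by $a^2 = 4$ even though $a$ is negative, and the added constant $b$ has no effect on it.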
WORKED EXAMPLE 4.4
A
length of pipe is cut into a long pipe with average length
and standard deviation
.
The leftover piece is used as a short pipe. Find the mean and standard deviation of the short pipe.
Length of long pipe
Length of short pipe
Define your variables.
Connect your variables.
Apply Key point 4.5.
So the standard deviation of the short pipe is also equal to that of the long pipe.
General transformations
Key point 1.6 stated that for a discrete random variable $E(g(X)) = \sum_i g(x_i)\,p_i$.
You can extend this to continuous random variables by changing the probability into a probability density
function and integrating instead of summing.
When finding the variance you used the fact that $E(X^2) = \int x^2 f(x)\,dx$.
This can be generalised to any function of $X$.
Common error
You will always get a positive variance (since square numbers are always positive), even if the
coefficients are negative. If you find you have a negative variance, something has gone wrong!
Key point 4.6
If $X$ is a continuous random variable with pdf $f(x)$, then:
$E(g(X)) = \int g(x) f(x)\,dx$
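To see the general rule $E(g(X)) = \int g(x) f(x)\,dx$ at work, the sketch below takes the side of a cube as a random variable and compares the expected volume with the cube of the expected side. The density $f(x) = 2x$ on $0 \leq x \leq 1$ is a hypothetical choice made only for illustration.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Hypothetical pdf of the side length (an assumption): f(x) = 2x on [0, 1].
LOW, HIGH = 0.0, 1.0


def pdf(x):
    return 2 * x


def expectation(g):
    """E(g(X)): integrate g(x) * f(x) over the domain of the pdf."""
    return simpson(lambda x: g(x) * pdf(x), LOW, HIGH)


expected_side = expectation(lambda x: x)         # E(X)
expected_volume = expectation(lambda x: x ** 3)  # E(X^3)

print(expected_side, expected_volume, expected_side ** 3)
```

Here $E(X^3)$ is larger than $[E(X)]^3$, which shows that a non-linear function cannot simply be applied to the expectation.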
WORKED EXAMPLE 4.5
Given that the random variable
otherwise, find
has probability density function
and
because that is where
EXERCISE 4C
Given that
a
, find:
i
ii
b
i
ii
c
i
ii
d
i
ii
e
i
ii
2
Given that
and
.
Identify
1
for
.
, find:
. The limits are between
is not zero.
and
a
i
ii
b
i
ii
c
i
ii
d
i
ii
e
i
ii
3
Given that
find:
a
is a continuous random variable with PDF
for
and
otherwise,
i
ii
b
i
ii
c
i
ii
d
i
ii
4
The expected distance of a random taxi journey is
miles with standard deviation
miles.
The charge for a taxi journey is £ plus £ per mile (so that, for example, a
mile journey
would cost £
). Find:
a the expected value
b the standard deviation in the charge for a taxi journey.
5
The random variable
has
and
. Given that
and
, find
.
6
Daniel has hours of playtime each Sunday afternoon. In that time he either reads or plays
games. If the expected amount of time reading is hours with a standard deviation of
hours,
find:
a the expected amount of time playing games
b the standard deviation in the amount of time spent playing a game.
7
The side of a cube, , is a continuous random variable with pdf
for
otherwise.
a Find
.
b Find the expected volume of the cube.
Common error
Notice that the answer to part b is not the cube of the answer to part a.
and
8
The continuous random variable
and
has probability density function
for
has probability density function
for
otherwise.
Find:
a
b
c
9
The continuous random variable
and otherwise.
a Find
b Find
.
.
where
is a positive whole number.
Section 4: Sums of independent random variables
A tennis racquet is formed by adding together two components – the handle and the head. If both
components have their own distribution of length and they are combined together randomly then you have
formed a new random variable – the length of the racquet. It is not surprising that the average length of
the racquet is the sum of the average lengths of the parts, but with a little thought you can reason that the
standard deviation will be less than the sum of the standard deviations of the parts. To get extremely long
or extremely short tennis racquets you must have extremes in the same direction for both the handle and
the head. This is not very likely. It is more likely that:
both are close to average
an extreme value is paired with an average value
an extreme value in one direction is balanced by another.
Tip
The first of the results in Key point 4.7 is true even if $X$ and $Y$ are not independent.
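Key point 4.7
For independent random variables $X$ with expectation $E(X)$ and variance $\mathrm{Var}(X)$, and $Y$ with
expectation $E(Y)$ and variance $\mathrm{Var}(Y)$:
$E(X + Y) = E(X) + E(Y)$
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$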
The results in Key point 4.7 extend to more than two variables.
WORKED EXAMPLE 4.6
The mean thickness of the base of a burger bun is
The mean thickness of a burger is
with variance
The mean thickness of the top of a burger bun is
with variance
.
with variance
.
Find the mean and standard deviation of the total height of a whole burger in a bun, assuming that
the thicknesses of the individual parts are independent.
Define your variables.
Connect your variables.
Apply Key point 4.7.
So the standard deviation of
is
.
In Key point 4.7 it was stressed that $X$ and $Y$ have to be independent, but this does not mean that they
have to be drawn from different populations. They could be two different observations of the same
population, for example, the heights of two different people added together. This is a different variable
from the height of one person doubled. Use a subscript to emphasise when there are repeated
observations from the same population:
$X_1 + X_2$ means adding together two different observations of $X$
$2X$ means observing $X$ once and doubling the result.
The expectation of both of these combinations is the same: $2E(X)$. However, the variance is different.
From Key point 4.7: $\mathrm{Var}(X_1 + X_2) = \mathrm{Var}(X_1) + \mathrm{Var}(X_2) = 2\,\mathrm{Var}(X)$
From Key point 4.5: $\mathrm{Var}(2X) = 4\,\mathrm{Var}(X)$
So the variability of a single observation doubled is greater than the variability of two independent
observations added together. This is consistent with the earlier argument about the possibility of
independent observations cancelling out extreme values.
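The difference between doubling one observation and adding two independent observations can be confirmed by listing every outcome. This sketch assumes a fair six-sided dice as the underlying variable (an illustrative choice) and works with exact fractions.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p = Fraction(1, 6)  # assumed fair dice


def mean_and_variance(outcomes):
    """outcomes is a list of (value, probability) pairs covering every case."""
    m = sum(v * q for v, q in outcomes)
    var = sum((v - m) ** 2 * q for v, q in outcomes)
    return m, var


single = [(x, p) for x in faces]
doubled = [(2 * x, p) for x in faces]                                   # 2X
two_rolls = [(x1 + x2, p * p) for x1, x2 in product(faces, repeat=2)]   # X1 + X2

print(mean_and_variance(single))
print(mean_and_variance(doubled))
print(mean_and_variance(two_rolls))
```

Both combinations have the same expectation, but the doubled roll has variance $4\,\mathrm{Var}(X)$ while the sum of two rolls has variance $2\,\mathrm{Var}(X)$.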
You can also combine the results in Key points 4.5 and 4.7 to look at other linear combinations of
independent random variables.
WORKED EXAMPLE 4.7
The volume
of lemonade Chris purchases at a supermarket is a discrete random variable with
mean
and standard deviation
. The volume
of lemonade Chris drinks on the
journey home is a continuous random variable with mean
and standard deviation
.
Assume that is independent of .
after the journey home.
is the random variable: volume of lemonade in ml remaining
a Find the expected mean and standard deviation of
b How realistic is the assumption that
.
is independent of ?
a
Write the required random variable in
terms of the other two random
variables.
You only have a rule for sums so you
need to write it as a sum.
Use Key point 4.7.
Use Key point 4.5.
Use Key point 4.7.
Use Key point 4.5.
Remember that variance is the square of
the standard deviation.
So the standard deviation of
is
b Although the two variables might reasonably be
thought to be independent there are also some
reasons to doubt this. If Chris is very thirsty he
might buy more lemonade and then drink more.
If Chris does not buy any lemonade then he
cannot drink any.
Worked example 4.7 illustrates that the theories of Key points 4.5 and 4.7 are applicable to both continuous
and discrete random variables, or indeed combinations of the two. It also highlights the counter-intuitive
fact that $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
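Combining Key points 4.5 and 4.7 gives the mean and standard deviation of any combination $aX + bY$ of independent variables. The sketch below is one way to organise the calculation; the function name and the numbers (amounts bought and drunk, in ml) are hypothetical and are not the values from Worked example 4.7.

```python
import math


def combine(a, mean_x, sd_x, b, mean_y, sd_y):
    """Mean and standard deviation of aX + bY, assuming X and Y are independent."""
    mean = a * mean_x + b * mean_y
    variance = a ** 2 * sd_x ** 2 + b ** 2 * sd_y ** 2  # variances add, even when b < 0
    return mean, math.sqrt(variance)


# Hypothetical values: X has mean 500 and sd 30, Y has mean 200 and sd 40.
mean_r, sd_r = combine(1, 500, 30, -1, 200, 40)  # R = X - Y

print(mean_r, sd_r)
```

The standard deviations themselves are never added or subtracted; the calculation always passes through the variances.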
EXERCISE 4D
1
Let
and be two independent variables with
the expectation and variance of:
a
,
,
and
. Find
i
ii
b
i
ii
c
i
ii
d
i
ii
e
i
.
ii
2
Let
and
be two independent variables with
,
,
and
. Find:
a
b
c
d
3
.
and
are two independent observations of the random variable
with
and
.
The sample mean,
.
, of these two observations is also a random variable defined by
a Show that
.
b Find
4
.
The average mass of a man in an office is
with standard deviation
. The average mass of a
woman in the office is
with standard deviation
. The empty lift has a mass of
the expectation and standard deviation of the total mass of the lift when women and
inside?
. What is
men are
5
A weighted dice has mean outcome with standard deviation . Brian rolls the dice once and doubles
the outcome. Camilla rolls the dice twice and adds the results together. Work out the expected mean
and standard deviation of the difference between their scores.
6
Exam scores at a large school have mean
and standard deviation . Two students are selected at
random. Find the expected mean and standard deviation of the difference between their exam scores.
7
Adrian cycles to school with a mean time of
Pamela walks to school with a mean time of
minutes and a standard deviation of
minutes and a standard deviation of
minutes.
minutes.
They each calculate the total time it takes them to get to school over a five day week. Find the
expected mean and standard deviation of the difference in the total weekly journey times, assuming
journey times are independent.
8
is the random variable mass of a gerbil. Explain the difference between
and
.
Section 5: Linear combinations of normal variables
Although the proof is beyond the scope of this course, it turns out that any linear combination of normal
variables will also follow a normal distribution. You can use the methods from Section 4 to find out the
parameters of this distribution.
Rewind
You studied the normal distribution in A Level Mathematics Student Book 2, Chapter 21.
Key point 4.8
If $X$ and $Y$ are independent random variables following a normal distribution and
$Z = aX + bY + c$, then $Z$ also follows a normal distribution.
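A probability for a linear combination of independent normal variables can be found by first working out the mean and variance of the combination and then using the normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of a calculator; the helper name and the distributions $X \sim N(10, 3^2)$ and $Y \sim N(6, 4^2)$ are hypothetical.

```python
import math
from statistics import NormalDist


def linear_combination(a, x, b, y, c=0.0):
    """Distribution of aX + bY + c for independent normal X and Y."""
    mean = a * x.mean + b * y.mean + c
    sd = math.sqrt(a ** 2 * x.variance + b ** 2 * y.variance)
    return NormalDist(mean, sd)


# Hypothetical distributions (assumptions for illustration).
X = NormalDist(mu=10, sigma=3)
Y = NormalDist(mu=6, sigma=4)

Z = linear_combination(1, X, 1, Y)  # Z = X + Y
prob = 1 - Z.cdf(21)                # P(Z > 21)

print(Z.mean, Z.stdev, prob)
```

Here $Z$ has mean $16$ and standard deviation $5$, not $3 + 4 = 7$, because it is the variances that are added.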
WORKED EXAMPLE 4.8
Given that
,
and
, find
.
Use Key point 4.5.
State the distribution of
.
Use your calculator to find the probability.
WORKED EXAMPLE 4.9
Given that
and four independent observations of
Express
are made, find
.
in terms of observations of
.
Use Key point 4.7.
State the distribution of
.
Use your calculator to find the probability.
Rewind
In Chapter 2 you met the idea that the Poisson distribution was scalable. You can now interpret
this as meaning that the sum of two Poisson variables is also Poisson. This is the only other
distribution in this course that has this property. However, it only applies to sums of Poisson
distributions – not to differences or multiples or linear combinations.
EXERCISE 4E
1
Given that
a
i
ii
and
, find:
b
i
ii
c
i
ii
d
i
ii
e
i
ii
f
2
i
, where
is the average of
ii
, where
is the average of
observations of
observations of
.
.
An airline has found that the masses of their passengers follow a normal distribution with mean
and variance
.
The masses of their hand luggage follow a normal distribution with mean
.
and variance
a State the distribution of the total mass of a passenger and their hand luggage and find any
necessary parameters.
b What is the probability that the total mass of a passenger and their luggage exceeds
3
Evidence suggests that the times Aaron takes to run
are normally distributed with mean
and standard deviation
. The times Bashir takes to run
are normally
distributed with mean
and standard deviation
.
a Find the mean and standard deviation of the difference
Bashir’s times.
b Find the probability that Aaron finishes a
between Aaron’s and
race before Bashir.
c What is the probability that Bashir beats Aaron by more than
4
?
?
A machine produces metal rods so that their lengths follow a normal distribution with mean
and variance
.
The rods are checked in batches of six, and a batch is rejected if the average length is less than
or more than
.
a Find the distribution, including any necessary parameters, of the mean of a random sample
of six rods.
b Hence find the probability that a batch is rejected.
5
The distribution of lengths of pipes produced by a machine is normal with mean
standard deviation
.
a What is the probability that a randomly chosen pipe has a length of
or more?
b What is the probability that the average length of a randomly chosen set of
type is
or more?
6
and
pipes of this
The masses,
, of male birds of a certain species are normally distributed with mean
and standard deviation
.
The masses,
, of female birds of this species are normally distributed with mean
standard deviation
.
a Find the mean and variance of
and
.
b Find the probability that the mass of a randomly chosen male bird is more than twice the
mass of a randomly chosen female bird.
c Find the probability that the total mass of three male birds and
independently) exceeds
.
7
female birds (chosen
A shop sells apples and pears. The masses, in grams, of the apples can be assumed to have a
distribution and the masses of the pears, in grams, can be assumed to have a
distribution.
a Find the probability that the mass of a randomly chosen apple is more than double the mass
of a randomly chosen pear.
b A shopper buys
apples and a pear. Find the probability that the total mass is greater than
.
8
The length of a corn snake is normally distributed with mean
The probability that a randomly selected sample of
is
.
corn snakes has an average of above
.
Find the standard deviation of the length of a corn snake.
9
a In a test, boys have scores that follow the distribution
. Girls’ scores follow
What is the probability that a randomly chosen boy and a randomly chosen girl differ in
scores by less than ?
.
b What is the probability that a randomly chosen boy scores less than three-quarters of the
mark of a randomly chosen girl?
10 The daily rainfall in Algebraville follows a normal distribution with mean
deviation
and standard
.
On a randomly chosen day, there is a probability of
that the rainfall is greater than
In a randomly chosen seven-day week, there is a probability of
is less than
.
a Find the value of
.
that the mean daily rainfall
and of .
b What assumption was required in performing this calculation? How reasonable is this
assumption?
11 Anu uses public transport to go to school each morning. The time she waits each morning for
the transport is normally distributed with a mean of
minutes.
minutes and a standard deviation of
a On a specific morning, what is the probability that Anu waits more than
minutes?
b During a particular week (Monday to Friday), what is the probability that:
minutes
i her total morning waiting time does not exceed
ii she waits less than
minutes on exactly
mornings of the week
iii her average morning waiting time is more than
minutes?
c Given that the total morning waiting time for the first four days is
probability that the average for the week is over
minutes.
minutes, find the
d Given that Anu’s average morning waiting time in a week is over
minutes, find the
probability that it is less than
Tip
Only consider the last day.
minutes.
Section 6: Cumulative distribution functions
In Key point 4.4 you saw a method for finding the median. This method can be generalised to find any
percentile, using a function called a cumulative distribution function. This function has many surprising
uses because, unlike a probability density function, it represents a real probability so can be combined
using laws of probability.
A cumulative distribution function (CDF) measures the probability of a random variable being less
than or equal to a particular value. Normally, if the probability density function is called $f(x)$, the
cumulative distribution function is called $F(x)$.
Key point 4.9
For a continuous distribution
$F(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)\,dt$
Tip
The $t$ in the integral is a dummy variable. You could replace it with any other symbol. The
only real variable in this expression is the $x$ in the upper limit, which corresponds to the $x$ in
the left hand expression.
Since you can undo integration by differentiation, you can recover the probability density function from
$F(x)$.
Key point 4.10
Given that $F(x)$ is the cumulative distribution function, then you can find the probability
density function, $f(x)$, using:
$f(x) = \frac{d}{dx} F(x)$
WORKED EXAMPLE 4.10
Find the cumulative distribution function, given that a continuous random variable
probability density function
If
If
.
If
for
and
has
otherwise.
State
when is below and above the range in which
is
defined. When is above
the probability of the random
variable being below is , because all observed values are
between and
.
:
Since there is no probability of the random variable being below
, the integral starts at
.
Once you have the cumulative distribution function you can use it to find the median, quartiles and
any other percentiles, since the $p$th percentile is defined as the value $x$ such that
$P(X \leq x) = p\%$, i.e. $F(x) = \frac{p}{100}$.
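Both uses of the cumulative distribution function can be sketched in a few lines: differentiating it to recover the density, and solving $F(x) = \frac{p}{100}$ for a percentile. The function $F(x) = x^2$ on $0 \leq x \leq 1$ is a hypothetical example, the derivative is estimated with a central difference, and the equation is solved by bisection.

```python
def cdf(x):
    """Hypothetical CDF (an assumption for illustration): F(x) = x^2 on [0, 1]."""
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x ** 2


def pdf_from_cdf(x, h=1e-6):
    """Estimate f(x) = dF/dx with a central difference (for x inside the domain)."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


def percentile(p):
    """Solve F(x) = p/100 by bisection; F increases from 0 to 1 on [0, 1]."""
    lo, hi = 0.0, 1.0
    target = p / 100
    for _ in range(60):
        mid = (lo + hi) / 2
        if cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(pdf_from_cdf(0.3))   # close to 2 * 0.3, since f(x) = 2x
print(percentile(25), percentile(50))
```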
Rewind
You saw that you could do this without explicitly referring to a cumulative distribution
function, in Exercise 4B.
WORKED EXAMPLE 4.11
The continuous random variable
has cumulative distribution function
a Find the probability density function of
b Find the lower quartile of
.
PDF is the derivative of CDF.
a
and
otherwise.
b At the lower quartile:
is non-zero only if
Therefore
.
Lower quartile is
.
th percentile.
Decide which solution to choose.
EXERCISE 4F
1
Find the cumulative distribution function for each of these probability density functions, and
hence find the median of the distribution.
a
i
ii
b
i
ii
2
Given each continuous cumulative distribution function, find the probability density function and
the median.
a
i
ii
b
i
ii
3
Find the exact value of the
for
and
percentile of the continuous random variable
otherwise.
4
A continuous random variable has cumulative distribution function
a Find the value of .
b Find the probability density function.
c Find the median of the distribution.
that has pdf
Section 7: Piecewise-defined probability density functions
A probability density function can have different function rules on different parts of its domain. Such a
function is said to be defined piecewise. All the techniques from Sections 1–6 still apply; however,
when evaluating definite integrals you need to split them into several parts.
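Splitting an integral at the points where the rule changes can be sketched as follows. The piecewise density used here, $f(x) = \frac{x}{4}$ for $0 \leq x \leq 2$ and $f(x) = \frac{1}{2}$ for $2 < x \leq 3$, is a hypothetical example, and each piece is integrated separately with Simpson's rule.

```python
def simpson(f, a, b, n=1000):
    """Approximate the integral of f over [a, b] using Simpson's rule (n must be even)."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


# Hypothetical piecewise pdf (an assumption): each entry is (left end, right end, rule).
PIECES = [
    (0.0, 2.0, lambda x: x / 4),
    (2.0, 3.0, lambda x: 0.5),
]


def integrate(g, lo, hi):
    """Integrate g(x) * f(x) from lo to hi, one piece of the domain at a time."""
    total = 0.0
    for left, right, rule in PIECES:
        a, b = max(lo, left), min(hi, right)
        if a < b:  # this piece overlaps the requested interval
            total += simpson(lambda x, rule=rule: g(x) * rule(x), a, b)
    return total


total_area = integrate(lambda x: 1, 0, 3)  # should be 1 for a valid pdf
mean = integrate(lambda x: x, 0, 3)        # E(X)
prob = integrate(lambda x: 1, 1, 2.5)      # P(1 < X < 2.5), spans both pieces

print(total_area, mean, prob)
```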
Rewind
You already met this idea in the context of kinematics in A Level Mathematics Student Book 1,
Chapter 16.
WORKED EXAMPLE 4.12
A continuous random variable
a Sketch
has probability density function
.
b Find the value of .
c Find
a
.
[Sketch of $f(x)$ against $x$: horizontal axis marked from 0 to 8, with $k$ marked on the vertical axis]
b Using the fact that the total
area under the graph of
must be :
The area is now made up of two separate parts, so you need
to work out the two separate areas and add these together.
c
Again, you need to split the integral for
into two parts.
You might be able to evaluate definite integrals on your
calculator.
WORKED EXAMPLE 4.13
A random variable
has probability density function
a Find the median of
.
b Find the cumulative distribution function of
a If the median is
If
.
, then
You don’t know whether the median is in
in
, so you need to try both cases.
:
or
Remember to check that any solution you find is
in the correct interval. In this case, neither of
these values can be the median.
If
:
You need to split the probability into two parts:
.
So the median is
b When
The median must be between
.
:
and .
You need to look at the two parts of the domain
separately.
When:
You need to split the probability into two parts to
use the different expressions.
Use
found.
Remember to write out the full expression for
.
So,
EXERCISE 4G
1
A continuous random variable
a Sketch the graph of
, which you have already
.
has probability density function
b Find the value of .
2
c Find the value of
such that
A random variable
has cumulative distribution function
a Find the median of
.
.
b Find the mean and the variance of
3
A continuous random variable
a Show that
.
has probability density function
.
b Write down the value of
.
c Find the upper quartile of
.
d Find the cumulative distribution function of
4
The continuous random variable
.
is defined by the probability density function
a Find the value of .
b Sketch the probability density function.
c Find
.
d Find the median of
5
Function
.
is defined by
a Show that
is a valid probability density function.
b Find the variance of a random variable
6
The continuous random variable
a Find the value of .
b Find the expectation of
.
whose probability density function is .
has probability density function
c Find the cumulative distribution function of
d Find the median of
.
e Find the lower quartile of
7
A continuous random variable
a Sketch
b Show that
c Find
.
.
.
.
d Find the exact value of
.
has probability density function
Section 8: Rectangular distribution
The rectangular distribution is related to the discrete uniform distribution. It is a distribution where
any equally sized part of the domain has an equal probability of occurring. It is defined by the
endpoints of the domain, $a$ and $b$. The probability density function is a constant, and this constant must
be chosen so that the total area under the graph is $1$.
Rewind
You met the discrete uniform distribution in Chapter 1.
Key point 4.11
If $X$ follows a rectangular distribution between $a$ and $b$, then
$f(x) = \frac{1}{b - a}$ for $a \leq x \leq b$.
Tip
The easiest way to get this result is not to use integration, but to realise that the graph forms
a rectangle with width $b - a$ and total area $1$.
[Diagram: rectangle between $x = a$ and $x = b$, with width $b - a$, height $\frac{1}{b - a}$ and Area = 1]
You can find the mean of this distribution by using integration.
WORKED EXAMPLE 4.14
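Prove that if $X$ is a random variable following a rectangular distribution over $[a, b]$ with $a < b$,
then $E(X) = \frac{a + b}{2}$.
$E(X) = \int_a^b x f(x)\,dx$
Use the definition of expectation from Key point 4.3.
$= \int_a^b \frac{x}{b - a}\,dx$
Use the PDF of a rectangular distribution from Key point 4.11.
$= \frac{1}{b - a}\left[\frac{x^2}{2}\right]_a^b = \frac{b^2 - a^2}{2(b - a)}$
Use the laws of integration.
$= \frac{(b - a)(b + a)}{2(b - a)}$
Use the difference of two squares.
$= \frac{a + b}{2}$
Since $a < b$, $b - a \neq 0$ and it can be cancelled.
You can use a similar method to find the variance.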
Fast forward
You are asked to prove this in question 6 in Exercise 4H.
Key point 4.12
Given that $X$ is a random variable following a rectangular distribution over $[a, b]$:
$E(X) = \frac{a + b}{2}$, $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$
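For a rectangular distribution, probabilities are areas of rectangles, and the mean and variance follow from $E(X) = \frac{a + b}{2}$ and $\mathrm{Var}(X) = \frac{(b - a)^2}{12}$. The sketch below applies this to a rounding error assumed to be rectangular over $[-0.5, 0.5]$; the function names and the numbers are hypothetical.

```python
import math


def rect_prob(a, b, lo, hi):
    """P(lo < X < hi) for X rectangular on [a, b]: overlap width divided by (b - a)."""
    overlap = max(0.0, min(hi, b) - max(lo, a))
    return overlap / (b - a)


def rect_mean(a, b):
    return (a + b) / 2


def rect_sd(a, b):
    return (b - a) / math.sqrt(12)  # square root of (b - a)^2 / 12


# Hypothetical rounding error, assumed equally likely anywhere in [-0.5, 0.5].
a, b = -0.5, 0.5
p_far = 1 - rect_prob(a, b, -0.2, 0.2)  # P(|X| > 0.2)

print(p_far, rect_mean(a, b), rect_sd(a, b))
```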
WORKED EXAMPLE 4.15
When a measurement is quoted to the nearest
it is equally likely to be anywhere within
of the stated value. A large number of measurements of different objects, all of which round to
, are made and their accurate values noted.
a Find the probability that an object quoted as being
away from
to the nearest
is actually more than
.
b Find the standard deviation of the difference between the quoted value and the true value (with
quoted values below the true value giving a negative difference)
a
.
follows a rectangular distribution over
.
Required probability is
.
Define variables.
Identify the distribution.
Write the required distribution in
mathematical terms.
Use areas of rectangles rather than
integration.
b
Use Key point 4.5.
Use the formula for
4.12.
So, standard deviation =
=
from Key point
EXERCISE 4H
1
Find these probabilities. In parts a to d,
a
b
c
ii
;
i
;
ii
;
ii
d
i
ii
e
.
;
i
i
follows a rectangular distribution over
;
;
;
;
i When a measurement is quoted to the nearest cm it is equally likely to be anywhere within
of the stated value. Find the probability that a measurement quoted as being
to
the nearest cm is actually above
.
ii A car’s milometer shows the number of completed miles it has done. Jerry’s car shows
miles. What is the probability that it will show
miles in the next
miles?
2
Find the expected mean and standard deviation of:
a
b
i
given that it follows a rectangular distribution over
ii
given that it follows a rectangular distribution over
i the true value of a result quoted as being
to the nearest
ii the true age of a boy who (honestly) describes himself as eighteen years old.
3
A piece is cut off one end of a log of length
anywhere along the log, find:
. Given that the cut is equally likely to be made
a the probability that the length of the piece is less than
b the expected mean and standard deviation of the length of the piece.
4
A string of length
is randomly cut into two pieces. Find the probability that the length of the
shorter piece is less than
.
5
Five random numbers are selected from the interval
smaller than
.
6
a Prove, using integration, that the variance of the rectangular distribution between
b Hence prove that the ratio
7
. Find the probability that they are all
is independent of
and
is
and , stating its value.
A rod of length
is cut into two parts. The position of the cut is uniformly distributed along the
length of the rod. Find the mean and standard deviation of the length of the shorter part.
Section 9: Exponential distribution
When you model the waiting interval until a first success in a Poisson-type situation you can use the
exponential distribution. It is defined by the number of successes in a unit interval of time, $\lambda$, and it
is written as $X \sim \mathrm{Exp}(\lambda)$. Since the waiting interval is a continuous variable, the probability distribution is
described using a probability density function.
Rewind
You met the Poisson distribution in Chapter 2.
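Key point 4.13
Given that $X \sim \mathrm{Exp}(\lambda)$, then:
$f(x) = \lambda e^{-\lambda x}$ for $x \geq 0$, and $f(x) = 0$ otherwise.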
You can find the mean and the variance of the exponential distribution by using integration.
Key point 4.14
Given that $X \sim \mathrm{Exp}(\lambda)$, then:
$E(X) = \frac{1}{\lambda}$, $\mathrm{Var}(X) = \frac{1}{\lambda^2}$
Fast forward
You are asked to prove the formula for the variance in question 10 in Exercise 4I.
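PROOF 5
Prove that if $X \sim \mathrm{Exp}(\lambda)$, then $E(X) = \frac{1}{\lambda}$.
$E(X) = \int_0^{\infty} x \lambda e^{-\lambda x}\,dx$
Start from the definition of expectation (Key point 4.3).
The integral starts from $0$ as this is the lower limit of the probability distribution.
Identify: $u = x$ and $\frac{dv}{dx} = \lambda e^{-\lambda x}$
You need to use integration by parts. As usual when doing
integration by parts, start by identifying $u$ and $\frac{dv}{dx}$ ...
So $\frac{du}{dx} = 1$ and $v = -e^{-\lambda x}$
... then find $\frac{du}{dx}$ and $v$.
So $E(X) = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx$
Use $\int u \frac{dv}{dx}\,dx = uv - \int v \frac{du}{dx}\,dx$.
When $x = 0$ the square bracket term is $0$. It is less obvious
what happens when $x \to \infty$, but it turns out that the $e^{-\lambda x}$
term goes to zero faster than $x$ goes to infinity, so overall
it is zero at both limits.
$E(X) = 0 + \left[-\frac{1}{\lambda} e^{-\lambda x}\right]_0^{\infty} = \frac{1}{\lambda}$
When $x = 0$, $e^{-\lambda x} = 1$.
When $x \to \infty$, $e^{-\lambda x} \to 0$.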
WORKED EXAMPLE 4.16
The number of leaks in any
miles of pipes in a sewer system follows a Poisson distribution
with mean .
a Find the probability that the first leak will be found in the first half mile.
b Find the variance of the distance until the first leak is found.
a
Define variables.
Identify the distribution. Since the number of leaks in
miles follows a Poisson distribution, the distance until the
first leak will follow an exponential distribution. To find the
parameter, you need to find the number of leaks per unit of
distance (miles); here it is
.
Write the required probability in mathematical terms
and use the probability density function.
b
Use the formula for
from Key point 4.14.
You could also be asked to find a probability of a variable with an exponential distribution being greater
than a particular value. You can do this by integration, but it is useful to know the cumulative
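distribution function. You can find this using integration. If $X \sim \mathrm{Exp}(\lambda)$ then, for $x \geq 0$,
$F(x) = \int_0^x \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda x}$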
Rewind
You met integration by parts in A Level Mathematics Student Book 2, Chapter 11.
Key point 4.15
If $X \sim \mathrm{Exp}(\lambda)$, then $F(x) = 1 - e^{-\lambda x}$ for $x \geq 0$.
The exponential distribution also has a property called memorylessness. Prior waiting does not change
how long you are likely to wait for an event. This means that as well as measuring the amount of time
until a first event, it also measures the interval between events, as shown in Worked example 4.17.
WORKED EXAMPLE 4.17
During the summer Tanis sneezes on average two times every hour.
a State an assumption that must be made to model the time until the next sneeze by using an
exponential distribution.
b Assuming that the time until the next sneeze can be modelled by an exponential distribution,
find the exact probability that Tanis goes more than ninety minutes after waking up before
sneezing.
a You must assume that
sneezes occur independently
of each other.
b
Define variables.
Identify the distribution. The exponential can be used with
any starting point, so the fact that it is time after waking
is not important.
Write the required probability in mathematical terms and
use the cumulative distribution function. Remember that
units are hours.
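The calculation in part b of Worked example 4.17 can be sketched with the cumulative distribution function $F(x) = 1 - e^{-\lambda x}$, so that $P(X > x) = e^{-\lambda x}$. The rate of two sneezes per hour and the ninety-minute wait come from the example; the function names are hypothetical. The last lines also check the memorylessness property numerically for one assumed pair of times.

```python
import math


def exp_cdf(x, lam):
    """F(x) = P(X <= x) for X ~ Exp(lam)."""
    return 1 - math.exp(-lam * x) if x >= 0 else 0.0


def exp_survival(x, lam):
    """P(X > x) = e^(-lam * x) for x >= 0."""
    return math.exp(-lam * x) if x >= 0 else 1.0


lam = 2                              # sneezes per hour
p_wait = exp_survival(1.5, lam)      # ninety minutes is 1.5 hours, giving e^(-3)

mean_wait = 1 / lam                  # E(X) in hours
var_wait = 1 / lam ** 2              # Var(X) in hours squared

# Memorylessness: P(X > 1.5 | X > 1) should equal P(X > 0.5).
conditional = exp_survival(1.5, lam) / exp_survival(1.0, lam)

print(p_wait, mean_wait, var_wait, conditional, exp_survival(0.5, lam))
```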
EXERCISE 4I
1
Find these probabilities.
a
if
i
if
ii
b
ii
c
if
i
if
seconds for an emission from a radioactive substance that emits three
i waiting more than
alpha particles per minute on average
ii waiting less than fifteen minutes for a bus that comes three times per hour on average.
2
Find the expected mean and standard deviation of:
a
i
ii
b
i the distance travelled in a car before reaching the first pot hole if pot holes along a certain
road are spread independently at an average rate of per kilometre
ii the time from the beginning of the day until the first phone call at a call centre that receives
an average of
calls per hour.
3
The number of emails Khaled receives in an hour follows a Poisson distribution with mean . What
is the probability that the next email arrives in less than minutes?
4
Birds arrive at a feeding table independently, at an average rate of
per hour.
a Find the probability that two birds will arrive in the next ten minutes.
b Find the probability that there is more than a ten-minute wait before the next bird arrives.
c Find the expected mean and standard deviation of the time (in minutes) spent waiting for a
bird.
5
When Ben walks down a particular street, he meets people he knows at an average rate of three
every 5 minutes. Different meetings are independent of each other. What is the probability that
Ben has to walk for more than minutes before he meets a person he knows?
6
The probability of waiting less than minutes for a bus is
. If the waiting time is modelled by an
exponential distribution, find the probability of waiting more than minutes.
7
The probability of waiting more than minutes for a phone call is . Find an expression for the mean waiting time for a phone call in terms of and , assuming the waiting time can be modelled by the exponential distribution.
8
Show that the probability of a variable with an exponential distribution taking a value larger than its mean is independent of .
9
The number of buses arriving at a bus stop in an hour follows a Poisson distribution with mean .
a Name the distribution which models the time, in minutes, Amanda has to wait until the next
bus arrives. State any necessary parameters.
b Given that Amanda has already been waiting for minutes, find the probability that she has to wait at least minutes.
c Show that the answer in part b is the same as the probability that Amanda has to wait at least
minutes.
10 Prove that if
11
, then
.
is the number of successes that occur in one unit of time, so that
successes that occur in units of time.
a Write down the distribution of
.
b Find
, giving your answer in terms of
follows an exponential distribution
.
c Explain why
.
and .
.
d Hence prove that the probability density function of
is
.
is the number of
Section 10: Combining discrete and continuous random variables
It is possible for a random variable to be discrete in some parts of its domain and continuous in other
parts of its domain. For example, a doctor might measure the masses of babies less than
as
precisely as possible (creating a continuous part of the random variable) but masses above
might
be measured to the nearest
(creating a discrete part of the random variable).
If this is the case you apply all the rules learnt in this chapter and in Chapter 1 but using sums over the
discrete part of the random variable and integrals over the continuous part of the random variable.
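A minimal sketch of this rule, using a made-up mixed distribution (the density, the discrete values and their probabilities are all assumptions for illustration, and `integrate` is a simple helper written here):

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Hypothetical mixed distribution (not taken from the text):
#   continuous part: f(x) = k * x for 0 < x < 2
#   discrete part:   P(X = 2) = 0.2 and P(X = 3) = 0.3
discrete = {2: 0.2, 3: 0.3}

# Total probability is an integral plus a sum, and must equal 1.
k = (1 - sum(discrete.values())) / integrate(lambda x: x, 0, 2)

def pdf(x):
    return k * x

total = integrate(pdf, 0, 2) + sum(discrete.values())

# The expectation splits in the same way: integral of x f(x) plus sum of x P(X = x).
mean = integrate(lambda x: x * pdf(x), 0, 2) + sum(x * p for x, p in discrete.items())
```

Here the value 2 is both the end point of the continuous part and a discrete value, which causes no difficulty.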
Tip
Notice that in Worked example 4.18 the end point of the continuous part of the variable is a
part of the discrete random variable. You might worry about situations like this, but it is
perfectly possible to define random variables in this way.
WORKED EXAMPLE 4.18
The random variable
can only take the values
If
the variable has PDF
When
or
then
,
or .
.
.
a Find the value of .
b Find
.
a Total probability is:
So
therefore
The total probability is an integral over the
continuous range plus a sum over the discrete
range.
Use the fact that the total probability equals 1.
b You need to split the expectation into an integral over the continuous part of the variable and a sum over the discrete part.
EXERCISE 4J
1
Find
a
b
and
for these mixed probability distributions.
i
for
ii
for
i
for
ii
for
for
for
for
for
c
d
2
i
for
for
ii
for
for
i
for
ii
for
The random variable
for
for
is defined for
and for
probability density function is given by
. Between
. It is also known that
and
is
the
. Find:
a the value of
b
c
d
3
.
The random variable
Between
that P
is defined for
and
is .
4
.
the probability density function is given by
a Find an expression for
b Given that
and for
. It is also known
in terms of .
, find
.
The mixed random variable
can take any value between
to . The distribution is defined by:
and
as well as integer values from
for
for
a Find the value of .
b Find
5
.
The mixed random variable
can take any values from
to
and the discrete values
and . It
has cumulative distribution function:
for
for
Find:
a
b
c
d
6
.
A mixed random variable can take the discrete values
and . It has cumulative distribution function:
for
for
Find:
a the values of
and
and
and continuous values between
b
7
.
The mixed random variable can take any values between and , and the discrete values and . Between and it has probability density function . When or .
Prove that there is only one possible value of and find its value.
8
An athletics coach records the time of a squad of junior sprinters. He records the time of anyone who runs between and seconds as precisely as he can. Anyone who runs between and seconds gets their time recorded to the nearest tenth of a second. He models the time recorded by this probability distribution:
for
for
a Find the value of .
b Find , giving your answer to three decimal places.
c Find the standard deviation of .
The true times of the athletes have probability density function:
for
for
d Find the value of .
e Find and comment on your answer in relation to part b.
Checklist of learning and understanding
The probability of a continuous random variable taking any single value is a meaningless
concept, but it is possible to work with the probability of it being in a given range. To do this
you use a probability density function such that the area under the curve represents
probability. The total area is therefore 1, and the function is never negative.
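The summation formulae for the expectation of discrete random variables become integrals for continuous random variables:
$E(X) = \int x f(x)\,dx$ and $E(X^2) = \int x^2 f(x)\,dx$
$\text{Var}(X)$ is still $E(X^2) - [E(X)]^2$
The expectation and variance of a linear transformation are given by:
$E(aX + b) = aE(X) + b$ and $\text{Var}(aX + b) = a^2\,\text{Var}(X)$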
If $X$ and $Y$ are independent random variables then $\text{Var}(aX \pm bY) = a^2\,\text{Var}(X) + b^2\,\text{Var}(Y)$
If $X$ and $Y$ are independent random variables following a normal distribution and $Z = aX + bY$, then $Z$ also follows a normal distribution.
The expectation of a function of a continuous random variable is given by:
$E(g(X)) = \int g(x) f(x)\,dx$
The cumulative distribution function gives the probability of the random variable taking a value less than or equal to $x$.
For a continuous distribution with PDF $f(x)$:
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$ and $f(x) = F'(x)$
The main uses of cumulative distribution functions are to find percentiles of a distribution and
to convert from a distribution of one continuous random variable to a distribution of a function
of that variable.
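If $X$ follows a rectangular distribution between $a$ and $b$, then $E(X) = \frac{a + b}{2}$ and $\text{Var}(X) = \frac{(b - a)^2}{12}$
If $X \sim \text{Exp}(\lambda)$, then $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, with $E(X) = \frac{1}{\lambda}$ and $\text{Var}(X) = \frac{1}{\lambda^2}$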
Mixed practice 4
1
A continuous random variable has probability density function for . Find the exact value of .
Choose from these options.
A
B
C
D
2
and are independent random variables. has mean and standard deviation . has mean and standard deviation . Given that , what is the standard deviation of ?
Choose from these options.
A
B
C
D
3
The continuous random variable has PDF and otherwise.
a Find the cumulative distribution function of .
b Find .
c Find .
4
Given that is a continuous random variable with PDF , find:
a the value of
b the expectation of
c the variance of .
5
The Jones’ expected spend on their garden is £ with a variance of £ . This is paid for out of a bank account containing £ .
a What is the standard deviation in the amount remaining in the bank account after the garden has been paid for?
However much the Jones’ spend on their garden, the Smiths will spend twice as much plus £ .
b What is the expected amount that the Smiths will spend?
c What is the standard deviation in the amount that the Smiths will spend?
6
a If
is a continuous random variable with PDF
and
, find
the value of the constants
and .
b Evaluate:
i
ii
7
The continuous random variable has probability density function and otherwise.
a Find the cumulative distribution function of .
b Find the exact value of the median of .
8
Given that the continuous random variable has PDF and otherwise, find the interquartile range of . You will have to make appropriate use of technology.
9
The probability density function for the continuous random variable
is
for
and
otherwise.
a Find the value of .
b Find
.
c Find
.
d Find the exact value of
.
10 A doctor measures the masses of babies. If a baby has a mass between and
the mass is
recorded as accurately as possible. If the mass is between
and
the mass is recorded
to the nearest
. The doctor models the recorded masses using the random variable
with probability distribution defined by:
for
.
There are no masses recorded outside of the range from
to
.
a Write down the value of .
b Hence find the value of .
c Find
.
d Find
.
11 The time taken, in minutes, to wash the dishes is modelled by a random variable with expectation and standard deviation .
The time taken, in minutes, to clean the table is modelled by a random variable with expectation and standard deviation .
In this model and are considered to be independent.
a Before leaving Hassan must wash the dishes then clean the table. is the total time this takes. Find the expectation and standard deviation of .
b When Alice visits the jobs can be shared. Hassan washes the dishes and Alice cleans the table. is the time Alice has to wait after finishing cleaning the table before they can leave. Find the expected mean and standard deviation in .
c Is the assumption that and are independent reasonable in these situations?
d Hassan keeps a record of the total time he spends washing the dishes over days. He assumes that the times taken each day are independent and models the total time in the days using the random variable . For what values of will be more than times the standard deviation of ?
12 The times Markus takes to answer a multiple choice question are normally distributed with mean and standard deviation . He has one hour to complete a test consisting of questions.
Assuming the questions are independent, find the probability that Markus does not complete
the test in time.
13 The masses of men in a factory are known to be normally distributed with mean
and
standard deviation
. There is an elevator with a maximum recommended load of
.
With men in the elevator, calculate the probability that their combined mass exceeds the
maximum recommended load.
14 Davina makes bracelets by threading purple and yellow beads. Each bracelet consists of
seven randomly selected purple beads and four randomly selected yellow beads. The
diameters of the beads are normally distributed with standard deviation
. The average
diameter of a purple bead is
and the average diameter of a yellow bead is
. Find
the probability that the length of a bracelet is less than
.
15 The masses of the parents at a primary school are normally distributed with mean and variance , and the masses of the children are normally distributed with mean and variance . Let the random variable represent the combined mass of two randomly chosen parents and the random variable represent the combined mass of four randomly chosen children.
a Find the mean and variance of .
b Find the probability that four children weigh more than two parents.
16 A random variable
has cumulative distribution function given by
The diagram shows the graph of
.
[Graph of the cumulative distribution function: $y$ from $0$ to $1$ against $x$ from $0$ to $10$.]
a Find
.
b Find the median of
.
c Find the probability density function for .
d Show that the mean of is .
You are given that the variance of
is
.
e Find the probability that the mean of a random sample of
values of
is greater than .
17 The number of beta particles emitted by a radioactive substance follows a Poisson
distribution. The probability of observing no particles in hours is
a Find the expected waiting time until the first beta particle is observed.
b Find the probability of waiting more than
minutes to observe a beta particle.
c Given that no particles have been observed in the first minutes, find the probability that it takes more than hours to observe a beta particle.
18
is a continuous random variable following a rectangular distribution between and , with .
a Prove that .
b Find the cumulative distribution function of
.
c Two independent observations of
are made. Find an expression for the probability that
the maximum of these two observations is less than where
.
19 The humidity of air is measured by a weather station. It can only take values from to inclusive.
It is modelled by a mixed random variable, , with these properties:
Between and has PDF: .
a Find the value of .
b Find
.
c Find the median of
.
20 The continuous random variable
has cumulative distribution function
. Find the probability that in four observations of
more than
two observations take a value of less than .
21 The continuous random variable has CDF . The median of is . Find the values of , and .
22 The marks students scored in a Mathematics test follow a normal distribution with mean
and variance . The marks of the same group of students in an English test follow a normal
distribution with mean
and variance
.
a Find the probability that a randomly chosen student scored a higher mark in English than in
Mathematics.
b Find the probability that the average English mark of a class of students is higher than their average Mathematics mark.
23 The continuous random variable has probability density function:
a Show that .
b What is the probability that the random variable
has a value that lies between
and
?
Give your answer in terms of .
c Find the mean and variance of the distribution. Give your answers in terms of .
The random variable
represents the lifetime, in years, of a certain type of battery.
d Find the probability that a battery lasts more than six months.
A calculator is fitted with three of these batteries. Each battery fails independently of the
other two.
e Find the probability that at the end of six months:
i none of the batteries has failed
ii exactly one of the batteries has failed.
24 The random variable
has probability density function defined by:
a Sketch the graph of .
b Find the exact value of
.
c Prove that the distribution function , for
, is defined by
.
d Hence, or otherwise:
i find
ii show that the median, , of satisfies the equation .
e Calculate the value of the median of , giving your answer to three decimal places.
[© AQA, 2012]
25 The continuous random variable has probability density function defined by:
a Sketch the graph of .
b Show that:
i
ii
.
c Hence write down the exact value of:
i the interquartile range of
ii the median,
, of
.
d Find the exact value of
.
[© AQA, 2011]
FOCUS ON … PROOF 1
Sums of discrete independent random variables
In this section, you will prove this important result:
Rewind
You studied discrete random variables in Chapter 1.
If
and
are discrete independent random variables, then
.
You need to know:
(Theorem 1)
(Theorem 2)
If
and
are independent random variables, then
(Theorem 3)
(Theorem 4)
A finite double sum of a sum can be split into two sums:
(Theorem 5)
PROOF 6
On each line, state which of these theorems are being applied.
1
Theorem ____
2
Theorem ____
3
Theorem ____
4
Properties of sums
5
Theorem ___ and
Theorem ___
6
Theorem ____
7
Theorem ____
QUESTIONS
Use techniques similar to those in Proof 6 to answer these questions. $X$ and $Y$ are independent discrete random variables.
1
Prove that .
2
a Prove that .
b Hence prove that .
3
Prove that .
FOCUS ON … PROBLEM SOLVING 1
Finding the parameters of a distribution
Often you are not told directly the parameters of a distribution, but have to infer them from given
information. If this is the case, sometimes the equations will be impossible to solve directly, so you have to
use technology to solve them.
WORKED EXAMPLE
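In a Poisson distribution the probability of two events occurring is $0.1$. Find the probability of one event occurring.
If $X \sim \text{Po}(\lambda)$ then $P(X = 2) = \dfrac{e^{-\lambda}\lambda^2}{2!} = 0.1$
Write the information given in terms of $\lambda$, the parameter of the Poisson distribution.
This equation is not solvable using standard functions, so you can instead sketch it.
[Graph: $y = \dfrac{e^{-\lambda}\lambda^2}{2}$ against $\lambda$, meeting the line $y = 0.1$ at $(0.605, 0.1)$ and $(4.708, 0.1)$.]
You can use graphing technology to find the intersection points.
So $\lambda = 0.605$ or $\lambda = 4.708$.
$P(X = 1) = \lambda e^{-\lambda} = 0.330$ or $0.0425$ (to three significant figures)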
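As an alternative to reading intersection points from a graph, the equation can be solved numerically. The sketch below assumes the given probability is $0.1$, as the intersection points on the sketch suggest; the bisection helper and the search intervals are choices made here, not part of the text.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def bisect(f, lo, hi, tol=1e-10):
    """Root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid  # root lies in the upper half
        else:
            hi = mid             # root lies in the lower half
    return (lo + hi) / 2

def g(lam):
    # Zero exactly when P(X = 2) = 0.1
    return poisson_pmf(2, lam) - 0.1

# P(X = 2) peaks at lam = 2, so there is one root on each side of 2.
small = bisect(g, 0.1, 2.0)
large = bisect(g, 2.0, 10.0)

p1_small = poisson_pmf(1, small)
p1_large = poisson_pmf(1, large)
```

Both roots are valid parameters, so both resulting values of $P(X = 1)$ should be reported.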
QUESTIONS
1
Given that and , find .
2
Given that and , find .
3
Given that and to decimal places, find .
4
The probability of a biased coin showing a head is .
a In tosses, one head is observed. Show that the probability of this happening is .
b In another tosses, two heads are observed. Show that the probability of this happening, and the observation in part a happening, is .
c By using technology or otherwise, find the value of that maximises the probability of getting three heads in tosses.
Did you know?
This type of method is called maximum likelihood estimation and is a very powerful tool in
advanced statistics. You could research and list the uses of this.
FOCUS ON … MODELLING 1
Situations for the Poisson distribution
The Poisson distribution is frequently used to model situations in which there is a rate of events. However,
it can be applied incorrectly because there are several conditions that must be met:
the process must be random, so that it is not totally predictable
there must be a constant average rate, not something that changes in different areas or over time
the events must be independent of each other.
QUESTIONS
To help you to understand the required conditions in context, here are some examples of real-life
situations. Comment on whether the Poisson distribution would be an appropriate model in each of these
situations. Where the Poisson is not appropriate, state which conditions are not met.
Did you know?
In several of the situations in which the Poisson conditions are not perfectly met, in reality
statisticians still use the Poisson model to make useful predictions. This is because all models are
imperfect and the errors in estimating the average rate might well be larger than the errors
caused by a weak dependency between the events. When interpreting models it is vital to
understand the sources and scales of uncertainty in the output.
1
The number of fish in a volume of an ocean where fish occur at an average rate of per .
2
The number of signals received in an hour by a mobile phone from a communication mast when a signal is received every seconds.
3
The number of beta particles emitted every minute by a radioactive substance that emits on average beta particle every seconds.
4
The waiting time for a bus when one arrives on average every minutes.
5
The number of errors in pages of a textbook if there is an average of error on every pages.
6
The number of fish caught in ten hours in a small pond if an average of fish are caught every hour.
7
The number of girls in randomly selected people if it is expected that of the population are girls.
8
A binomial distribution when is very large.
Tip
You might want to use technology to confirm your answer to question 8.
5 Further hypothesis testing
In this chapter you will learn how to:
interpret the different types of errors that can be made while conducting
hypothesis tests, called type I and type II errors
calculate the probability of a type I error based on a Poisson distribution
calculate the probability of a type I error based on a binomial distribution.
If you are following the A Level course, you will also learn how to:
use a new type of hypothesis test for the mean, called a $t$-test
calculate the probability of a type II error based on a Poisson distribution
calculate the probability of a type II error based on a binomial distribution
calculate the probability of type I and type II errors based on a normal distribution.
Before you start …
A Level Mathematics Student Book 1, Chapter 22 | You should know how to calculate unbiased estimates of the population variance. | 1 Find an unbiased estimate of the variance of a population based on this sample: .
A Level Mathematics Student Book 1, Chapter 22 | You should know how to conduct hypothesis tests, using the binomial distribution. | 2 A six-sided dice is rolled five times and three sixes are observed. Test at the significance level if this provides evidence that the dice is biased towards rolling more sixes than a fair dice.
Chapter 2 | You should know how to conduct hypothesis tests, using the Poisson distribution. | 3 The number of bees arriving at a flower is modelled by a Poisson distribution. If six bees arrive in one minute, does this provide evidence at the significance level that the true mean is greater than ?
A Level Mathematics Student Book 2, Chapter 21 | You should know how to conduct calculations with the normal distribution. | 4 If , find .
A Level Mathematics Student Book 2, Chapter 22 | You should know how to conduct hypothesis tests with the normal distribution. | 5 A sample of objects drawn from a normal distribution with a standard deviation of has a mean of . Conduct a two-tailed test at significance to decide if this provides significant evidence of a change from a mean of .
Rewind
You studied hypothesis testing using the normal distribution in A Level Mathematics
Student Book 2, Chapter 22.
Realistic hypothesis testing
When studying hypothesis testing, using the normal distribution (the $z$-test), you might have wondered about conditions required for using it. It is a test in which you are uncertain about the population mean but you do know the population variance. This is not a situation that occurs very frequently. More often, you need to use the sample to estimate the variance of the population. To do this, you use a $t$-test.
One of the reasons hypothesis tests are so important in modern statistics is that they try to give a
probability of certain types of error. You will see in this chapter which types of errors are controlled and
which are not, and how to calculate their probabilities.
Section 1: $t$-tests
In a $z$-test to see if the population mean has changed from $\mu_0$ you are testing the hypotheses:
$H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$
You calculate the $z$-score for $\bar{x}$, your mean of a sample of size $n$:
$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
where $\bar{x}$ and $n$ are found from the sample while $\mu_0$ is the value in the null hypothesis and $\sigma$, the population standard deviation, is assumed to be the same as a previously held value. You can use the fact that $Z \sim \text{N}(0, 1)$ to do calculations with this statistic.
Tip
$\bar{X}$ is a random variable representing the sample mean. $\bar{x}$ is a particular observation of this random variable.
If you do not know $\sigma$ (or have reason to believe that it has changed) you must instead estimate it from the data. An appropriate way to estimate this is using the square root of the unbiased estimate of the variance, $s$. You can then construct a $T$-score.
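[Graph: population distribution with mean $\mu$ and standard deviation $\sigma$; sample values $x_1$, $x_2$, $x_3$ with sample mean $\bar{x}$ and sample standard deviation $s$.]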
$Z$-scores very rarely exceed (or go below ). However, if your sample just happens to have a very small standard deviation – for example, if the sample is $x_1$, $x_2$ and $x_3$ in the graph shown – then the $T$-score can get quite large: it is not unusual for it to be around . This highlights that it does not follow a normal distribution. The likelihood of getting a very tightly clustered sample depends on $n$. At low values of $n$ this possible clustering has a very big effect, but at large values of $n$ the sample standard deviation is a very good approximation of the population standard deviation. This means that there are lots of different $t$-distributions, depending on the value of $n$.
Tip
Although the $t$-distribution gives the distribution of the random variable $T$, conventionally it is written with a lower case $t$.
[Graph: the $Z$-distribution compared with $t$-distributions with $n = 7$ and $n = 4$.]
As the value of $n$ grows, the $t$-distribution gets closer and closer to the $Z$-distribution.
Key point 5.1
The $t$-test is based on the test statistic:
$T = \dfrac{\bar{X} - \mu}{S / \sqrt{n}} \sim t_{n-1}$
This will be given in your formula book.
You might wonder why there is an $n - 1$ in the formula in Key point 5.1. Rather than using the value of $n$ to describe the $t$-distribution, conventionally you use the degrees of freedom, $\nu$. Because you fix one parameter, the population mean, when you are doing a $t$-test you use the formula $\nu = n - 1$.
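The test statistic and its degrees of freedom can be computed directly from a sample. This is a sketch only: the data and the hypothesised mean are invented for illustration, and `t_statistic` is a helper name chosen here.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """Return (T, degrees of freedom) for testing a population mean of mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    # statistics.stdev divides by n - 1, so it is the square root of the
    # unbiased estimate of the population variance.
    s = statistics.stdev(sample)
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical data and null-hypothesis mean
sample = [9.8, 10.2, 10.4, 9.9, 10.7]
t, dof = t_statistic(sample, mu0=10.0)
```

The value of `t` would then be compared with the critical value for `dof` degrees of freedom from the formula book.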
Tip
Some graphical calculators can perform a $t$-test. You should state the test statistic, its distribution and the $p$-value from your calculator. If the mean and standard deviation of the sample are not given in the question you should state those too: your calculator will find them in the process of performing the $t$-test.
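To conduct a $t$-test, you calculate the value of $T$ and then look up the critical value from the table given in the formula book. This still requires some work as the information is given in terms of cumulative probabilities. To do a one-tailed test with significance level $\alpha$, you look up the column headed by $1 - \alpha$ in the table. To do a two-tailed test with significance level $\alpha$, you look up the column headed by $1 - \frac{\alpha}{2}$ in the table. If the modulus of your $T$-score is more than this value, you reject $H_0$.
[Diagrams: a one-tailed test, with area $1 - \alpha$ below the critical value and $\alpha$ above it; a two-tailed test, with area $1 - \frac{\alpha}{2}$ below the upper critical value and $\frac{\alpha}{2}$ in each tail.]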
WORKED EXAMPLE 5.1
The label of a pre-packaged steak claims that it has a mass of . A random sample of steaks is taken and their masses are:
.
Test at the significance level whether the label’s claim is accurate, stating any assumptions you need to make.
$X$ = mass of a steak in
Define variables. You must use the $t$-test since the true variance is unknown, but to do this test the underlying distribution must be normal.
Assume that
State the hypotheses.
State the test statistic and its distribution.
From your calculator:
Find sample statistics from your calculator. Since you do not know the true population variance, you use an unbiased estimate, .
So
Calculate your $T$-score.
The critical value is .
To find the critical value when and for a two-tailed test you look in the column in the table given in the formula book.
, therefore do not reject $H_0$ – there is no significant evidence to doubt the label’s claim.
Compare your calculated $T$-score with the critical value and conclude, putting the conclusion into context.
Common error
Conclusions are often stated without a sense of statistical uncertainty. For example, it would
be wrong to state that the conclusion of the test in Worked example 5.1 is: ‘The label is
correct.’
EXERCISE 5A
1
In each of these situations it is believed that
is normally distributed. Decide the result of the
test if it is conducted at the
significance level.
a
;
i
;
ii
b
;
i
;
ii
c
i
Data:
Data:
ii
2
John believes that the average time taken for his computer to start is
seconds. To test his
belief, he records the times (in seconds) taken for the computer to start:
a State suitable hypotheses.
b Test John’s belief at the
significance level.
c Justify your choice of test, including any assumptions required.
3
Michael regularly buys
packets of tea. He has noticed recently that he gets more cups of
tea than usual out of one packet, and suspects that the packets contain more than
on
average. He weighs eight packets and finds that their mean mass is
and the standard
deviation of their masses is
.
a Find the unbiased estimate of the variance of the masses, based on Michael’s sample.
b Assuming that the masses are normally distributed, test Michael’s suspicion at the level of significance.
4
The crawling ages of
babies in a nursery are recorded. The sample has mean
months and
standard deviation
months. A parenting book claims that the average age for babies crawling
is months. Test at the
level whether babies in the nursery crawl significantly earlier than
average, assuming that the distribution of crawling ages is normal.
5
Penelope thinks that cleaning the kettle will decrease the amount of time it takes to boil ( seconds). She knows that the average boiling time before cleaning is seconds. After cleaning, she boils the kettle times and summarises the results as:
a State suitable hypotheses.
b Test Penelope’s idea at the significance level.
6
A national survey of athletics clubs found that the mean time for a -year-old athlete to run
is
. A coach believes that athletes in his club are faster than average. To test his
belief he collects the times for
athletes from his club and summarises the results in this table.
Time,
Frequency
a Calculate an estimate of the mean time for the athletes in the club.
b Find an unbiased estimate of the population variance based on this sample.
c Test the coach’s belief at the
level of significance.
d State what assumption you have made about the distribution of the athletes’ times.
7
The lengths of bananas are found to follow a normal distribution with mean
Roland has
recently changed banana supplier and wants to test whether their mean length is different. He
takes a random sample of
bananas and obtains these summary statistics:
a State suitable hypotheses for Roland’s test.
b Test at the
significance level whether the data support the hypothesis that the mean
length of Roland’s bananas is different from
.
c Roland’s assistant Sonia suggests that they should test whether the mean length of bananas
from the new supplier is less than
.
i State suitable hypotheses for Sonia’s test.
ii Find the outcome of Sonia’s test at the significance level.
8
The manufacturers of tins of soup claim that the tins contain, on average,
of soup. Aki
wants to test if this is an accurate claim. She samples tins of soup and finds that they have a
mean of
and an unbiased estimate of the population standard deviation of
.
a State appropriate null and alternative hypotheses.
b For what values of
will Aki reject the null hypothesis at the
significance level?
Section 2: Errors in hypothesis testing
Defining type I and type II errors
The acceptable conclusions to a hypothesis test are:
1 sufficient evidence to reject $H_0$ at the significance level
2 insufficient evidence to reject $H_0$ at the significance level.
It is always possible that these conclusions are wrong.
If the first conclusion is wrong – i.e. you have rejected $H_0$ while it was true – it is called a type I error (spoken as ‘type one error’).
If the second conclusion is wrong – i.e. you have failed to reject $H_0$ when you should have done – it is called a type II error (spoken as ‘type two error’).
Key point 5.2
In hypothesis tests, type I and type II errors can be summarised as:
                          Reality: $H_0$ is true    Reality: $H_0$ is false
Claim: $H_0$ is true      no error                  type II error
Claim: $H_0$ is false     type I error              no error
WORKED EXAMPLE 5.2
In a court case, defendants are presumed innocent.
a What are the null and alternative hypotheses in this situation?
b What would a type I error be in this situation?
c What would a type II error be in this situation?
a $H_0$: Defendant is innocent.
$H_1$: Defendant is guilty.
b A type I error would be saying that an innocent person is guilty.
A type I error is rejecting $H_0$ when it was true.
c A type II error would be saying that a guilty person is innocent.
A type II error is not rejecting $H_0$ when it was false.
Probability of type I errors
You cannot eliminate these errors, but you can find the probability that they occur. For a type I error to occur, the test statistic must fall within the rejection region while $H_0$ was true. The critical region is designed so that this probability is the significance level.
Key point 5.3
In a hypothesis test, the probability of a type I error is equal to the actual significance level.
The phrase ‘actual significance level’ is used because you might find when testing a discrete random variable that you cannot create a critical region that has exactly the desired significance level. If you are asked to design a test with significance, because of the discrete nature of the variables you might have to create a critical region that in fact has an actual significance level near to . Conventionally, you would choose the largest significance level you can that is less than .
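The idea of an actual significance level can be illustrated for a one-tailed binomial test. The values $n = 10$, $p = \frac{1}{6}$ and a nominal level of $5\%$ are assumptions chosen for this sketch, and the function names are invented here.

```python
from math import comb

def upper_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k, n + 1))

def critical_value(n, p, alpha):
    """Smallest k with P(X >= k) < alpha, and that tail probability.

    The tail probability is the actual significance level, which is the
    probability of a type I error for the rejection region X >= k.
    """
    for k in range(n + 2):
        tail = upper_tail(n, p, k)
        if tail < alpha:
            return k, tail

# Hypothetical test: 10 rolls of a dice, H0: p = 1/6, H1: p > 1/6, nominal 5% level
k, actual = critical_value(10, 1 / 6, 0.05)
```

With these assumed values the rejection region starts at 5 sixes and the actual significance level is about $1.5\%$, well below the nominal $5\%$, because no region has a tail probability between $1.5\%$ and $5\%$.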
You might not be told the significance level of a test, in which case you need to use a formula to calculate
it.
Key point 5.4
In a hypothesis test:
$P(\text{type I error}) = P(\text{rejecting } H_0 \mid H_0 \text{ is true})$
WORKED EXAMPLE 5.3
and these hypotheses are tested:
The null hypothesis is rejected if
. Find the probability of a type I error in this test.
Use the definition of a type I error.
Use the tables for the Poisson distribution from the
formula book.
WORKED EXAMPLE 5.4
Derren wants to test whether a six-sided dice is biased towards rolling sixes. He rolls a dice
times.
a State appropriate null and alternative hypotheses.
b If Derren is using a
significance level, how many sixes would he have to see to conclude at
significance that the dice is biased?
c For the test proposed in part b, find the probability of a type I error.
a If
is the probability of rolling a
b If
is the number of sixes rolled then if
is true,
then:
:
Use your calculator to work out the probabilities
from the binomial distribution. Since you are
looking for evidence of
, you need to find
as this is an event as extreme as
observed, or more extreme in the direction of the
alternative hypothesis, i.e. it is the -value of an
observed .
The dice can be said to be biased at the
significance level if or more sixes
are observed.
c The probability of a type I error is
.
You need to find the first value of
value less than
.
that has a -
You can just read this from the table. It is the
probability of observing five or more sixes while
the null hypothesis is true.
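The search for the critical value in part b, and the actual significance level in part c, can be sketched in Python. This is a minimal illustration using only the standard library. The function names and the choice of 10 rolls are assumptions made for the example, not values taken from the worked example.

```python
from math import comb


def binomial_upper_tail(n, p, k):
    """Return P(X >= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_value(n, p, alpha):
    """Smallest k whose p-value P(X >= k) is below alpha (upper-tailed test)."""
    for k in range(n + 1):
        if binomial_upper_tail(n, p, k) < alpha:
            return k
    return None  # no critical region exists at this significance level


# Hypothetical set-up: 10 rolls, H0: p = 1/6, nominal 5% significance level.
n, p0, alpha = 10, 1 / 6, 0.05
k = critical_value(n, p0, alpha)
actual_level = binomial_upper_tail(n, p0, k)  # probability of a type I error
print(k, round(actual_level, 4))
```

With these assumed values the critical region is five or more sixes, and the actual significance level (about 1.5%) is well below the nominal 5%, which illustrates why the two can differ for a discrete variable.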
Probability of type II errors
If the true mean is anything other than that suggested by the null hypothesis and you have not
rejected the null hypothesis, then you have made a type II error (see Key point 5.2). The probability of
a type II error depends upon the true value of the population parameter. Suppose you are testing at
the
significance level the null hypothesis
with a standard deviation of . If the true mean
were
you would expect to be able to detect this very easily. If the true mean were
you might
have greater difficulty distinguishing this from
. If you knew the true mean, then you could find the
probability of an observation of this distribution falling in the acceptance region for
.
Rewind
You could also answer the type of test shown in Worked example 5.4 using a chi-squared
test – see Chapter 3 for a reminder of using a chi-squared test – with two categories: six and
not six. However, this would have only one degree of freedom so the fact that the chi-squared test is only approximate is particularly problematic. It is preferable to use an exact
binomial test, as shown.
The following diagrams show the effect of different true population means on this test. In the first
diagram the true mean is
, so anything that falls in the red region gets this right, but falling in the
blue regions results in a type I error. In the second, third and fourth diagrams the true mean is getting
further and further away from
. Anything falling in the red regions picks this up, but anything falling
in the blue regions fails to detect that the true mean is not
. All of these are now type II errors.
[Figure: four normal curves, each with an acceptance region between two rejection regions. Panel 1: H0 is true (µ = 120), with 2.5% in each rejection region, so 5% type I error and 95% correctly not reject H0. Panel 2: H0 is untrue, small difference (µ = 120.4). Panel 3: H0 is untrue, medium difference (µ = 130). Panel 4: H0 is untrue, large difference (µ = 160). In panels 2 to 4 the acceptance region is labelled ‘Type II error’ and the rejection regions ‘Correctly reject H0’.]
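The effect shown in the diagrams can be checked numerically. The sketch below finds the probability that a single observation falls in the acceptance region for several true means. The null mean of 120 and the true means are read from the diagrams; the standard deviation of 10 and the function name are assumptions made purely for illustration.

```python
from statistics import NormalDist


def type_ii_probability(mu0, sigma, alpha, true_mu):
    """P(a single observation lands in the two-tailed acceptance region)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z * sigma, mu0 + z * sigma  # acceptance region
    actual = NormalDist(true_mu, sigma)
    return actual.cdf(upper) - actual.cdf(lower)


# mu0 = 120 and the true means come from the diagrams; sigma = 10 is assumed.
for true_mu in (120.4, 130, 160):
    print(true_mu, round(type_ii_probability(120, 10, 0.05, true_mu), 3))
```

The probability of a type II error shrinks as the true mean moves further from 120, which matches the shrinking ‘Type II error’ regions in the diagrams.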
Key point 5.5
In a hypothesis test:
$P(\text{type II error}) = P(\text{not rejecting } H_0 \mid H_0 \text{ is false})$
WORKED EXAMPLE 5.5
Internet speeds to a household are normally distributed with a standard deviation of
.
The internet provider claims that the average speed of an internet connection has increased
above its long-term value of
. A sample is taken on occasions and a hypothesis test is
conducted at the
significance level. Find the probability of a type II error if the true average
speed is
.
is the continuous random variable speed
of internet connection
Define variables.
State hypotheses.
,
State the test statistic and its distribution
(assuming
is true).
Decide the range of
that falls into the one-tailed acceptance region.
So do not reject
if
.
State the acceptance region.
Use the definition of a type II error.
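The steps of this worked example can be sketched for a general upper-tailed test on a sample mean with known standard deviation. All the numbers below are hypothetical stand-ins, and the function name is an assumption for the example.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_upper_tailed(mu0, sigma, n, alpha, true_mu):
    """P(not rejecting H0: mu = mu0 against H1: mu > mu0 | mean is true_mu)."""
    standard_error = sigma / sqrt(n)
    # Reject H0 when the sample mean exceeds this critical value.
    critical = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error
    # Type II error: the sample mean still falls below the critical value.
    return NormalDist(true_mu, standard_error).cdf(critical)


# Hypothetical values: mu0 = 9, sigma = 2, n = 16, 5% level, true mean 10.
beta = type_ii_upper_tailed(9, 2, 16, 0.05, 10)
print(round(beta, 3))
```

Note that the acceptance region is found using the null distribution, while the final probability is found using the distribution with the true mean.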
An important concept when studying tests is called the power of a test. It is defined as the
probability of rejecting
when it is false, so it is the probability of not making a type II error.
Key point 5.6
In a hypothesis test:
Power $= 1 - P(\text{type II error})$
WORKED EXAMPLE 5.6
A call centre believes that it receives calls at an average rate of
per hour. To test this it looks
at the number of calls in a two-hour period. If that number is greater than
or lower than , it
rejects the hypothesis that the average rate is per hour. Given that the actual rate of calls is
calls per hour, find the power of the test.
number of calls in two hours
You are considering the actual rate rather than the
rate under the null hypothesis.
To find the probability of a type II error you look at how
likely
is to fall into the acceptance region.
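A power calculation of this kind can be sketched as follows, using only the standard library. The rejection limits and rates are hypothetical values chosen for illustration, and the function names are assumptions.

```python
from math import exp, factorial


def poisson_cdf(k, mean):
    """Return P(X <= k) for X ~ Po(mean); zero when k is negative."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))


def power_two_sided(lower, upper, true_mean):
    """Power when H0 is rejected if X < lower or X > upper."""
    # The acceptance region is lower <= X <= upper.
    type_ii = poisson_cdf(upper, true_mean) - poisson_cdf(lower - 1, true_mean)
    return 1 - type_ii


# Hypothetical values: H0 rate 5 per hour (mean 10 in two hours),
# reject if X < 5 or X > 16, actual rate 7.5 per hour (mean 15 in two hours).
print(round(power_two_sided(5, 16, 15), 3))
# The same function with the null mean gives the probability of a type I error.
print(round(power_two_sided(5, 16, 10), 3))
```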
EXERCISE 5B
1
Given that
a
i
ii
, find the probability of a type I error for each of these situations.
; reject
; reject
if
if
b
c
i
; reject
if
i
; reject
if
Given that
a
b
c
3
b
4
b
c
if
or
, find the probability of a type I error in each of these situations.
i
; reject
if
ii
; reject
if
i
; reject
if
ii
; reject
if
i
; reject
if
or
ii
; reject
if
or
, find the probability of a type I error for each of these situations.
i
; reject
if
ii
; reject
if
i
; reject
if
or
ii
; reject
if
or
i
with
,
ii
with
i
with
ii
with
i
with
ii
with
;
,
,
significance;
;
;
,
,
significance;
;
;
,
significance;
;
. In reality,
. In reality,
. In reality,
significance;
. In reality,
significance;
In reality,
significance;
.
.
.
. In reality,
.
.
.
Given that
, find the probability of a type II error for each of these situations. Find also
the power of the test.
a
b
c
i
b
c
if
; real
; reject
if
; real
i
; reject
if
; real
ii
; reject
if
; real
i
; reject
if
or
; real
or
; real
Given that
a
; reject
ii
ii
6
or
Find the probability of a type II error for each of these situations. The sample mean is being
tested and the sample size, , is specified in each case.
a
5
; reject
Given that
a
if
ii
ii
2
; reject
; reject
if
, find the probability of a type II error in each of these situations.
i
; reject
if
; true
ii
; reject
if
; true
i
; reject
if
; true
ii
; reject
if
; true
i
; reject
if
or
; true
ii
; reject
if
or
; true
7
What are the advantages and disadvantages of increasing the significance level of a hypothesis
test?
8
A television magician tries to trick an audience into believing that a coin is biased. He records
himself tossing a fair coin many hundreds of times until he tosses ten heads in a row. He then
shows the audience the film containing only the ten heads being tossed.
a State the null and alternative hypotheses in this situation.
b If an audience member believed that the coin is biased, is this an example of a type I or a
type II error?
9
A textbook says that there is positive correlation between two variables if the sample correlation
coefficient is more than
. Describe, in this context, what is meant by:
a a type I error
b a type II error.
10 A student conducts a binomial hypothesis test to see if a six-sided dice is fair. He rolls the dice
times and if he sees more than
sixes, he will claim that the dice is biased.
a Describe, in the context of this test, what is meant by:
i a type I error
ii a type II error.
b State two changes to the test that would make a type II error less likely.
11 The numbers of people arriving at a health club follow a Poisson distribution with mean
per
hour. After a new swimming pool is opened, the management want to test whether the number
of people visiting the club has increased.
a State suitable null and alternative hypotheses.
b They decide to record the number of people arriving at the club during a randomly chosen
hour, and to reject the null hypothesis if this number is larger than .
Find the significance level of this test. Comment on your result.
12 A long-term study suggests that traffic accidents at a particular junction occur randomly at a
constant rate of
per week. After new traffic lights are installed, it is believed that the number
of accidents has decreased. The number of accidents over a -week period is recorded.
a Let denote the average number of accidents in a -week period. State suitable hypotheses
involving .
b It is decided to reject the null hypothesis if the number of accidents recorded is less than or
equal to . Find the probability of making a type I error.
c The average number of accidents has in fact decreased to
per week. Find the probability
of making a type II error in this test.
13 The masses of eggs are known to be normally distributed with standard deviation . Dhalia
wants to test whether eggs produced by her hens have mass greater than on average.
a State suitable null and alternative hypotheses to test Dhalia’s idea.
Dhalia weighs
eggs and finds that their average mass is
.
b Test at the significance level whether Dhalia’s eggs have mass greater than on
average. State your conclusion clearly.
c Write down the probability of making a type I error in this test.
d What is the smallest average mass of the eggs that would lead Dhalia to reject the null
hypothesis?
e Given that the average mass of Dhalia’s eggs is actually , find the power of the test.
14 A coin is flipped times. It is decided that it is a biased coin if or
heads are observed.
a State the null and alternative hypotheses.
b Find the significance level of the test.
c Given that the true probability of flipping a head is , find the probability of a type II error as a
function of .
d Show that the probability of a type II error is maximised when .
15 A population is known to have a normal distribution with a variance of and an unknown mean
. It is proposed to test the hypotheses using the mean of a sample of size .
a Find the appropriate critical regions corresponding to a significance level of:
i
ii
b Given that the true population mean is
, calculate the probability of making a type II error
when the level of significance is:
i
ii
16 The number of worms in a square metre of forest is known to follow a Poisson distribution. The
mean is thought to be . This is rejected if no worms are observed when a square metre is
observed. If the true mean is , find an expression in terms of
for the power of this test.
Checklist of learning and understanding
A t-test is a way of testing to see if a sample provides evidence of a change in the population
mean from a previously held belief. It is based on the T-score: $T = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
A type I error is falsely rejecting $H_0$.
A type II error is not rejecting $H_0$ when it is false.
Mixed practice 5
1
Find the value of the -score (to three significant figures) for these data when testing the
null hypothesis
.
Choose from these options.
A
B
C
D
2
What is the definition of the significance level in a hypothesis test of a continuous
parameter? Choose from these options.
A
B
C
D
3
A chemist collects data on the volume of produced in a reaction. He believes that it
should be .
a Write down null and alternative hypotheses for the chemist’s belief.
He measures the reaction
times and gets these results:
.
b Find the t-score for these data.
c Hence conduct a t-test at the
significance level.
d What assumption have you made in conducting a t-test?
4
a Give the definition of a type II error.
A one-tailed -test is conducted for the hypotheses:
,
It is known that
It is decided to reject
if a mean of
observations is less than
.
b Find the probability of a type I error in this test.
In reality,
.
c Find the probability of a type II error in this test.
d What could be done to decrease the probability of a type II error without changing the
probability of a type I error?
5
The number of beta particles emitted in one second by an isotope of a radioactive element
is known to follow a distribution. A theory suggests that but a physicist believes
that this might be an underestimate.
a State the null and alternative hypotheses.
b The physicist decides that he will reject the null hypothesis if he sees more than
beta
particles in a five-second period. What is the significance level of this test?
c In reality, . Find the power of this test.
6
A union representative wishes to test a company’s claim that it pays an average salary of
£ . She suspects that the company pays less than this.
a Write down null and alternative hypotheses for her test.
The union representative takes a random sample of employees and finds their wages
in thousands of pounds. Her results are summarised here:
b Find an unbiased estimate of the variance of
.
c What is the t-score for her results?
d Conduct a t-test at the significance level to test her suspicion.
7
A coin is flipped times and it is decided that it is a biased coin if more than heads or fewer than
heads are observed.
a State the null and alternative hypotheses.
b Find the significance level of the test.
c If the coin is actually biased so that heads occur of the time, find the power of the
test.
8
Safeerah regularly cycles to and from work. She has a steel-framed bicycle that weighs
. Her mean journey time for the round trip is
minutes. Her friend, Josh, has a carbonframed bicycle that weighs
. Safeerah is thinking of buying a carbon-framed bicycle to
reduce her journey time, and Josh agrees to lend her his bicycle so that she can try it.
a The carbon-framed bicycle is sold using the slogan: ‘Less weight means more speed’.
Safeerah, who weighs
, is expecting that the
per cent reduction in bicycle mass
will substantially reduce her journey times. Josh tells her not to expect this as the
resultant mass reduction is actually closer to per cent.
Justify Josh’s figure of
per cent.
b Safeerah records her journey times with the carbon-framed bicycle on
typical days as:
Assuming that these times may be regarded as a random sample from a normal
distribution, test, at the
significance level, whether her mean journey time with the
carbon-framed bicycle is less than
minutes.
[© AQA 2013]
9
A company manufactures bath panels. The bath panels should be
deep, but a small
amount of variability is acceptable. The depths are known to be normally distributed with
standard deviation
.
a In order to check that the mean depth is
, Amir takes a random sample of bath
panels from the current production and measures their depths, in millimetres, with these
results.
Test whether the current mean is
, using the
significance level.
b Isabella, a manager, tells Amir that, in order to check whether the current mean is
, it is necessary to take a larger sample. Amir therefore takes a random sample of
size
from the current production and finds that the mean depth is .
Test whether the current mean is , using the data from this second sample and
the significance level.
c It is proposed to carry out hypothesis tests at regular intervals to check that the mean
remains at
.
Amir proposes that the tests be based on random samples of size , but Isabella favours
random samples of size . Explain which, if either, sample size would lead to a smaller
risk:
i of a type I error
ii of a type II error.
[© AQA 2011]
10 A town council wanted residents to apply for grants that were available for home insulation.
In a trial, a random sample of
residents was encouraged, either in a letter or by a
phone call, to apply for the grants. The outcomes are shown in the table.
Applied for grant
Did not apply for grant
Total
Letter
Phone call
Total
a The council believed that a phone call was more effective than a letter in encouraging
people to apply for a grant. Use a $\chi^2$-test to investigate this belief at the
significance
level.
b After the trial, all the residents in the town were encouraged, either in a letter or by a
phone call, to apply for the grants. It was found that there was no association between
the method of encouragement and the outcome. State, with a reason, whether a type I
error, a type II error or neither occurred in carrying out the test in part a.
[© AQA 2013]
6 Confidence intervals
In this chapter you will learn how to:
estimate the interval in which a population parameter lies, called a confidence
interval
estimate the confidence interval when the population variance is known
use confidence intervals to conduct hypothesis tests.
If you are following the A Level course, you will also learn how to:
find confidence intervals when the population variance is unknown, using the t-distribution.
Before you start…
A Level Mathematics Student Book 1, Chapter 22
You should know how to calculate unbiased estimates of the population variance.
1 Calculate an unbiased estimate of the variance of a population based on this sample: .
A Level Mathematics Student Book 2, Chapter 21
You should know how to conduct calculations with the normal distribution.
2 Given that , find .
A Level Mathematics Student Book 2, Chapter 22
You should know how to conduct hypothesis tests with the normal distribution.
3 A sample of objects drawn from a normal distribution with a standard deviation of has a mean of . Conduct a two-tailed test at significance to decide if this provides significant evidence of a change from a mean of .
Chapter 5
You should know how to conduct t-tests.
4 Based on this sample, test to see if there is evidence that at the significance level: .
What is the best way of describing an estimate?
If you want to estimate a population parameter, which is better: having a single value that is very unlikely
to be correct, or having a range of values that is very likely to contain the population parameter? The latter is
usually preferable, and is called a confidence interval.
In this chapter you will learn how to construct confidence intervals for the mean in different situations and
see how such intervals can be interpreted.
Section 1: Confidence intervals
A single value calculated from a sample used to estimate a population parameter is called a point
estimate. You are trying to find an interval that has a specified probability of including the true population
value of the statistic you are interested in. This interval is called a confidence interval and the specified
probability is called the confidence level.
For example, given the data
, you can calculate the sample mean, which is
. However, it is
very unlikely that the mean of the population this sample was drawn from is exactly
. You will now
develop a method that will allow you to say with
confidence that the population mean is somewhere
between
and
. This does not mean that there is a probability of
that the true mean is between
and
, but rather that
of confidence intervals constructed from samples like this would contain
the true mean.
To develop the theory, you are going to look at creating 95% confidence intervals, which are the default
choice. Suppose you are estimating the population mean, $\mu$, using the sample mean $\bar{X}$ of $n$ observations. Initially, you will only
consider random variables drawn from a normal distribution so that $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$, where $\mu$ is the population
mean (the thing you want to find) and $\sigma$ is the standard deviation in one observation of $X$. You can find, in
terms of $\mu$ and $\sigma$, a region symmetrical about $\mu$ that has a 95% probability of containing $\bar{X}$.
[Figure: normal curve for $\bar{X}$ centred on $\mu$, with 95% of the area between the lower bound and the upper bound and 2.5% in each tail.]
x
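You can find the z-score of the upper bound. Using the symmetry of the situation, you find that 2.5% of the
distribution is above the upper bound, so the z-score is $\Phi^{-1}(0.975) = 1.96$. You can say that:
$P(-1.96 < Z < 1.96) = 0.95$
You can use the fact that $Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$:
$P\left(-1.96 < \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = 0.95$
You can rearrange the inequalities to focus on $\mu$:
$P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$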
Rewind
You saw in A Level Mathematics Student Book 2, Chapter 21, that $\Phi^{-1}(x)$ is the inverse
normal distribution that tells you the z-score that results in the cumulative probability $x$.
Be warned – although this looks like it is a statement about the probability of $\mu$, in your derivation you
treated $\mu$ as a constant so it is meaningless to talk about a probability of $\mu$. This statement is still
concerned with the probability distribution of $\bar{X}$.
So, if the sample mean is $\bar{x}$, your 95% confidence interval for $\mu$ is:
$\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}$
Tip
The quantity $\frac{\sigma}{\sqrt{n}}$ is sometimes referred to as the standard error.
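You can generalise this method to other confidence levels. To find a $c\%$ confidence interval you can find the
critical z-score geometrically, using the properties of this graph.
[Figure: standard normal curve for $Z$, with 50% of the area below zero and $\frac{c}{2}\%$ of the area on each side of zero, up to the critical z-scores.]
From this diagram you can see that the critical z-score is the one where there is a probability of $0.5 + \frac{1}{2} \times \frac{c}{100}$ of
being below it.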
Tip
This process creates a symmetric interval around the sample mean. It is also possible to create a
non-symmetric interval, but that is beyond the scope of this course.
Tip
Some calculators can find these confidence intervals for you.
Key point 6.1
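A $c\%$ symmetric confidence interval for the population mean $\mu$ is:
$\bar{x} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z\frac{\sigma}{\sqrt{n}}$
where $\bar{x}$ is the sample mean, $\sigma$ is the standard deviation in one observation of $X$ and
$z = \Phi^{-1}\left(0.5 + \frac{1}{2} \times \frac{c}{100}\right)$.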
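The calculation in this key point can be sketched with Python's standard library. The function name and the numerical values are assumptions chosen for illustration; the confidence level is passed as a proportion (0.95 for 95%).

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, level):
    """Symmetric confidence interval for the mean when sigma is known."""
    # Critical z-score: a probability of 0.5 + level / 2 lies below it.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# Hypothetical values: sample mean 50, sigma = 4, n = 16, 95% confidence.
lower, upper = z_confidence_interval(50, 4, 16, 0.95)
print(round(lower, 2), round(upper, 2))
```

Here the standard error is 4 / 4 = 1, so the interval is roughly 50 ± 1.96.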
WORKED EXAMPLE 6.1
The masses of fish in a pond are known to have a normal distribution with a standard deviation of
. The mean mass of fish from the pond is found to be .
a Find a confidence interval for the mean mass of all the fish in the pond.
b Guidance from a vet suggests that the pond is an unsuitable environment if the mean mass of the
fish is below . Does your confidence interval suggest that the environment is unsuitable?
a For a confidence interval
Use your calculator to find the z-score associated with a confidence interval.
So confidence interval is , which is .
b A mean mass of is within the confidence interval
so the confidence interval does not necessarily
suggest that the environment is unsuitable.
You need to consider whether a true mean of is consistent with your
confidence interval.
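You do not need to know the centre of the interval to find the width of the confidence interval. From Key
point 6.1 you know that the confidence interval goes from $\bar{x} - z\frac{\sigma}{\sqrt{n}}$ to $\bar{x} + z\frac{\sigma}{\sqrt{n}}$. Therefore its width is
$2z\frac{\sigma}{\sqrt{n}}$.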
Fast forward
The inference in part b of Worked example 6.1 is effectively a type of hypothesis test. You will
see later in this chapter how you can quantify the significance level when using a confidence
interval to perform a hypothesis test.
Key point 6.2
The width of a confidence interval is $2z\frac{\sigma}{\sqrt{n}}$.
WORKED EXAMPLE 6.2
The results in a test are known to be normally distributed with a standard deviation of . How
many people need to be tested to find an confidence interval with a width of less than ?
For an confidence interval
Find the z-score associated with an confidence interval.
Set up an inequality.
At least people need to be tested.
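The inequality in this kind of question can be solved in code by rearranging the width $2z\frac{\sigma}{\sqrt{n}}$ for $n$. The sketch below uses hypothetical values and an assumed function name.

```python
from math import floor
from statistics import NormalDist


def min_sample_size(sigma, level, max_width):
    """Smallest n for which the width 2*z*sigma/sqrt(n) is below max_width."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    bound = (2 * z * sigma / max_width) ** 2  # the width is small enough when n > bound
    return floor(bound) + 1


# Hypothetical values: sigma = 10, 95% confidence, width below 5.
print(min_sample_size(10, 0.95, 5))
```

With these assumed values the bound is about 61.5, so at least 62 observations are needed. The answer is always rounded up because the sample size must be a whole number.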
If the sample size is sufficiently large (greater than ) and you do not know the true variance, you need to
use the unbiased estimate of the variance as a substitute for the true variance.
You can use confidence intervals to conduct hypothesis tests. For example, if you find a confidence
interval you can use it to conduct a significance two-tailed hypothesis test.
WORKED EXAMPLE 6.3
A vet is measuring the masses of a breed of dog
. Her data are summarised here:
It can be assumed that the masses follow a normal distribution.
a Find a confidence interval for the mean.
b A textbook claims that the average mass of this breed is
. Conduct a hypothesis test at the
significance level to decide if this sample suggests that the textbook figure is incorrect.
a First you need to work out the sample statistics.
You need to find the unbiased estimate of the variance.
You can use the formula from Key point 6.1 to find the appropriate z-score.
You can then use the expression in Key point 6.1, substituting $s$ for $\sigma$.
So
b ,
The true mean being
is consistent with the confidence
interval found, so you do not reject
at the
significance
level. There is not significant evidence that the textbook is
incorrect.
You must take care not to draw false inferences from confidence intervals. It is important to know the types
of error that can be made, as shown in Worked example 6.4.
WORKED EXAMPLE 6.4
Ramon works out a confidence interval for the population mean as to . He claims that:
a of any observed data will be between and
b the probability that the population mean is between and is
c the median of the population is .
Decide which of these statements, if any, are correct. Justify your answers.
a This is not necessarily true. The confidence interval is for the mean rather than a
single observation. Even if this statement was about the sample mean there would be
variations between samples.
b This is not true. The population mean is not a random variable so you cannot talk
about a probability associated with it.
c This is not necessarily true. The confidence interval will be centred on the sample
mean, which may not equal the population median.
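The repeated-sampling interpretation of a confidence level can be illustrated with a small simulation. This sketch assumes a hypothetical normal population and a known standard deviation; the function name and all numbers are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(true_mu, sigma, n, level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain true_mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width < true_mu < sample_mean + half_width:
            hits += 1
    return hits / trials


# Hypothetical population: mean 50, standard deviation 4, samples of size 10.
print(coverage(50, 4, 10, 0.95, 5000))
```

The printed proportion should be close to 0.95. Each individual interval either contains the true mean or does not; it is the method that succeeds about 95% of the time.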
EXERCISE 6A
1 Find the z-value for these symmetric confidence levels:
a
b
2
.
Find the required symmetric confidence interval for the population mean for the summarised data. You
can assume that the data are taken from a normal distribution with known variance.
a
i
ii
,
,
,
;
,
confidence interval
;
confidence interval
b
,
i
,
ii
3
,
,
;
;
confidence interval
confidence interval
Copy and complete this table. You can assume that the data are taken from a normal distribution with
known variance and that the confidence level is symmetric.
Confidence level
a
Lower bound of Upper bound of
interval
interval
i
ii
b
i
ii
c
i
ii
d
i
ii
e
i
ii
4
The blood oxygen levels (measured as percentages) of an individual are known to be normally
distributed with a standard deviation of
. Based upon six readings, Niamh finds that her blood
oxygen levels are on average .
a Find a symmetric confidence interval for Niamh’s true blood oxygen level.
b A doctor needs to be called if the true mean oxygen level falls below . Does the confidence
interval suggest that the true oxygen level is below ?
5
The birth masses of male babies in a hospital are known to be normally distributed with variance
.
a Find a
symmetric confidence interval for the average birth mass if a random sample of ten
male babies have an average mass of
.
b If average birth masses are below
then an investigation must be conducted. Based upon this
confidence interval, should an investigation be conducted?
6
A data set is summarised here:
Find a
symmetric confidence interval for the mean, assuming that the data are drawn from a
normal distribution.
7
a A sample of
people in a town have an average wage of £
with an unbiased estimate of the
population variance of
million. The wages follow a normal distribution. Find a
symmetric
confidence interval for the mean wage in the town.
b Is there significant evidence (at significance) that the mean wage in this town is different from
£ ?
8
When a scientist measures the concentration of a solution, the measurement obtained can be
assumed to be a normally distributed random variable with standard deviation
.
a He makes
independent measurements of the concentration of a particular solution and correctly
calculates the confidence interval for the true value as
. Determine the confidence
level of this interval.
b The scientist claims that this means that of sample means will be between and . Is
this a correct interpretation of the confidence interval? Justify your answer.
c He is now given a different solution and is asked to determine a
confidence interval for its
concentration. The symmetric confidence interval is required to have a width less than
. Find
the minimum number of measurements required.
9
A supermarket wishes to estimate the average amount spent shopping each week by single men. It is
known that the amount spent has a normal distribution with standard deviation €
. What is the
smallest sample required so that the margin of error (the difference between the centre of the interval
and the boundary) for an
symmetric confidence interval is less than € ?
10 A physicist wishes to find a confidence interval for the mean voltage of some batteries. She therefore
randomly selects batteries and measures their voltages. Based on her results, she obtains the
confidence interval [
]. The voltages of batteries are known to be normally distributed
with a standard deviation of
.
a Find the value of .
b Assuming that the same confidence interval had been obtained from measuring batteries, what
would be its level of confidence?
c A
confidence interval for the mean voltage of a different brand of batteries is found to be [
]. Is there significant evidence that the second brand of battery has a higher voltage than
the first brand of battery?
11 a A set of
data items produces a confidence interval for the mean of (
). You can assume
that the data are drawn from a normally distributed population.
Given that
, find the confidence level, giving your answer to two significant figures.
b Jasmine wants to test these hypotheses:
Use the given confidence interval to conduct a hypothesis test, stating the significance level.
12 From experience it is known that the variance in the increase between marks in a beginning-of-year
test and an end-of-year test is
. A random sample of four students in Mr Jack’s class was selected
and the results in the two tests were recorded.
Alma
Brenda
Ciaron
Dominique
Beginning of year
End of year
a Assuming that the difference can be modelled by a normal distribution with variance
, find a
symmetric confidence interval for the mean increase.
b How could the width of the confidence interval be decreased?
c Do these data provide evidence at the significance level that Mr Jack’s class is doing better than
the school average of a -mark increase?
13 Which of these statements are true for symmetric confidence intervals of the mean?
a There is a probability of that the true mean is within the interval.
b If you were to repeat the sampling process times, of the intervals would contain the true
mean.
c Once the interval has been created there is a chance that the next sample mean will be within
the interval.
d On average of intervals created in this way contain the true mean.
e of sample means will fall within this interval.
14 For a given sample, which will be larger: an symmetric confidence interval for the mean or a
symmetric confidence interval for the mean?
Section 2: Confidence intervals for the mean when the population
variance is unknown
In many real-life situations, when finding an estimate for the population mean you do not know the
true population variance – you estimate it from the sample variance, $s^2$. This means that the statistic
$\frac{\bar{X} - \mu}{s / \sqrt{n}}$ does not follow the normal distribution, but rather the t-distribution (as long as $X$ follows a
normal distribution). In Section 1 you assumed that when the sample size is large the difference
between the t-distribution and the normal distribution is sufficiently small that it can be ignored. In
this section you will look at how you can adapt the theory from Section 1 when sample sizes are small
– less than about .
Rewind
The t-distribution and associated calculations were covered in Chapter 5. Remember that
the number of degrees of freedom is given by $\nu = n - 1$.
You can follow a similar analysis to the one leading to Key point 6.1 to get a formula for a confidence
interval using a t-distribution when the sample size is small.
Key point 6.3
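If the estimated variance $s^2$ is found from the sample and the sample size is small, the $c\%$
symmetric confidence interval for the population mean $\mu$ is given by:
$\bar{x} - t\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\frac{s}{\sqrt{n}}$
where $\bar{x}$ is the sample mean and $t$ is chosen so that $P(T_{n-1} < t) = 0.5 + \frac{1}{2} \times \frac{c}{100}$.
The sample must be drawn from a normal distribution.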
Tip
You can find the value of $t$ from some calculators or by using the percentage points
table in the formula book. For example, if you are looking at a 95% symmetric
confidence interval, that means that there is 97.5% below the upper bound of the
interval so you use the 97.5th percentage point.
[Figure: t-distribution curve with 95% of the area in the centre and 2.5% in each tail, so that 97.5% lies below the upper bound.]
WORKED EXAMPLE 6.5
Find a confidence interval for the mean of the data , , assuming that the data is
drawn from a normal distribution.
Find the sample mean and unbiased estimate of the variance.
Find the number of degrees of freedom.
th percentage point of is
Use tables to find the t-score associated with a symmetric confidence interval when . If there is
within the confidence interval then there is below the upper bound.
Apply the formula from Key point 6.3.
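The same steps can be sketched in Python. The standard library has no t-distribution, so this example hard-codes a few 97.5th percentage points from a standard t-table, which restricts it to 95% intervals and samples of up to 11 values. The data, the table name and the function name are all assumptions for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# 97.5th percentage points of the t-distribution, keyed by degrees of freedom
# (standard table values, rounded to three decimal places).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def t_interval_95(data):
    """95% symmetric confidence interval for the mean, variance unknown."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)   # square root of the unbiased estimate of the variance
    t = T_975[n - 1]  # n - 1 degrees of freedom
    half_width = t * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical sample, assumed to come from a normal distribution.
lower, upper = t_interval_95([4.2, 5.1, 4.8, 5.5, 4.9])
print(round(lower, 3), round(upper, 3))
```

For this sample the mean is 4.9 and the unbiased variance estimate is 0.225, so with 4 degrees of freedom the interval is approximately 4.9 ± 0.589. The same data with a z-score of 1.96 would give a narrower interval, which is why the t-distribution matters for small samples.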
EXERCISE 6B
1
Find the required symmetric confidence interval for the population mean for these data, some of
which have been summarised. You can assume that the data are taken from a normal
distribution.
a
b
c
i
,
,
;
confidence interval
ii
,
,
;
confidence interval
i
,
ii
,
i
;
;
;
2
confidence interval
confidence interval
;
ii
confidence interval
confidence interval
A garden contains a large number of rose bushes. A random sample of eight bushes was taken
and the heights in cm were measured and the data were summarised as:
,
a State an assumption that is necessary to find a confidence interval for the mean height of
rose bushes.
b Find the sample mean.
c Find an unbiased estimate for the population variance.
d Find an
3
symmetric confidence interval for the mean height of rose bushes in the garden.
A sample of three randomly selected students are found to have an unbiased estimate of the
population variance of
in the amount of time they watch television each weekday.
Based upon this sample, the symmetric confidence interval for the mean time a student spends
watching television is calculated as
. It can be assumed that the times follow a normal
distribution.
a Find the mean time spent watching television.
b Find the confidence level of the interval.
c A newspaper report on this study claims that most students watch between
and
hours of television each day. Is this a reasonable conclusion from this confidence interval?
Explain your answer.
4
The random variable is normally distributed with mean . A random sample of observations
is taken on , and it is found that:
A symmetric confidence interval is calculated for this sample.
Find the confidence level for this interval.
5
The lifetime of a printer cartridge, measured in pages, is believed to be approximately normally
distributed. The lifetimes of randomly chosen printer cartridges are measured and the results
are:
A symmetric confidence interval for the mean was found to be
.
a Find the value of .
b What is the confidence level of this interval?
c The manufacturer claims that the lifetime of the printer cartridge is at least pages. Is the
confidence interval found consistent with this claim?
6
The times taken for four people to complete a crossword puzzle are measured and the results
are shown in this table.
Person
Time (minutes)
John
Diane
David
Jane
a Find a
confidence interval for the true population mean, assuming that the times follow a
normal distribution.
b The newspaper says that the average time to complete the crossword is more than
minutes.
i State suitable null and alternative hypotheses for this test.
ii Use your confidence interval from part a to determine the conclusion to this hypothesis
test at the
significance level.
7
The masses of four burgers, in grams, before and after being cooked for one minute, are
measured:
Burger
Before cooking
After cooking
A symmetric confidence interval for the mean mass loss was found to include values from
. It can be assumed that the masses follow a normal distribution.
a Find the value of .
b Find the confidence level of this interval.
8
The temperature of a block of wood minutes after being lifted out of liquid nitrogen is
measured and then the experiment is repeated. The results are
and
.
a Assuming that the temperatures are normally distributed, find a confidence interval for
the mean temperature of a block of wood minutes after being lifted out of liquid nitrogen.
b A different block of wood is subjected to the same experiment and the results are
and
, where
. A second
confidence interval is created. Prove that the two confidence
intervals overlap for all values of .
Checklist of learning and understanding
A confidence interval for the mean is a range of possible values for the population mean,
along with a confidence level.
If the true population variance is known and the sample mean follows a normal distribution
then the confidence interval takes the form:
where .
The width of the confidence interval is given by .
When carrying out a hypothesis test or finding a confidence interval for the mean, if the
sample size is sufficiently large (
) and you do not know the true variance, you can use the
unbiased estimate of the variance as a substitute for the true variance.
If the estimated variance
is found from the sample and the sample size is small, the
confidence interval for the population mean is given by:
where is chosen so that . The sample must be drawn from a normal
distribution.
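In symbols, the two kinds of interval summarised in this checklist can be sketched as follows. The notation is an assumption made for this summary: sample mean $\bar{x}$, known population standard deviation $\sigma$, unbiased variance estimate $s^2$, sample size $n$, and confidence level $c$ written as a decimal.

```latex
\begin{align*}
\text{Known variance:}\quad
  & \bar{x} - z\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z\frac{\sigma}{\sqrt{n}},
  && \Phi(z) = \frac{1 + c}{2},
  && \text{width} = 2z\frac{\sigma}{\sqrt{n}} \\
\text{Unknown variance, small } n\text{:}\quad
  & \bar{x} - t\frac{s}{\sqrt{n}} < \mu < \bar{x} + t\frac{s}{\sqrt{n}},
  && \mathrm{P}(T_{n-1} < t) = \frac{1 + c}{2}
\end{align*}
```

For example, $c = 0.95$ gives $\frac{1 + c}{2} = 0.975$, which is why the 97.5th percentage point is used for a 95% symmetric interval.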
Mixed practice 6
1
The mass of a particular breed of dog is known to be normally distributed with variance
. The masses of a random sample of dogs from this breed are found. What is the
smallest value of required to make the
confidence interval for the mean mass less
than
wide?
Choose from these options.
A
B
C
D
2
A data set taken from a normal distribution is summarised as:
,
,
a Calculate the unbiased estimate of the variance of these data.
b Find a
confidence interval for the mean.
c Conduct a two-tailed test at significance to determine if there is a change from
3
The masses of bananas are investigated. The masses of a random sample of
of these
bananas were measured and the mean was found to be
with an unbiased variance of
. It is assumed that the masses follow a normal distribution.
Find a
symmetric confidence interval for .
4
The time taken for a mechanic to replace a set of brake pads on a car
is recorded. In a
week she changes
sets of brake pads and
minutes and
.
Assuming that the times are normally distributed, calculate a
symmetric confidence
interval for the mean time taken for the mechanic to replace a set of brake pads.
5
The pH of a river is believed to be normally distributed with a standard deviation of
.
What is the smallest number of samples that should be taken to get a confidence
interval for the mean with a width of less than ?
6
A random sample of four students in a school was selected and the results they got in two
tests were recorded:
Alma
Brenda
Ciaron
Dominique
Beginning of year
End of year
a Find a
symmetric confidence interval for the mean increase in marks from the
beginning of year until the end of year, assuming that the differences follow a normal
distribution.
b Hence conduct a test at the
significance level to see if the results have changed
between the beginning and end of the year.
7
The random variable is normally distributed with mean and standard deviation .
A random sample of observations of has a mean of .
a Find a confidence interval for .
b It is believed that . Determine whether or not this is consistent with your
confidence interval for .
8
From experience it is known that the variance in the mass decrease during a diet is . A
random sample of four people was selected and their masses before and after their diet
were recorded.
Bobby
Sam
Francis
Alex
Before diet
After diet
a Assuming that the mass loss follows a normal distribution, find a
confidence interval
for the mean mass loss during the diet.
b Hence conduct a test at significance to see if the diet results in a change in mass.
9
A sample of eggs are weighed and the masses in grams are:
Assuming that these masses form a random sample from a normal population, calculate:
a unbiased estimates of the mean and variance of this population
b a
confidence interval for the mean.
10 a i
A
confidence interval for a population mean, , is to be constructed. What is the
probability that the interval will not include the value of ?
ii
If such confidence intervals are constructed from separate random samples from
the same population, find the probability that at least one of them will not include .
b Jurgen can run
metres in a mean time of
seconds. His coach changes his
training programme to concentrate on his starting speed. After following the new training
programme, a random sample of of Jurgen’s
-metre running times has mean
seconds and standard deviation
seconds.
i
Assuming Jurgen’s
-metre times are normally distributed, construct a
confidence interval for his new mean time to run
metres, giving the limits to three
decimal places.
ii
Use the confidence limits to decide whether there is significant evidence that the new
training programme has been effective. Justify your decision.
[© AQA 2015]
FOCUS ON … PROOF 2
Proving the expectation and variance of the binomial distribution
In A Level Mathematics Student Book 2, Chapter 21, you used the formulae for the mean and variance of
the binomial distribution:
If $X \sim \mathrm{B}(n, p)$, then $\mathrm{E}(X) = np$ and $\mathrm{Var}(X) = np(1 - p)$.
In this section you will prove these facts.
You need to know the formula for binomial probabilities and the binomial expansion. One part of the proof
also involves differentiation using the chain rule.
Rewind
Refer to A Level Mathematics Student Book 2 for revision on the binomial distribution and on
the chain rule.
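Before attempting the proof, it can help to check the two formulae numerically. The sketch below is not part of the proof: it computes the mean and variance directly from the binomial probabilities and compares them with np and np(1 - p). The function name and the parameter values are chosen here purely for illustration.

```python
import math

def binomial_mean_and_variance(n, p):
    """Compute E(X) and Var(X) for X ~ B(n, p) directly from the probabilities."""
    q = 1 - p
    probs = [math.comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
    mean = sum(x * pr for x, pr in enumerate(probs))
    mean_of_squares = sum(x * x * pr for x, pr in enumerate(probs))
    return mean, mean_of_squares - mean**2

# Illustrative parameters chosen for this sketch.
n, p = 12, 0.3
mean, variance = binomial_mean_and_variance(n, p)
print(mean, n * p)                   # the two values should agree
print(variance, n * p * (1 - p))     # likewise, up to rounding error
```

A numerical check for one pair of values is only evidence, not a proof; the questions that follow establish the result for every n and p.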
QUESTIONS
1
Expand where is a positive integer.
2
Use your result from question 1 to prove that if is the probability of success and is the
probability of failure, then:
3
Explain why .
4
By differentiating with respect to and treating as a constant, prove that .
5
a By writing the binomial coefficient in terms of factorials, explain why .
b Hence prove that .
6
a Show that .
b Hence prove that .
FOCUS ON … PROBLEM SOLVING 2
Investigating confidence intervals
The meaning of the confidence level of a confidence interval is commonly misunderstood. In this section
you will use spreadsheets to simulate the construction of confidence intervals to gain a better
understanding of them. With many statistics problems, the ability to simulate the situation is an extremely
useful tool in getting started. The screenshots show the syntax of some common spreadsheets, although
you might need to adapt this for the program you are using.
The formula for the endpoints of a 95% symmetric confidence interval where the population variance is
known is approximately $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$.
Use a spreadsheet to create 10 random numbers generated from the normal distribution $\mathrm{N}(20, 5^2)$:
[Screenshot: a spreadsheet row headed Sample and Observation 1st to 10, with each cell filled using the formula shown.]
=NORMINV(RAND(),20,5)
NORMINV(probability, mean, standard_dev)
Then find the mean of the sample and use the formula $\bar{x} \pm 1.96 \times \frac{5}{\sqrt{10}}$ to find the confidence interval:
[Screenshot: the ten observations, the sample mean in column L (18.37), and the lower (15.27) and upper bounds of the confidence interval in columns M and N.]
=L3+1.96*5/SQRT(10)
Tip
Some spreadsheets have the option of generating random numbers from a given distribution. If
your spreadsheet does not have this facility, then you can still use a random number generator
which provides random numbers from the rectangular distribution between 0 and 1; most
spreadsheets do have this function. You might have to think about why the formula shown then
provides random numbers from the normal distribution; it is not obvious.
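One way to see why the formula works: if U is rectangular on (0, 1) and F is the cumulative distribution function of the normal distribution, then P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x), so F⁻¹(U) has the required normal distribution. The sketch below mimics the spreadsheet formula in Python; the helper name normal_from_uniform and the seed are choices made for this illustration.

```python
import random
from statistics import NormalDist, mean, stdev

def normal_from_uniform(mu, sigma, rng):
    """Mimic =NORMINV(RAND(), mu, sigma): push a uniform number through
    the inverse of the normal cumulative distribution function."""
    u = rng.random()                 # uniform on [0, 1)
    while u == 0.0:                  # inv_cdf is undefined at exactly 0
        u = rng.random()
    return NormalDist(mu, sigma).inv_cdf(u)

rng = random.Random(1)               # fixed seed so the run is repeatable
draws = [normal_from_uniform(20, 5, rng) for _ in range(10_000)]
print(round(mean(draws), 2), round(stdev(draws), 2))
```

With this many draws the printed mean and standard deviation should be close to 20 and 5.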
Check if the confidence interval does contain the true mean, which was 20:
[Screenshot: the Mean (20.97), Lower (17.87) and Upper (24.07) columns L to N, followed by a Check column.]
=IF(AND(M3<20,N3>20),1,0)
IF(logical_test, [value_if_true], [value_if_false])
Then copy this all down to consider 200 samples, all of size 10. Count how many do contain the true mean.
[Screenshot: further rows of Mean, Lower, Upper and Check values, with the Check column totalled.]
Counting:
=SUM(O3:O202)
SUM(number1, [number2], …)
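The same simulation can be sketched outside a spreadsheet. The code below assumes the set-up used in the screenshots (samples of size 10 from a normal distribution with mean 20 and standard deviation 5, and the multiplier 1.96); the function name coverage and the seed are introduced here for illustration.

```python
import math
import random

def coverage(num_samples, n=10, mu=20.0, sigma=5.0, z=1.96, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)      # known-variance interval
    hits = 0
    for _ in range(num_samples):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width < mu < sample_mean + half_width:
            hits += 1
    return hits / num_samples

print(coverage(200))
```

Changing the seed changes the proportion slightly, which is itself worth noticing: the confidence level describes the long-run behaviour of the method, not any single interval.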
QUESTIONS
1
For each sample, can you say with certainty whether or not the true mean is within the
calculated confidence interval?
2
What percentage of the calculated confidence intervals contain the true mean?
3
If, instead of using the true standard deviation, the sample standard deviation is used, then a
t-interval is required.
a How does this affect the width of the confidence intervals?
b How does this affect your answer to question 2?
4
Adapt the spreadsheet to create two samples of size
from a
distribution. For each of
these samples, create a
confidence interval. Note whether the two confidence intervals
overlap. Repeat for lots of pairs of samples of size
from a
distribution. What
percentage of the pairs have overlapping confidence intervals?
5
Repeat the investigation from question 4, but this time using one sample of size from a
distribution and one sample of size from a distribution.
Tip
The purpose of questions 4 and 5 is to highlight that it is not a good idea to use the overlap of
two confidence intervals to test whether the means of two distributions are the same, as the
significance level is not obvious.
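A rough sketch of this point, under assumptions chosen for illustration only: both samples have size 10 and come from the same normal distribution with mean 20 and known standard deviation 5, and each interval uses the multiplier 1.96. The function name overlap_rate is hypothetical.

```python
import math
import random

def overlap_rate(num_pairs, n=10, mu=20.0, sigma=5.0, z=1.96, seed=0):
    """Fraction of pairs of known-variance intervals that overlap when both
    samples come from the same normal distribution."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    overlaps = 0
    for _ in range(num_pairs):
        m1 = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        m2 = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        # Two intervals of equal width overlap when the centres are
        # closer together than the full width of one interval.
        if abs(m1 - m2) < 2 * half_width:
            overlaps += 1
    return overlaps / num_pairs

print(overlap_rate(2000))
```

Under these assumptions the intervals overlap in well over 95% of pairs, so "the intervals do not overlap" is a much stricter criterion than a 5% significance test.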
FOCUS ON … MODELLING 2
Simulating the t-distribution
The normal distribution and the t-distribution are very closely related. To get a better feel for their
similarities and differences, this exercise investigates the shapes of these two distributions.
QUESTIONS
1
Use a spreadsheet to create a list of samples of size , taken from the normal distribution
.
2
Find the mean of each sample. Is the mean of all the sample means zero?
Tip
In Excel you can create a random number from the distribution using the
syntax “ ” or the function provided by the Data Analysis
toolpak.
3
Find the standard deviation of these samples. Is the mean of the standard deviations of each
sample approximately ?
4
Construct the z-score for each sample mean using the formula:
Plot each of these z-scores on a histogram. What do you observe?
Tip
If you are using Excel, you might want to use the Data Analysis toolpak to create
the histogram.
5
Construct the t-score for each sample mean using the formula:
Plot each of these t-scores on a histogram. What do you observe?
6
Repeat questions 1 to 5 using a list of samples of size taken from the
distribution. What do you observe?
Based on this exercise, you should see that for small sample sizes there is a noticeable difference
between z-scores and t-scores, necessitating the use of the t-distribution. However, for larger
sample sizes the differences are small compared to most other sources of uncertainty, so the
normal distribution can be used as an approximation to the t-distribution.
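The comparison can also be sketched in code. The set-up here is an assumption made for illustration: samples are drawn from N(0, 1), the z-score divides the sample mean by the true standard error and the t-score by the estimated one, and the proportion of scores beyond ±1.96 is used as a simple measure of how heavy the tails are. The function name tail_rates is hypothetical.

```python
import math
import random

def tail_rates(num_samples, n, seed=0):
    """Proportions of z-scores and t-scores beyond +/-1.96 for samples of
    size n drawn from N(0, 1)."""
    rng = random.Random(seed)
    z_tail = t_tail = 0
    for _ in range(num_samples):
        sample = [rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(sample) / n
        # s is the square root of the unbiased estimate of the variance
        s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
        z = xbar / (1 / math.sqrt(n))          # uses the true sigma = 1
        t = xbar / (s / math.sqrt(n))          # uses the sample estimate s
        z_tail += abs(z) > 1.96
        t_tail += abs(t) > 1.96
    return z_tail / num_samples, t_tail / num_samples

for n in (5, 50):
    print(n, tail_rates(20_000, n))
```

For the small sample size the t-scores land in the tails far more often than the z-scores; for the larger sample size the two proportions are much closer.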
CROSS-TOPIC REVIEW EXERCISE 1
The questions in this exercise cover AS Level material only.
1
The discrete random variable can only take the values and . If , find .
Choose from these options.
A
B
C
D
2
The length of an athlete’s long jump is modelled by a normal distribution with standard
deviation
. A sample of
jumps is measured. What will be the width (to three
significant figures) of a
confidence interval for the mean? Choose from these options.
A
B
C
D
3
A continuous random variable
has probability density function defined by
a Sketch the graph of .
b Show that the value of is .
c
i Write down the median value of
.
ii Calculate the value of the lower quartile of
.
[© AQA 2013]
4
The numbers of people studying Mathematics at different levels in a sample of students
from two different schools were recorded.
North Academy
South High School
No Maths
Single Maths
Further Maths
a Conduct an appropriate test to show that there is evidence at the
significance level
that the level of Mathematics studied depends on the school attended.
b What assumptions are required to make the conclusion of the test in part a valid?
c Jane says that she would be more likely to study Further Mathematics if she attended
North Academy. Is this a valid inference from the data? Justify your answer.
5
The continuous random variable has probability density function given by
for and otherwise.
a Show that .
It is given that .
b Hence find the values of and .
c Find .
d Find the value of .
6
Andrew travels to a meeting. His journey consists of two independent parts; a section by
car and a section by train. The amount of time spent on the car section is modelled by the
random variable
and the amount of time spent on the train section is modelled by the
random variable
. All times are in minutes.
Based on long experience, Andrew knows that the average time spent on the car section
is
minutes with standard deviation minutes, and the average time spent on the train
section is
minutes with standard deviation
minutes.
a Assuming that there is no waiting time, find the expectation and standard deviation in
Andrew’s total journey time.
b For the meeting Andrew gets paid £
plus £
per hour he spends travelling. Find the
expectation and standard deviation in the amount Andrew gets paid.
7
For the year 2014, this table summarises the masses, kilograms, of a random sample of
women residing in a particular city who are aged between
years and
years.
Mass (
)
Number of women
Total
a Calculate estimates of the mean and the standard deviation of these masses.
b
i Construct a confidence interval for the mean mass of women residing in the city,
who are aged between
years and
years.
ii Hence comment on a claim that the mean mass of women residing in the city, who
are aged between
years and
years has increased from that of
in 1965.
[© AQA 2014]
8
Two independent random variables have normal distributions, and .
a State the distribution of , including any necessary parameters.
b Find .
9
At a remote hospital, in an area where there are many venomous snakes, the number of
patients during one week requiring treatment after a venomous snake bite may be
modelled by a Poisson distribution with mean
.
a For this hospital, find the probability that:
i no more than patient requires treatment after a venomous snake bite during a
particular week
ii at least patients require treatment after a venomous snake bite during a particular
period of weeks
iii more than
patients but fewer than
patients require treatment after a venomous
snake bite during a particular period of
weeks.
b Each patient who has been bitten by a venomous snake is treated with a single dose of
an anti-venom which is effective against the venoms of all the snakes common in that
area.
The anti-venom is expensive and has a limited shelf life, so that a delivery of fresh anti-venom is made at -week intervals.
The hospital stores just enough anti-venom so that the probability that it runs out of
anti-venom before the next delivery is less than per cent.
Quoting probabilities to justify your answer, state how many doses of anti-venom the
hospital should have in its store immediately after a delivery of fresh anti-venom.
[© AQA 2015]
10 Dana, a researcher in the USA, investigated game-related stress for sports officials in
inter-school baseball, basketball and soccer.
The
officials involved in this investigation were categorised as either adopting an
approach (AP) coping style or an avoidance (AV) coping style when dealing with game-related stress.
Table 1 summarises the results of this investigation.
Table 1
Coping style
AP
AV
Baseball
Sport
Basketball
Soccer
You may assume that the officials involved in this investigation represent a random
sample.
a Use the information in Table 1 to complete the contingency table, Table 2, with
frequencies that could be analysed to investigate whether the coping style used by
officials is associated with the sport involved.
Table 2
Coping style
AP
AV
Baseball
Sport
Basketball
Soccer
b Examine, using the
level of significance, whether the coping style used by officials
is associated with the sport involved.
c By comparing observed and expected frequencies, identify, in context, two important
facts concerning coping style and sport involved.
[© AQA 2014, adapted]
The probability distribution of a discrete random variable is given by:
a Find in terms of .
b Show that
.
c What is the largest possible value of the variance of
?
12 A sample of size is drawn from a normally distributed population with standard deviation
. A
confidence interval for the mean was correctly calculated to be
.
Find:
a the unbiased estimate of the population mean
b the value of .
13 In a diamond mine, the number of diamonds found per cubic metre of material mined is
known to follow a Poisson distribution with mean .
a If , find the probability of finding:
i diamonds in of mining
ii diamond in each of two sections of mining.
b To be economically viable a diamond mine needs more than diamonds per cubic
metre. To survey a potential new mine the owner examines a sample.
i State appropriate null and alternative hypotheses in terms of .
ii The survey results show that the sample contains diamonds. Conduct a hypothesis
test at the significance level.
iii What is the probability of a type I error in this context?
iv Why might the mine owner choose to use a significance level rather than a
significance level when conducting this test?
14 A receptionist answers phone calls for a company.
a State two conditions needed for the number of phone calls answered in an hour to be
modelled by the Poisson distribution.
b Explain why these conditions are unlikely to be met in this situation.
For a certain period of time you can now assume that the number of phone calls answered
in an hour can indeed be modelled by the
distribution.
c Find the standard deviation in the number of phone calls answered.
d Find the probability that fewer than
phone calls are answered in a -hour shift.
e Find the longest time for which the probability that no phone calls are answered is at
least
.
15 The continuous random variable
has probability density function given by
for
and otherwise. Find:
a the value of
b
c
d
e the median of
.
16 The volume of lemonade in a can produced at a factory follows a normal distribution with
standard deviation
. A quality control test takes a random, independent sample of
cans. The factory manager claims that the cans should, on average, contain
.
a If the true mean is
, find the probability that exactly
cans contain less than
.
b Jane decides that if or more cans in the sample contain less than she will
reject the batch.
i State in this context what is meant by a type I error.
ii Find the probability of a type I error in Jane’s test.
c The mean of the sample is found to be
.
i Construct a confidence interval for the true mean of the cans, giving your
answer to decimal place.
ii Phillip uses the confidence interval from part c i to determine whether the cans do
come from a population with a mean of less than
. What conclusion does Phillip
draw and what is the significance level of his conclusion?
17 Members of a library may borrow up to books. Past experience has shown that the
number of books borrowed, , follows the distribution shown in the table.
a Find the probability that a member borrows more than
books.
b Assume that the numbers of books borrowed by two particular members are
independent. Find the probability that one of these members borrows more than
books and that the other borrows fewer than books.
c Show that the mean of
is
, and calculate the variance of
.
d One of the library staff notices that the values of the mean and the variance of are
similar and suggests that a Poisson distribution could be used to model .
Without further calculations, give two reasons why a Poisson distribution would not be
suitable to model .
e The library introduces a fee of
pence for each book borrowed.
Assuming that the probabilities do not change, calculate:
i the mean amount that will be paid by a member
ii the standard deviation of the amount that will be paid by a member.
[© AQA 2016]
CROSS-TOPIC REVIEW EXERCISE 2
1
It is assumed that people arrive in a queue randomly and at a constant average rate of
per minute. The random variable is the time, in minutes, between people arriving in
the queue.
a State the distribution of , including any parameters.
b Find the probability that there is a gap of between and minutes.
c What is the expected standard deviation of ?
2
In a particular town, a survey was conducted on a sample of
residents aged
years
to
years. The survey questioned these residents to discover the age at which they
had left full-time education and the greatest rate of income tax that they were paying at
the time of the survey.
The summarised data obtained from the survey are shown in the table.
Greatest rate of income tax paid
Age when leaving education (years)
or less
or
or more
Total
Zero
Basic
Higher
Total
a Use a $\chi^2$-test, at the
level of significance, to investigate whether there is an
association between age when leaving education and greatest rate of income tax
paid.
b It is believed that residents of this town who had left education at a later age were
more likely to be paying the higher rate of income tax. Comment on this belief.
[© AQA 2015]
3
A digital thermometer measures temperatures in degrees Celsius. The thermometer
rounds down the actual temperature to one decimal place, so that, for example,
and
are both shown as
. The error,
, resulting from this rounding down can
be modelled by a rectangular distribution with the following probability density function.
a State the value of .
b Find the probability that the error resulting from this rounding down is greater than
.
c
i State the value for
.
ii Use integration to find the value for
.
iii Hence find the value for the standard deviation of
.
[© AQA 2016]
4
Julie, a driving instructor, believes that the first-time performances of her students in
their driving tests are associated with their ages.
Julie’s records of her students’ first-time performances in their driving tests are shown in
the table.
Age
Pass
Fail
a Use a $\chi^2$-test at the level of significance to investigate Julie’s belief.
b Interpret your result in part a as it relates to the
age group.
[© AQA 2010]
5
The random variable
represents the number of soft drinks Manuel purchases while
eating a burger. Manuel models using the distribution.
a Find the standard deviation of .
is the amount Manuel spends on his meal in pounds. If the burger costs £ and each
drink costs £ , find:
b
i
.
ii
6
The discrete random variable
follows the
distribution and satisfies
.
a Find
.
b In three independent observations of
, find the probability that fewer than two have
.
7
The random variable
measures the number of minutes Cauchy spends on a mobile
phone each month. The mean of
is
with standard deviation
.
Cauchy is on a contract with a fixed charge of £ each month, then
per minute.
a Find the mean and the variance in , the amount of money in pounds that Cauchy
spends each month on his mobile phone.
b Cauchy has a budget of £ per month for his phone. Anything that he does not
spend on his phone he saves. Find the mean and variance in , the amount saved in
pounds each month.
8
South Riding Alarms (SRA) maintains household burglar-alarm systems. The company
aims to carry out an annual service of a system in a mean time of
minutes.
Technicians who carry out an annual service must record the times at which they start
and finish the service.
a Gary is employed as a technician by SRA and his manager, Rajul, calculates the times
taken for annual services carried out by Gary. The results, in minutes, are as
follows:
Assume that these times may be regarded as a random sample from a normal
distribution.
Carry out a hypothesis test, at the
significance level, to examine whether the
mean time for an annual service carried out by Gary is
minutes.
b Rajul suspects that Gary may be taking longer than
minutes on average to carry
out an annual service. Rajul therefore calculates the times taken for
annual
services carried out by Gary.
Assume that these times may also be regarded as a random sample from a normal
distribution but with a standard deviation of
minutes.
Find the highest value of the sample mean which would not support Rajul’s suspicion
at the
significance level. Give your answer to two decimal places.
[© AQA 2014]
9
The time taken to complete a test is modelled by the normal distribution. The average
score on this test is
with standard deviation
. A sample of
students in a school
take the test and if their average is above
it will be decided that the school is doing
better than the rest of the population.
a Explain why the normal distribution is a plausible model for the test results.
b Assuming that the standard deviation is still , find the significance level of this
test.
c If the true mean of students in the school is
, find the power of the test.
d If the true mean of the students was higher than
, would the power of the test be
higher or lower? Explain your answer. No further calculations are required.
10 The time in seconds between errors in a piano performance is modelled by an
exponential distribution, exp
.
a The probability that there is an error in any seconds is . Find the value of .
b Find the probability that there is no error in any seconds.
c Find the expected time until the first error.
11 Judith, the village postmistress, believes that, since moving the post office counter into
the local pharmacy, the mean daily number of customers that she serves has increased
from .
In order to investigate her belief, she counts the number of customers that she serves
on
randomly selected days, with the following results.
Stating a necessary distributional assumption, test Judith’s belief at the
level of
significance.
[© AQA 2010]
12 It is claimed that a new drug is effective in the prevention of sickness in holiday-makers.
A sample of
holiday-makers was surveyed, with the following results.
Sickness
No sickness
Total
Drug taken
No drug taken
Total
Assuming that the holiday-makers are a random sample, use a $\chi^2$ test, at the
level of significance, to investigate the claim.
[© AQA, 2010]
13 The discrete random variable
follows the
distribution and satisfies
.
a Find the value of .
b Show that
.
14 Lorraine bought a new golf club. She then practised with this club by using it to hit golf
balls on a golf range.
After several such practice sessions, she believed that there had been no change from
metres in the mean distance that she had achieved when using her old club.
To investigate this belief, she measured, at her next practice session, the distance,
metres, of each of a random sample of
shots with her new club. Her results gave
Investigate Lorraine’s belief at the level of significance, stating any assumption that
you make.
[© AQA 2010]
15 Wellgrove village has a main road running through it that has a
speed limit. The
villagers were concerned that many vehicles travelled too fast through the village, and
so they set up a device for measuring the speed of vehicles on this main road. This
device indicated that the mean speed of vehicles travelling through Wellgrove was
.
In an attempt to reduce the mean speed of vehicles travelling through Wellgrove, life-size photographs of a police officer were erected next to the road on the approaches to
the village. The speed, , of a sample of vehicles was then measured and the
following data obtained.
a State an assumption that must be made about the sample in order to carry out a
hypothesis test to investigate whether the desired reduction in mean speed had
occurred.
b Given that the assumption that you stated in part a is valid, carry out such a test,
using the
level of significance.
c Explain, in the context of this question, the meaning of:
i a type I error
ii a type II error.
[© AQA 2015]
The discrete random variable satisfies this distribution:
a If , find the possible values of .
b For the larger value of , find the value of
.
17 Long-term observations suggest that the number of cars passing the school gates
follows a Poisson distribution with the mean of cars per minute. Following the opening
of a new supermarket at the end of the road, the head teacher wishes to find out
whether this mean has increased. She sends a group of students to count the cars
passing the school gates during a -minute interval.
Let be the number of cars passing the school gates in a -minute interval, so that
.
a Write down suitable null and alternative hypotheses.
b Find the critical region for the test at the significance level.
c The students counted cars. State the conclusion of the test.
In reality, the mean number of cars has increased to
per minute.
d Find the probability that the test results in a type II error.
18 Groups of visitors arrive at a museum randomly, at a constant average rate of
per
hour. The director wants to find out whether this rate is smaller on rainy days. She
randomly selects a rainy day and records the number of groups arriving over a -hour
period. She then conducts a hypothesis test, using these hypotheses:
,
where is the population mean number of groups arriving at the museum in a -hour
period.
a Write down the value of
.
The director decides that she will reject the null hypothesis if the number of visitor
groups arriving in the -hour period is less than or equal to .
b Find the probability of a type I error in this test.
The number of visitor groups in fact decreases to
per hour on a rainy day.
c Find the power of the test.
19 A physicist measures a quantity associated with the spin of an electron, . She takes
independent readings that have mean
. She calculates an unbiased estimate of
the variance as
.
She assumes that this quantity follows a normal distribution.
a Find a confidence interval for the true mean of , giving your answer to a
suitable level of accuracy.
b The random variable is defined as . Write down a confidence
interval for the mean of .
c A theory predicts that the true value of is exactly . Is the confidence interval found
in part a consistent with the theory?
d The physicist repeats her experiment three times. Each experiment consists of
independent readings followed by finding a confidence interval.
i What is the probability that at least two of these confidence intervals do contain
the true mean?
ii What is the probability that all of these confidence intervals are above the true
mean?
AS LEVEL PRACTICE PAPER
45 minutes, 40 marks
1
The number of beetles in a forest can be modelled by a Poisson distribution with parameter
beetles per square metre. Find the probability, to three significant figures, that in a
area
there are fewer than
beetles.
Choose from these options.
A
B
C
D
[1 mark]
2
The discrete random variable has a probability distribution given by for and
otherwise. Find .
Choose from these options.
A
B
C
D
[1 mark]
3
The discrete random variable has this distribution:
a Find the value of .
[1 mark]
b Find .
[1 mark]
c Find the standard deviation of .
[4 marks]
4
Sarah models the number of buses arriving at a bus stop using a Poisson distribution. is the
number of Route buses arriving in an hour and is the number of Route buses arriving in
an hour. Sarah models these as being independent with
and
.
a Given that , state in context an interpretation of the variable and write down its
distribution, including any parameters.
b Find the probability that or fewer buses arrive in an hour.
[2 marks]
c Give one reason why the assumption that and are independent is unlikely to be the
case.
[1 mark]
To check her model, Sarah counts the buses arriving in
randomly selected hours.
d Use suitable calculations to determine if a Poisson model is feasible.
5
The continuous random variable
and otherwise.
c Find
6
median of
.
[4 marks]
has probability density function given by
a Find the value of .
b Show that
[2 marks]
for
[3 marks]
.
[5 marks]
[3 marks]
This table shows the results of a survey in a school about weekly hours spent watching TV.
Test at the
significance level whether school year and hours spent watching TV are
independent.
School year
Hours
[5 marks]
7
a The number of leaks in a pipe is known by a water company to follow a Poisson distribution
with mean
leaks per
. A new contractor claims that they can reduce the number of
leaks. After they have maintained the pipes for some time, a random stretch of pipe is
investigated and found to have leaks. Test the contractor’s claim at the significance
level.
[5 marks]
b It is decided that if three or fewer leaks are found in
, then the contractor has reduced
the number of leaks. What is the probability of a type I error?
[2 marks]
A LEVEL PRACTICE PAPER
60 minutes, 50 marks
1
The number of beta particles emitted by a radioactive isotope follows a Poisson
distribution. On average,
beta particles are emitted each second. What is the
probability (to three significant figures) that the second beta particle is emitted between
and seconds after the first beta particle is observed?
Choose from these options.
A
B
C
D
[1 mark]
2
The discrete random variable has a probability distribution given by for and
otherwise. Find the median of .
Choose from these options.
A
B
C
D
[1 mark]
3
The discrete random variable follows the distribution shown.
a Find .
[1 mark]
b Find .
[2 marks]
c Write down the value of .
[1 mark]
4
The contingency table shows information about whether a random sample of people
have music lessons, and their gender.
Music lessons
No music lessons
Female
Male
a State the null and alternative hypotheses when conducting a chi-squared test for
independence.
[2 marks]
b Write down the number of degrees of freedom in this test.
[1 mark]
c Conduct a chi-squared test at the significance level to see if there is a link between
gender and choice of lessons.
[5 marks]
5
A continuous random variable has probability density function given by
a Find the exact value of .
[3 marks]
b Find and , giving your answers to three significant figures.
[4 marks]
c Find the standard deviation of , giving your answer to three significant figures.
[4 marks]
6
A researcher is testing if a new swimming technique is more effective. She
knows the average time of swimmers in her club using the old technique is
seconds. After training swimmers with the new technique she times them over
and summarises their times in seconds:
Lower times are considered better in swimming.
a Show that the unbiased estimate of the variance is to two decimal places.
[2 marks]
b Write down appropriate null and alternative hypotheses to test if the new
swimming technique is effective.
[2 marks]
c Write down the number of degrees of freedom in the test.
[1 mark]
d Investigate, using the significance level, whether the new technique improved the
mean time.
[4 marks]
e State one assumption required for your test to be valid. Comment on how reasonable
the assumption is in this context.
[2 marks]
7
The number of phone calls received by an IT helpline is known to follow a Poisson
distribution. It is thought to receive a mean of phone calls per hour.
A change to the IT system is designed to encourage fewer phone calls to the helpline. If
there are phone calls or fewer in a -hour period, the change will be deemed successful.
a Find the probability of a type I error in this process.
[3 marks]
b In reality the number of phone calls was per hour. Find the probability of a type II
error.
[3 marks]
8
When a scientist records the volume of acid required to neutralise a solution she records
her results to the nearest millilitre. For example, if she records a volume of , she
believes that the true volume required is somewhere in between and with all
possibilities equally likely.
The error, , is a random variable defined as the true volume of acid required to
neutralise the solution minus the recorded volume.
a State an appropriate distribution to model , including its parameters.
[2 marks]
b Find the probability that the magnitude of the error, , is less than .
[1 mark]
c Find the probability that in two independent observations the magnitude of the error is
less than .
[2 marks]
d Hence find the probability density function of the random variable , the maximum
magnitude of the error in two observations.
[3 marks]
FORMULAE
Probability
Standard deviation
Discrete distributions
Distribution of
Mean
Variance
Binomial
Poisson
Sampling distributions
For a random sample of independent observations from a distribution having mean and
variance :
For a random sample of observations from :
Distribution-free (non-parametric) tests
Contingency tables: is approximately distributed as
TABLE 1
Percentage points of the student’s $t$-distribution
The table gives the values of $x$ satisfying $P(X \le x) = p$, where $X$ is a random variable having the student’s
$t$-distribution with $\nu$ degrees of freedom.
[Figure: density curve with axis labels $p$, 0 and $x$]
[Table: rows for $\nu$ = 1 to 40, then 45 to 100 in steps of 5, then 125, 150 and 200]
TABLE 2
Percentage points of the $\chi^2$ distribution
The table gives the values of $x$ satisfying $P(X \le x) = p$, where $X$ is a random variable having the
$\chi^2$ distribution with $\nu$ degrees of freedom.
[Figure: density curve with axis labels $p$, $x$ and 0]
[Table: rows for $\nu$ = 1 to 40, then 45 to 100 in steps of 5]
Answers
1 Discrete random variables
BEFORE YOU START
1
2
3
4
WORK IT OUT 1.1
Solution B is correct.
EXERCISE 1A
1 Answers are given to where appropriate.
a i
ii
b i
ii
c i
ii
d i
ii
2 a Proof.
b
3 a
b
4 a Proof.
b
5 a
b
c
6
7 a
b
8
9 a
b
c
d
e
10 a
Profit £
Probability
b £
11 a
b i
ii Proof.
12 a
b
c
d i
ii
EXERCISE 1B
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
2 a i
ii
b i
ii
c i
ii
d i
ii
3
4
5
6
7 a
b
c
d
e
8 a
b
9 Proof.
10 Any value allowed, there is no upper limit.
EXERCISE 1C
1 a i
ii
b i
ii
2
3
4 a Proof;
b
5
6
7
8
9 Proof.
10 Proof.
MIXED PRACTICE 1
1 C
2 C
3 a
b
4 a
b
c
d
e
5 a
b
c
6 a
b
c
7
8
9 a
b
10 a
b
c
d
11 a
b
c
12
13 a
b
c
d
14
15 a
b
16 a
b i
ii Proof.
iii
17 a i Proof.
ii
iii Proof.
iv
b
2 Poisson distribution
BEFORE YOU START
1
2
3
4 Do not reject
.
WORK IT OUT 2.1
Solution A is correct.
EXERCISE 2A
In this exercise answers are given to , where appropriate.
1 a i
ii
b i
ii
c i
ii
2 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
3
4
5 a
b
6 a
b
7 a
b
c
d
8 a
b
9 a
b
10
11 a
b
c There are alternative ways to get
emails in a week other than
12 a
b
c
13 a
b
c
14
15 a Proof.
b
WORK IT OUT 2.2
Solution B is correct.
EXERCISE 2B
In this exercise answers are given to
1 a i Do not reject
-value
ii Do not reject
-value
b i Reject
-value
ii Reject
-value
c i Do not reject
-value
ii Do not reject
-value
d i Reject
-value
ii Reject
-value
2 a i
ii
b i
, where appropriate.
every day.
ii
c i
ii
3 Reject
-value
4 a -value
. Do not reject
.
b Rate might be different at different times. Cars might not be independent – cars might
be travelling together.
5 a
b Do not reject
-value
6 a
b
c Reject
-value
7 a Constant rate over the day. Bees arrive independently.
b Reject
-value
8 Do not reject
-value
9 a Do not reject
b Reject
.
-value
-value
10
MIXED PRACTICE 2
In this exercise answers are given to
, where appropriate.
1 C
2 C
3 a
.
b
c
4 a Independent events. Constant rate of success.
b
c Reject
-value
.
5 a
b
6 a
b
7 a
b
c Do not reject
8 a
-value
b
9 a
b
10 a
b
11 a
b
c
12 a
b
c
d
e
f
g
13 a
b
14 a
b
15 Do not reject
-value
16 a The average rate must be constant. However, you might expect it to vary over different times
of the day and with different weather conditions. Birds must arrive independently, but they
might come in flocks.
b
c
d Do not reject
17
-value
18 a
b
c
d
19 a i
ii
b i
ii
c
20
21 a i
ii
iii
b
22 a i
ii
b
c i
ii The coins buried in a hoard are no longer independent. The Poisson assumption requires
independence, so brooches are more likely to be modelled by a Poisson distribution.
3 Chi-squared tests
BEFORE YOU START
1 Yes; the -value is
2
3
EXERCISE 3A
1 a i
ii
b i
ii
2 a
: Physics grade and Mathematics course are independent;
: Physics grade and Mathematics course are dependent.
b
or
or
or
Further Maths
Maths AS or A
No Maths
c
d Reject
. Significant evidence of association; students studying a higher level of Mathematics
tend to do better in physics.
3 critical value . Reject . There is significant evidence that the
reading level and fiction/non-fiction are dependent.
4 a
Early
On time
Late
Total
Walk
Car
Other
Total
b
c Reject
. Significant evidence that lateness depends on the mode of transport.
d He must assume that his data are representative; for example, it was not a day with unusual
traffic. He must also assume that the respondents were independent; for example, not lots of
students from the same bus.
5 Significant evidence of association; . People who visit more often tend to spend
more money on each visit.
6 No significant dependency; . The drug does not appear to be effective in
increasing the speed of recovery.
7 a
Male
£
£
Female
£
£
£
b No significant evidence of dependency;
8 a Proof.
b
c
9 a Proof.
b Proof;
.
c
d There will be random variation within the sample.
10 a
First factor
Second factor
Total
A
12
12
1
5
B
10
1
4
5
C
3
2
25
20
Total
b Proof.
11 a Proof.
b Proof.
WORK IT OUT 3.1
Solution C is correct.
EXERCISE 3B
1 a i
ii
b i
ii
2 Proof;
3 Independent; .
4 Independent; . Number of books does not seem to differ significantly between
rural and urban libraries.
5 There is significant evidence of an association; . However, this does not establish
causality.
6 a Proof; . A higher percentage of men are admitted . This appears
to be evidence of bias.
b . Two out of six departments have a higher proportion of men accepted.
7 You cannot do a calculation based on two factors being dependent unless you know exactly what
that dependency is.
MIXED PRACTICE 3
1 D
2 B
3
; no significant evidence of association.
4 a
b
c Do not reject . No significant evidence that hair colour and eye colour are dependent.
5 ; do not reject . There is no evidence of association.
6 A
7 a
; significant evidence of association.
b i There are more of them.
ii Largest contribution to .
8 a ; Fiona’s belief is justified.
b Fewer than expected gained Class . More than expected gained Class 2ii.
9 a i
; significant evidence of association.
ii Accidents involving changing lane to the left are less likely, and accidents involving changing
lane to the right are more likely than expected for foreign registered HGVs.
b i
Expected values
Prosecution resulted
No prosecution
years or under
Over years
ii ; it appears that they are independent. There is no significant evidence that
prosecution is dependent on age.
10
; there is significant evidence of an association between age and department.
11 a Proof.
b Proof;
c
d The sample follows the long-term trend.
12 a
; development of Type 2 diabetes seems to be dependent on average level of
weekly alcohol consumption.
b Not generalisable to the whole population.
The ‘less than ’ category also has a large contribution.
c i No change, .
ii
iii No change.
4 Continuous distributions
BEFORE YOU START
1
2
3
4
5
6
EXERCISE 4A
In this exercise answers are given to
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
f i
ii
g i
ii
h i
ii
2 a i
ii
b i
ii
c i
ii
3 a i
s.f., where appropriate.
ii
b i
ii
c i
ii
4 a
b
5
6
7
8 a
b
9
10
11 Proof;
EXERCISE 4B
1 a i
;
ii
;
;
;
;
;
b i
;
ii
;
;
;
c i
;
;
;
ii
2 a i
ii
b i
ii
3 a
b
4 a
;
;
d i
ii
;
;
;
;
;
;
;
;
;
b
5 a Proof.
b
6 a Proof.
b
7
EXERCISE 4C
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
2 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
3 a i
ii
b i
ii
c i
ii
d i
ii
4 a £
b £
5
6 a
b
7 a
b
8 a
b
c
9 a
if
is odd,
if
is even.
b
EXERCISE 4D
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
2 a
b
c
d
3 a Proof.
b
4
;
5 ;
6 ;
7
minutes;
minutes
8
is twice the mass of a single gerbil;
is the sum of the masses of two different gerbils.
EXERCISE 4E
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
f i
ii
2 a
b
3 a
b
c
4 a
b
5 a
b
6 a
b
c
7 a
b
8
9 a
b
10 a
b Assumes that the rainfall each day is independent of the rainfall on other days. This is unlikely
to be the case.
11 a
b i
ii
iii
c
d
EXERCISE 4F
;
1 a i
ii
b i
ii
2 a i
ii
b i
ii
3
4 a
b
,
otherwise.
c
EXERCISE 4G
1 a [Graph of f(x) against x; marked values: 5k, 0, 5, 10]
b
c
2 a
b
3 a Proof.
b
c
d
4 a
b [Graph of f(w) against w; marked values: 0, 3, 7]
c
d
5 a Proof.
b
6 a
b
c
d
e
7 a [Graph of f(x) against x; marked values: k, 0, π/2, 5π/8]
b Proof.
c
d
EXERCISE 4H
In this exercise answers are given to 3 s.f., where appropriate.
1 a i
ii
b i
ii
c i
ii
d i
ii
e i
ii
2 a i
ii
b i
ii
3 a
b
4
5
6 a Proof.
b Proof;
.
7
EXERCISE 4I
In this exercise answers are given to
1 a i
ii
b i
ii
c i
ii
2 a i
ii
b i
ii
;
, where appropriate.
3
4 a
b
c
5
6
7
8 Proof; it equals
9 a Exponential;
b
c Proof.
10 Proof.
a
b
c Proof.
d Proof.
EXERCISE 4J
1 a i
ii
b i
ii
c i
;
ii
d i
ii
2 a
b
c
d
3 a
b
4 a
;
;
;
b
5 a
b
c
d
6 a
b
7 Proof;
8 a
b
c
d
e
. Rounding causes a slight underestimate of the true mean time.
MIXED PRACTICE 4
1
2
3 a
b
c
4 a
b
c
5 a £
b £
c £
6 a
b i
ii
7 a
b
8
9 a
b
c
d
10 a
b
c
d
11 a
b
c Probably not. Especially in the situation in part b it is likely that when Alice is finished Hassan
might try to speed up.
d
12
13
14
15 a
b
16 a
b
c
d Proof.
e
17 a
b
c
18 a Proof.
b
c
19 a
b
c
20
21
22 a
b
23 a Proof.
b
c
d
e i
ii
24 a [Graph of f(x) against x; marked values: 3/10, 1/5, 1, 5; line y = (x + 7)/40]
b
c Proof.
d i
ii Proof.
e
25 a [Graph of f(x) against x; marked values: 3/32, 1/2, 11]
b i-ii Proof.
c i
ii
d
Focus on … Proof 1
Proof 6
1 Theorem 2.
2 Theorem 3.
3 Theorem 5.
4 Properties of sums.
5 Theorem 4 and Theorem 1
6 Theorem 1.
7 Theorem 4.
1–3 Proof.
Focus on … Problem solving 1
1 0.613 or 0.168
2 0.199 or 0.416
3 8
4 a, b Proof.
c 18
Focus on … Modelling 1
1 Not very appropriate. Rate might not be constant in every part of the ocean. The presence of fish might
not be independent.
2 Not appropriate. Not a random process.
3 This is well modelled by a Poisson distribution.
4 Not appropriate. This is not a number of events, and the rate might not be constant throughout the day.
Buses might not be independent.
5 The rate might not be constant, but the Poisson tends to work quite well in these situations.
6 Not appropriate. The number of fish being caught is sufficient that it might have a significant effect on
the number of fish remaining in the pond.
7 This is well modelled by the Poisson distribution (and indeed is used in a derivation of the chi-squared
statistic).
8 This is well modelled by the Poisson distribution.
5 Further hypothesis testing
BEFORE YOU START
1
2 Reject
3 Do not reject
4
5 Do not reject
.
EXERCISE 5A
1 a i Reject
ii Reject
b i Do not reject
ii Do not reject
c i Do not reject
ii Do not reject
2 a
b Reject
c True variance unknown. Assume that times are normally distributed.
3 a
b Do not reject
.
4 Reject
5 a
b Do not reject
6 a
b
c Reject
.
d Assume that they are normally distributed.
7 a
b Do not reject
c i
ii Reject
8 a
b
EXERCISE 5B
1 a i
ii
.
.
b i
ii
c i
ii
2 a i
ii
b i
ii
c i
ii
3 a i
ii
b i
ii
4 a i
ii
b i
ii
c i
ii
5 a i
ii
b i
ii
c i
ii
6 a i
ii
b i
ii
c i
ii
7 Decreases the risk of a type II error, but increases the risk of a type I error.
8 a
: Coin is fair,
: coin is biased.
b Type I error.
9 a Claiming that there is correlation when none really exists.
b Not recognising correlation when there is underlying correlation.
10 a i Claiming that the dice is biased when it is not.
ii Claiming that the dice is not biased when it is.
b For example: Roll the dice more times, look for more than sixes, consider other numbers,
do a chi-squared test.
11 a
b
. This is very small and requires extreme evidence before change is found. This does not
seem to be required in this situation.
12 a
b
c
13 a
b Do not reject
. There is not enough evidence that Dhalia’s eggs are heavier.
c
d
e
14 a
b
c
d Proof.
15 a i
ii
b i
ii
16
MIXED PRACTICE 5
1 C
2 A
3 a
b
c Do not reject
d Assume that the data are drawn from a normal distribution.
4 a Not rejecting
when it is false.
b
c
d Increase the sample size.
5 a
b
c
6 a
b
c
d Reject
.
7 a
b
c
8 a Proof;
b . No significant evidence that mean journey time is minutes.
9 a . No significant evidence to doubt that the mean is .
b . Significant evidence that the mean is not equal to .
c i Neither. Risk of a type I error is regardless of sample size.
ii Larger sample size leads to a smaller risk of a type II error.
10 a
. evidence of association between method of receiving information and
outcome.
b Type I error.
6 Confidence intervals
BEFORE YOU START
1
2
3 Do not reject
4 No significant evidence.
EXERCISE 6A
1 a
b
2 a i
ii
b i
ii
3
Confidence level | Lower bound of interval | Upper bound of interval
a
i
ii
b
i
ii
c
i
ii
d
i
ii
e
i
ii
4 a
b It is plausible that the true oxygen level is
.
5 a
b Yes
6
7 a
b No significant evidence of a difference in the mean wage.
8 a
b No. This is a confidence interval for the population mean, not sample means.
c
9
10 a
b
c No, since the confidence intervals overlap (although it is quite difficult to find the significance
level).
11 a
b Reject
at the
significance level.
12 a
b Increase the sample size.
c Yes
13 a False.
b False.
c False.
d True.
e False.
14
EXERCISE 6B
1 a i
ii
b i
ii
c i
ii
2 a Assume heights are normally distributed.
b
c
d
3 a
b
c No. The confidence interval is for the mean value for an individual. The sample is too small for
meaningful generalisations.
4
5 a
b
c Yes
6 a
b i
ii Do not reject
7 a
.
b
8 a
b Proof.
MIXED PRACTICE 6
1 D
2 a
b
c Do not reject
3
4
5
6 a
b Significant evidence that the results are different.
7 a
b No. The given probability suggests that
which does not fall in the confidence interval.
8 a
b Do not reject
9 a
b
10 a i
ii
b i
ii New programme seems to have been effective.
Focus on … Proof 2
1 $\binom{n}{0}p^{n}+\binom{n}{1}p^{n-1}q+\binom{n}{2}p^{n-2}q^{2}+\dots+\binom{n}{n}q^{n}$
2–3 Proof.
4 Proof.
5–6 Proof.
Focus on … Problem solving 2
1 Yes. No probability is involved.
2 About 95%.
3 a They tend to be wider.
b No change.
4 About 99.4%.
5 About 30%.
Focus on … Modelling 2
1 Investigation.
2 It should be close to zero.
3 No, it should be about 0.7.
4 It looks a lot like a normal distribution.
5 The shape looks like a normal distribution, but it is much wider – it extends to t-scores above 3 and
below −3.
6 The standard deviation is much closer to 1 and the t-scores histogram is very similar to the z-scores
histogram.
Cross-topic review exercise 1
1 B
2 D
3 a [Graph of f(x) against x; marked values: 9k, 3, 4]
b Proof.
c i
ii
4 a
; degrees of freedom
.
b Random, representative sample from each school.
c No. Just because there is dependency does not mean that there is causality.
5 a Proof.
b
c
d
6 a
minutes;
b £
minutes
£
7 a Mean:
; s.d.:
b i
ii There is reason to doubt the claim.
8 a
b
9 a i
ii
iii
b
10 a
AP
AV
Baseball
Basketball
Soccer
b ; number of degrees of freedom ; significant evidence that coping strategy
is associated with sport involved.
c Soccer officials are far less likely than expected to use an AV coping style. Baseball officials are far
more likely than expected to use an AV coping style.
11 a
b Proof.
c
12 a
b
13 a i
ii
b i
ii Do not reject null hypothesis.
iii
iv Making a type II error, rejecting a genuine opportunity, might turn out to be very costly. Further
tests can always be done to be more certain.
14 a Phone calls are independent of each other, there is a constant average rate of phone calls.
b For example: The same customer might call back, breaking independence. The rate during office
hours might be different from the rate during the night.
c
(
)
d
e
hours ( s.f.).
15 a
b
c
d
e
16 a
b i
or more cans containing less than
even though the mean is actually
.
ii
c i
ii Significant evidence that the cans contain less than
on average; significance level is
17 a
b
c Proof;
d No probability of books borrowed and no probability of more than books borrowed.
e i
ii
.
Cross-topic review exercise 2
1 a
b
c
2 a
; significant evidence of association.
b Belief is supported at the
level of significance.
3 a
b
c i
ii
iii
4 evidence to support Julie’s belief at significance level.
More students than expected in the age group pass their test first time.
5 a
b i
ii
6 a
b
7 a
b
8 a
; insufficient evidence to reject null hypothesis.
b
9 a Most students will be close to the average, with fewer and fewer students getting scores as you
move further from the mean.
b
c
d Power would increase because the test will be more likely to pick up the difference from
.
10 a
b
c
11
; sufficient evidence to support Judith’s belief. Assumption that the population is normally
distributed.
12 no evidence at significance to support the claim that the drug is effective
against the sickness.
13 a
b Proof.
14
. Evidence to support Lorraine’s belief. Assume that the distances follow a normal
distribution.
15 a Random sample.
b
; significant evidence that mean speed has reduced.
c i Concluding that the mean speed has reduced when in fact it has not.
ii Concluding that the mean speed is still
when in fact it has reduced.
16 a
b
17 a
b
c Sufficient evidence that the mean number of cars has increased.
d
18 a
b
c
19 a
b
c No
d i
ii
AS Level Practice Paper
1 C
2 C
3 a 0.1
b 0.8
c 0.8
4 a Total number of buses arriving in an hour; $T \sim \mathrm{Po}(7.5)$.
b 0.0591 (3 s.f.)
c For example: Both are dependent on traffic.
d Not feasible. Mean $\neq$ variance.
5 a $\frac{1}{9}$
b Proof; median $= 2.38$ (3 s.f.); $E(X) = 2.25$.
c 0.338 (3 s.f.)
6 $\chi^2 = 4.22$, $\nu = 6$; do not reject $H_0$. No significant evidence of an association.
7 a $p$-value $= 0.0212$; reject $H_0$. Significant evidence that the contractors have reduced the mean
number of leaks.
b 0.0212 (3 s.f.)
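The value in 4 b can be checked numerically. The sketch below is a minimal check, assuming the threshold in the question is 3 or fewer buses: that number is not printed in the paper above, but it is the value that reproduces 0.0591 under Po(7.5). The function name `poisson_cdf` is illustrative only.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """Return P(X <= k) for X ~ Po(lam) by summing the probability mass function."""
    return sum(math.exp(-lam) * lam**r / math.factorial(r) for r in range(k + 1))


# Assumed threshold: 3 or fewer buses in an hour, with T ~ Po(7.5).
p_three_or_fewer = poisson_cdf(3, 7.5)
print(round(p_three_or_fewer, 4))
```

A calculator's cumulative Poisson function should give the same figure; the sum is written out here only to show what that function computes.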
A Level Practice Paper
1 B
2 D
3 a $\frac{5}{3}$
b $\frac{5}{9}$
c 5
4 a $H_0$: Gender and lessons are independent; $H_1$: Gender and lessons are not independent.
b 1
c $\chi^2_{\text{Yates}} = 4.84$; reject $H_0$.
5 a $\frac{1}{\ln 2}$
b $E(X) = 1.44$ (3 s.f.); $\mathrm{Var}(X) = 0.0827$ (3 s.f.)
c 0.144 (3 s.f.)
6 a Proof.
b $H_0$: $\mu = 35$; $H_1$: $\mu < 35$
c 11
d $t = (-)3.25$; reject $H_0$.
e Assume that the swimming times are drawn from a normal distribution. Any reasonable
comment, for example: OK because swimming times will be mainly clustered around the average
with few people at extremes or not OK because the swimming club is likely to have people at the
upper tail of the distribution.
7 a 4.58% (3 s.f.)
b 92.5% (3 s.f.)
8 a Rectangular, between −0.5 and 0.5.
b 0.8
c $4x^2$
d $\begin{cases} 8x & 0 < x < 0.5 \\ 0 & \text{otherwise} \end{cases}$
Chapter 1 worked solutions
1 Discrete random variables
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.
EXERCISE 1A
2
a Using the fact that the summed probability has to be 1:
b
3
a Using the fact that the summed probability has to be 1:
Using the fact that :
b and .
So the median is the mean of and .
Finding the mean of and :
4
a Using the fact that the summed probability has to be 1:
b
5
a Using the fact that the summed probability has to be 1:
b
c
6 Let the discrete random variable represent the number rolled on the dice.
7
a Using the fact that the summed probability has to be 1:
Substituting this expression into the formula for the expected mean and using the fact that :
Solving for and :
b
8 Using the formula for the expected mean and using the fact that you are looking for an expected profit
of
:
9
a
b
c
and
d
So,
.
.
e
10 a Profit £
Probability
b Working out the value of that would give a zero profit:
He would have to charge at least per game.
11 a Since , you know that .
On the other hand you know that the mode has to satisfy mode
Both have to be equal, so .
Calculating the values of and using the total probability of 1 and the known expected mean of :
Substituting the values of and into the formulae for and :
. So the median is the mean of and .
b i Calculating the probabilities and putting them in a table:
Combined score
Probability
So,
ii Using the table from part b i:
12 a
b
and
Median
c
d
,
. Let the random variable
represent the number of people who borrow no books,
.
i
ii
EXERCISE 1B
3 Using the facts that
and
:
4
5
Solving for
and :
6
7
a Using the fact that the summed probability has to be 1:
b
c
d
e
8
a Using
:
b
9
The expression for only gives real values for .
So, only for those values of .
10 The more often the coin is tossed, the more likely it becomes to show a head at least once. However,
there is no value of that guarantees a head. Hence, the expectation value is infinite and it is not
possible to define a fair price for this game.
EXERCISE 1C
2 Expected mean
3 Expected mean
4
a
.
Let
, where
When
.
:
When
:
:
So,
, where
b Using the result that when
.
,
:
Then,
5 Let
be the random variable describing the position number of the broken bulb.
Then .
Let be the distance from the plug, in .
Then .
You know that and .
Using expectation algebra (Key point 1.5):
and
Expected mean , variance .
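The expectation algebra step used here can be illustrated with exact arithmetic. The sketch below assumes a discrete uniform variable on 1 to n and a linear transformation aX + b; the values n = 10, a = 2 and b = 3 are hypothetical and are not the values from this question, and the function names are illustrative.

```python
from fractions import Fraction


def uniform_mean_var(n: int) -> tuple[Fraction, Fraction]:
    """Mean and variance of X uniform on 1..n, found by direct enumeration."""
    values = range(1, n + 1)
    mean = Fraction(sum(values), n)
    var = Fraction(sum(v * v for v in values), n) - mean**2
    return mean, var


def linear_transform(mean: Fraction, var: Fraction, a: int, b: int) -> tuple[Fraction, Fraction]:
    """Apply E(aX + b) = aE(X) + b and Var(aX + b) = a^2 Var(X)."""
    return a * mean + b, a * a * var


n, a, b = 10, 2, 3  # hypothetical values for illustration only
mean_x, var_x = uniform_mean_var(n)
mean_y, var_y = linear_transform(mean_x, var_x, a, b)
print(mean_x, var_x, mean_y, var_y)
```

The enumeration agrees with the standard results (n + 1)/2 and (n² − 1)/12 for the discrete uniform distribution, and the added constant b changes the mean but not the variance.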
6 You can define a new variable,
for
.
7 Assuming equal probability you can use
8 Using
and the definition of
for
and
:
:
Taking only the positive solution:
9 Defining a new uniformly distributed variable,
. For
which is divisible by
10
MIXED PRACTICE 1
1
(Answer
)
2 Using the fact that the summed probability has to be 1:
(Answer
3
a
b
4
:
a
b
)
.
c
has the highest probability of
d
So,
.
.
e
and
. So the median is the mean of
and .
Median
5
a
b
.
c
6
.
a
So,
.
b Using the fact that
:
Using the fact that the summed probability has to be 1:
Solving
for
:
c
7
8
Substituting
and
into
to find the corresponding values for :
When
9
and when
a Using the fact that
:
Using the fact that the summed probability has to be 1:
Solving
for
b
10 a
b
c
.
So,
d
11 a
b
c
12
.
:
13 a Using the general relation
and the fact that the summed probability must be 1:
b Using the fact that
and substituting the value for
c Using the general relation
:
from part a:
d
14
15 a Expected mean
Standard deviation
b Using the fact that there are
likely:
possible positions of the Queen of Spades, each of which is equally
Expected number of points scored
16 a
b i Using the fact that the summed probability has to be 1:
Using the fact that
Solving
ii
iii
for
:
:
So the standard deviation of
is
17 a i Using the fact that the summed probability has to be 1:
ii
iii
iv
b
Chapter 2 Worked solutions
2 Poisson distribution
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.
EXERCISE 2A
4 Let be the number of shooting stars observed in one hour, :
5
a Let be the number of white blood cells shown in a single high power field, :
b Let be the number of white blood cells shown in six high power fields, :
6
a Let be the number of flaws per metre of the wire, :
b Let be the number of flaws in metres of the wire, :
7
.
a
b
c
d Using the formula for conditional probability and your answers from parts a and b:
8 Let be the number of eagles observed on a given day, :
a
b
9
a
b
10
, so
Tip
Use technology or trial-and-improvement methods to find
11 a Let be the number of emails per day, :
Let be the number of emails per seven-day week, :
b
c There are alternative ways to get emails in a week other than every day.
12 a Let be the number of errors in a piece of homework, :
b and , with further probabilities decreasing.
So the most likely number of errors is .
c
Let
be the number of students with at least one error:
Then
13 Let
and
be the number of requests per day,
:
a
b If there are
or more requests, then some of them have to be denied.
c If there are no requests, no car is used. If there is a day with only a single request each car will be
used with half probability.
14 Using the fact that :
Multiplying through by and rearranging:
Taking only the positive solution:
15 a
b You have to ensure that the prefactor you found in part a is .
Taking only the positive solution:
EXERCISE 2B
3
is the number of alpha particles emitted in a millisecond.
Assuming
is true:
.
Using technology:
This is a two-tailed test, so comparing this to half of the significance level:
, so reject
at the
significance level.
There is significant evidence that the sample emits a different number of alpha particles.
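The two-tailed comparison described above can be sketched in code. This is a minimal illustration, not the book's method of calculation: the null mean 4, the observed count 10 and the 5% level are hypothetical values chosen for the example, and the function names are illustrative.

```python
import math


def poisson_pmf(r: int, lam: float) -> float:
    """P(X = r) for X ~ Po(lam)."""
    return math.exp(-lam) * lam**r / math.factorial(r)


def two_tailed_poisson_test(observed: int, lam: float, alpha: float) -> tuple[float, bool]:
    """Return the smaller tail probability and whether H0 is rejected.

    The smaller of P(X <= observed) and P(X >= observed) is compared with
    half of the significance level, as in a two-tailed test.
    """
    lower_tail = sum(poisson_pmf(r, lam) for r in range(observed + 1))
    upper_tail = 1.0 - sum(poisson_pmf(r, lam) for r in range(observed))
    tail = min(lower_tail, upper_tail)
    return tail, tail < alpha / 2


# Hypothetical values: H0 mean of 4, observed count of 10, 5% significance level.
tail_probability, reject_h0 = two_tailed_poisson_test(10, 4.0, 0.05)
print(tail_probability, reject_h0)
```

Comparing the tail probability with half of the significance level gives the same decision as doubling the tail probability and comparing it with the full significance level.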
Tip
Alternatively you could compare the directly to the
significance level .
4
a
Assuming
is true:
.
Calculating the -value and comparing to the significance level:
Do not reject
at the
significance level.
There is not significant evidence to suggest that the average number of cars travelling past the
traffic light is lower than
per minute.
b The rate might not be constant and the cars might not be independent. If one car drives slowly, it
might slow down the cars behind it.
5
a A Poisson distribution has equal mean and variance. Here they have approximately the same value.
b
Assuming
is true:
.
Calculating the -value and comparing to the significance level:
Do not reject
at the
significance level. There is not significant evidence that there has been a
reduction in the average number of accidents.
6
a The sample mean is and the unbiased estimate of the population variance is , giving a standard
deviation of approximately .
b A Poisson distribution has equal mean and variance. Here, they have approximately the same value.
c
Assuming
is true:
.
Calculating the -value and comparing to the significance level:
Reject at the significance level. There is significant evidence that the number of mistakes is
lower.
7
a You need a constant rate over the day and the bees have to arrive independently.
b Let represent the number of bees arriving in minutes.
Assuming is true: .
Calculating the $p$-value and comparing to the significance level:
Reject at the significance level. There is significant evidence that the number of bees has
increased.
8
represents the number of leaks in of pipe.
Assuming is true: .
Using technology:
.
This is a two-tailed test, so comparing this to half of the significance level:
so do not reject
at the
significance level.
There is not significant evidence of a change in the mean number of leaks.
Tip
Alternatively you could compare the directly to the
significance level .
9
a
represents the number of earthquakes in one year.
Assuming
is true:
.
Calculating the -value and comparing to the significance level:
Do not reject
at the
significance level. There is not significant evidence that the number of
earthquakes has increased.
b
represents the number of earthquakes in two years.
Assuming that
is true:
.
Calculating the -value and comparing to the significance level:
Reject at the significance level. There is significant evidence that the number of earthquakes
has increased.
10 If is true, then .
So,
So, for to be rejected at the significance level:
The smallest possible value is .
MIXED PRACTICE 2
1 Let
be the number of complaints in a
hour shift,
(Answer C)
2
:
(Answer C)
:
3
a Assuming that thrushes and robins visit the table independently of each other:
b
c
4
a The events have to be independent and the rate of success has to be constant.
b
c
represents the number of burgers sold per hour.
Assuming that $H_0$ is true:
.
Calculating the $p$-value and comparing to the significance level:
Reject $H_0$ at the significance level. There is significant evidence that the average rate of burgers
ordered has increased from
5
a
b
6
.
:
a
b
7
a Mean
Unbiased estimate of population variance
Standard deviation
Tip
You learnt how to calculate an unbiased estimate of variance in A Level Mathematics
Student Book 1, Chapter 22. You should also be able to use your calculator to find the
unbiased estimate of the population variance.
b A Poisson distribution has equal mean and variance. Here they have approximately the same value.
c Let represent the number of power surges per days.
Assuming $H_0$ is true:
.
Using technology:
.
This is a two-tailed test, so comparing this to half of the significance level:
, so do not reject $H_0$ at the significance level.
There is not significant evidence of a change in the average rate of power surges.
Tip
Alternatively you could compare the directly to the
significance level
8
a
b
Let
9
be the number of days in a five-day week on which there are more than
.
a
b Using the result from part a:
Let
represent the number of years with fewer than seven rainy days.
10 a
So
b
11 a
So
b
c
Using the fact that
:
So
12 a Let
represent the number of eruptions in one hour:
calls. So
b Let
represent the number of eruptions in one day:
c Let represent the number of eruptions in minutes.
There are half hours in a day.
d Let represent the number of eruptions in one hour:
For the first eruption to occur between and there are hours with no eruptions followed
by an hour with at least eruption:
e Over days with an estimate of eruptions per day, each producing litres of water:
f Let represent the number of eruptions in one hour:
g Let represent the number of eruptions in one hour:
13 a Let represent the number of rainstorms in a four-week period:
b Let represent the number of rainstorms in complete weeks:
Since is an integer, the value of has to be at least .
14 a Let represent the number of patients that arrive in minutes:
b Let represent the number of patients that arrive in one hour:
15
Assuming that $H_0$ is true:
.
Calculating the $p$-value and comparing to the significance level:
Do not reject $H_0$ at the significance level. There is not significant evidence that the average rate has
decreased.
16 a The average rate must be constant. However, you might expect it to vary over different times of the
day and with different weather conditions. Birds must arrive independently, but they might come in
flocks.
b
:
c
;
d Let represent the number of birds arriving per hour.
Assuming $H_0$ is true:
.
Using technology:
.
This is a two-tailed test, so comparing this to half of the significance level:
, so do not reject $H_0$ at the significance level.
There is no significant evidence of a change in the number of birds arriving each hour.
Tip
Alternatively you could compare the directly to the
significance level
17
represents the number of leaks in a section of pipe.
Assuming that $H_0$ is true:
.
Calculating the $p$-value and comparing to the significance level:
Do not reject $H_0$ at the significance level. There is not significant evidence that the number of leaks
has increased.
18 a Using :
b Number requested
or more
Number sold
Probability
Probability
Most probable number sold in a week is .
c Let the number sold in a week be denoted by
.
d Let the smallest number of copies be .
Tip
Note that is a constant.
, so
Summing up to gives the first value , so at least copies should be ordered.
19 a i
ii
b i
ii
c
20 Exclude the probability of . Then scale the old mean by the new probability.
21 a Let represent the number of bicycles brought in to be serviced per day.
.
i
ii
iii
b Let
represent the number of bicycles brought in to be serviced in one week ( days).
.
22 a Let
represent the number of coins found per
.
.
i
ii
b
c Let represent the number of coins found per .
Let represent the number of brooches found per .
i
ii The coins buried in a hoard are no longer independent. The Poisson assumption requires
independence, so brooches are more likely to be modelled by a Poisson distribution.
Worked solutions
3 Chi-squared tests
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.
EXERCISE 3A
2
a
: Physics grade and Mathematics course are independent,
: Physics grade and Mathematics course are dependent.
b
or lower
or
or
Further Maths
Maths AS or A
No Maths
c
d
(critical value from the table at the significance level)
Reject $H_0$. There is significant evidence of association. Students studying a higher level of
Mathematics tend to do better in Physics.
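The contingency-table calculation behind these solutions can be sketched as follows. The observed table is hypothetical (not the data from any question), the function names are introduced here, and the critical value 5.991 is the 5% table value for 2 degrees of freedom.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2 x 3 table of observed frequencies (an assumption)
observed = [[10, 20, 30],
            [20, 20, 20]]

statistic = chi_squared_statistic(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991   # 5% table value for 2 degrees of freedom
reject_h0 = statistic > critical_value
print(f"chi-squared = {statistic:.3f}, df = {degrees_of_freedom}, reject H0: {reject_h0}")
```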
3
: Reading level and fiction/non-fiction are independent.
: Reading level and fiction/non-fiction are dependent.
Observed values:
Elementary
Moderate
Advanced
Fiction
Non-Fiction
Expected values:
Elementary
Moderate
Advanced
Fiction
Non-Fiction
and .
The critical value from the table at the level of significance is .
Reject $H_0$. There is significant evidence that the reading level and fiction/non-fiction are dependent.
4
a
Early
On Time
Late
Total
Walk
Car
Other
Total
b Expected values:
Early
On Time
Late
Total
Walk
Car
Other
Total
c
lateness and mode of transport are independent,
lateness and mode of transport are dependent.
Critical value from the table at the
significance level
critical value.
Reject $H_0$. There is significant evidence that lateness depends on the mode of transport.
d He must assume that his data are representative. For example: it was not a day with unusual traffic.
He must also assume that the respondents were independent. For example: not lots of students from
the same bus.
5 The expected frequencies are given by:
$H_0$: number of visits and money spent are independent,
$H_1$: number of visits and money spent are dependent.
(critical value from the table at the level of significance).
Reject $H_0$.
There is significant evidence of an association.
People who visit more often tend to spend more money on each visit.
6 The expected frequencies are given by:
No drug taken
Single dose
Double dose
days
days
days
days
$H_0$: recovery speed and dose of drug are independent,
$H_1$: recovery speed and dose of drug are dependent.
(critical value from the table at the level of significance).
Do not reject $H_0$.
There is no significant dependency.
Hence, the drug does not appear to be effective in increasing the speed of recovery.
7
a
Male
Female
£
£
£
£
£
b
$H_0$: gender and salary are independent,
$H_1$: gender and salary are dependent.
(critical value from the table at the level of significance).
Do not reject $H_0$.
There is no significant evidence of dependency between salary and gender.
8
a Using the fact that the sum over all is equal to the sum over all :
b , since has to be positive.
c The critical value here is . Hence find significance for .
9
a For both genders, of the corresponding population voted for any party.
b
$H_0$: Gender and voting intention are independent.
$H_1$: Gender and voting intention are not independent.
If $H_0$ is true, half of the voters for each party will be males and half will be females.
Observed
Male
Female
Expected
Male
Female
Degrees of freedom, is . Critical value from the table at the level of significance .
, so do not reject $H_0$ and conclude that, for a sample of size , at the level, there is not
significant evidence to suggest that gender and voting intention are dependent.
c Let the smallest sample size required be times the sample of size .
, so
Sample must be greater than , so the smallest is .
d There will be random variation within the sample.
10 a
b
11 a The square causes larger deviations to have a stronger effect and ensures
is positive.
b You divide by the expected value to prevent large groups contributing disproportionately and
overwhelming the contributions of smaller groups.
EXERCISE 3B
2
$H_0$: appearance and colour are independent,
$H_1$: appearance and colour are dependent.
The expected frequencies are given by:
Round
Wrinkled
Yellow
Green
so use Yates’ correction.
(critical value from the table at the level of significance)
Do not reject $H_0$. There is no significant evidence of association at the significance level.
3
$H_0$: The colleague's enjoyment is independent of whether milk or tea is added first.
$H_1$: The colleague's enjoyment depends on whether milk or tea is added first.
If $H_0$ is true, then half the drinks that the colleague likes had tea added first, and half had milk added
first.
Observed
Tea first
Milk first
Likes
Dislikes
Expected
Tea first
Milk first
Likes
Dislikes
Degrees of freedom, , so you must apply Yates' correction.
(critical value from the table at the significance level), so do not reject $H_0$ at the
level. There is insufficient evidence to suggest that the colleague's enjoyment depends on whether milk or
tea is added first.
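For a table with one degree of freedom, Yates' correction subtracts 0.5 from each absolute difference before squaring. A minimal sketch, using a hypothetical 2 by 2 table rather than the data from the question, with function names introduced here:

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_squared(observed, yates=False):
    """Chi-squared statistic; with yates=True each |O - E| is reduced by 0.5."""
    expected = expected_frequencies(observed)
    shift = 0.5 if yates else 0.0
    return sum((abs(o - e) - shift) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2 x 2 table (an assumption, not the data from the question)
observed = [[12, 8],
            [6, 14]]

uncorrected = chi_squared(observed)
corrected = chi_squared(observed, yates=True)
critical_value = 3.841   # 5% table value for 1 degree of freedom
print(f"uncorrected = {uncorrected:.3f}, corrected = {corrected:.3f}")
print("reject H0 with Yates' correction:", corrected > critical_value)
```

The correction always reduces the statistic, so it makes the test slightly more conservative.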
4
$H_0$: The number of books does not differ between rural and urban libraries.
$H_1$: The number of books differs between rural and urban libraries.
Combine last two columns due to an expected frequency being .
Observed
Rural
Urban
Expected
Rural
Urban
Degrees of freedom, , so you must apply Yates' correction.
(critical value from the table at the significance level), so do not reject $H_0$ at the
level. There is insufficient evidence to suggest that the number of books differs between rural
and urban libraries.
5
$H_0$: There is no association between the amount spent on horror films and the number of murders each
year.
$H_1$: There is an association between the amount spent on horror films and the number of murders each
year.
Combine last two columns due to an expected frequency being .
Observed
Expected
Degrees of freedom, , so you must apply Yates' correction.
.
(critical value from the table at the significance level), so reject $H_0$. At the level,
there is significant evidence of an association between the amount spent on horror films and the
number of murders each year. However, this does not establish causality (it does not show that high
spending on horror films causes a high number of murders).
6
a
$H_0$: Acceptance patterns are independent of gender (no association).
$H_1$: Acceptance patterns are dependent on gender (association).
Observed
Accepted
Rejected
Male
Female
Expected
Accepted
Rejected
Male
Female
Degrees of freedom, , so you must apply Yates' correction.
(critical value from the table at the significance level), so reject $H_0$. There is
significant evidence at the level to suggest that acceptance at this university is dependent on
gender.
The acceptance rates are for males and for females, which appears to be evidence of bias.
b
$H_0$: Acceptance patterns do not vary in different departments.
$H_1$: Acceptance patterns vary in different departments.
Expected
Dept.
Men
Women
Admitted
Rejected
Admitted
Total
Rejected
Total
Degrees of freedom,
.
(critical value from the table at the significance level), so reject $H_0$. There is
significant evidence at the level to suggest that acceptance patterns depend on department.
The proportion of men admitted is higher than the proportion of women admitted in two of the six
departments ( and ).
Dept
% Men admitted
% Women admitted
7 You cannot do a calculation based on two factors being dependent unless you know exactly what that
dependency is.
MIXED PRACTICE 3
1 More information is needed. (Answer D)
2 Require
to not be rejected in a test at the
level.
and the critical value from the table is
.
(Answer B)
3
loan outcome and recipient type are independent,
loan outcome and recipient type are dependent.
The expected frequencies are:
Recipient
Individual
Outcome
Satisfactory
Bad debt
Small
business
Large
business
(critical value from the table at the significance level).
Do not reject $H_0$.
There is no significant evidence of an association.
4
a The expected frequencies are:
Eye colour
Blue
Green
Brown
Brown
Hair colour
Blonde
b
c
hair colour and eye colour are independent,
hair colour and eye colour are dependent.
Do not reject
with
(critical value from the table at the
significance level).
There is no significant evidence that hair colour and eye colour are dependent.
5 Expected frequencies:
Spring
Summer
Autumn
Winter
Boy
Girl
gender and time of year are independent,
gender and time of year are dependent.
(critical value from the table at the significance level).
Do not reject $H_0$. There is no evidence of any association between the gender of the baby and the time
of year.
6 The expected frequencies are:
Using the Yates’ correction:
(Answer A)
7
a Combining the last two columns since otherwise the expected frequencies would be smaller than :
Semi-detached and
Flat
Terraced
detached
Sold within three months
Sold in more than three
months
type of property and time taken to sell it are independent,
type of property and time taken to sell it are dependent.
(critical value from the table at the significance level)
Reject $H_0$. There is significant evidence of an association.
b i The larger total number of properties could make it easier to sell.
ii The large difference between observed and expected frequencies together with the small
expected frequency gives a large contribution to $\chi^2$.
8
a
There is no association between class of degree and A-level grade.
There is an association between class of degree and A-level grade.
Combine degree classes
and
due to an expected frequency being
.
Class of degree – Expected
or
Total
A-level grade
Total
Degrees of freedom,
.
(critical value from the table at the significance level), so reject $H_0$. There is some
evidence at the level to suggest that Fiona's belief is justified.
b They obtained more Class degrees, but fewer Class degrees than expected.
9
a i
$H_0$: The type of sideswipe accident is independent of where the HGV was registered.
$H_1$: The type of sideswipe accident is dependent on where the HGV was registered.
Type of sideswipe accident – Expected
Changing
lane to the
left
British
reg.
HGV
Foreign
reg.
HGV
Total
Changing
lane to the
right
Overtaking
moving
vehicle
Total
Degrees of freedom,
.
(critical value from the table at the significance level), so reject $H_0$. There is
significant evidence at the level to suggest that the type of sideswipe accident is dependent on
whether the HGV involved was British registered or foreign registered.
ii Accidents involving changing lane to the left are less likely and accidents involving changing lane
to the right are more likely than expected for foreign registered HGVs.
b i
Observed
Prosecution
No prosecution
Expected
Prosecution
No prosecution
ii
$H_0$: Prosecutions result independently of the age of the driver.
$H_1$: Prosecutions resulting are dependent on the age of the driver.
Degrees of freedom, , so you must apply Yates' correction.
(critical value from the table at the significance level), so do not reject $H_0$. There
is insufficient evidence at the level to suggest that prosecutions depended on the
age of the driver. There is no significant evidence that the driver's age had an influence on
whether or not a prosecution resulted.
10
There is no association between the age of staff and the department they work in.
There is an association between the age of staff and the department they work in.
Combine the columns for staff aged
and
due to an expected frequency being
Expected
Total
Accounts
Personnel
Marketing
Comms.
Total
Degrees of freedom,
.
.
(critical value from the table at the significance level), so reject $H_0$. There is significant
evidence at the level of an association between the age of staff and the department they work in.
11 a A sixth of businesses do not repay their loans while a fifth of all mortgages and personal loans are
defaulted.
b The frequencies he would observe are:
Repaid
Defaulted
Personal
Mortgage
Business
type of loan and whether it is repaid are independent,
type of loan and whether it is repaid are dependent.
(critical value from the table at the significance level),
so do not reject $H_0$. There is no significant evidence of dependence.
c For a sample of loans:
Observed frequencies:
Repaid
Defaulted
Total
Personal
Mortgage
Business
Total
Expected frequencies:
Repaid
Defaulted
Total
Personal
Mortgage
Business
Total
To show dependence at the significance level,
(critical value from the table at the significance level).
So, the smallest whole number is .
d The sample follows the long-term trend.
12 a
$H_0$: Development of Type diabetes is independent of the average level of weekly alcohol
consumption.
$H_1$: Development of Type diabetes is dependent on the average level of weekly alcohol
consumption.
Degrees of freedom,
.
Expected
Type diabetes
developed
Yes
No
Total
Less than
Between
Average level of weekly alcohol
consumption
and
More than
Total
(critical value from the table at the significance level), so reject $H_0$. There is
significant evidence at the level to suggest that development of Type diabetes is dependent on
the average level of weekly alcohol consumption, i.e. that there is an association.
b The reviewer’s statement is most likely based on these proportions:
Consumption
Diabetes developed
It appears that the advice given might lower the risk for people in the ' ' group, but increase the
risk for people in the ' ' group.
However, these proportions apply to this particular group of people, and there is no evidence to
suggest that the group is representative of the whole population, for whom the proportions might be
very different.
c i The number of rows and columns in the contingency table, the number of degrees of freedom and
the significance level of the test would not change, so the critical value would remain at .
ii Compare with .
Each of the terms that are summed to find the test statistic, , would increase by a factor of , so
would increase by the same factor from to .
iii The conclusion is the same because the test statistic is still greater than the critical value, but the
evidence to support that conclusion would be stronger than before.
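The scaling argument in part c can be checked numerically: multiplying every observed frequency by the same factor multiplies every expected frequency, and therefore the test statistic, by that factor. The table and the factor 3 below are hypothetical, and the function name is introduced here.

```python
def chi_squared_statistic(observed):
    """Sum of (O - E)^2 / E, with E = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(observed, row_totals)
               for o, c in zip(row, col_totals))

# Hypothetical table and scale factor (assumptions, not the data from the question)
observed = [[10, 20, 30],
            [20, 20, 20]]
factor = 3
scaled = [[factor * o for o in row] for row in observed]

original_stat = chi_squared_statistic(observed)
scaled_stat = chi_squared_statistic(scaled)
print(original_stat, scaled_stat)   # the second value is `factor` times the first
```

The degrees of freedom depend only on the shape of the table, so the critical value is unchanged while the statistic grows.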
Chapter 4 Worked solutions
4 Continuous distributions
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.
EXERCISE 4A
4
a Using the fact that the total probability has to be :
b Substituting the value of
from part a:
5 Setting the total probability equal to 1, and using the lower limit and the upper limit , to find :
6
7 Obtaining the CDF by integrating the PDF:
Using either :
CDF is
Let the lower and upper quartiles be and , respectively.
8 You need to use the fact that in both parts a and b.
a Using the fact that :
b Using the fact that
9 The graph of the PDF is:
Using the fact that the probability of two independent observations of
Using the fact that the area under the graph must be equal to
both being below
and substituting
is
:
:
Area under graph
10 Using the fact that the total probability has to be 1:
Substituting the value of into the formula for and solving the exponential equation either by using
the sinh function or by substituting :
11 Using the fact that the total probability has to be :
Making the substitution
and solving for :
:
Since
can only take positive values, and so
.
has to be positive, giving only one possible solution:
EXERCISE 4B
3
a Using :
b Using the formula for and substituting from part a:
4
a Using the fact that the total probability has to be 1:
b Using and substituting the value of from part a:
5
a is defined between and . To show that is a PDF, you need to show that , for all and that .
for and for every .
satisfies the conditions to be a PDF.
b Using :
6
a Using the fact that the total probability has to be :
b Using
, substituting in the value of
:
7 Using the formula,
:
from part a and using the fact that
Since
is a PDF:
so
Using integration by parts:
Set
Then
and
and
So
Evaluating the expression in the square brackets and using
:
EXERCISE 4C
4
a Expected value
b Standard deviation
£
£
£
£
£
5
Using the fact that
6
:
a
b
7
a
b
8
a Using integration by parts to find
b
c
9
a
b Using your result from part a:
:
EXERCISE 4D
2
a
b
c
d
3
a
b
4 Let represent the mass of a woman, represent the mass of a man and represent the total mass
of the women, men and the lift:
5 Let represent the outcome of the roll and let represent the difference between the two scores:
6 Let represent one student's exam scores, and let represent the difference between two students'
scores:
7 Let represent Pamela's journey time, let represent Adrian's journey time and let represent the
difference in total weekly journey times:
8
is twice the mass of a single gerbil.
is the sum of the masses of two different gerbils.
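The distinction in Question 8 is that doubling one observation scales the variance by 4, whereas adding two independent observations only doubles it. A minimal sketch, assuming a hypothetical variance of 9 for a single gerbil's mass; the function name is introduced here.

```python
def variance_of_linear_combination(coefficients, variances):
    """Var(a1*X1 + ... + an*Xn) for independent X1, ..., Xn."""
    return sum(a * a * v for a, v in zip(coefficients, variances))

sigma_squared = 9.0   # hypothetical variance of one gerbil's mass (an assumption)

# Twice the mass of a single gerbil: 2X
var_double = variance_of_linear_combination([2], [sigma_squared])

# Sum of the masses of two different gerbils: X1 + X2
var_sum = variance_of_linear_combination([1, 1], [sigma_squared, sigma_squared])

print(var_double, var_sum)
```

Both quantities have the same mean, so only the variance distinguishes them.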
EXERCISE 4E
2
a
The distribution is
.
represent the
b Using your calculator:
3 Let the times taken by Aaron and by Bashir be represented by and , respectively.
a
Standard deviation of
b
c
4
a
Let represent the mean of a random sample of six rods. Then .
b Using the distribution from part a,
5
a Let
represent the length of a randomly chosen pipe, then
b
Tip
Alternatively, for the total length of
6
a
pipes:
.
b
c Let the total mass be represented by
.
.
7 Let represent the mass of a randomly chosen apple, and let represent the mass of a randomly
chosen pear.
a
b
8 Let the standard deviation of the length of a corn snake be .
For snakes:
Using the fact that for a randomly selected sample of corn snakes, :
So
9 Let represent the score of a randomly chosen boy and let represent the score of a randomly chosen
girl.
a
If the marks differ by less than , then is between and .
Tip
Alternatively, you could use as your random variable.
b Taking as your random variable:
Mean of
Tip
Alternatively, you could use as your random variable.
10 a Daily rainfall
. Using the fact that
, so
:
, so
This gives
Weekly rainfall
.
Using the fact that in a randomly chosen -day week, there is a probability of
rainfall is less than
:
that the mean daily
This gives
Solving
and
:
and
b Assumes that the rainfall each day is independent of the rainfall on other days, which is unlikely to
be the case.
11 a
b i
Tip
Alternatively, if
, then mean
.
Mean wait for
days
ii On any day,
Let the number of mornings per week that she waits more than
minutes be
:
Tip
You can use a probability found from a normal distribution as the parameter
in a
binomial distribution.
iii
c Average for a week is more than minutes if she waits more than minutes on the last day.
d Average waiting time is .
EXERCISE 4F
3 Using the fact that for the percentile, :
4
a
b
c Using the fact that, for the median, :
EXERCISE 4G
1
a
b Using the fact that the lines have gradients of
c Using the fact that
From
2
:
:
, area is
a To find the median, you need to solve
.
From the CDF or from its graph:
, so the median is .
Tip
Alternatively, looking at the PDF and its graph:
b Integrating
Integrating
to find the mean,
:
and then subtracting the square of the mean to find
:
3
a
Using the fact that the area under the graph of the PDF is equal to :
b
c Using the fact that the area to the right of the upper quartile
This simplifies to
and is satisfied by
is
:
, which is the value of
Tip
You can use technology to find
value of
d The CDF,
First section:
At
Second section:
At
4
a
or, if working manually, test for a sign change in the
; start by evaluating
and
.
Using the fact that the area under the graph of the PDF is equal to :
Simplifying:
b
c
d Considering the areas under the two sections of the graph of the PDF:
Area
, so the median
Using area
is between
and .
:
Simplifying:
5
a You must show that
is never negative and that the area under the graph is .
The function
for all real , because
b Using integration by parts to find
:
and
Integrating by parts:
and
and
and
for all real
.
and using the variance formula
Tip
A sketch graph of the PDF would tell you to expect that
Integrating by parts twice:
and
So,
6
a
Using the fact that the area under the graph of the PDF is :
, giving
b
Substituting
from part a:
c The CDF is
First section:
, so
Substituting
, giving
:
Second section:
, giving
Substituting
:
CDF is
d By substitution:
, and
Let the median be
, which tell you that
. You know that
and that
contains the median.
.
e By substitution:
, which tell you that
Let the lower quartile be .
You know that
and that
.
, giving
7
a
b
Using the fact that the area under the graph of the PDF is equal to :
, so
c
contains the lower quartile.
So,
d
Using integration by parts for both functions:
and
EXERCISE 4H
3
a Let
represent the length of the piece.
b Expected mean
Standard deviation
4
5
6
a
b
7 Mean of
Standard deviation of
EXERCISE 4I
3 Let
represent the number of emails received in
minutes.
4 Let
represent the number of birds that arrive in ten minutes.
a
b
c The average rate is
bird every
minutes.
5 Let represent the number of people he meets that he knows in minutes:
6 Let represent the number of buses arriving in minutes.
Let represent the number of buses arriving in minutes:
7 Assuming is the average number of calls per minute, you can find an expression for the average
waiting time per call, :
8
9
a The waiting time per bus follows an exponential distribution with mean
Hence
.
b
c
10 Using integration by parts to find
:
and
and using the variance formula
and
Integrating by parts:
Integrating by parts twice:
So
, and
11 a Poisson distribution with mean
:
.
b
c The expression
describes the probability of no success in units of time, so it is the
probability of having to wait at least units of time for a success, which is described by
d From parts b and c:
The expression
, so
represents the CDF of the random variable
:
.
Now,
The PDF is
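The link between the Poisson probability of no events and the exponential waiting time can be checked numerically. In this sketch the rate 0.5 and the names `lam`, `cdf` and `pdf` are assumptions made for illustration; the code confirms that differentiating the CDF obtained from "no events in time t" gives the exponential PDF.

```python
from math import exp

lam = 0.5   # hypothetical rate of events per unit time (an assumption)

def prob_no_event(t):
    """P(no events in time t) for a Poisson count with mean lam * t."""
    return exp(-lam * t)

def cdf(t):
    """P(T <= t): the complement of waiting longer than t."""
    return 1 - prob_no_event(t)

def pdf(t):
    """Exponential density lam * e^(-lam * t)."""
    return lam * exp(-lam * t)

# Central-difference estimate of the derivative of the CDF at t = 2
t, h = 2.0, 1e-6
numerical_derivative = (cdf(t + h) - cdf(t - h)) / (2 * h)
print(numerical_derivative, pdf(t))
```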
EXERCISE 4J
2
a Using the fact that the total probability has to be 1 and the fact that
:
So
Hence
b Substituting the value for
from part a:
c Splitting the calculation of
into an integral over the continuous part of the variable and then
adding the value for the discrete part:
d Splitting the calculation of
into an integral over the continuous part of the variable and then
adding the value for the discrete part:
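Splitting an expectation into a continuous part and a discrete part can be sketched numerically. The mixed distribution below is hypothetical (not the one in the question): the variable equals 0 with probability 0.25 and otherwise has density 1.5x on 0 < x ≤ 1. The helper `midpoint_integral` is a simple midpoint rule introduced here.

```python
def midpoint_integral(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

# Hypothetical mixed distribution (assumptions, not taken from the question)
p_zero = 0.25                 # discrete part: P(X = 0)

def density(x):
    """Continuous part on 0 < x <= 1; integrates to 0.75."""
    return 1.5 * x

# Total probability: discrete mass plus area under the density
total_probability = p_zero + midpoint_integral(density, 0, 1)

# E(X): sum over the discrete part plus integral over the continuous part
expectation = 0 * p_zero + midpoint_integral(lambda x: x * density(x), 0, 1)

# E(X^2) is split in the same way, then Var(X) = E(X^2) - E(X)^2
second_moment = 0**2 * p_zero + midpoint_integral(lambda x: x * x * density(x), 0, 1)
variance = second_moment - expectation**2
print(total_probability, expectation, variance)
```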
3
a Using the fact that the total probability has to be 1 and the fact that :
So
Hence .
b Splitting the calculation of
into an integral over the continuous part of the variable and then
adding the value for the discrete part, then substituting in the value of c from part a:
Substituting into the calculation for :
4
a Using the fact that if you add the cumulative probability for the continuous interval to the probability
for the
integer values you have to obtain a total probability of :
b For
5
a Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:
b Splitting the calculation of into an integral over the continuous part of the variable and a sum
over the discrete part:
c
d Splitting the calculation of
into an integral over the continuous part of the variable and a sum
over the discrete part:
6
a Using the fact that
and that
has to have equal values at
b
7 Using the fact that you have to obtain a total probability of :
and
due to continuity:
Since you know that
8
has to be positive, only
is a valid solution.
a Constructing a probability table:
Using the fact that the total probability has to equal :
b For
Splitting the calculation of
into an integral over the continuous part of the variable and a sum
over the discrete part:
c
d Using the fact that the total probability has to equal :
e
Rounding causes a slight underestimate of the true mean time.
MIXED PRACTICE 4
1
(Answer C)
2 Standard deviation of
(Answer B)
3
a Using the fact that the total probability has to equal :
b
c
4
5
a Using the fact that the total probability has to equal :
b Using
and substituting
c Using
:
a Standard deviation of
b
standard deviation of
£
c
£
£
£
Standard deviation
6
from part a:
£
a Using the fact that the total probability has to equal :
Using the fact that
:
b i
ii Using
7
and substituting the values of
a Using the fact that the total probability has to equal :
b For the median,
:
and
from part a:
8 Since the PDF is symmetric about
9
:
and
:
a Using the fact that the total probability has to equal :
b Substituting
into the formula for
:
c
d
10 a Using the fact that there are no recorded masses below
, so
:
So
b Using the fact that there are no recorded masses over
So
c For
d
11 a
, so
:
b
c Probably not. Especially in the situation in part b it is likely that when Alice is finished Hassan might
try to speed up.
d
Measures for
over
standard deviation
days
For
:
So, the expectation will be greater than the standard deviation for
12 Let the mean time taken per question for the
questions be
days.
minutes.
Mean of
Variance of
Markus will fail to complete the test if
.
Tip
Alternatively, let the mean time taken to answer all
questions be
minutes.
Mean of
Variance of
Note that when values are taken from the same normal distribution with variance
these values have variance
, not
He will fail to complete the test if
13 Let the mean mass of a man in a group of
Mean of
.
.
be
.
,
of
The random variable
.
The total mass of the
men exceeds
when
.
Alternatively, let the total mass of a group of
men be
Tip
Mean of
.
;
The random variable
14 Let the mean mass of
.
purple beads be
, and let the mass of
Mean of
The random variable
.
15 a Mean of
Variance of
b The random variable
.
16 a
b Let the median be
, then
.
From the graph:
The median,
or
c The PDF,
over each of the four section intervals.
CDF
PDF
yellow beads be
.
Interval
Tip
Only one statement is needed in the PDF for the intervals
and
.
d
e Let the mean of
values of
be
, then
and
Tip
Alternatively, you can find the probability that
values have a sum greater than
.
Let the sum of
values be .
;
17 a Let the mean rate of observations per
Mean rate of observations per
(no particles observed in
hour be .
hours is, therefore,
.
hours)
Using the fact that the probability of observing no particles in
hours is
:
Expected waiting time
b CDF is
hours
, where
is the waiting time in hours and, from part a,
.
c Using the memoryless property of the exponential distribution:
, where
and
both represent
for
, and
18 a The PDF of the distribution is
hours.
otherwise.
b
The CDF is
c Let the two independent observations of
be
and
.
You need to find the probability that both observations are less than , where each of these events
has probability
.
19 a Using the fact that the total probability must be equal to :
b Finding
by integrating
discrete part:
over the continuous part and summing
over the
c Let the median value of
be
, then
.
This gives
Tip
Alternatively, you could find m using
.
20
21
Using the fact that the median of
Using the fact that
is
:
:
22 a Let the marks in English and in Mathematics be represented by
The random variable
and by
.
b Let the average marks of the class in English and in Mathematics be
Mean of
The random variable
, respectively.
.
and
, respectively.
Tip
Alternatively, you could find the probability that the sum of the marks is higher in English
than in Mathematics
The random variable
.
.
23 a Using the fact that the total probability has to equal :
b
c
d Six months is half a year.
e i
ii
24 a
b
c
d i
ii
e Solving the equation from part d ii:
25 a
b i
ii
c i Using the results from part b:
ii You know that
.
d Substituting the value for
from part c ii:
Worked solutions
5 Further hypothesis testing
Worked solutions are provided for all levelled practice and discussion questions, as well as Cross-topic
review exercises. They are not provided for black coded practice questions.
EXERCISE 5A
2
a
b
Use the table to find the critical value. When and for a two-tailed test you look at the
column.
Comparing your calculated $t$-score with the critical value:
Reject $H_0$ at the level. There is significant evidence that John's computer does not take
seconds to start on average.
c Assume that the times are normally distributed. True variance is unknown.
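The $t$-score comparison can be sketched in code. The sample and the hypothesised mean below are hypothetical, not the data from the question. The critical value 2.571 is the two-tailed 5% table value for 5 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, mu0):
    """(sample mean - mu0) / (s / sqrt(n)), where s is the unbiased estimate."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Hypothetical data and null-hypothesis mean (assumptions)
sample = [31.2, 29.8, 30.5, 32.1, 30.9, 31.6]
mu0 = 30.0

t_score = t_statistic(sample, mu0)
degrees_of_freedom = len(sample) - 1
critical_value = 2.571   # two-tailed 5% table value for 5 degrees of freedom
reject_h0 = abs(t_score) > critical_value
print(f"t = {t_score:.3f}, df = {degrees_of_freedom}, reject H0: {reject_h0}")
```

`statistics.stdev` divides by n − 1, so it matches the unbiased estimate of the population variance that the test requires.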
3
a
b
Use the table to find the critical value. When and for a one-tailed test you look
at the column.
Comparing your calculated $t$-score with the critical value:
Do not reject $H_0$. There is no significant evidence that the packets contain more than on
average.
4
.
Use the table to find the critical value. When
at the
column.
Comparing your calculated -score with the critical value:
and
for a one-tailed test you look
Reject H0. There is significant evidence that babies in the nursery crawl earlier.
5
a
b
Use the table to find the critical value. When
the
column.
and
for a one-tailed test you look at
Comparing your calculated -score with the critical value:
Do not reject H0. There is no significant evidence that cleaning the kettle decreases the time it takes
to boil.
6
a
b
c
Use the table to find the critical value. When
look at the
column.
and
for a one-tailed test you
Comparing your calculated -score with the critical value:
Reject H0. There is significant evidence that the athletes of the club do, indeed, run faster.
d Assume that the times are normally distributed.
7
a
b
Use the table to find the critical value. When
at the
column.
and
for a two-tailed test you look
Comparing your calculated -score with the critical value:
Do not reject H0. There is no significant evidence that the mean length of the bananas is different.
c i
ii
Use the table to find the critical value. When
look at the
column.
and
for a one-tailed test you
Comparing your calculated -score with the critical value:
Reject
8
. There is significant evidence that the mean length of the bananas is less than
.
a
b Using
:
So, calculating the values of
for different
(since two-tailed test at
, and comparing to the critical values from the
level) column of the table:
Critical value
The first value of
level for all
that falls into the critical region is
so Aki will reject the null hypothesis at the
.
EXERCISE 5B
7 This decreases the risk of a type II error but increases the risk of a type I error.
8
a H0: coin is fair, H1: coin is biased.
b Type I error. This is an example of rejecting H0 when it is in fact true.
9
a Claiming that there is correlation when none really exists.
b Not recognising correlation when there is underlying correlation.
10 a i Claiming that the dice is biased when it is not.
ii Claiming that the dice is not biased when it is.
b For example: roll the dice more times, look for more than
sixes, consider other numbers, do a chi-
squared test.
11 a
b
The significance level is
, which is very small and requires extreme evidence before change is
found. This does not seem to be required in this situation.
12 a
b
c
13 a
b
and
, so do not reject
.
There is no evidence to suggest that the mean mass of the eggs is greater than
.
c
d Let the smallest average mass necessary be
.
, so the smallest average mass would be
e State the alternative mean value
14 a
b
So the significance level is
c You need to find the probability that
.
d
is accepted, given that the alternative value for
at a stationary point.
is true, i.e.
Solutions are
and
.
Second derivative:
, which is negative at
and positive at
Therefore, the probability of a type II error is maximised when
15 a i Critical -values for a two-tailed test at
are
ii Critical -values for a two-tailed test at
are
.
.
.
.
b i
ii
16
There is a special case when
, where you simply cannot make a type II error. Here the power is .
In all other cases, the power of the test is
.
MIXED PRACTICE 5
1 Using
2 Using the definitions, answer
3
a
, and the estimate of
is correct. (Answer A)
:
b
c Use the table to find the critical value. When
at the
column.
and
for a two-tailed test you look
Comparing your calculated -score with the critical value:
Do not reject
differs from
. There is no significant evidence that the volume of
.
produced in a reaction
d Assumes that the data are drawn from a normal distribution.
4
a Not rejecting H0 when it is actually false.
b
c
d Increase the sample size, i.e. make more observations.
5
a
b For a
second period, the expected number of beta particles is
.
The significance level is
c If
6
per second are expected, then
per
seconds.
a
b
c
d Use the table to find the critical value. When
at the
column.
Comparing your calculated -score with the critical value:
and
for a one-tailed test you look
Reject
7
. There is significant evidence that the average salary is less than £
.
a
b
Significance level of the test is
c
8
a Josh has taken Safeerah's own body mass into account.
b
Use the table to find the critical value. When
look at the
column.
and
for a one-tailed test you
Comparing your calculated -score with the critical value:
Do not reject H0. There is not enough evidence for a reduced journey time.
9
a
.
Sample mean,
Sample
Test statistic,
Critical values for a two-tailed test at
are
.
lies within the acceptance region, so do not reject
There is no significant evidence at the
.
b
.
Sample mean,
Sample
.
significance level to doubt that the current mean depth is
Test statistic,
Critical values for a two-tailed test at
are
.
lies outside the acceptance region, so reject
There is significant evidence at the
.
significance level that the current mean depth is not
.
c i Neither;
ii The value of
decreases as
increases, so the
, gets narrower as
Differences from
confidence interval, which is
increases.
are easier to detect in a narrower interval, so a sample size of
gives
smaller risk of making a type II error.
10 a
H0: method of encouragement and outcome are independent.
H1: method of encouragement and outcome are dependent.
Calculating the expected frequencies:
Applied for grant
Did not apply
Letter
Phone
Using Yates’ correction:
(the critical value from the table at the
significance level)
Reject
. There is sufficient evidence of an association between method of receiving information and
outcome.
b A type I error: H0 was rejected despite it being correct.
Chapter 6 worked solutions
6 Confidence intervals
EXERCISE 6A
4
a
b
5
lies within the confidence interval found in part a. It is plausible that the true oxygen level is
.
a
b Yes, there is sufficient evidence that the true mean is below
the confidence interval found in part a.
since
is above the upper bound of
6
7
a
b
8
a
lies within the confidence interval found in part a. No significant evidence of a difference in the
mean wage.
,
The confidence level is
.
b No. This is a confidence interval for the population mean, not sample means.
c
Using the fact that the width of the confidence interval must be less than
since
:
must be an integer.
9
Using the fact that the width of the confidence interval must be less than
The minimum sample size is
, since
:
must be an integer.
10 a
b
,
The confidence level is
.
c No, since the confidence intervals overlap (although it is quite difficult to find the significance level).
11 a Using these unbiased estimates:
Upper bound of CI is
so
The confidence interval is
b In a one-tailed test at the
, so reject
.
.
significance level, a mean of
lies outside the confidence interval
12 a
Calculating the mean of the differences:
b Increase the sample size.
c Yes.
is below the lower bound of the confidence interval found in part a. The data suggests an
increase of at least
at the
level.
13 a False.
b False.
c False.
d True.
e False.
14 The larger the interval, the more confidence you can have that the true mean lies within it, hence the
interval will be larger than the
one.
EXERCISE 6B
2
a Assume heights are normally distributed.
b
c
d Using the table to find the -score when
3
a
b
Using the table:
The confidence level is
.
c No, the confidence interval shows the likely position of the mean, not of the individual member of the
population. The sample is too small for meaningful generalisations.
4
, so unbiased estimate of population standard deviation is:
Unbiased estimate of population mean
and
The confidence level is
5
.
a
b
Using the table:
The confidence level is
.
c Yes.
is below the lower bound of the confidence interval. It could even be claimed that the
lifetime is at least
pages.
6
a For
, the table gives
Variance of the four times is
confidence interval for
, so
is:
i.e.
b i
ii A time of
minutes is within the
confidence interval, so do not reject
at the
significance level. The test provides no significant evidence that the average time taken is more
than
minutes.
7
a
Using the calculation of the mean of the differences:
b
Using the table:
The confidence level is
8
a For
.
, the table gives a -score of
For the two blocks,
and
confidence interval for
, so
is:
i.e.
b As in part a,
, and
For the two blocks,
and
Let the lower bound for the
, so
confidence interval of
be , then:
, so
The value of
is always less than the upper bound of
for all .
.
So the two intervals always overlap.
MIXED PRACTICE 6
1
Since
2
must be an integer, the smallest value of
is
. (Answer D)
a
b
c
lies within the
different mean to
confidence interval. Do not reject
.
. There is not enough evidence to suggest a
3
4
Using the table to find the -score when
:
5
since
6
must be an integer.
a
Using the table to find the -score when
:
b
does not lie in the
confidence interval. Reject
.
There is significant evidence that the results are different.
7
a
A
confidence interval for
is
b If it is true that
, then
, giving
A mean of
8
is not consistent with the
a Sample mean,
and
confidence interval for
Interval is
b
.
Test statistic,
is
confidence interval for
in part a.
Two-tailed test at the
level of significance, so critical value is
is outside the rejection region, so do not reject
results in no change in mass.
9
.
. There is evidence to suggest that the diet
a
b Using the table to find the -score when
10 a i A
:
confidence interval for the population mean has a
probability of including the true
population mean.
So the probability that the interval will not include the value of
ii
(At least one interval will not include )
, or
(No intervals do not include )
Using your results from part a:
Tip
Alternatively, using the cumulative binomial function on your calculator with
b i Number of degrees of freedom
Small sample with
taken from the sample, so using the -distribution:
confidence interval is
Using the table to find the -score when
:
.
So, substituting in the values, the
ii
is beyond the upper limit of the
confidence interval is:
confidence interval.
So, the new programme seems to have been effective and the mean time seems to have
:
decreased.
Worked solutions
Cross-topic review exercise 1
1 Let P(Y=1)=a and P(Y=4)=b.
Then, using the fact that the probabilities must have a total sum of 1:
a+b=1⇒a=1−b
Using the fact that E(Y)=3:
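$E(Y) = 1 \times a + 4 \times b = 3 \Rightarrow a = \frac{1}{3},\ b = \frac{2}{3}$
$\mathrm{Var}(Y) = E(Y^2) - (E(Y))^2 = \frac{1}{3} \times 1^2 + \frac{2}{3} \times 4^2 - 3^2 = 2$ (Answer B)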
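The arithmetic in question 1 can be checked with exact fractions. This is only an illustrative sketch; the variable names are assumptions and are not part of the original solution.

```python
from fractions import Fraction

# Solve a + b = 1 and 1*a + 4*b = 3, where a = P(Y=1) and b = P(Y=4).
b = Fraction(3 - 1, 4 - 1)   # subtracting the two equations gives 3b = 2
a = 1 - b

mean = 1 * a + 4 * b
variance = (1**2) * a + (4**2) * b - mean**2
print(a, b, mean, variance)
```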
2 $Z = \Phi^{-1}(0.975) = 1.96$ (3 s.f.)
$2 \times \frac{Z\sigma}{\sqrt{n}} = 2 \times \frac{Z \times 30}{\sqrt{10}} = 37.2$ (3 s.f.) (Answer D)
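As a rough check of the interval width in question 2, the sketch below uses Python's standard-library normal distribution. The values 30 and 10 are read from the working above; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)   # z-value for a two-sided 95% interval
sigma, n = 30, 10                 # values used in the solution above
width = 2 * z * sigma / sqrt(n)   # full width of the confidence interval
print(round(z, 2), round(width, 1))
```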
3
a
b Using the fact that the total area under the graph must be 1:
$\int_0^3 kx^2\,dx + 9k(4-3) = 9k + 9k = 1 \Rightarrow k = \frac{1}{18}$
c i $\int_0^3 kx^2\,dx = 9k(4-3) = 0.5$
This means that P(X⩽3)=0.5.
So, the median is 3.
ii Using the fact that for the lower quartile, q , P(X⩽q)=0.25:
$\int_0^q kx^2\,dx = \frac{kq^3}{3} = 0.25$
$\Rightarrow q = \sqrt[3]{\frac{0.75}{k}} = 2.38$ (3 s.f.)
4
a The expected frequencies are:
                 North Academy    South High School
No Maths             28.8               31.2
Single Maths         42.2               45.8
Further Maths        11.0               12.0
ν=(3−1)(2−1)=2
Comparing the χcalc2 value to the critical value from the table at the 5% significance level:
$\chi^2_{\text{calc}} = \sum \frac{(O_i - E_i)^2}{E_i} = 31.6$ (3 s.f.) $> 5.991$
There is significant evidence of an association.
b Random, representative sample from each school.
c No. Just because there is dependency does not mean that there is causality.
5
a Using the fact that the total summed probabilities must be equal to 1:
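$\int_0^2 (px + q)\,dx = \left[\frac{px^2}{2} + qx\right]_0^2 = 2p + 2q = 1 \Rightarrow p + q = \frac{1}{2}$
b Using the fact that $E(X) = \frac{2}{3}$ and substituting in $q = \frac{1}{2} - p$ from part a:
$\int_0^2 (px^2 + qx)\,dx = \left[\frac{px^3}{3} + \frac{qx^2}{2}\right]_0^2 = \frac{8p}{3} + 2q = \frac{8p}{3} + 1 - 2p = \frac{2p}{3} + 1 = \frac{2}{3} \Rightarrow p = -\frac{1}{2},\ q = 1$
c Calculating P(X>1) and substituting in the values for p and q from part b:
$P(X > 1) = \int_1^2 (px + q)\,dx = \left[\frac{px^2}{2} + qx\right]_1^2 = 1 - \frac{p}{2} - q = \frac{1}{4}$
d Using the formula for Var(X) and substituting in the values for p and q from part b:
$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \int_0^2 (px^3 + qx^2)\,dx - \left(\frac{2}{3}\right)^2 = 4p + \frac{8q}{3} - \frac{4}{9} = \frac{2}{9}$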
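The results of question 5 can be reproduced with exact fractions. The sketch below is a check under the assumption that the density is f(x) = px + q on [0, 2], as the integrals above indicate; the helper name `integral` is hypothetical.

```python
from fractions import Fraction as F

# Conditions for f(x) = p*x + q on [0, 2]:
#   total probability: 2p + 2q = 1
#   mean:              (8/3)p + 2q = 2/3
# Subtracting the first equation from the second eliminates q.
p = (F(2, 3) - 1) / (F(8, 3) - 2)
q = F(1, 2) - p

def integral(power, lo, hi):
    """Exact integral of x**power * (p*x + q) from lo to hi."""
    def anti(x):
        x = F(x)
        return p * x ** (power + 2) / (power + 2) + q * x ** (power + 1) / (power + 1)
    return anti(hi) - anti(lo)

prob_gt_1 = integral(0, 1, 2)                            # part c
variance = integral(2, 0, 2) - integral(1, 0, 2) ** 2    # part d
print(p, q, prob_gt_1, variance)
```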
6
a X=C+T
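$E(X) = E(C) + E(T) = 20 + 100 = 120$ minutes
$\sigma(X) = \sqrt{\sigma^2(C) + \sigma^2(T)} = \sqrt{5^2 + 10^2} = 11.2$ minutes (3 s.f.)
b $X = C + T$, $Y = 200 + \frac{10}{60}X = 200 + \frac{1}{6}X$
Substituting in the values from part a:
$E(Y) = 200 + \frac{1}{6}E(X) = £220$
$\sigma(Y) = \sqrt{\frac{1}{36}\sigma^2(X)} = \frac{1}{6}\sigma(X) = £1.86$ (3 s.f.)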
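A short numerical check of question 6 follows. It assumes C and T are independent, which is what adding the variances in the solution implies; the variable names are illustrative.

```python
from math import sqrt

mean_c, sd_c = 20, 5      # C, as used in part a
mean_t, sd_t = 100, 10    # T, as used in part a

mean_x = mean_c + mean_t
sd_x = sqrt(sd_c**2 + sd_t**2)   # variances add for independent variables

# Y = 200 + X/6: the constant shifts the mean only; the factor scales the s.d.
mean_y = 200 + mean_x / 6
sd_y = sd_x / 6
print(mean_x, round(sd_x, 1), mean_y, round(sd_y, 2))
```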
7
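a Mean $= \frac{10\,065}{160} = 62.9$ kg (3 s.f.);
Standard deviation $= 12.3$ kg (3 s.f.)
b i $Z = \Phi^{-1}(0.99) = 2.33$ (3 s.f.)
$\frac{Zs}{\sqrt{n}} = 2.27$ (3 s.f.)
$\bar{x} - \frac{Zs}{\sqrt{n}} < \mu < \bar{x} + \frac{Zs}{\sqrt{n}} \Rightarrow 60.6 < \mu < 65.2$ (3 s.f.)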
ii 61.7 lies within the confidence interval found in part b i. There is reason to doubt the claim.
8
a (X+Y) has a normal distribution with mean 5+3=8 and variance 22+52=29.
(X+Y)~ N(8,29).
b $P(X \geqslant 15 - Y) = P(X + Y \geqslant 15) = 1 - P(X + Y < 15) = 1 - \Phi\left(\frac{15 - 8}{\sqrt{29}}\right) = 1 - \Phi(1.300) = 0.0968$
9
a Let X represent the number of patients during one week requiring treatment after a venomous snake
bite, X~ Po(0.5).
b i $P(X \leqslant 1) = e^{-0.5} + \frac{0.5^1 e^{-0.5}}{1!} = 0.910$ (3 s.f.)
ii Let Y represent the number of patients during an 8-week period requiring treatment after a
venomous snake bite, $Y \sim \mathrm{Po}(4)$.
$P(Y \geqslant 5) = 1 - P(Y \leqslant 4) = 1 - \sum_{n=0}^{4} \frac{4^n e^{-4}}{n!} = 0.371$ (3 s.f.)
iii Let Z represent the number of patients during a 26-week period requiring treatment after a
venomous snake bite, $Z \sim \mathrm{Po}(13)$.
$P(10 < Z < 20) = P(Z < 20) - P(Z \leqslant 10) = \sum_{n=0}^{19} \frac{13^n e^{-13}}{n!} - \sum_{n=0}^{10} \frac{13^n e^{-13}}{n!} = 0.706$ (3 s.f.)
c Let W represent the number of patients during a 4-week period requiring treatment after a venomous
snake bite, $W \sim \mathrm{Po}(2)$.
$P(W > 5) = 1 - P(W \leqslant 5) = 1 - \sum_{n=0}^{5} \frac{2^n e^{-2}}{n!} = 0.0166$ (3 s.f.)
$P(W > 6) = 1 - P(W \leqslant 6) = 1 - \sum_{n=0}^{6} \frac{2^n e^{-2}}{n!} = 0.004\,53$ (3 s.f.)
The hospital should have 6 doses of anti-venom after the delivery.
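The Poisson probabilities in question 9 can be reproduced by summing the probability function directly. This is a minimal sketch; `poisson_cdf` is a hypothetical helper written for this check, not a library function.

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam), by direct summation."""
    return sum(lam**n * exp(-lam) / factorial(n) for n in range(k + 1))

p_week = poisson_cdf(1, 0.5)                              # part b i
p_eight_weeks = 1 - poisson_cdf(4, 4)                     # part b ii
p_half_year = poisson_cdf(19, 13) - poisson_cdf(10, 13)   # part b iii
p_more_than_5 = 1 - poisson_cdf(5, 2)                     # part c
p_more_than_6 = 1 - poisson_cdf(6, 2)                     # part c
print(p_week, p_eight_weeks, p_half_year, p_more_than_5, p_more_than_6)
```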
10 a
              AP      AV
Baseball      275     50
Basketball    475     75
Soccer        350     25
b The expected frequencies are:
              AP      AV
Baseball      286     39.0
Basketball    484     66.0
Soccer        330     45.0
$\nu = (3-1)(2-1) = 2$
Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 1% significance level:
$\chi^2_{\text{calc}} = \sum \frac{(O_i - E_i)^2}{E_i} = 15.0$ (3 s.f.) $> 9.210$
There is significant evidence of an association between coping strategy and the sport involved.
c Soccer officials are far less likely than expected to use an AV coping style. Baseball officials are far
more likely than expected to use an AV coping style.
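The expected frequencies and test statistic in question 10 follow mechanically from the observed table. The sketch below recomputes them from the observed counts given in part a; the dictionary layout is an assumption made for illustration.

```python
observed = {
    "Baseball":   (275, 50),
    "Basketball": (475, 75),
    "Soccer":     (350, 25),
}   # columns: (AP, AV)

col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

chi_sq = 0.0
expected = {}
for sport, row in observed.items():
    row_total = sum(row)
    # expected count = row total * column total / grand total
    expected[sport] = [row_total * c / grand_total for c in col_totals]
    chi_sq += sum((o - e) ** 2 / e for o, e in zip(row, expected[sport]))

print(expected, round(chi_sq, 1))
```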
11 a Using the fact that the summed probabilities must equal 1:
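$a + b = 1 \Rightarrow b = 1 - a$
$E(X) = b = 1 - a$
b $\mathrm{Var}(X) = E(X^2) - (E(X))^2 = b - (E(X))^2 = 1 - a - (1-a)^2 = a - a^2$
c $\mathrm{Var}(X) = f(a) = a - a^2$
Differentiating with respect to a and setting equal to 0 to find the maximum value:
$f'(a) = 1 - 2a = 0 \Rightarrow a = \frac{1}{2} \Rightarrow \mathrm{Var}(X) = \frac{1}{4}$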
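12 a $\bar{X} = \frac{12.7 + 13.3}{2} = 13$
b $Z = \Phi^{-1}(0.95) = 1.65$ (3 s.f.)
$\frac{Z\sigma}{\sqrt{n}} = 0.3 \Rightarrow n = \left(\frac{Z\sigma}{0.3}\right)^2 = 636$ (3 s.f.)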
13 a i Let X represent the number of diamonds found per 2 m³, $X \sim \mathrm{Po}(5)$:
$P(X = 2) = \frac{5^2}{2!} \times e^{-5} = 0.0842$ (3 s.f.)
ii Let Y represent the number of diamonds found per 1 m³, $Y \sim \mathrm{Po}(2.5)$:
$[P(Y = 1)]^2 = \left[\frac{2.5^1}{1!} \times e^{-2.5}\right]^2 = 0.0421$ (3 s.f.)
b i For a one-tailed test at the 10% significance level, and using 1 m3 as your standard unit of volume:
H0: λ=2.5, H1: λ>2.5.
Tip
Alternatively, you could use 2 m³ as your standard unit of volume with
H0: λ=5 and H1: λ>5.
ii Let X represent the number of diamonds found per 2 m3, X~ Po(5):
$P(X \geqslant 7) = 1 - P(X \leqslant 6) = 1 - \sum_{n=0}^{6} \frac{5^n e^{-5}}{n!} = 1 - 0.7621\ldots = 0.238$ (3 s.f.)
P(X⩾7)>10%, so do not reject H0. There is no significant evidence at the 10% level to suggest that
the number of diamonds found is greater than 5 per 2 m3 (or 2.5 per 1 m3 ), i.e. that the new mine
will be economically viable.
iii For a type I error to be made, the true null hypothesis H0: λ=5 per 2 m3 is rejected.
This will occur at the 10% level of significance in cases where the number of diamonds found, r, is
such that P(X>r)<0.10.
First you need to find the least possible integer value of r:
You need to find the least value of r such that P(X>r)<0.10.
The table shows probabilities for P(X>r) for values of r from 5 to 8:
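$P(X > r) = 1 - P(X \leqslant r) = 1 - e^{-5} \sum_{k=0}^{r} \frac{5^k}{k!}$
Probability                                           Result
$P(X > 5) = 1 - P(X \leqslant 5) = 0.384\,039\,349$   $> 10\%$, so do not reject H0.
$P(X > 6) = 1 - P(X \leqslant 6) = 0.237\,865\,41$    $> 10\%$, so do not reject H0.
$P(X > 7) = 1 - P(X \leqslant 7) = 0.133\,371\,678$   $> 10\%$, so do not reject H0.
$P(X > 8) = 1 - P(X \leqslant 8) = 0.068\,093\,639$   $< 10\%$, so reject H0.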
Least value of r for which the true H0 is rejected is r=8.
P(type I error)= P(X>8)=0.0681 or 6.81%.(3 s.f.)
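The search for the least value of r in part b iii can be written as a short loop. The sketch below follows the same criterion as the table, P(X > r) < 0.10 with X ~ Po(5); `poisson_tail` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_tail(r, lam):
    """P(X > r) for X ~ Po(lam)."""
    return 1 - sum(lam**k * exp(-lam) / factorial(k) for k in range(r + 1))

lam, alpha = 5, 0.10
r = 0
while poisson_tail(r, lam) >= alpha:   # step up until the tail drops below 10%
    r += 1
type_i = poisson_tail(r, lam)
print(r, round(type_i, 4))
```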
iv To reduce the likelihood of making a type II error. Making such an error would result in a lost
opportunity for the owner, as they would fail to find significant evidence that the mine is
economically viable when, in fact, it is.
14 a Phone calls are independent of each other. There is a constant average rate of phone calls.
b For example: The same customer might call back, breaking independence. The rate during office
hours might be different from the rate during the night.
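c $\sigma(X) = \sqrt{\lambda} = \sqrt{4.5} = 2.12$ (3 s.f.)
d Let X represent the number of phone calls answered in a 2-hour shift, $X \sim \mathrm{Po}(2 \times 4.5 = 9)$.
$P(X < 10) = \sum_{n=0}^{9} \frac{9^n e^{-9}}{n!} = 0.587$ (3 s.f.)
e Let Y represent the number of phone calls answered in an a-hour shift, $Y \sim \mathrm{Po}(a \times 4.5)$.
$P(Y = 0) = e^{-4.5a} = 0.5 \Rightarrow a = \frac{-\ln(0.5)}{4.5} = 0.154$ hours (3 s.f.)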
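15 a Using the fact that $\int_0^1 f(x)\,dx = 1$:
$\int_0^1 (ax - x^3)\,dx = \frac{a}{2} - \frac{1}{4} = 1 \Rightarrow a = \frac{5}{2}$
b $E(X) = \int_0^1 x f(x)\,dx = \int_0^1 (ax^2 - x^4)\,dx = \frac{a}{3} - \frac{1}{5}$
Substituting $a = \frac{5}{2}$ from part a:
$E(X) = \frac{5}{6} - \frac{1}{5} = \frac{19}{30}$
c $E(30X + 2) = 30E(X) + 2 = 21$
d $E\left(\frac{1}{X}\right) = \int_0^1 \frac{1}{x} f(x)\,dx = \int_0^1 (a - x^2)\,dx = a - \frac{1}{3}$
Substituting $a = \frac{5}{2}$ from part a:
$E\left(\frac{1}{X}\right) = \frac{5}{2} - \frac{1}{3} = \frac{13}{6}$
e Using the fact that, for median m, $\int_0^m f(x)\,dx = \frac{1}{2}$:
$\int_0^m (ax - x^3)\,dx = \frac{am^2}{2} - \frac{m^4}{4} = \frac{1}{2}$
Let $n = m^2$: $n^2 - 5n + 2 = 0 \Rightarrow n = \frac{5}{2} \pm \sqrt{\frac{17}{4}} \approx 4.56,\ 0.438 \Rightarrow m = \pm 0.662,\ \pm 2.14$ (3 s.f.)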
Using the fact that 0<m<1:
m=0.662(3 s.f.) is the only possible solution.
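The median in part e can be confirmed numerically. The sketch below solves the quadratic in n = m² and keeps the root in (0, 1); the function name `cdf` is an assumption for illustration.

```python
from math import sqrt

a = 5 / 2

def cdf(m):
    """F(m): integral of (a*x - x**3) from 0 to m."""
    return a * m**2 / 2 - m**4 / 4

# With n = m**2, the median condition F(m) = 1/2 becomes n**2 - 5n + 2 = 0.
roots_n = [(5 - sqrt(17)) / 2, (5 + sqrt(17)) / 2]
candidates = [sqrt(n) for n in roots_n]           # positive square roots only
median = next(m for m in candidates if 0 < m < 1)
print(round(median, 3), round(cdf(median), 6))
```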
16 a Let the number of cans containing less than 330 ml be X, then X~ B(25,0.5).
$P(X = 12) = \binom{25}{12} \times 0.5^{25} = 0.155$ (3 s.f.)
b i Rejecting the batch when the mean contents are, in fact, 330 ml.
$P(\text{type I error}) = P(X \geqslant 20) = \sum_{n=20}^{25} \binom{25}{n} \times 0.5^{25} = 0.002\,04$ (3 s.f.)
ii
c i A 95% confidence interval is $328 \pm Z \times \frac{\sigma}{\sqrt{n}}$, where $Z = \Phi^{-1}(0.975) = 1.960$
$328 \pm 1.960 \times \frac{4}{\sqrt{25}}$, giving $326.4 < \mu < 329.6$ (1 d.p.)
ii H0: $\mu = 330$ and H1: $\mu < 330$
The value 330 ml lies outside the 95% confidence interval found in
part c i, so Philip will reject H0. There is evidence to suggest that the cans come from a population
whose contents are, on average, less than 330 ml.
Significance level = P(μ>329.6…)=2.5%
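Parts b i and c i of question 16 can be checked with the standard library. The sketch below uses the figures 328, 4 and 25 as they appear in the working above; the variable names are assumptions.

```python
from math import comb, sqrt
from statistics import NormalDist

# Part b i: X ~ B(25, 0.5); the batch is rejected when X >= 20.
p_type_i = sum(comb(25, k) for k in range(20, 26)) * 0.5**25

# Part c i: 95% interval 328 +/- z * 4 / sqrt(25).
z = NormalDist().inv_cdf(0.975)
half_width = z * 4 / sqrt(25)
lower, upper = 328 - half_width, 328 + half_width
print(round(p_type_i, 5), round(lower, 1), round(upper, 1))
```

Since 330 is greater than the upper limit, it falls outside the interval, which is the basis for the conclusion in part c ii.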
17 a P(X>3)=0.13+0.07+0.15=0.35
b The total probability is the product of the probabilities of each individual event happening multiplied
by two, since it is not specified who is borrowing more/fewer than 3 books.
2×P(X>3)×P(X<3)=2×0.35×0.45=0.315
c E(X)=1×0.19+2×0.26+3×0.20+4×0.13+5×0.07+6×0.15=3.08
$\mathrm{Var}(X) = 1 \times 0.19 + 4 \times 0.26 + 9 \times 0.20 + 16 \times 0.13 + 25 \times 0.07 + 36 \times 0.15 - 3.08^2 = 2.77$ (3 s.f.)
d No probability of 0 books borrowed and no probability of more than 6 books borrowed.
e i 10×E(X)=30.8 pence
ii $10 \times \sigma(X) \approx 10 \times \sqrt{2.774} = 16.7$ pence (3 s.f.)
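The calculations in question 17 can be reproduced from the probability table implied by parts a and c. The dictionary below is read off the terms in the E(X) sum; treat it as an assumption about the original table.

```python
from math import sqrt

# P(X = x) for x = 1..6, taken from the terms of E(X) in part c.
pmf = {1: 0.19, 2: 0.26, 3: 0.20, 4: 0.13, 5: 0.07, 6: 0.15}

mean = sum(x * p for x, p in pmf.items())
variance = sum(x**2 * p for x, p in pmf.items()) - mean**2
p_more_than_3 = sum(p for x, p in pmf.items() if x > 3)
p_fewer_than_3 = sum(p for x, p in pmf.items() if x < 3)
p_one_each_way = 2 * p_more_than_3 * p_fewer_than_3   # part b

print(round(mean, 2), round(variance, 2), round(p_one_each_way, 3))
print(round(10 * mean, 1), round(10 * sqrt(variance), 1))   # part e, in pence
```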
Worked solutions
Cross-topic review exercise 2
1
a X~exp(3)
b $f(x) = 3e^{-3x}$, so $P(1 < X < 2) = \int_1^2 3e^{-3x}\,dx = \left[-e^{-3x}\right]_1^2 = e^{-3} - e^{-6} = 0.0473$ (3 s.f.)
c If $X \sim \exp(\lambda)$, then the standard deviation of X is $\frac{1}{\lambda}$, which in this case is $\frac{1}{3}$.
2
a Calculating the expected frequencies:
Greatest rate of         Age when leaving education (years)
income tax paid      16 or less    17 or 18    19 or more
Zero                   29.445         3.9         5.655
Basic                  98.905        13.1        18.995
Higher                 22.65          3           4.35
Some of the expected frequencies are less than 5, so merging the 17 or 18 and the 19 or more columns:
             16 or less    17 or more
Zero           29.445         9.555
Basic          98.905        32.095
Higher         22.65          7.35
$\nu = 2 \times 1 = 2$
Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table at the 5% significance level:
$\chi^2_{\text{calc}} = \sum \frac{(O_i - E_i)^2}{E_i} = 7.05$ (3 s.f.) $> 5.991$
There is evidence of an association between age when leaving education and greatest rate of income
tax paid.
b This belief is supported at the 5% level of significance.
3
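a Using the fact that for this probability density function, f(x), $\int_0^{0.1} f(x)\,dx = 1$:
$\int_0^{0.1} k\,dx = 0.1k = 1 \Rightarrow k = 10$
b Substituting k=10 into the formula for probability:
$P(X > 0.03) = \int_{0.03}^{0.1} 10\,dx = 10 \times 0.07 = 0.7$
c i $E(X) = \int_0^{0.1} x f(x)\,dx = \int_0^{0.1} 10x\,dx = \left[5x^2\right]_0^{0.1} = 0.05 = \frac{1}{20}$
ii $E(X^2) = \int_0^{0.1} x^2 f(x)\,dx = \int_0^{0.1} 10x^2\,dx = \left[\frac{10}{3}x^3\right]_0^{0.1} = \frac{1}{300}$
iii $\sigma(X) = \sqrt{E(X^2) - (E(X))^2} = \sqrt{\frac{1}{300} - \frac{1}{20^2}} = 0.0289$ (3 s.f.)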
4
a H0: There is no association between students' first time performances and their ages.
H1: There is an association between students' first time performances and their ages.
Expected frequencies:
          Pass    Fail
17–18     19.2    28.8
19–30      6.4     9.6
31–39     18      27
40–60      4.4     6.6
The frequency in the Pass category of the 40–60 age group is less than 5, so combining the 31–39
and the 40–60 age groups:
Observed frequencies:
          Pass    Fail
17–18     28      20
19–30      2      14
31–60     18      38
Expected frequencies:
          Pass    Fail
17–18     19.2    28.8
19–30      6.4     9.6
31–60     22.4    33.6
Calculating the χ2 value:
$\chi^2_{\text{calc}} = \sum \frac{(O_i - E_i)^2}{E_i} = \frac{(28 - 19.2)^2}{19.2} + \ldots + \frac{(38 - 33.6)^2}{33.6} = 13.2$ (3 s.f.)
Number of degrees of freedom, $\nu = (3-1)(2-1) = 2$.
Comparing the $\chi^2_{\text{calc}}$ value to the critical value from the table:
$\chi^2_{\text{calc}} = 13.2 > 9.210$ (critical value from the table at the 1% significance level), so reject H0. There is
significant evidence at the 1% significance level to support Julie's belief.
b More students than expected in the age group 17−18 pass their test first time.
5
$\sigma(Y) = \sqrt{\frac{n^2 - 1}{12}} = 1.12$ (3 s.f.)
a
b i $E(Z) = 6 + 2E(X) = 6 + n + 1 = 11$
ii $\mathrm{Var}(Z) = 2^2\,\mathrm{Var}(X) = \frac{n^2 - 1}{3} = 5$
6
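a $X \sim U(n)$ for $x = 1, 2, 3, \ldots, n$.
$n - 10 = 3 - 1$, so $n = 12$.
$E(X) = \frac{12 + 1}{2} = \frac{13}{2}$
b $X \sim U(12)$, so $P(X = x) = \frac{1}{12}$, and $P(X \leqslant 3) = \frac{3}{12} = \frac{1}{4}$.
Let the random variable Y be the number of these three observations that have a value of 3 or less,
so $Y \sim B\left(3, \frac{1}{4}\right)$.
$P(Y < 2) = P(Y = 0) + P(Y = 1) = \binom{3}{0}\left(\frac{1}{4}\right)^0\left(\frac{3}{4}\right)^3 + \binom{3}{1}\left(\frac{1}{4}\right)^1\left(\frac{3}{4}\right)^2 = \frac{27}{32} = 0.844$ (3 s.f.)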
7
a Y=5+0.02X
$E(Y) = 5 + 0.02E(X) = 10$
$\mathrm{Var}(Y) = 0.02^2\,\mathrm{Var}(X) = 1$
b $Z = 25 - Y$
$E(Z) = 25 - E(Y) = 15$
$\mathrm{Var}(Z) = \mathrm{Var}(Y) = 1$
8
a H0: μ=20 minutes
H1: μ≠20 minutes
$s = 4.57$ (3 s.f.), $\bar{X} = 22.6$, $\nu = 7$
$T = \frac{\bar{X} - 20}{s/\sqrt{8}} = 1.63$ (3 s.f.)
Use the table to find the critical value. For a two-tailed test you need to look at the 0.95 column.
Comparing your calculated t-value with the critical value:
|T|=1.63<1.895
There is insufficient evidence to reject the null hypothesis.
b $Z = \Phi^{-1}(0.95) = 1.65$ (3 s.f.), $\sigma = 4.6$, $n = 100$
$\mu - \frac{Z\sigma}{\sqrt{n}} < 20 \Rightarrow \mu < 20.76$ (2 d.p.)
Any value less than or equal to 20.75(2 d.p.) would lead to rejection of Rajul's claim.
9
a The population is large. Most marks would be concentrated around the mean, and the further a mark
is from the mean, the less likely it is to occur.
b The null hypothesis 'The school is not doing better than the rest of the population (μ=60)’ will be
rejected if the 10 students' average is more than 70%.
Significance level $= P(\bar{X} > 70) = 1 - P(\bar{X} \leqslant 70) = 1 - \Phi\left(\frac{70 - 60}{15/\sqrt{10}}\right) = 1.75\%$ (3 s.f.)
c Now test H0: The school is not doing better than the rest of the population (μ=60) against H1: μ=65,
and find the probability that the 10 students' average is not more than 70.
$P(\text{type II error}) = P(\bar{X} \leqslant 70 \mid \mu = 65) = \Phi\left(\frac{70 - 65}{15/\sqrt{10}}\right) = 0.8541$
Power of test $= 1 - P(\text{type II error}) = 0.146$ or 14.6% (3 s.f.)
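The significance level, type II error probability and power in question 9 can be computed with the standard-library normal distribution. The sketch below assumes, as in the working above, a population standard deviation of 15, a sample of 10 and a rejection threshold of 70.

```python
from math import sqrt
from statistics import NormalDist

n, sigma, threshold = 10, 15, 70
se = sigma / sqrt(n)   # standard error of the sample mean

# Reject H0 when the sample mean exceeds 70.
significance = 1 - NormalDist(60, se).cdf(threshold)   # true mean 60 (H0)
type_ii = NormalDist(65, se).cdf(threshold)            # true mean 65 (H1)
power = 1 - type_ii
print(round(significance, 4), round(type_ii, 4), round(power, 3))
```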
d The power of the test would increase. The test would be more likely to identify the difference from
60%.
In context, as the true mean gets closer to 70 (from below), it becomes more and more difficult to
believe that the school is not doing better than the rest of the population.
10 a $e^{-10\lambda} = 0.25 \Rightarrow \lambda = 0.1\ln 4$
b Using the value of λ from part a:
$P(T > 20) = e^{-20\lambda} = 0.0625$
c $E(T) = \frac{1}{\lambda} = 7.21$ (3 s.f.)
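A quick numerical check of question 10, treating $e^{-20\lambda}$ as the probability that T exceeds 20 (the survival function of the exponential distribution). The variable names are illustrative.

```python
from math import exp, log

lam = 0.1 * log(4)             # from exp(-10 * lam) = 0.25
p_beyond_20 = exp(-20 * lam)   # P(T > 20), equal to 0.25 squared
mean_t = 1 / lam               # mean of an exponential distribution
print(round(p_beyond_20, 4), round(mean_t, 2))
```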
11 Assume the population to be normally distributed.
H0: μ=79
H1: μ>79
$s = 5.58$ (3 s.f.), $\bar{X} = 82$, $\nu = 12 - 1 = 11$
$T = \frac{\bar{X} - 79}{s/\sqrt{12}} = 1.86$ (3 s.f.)
Use the table to find the critical value at the 5% significance level.
Comparing your calculated t-value with the critical value:
|T|=1.86>1.796
Reject H0. There is sufficient evidence to support Judith's belief.
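The test statistic in question 11 can be recomputed from the summary values quoted above. The critical value 1.796 is taken from the solution's table lookup rather than computed, since the standard library has no t-distribution; the variable names are assumptions.

```python
from math import sqrt

x_bar, mu_0, s, n = 82, 79, 5.58, 12   # summary values quoted above
t_stat = (x_bar - mu_0) / (s / sqrt(n))

critical = 1.796        # table value quoted above for nu = 11, one-tailed 5%
reject = t_stat > critical
print(round(t_stat, 2), reject)
```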
12 H0: the new drug is not effective in the prevention of sickness in holiday-makers.
H1: the new drug is effective in the prevention of sickness in holiday-makers.
Calculating the expected frequencies:
                 Sickness    No sickness
Drug taken          28           52
No drug taken        7           13
$n = 100$, $\nu = (2-1)(2-1) = 1$
Using Yates' correction:
$\chi^2_{\text{Yates}} = \sum \frac{(|O_i - E_i| - 0.5)^2}{E_i}$
Comparing the $\chi^2_{\text{Yates}}$ value with the critical value from the table at the 5% significance level:
$\chi^2_{\text{Yates}} = 3.37$ (3 s.f.) $< 3.841$
Do not reject H0. There is no evidence at the 5% significance level to support the claim that the drug is
effective against the sickness.
13 a Using the fact that E(X)= Var(X):
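$\frac{n + 1}{2} = \frac{n^2 - 1}{12} \Rightarrow n^2 - 6n - 7 = 0 \Rightarrow n = 7$
b $\mathrm{Var}(2X) = 2^2\,\mathrm{Var}(X) = 4\,\mathrm{Var}(X)$
Since E(X)= Var(X) from the question:
$4\,\mathrm{Var}(X) = 4E(X) = 2E(2X) \neq E(2X)$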
14 Assume that the distances follow a normal distribution.
H0: μ=190 metres
H1: μ≠190 metres
$s = 11.7$ (3 s.f.), $\bar{X} = 184$, $\nu = 10 - 1 = 9$
$T = \frac{\bar{X} - 190}{s/\sqrt{10}} = -1.62$ (3 s.f.)
Use the table to find the critical value. For a two-tailed test you need to look at the 0.99 column.
Comparing your calculated t-value with the critical value:
|T|=1.62 (3 s.f.)<2.821
Do not reject H0. There is insufficient evidence to doubt Lorraine's belief that there has been no
change.
15 a It must be a random sample.
b H0: μ=44.1 mph
H1: μ<44.1 mph
s=9.35 (3 s.f.),X¯=43.3,ν=100−1=99T=X¯−44.1s100=−2.71 (3 s.f.)
Use the table to find the critical value at the 1% significance level.
Comparing your calculated t-value with the critical value:
|T|=2.71 (3 s.f.)>2.364
Reject H0. There is significant evidence that the mean speed has reduced.
c i Concluding that the mean speed has reduced when in fact it has not.
ii Concluding that the mean speed is still 44.1 when in fact it has reduced.
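16 a $E(X) = \frac{1}{4} + \frac{2}{2} + \frac{n}{4} = \frac{5 + n}{4}$
$4E\left(\frac{1}{X}\right) = 4\left(\frac{1}{4 \times 1} + \frac{1}{2 \times 2} + \frac{1}{4 \times n}\right) = 2 + \frac{1}{n}$
Setting $E(X) = 4E\left(\frac{1}{X}\right)$ and rearranging so that one side is 0:
$n^2 - 3n - 4 = 0 \Rightarrow n = -1$ or $4$
b For n=4:
$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{1^2}{4} + \frac{2^2}{2} + \frac{4^2}{4} - \left(\frac{9}{4}\right)^2 = 1.1875$
$\mathrm{Var}\left(\frac{1}{X}\right) = E\left(\frac{1}{X^2}\right) - \left(E\left(\frac{1}{X}\right)\right)^2 = \frac{1}{4 \times 1^2} + \frac{1}{2 \times 2^2} + \frac{1}{4 \times 4^2} - 0.5625^2 = 0.0742$
$\frac{\mathrm{Var}(X)}{\mathrm{Var}\left(\frac{1}{X}\right)} = 16$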
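The ratio in part b can be verified exactly. The probabilities below (1/4, 1/2, 1/4 on the values 1, 2, n) are inferred from the terms of E(X) in part a, so treat them as an assumption about the original question; `expect` is a hypothetical helper.

```python
from fractions import Fraction as F

n = 4
pmf = {1: F(1, 4), 2: F(1, 2), n: F(1, 4)}   # inferred from E(X) in part a

def expect(g):
    """E(g(X)) for the distribution above, using exact fractions."""
    return sum(g(F(x)) * p for x, p in pmf.items())

mean_x = expect(lambda x: x)
mean_inv = expect(lambda x: 1 / x)
var_x = expect(lambda x: x**2) - mean_x**2
var_inv = expect(lambda x: 1 / x**2) - mean_inv**2
print(mean_x, 4 * mean_inv, var_x, var_inv, var_x / var_inv)
```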
17 a H0: λ=70, H1: λ>70
b X represents the number of cars passing the school gates in a 10-minute interval, X~Po(70).
$P(X \geqslant x) = \sum_{n=x}^{\infty} \frac{70^n e^{-70}}{n!} \leqslant 0.1 \Rightarrow X \geqslant 82$
c Reject H0. There is sufficient evidence that the mean number of cars has increased.
d The mean number of cars passing is 12 per minute, so now X~ Po(120).
$P(X \leqslant 81) = \sum_{n=0}^{81} \frac{120^n e^{-120}}{n!} = 1.01 \times 10^{-4}$ (3 s.f.)
18 a λ0=3×16=48
b
$P(X \leqslant 35) = \sum_{n=0}^{35} \frac{48^n e^{-48}}{n!} = 0.0309$ (3 s.f.)
c The number of visitor groups is now 12 per hour, so if X represents the number of groups arriving in a
3-hour period, X~Po(3×12=36).
$P(X > 35) = \sum_{n=36}^{\infty} \frac{36^n e^{-36}}{n!} \approx 0.522$
Power $= 1 - P(X > 35) = 0.478$ (3 s.f.)
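19 a Interval is $\bar{X} - Z \times \frac{s}{\sqrt{n}} < \mu_G < \bar{X} + Z \times \frac{s}{\sqrt{n}}$
$2.0001 - 1.960 \times \frac{10^{-4}}{\sqrt{10}} < \mu_G < 2.0001 + 1.960 \times \frac{10^{-4}}{\sqrt{10}}$
$2.000\,038 < \mu_G < 2.000\,162$ (7 s.f.)
b Lower bound $= 10\,000(2.000\,038 - 2) = 0.380$ (3 s.f.)
Upper bound $= 10\,000(2.000\,162 - 2) = 1.62$ (3 s.f.)
Interval is $0.380 < \mu_Y < 1.62$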
c No; G=2 lies outside the confidence interval.
d i Let the number of confidence intervals that contain the true mean be X, then $X \sim B(3, 0.95)$.
$P(X \geqslant 2) = \sum_{x=2}^{3} \binom{3}{x} \times 0.95^x \times 0.05^{3-x} = \frac{3971}{4000}$ or 0.993 (3 s.f.)
ii Let the number of confidence intervals that are above the true mean be Y, then $Y \sim B(3, 0.025)$.
$P(Y = 3) = 0.025^3 = \frac{1}{64\,000}$ or 0.000 015 6 (3 s.f.)
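The two binomial probabilities in part d can be confirmed exactly with fractions. This is a small sketch; the variable names are assumptions.

```python
from fractions import Fraction as F
from math import comb

p = F(95, 100)   # probability that one 95% interval contains the true mean

# Part d i: at least two of three intervals contain the true mean.
p_at_least_two = sum(comb(3, x) * p**x * (1 - p) ** (3 - x) for x in (2, 3))

# Part d ii: all three intervals lie above the true mean (2.5% each).
p_all_above = F(25, 1000) ** 3
print(p_at_least_two, float(p_at_least_two), p_all_above)
```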
Acknowledgements
The authors and publishers acknowledge the following sources of copyright material and are grateful for
the permissions granted. While every effort has been made, it has not always been possible to identify the
sources of all the material used, or to trace all copyright holders. If any omissions are brought to our
notice, we will be happy to include the appropriate acknowledgements on reprinting.
Thanks to the following for permission to reproduce images:
Cover image: Peter Medlicott Sola/Getty Images
Back cover: Fabian Oefner www.fabianoefner.com
Serdarbayraktar/Getty Images; Chris Hepburn/Getty Images; PM Images/Getty Images;
aaaaimages/Getty Images; John Lund/Getty Images; espy3008/Getty Images
AQA material is reproduced by permission of AQA.
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025,
India
79 Anson Road, #06–04/06, Singapore 079906
Cambridge University Press is part of the University of Cambridge.
It furthers the University’s mission by disseminating knowledge in the pursuit of
education, learning and research at the highest international levels of excellence.
www.cambridge.org
Information on this title:
www.cambridge.org/9781316644508 (Paperback)
www.cambridge.org/9781316644324 (Paperback with Cambridge Elevate edition)
www.cambridge.org/9781316644584 (Cambridge Elevate edition 2 years)
www.cambridge.org/9781316644614 (Cambridge Elevate edition 1 year School Site
Licence)
© Cambridge University Press 2018
This publication is in copyright. Subject to statutory exception and to the provisions of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published 2018
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1
Printed in the United Kingdom by Latimer Trend.
A catalogue record for the print publication is available from the British Library
ISBN 978-1-316-64450-8 Paperback
ISBN 978-1-316-64432-4 Paperback with Cambridge Elevate edition
Additional resources for this publication at www.cambridge.org/education
Cambridge University Press has no responsibility for the persistence or accuracy of
URLs for external or third-party internet websites referred to in this publication, and
does not guarantee that any content on such websites is, or will remain, accurate or
appropriate.
NOTICE TO TEACHERS IN THE UK
It is illegal to reproduce any part of this work in material form (including photocopying
and electronic storage) except under the following circumstances:
(i) where you are abiding by a licence granted to your school or institution by the
Copyright Licensing Agency;
(ii) where no such licence exists, or where you wish to exceed the terms of a licence,
and you have gained the written permission of Cambridge University Press;
(iii) where you are allowed to reproduce without permission under the provisions of
Chapter 3 of the Copyright, Designs and Patents Act 1988, which covers, for
example, the reproduction of short passages within certain types of educational
anthology and reproduction for the purposes of setting examination questions.
Message from AQA
This textbook has been approved by AQA for use with our qualification. This
means that we have checked that it broadly covers the specification and we are
satisfied with the overall quality. Full details of our approval process can be found
on our website.
We approve textbooks because we know how important it is for teachers and
students to have the right resources to support their teaching and learning.
However, the publisher is ultimately responsible for the editorial control and
quality of this book.
Please note that when teaching the A/AS Level Further Mathematics (7366, 7367)
course, you must refer to AQA’s specification as your definitive source of
information. While this book has been written to match the specification, it cannot
provide complete coverage of every aspect of the course.
A wide range of other useful resources can be found on the relevant subject pages
of our website: www.aqa.org.uk
IMPORTANT NOTE AQA has not approved any Cambridge Elevate content.