Mathematics
Statistics
Higher
Spring 1999
HIGHER STILL
Support Materials
EXPLORATORY DATA ANALYSIS (EDA)
PREVIOUS KNOWLEDGE
Measures of Central Tendency
Students should be aware of the following:
• the sample mean, $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
advantages - takes account of all the data , easy to handle mathematically
disadvantages - can be distorted by a single low or high value
• the mode is the most frequently occurring observation in a data set
advantages - easy to find
disadvantages - does not take account of all the data , difficult to handle mathematically
• the median (Q2) is the middle observation in a set of data, arranged in numerical order,
and splits the data into two equal halves
advantages - not affected by unusually low or high values
disadvantages - does not take account of all the data, difficult to handle mathematically
• the lower quartile (Q1), the median (Q2) and the upper quartile (Q3) split an ordered
data set into four equal quarters.
Example
The number of matches in a box were counted for a sample of 17 boxes. The results
were:
51 52 48 53 48 50 51 50 46
52 51 48 49 52 50 48 47
For the above data find
(a) the mean (b) the mode (c) the median (d) the lower and upper quartiles.
Answer
To order the data, we first draw a stem-and-leaf diagram (see later section on EDA).
4 | 8 8 6 8 9 8 7
5 | 1 2 3 0 1 0 2 1 2 0
4 | 8 means 48 matches
Mathematics: Statistics (Higher) Teachers notes
1
followed by an ordered stem-and-leaf diagram
4 | 6 7 8 8 8 8 9
5 | 0 0 0 1 1 1 2 2 2 3
4 | 8 means 48 matches
(a) $\bar{x} = \dfrac{\sum x}{n} = \dfrac{846}{17} = 49.8$
(b) The most frequent observation is 48 so the mode or modal value = 48.
(c) In the above data set the median is the 9th observation.
From the stem-and-leaf diagram, the 9th observation is 50 so the median = 50.
NB If there are n observations in a data set, the median is the value in the $\left(\dfrac{n + 1}{2}\right)$th position.
(d) A number of different methods exist to find the quartiles of a data set .
Two of the most commonly used methods are illustrated below.
METHOD 1
The lower quartile can be defined as the median of the lower half of the ordered data set,
with the upper quartile being the median of the upper half. Ignoring the median, the lower
half of the data is 46 47 48 48 48 48 49 50.
The lower quartile lies halfway between the 4th and 5th observations, so the lower quartile
is $\dfrac{48 + 48}{2}$ = 48.
Ignoring the median, the upper half of the data is 50 51 51 51 52 52 52 53. The upper
quartile lies halfway between the 13th and 14th observations, so the upper quartile is
$\dfrac{51 + 52}{2}$ = 51.5.
METHOD 2
The lower quartile can be defined as the value in the $\left(\dfrac{n + 1}{4}\right)$th position within an ordered data set.
Similarly, the upper quartile can be defined as the value in the $\left(\dfrac{3(n + 1)}{4}\right)$th position within an
ordered data set.
With n = 17, the lower quartile is in the (17+1)/4 = 4.5th position, so the lower quartile is
$\dfrac{48 + 48}{2}$ = 48. The upper quartile is the value in the 3(17+1)/4 = 13.5th position, so the
upper quartile is $\dfrac{51 + 52}{2}$ = 51.5.
NB
(a) The above methods give the same answers for the quartiles. However, this will not
always be the case . Students should be encouraged in their written work to be clear
about the method they have used for determining the quartiles of a data set.
(b) Almost all scientific and graphic calculators will find the mean. Some graphic
calculators will also find the upper and lower quartiles.
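The worked answers above can also be checked by computer. The following Python sketch is an illustration only: the function names (`value_at`, `median`, `quartiles_method_1`, `quartiles_method_2`) are made up for this example, and the use of linear interpolation for fractional positions is an assumption, since the notes only show the halfway case.

```python
from collections import Counter

# Matches data from the example above (17 boxes)
matches = [51, 52, 48, 53, 48, 50, 51, 50, 46,
           52, 51, 48, 49, 52, 50, 48, 47]


def value_at(ordered, position):
    """Value at a 1-based position in an ordered list.

    Fractional positions are handled by linear interpolation between the
    two neighbouring observations (an assumption made for this sketch).
    """
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    below, above = ordered[whole - 1], ordered[whole]
    return below + fraction * (above - below)


def median(data):
    """Value in the (n + 1)/2 th position of the ordered data."""
    ordered = sorted(data)
    return value_at(ordered, (len(ordered) + 1) / 2)


def quartiles_method_1(data):
    """Medians of the lower and upper halves (the median is ignored when n is odd)."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


def quartiles_method_2(data):
    """Values in the (n + 1)/4 th and 3(n + 1)/4 th positions."""
    ordered = sorted(data)
    n = len(ordered)
    return value_at(ordered, (n + 1) / 4), value_at(ordered, 3 * (n + 1) / 4)


mean = sum(matches) / len(matches)
mode = Counter(matches).most_common(1)[0][0]

print(round(mean, 1), mode, median(matches))
print(quartiles_method_1(matches), quartiles_method_2(matches))
```

For this data set both quartile functions agree; for other values of n they may not, which is why students should state the method used.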
Samples and Populations
In a statistical study, the complete set of objects under investigation is called a population.
Collecting information on every member of the population is known as a census. A
census, however, is regularly rejected as a means of gathering information as it can be very
time - consuming and expensive to administer.
Instead, a sample or subset of the population is taken. It is extremely important that the
sample is representative of the population, i.e. that its characteristics mirror the
characteristics of the population.
The sample will be analysed and conclusions made about the population. Samples which
are unrepresentative or biased may lead to biased or unjustified conclusions. The process
of using a sample to infer details about the population is known as statistical inference.
Sample statistics, such as the sample mean $\bar{x}$, allow the statistician to estimate the
corresponding population parameters, such as the population mean µ. Similarly, in a sample of 100 senior
pupils, we might discover the proportion of the 100 pupils in the sample who are
vegetarians. We might assume that the proportion in the population of all senior school
pupils is similar, but we would not know this population proportion exactly.
Probability sampling methods should be used to avoid sampling bias. Simple random
sampling is a method of sampling where each member of the population has an equal
chance of being selected for the sample. For example, a sample of n objects could be
selected from a population of N objects by putting N tickets (numbered consecutively from
1 to N) in a hat and picking out n tickets at random. Other sampling methods are available
but will not be covered within this course.
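The tickets-in-a-hat procedure can be imitated on a computer. The sketch below is a minimal illustration, assuming Python's standard `random` module stands in for the hat; the function name `simple_random_sample` and the population and sample sizes are invented for the example.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct tickets from tickets numbered 1..population_size."""
    rng = random.Random(seed)  # a seed makes the draw repeatable for checking
    tickets = range(1, population_size + 1)
    # sample() draws without replacement, so no ticket is picked twice
    return rng.sample(tickets, sample_size)


sample = simple_random_sample(population_size=500, sample_size=20, seed=1)
print(sorted(sample))
```

Each ticket has the same chance of being drawn, which is the defining property of simple random sampling described above.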
Measures of Variability
Students should be aware of the following:
• the range = maximum value - minimum value
• the interquartile range (IQR) = upper quartile - lower quartile = Q3 - Q1
• the semi-interquartile range (SIQR) = half of the interquartile range = $\tfrac{1}{2}(Q_3 - Q_1)$
advantages - not affected by extreme values
disadvantages - difficult to handle mathematically
• the sample standard deviation is a measure of the variability of the data about the sample mean,
$s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ or $s = \sqrt{\dfrac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n - 1}}$
advantages - takes account of all the data, easy to handle mathematically
disadvantages - can be distorted by extreme values
NB
(a) s2 is called the sample variance.
(b) The (n – 1) divisor is used in the formula because on average it produces better
estimates of the population standard deviation σ.
Example
The number of matches in a box were counted for a sample of 17 boxes. The results
were:
51 52 48 53 48 50 51 50 46
52 51 48 49 52 50 48 47
For the above data find
(a) the range (b) the interquartile range (c) the sample standard deviation .
Answer
(a) range = maximum – minimum = 53 – 46 = 7
(b) From the example on page 1,
Q3 = 51.5 and Q1 = 48 ⇒ interquartile range = 51.5 – 48 = 3.5
(c) n = 17, $\sum x = 846$, $\sum x^2 = 42166$
so $s = \sqrt{\dfrac{42166 - (846)^2 / 17}{17 - 1}} = \sqrt{\dfrac{42166 - 42100.941\ldots}{16}} = 2.02$
NB (i) The formula $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ is not used since the use of $\bar{x}$ = 49.7647...
could lead to rounding error.
(ii) Almost all scientific and graphic calculators will find the sample standard deviation .
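As a check on part (c), the two forms of the standard deviation formula can be compared numerically. This is a sketch only, written in Python with variable names chosen for the illustration; `statistics.stdev` from the standard library also uses the (n - 1) divisor.

```python
import math
import statistics

# Matches data from the example above
matches = [51, 52, 48, 53, 48, 50, 51, 50, 46,
           52, 51, 48, 49, 52, 50, 48, 47]

n = len(matches)
sum_x = sum(matches)
sum_x2 = sum(x * x for x in matches)

# Computational form: uses only n, the sum of x and the sum of x squared
s_computational = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Definition form: squared deviations about the (unrounded) sample mean
mean = sum_x / n
s_definition = math.sqrt(sum((x - mean) ** 2 for x in matches) / (n - 1))

print(sum_x, sum_x2, round(s_computational, 2), round(s_definition, 2))
print(max(matches) - min(matches))  # the range
```

The two forms agree when the mean is kept unrounded; the rounding error mentioned in the NB arises only if a rounded mean is substituted by hand.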
Now try Exercise 1 - Average/Variability.
Exploratory Data Analysis (EDA)
Before a detailed analysis of a data set is carried out, an initial impression of the data is
normally sought. First impressions can be gained by displaying the data in a simple but
convenient form and by calculating simple measures of central tendency and variability.
In particular, the student should be able to interpret the following diagrams:
• stem-and-leaf diagrams
• dotplots
• boxplots
Stem-and-leaf diagrams
The stem-and-leaf diagram is a very useful way of organising data and can be used as an
alternative to a frequency table or bar chart. It can also be used to compare two data sets.
Example
The heights of 30 adult males are recorded to the nearest centimetre.
175 168 169 190 180 188 161 171 160 167
195 167 165 173 175 178 172 173 184 171
168 169 176 179 172 183 174 184 169 179
Draw a stem-and-leaf diagram for the above data.
Answer
Initially, an unordered stem-and-leaf diagram is created with the stems being represented
by the hundred/ten digits and the leaves by the unit digit (a key which explains this
representation is included beneath each diagram). The unordered diagram is then
converted to an ordered stem-and-leaf diagram.
unordered
16 | 8 9 1 0 7 7 5 8 9 9
17 | 5 1 3 5 8 2 3 1 6 9 2 4 9
18 | 0 8 4 3 4
19 | 0 5
16 | 8 means 168 centimetres
ordered
16 | 0 1 5 7 7 8 8 9 9 9
17 | 1 1 2 2 3 3 4 5 5 6 8 9 9
18 | 0 3 4 4 8
19 | 0 5
16 | 8 means 168 centimetres
The stems can also be split to give a more detailed picture of the distribution of the leaves.
For each stem, the leaves between 0 and 4 (inclusive) are separated from the leaves
between 5 and 9 (inclusive). The ordered stem and leaf diagram above can be adjusted to
the following:
16 | 0 1
16 | 5 7 7 8 8 9 9 9
17 | 1 1 2 2 3 3 4
17 | 5 5 6 8 9 9
18 | 0 3 4 4
18 | 8
19 | 0
19 | 5
16 | 8 means 168 centimetres
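The construction described above (tens as stems, units as leaves, with an optional split of each stem into 0-4 and 5-9) can be sketched in Python as follows. The function name `stem_and_leaf` and its `split` parameter are assumptions made for this illustration, and stems with no leaves are simply omitted.

```python
from collections import defaultdict

# Heights of 30 adult males from the example above
male_heights = [175, 168, 169, 190, 180, 188, 161, 171, 160, 167,
                195, 167, 165, 173, 175, 178, 172, 173, 184, 171,
                168, 169, 176, 179, 172, 183, 174, 184, 169, 179]


def stem_and_leaf(data, split=False):
    """Return the rows of an ordered stem-and-leaf diagram as strings.

    The stem is the value with its unit digit removed; the leaf is the unit
    digit. With split=True, leaves 0-4 and 5-9 go on separate rows.
    """
    rows = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        half = leaf // 5 if split else 0  # 0 for leaves 0-4, 1 for leaves 5-9
        rows[(stem, half)].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for (stem, _), leaves in sorted(rows.items())]


print("\n".join(stem_and_leaf(male_heights)))
print("\n".join(stem_and_leaf(male_heights, split=True)))
```

Sorting the data first is what makes the diagram ordered; a key such as "16 | 8 means 168 centimetres" should still be written beneath the output.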
Example
The heights of 30 adult females are recorded to the nearest centimetre.
155 160 190 156 166 156 170 164 171 174
169 169 171 166 170 170 172 172 153 161
172 171 172 163 172 163 169 169 169 175
Draw a back-to-back stem-and-leaf diagram using the above data and the data from the
previous example. Comment on any differences between the two sets of data.
Answer
The back-to-back stem-and-leaf diagram allows a simple comparison of the two data sets
to be made. Working as before, we place the leaves for the male data on the left of the
stems and the leaves for the female data on the right of the stems.
          Males |    | Females
                | 15 | 3
                | 15 | 5 6 6
            1 0 | 16 | 0 1 3 3 4
9 9 9 8 8 7 7 5 | 16 | 6 6 9 9 9 9 9
  4 3 3 2 2 1 1 | 17 | 0 0 0 1 1 1 2 2 2 2 2 4
    9 9 8 6 5 5 | 17 | 5
        4 4 3 0 | 18 |
              8 | 18 |
              0 | 19 | 0
              5 | 19 |
16 | 8 means 168 centimetres
From the above data, males are generally taller than females.
Dotplots
A dotplot is another alternative to the bar chart.
Example
The number of matches in a box were counted for a sample of 17 boxes. The results were:
51 52 48 53 48 50 51 50 46
52 51 48 49 52 50 48 47
Draw a dotplot for the above data.
Answer
Firstly we might construct an ordered stem-and-leaf diagram.
4 | 6 7 8 8 8 8 9
5 | 0 0 0 1 1 1 2 2 2 3
4 | 8 means 48
The dot plot can then be easily constructed.
        •
        •       •   •   •
        •       •   •   •
•   •   •   •   •   •   •   •
46  47  48  49  50  51  52  53
Number of matches
Now try Exercise 2 - Exploratory Data Analysis (EDA) , Questions 1 - 5.
Boxplots
Once an ordered stem-and-leaf diagram has been produced, it can easily be converted into
a boxplot (or box and whisker diagram). The boxplot is a graphical representation of the
five number summary: minimum, lower quartile, median, upper quartile and
maximum. The boxplot is an extremely useful way of comparing two or more data sets.
Example
The heights of 30 adult males are recorded to the nearest centimetre.
175 168 169 190 180 188 161 171 160 167
195 167 165 173 175 178 172 173 184 171
168 169 176 179 172 183 174 184 169 179
Draw a boxplot for the above data.
Answer
Firstly, we construct an ordered stem-and-leaf diagram and then calculate the median and
the quartiles.
16 | 0 1
16 | 5 7 7 8 8 9 9 9
17 | 1 1 2 2 3 3 4
17 | 5 5 6 8 9 9
18 | 0 3 4 4
18 | 8
19 | 0
19 | 5
16 | 8 means 168 centimetres
The minimum is 160 and the maximum is 195.
The median is the value in the $\dfrac{30 + 1}{2}$ = 15.5th position ⇒ Q2 = $\dfrac{173 + 173}{2}$ = 173
The lower quartile is the median of the lower half of the data, i.e. the 8th position ⇒ Q1 = 169
The upper quartile is the median of the upper half of the data, i.e. the 23rd position ⇒ Q3 = 179
The boxplot for the above data is constructed as follows:
[Figure: boxplot of the male heights on a scale from 160 to 200 centimetres, with min, Q1, Q2, Q3 and max labelled]
The ‘box’ represents the middle 50% of the data, the lower ‘whisker’ represents the lower
25% of the data and the upper ‘whisker’ represents the upper 25% of the data.
Outliers
From time to time, extreme values may occur within a data set. These values are called
outliers, being much smaller or much larger than the rest of the data. They can occur as a
result of natural variation or by some error in the data collection. Outliers are commonly
identified by using fences or boundaries within the data set. Any values which lie beyond
these fences are considered to be possible outliers.
The lower fence is defined as Q1 – 1.5 x IQR and the upper fence as Q3 + 1.5 x IQR,
where IQR represents the interquartile range.
For the above example:
IQR = 179 – 169 = 10
lower fence = 169 – 1.5 x 10 = 154
upper fence = 179 + 1.5 x 10 = 194
No value of the data is less than 154, so there are no low outliers. The maximum value, 195,
lies just beyond the upper fence of 194, so strictly it is a possible outlier, although it is
close enough to the fence that natural variation is the most plausible explanation.
When an outlier is identified, we adjust the boxplot of the data by clearly labelling the outlier
(usually with an asterisk) and drawing the appropriate whisker to the nearest piece of data just
inside the fence. Consider the following example.
Example
The heights of 30 adult females are recorded to the nearest centimetre.
155 160 190 156 166 156 170 164 171 174
169 169 171 166 170 170 172 172 153 161
172 171 172 163 172 163 169 169 169 175
Draw a boxplot for the above data.
Answer
As before, we construct an ordered stem-and-leaf diagram and then calculate the median
and the quartiles.
15 | 3
15 | 5 6 6
16 | 0 1 3 3 4
16 | 6 6 9 9 9 9 9
17 | 0 0 0 1 1 1 2 2 2 2 2 4
17 | 5
18 |
18 |
19 | 0
16 | 9 means 169 centimetres
The minimum is 153 and the maximum is 190.
The median is the value in the $\dfrac{30 + 1}{2}$ = 15.5th position ⇒ Q2 = $\dfrac{169 + 169}{2}$ = 169
The lower quartile is the 8th value ⇒ Q1 = 163
The upper quartile is the 23rd value ⇒ Q3 = 172
The IQR = 172 – 163 = 9, the lower fence = 163 – 1.5 x 9 = 149.5, and the upper fence = 172 + 1.5 x 9 =
185.5. Since 190 is beyond the upper fence it can be considered an outlier, although its occurrence is more
likely due to natural variation rather than recording error.
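The boxplot for the above data is constructed as follows:
[Figure: boxplot of the female heights on a scale from 150 to 200 centimetres, with the outlier at 190 marked by an asterisk]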
Now try Exercise 2 - Exploratory Data Analysis (EDA), Questions 6 - 13.
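The five number summary and the fences for the female heights can be checked with a short program. The Python sketch below is illustrative only: the function names `middle`, `five_number_summary` and `possible_outliers` are invented here, and the quartiles are taken as the medians of the lower and upper halves of the ordered data, as in the worked answer.

```python
# Heights of 30 adult females from the example above
female_heights = [155, 160, 190, 156, 166, 156, 170, 164, 171, 174,
                  169, 169, 171, 166, 170, 170, 172, 172, 153, 161,
                  172, 171, 172, 163, 172, 163, 169, 169, 169, 175]


def middle(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    ordered = sorted(data)
    half = len(ordered) // 2  # the median itself is left out when n is odd
    return (ordered[0], middle(ordered[:half]), middle(ordered),
            middle(ordered[-half:]), ordered[-1])


def possible_outliers(data):
    """Values lying beyond Q1 - 1.5 x IQR or Q3 + 1.5 x IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in sorted(data) if x < lower_fence or x > upper_fence]


print(five_number_summary(female_heights))
print(possible_outliers(female_heights))
```

The second line of output lists the values that would be marked with an asterisk on the boxplot.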
Using Boxplots for Comparisons
Example
Use the boxplots from the previous two examples to compare the relative heights of adult
males and adult females.
Answer
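[Figure: boxplots of adult male heights and adult female heights drawn on a common scale from 150 to 200 centimetres; the outlier at 190 in the female heights is marked with an asterisk]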
Observations on the above boxplots might include:
• the median of the male heights is greater than the median of the female heights
• the variability within both groups is broadly similar (see range and IQR)
• the median of the female heights is roughly equal to the lower quartile of the male
heights, i.e. 50% of female heights are below 169 whereas 75% of male heights
are greater than 169
• some females were taller than some males, with the tallest man being only 5
centimetres taller than the tallest woman.
From the above observations, there would appear to be some evidence to suggest that adult
males are generally taller than adult females.
Now try Exercise 3 - Interpreting an EDA.
PROBABILITY
Simple Probability
Probability is a measure of how likely something is to happen. Some simple definitions
are necessary to enable us to discuss probability in an informed manner. These include :
• A random experiment or trial is one in which there are a number of possible
outcomes where we have no way of predicting which outcome or outcomes will
actually occur.
• The sample space, usually denoted by S, is the set of all possible outcomes of the trial.
• An event is any set of possible outcomes of a trial. An event is therefore a subset of
the sample space S.
• The relative frequency is the frequency of an event divided by the total frequency.
In experimental situations it is used as an estimate for the probability of that event.
Example
A simple random experiment would be the rolling of an ordinary six-sided die since,
before the die is rolled, we are unable to predict the outcome of the trial. The possible
outcomes are 1, 2, 3, 4, 5, 6 with the sample space written as S = {1, 2, 3, 4, 5, 6}.
Possible events include ‘the outcome is a prime number’ and ‘the outcome is a number
greater than 2’.
Probability is measured on a scale of 0 to 1:
• a probability of 0 means that the event can never happen and the closer the probability
of an event is to 0 the less likely it is to happen
• a probability of 1 means that the event is certain to happen and the closer the
probability of an event is to 1 the more likely it is to happen
• events are often described as impossible, unlikely, possible, likely or certain
• where all outcomes of a trial are equally likely, probability is defined as:
$\dfrac{\text{number of favourable outcomes}}{\text{total number of outcomes}}$
• in experimental situations, as the number of trials increases, the probability of an event
occurring is given by the limit of the relative frequency of that event
• an event is usually denoted by a capital letter, e.g. A, B, C etc., with the probability of
its occurring being denoted by P(A)
• $P(A) = \dfrac{n(A)}{n(S)}$, where n(A) = the number of outcomes described by the event A and
n(S) = the total number of outcomes in the sample space
• 0 ≤ P(A) ≤ 1
Example
A single card is selected from a standard pack of 52 playing cards. Find the probability that
the card is
(a) a King (b) a Heart (c) the Ace of Spades .
Answer
There are 52 equally likely outcomes of this trial.
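(a) P(King) = $\dfrac{n(\text{Kings})}{n(\text{total})} = \dfrac{4}{52} = \dfrac{1}{13}$
(b) P(Heart) = $\dfrac{n(\text{Hearts})}{n(\text{total})} = \dfrac{13}{52} = \dfrac{1}{4}$
(c) P(Ace of Spades) = $\dfrac{n(\text{Ace of Spades})}{n(\text{total})} = \dfrac{1}{52}$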
Data could also be presented in tabular form.
Example
A survey of 100 people revealed the following voting intentions.
         Labour   SNP   Conservative   Liberal Democrat   Total
Women      22      18        6                3             49
Men        20      22        4                5             51
Total      42      40       10                8            100
A person is chosen at random from this group.
Find the probability that the person
(a) is a woman
(b) intends to vote SNP
(c) is a man intending to vote Liberal Democrat .
Answer
(a) P(woman) = $\dfrac{n(\text{women})}{n(\text{total})} = \dfrac{49}{100}$ = 0.49
(b) P(SNP) = $\dfrac{n(\text{SNP})}{n(\text{total})} = \dfrac{40}{100}$ = 0.4
(c) P(male Lib Dem) = $\dfrac{n(\text{male Lib Dem})}{n(\text{total})} = \dfrac{5}{100}$ = 0.05
Now try Exercise 1 - Simple Probability.
Sample Spaces & Further Simple Probability
When events become more complex, it is extremely important that students can list the
members of the sample space. The members of the sample space can be conveniently
identified by systematic listing or by using tables or tree diagrams. Consider the following
examples.
Example
The menu in a restaurant has 4 choices of main course and 3 choices of dessert .
Main Course: Chicken (C), Salmon (S), Lamb (L), Pork (P)
Desserts: Fruit salad (F), Ice cream (I), Gateau (G)
How many different combinations could be chosen from the above menu ?
Answer
Representing each choice by its first letter, a systematic list could be set out as follows.
Chicken (C) could be combined with any of the desserts to give CF CI CG
Similarly, for Salmon (S) we have SF SI SG
for Lamb (L) we have LF LI LG
and for Pork (P) we have PF PI PG
There are 4 x 3 = 12 different possible combinations.
Example
An unbiased die is rolled and a fair coin is tossed.
(a) List the sample space for this experiment.
(b) Calculate the probability of obtaining a head and an even number.
Answer
(a) The coin can land Heads (H) or Tails (T). The die can show 1, 2, 3, 4, 5 or 6.
A systematic list would produce the following:
H1 H2 H3 H4 H5 H6
T1 T2 T3 T4 T5 T6
Alternatively, the above list could have been set out in tabular form.
                Die
          1    2    3    4    5    6
Coin  H   H1   H2   H3   H4   H5   H6
      T   T1   T2   T3   T4   T5   T6
This method works well but is limited to situations where only 2 choices are to be made.
A further alternative is to represent this situation using a tree diagram.
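[Tree diagram: two branches for the coin (H and T); from the end of each, six branches for the die (1 to 6); the twelve branch ends give the outcomes H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6]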
This is an excellent method, but can become overly complicated if there are too many
branches. The sample space is S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}.
(b) P(head and even) = $\dfrac{n(\text{head and even})}{n(\text{total})} = \dfrac{3}{12} = \dfrac{1}{4}$
Now try Exercise 2 - Sample Spaces & Further Simple Probability.
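Systematic listing is easy to imitate by computer. The Python sketch below is an illustration only, with variable names chosen for the example; it uses `itertools.product` to list every pairing for the coin and die experiment and for the restaurant menu.

```python
from fractions import Fraction
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pairing, written in the H1 ... T6 style used above
sample_space = [f"{c}{d}" for c, d in product(coin, die)]

# Event: a head together with an even number
favourable = [o for o in sample_space if o[0] == "H" and int(o[1]) % 2 == 0]
p_head_and_even = Fraction(len(favourable), len(sample_space))

# The restaurant menu: 4 main courses and 3 desserts
menu_choices = ["".join(pair) for pair in product("CSLP", "FIG")]

print(sample_space)
print(favourable, p_head_and_even)
print(len(menu_choices), menu_choices)
```

Using `Fraction` keeps the probability exact, so the result appears as 1/4 rather than a rounded decimal.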
Mutually Exclusive & Exhaustive Events
Two or more events are said to be mutually exclusive if they cannot occur at the same
time.
Two or more events are said to be exhaustive if they combine to form the entire sample
space.
Example
Decide if each pair of events X and Y is mutually exclusive and/or exhaustive.
(a)
X : selecting a spade from a standard pack of 52 playing cards
Y : selecting a red card from a standard pack of 52 playing cards
(b)
X : obtaining an even number on the roll of a fair six - sided die
Y : obtaining a number greater than 4 on the roll of a fair six - sided die
Answer
(a) A spade is a black card. A black card and a red card cannot be selected at the same
time, so X and Y are mutually exclusive. X and Y are not exhaustive as neither event
includes the selection of a club.
(b) Obtaining a six allows X and Y to occur at the same time, so X and Y are not mutually
exclusive. X and Y are not exhaustive as neither event includes 1 or 3 .
Venn Diagrams
A useful way of representing events and their probabilities is to use the Venn diagram.
The Venn diagram provides a picture of how simple events are related to each other.
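[Figure 1: Venn diagram showing the sample space S as a rectangle containing a circle A]
[Figure 2: the same diagram with the area outwith A indicated]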
The area within the rectangle represents the entire sample space S and the area within
the circle represents the situation when event A occurs (see Figure 1 above).
The area outwith A, denoted by $\bar{A}$ (read as 'not A'), represents the event that A
does not occur (see Figure 2 above).
[Figure 3: Venn diagram in which circles A and B overlap within S; A ∩ B ≠ ∅, so A and B are not mutually exclusive]
[Figure 4: Venn diagram in which circles A and B do not overlap within S; A ∩ B = ∅, so A and B are mutually exclusive]
The event A and B , denoted by A ∩ B , represents the area of overlap between A and
B (see Figure 3) . For mutually exclusive events there is no overlap between A and B
⇒ P(A ∩ B) = 0 (see Figure 4) .
[Figure 5: Venn diagram showing A ∪ B where A and B are not mutually exclusive]
[Figure 6: Venn diagram showing A ∪ B where A and B are mutually exclusive]
The event A or B , denoted by A ∪ B , represents the area in either A or B or the area
in A ∩ B (see Figure 5) . The following probability rule can therefore be deduced :
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) .
For mutually exclusive events (see Figure 6) , since P(A ∩ B) = 0 , the above result simplifies
to P(A ∪ B) = P(A) + P(B) .
This result is known as the Addition Rule for mutually exclusive events.
NB
Only the simplified rule is required at this level. Students must verify that events are
mutually exclusive before using this rule.
Example
An unbiased six-sided die is thrown. Calculate the probability of scoring a 2 or an odd
number.
Answer
The events ‘scoring a 2’ and ‘scoring an odd number’ are mutually exclusive.
P(2 or an odd number) = P(2) + P(odd) = $\dfrac{1}{6} + \dfrac{3}{6} = \dfrac{2}{3}$
Example
For events A and B, P(A) = $\dfrac{1}{4}$, P(B) = $\dfrac{1}{3}$ and P(A or B) = $\dfrac{5}{12}$.
Are the events A and B mutually exclusive?
Answer
P(A) + P(B) = $\dfrac{1}{4} + \dfrac{1}{3} = \dfrac{7}{12}$ ≠ P(A or B)
So events A and B are not mutually exclusive.
Now try Exercise 3 - Mutually Exclusive and Exhaustive Events.
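Events on a single roll of a die can be modelled as sets, which makes the definitions above easy to test. The following Python sketch is illustrative; the helper names `prob`, `mutually_exclusive` and `exhaustive` are assumptions introduced for this example.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # sample space for one roll of a fair die


def prob(event):
    """P(event) = n(event) / n(S) for equally likely outcomes."""
    return Fraction(len(event), len(S))


def mutually_exclusive(a, b):
    """True when the events share no outcome."""
    return len(a & b) == 0


def exhaustive(a, b):
    """True when the events together make up the whole sample space."""
    return (a | b) == S


two, odd = {2}, {1, 3, 5}
even, greater_than_4 = {2, 4, 6}, {5, 6}

# Addition Rule for mutually exclusive events
print(mutually_exclusive(two, odd), prob(two | odd), prob(two) + prob(odd))

# Not mutually exclusive: the overlap must be subtracted
print(mutually_exclusive(even, greater_than_4),
      prob(even | greater_than_4),
      prob(even) + prob(greater_than_4) - prob(even & greater_than_4))
```

The second calculation uses the general rule P(A ∪ B) = P(A) + P(B) - P(A ∩ B), which is shown here only to illustrate why the simplified rule needs the mutually exclusive check.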
Independent Events
Two or more events are said to be independent if the occurrence of any one event does
not affect the occurrence of any of the other events.
Example
For each pair of events X and Y listed below, decide whether or not it is likely that the
events are independent.
(a) X : I throw a fair die and score a 5.
Y : I throw the same fair die again and score another 5.
(b) X : I catch measles.
Y : My brother catches measles.
Answer
(a) Scoring a 5 on the first throw does not affect what happens on the second throw.
Therefore, events X and Y are independent.
(b) As measles is an infectious disease, it is likely that event Y is influenced by event X
and so events X and Y are not independent.
For independent events A and B we have the following rule:
P(A and B) = P(A ∩ B) = P(A) x P(B) = P(A).P(B).
This result is known as the Multiplication Rule for independent events.
NB
Students must verify that events are independent before using this rule.
Example
Two fair dice, each numbered 1 to 6, are rolled. Events A, B, C and D are defined as
follows:
A : The first die scores 5
B : The second die scores 5
C : The total is 6
D : The total is 7
(a) Find P(A ∩ B)
(b) Find P(A ∩ C) and P(A ∩ D)
(c) Which of the events A , C and D are independent ?
Answer
(a) Since events A and B are independent, P(A ∩ B) = P(A) x P(B) = $\dfrac{1}{6} \times \dfrac{1}{6} = \dfrac{1}{36}$
(b) P(A ∩ C) = $\dfrac{n(A \cap C)}{n(\text{total})} = \dfrac{1}{36}$
P(A ∩ D) = $\dfrac{n(A \cap D)}{n(\text{total})} = \dfrac{1}{36}$
(c) P(A) x P(C) = $\dfrac{1}{6} \times \dfrac{5}{36} = \dfrac{5}{216}$ ≠ P(A ∩ C)
So events A and C are not independent.
P(A) x P(D) = $\dfrac{1}{6} \times \dfrac{1}{6} = \dfrac{1}{36}$ = P(A ∩ D)
So events A and D are independent.
Now try Exercise 4 - Independent Events.
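The two-dice example can be verified by listing all 36 equally likely outcomes. The sketch below is an illustration in Python; the helper names `prob` and `independent` are made up for the example, and independence is tested directly against the Multiplication Rule.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def independent(e1, e2):
    """Check whether P(e1 and e2) equals P(e1) x P(e2)."""
    return prob(lambda o: e1(o) and e2(o)) == prob(e1) * prob(e2)


def A(o): return o[0] == 5      # the first die scores 5
def B(o): return o[1] == 5      # the second die scores 5
def C(o): return sum(o) == 6    # the total is 6
def D(o): return sum(o) == 7    # the total is 7


print(prob(A), prob(C), prob(D))
print(independent(A, B), independent(A, C), independent(A, D))
```

The totals 6 and 7 behave differently because a total of 7 is possible whatever the first die shows, whereas a total of 6 is impossible if the first die shows 6.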
Tree Diagrams - With and Without Replacement
Example
A bag contains 4 green and 5 yellow balls. One ball is selected and then replaced. A
second ball is then selected. Find the probability that both balls are green.
Answer
In this situation the first selection is returned to the bag before the second selection is
made. We would describe this type of selection as sampling with replacement.
[Tree diagram: the first ball is Green with probability 4/9 or Yellow with probability 5/9; whichever is selected, the second ball is Green with probability 4/9 or Yellow with probability 5/9. The four outcomes are Green and Green, Green and Yellow, Yellow and Green, Yellow and Yellow.]
P(both green) = P(green and green) = $\dfrac{4}{9} \times \dfrac{4}{9} = \dfrac{16}{81}$
The tree diagram is an extremely useful way of illustrating the outcomes of two or more
trials. It provides the student with a simple but clear means of displaying what is
happening in a problem.
In general, the outcomes of the first trial are represented by lines extended from a fixed
point. From the ends of these lines other lines are extended to represent the outcomes of
the second trial. (The diagram can be extended depending on the number of trials involved
in the problem.) The probability of each outcome is written above its line. The probability
of the event at the end of each branch is found by multiplying the probabilities along that
branch.
Example
An electrical system consists of three components A, B and C and will only work when all
three components are in working order. The three components are manufactured by three
different companies. From past experience the following information is available:
P(component A is defective) = 0.03
P(component B is defective) = 0.02
P(component C is defective) = 0.04
Calculate the probability that the electrical system will not be operational.
Answer
P(component A defective) = 0.03 ⇒ P(component A non - defective) = 0.97 ,
P(component B defective) = 0.02 ⇒ P(component B non - defective) = 0.98 ,
P(component C defective) = 0.04 ⇒ P(component C non - defective) = 0.96 .
[Tree diagram: Component A defective? Yes (0.03) or No (0.97); Component B defective? Yes (0.02) or No (0.98); Component C defective? Yes (0.04) or No (0.96). The eight outcomes are YYY, YYN, YNY, YNN, NYY, NYN, NNY and NNN.]
P(electrical system not operational)
= P(YYY) + P(YYN) + P(YNY) + P(YNN) + P(NYY) + P(NYN) + P(NNY)
= 1 - P(NNN)
= 1 - 0.97 x 0.98 x 0.96
= 0.087424
Now try Exercise 5 - Tree Diagrams (With Replacement).
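The shortcut 1 - P(NNN) can be compared with the sum over the seven failing branches of the tree. The Python sketch below is illustrative only; the names `p_defective` and `branch_probability` are assumptions, and the three components are treated as independent, as the worked answer implicitly does.

```python
import math
from itertools import product

# Probability that each component is defective (from the example above)
p_defective = {"A": 0.03, "B": 0.02, "C": 0.04}

# Shortcut: the system works only if all three components work
p_all_working = math.prod(1 - p for p in p_defective.values())
p_not_operational = 1 - p_all_working


def branch_probability(branch):
    """Multiply the probabilities along one branch of the tree.

    branch is a tuple of three booleans: True means that component is defective.
    """
    probability = 1.0
    for defective, p in zip(branch, p_defective.values()):
        probability *= p if defective else 1 - p
    return probability


# Long way: add up every branch with at least one defective component
failing_branches = [b for b in product([True, False], repeat=3) if any(b)]
p_from_branches = sum(branch_probability(b) for b in failing_branches)

print(round(p_not_operational, 6), round(p_from_branches, 6))
```

Both routes give the same answer, which shows why working with the complement is usually the quicker method.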
Example
A bag contains 4 green and 5 yellow balls. One ball is selected and not replaced. A
second ball is then selected. Find the probability that both balls are green.
Answer
Sometimes the probabilities of subsequent events may change as a result of earlier events.
In this situation, the first selection is not returned to the bag before the second selection is
made. We would describe this type of selection as sampling without replacement. The
conditions for calculating probabilities have now been changed - the number of green or
yellow balls has been reduced by one, as has the total number of balls.
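[Tree diagram: the first ball is Green with probability 4/9 or Yellow with probability 5/9. After a Green first ball, the second ball is Green with probability 3/8 or Yellow with probability 5/8; after a Yellow first ball, the second ball is Green with probability 1/2 or Yellow with probability 1/2. The four outcomes are Green and Green, Green and Yellow, Yellow and Green, Yellow and Yellow.]
P(both green) = P(green and green) = $\dfrac{4}{9} \times \dfrac{3}{8} = \dfrac{1}{6}$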
Example
A committee consists of 5 people: 3 women and 2 men. Two members are to be chosen at
random to be the Chairperson and the Vice-Chairperson. Find the probability that the two
chosen are of opposite sex.
Answer
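[Tree diagram: the Chairperson is a Woman with probability 3/5 or a Man with probability 2/5. After a Woman, the Vice-Chairperson is a Woman with probability 1/2 or a Man with probability 1/2; after a Man, the Vice-Chairperson is a Woman with probability 3/4 or a Man with probability 1/4. The four outcomes are Woman and Woman, Woman and Man, Man and Woman, Man and Man.]
P(opposite sex) = P(woman and man) + P(man and woman) = $\dfrac{3}{5} \times \dfrac{1}{2} + \dfrac{2}{5} \times \dfrac{3}{4} = \dfrac{3}{5}$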
Now try Exercise 6 - Tree Diagrams (Without Replacement).
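The difference between sampling with and without replacement can be seen by computing both cases side by side. This Python sketch is an illustration; the function name `p_both_green` and the brute-force check on the committee example are additions made for this example.

```python
from fractions import Fraction
from itertools import permutations


def p_both_green(green, yellow, replace):
    """P(two green balls) when drawing twice from a bag."""
    total = green + yellow
    first = Fraction(green, total)
    if replace:
        second = Fraction(green, total)          # the bag is unchanged
    else:
        second = Fraction(green - 1, total - 1)  # one green ball fewer
    return first * second


print(p_both_green(4, 5, replace=True))
print(p_both_green(4, 5, replace=False))

# Committee example: list every ordered (Chairperson, Vice-Chairperson) pair
committee = ["W", "W", "W", "M", "M"]
pairs = list(permutations(range(len(committee)), 2))
opposite = [p for p in pairs if committee[p[0]] != committee[p[1]]]
p_opposite = Fraction(len(opposite), len(pairs))
print(p_opposite)
```

Listing the ordered pairs gives the same probability as multiplying along the branches of the tree, since each pair is equally likely.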
Combinations
The number of unordered arrangements of r objects selected from a collection of n
objects is denoted by $^nC_r$ or $\binom{n}{r}$ (read as ‘n c r’ or ‘n choose r’). Each collection of
selected objects is called a combination.
The general formula for $\binom{n}{r}$ is:
$\binom{n}{r} = \dfrac{n!}{r!\,(n - r)!} = \dfrac{n(n - 1)(n - 2) \dots (n - r + 1)}{r(r - 1)(r - 2) \dots 3 \cdot 2 \cdot 1}$
where n! = n(n - 1)(n - 2) ... 3.2.1 and 0! = 1.
At some point the $\binom{n}{r}$ notation should be linked to Pascal's triangle.
Example
Evaluate 8 C3 .
Answer
$^8C_3 = \dfrac{8!}{3!\,5!} = \dfrac{8 \cdot 7 \cdot 6}{3 \cdot 2 \cdot 1} = 56$
Example
A school committee of 5 people is to be chosen from 12 volunteers.
(a) In how many ways can the committee be chosen ?
(b) The Headteacher selects one of the volunteers to be the chairperson of the committee.
In how many ways can the committee now be chosen ?
Answer
(a) Number of ways = $\binom{12}{5} = \dfrac{12!}{5!\,7!} = 792$
(b) As one member of the committee has been pre-selected, we now have a choice of 4
people from the remaining 11 volunteers.
Number of ways = $\binom{11}{4} = \dfrac{11!}{4!\,7!} = 330$
Now try Exercise 7 - Combinations.
Combinations are a useful means of evaluating probabilities. Consider the following
example.
Example
From a well shuffled pack of 52 cards a hand of 7 cards is dealt.
Find the probability that the hand will contain
(a) exactly 3 kings
(b) at least 3 kings.
Answer
(a) To select exactly 3 kings within a hand of 7 cards we must select 3 kings
(from a total of 4 kings) and any other 4 cards (from a total of 48 cards).
P(exactly 3 kings) = $\dfrac{\binom{4}{3} \times \binom{48}{4}}{\binom{52}{7}} = \dfrac{4 \times 194580}{133784560} \approx 0.00582$
(b) P(at least 3 kings) = P(3 kings) + P(4 kings)
$= \dfrac{\binom{4}{3} \times \binom{48}{4}}{\binom{52}{7}} + \dfrac{\binom{4}{4} \times \binom{48}{3}}{\binom{52}{7}} = \dfrac{4 \times 194580}{133784560} + \dfrac{1 \times 17296}{133784560} \approx 0.00595$
Now try Exercise 8 - Combinations (Probability).
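The combination calculations in this section can be reproduced with Python's `math.comb`, which evaluates "n choose r" directly. The sketch below is an illustration only, with variable names chosen for the example.

```python
from fractions import Fraction
from math import comb

# Evaluating combinations from the earlier examples
print(comb(8, 3))    # 8 C 3
print(comb(12, 5))   # committee of 5 from 12 volunteers
print(comb(11, 4))   # 4 more members from the remaining 11

# Hands of 7 cards dealt from a pack of 52
hands = comb(52, 7)
p_exactly_3_kings = Fraction(comb(4, 3) * comb(48, 4), hands)
p_exactly_4_kings = Fraction(comb(4, 4) * comb(48, 3), hands)
p_at_least_3_kings = p_exactly_3_kings + p_exactly_4_kings

print(round(float(p_exactly_3_kings), 5))
print(round(float(p_at_least_3_kings), 5))
```

Keeping the probabilities as exact fractions until the final step avoids any rounding error in the intermediate working.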
Simulation
An alternative to the calculation of probabilities is to simulate the outcomes of a random
experiment using random numbers. Random numbers can be produced by tossing coins,
rolling dice, drawing numbered balls from a hat etc. Experiments of this kind, however,
can become very tedious and time-consuming if large samples of random numbers are
required. Instead, it is possible to make use of pseudo-random numbers which, although
not strictly random, have been computer-generated using a mathematical formula.
In practice, the prefix ‘pseudo’ is usually omitted and the numbers are described simply as
random numbers. Random numbers are usually set out in tabular form (see extract below).
37057 83986 98419 76401 15412 68418
33724 28633 85953 82213 07827 48740
43737 15929 19659 52804 72335 25208
16929 84478 31341 60265 19404 27881
10131 98571 20877 34585 22353 54505
The starting point and direction (right, left, up, down, diagonal etc.) should be decided before using such a list. The digits within the list can be taken as individuals
(3, 7, 0, 5, 7 ...), as pairs of digits (37, 05, 78, 39, 86 ...), as decimals (0.37057, 0.83986,
0.98419 ...) or in whatever manner is convenient.
The ‘rand’ or ‘rand#’ function on most scientific and graphic calculators is designed to
produce random numbers. For example, different types of random number can be
produced on a graphic calculator by using the following simple routines:
n x rand ENTER .............. produces random numbers between 0 and n
n x rand + 1 ENTER ........ produces random numbers between 1 and n + 1
int(n x rand + 1) ENTER ... produces the whole number part of random numbers between
1 and n + 1 (i.e. the numbers 1 , 2 , 3 , ... , n) .
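The three calculator routines above have close analogues in Python. This is a sketch only: it assumes Python's `random.random()`, which returns a number r with 0 ≤ r < 1, plays the role of the calculator's rand key, and the function names are made up for illustration.

```python
import random

def rand_0_to_n(n):
    """Like 'n x rand': a random number between 0 and n."""
    return n * random.random()

def rand_1_to_n_plus_1(n):
    """Like 'n x rand + 1': a random number between 1 and n + 1."""
    return n * random.random() + 1

def rand_whole_1_to_n(n):
    """Like 'int(n x rand + 1)': one of the whole numbers 1, 2, ..., n."""
    return int(n * random.random() + 1)

# Example: ten simulated rolls of a six-sided die.
print([rand_whole_1_to_n(6) for _ in range(10)])
```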
Example
Using the above list of random numbers, simulate the results of tossing a coin 10 times.
Answer
Let Heads be represented by the digits 0, 1, 2, 3 and 4 and Tails by the digits 5, 6, 7, 8 and
9 (or alternatively, let Heads be represented by an even number and Tails by an odd
number).
Starting at the sixth digit on the second row and working towards the right we have:
Digit    2  8  6  3  3  8  5  9  5  3
Result   H  T  T  H  H  T  T  T  T  H
giving 4 Heads and 6 Tails.
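The digit-to-coin rule used in this answer can be written out as a small Python sketch. The variable names are illustrative, and the second row of the random number list is typed in by hand.

```python
# Second row of the random number list, typed in by hand.
row_two = "33724 28633 85953 82213 07827 48740"
digits = row_two.replace(" ", "")

start = 5                          # the sixth digit has index 5
chosen = digits[start:start + 10]  # ten digits, one per toss

# Digits 0-4 represent Heads, digits 5-9 represent Tails.
tosses = ["H" if d in "01234" else "T" for d in chosen]
print(chosen, "".join(tosses), tosses.count("H"), "Heads")
```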
Example
Simulate the results of rolling an unbiased die 36 times.
Answer
A calculator produced the following list of random numbers:
0.925
0.312
0.240
0.017
0.118
0.930
0.622
0.817
0.617
0.334
0.043
0.086
0.853
0.012
0.451
0.674
0.881
0.982
0.807
0.455
0.114
0.997
0.374
0.696
0.989
0.798
0.124
0.492
0.773
0.805
0.670
0.198
0.597
0.701
0.700
0.552
0.450
0.404
0.464
0.868
0.985
0.398
0.606
0.882
0.544
0.338
0.467
0.229
0.925
0.257
0.633
0.117
0.077
0.371
0.638
0.219
0.286
0.628
0.624
0.717
Discarding the first zero and the decimal point in each number produces the following
table.
925
312
240
017
118
930
622
817
617
334
043
086
853
012
451
674
881
982
807
455
114
997
374
696
989
798
124
492
773
805
670
198
597
701
700
552
450
404
464
868
985
398
606
882
544
338
467
229
925
257
633
117
077
371
638
219
286
628
624
717
Starting with the fourth number on the second row and working towards the right (ignoring
the digits 0, 7, 8, 9) we have the following simulated scores:
6
6
1
1
6
4
3
1
5
1
1
1
1
2
5
2
6
4
4
6
4
5
1
6
3
2
3
3
3
3
4
1
6
5
1
4
In total, the 36 simulated scores above give 7 sixes, 4 fives, 6 fours, 6 threes, 3 twos and
10 ones, whereas the theoretical probabilities suggest 6 of each score. The results of this
simulation do not agree exactly with the theoretical probabilities. Generally, we would
expect there to be some variation between theoretical and simulated results, although this
variation, measured as a proportion of the number of rolls, should reduce considerably as
the size of the simulation increases.
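The effect of the size of a simulation can be explored with a short Python sketch. It uses Python's built-in pseudo-random generator rather than a table or calculator, and the function names and the seed value are arbitrary choices made for illustration.

```python
import random
from collections import Counter

def simulate_rolls(n_rolls, seed=1):
    """Count how often each face appears in n_rolls simulated rolls."""
    rng = random.Random(seed)   # fixed seed so the run can be repeated
    return Counter(rng.randint(1, 6) for _ in range(n_rolls))

def largest_gap(n_rolls, seed=1):
    """Largest difference between a relative frequency and 1/6."""
    counts = simulate_rolls(n_rolls, seed)
    return max(abs(counts[face] / n_rolls - 1 / 6) for face in range(1, 7))

for n in (36, 3600, 360000):
    print(n, round(largest_gap(n), 4))
```

With larger values of n the printed gap would be expected to shrink, although any single run can vary.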
Now try Exercise 9 - Simulation.
Random Variables
In statistics, a variable is described as random if its value is the result of a random
observation or experiment. There are two types of random variable: discrete and
continuous.
A discrete random variable is a variable for which a list of its possible numerical values
can be made. Discrete random variables are usually associated with counting.
Discrete random variable                                        Possible values
The number of heads when two fair coins are tossed.             0, 1, 2
The number of sunny days in June.                               0, 1, 2, ... , 29, 30
The total score when two unbiased dice are rolled.              2, 3, 4, 5 ... , 11, 12
The number of rolls of an unbiased die until a 6 is obtained.   1, 2, 3, 4, ...
A continuous random variable can take any real numbered value within a certain range. It
is not possible, however, to make a list of the numerical values of the variable. Continuous
random variables are usually associated with measurement.
Continuous random variable                      Possible range of values
The height of an S6 pupil.                      1.3 m to 2.3 m
The true mass of a 2 kg bag of flour.           1.99 kg to 2.01 kg
The lifetime of a dog.                          0 to 15 years
The height of a wave in the North Atlantic.     0.5 m to 12 m
NB
Random variables are usually named using upper case letters, e.g. X, Y, Z ... whereas the
values of the random variable are denoted by the corresponding lower case letters x, y, z….
Discrete Probability Distributions
The probability distribution of a discrete random variable X sets out the relationship
between the values of the random variable and their associated probabilities. It shows how
the total probability of 1 is distributed amongst the possible values of X. A formal
definition could be stated as follows:
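The probabilities P(X = x) form a valid probability distribution for the discrete random variable X if:
• for each of its values x, 0 ≤ P(X = x) ≤ 1
• ∑ P(X = x) = 1, where the sum is taken over all possible values x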
The probability distribution of a discrete random variable X is often set out in tabular form
but can also be described using a formula.
Example
A discrete random variable X has probability distribution:
x          1    2    3    4    5
P(X = x)   2k   3k   5k   3k   7k
Find
(a) the value of the constant k
(b) P(1 < X ≤ 4)
Answer
(a) The sum of all the probabilities must be 1.
⇒ 2k + 3k + 5k + 3k + 7k = 1
20k = 1
k = 1/20
(b) P(1 < X ≤ 4) = P(X = 2) + P(X = 3) + P(X = 4)
= 3/20 + 5/20 + 3/20
= 11/20
Example
A discrete random variable X has probability function given by:
P(X = x) = k(x + 2)² , x = 1, 2, 3, 4.
(a) Tabulate the probability distribution of X and find the value of the constant k.
(b) Find P(X < 4).
Answer
(a)
x          1    2     3     4
P(X = x)   9k   16k   25k   36k
9k + 16k + 25k + 36k = 1
86k = 1
k = 1/86
(b) P(X < 4) = 9/86 + 16/86 + 25/86
= 50/86
= 25/43
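The method of finding k by making the probabilities sum to 1 can be sketched in Python using exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

values = [1, 2, 3, 4]

# P(X = x) = k(x + 2)^2, so the multiples of k are 9, 16, 25 and 36.
weights = {x: (x + 2) ** 2 for x in values}

# The probabilities must sum to 1, so k = 1 / (sum of the multiples).
k = Fraction(1, sum(weights.values()))

distribution = {x: k * w for x, w in weights.items()}
p_less_than_4 = sum(p for x, p in distribution.items() if x < 4)

print(k, p_less_than_4)
```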
Example
The random variable H represents the number of Heads obtained when 3 fair coins are
tossed. Find the probability distribution of H.
Answer
The random variable H can take the values 0, 1, 2, 3. To evaluate the corresponding
probabilities we use a tree diagram.
[Tree diagram: branches for the first coin, second coin and third coin, each splitting into Head and Tail with probability 1/2, leading to the eight outcomes listed below.]
Outcome   Number of Heads
HHH       3 Heads
HHT       2 Heads
HTH       2 Heads
HTT       1 Head
THH       2 Heads
THT       1 Head
TTH       1 Head
TTT       0 Heads
As the results obtained on each coin are independent of each other, each complete path
through the tree has probability 1/2 × 1/2 × 1/2 = 1/8.
P(1 Head) = P(HTT) + P(THT) + P(TTH)
= 1/8 + 1/8 + 1/8
= 3/8
Similarly, P(2 Heads) = 3/8.
The probability distribution of H can be set out as follows:
h          0     1     2     3
P(H = h)   1/8   3/8   3/8   1/8
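The eight equally likely outcomes can also be listed by computer. The following Python sketch enumerates them and counts the Heads in each; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2 x 2 x 2 = 8 equally likely outcomes, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give 0, 1, 2 or 3 Heads.
counts = Counter(outcome.count("H") for outcome in outcomes)

dist_h = {h: Fraction(counts[h], len(outcomes)) for h in sorted(counts)}
for h, p in dist_h.items():
    print(h, p)
```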
Now try Exercise 10 - Discrete Probability Distributions.
Discrete Probability Distributions - Expectation and Variance
The mean or expected value of a random variable X is denoted by E(X) or µ and is
given by E(X) = ∑ x P(X = x).
Example
x          0     1     2     3
P(X = x)   1/2   1/8   1/4   1/8
Find the expected value of X.
Answer
E(X) = 0 × 1/2 + 1 × 1/8 + 2 × 1/4 + 3 × 1/8 = 1
Example
A man buys 20 tickets out of a total of 1000 tickets sold in a raffle. The price of a ticket is
50p and there is only one prize of £100. Calculate the man's expected gain or loss.
Answer
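Let X be the man's gain in pounds. He spends 20 × 50p = £10 on tickets, so X = -10 if he
does not win and X = 100 - 10 = 90 if he wins.
x          -10         90
P(X = x)   980/1000    20/1000
E(X) = -10 × 980/1000 + 90 × 20/1000
= -8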
The man would make an expected loss of £8 .
The variance of a random variable X is denoted by Var(X) or σ² and is given by
Var(X) = E(X²) - {E(X)}²
where E(X) = ∑ x P(X = x) and E(X²) = ∑ x² P(X = x).
NB
(a) σ² represents the variance of the population whereas s² represents the variance of a
sample taken from the population.
(b) Since probabilities and squared real quantities are never negative we can deduce that
Var(X) ≥ 0 or E(X²) ≥ {E(X)}².
(c) The standard deviation of X, denoted by SD(X) or σ, is simply the square root of the
variance of X.
Example
x          0     1     2     3
P(X = x)   1/2   1/8   1/4   1/8
Find the variance of X.
Answer
E(X) = 0 × 1/2 + 1 × 1/8 + 2 × 1/4 + 3 × 1/8 = 1
E(X²) = 0² × 1/2 + 1² × 1/8 + 2² × 1/4 + 3² × 1/8 = 2 1/4
Var(X) = 2 1/4 - 1² = 1 1/4
Example
A box contains 2 yellow marbles and 3 green marbles. Two marbles are taken at random
without replacement. If G represents the number of green marbles selected, find Var(G).
Answer
Using a tree diagram the following probability distribution can be found:
g          0      1     2
P(G = g)   1/10   3/5   3/10
E(G) = 0 × 1/10 + 1 × 3/5 + 2 × 3/10 = 1.2
E(G²) = 0² × 1/10 + 1² × 3/5 + 2² × 3/10 = 1.8
Var(G) = 1.8 - (1.2)²
= 0.36
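The formulae E(X) = ∑ x P(X = x) and Var(X) = E(X²) - {E(X)}² translate directly into code. This Python sketch applies them to the marbles distribution, using exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def expectation(dist):
    """E(X): sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - {E(X)}^2."""
    mean_of_squares = sum(x * x * p for x, p in dist.items())
    return mean_of_squares - expectation(dist) ** 2

# Number of green marbles, G, from the example above.
marbles = {0: Fraction(1, 10), 1: Fraction(3, 5), 2: Fraction(3, 10)}

print(float(expectation(marbles)), float(variance(marbles)))
```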
Now try Exercise 11 - Discrete Probability Distributions (Expectation and Variance) .
Discrete Probability Distributions - Simulation
The results of a random experiment can be modelled by the probability distribution of a
suitable discrete random variable. We now consider how results can be simulated from
such distributions.
Example
The discrete random variable X represents the number of heads when 3 unbiased coins are
tossed. X has the following probability distribution.
x          0     1     2     3
P(X = x)   1/8   3/8   3/8   1/8
Use the following sequence of calculator generated random numbers to simulate the
tossing of 3 unbiased coins on 24 occasions.
0.921 0.836 0.255 0.726 0.247 0.101 0.731 0.222 0.594 0.820
0.934 0.492 0.095 0.402 0.646 0.352 0.815 0.729 0.020 0.389
0.367 0.233 0.187 0.235 0.784 0.451 0.331 0.718 0.942 0.730
Answer
Firstly, convert the probabilities within the distribution into decimals.
x          0       1       2       3
P(X = x)   0.125   0.375   0.375   0.125
We can now assign the above random numbers (r) in the following way:
0.001 ≤ r ≤ 0.125 ⇒ x = 0
0.126 ≤ r ≤ 0.500 ⇒ x = 1
0.501 ≤ r ≤ 0.875 ⇒ x = 2
0.876 ≤ r ≤ 0.999 ⇒ x = 3
The 24 simulations are:
r   0.921  0.836  0.255  0.726  0.247  0.101  0.731  0.222  0.594  0.820
x   3      2      1      2      1      0      2      1      2      2
r   0.934  0.492  0.095  0.402  0.646  0.352  0.815  0.729  0.020  0.389
x   3      1      0      1      2      1      2      2      0      1
r   0.367  0.233  0.187  0.235
x   1      1      1      1
giving a total of 3 zeros, 11 ones, 8 twos and 2 threes. Theoretically, we would have
expected 3 zeros, 9 ones, 9 twos and 3 threes.
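The assignment rule used in this answer can be sketched in Python. The function name `assign` is an illustrative choice; the cut-off values 0.125, 0.500 and 0.875 are the running totals of the probabilities in the table.

```python
from collections import Counter

def assign(r):
    """Convert a random number r (0.001 to 0.999) into a value of X."""
    if r <= 0.125:
        return 0
    if r <= 0.500:
        return 1
    if r <= 0.875:
        return 2
    return 3

# The first 24 calculator-generated random numbers from the example.
numbers = [0.921, 0.836, 0.255, 0.726, 0.247, 0.101, 0.731, 0.222,
           0.594, 0.820, 0.934, 0.492, 0.095, 0.402, 0.646, 0.352,
           0.815, 0.729, 0.020, 0.389, 0.367, 0.233, 0.187, 0.235]

results = [assign(r) for r in numbers]
tally = Counter(results)
print(sorted(tally.items()))
```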
Now try Exercise 12 - Discrete Probability Distributions (Simulation).
Continuous Probability Distributions
A continuous random variable X can take any value within an interval on the real number
line. As there are an infinite number of possible values within this interval, it is not
possible to assign probabilities to each and every value within this interval. We would say
that P(X = x) = 0 for all possible values, x, within this interval. A continuous random
variable does not have a probability distribution but is described using the concept of
probability density. Consider the following example.
A sample of men's heights is taken and illustrated in the histogram below.
The histogram shows relative frequency density against height where
relative frequency density = relative frequency ÷ width of interval.
Thus relative frequency = relative frequency density × width of interval
= height of bar × width of bar
= area of bar.
For large samples, the relative frequency approximates the probability. The area of each bar,
therefore, represents the probability associated with each interval.
As the number of intervals is increased (i.e. the width of each interval is being decreased
as the accuracy of the measurement is improved), we can see (above) that the overall
shape of the distribution of heights is tending towards a continuous curve . This curve is
called the probability density function. For a continuous random variable X, the
probability that X lies in a particular interval is represented by an area under the
probability density curve and can be found by integrating the probability density curve
over the given interval.
[Figure: graph of probability density f(x) against x on the interval from a to b, with the points p and q marked between a and b.]
The probability density function (pdf) of a continuous random variable X is given by the
function f(x) such that
$P(p \le X \le q) = \int_p^q f(x)\,dx$
where
• f(x) ≥ 0 for all values of x (probabilities cannot be negative)
• $\int_a^b f(x)\,dx = 1$ (for X defined on the interval a ≤ x ≤ b)
NB
(a) For any continuous random variable X, P(a ≤ X ≤ b) = P(a < X < b)
(b) The mode occurs at the maximum point on the probability density curve.
Example
The continuous random variable X has probability density function given by:
$f(x) = \begin{cases} kx(4 - x) & \text{for } 0 \le x \le 4 \\ 0 & \text{elsewhere} \end{cases}$
(a) Find k and sketch the graph of f(x).
(b) Write down the mode of X.
(c) Calculate P(1 < X < 2).
NB
The statement that f(x) = 0 ‘elsewhere’ reminds us that our attention should be
restricted solely to the interval 0 ≤ x ≤ 4.
Answer
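(a) $\int_0^4 kx(4 - x)\,dx = 1$
$\int_0^4 (4kx - kx^2)\,dx = 1$
$\left[ 2kx^2 - \dfrac{kx^3}{3} \right]_0^4 = 1$
$32k - \dfrac{64k}{3} = 1$
$\dfrac{32k}{3} = 1$
$k = \dfrac{3}{32}$
[Sketch: graph of f(x) against x, a parabola through (0, 0) and (4, 0) with maximum value 3/8 at x = 2.]
(b) From the graph of f, we can see that the mode occurs at x = 2.
(If the pdf is a more complex function, it may be necessary to find the mode
by solving the equation f '(x) = 0.)
(c) $P(1 < X < 2) = \int_1^2 \tfrac{3}{32}\,x(4 - x)\,dx$
$= \int_1^2 \left( \tfrac{3}{8}x - \tfrac{3}{32}x^2 \right) dx$
$= \left[ \tfrac{3}{16}x^2 - \tfrac{1}{32}x^3 \right]_1^2$
$= \dfrac{11}{32}$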
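The integration in this example can be checked exactly with a short Python sketch. It assumes the pdf is a polynomial, stored as a list of coefficients; the helper `integrate_poly` is written here for illustration and is not a library function.

```python
from fractions import Fraction

def integrate_poly(coeffs, lo, hi):
    """Exact integral from lo to hi of sum(coeffs[i] * x**i)."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (i + 1) / (i + 1)
                   for i, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

shape = [0, 4, -1]   # x(4 - x) = 4x - x^2, before multiplying by k

# The total area under the pdf must be 1, which fixes k.
k = 1 / integrate_poly(shape, 0, 4)

# P(1 < X < 2) is k times the area under the shape between 1 and 2.
p_1_to_2 = k * integrate_poly(shape, 1, 2)

print(k, p_1_to_2)
```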
Now try Exercise 13 - Continuous Probability Distributions.
Continuous Probability Distributions - Expectation and Variance
The definitions of the expected value and variance of a continuous random variable X,
defined on the interval a ≤ x ≤ b, are given below:
$E(X) = \int_a^b x\,f(x)\,dx$
and $\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2$
where $E(X^2) = \int_a^b x^2 f(x)\,dx$.
Example
The lifetime, X years, of an electrical component is a continuous random variable with pdf
given by:
$f(x) = \begin{cases} \tfrac{2}{9}\,x(3 - x) & \text{for } 0 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
Calculate
(a) E(X)
(b) Var(X)
(c) SD(X)
Answer
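(a) $E(X) = \int_0^3 x \cdot \tfrac{2}{9}\,x(3 - x)\,dx$
$= \int_0^3 \tfrac{2}{9}\,x^2(3 - x)\,dx$
$= \int_0^3 \left( \tfrac{2}{3}x^2 - \tfrac{2}{9}x^3 \right) dx$
$= \left[ \tfrac{2}{9}x^3 - \tfrac{1}{18}x^4 \right]_0^3$
$= 1\tfrac{1}{2}$
(b) $E(X^2) = \int_0^3 x^2 \cdot \tfrac{2}{9}\,x(3 - x)\,dx$
$= \int_0^3 \tfrac{2}{9}\,x^3(3 - x)\,dx$
$= \int_0^3 \left( \tfrac{2}{3}x^3 - \tfrac{2}{9}x^4 \right) dx$
$= \left[ \tfrac{1}{6}x^4 - \tfrac{2}{45}x^5 \right]_0^3$
$= 2\tfrac{7}{10}$
$\mathrm{Var}(X) = E(X^2) - \{E(X)\}^2$
$= 2\tfrac{7}{10} - \left( 1\tfrac{1}{2} \right)^2$
$= \dfrac{9}{20}$
(c) $SD(X) = \sqrt{\dfrac{9}{20}} \approx 0.671$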
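The three results above can be verified with a Python sketch. As before, it assumes a polynomial pdf stored as a list of coefficients, and `integrate_poly` is a helper written for illustration rather than a library function.

```python
from fractions import Fraction
from math import sqrt

def integrate_poly(coeffs, lo, hi):
    """Exact integral from lo to hi of sum(coeffs[i] * x**i)."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (i + 1) / (i + 1)
                   for i, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

# f(x) = (2/9)x(3 - x) = (2/3)x - (2/9)x^2 on 0 <= x <= 3
pdf = [0, Fraction(2, 3), Fraction(-2, 9)]

# Multiplying by x (or x^2) shifts every coefficient up one (or two) places.
mean = integrate_poly([0] + pdf, 0, 3)               # E(X)
mean_of_squares = integrate_poly([0, 0] + pdf, 0, 3) # E(X^2)
var = mean_of_squares - mean ** 2
sd = sqrt(var)

print(mean, mean_of_squares, var, round(sd, 3))
```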
Now try Exercise 14 - Continuous Probability Distributions (Expectation and
Variance).
The Cumulative Distribution Function
The pdf of a continuous random variable does not give probabilities directly.
Probabilities can only be found indirectly by integrating the pdf over an interval.
However, a function which does give probabilities directly is the cumulative distribution
function (cdf). It is defined as follows:
If X is a continuous random variable with pdf given by
$\begin{cases} f(x) & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$
then the cumulative distribution function, F(x), is given by:
$F(x) = P(X \le x) = \int_a^x f(t)\,dt$
(t is a dummy variable since x has been used as the upper limit of integration).
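Example
The continuous random variable X has pdf given by:
$f(x) = \begin{cases} \tfrac{6}{125}\,x(5 - x) & \text{for } 0 \le x \le 5 \\ 0 & \text{elsewhere.} \end{cases}$
Find and sketch the cumulative distribution function F(x).
Answer
$F(x) = \int_0^x \tfrac{6}{125}\,t(5 - t)\,dt$
$= \int_0^x \left( \tfrac{6}{25}t - \tfrac{6}{125}t^2 \right) dt$
$= \left[ \tfrac{3}{25}t^2 - \tfrac{2}{125}t^3 \right]_0^x$
$= \tfrac{3}{25}x^2 - \tfrac{2}{125}x^3$
$= \tfrac{1}{125}\,x^2(15 - 2x)$
The cdf is then written as follows:
$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \tfrac{1}{125}\,x^2(15 - 2x) & \text{for } 0 \le x \le 5 \\ 1 & \text{for } x > 5 \end{cases}$
A sketch of the cdf:
[Sketch: graph of F(x) against x, rising from 0 at x = 0 to 1 at x = 5.]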
A continuous random variable X has a pdf given by
$\begin{cases} f(x) & \text{for } a \le x \le b \\ 0 & \text{elsewhere.} \end{cases}$
The above definition of the cdf allows us to find the value of the median, m,
by solving any one of the equations below:
$F(m) = \tfrac{1}{2}$
or $P(X \le m) = \int_a^m f(x)\,dx = \tfrac{1}{2}$
or $P(X \ge m) = \int_m^b f(x)\,dx = \tfrac{1}{2}$.
Quartiles can also be found by solving similar equations:
• Lower quartile - solve $\int_a^l f(x)\,dx = \tfrac{1}{4}$
• Upper quartile - solve $\int_a^u f(x)\,dx = \tfrac{3}{4}$.
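Example
The continuous random variable X has pdf given by:
$f(x) = \begin{cases} \tfrac{1}{8}(x + 2) & \text{for } 1 \le x \le 3 \\ 0 & \text{elsewhere.} \end{cases}$
Find (a) the median value of X (b) the interquartile range of X.
Answer
(a) The median value is given by $\int_1^m \tfrac{1}{8}(x + 2)\,dx = \tfrac{1}{2}$.
$\int_1^m \left( \tfrac{1}{8}x + \tfrac{1}{4} \right) dx = \tfrac{1}{2}$
$\left[ \tfrac{1}{16}x^2 + \tfrac{1}{4}x \right]_1^m = \tfrac{1}{2}$
$\left( \tfrac{1}{16}m^2 + \tfrac{1}{4}m \right) - \left( \tfrac{1}{16} \cdot 1^2 + \tfrac{1}{4} \cdot 1 \right) = \tfrac{1}{2}$
$\tfrac{1}{16}m^2 + \tfrac{1}{4}m - \tfrac{13}{16} = 0$
$m^2 + 4m - 13 = 0$
Using the quadratic formula, $m = \dfrac{-4 \pm \sqrt{68}}{2}$ ⇒ m = -6.12 or 2.12
Hence the median value of X is m = 2.12 (since X is defined on the interval 1 ≤ x ≤ 3).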
(b) The lower quartile is given by $\int_1^l \tfrac{1}{8}(x + 2)\,dx = \tfrac{1}{4}$.
$\left[ \tfrac{1}{16}x^2 + \tfrac{1}{4}x \right]_1^l = \tfrac{1}{4}$
$\left( \tfrac{1}{16}l^2 + \tfrac{1}{4}l \right) - \left( \tfrac{1}{16} \cdot 1^2 + \tfrac{1}{4} \cdot 1 \right) = \tfrac{1}{4}$
$\tfrac{1}{16}l^2 + \tfrac{1}{4}l - \tfrac{9}{16} = 0$
$l^2 + 4l - 9 = 0$
Using the quadratic formula, $l = \dfrac{-4 \pm \sqrt{52}}{2}$ ⇒ l = -5.61 or 1.61
Hence the lower quartile of X is l = 1.61 (since X is defined on the interval 1 ≤ x ≤ 3).
The upper quartile is given by $\int_1^u \tfrac{1}{8}(x + 2)\,dx = \tfrac{3}{4}$.
$\left[ \tfrac{1}{16}x^2 + \tfrac{1}{4}x \right]_1^u = \tfrac{3}{4}$
$\left( \tfrac{1}{16}u^2 + \tfrac{1}{4}u \right) - \left( \tfrac{1}{16} \cdot 1^2 + \tfrac{1}{4} \cdot 1 \right) = \tfrac{3}{4}$
$\tfrac{1}{16}u^2 + \tfrac{1}{4}u - \tfrac{17}{16} = 0$
$u^2 + 4u - 17 = 0$
Using the quadratic formula, $u = \dfrac{-4 \pm \sqrt{84}}{2}$ ⇒ u = -6.58 or 2.58
Hence the upper quartile of X is u = 2.58 (since X is defined on the interval 1 ≤ x ≤ 3).
The interquartile range = 2.58 - 1.61 = 0.97
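The median and quartiles in this example all come from solving F(q) = p for different values of p. A Python sketch of that idea follows; the function names are illustrative, and the formula in `quantile` is the positive root of q² + 4q - 5 - 16p = 0 for this particular pdf only.

```python
from math import sqrt

def cdf(x):
    """F(x) = (x^2 + 4x - 5) / 16 for 1 <= x <= 3."""
    return (x * x + 4 * x - 5) / 16

def quantile(p):
    """Solve F(q) = p, keeping the root that lies between 1 and 3."""
    return -2 + sqrt(9 + 16 * p)

median = quantile(0.5)
lower, upper = quantile(0.25), quantile(0.75)
print(round(median, 2), round(lower, 2), round(upper, 2))
print(round(upper - lower, 3))
```

Note that subtracting the unrounded quartiles gives an interquartile range slightly different from the one found by subtracting the values already rounded to 2 decimal places.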
Now try Exercise 15 - The Cumulative Distribution Function followed by
Exercise 16 - Continuous Probability Distributions (Miscellaneous Examples).
CORRELATION & LINEAR REGRESSION
Correlation
When starting to work with bivariate data i.e. data involving two variables, it is always
best to draw a scattergraph. The resulting scattergraph should give some indication of the
presence of a linear relationship (in this course, we will be concerned only with linear
relationships) between the variables and how strong this linear relationship might be. If
two variables are related in this way, they are said to have a linear correlation.
Consider the following examples.
[Figure: four scattergraphs.
Diagram 1 - strong, positive linear correlation
Diagram 2 - moderate, negative linear correlation
Diagram 3 - zero linear correlation
Diagram 4 - zero linear correlation]
The first diagram illustrates a strong positive linear correlation where both variables
increase together. The second diagram illustrates a moderate negative linear correlation
where one variable decreases as the other increases. In the third diagram, as one variable
increases, there appears to be no clear pattern as to how the other variable behaves - this is
an example of a zero linear correlation. The fourth diagram is also an example of zero
linear correlation although there appears to be some non-linear (possibly quadratic)
relationship between the variables.
In mathematical terms, the strength of the association between the two variables is
measured using a correlation coefficient. The most commonly used correlation
coefficient is Pearson's Product Moment Correlation Coefficient, r, and is defined as
follows:
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$ or $\dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$.
As the above form of the correlation coefficient can be difficult to calculate, we convert it
to the more useful version shown below:
$r = \dfrac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left( \sum x^2 - \dfrac{(\sum x)^2}{n} \right) \left( \sum y^2 - \dfrac{(\sum y)^2}{n} \right)}}$
(Most graphic calculators will calculate the correlation coefficient.)
The correlation coefficient, r, has the following properties:
• -1 ≤ r ≤ 1
• r > 0 positive correlation (Sxy positive)
  r < 0 negative correlation (Sxy negative)
  (Note that Sxx and Syy will always be positive)
• r = 1 perfect positive correlation
  r = -1 perfect negative correlation
  r = 0 zero correlation
NB
(a) Care needs to be taken when interpreting a correlation coefficient. For instance, a high
level of correlation between variables A and B does not imply that A causes B or that
B causes A. It may well be that a third variable C causes both A and B. Alternatively,
the relationship between the variables may be coincidental - this is said to be an
example of spurious correlation.
(b) Any outliers within the data set can have a major effect on the value of the correlation
coefficient. If it can be established that these outliers are incorrectly recorded data
points then they may be removed from the data set and omitted from subsequent
calculations.
(c) Scattergraphs should be closely scrutinised before a correlation coefficient is
calculated. Take care that a single correlation coefficient has not been calculated for
data which are clearly separated into two or more distinct groups. Calculation of a
single correlation coefficient would be inappropriate in such circumstances.
Similarly, care must also be taken with data which appear as a single data set but,
after more careful scrutiny, can be separated into more than one distinct group. For
example, different relationships may exist for males and females but these
relationships may go undetected if the data are analysed as a single data set.
Example
Student           A     B     C     D     E     F     G     H     I     J
IQ (x)            112   106   127   102   134   128   98    109   115   123
maths score (y)   53    62    75    41    70    68    47    76    63    71
(a) Plot a scattergraph for the above data.
(b) Calculate the correlation coefficient and comment on the relationship between x and y.
Answer
(a)
[Scattergraph: maths score (vertical axis, 0 to 80) against IQ (horizontal axis, 95 to 135).]
(b)
         IQ     maths score
         x      y             x²       y²      xy
         112    53            12544    2809    5936
         106    62            11236    3844    6572
         127    75            16129    5625    9525
         102    41            10404    1681    4182
         134    70            17956    4900    9380
         128    68            16384    4624    8704
         98     47            9604     2209    4606
         109    76            11881    5776    8284
         115    63            13225    3969    7245
         123    71            15129    5041    8733
Totals   1154   626           134492   40478   73167
The summary statistics are:
n = 10, ∑x = 1154, ∑y = 626, ∑x² = 134492, ∑y² = 40478, ∑xy = 73167
NB
The summary statistics may be given in an examination.
$S_{xx} = \sum x^2 - \dfrac{(\sum x)^2}{n} = 134492 - \dfrac{1154^2}{10} = 1320.4$
$S_{yy} = \sum y^2 - \dfrac{(\sum y)^2}{n} = 40478 - \dfrac{626^2}{10} = 1290.4$
$S_{xy} = \sum xy - \dfrac{(\sum x)(\sum y)}{n} = 73167 - \dfrac{1154 \times 626}{10} = 926.6$
$r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \dfrac{926.6}{\sqrt{1320.4 \times 1290.4}} = 0.710$.
This represents a moderately strong positive correlation.
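The calculation of r from the raw data can be reproduced with a short Python sketch. The variable names are illustrative; the formulae are the ones for Sxx, Syy and Sxy used in the answer above.

```python
from math import sqrt

iq = [112, 106, 127, 102, 134, 128, 98, 109, 115, 123]
maths = [53, 62, 75, 41, 70, 68, 47, 76, 63, 71]
n = len(iq)

# Sxx, Syy and Sxy from the sums of squares and products.
s_xx = sum(x * x for x in iq) - sum(iq) ** 2 / n
s_yy = sum(y * y for y in maths) - sum(maths) ** 2 / n
s_xy = sum(x * y for x, y in zip(iq, maths)) - sum(iq) * sum(maths) / n

r = s_xy / sqrt(s_xx * s_yy)
print(round(s_xx, 1), round(s_yy, 1), round(s_xy, 1), round(r, 3))
```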
Now try Exercise 17 - Correlation.
Linear Regression
When a scattergraph suggests the presence of a linear correlation , it is useful to know the
equation of the best fitting straight line. An attempt could be made to draw this best fitting
line by eye and y = mx + c or y - b = m(x - a) used to determine its equation. Drawing a
line by eye, however, can be an unreliable method, particularly if the data are reasonably
well scattered. We now consider a method which will produce the equation of the best
fitting straight line - the method of least squares.
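[Figure: scattergraph of the points (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn) with a straight line drawn through them and the residuals shown as the vertical distances between each point and the line.]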
For a set of bivariate data (x1, y1), (x2, y2), (x3, y3),…, (xn, yn), the equation of the best
fitting line is of the form y = α + βx. The difference between a predicted y-value (using
the equation) and its actual y-value is given by εi = (α + βxi) – yi. These εi are called
residuals (or errors). To obtain the best fitting straight line the εi must be reduced as much
as possible. Since the εi can be positive, negative or zero, we square them and proceed to
find values of α and β which minimise ∑ εi².
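Let $Z = \sum_{i=1}^{n} (\alpha + \beta x_i - y_i)^2$.
Since Z has to be a minimum with respect to both α and β, we partially differentiate as follows:
Treating β as a constant:
$\dfrac{\partial Z}{\partial \alpha} = 2 \sum_{i=1}^{n} (\alpha + \beta x_i - y_i) = 0$ when Z is a minimum
$\sum_{i=1}^{n} \alpha + \beta \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i = 0$
$n\alpha + \beta n \bar{x} - n \bar{y} = 0$
$\alpha + \beta \bar{x} = \bar{y}$ ..... equation (1)
NB This tells us that the point $(\bar{x}, \bar{y})$ always lies on the best fitting line.
Treating α as a constant:
$\dfrac{\partial Z}{\partial \beta} = 2 \sum_{i=1}^{n} (\alpha + \beta x_i - y_i)\,x_i = 0$ when Z is a minimum
$\alpha \sum_{i=1}^{n} x_i + \beta \sum_{i=1}^{n} x_i^2 - \sum_{i=1}^{n} x_i y_i = 0$
Dividing through by n gives $\alpha \bar{x} + \beta\,\dfrac{1}{n} \sum_{i=1}^{n} x_i^2 = \dfrac{1}{n} \sum_{i=1}^{n} x_i y_i$ ..... equation (2)
$\bar{x}$ × equation (1): $\alpha \bar{x} + \beta \bar{x}^2 = \bar{x}\,\bar{y}$
equation (2): $\alpha \bar{x} + \beta\,\dfrac{1}{n} \sum_{i=1}^{n} x_i^2 = \dfrac{1}{n} \sum_{i=1}^{n} x_i y_i$
Subtracting gives $\beta \left( \dfrac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2 \right) = \dfrac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}$
Multiplying through by n gives $\beta S_{xx} = S_{xy}$
$\beta = \dfrac{S_{xy}}{S_{xx}}$.
Substituting for β in equation (1) gives $\alpha = \bar{y} - \beta \bar{x}$.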
NB The above proof is beyond the scope of this course.
The equation of the least squares regression line of y on x is given by
y = α + βx
Assuming that we only have a random sample, the values of α and β have to be estimated
as follows:
$\hat{\beta} = b = \dfrac{S_{xy}}{S_{xx}} = \dfrac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$ ,
$\hat{\alpha} = a = \bar{y} - b\,\bar{x}$ .
In line with current technology it is often the practice to use a for α̂ and b for β̂ so that
y = a + bx is our estimate of y = α + βx
NB
(a) a and b are calculated from samples and so are estimates of the population parameters
α and β.
(b) Most graphic calculators will calculate estimates a and b.
(c) Care needs to be taken when using the equation of the linear regression line for
prediction purposes. Interpolation, prediction within the range of the data, is
generally reliable, if the correlation is high. However, extrapolation, prediction
outwith the range of data, should be avoided as an unjustified assumption is being
made that the linear relationship extends outwith the range of data.
(d) Any outliers within a data set can have a considerable effect on the equation of the
regression line. If it can be established that the outliers are incorrectly recorded data
points then they may be removed from the data set and omitted from subsequent
calculations.
(e) Scattergraphs should be closely scrutinised before the equation of a regression line is
calculated. Take care that a single regression line is not being used to represent data
which are clearly separated into two or more distinct groups. Calculation of a single
regression equation would be inappropriate in such circumstances. For example, in
data involving both males and females, separate regression lines, one for each of the
sexes, may provide more reliable predictions.
Example
Student           A     B     C     D     E     F     G     H     I     J
IQ (x)            112   106   127   102   134   128   98    109   115   123
maths score (y)   53    62    75    41    70    68    47    76    63    71
(a) Find the least squares regression line of y on x and draw it on a scattergraph.
(b) Predict the maths score of a student with an IQ of 100.
Answer
(a) From the correlation example above we have the following summary statistics:
n = 10, ∑x = 1154, ∑y = 626, ∑x² = 134492, ∑y² = 40478, ∑xy = 73167.
$b = \dfrac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \dfrac{73167 - \dfrac{1154 \times 626}{10}}{134492 - \dfrac{1154^2}{10}} = \dfrac{926.6}{1320.4} = 0.702$ ,
$a = \bar{y} - b\,\bar{x} = \dfrac{626}{10} - \dfrac{926.6}{1320.4} \times \dfrac{1154}{10} = -18.383$ .
The equation of the regression line of y on x is y = -18.383 + 0.702x.
[Scattergraph: maths score (vertical axis, 0 to 80) against IQ (horizontal axis, 95 to 135), with the regression line drawn.]
(b) A student with an IQ of 100 ⇒ x = 100.
Prediction for his/her maths score is y = -18.383 + 0.702 × 100
= 51.817
≈ 52%
This should be a reliable prediction as we have been interpolating within the range of the
data and the correlation is moderately high.
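The regression coefficients and the prediction can be reproduced with a short Python sketch. The names `predict`, `a` and `b` are illustrative; the code works with unrounded values of a and b, so its prediction differs slightly in the decimal places from the hand calculation but rounds to the same whole number.

```python
iq = [112, 106, 127, 102, 134, 128, 98, 109, 115, 123]
maths = [53, 62, 75, 41, 70, 68, 47, 76, 63, 71]
n = len(iq)

s_xx = sum(x * x for x in iq) - sum(iq) ** 2 / n
s_xy = sum(x * y for x, y in zip(iq, maths)) - sum(iq) * sum(maths) / n

b = s_xy / s_xx                       # estimate of the gradient
a = sum(maths) / n - b * sum(iq) / n  # estimate of the intercept

def predict(x):
    """Predicted maths score from the regression line y = a + bx."""
    return a + b * x

print(round(a, 3), round(b, 3), round(predict(100)))
```

The prediction should only be trusted for IQ values inside the range of the data (98 to 134 here), in line with the remarks on interpolation and extrapolation.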
Now try Exercise 18 - Linear Regression.
STUDENT EXERCISES – PREVIOUS KNOWLEDGE
Exercise 1 - Average/ Variability
1. Two sets of workers earn the following weekly wages.
Set A   96   98   100   102   105   108   98
Set B   80   90   100   105   102   110   120
Determine the mean and interquartile range for both groups.
Comment on your findings.
2. The scores awarded by two judges, A and B, in an ice-skating competition were as
follows:
Judge A
Judge B
9.7
8.8
9.5
9.0
9.7
9.1
9.8
9.2
9.9
8.8
9.6
9.8
9.0
9.4
9.1
9.8
9.1
9.1
9.3
9.3
For each judge find the mean score and the semi-interquartile range.
How does the scoring of each judge compare?
3. To check the weight (in grams) of biscuits in a packet, a random sample of 10
packets was weighed. The contents were as follows:
198, 198, 200, 200, 201, 201, 201, 202, 202, 203
Find the mean weight and the standard deviation of the sample.
4. In a physics experiment, a student measured the electrical resistance of a piece of
wire. The same experiment is repeated 9 times. The results were as follows:
64.2
63.7
63.7
65.0
65.0
63.7
64.0
64.5
64.1
Calculate the mean and standard deviation of this sample.
5. The weights (kg) of five workers in a particular office block are given below.
69, 71, 73, 77, 82
(a) Calculate the mean and standard deviation of this sample.
(b) The office lift has a maximum safe load of 1000 kg. Suggest a safe limit for
the number of people in the lift.
6. For a random sample of five numbers Σx = 131 and Σx² = 3451. Find the mean
and standard deviation of this sample.
7. Given that for a certain random sample of five numbers Σx = 225 and
Σx² = 10165, find the mean and standard deviation of this sample.
Mathematics: Statistics (Higher) Students exercises
1
8. For a set of ten numbers Σx = 306 and Σx² = 9746. Find the mean and sample
standard deviation.
9. The mean of 5, 9, 10, y, 14 is 10. Find the standard deviation of this sample of
numbers.
10. The eight forwards in a rugby team weigh (kg)
95, 90, 100, 85, y, 82, 102, and 100.
If the mean is 93.125 kg what is the standard deviation of this sample of
forwards?
Exercise 2 - Exploratory Data Analysis (EDA)
1. The weights (in kilograms) of 11 members of a football team are given below.
59 65 68 68 75 72 81 79 75 75 72
(a) Draw a dot-plot of this data.
(b) What is the median and mode of this data?
2. The weights of 20 newly born babies are given below.
2.7
2.2
2.8
2.8
3.0
3.7
3.1
3.5
3.1
4.2
3.2
4.0
3.3
3.6
3.4
3.4
3.4
3.7
3.5
3.9
(a) Draw a dot plot of this data.
(b) Find the median and modal baby weight.
3. The times in seconds for 15 people to complete a jig-saw are:
64
53
76
51
48
83
53
67
68
64
56
55
74
45
60
(a) Draw a stem and leaf diagram using this data.
(b) Find the median time.
(c) What is the range of times?
4. A group of children were given a reading test when they were age 7 and another at
age 8. Their scores are given below.
Age 7
Age 8
376
332
341
369
350
332
388
385
326
298
350
356
304
323
361
397
328
337
383
404
310
366
335
415
326
314
392
370
328
315
374
422
342
290
426
400
294
311
381
399
(a) Draw a back to back stem and leaf diagram to illustrate this data.
(b) Find the median, mode and quartiles for each set of data.
(c) What conclusions, if any, can you draw?
5. The heart beat of 28 men and 28 women (at rest) was measured as part of an
experiment to measure the effect of exercise. The rates are given below.
Men
64
68
60
76
58
62
62
62
62
76
66
72
66
90
66
60
64
80
74
62
74
92
70
70
84
68
74
68
Women
61
66
80
86
64
84
82
76
94
62
76
82
60
66
87
88
72
78
90
72
58
68
78
86
88
72
68
67
(a) Draw a back to back stem and leaf diagram to illustrate this data.
(b) Find the median, mode and quartiles for each set of data.
(c) What conclusions, if any, can you draw about the heart beat of men and
women?
6. The reaction times of a class of students were measured. The results were as
follows (in tenths of a second).
         Lower Quartile   Median   Upper Quartile   Min.   Max.
Girls    8                10       15               6      19
Boys     7                10       13               4      16
(a) Draw two box plots to compare the reaction times of the boys and the girls.
(b) Comment on the relative reaction times of the boys and the girls.
7. A random sample of 30 people were asked to record, to the nearest mile, the
distances that they travelled by car in one week. The results were as follows:
41
69
88
63
61
38
89
49
39
41
54
85
37
59
60
67
61
61
69
70
80
57
61
45
78
55
64
84
63
72
(a) Construct a stem and leaf diagram to represent this data.
(b) From the stem and leaf diagram identify the median and the quartiles of this
data.
(c) Draw a box plot to represent the data.
8. The temperatures (°F) during the month of September were as follows:
96 81 81 77 77 79 79 73
73 73 73 72 72 72 70 70
70 70 66 68 70 68 69 68
56 61 63 64 64 64 63 61
(a) For this data draw a stem and leaf diagram.
(b) Hence obtain a box plot for this distribution.
(c) Use the box plot to identify any ‘outliers’, if any exist.
(d) Comment on your results.
9. A local Health Board compiled the following data relating to the length of stay of
patients in hospital. A random sample of 21 patients yielded the following data on
length of stay in days.
4, 3, 10, 4, 6, 13, 12, 15, 5, 18, 7, 7, 9, 3, 1, 6, 55, 23, 12, 1, 9
(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.
10. As part of an evaluation programme, a college gave a sample of students a
non-verbal reasoning test. The scores of 25 randomly selected students are given
below.
91 102 95 88 96
96 111 129 106 124
105 112 116 115 101
82 97 121 86 104
118 127 102 66 98
(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.
11. As part of an experiment in Chemistry, a class was asked to measure the time, in
seconds, taken for a particular reaction to be completed. The results were
51 35 18 45 85 27 43
62 31 97 20 16 22 18
51 23 57 34 49 35 22
(a) Draw the box plot and use it to identify any possible outliers.
(b) Comment on your results.
12. The data below gives the amount, in pounds, spent each day by 20 holiday makers.
43 67 68 39 61
93 65 58 62 72
60 71 49 51 51
63 47 62 70 52
(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.
13. The number of hours spent studying each week by 20 students is given below.
10 12 12 12 20
13 1 28 14 14
15 12 14 10 9
15 13 14 11 14
(a) Determine the interquartile range.
(b) Obtain the five number summary.
(c) Identify any possible outliers.
(d) Construct and interpret a box plot.
Exercise 3 - Interpreting an EDA
1. Researchers who wished to find out if there was a relationship between death and age
obtained data from fifteen western countries. Their results are summarised below in
two boxplots.
[Figure: two boxplots, labelled Age 20-25 and Age 60-65, drawn against a vertical scale from 5 to 30]
Each boxplot shows the percentage of the population in the fifteen countries who
die between the ages 20-25 and 60-65.
In what ways do the death rates differ between the two age groups?
2. In an attempt to compare the effectiveness of two different types of soil, identical
plants (azaleas) were grown and their heights measured after four weeks. The
resulting data was used to construct the boxplots below.
[Figure: two boxplots, labelled Soil A and Soil B, drawn against a vertical scale from 4 to 12, with one outlier marked *]
(a) From the evidence of the boxplots, is there any difference between the two
soils? Give reasons for your answer.
(b) Suggest an explanation for the outlier marked.
3. Twenty-five thirteen-year-old boys and girls took part in an experiment which
measured the time taken to complete a sorting task. Each student was
blindfolded and asked to place each of four shapes in its correct place in a box. The
time taken, in seconds, was recorded. The resulting data is presented below in a
back to back stem and leaf diagram.
                 Boys     |   | Girls
                        8 | 1 | 3 6 7
    9 9 9 8 6 6 5 3 2 2 1 | 2 | 0 1 2 2 3 5 6 7 7 7 8 8
        9 6 5 5 4 4 3 2 1 | 3 | 0 0 0 0 1 1 3 4 5 7
                      5 1 | 4 |
                      2 0 | 5 |
3 | 4 means 34 seconds
(a) Find the median and the lower and upper quartiles for each group.
(b) Is there any difference in the performance between the groups? Justify your
answer.
4. A major retail company records the number of sales of two different computers
over a period of twenty weeks. The sales of Type A and Type B are shown below
in a back to back stem and leaf diagram.
Type A
Type B
9
8
9
7
5
8
6
5
6
5
5
9
5
5
3
5
5
2
2
3
3
4
5
6
7
0
9
0
1
1
6
4
2
3
6
4
4
6
5
8
5
9
8
9
9
5 | 6 means 56
Comment on any difference in the sales of the two types of computer.
5. A pharmaceutical company compared the weights (grams) of two groups of mice
in an experiment into the effectiveness of a weight control drug. The weights of
Group A and Group B are shown below in a back to back stem and leaf diagram.
Group A
Group B
9
7
9
7
9
8
6
5
7
6
6
4
2
6
1
4
5
3
1
2
1
1
0
4
5
6
7
8
9
8
7
1
1
5
2
1
6
2
1
6
5
3
5
3
5
5
5
6
8
7
9
8
6 | 7 means 67 g
If Group A received the drug while Group B received a
placebo, comment on the effectiveness of this drug in
reducing weight.
PROBABILITY
Exercise 1 - Simple Probability
1. One card is drawn at random from a standard pack of 52 playing cards.
Find the probability of drawing
(a) the two of diamonds
(b) a Jack
(c) a black card
(d) a face card.
2. A fair die, numbered 1 to 6, is thrown once. Find the probability of obtaining
(a) a three
(b) an odd number
(c) a number greater than two
(d) a one or a four.
3. Unbiased 50p and 20p coins are tossed at the same time.
Find the probability of obtaining
(a) two tails
(b) a head and a tail.
4. A box contains 10 coloured pencils, 6 red and 4 blue.
(a) Find the probability of selecting at random
(i) a red pencil
(ii) a blue pencil.
(b) One red pencil is removed from the box.
Find the new probability of selecting at random
(i) a red pencil
(ii) a blue pencil.
5. One letter is selected at random from the word MISSISSIPPI.
Find the probability of selecting the letter(s)
(a) M
(b) P
(c) I
(d) M or S.
6. A game has a regular pentagonal spinner with faces numbered from 1 to 5. When
the spinner is spun, what is the probability of obtaining
(a) a number 5
(b) a number 1
(c) an even number
(d) a number less than 4
(e) a number greater than 4 ?
7. In a class of 32 children, 18 picked Art as their favourite school subject, 8 picked
Science and the rest picked PE. What is the probability that a child, chosen at
random from the class, picked PE as their favourite subject ?
8. In a game of Scrabble, I have the following eight letters on my rack:
A   E   O   O   D   P   S   V
As only seven letters are required, one of the other players removes one letter
without looking at it .
What is the probability that she takes:
(a) the A
(b) an O
(c) a vowel
(d) not a D
(e) P , S or V ?
9. Ahmed has a bag of 20 marbles. 7 of the marbles are red. He selects a marble at
random from the bag.
What is the probability that
(a) he gets a red marble?
(b) he gets a marble which is not red?
10. The distribution of pupils by age in a secondary school at the start of a new session
is given below.
Age in years   11    12    13    14    15    16    17    18
Frequency      73    116   123   128   106   99    90    45
A pupil is chosen at random from the school roll.
Find the probability that the pupil is
(a) 14 years old
(b) less than 16 years old .
11. On a particular day the output of eggs on two poultry farms was compared. Eggs
were graded as large, standard or small.
            Farm A   Farm B
Large         81      129
Standard     243      215
Small        126       86
An egg is chosen at random from the entire output.
Find the probability that the egg will be
(a) graded small
(b) from Farm A
(c) from Farm B and graded standard.
Exercise 2 - Sample Spaces & Further Simple Probability
1. Ruth goes for a meal with her friends to the local burger bar. To drink she can
have either juice (J), milk (M) or tea (T). For eating she can have a chickenburger (C),
a beefburger (B) or a veggieburger (V).
Make a list of all the possible outcomes for her choice of meal.
2. A shop sells three flavours of ice cream , vanilla (V) , strawberry (S) and chocolate
(C). Cones are sold with a topping of raspberry (R) or mint sauce (M). Cones are
also available without a topping (W).
Make a list of all possible types of ice cream cone .
3. Three girls, Ann, Mary and Linda, decide to have a swimming competition. They
will do the crawl and then the backstroke.
If Ann wins the crawl and Mary wins the backstroke, record this outcome as AM.
(a) In a similar way, make a list of all the 9 possible outcomes.
(b) If only Ann and Mary take part in the competition there will be fewer possible
outcomes. List the outcomes in this case.
(c) If Judith also takes part in the competition, list all the possible outcomes for
the four competitors.
4. Packets of crisps contain free football team pictures. There are five different
pictures: Arsenal (A), Manchester United (M), Liverpool (L), Blackburn Rovers
(B) and Newcastle (N).
A mother buys two packets of crisps for her children. List all the possible
combinations of the cards when the packets are opened.
5. At a school Prizegiving three different sorts of prizes are given out. One is a Book
Token (B), one is a CD voucher (C) and the other a general gift voucher (G).
List all the possible outcomes for a girl who wins
(a) two prizes
(b) three prizes.
6. A fair die is rolled twice. List the sample space for this situation.
Find the probability of obtaining:
(a) a total of 6 from the two throws
(b) a total of 10 from the two throws
(c) a total between 7 and 11 inclusive from the two throws
(d) a number on the second throw which is three times the number on the first
throw
(e) a number on the first throw which is double the number on the second throw .
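A sample space for two throws of a die can also be listed by machine, which is a useful check on a hand-drawn table. The following is a minimal sketch, not a prescribed method. The names `sample_space` and `probability` are hypothetical, and only part (a) is evaluated.

```python
from fractions import Fraction
from itertools import product

# Sample space for two throws of a fair die: 36 equally likely ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event):
    """P(event) = number of favourable outcomes / number of outcomes, as an exact fraction."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Part (a): a total of 6 from the two throws.
p_total_6 = probability(lambda o: o[0] + o[1] == 6)
print(p_total_6)
```

Each remaining part needs only a different condition passed to `probability`, for example `lambda o: o[1] == 3 * o[0]` for part (d).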
7. A (regular) pentagonal spinner, numbered 1 to 5, is spun and a fair die is thrown at
the same time.
List the sample space for this situation.
Find the probability of obtaining:
(a) a total of 4
(b) a total less than 7
(c) a total greater than 10
(d) the same number on the die and the spinner
(e) a win, if a win occurs when the number on the spinner is greater than or equal
to the number on the die.
8. Three unbiased coins are tossed at the same time. List the sample space for this
situation.
Find the probability of obtaining:
(a) three heads
(b) no heads
(c) one head and two tails
(d) at least one tail.
9. Four unbiased coins are tossed at the same time. List the sample space for this
situation.
Find the probability of obtaining:
(a) two heads and two tails
(b) four heads
(c) at least one head
(d) one head and three tails.
10. (a) How many members of the sample space are there when five unbiased coins
are tossed together ?
(b) What is the probability of tossing five tails ?
Exercise 3 - Mutually Exclusive & Exhaustive Events
1. When a card is selected from a pack of cards , the following outcomes can be used
to describe the result.
P : A red card is obtained
Q : A black card is obtained
R : A diamond is obtained
S : A club is obtained
(a) Write down pairs of outcomes which are mutually exclusive.
(b) Write down pairs of outcomes which are not mutually exclusive.
2. Which of the following pairs of events are mutually exclusive and/or exhaustive:
(a) X : obtaining an even number on the roll of a die
    Y : obtaining an odd number on the roll of a die
(b) X : selecting a heart from a pack of playing cards
    Y : selecting a queen from a pack of playing cards
(c) X : winning a football match
    Y : losing a football match
(d) X : a dog has two pups - both are black
    Y : a dog has two pups - one is female
(e) X : it rains tomorrow
    Y : it is sunny tomorrow
3. When Annie, Fatima and Kate play a computer game, the probability that Annie
wins is 1/2 and the probability that Fatima wins is 3/8. What is the probability that
they lose ?
4. When Motherwell football team play in a league match the probability that they
win is 0.3 and the probability that they draw is 0.4. What is the probability that
they lose ?
5. The sample space S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, A = {2, 4, 6, 8}, B = {6, 7,
8} and C = {1, 2}.
[Venn diagram showing the events A, B and C within the sample space S]
(a) Find P(A), P(B) and P(C).
(b) Find P(Ā), P(B̄) and P(C̄).
(c) Find P( A ∪ B ), P( A ∪ C ) and P( B ∪ C ).
(d) Which pair of events are mutually exclusive ?
(e) Are events A, B and C exhaustive ?
6. For events A and B, P(A) = 0.43 , P(B) = 0.18 and P(A or B) = 0.5.
(a) Are events A and B mutually exclusive ?
(b) Are events A and B exhaustive ?
7. An integer is chosen at random from the set of integers from 1 to 25 inclusive.
What is the probability that the integer is
(a) at least 5
(b) a prime number greater than 11 ?
8. A large bag of crisps contains the following flavours: 10 cheese and onion, 6 salt
and vinegar and 4 plain. A packet of crisps is selected from the bag at random.
Find the probability that the packet of crisps is:
(a) cheese and onion
(b) not plain
(c) either cheese and onion or plain
(d) either salt and vinegar or plain
(e) neither cheese and onion nor salt and vinegar .
9. A box contains 100 balls of different colours. The probability of obtaining a ball
of a particular colour is given in the table below.
Colour        Black   White   Red    Blue
Probability   0.33    0.41    0.19   0.07
What is the probability that a ball taken from the box is:
(a) black or white
(b) neither red nor blue ?
10. A spinner with unequal sectors is numbered 1, 2, 3, 4, 5. The probability of
obtaining each number is shown in the table below:
Number        1      2      3      4      5
Probability   0.32   0.05   0.2    0.15   0.28
(a) The spinner is spun once. What is the probability of getting
i) either of the numbers 2 or 4
ii) a number less than 4 ?
(b) The spinner is spun 200 times. Approximately how many times would you
expect to get
i) the number 3
ii) a number greater than 3 ?
11. Two dice are thrown together. Find the probability of getting
(a) a total of 10 or 11
(b) a total less than 6.
12. Two dice are thrown together and the difference between the two numbers is
recorded. So if the dice show a '2' and a '6', the difference recorded would be 4.
Find the probability of obtaining a difference of 2 or 4.
13. In an opinion poll, people were asked to state the party for which they were going
to vote. The probability that people vote Labour or Liberal Democrat is 0.6. The
probability that people vote SNP is 0.3.
(a) What is the probability that people vote for another party ?
(b) The probability of voting Labour is twice the probability of voting Liberal
Democrat. What is the probability that people vote SNP or Liberal Democrat ?
14. Every day Mrs Scott buys only one of the following newspapers: The Daily
Herald, The Moon or The Daily Post.
The probability of her buying The Daily Herald or The Moon is 5/6.
The probability of her buying The Moon or The Daily Post is 1/2.
Find the probability of her buying each newspaper.
Exercise 4 - Independent Events
1. For each pair of events X and Y listed below, decide whether or not it is likely that
the events are independent.
(a) X : The sun is shining today.
Y : The sun will be shining tomorrow.
(b) X : It rains on Saturday this week.
Y : It rains on Saturday next week.
(c) A baby is born.
X : The baby has blue eyes.
Y : The baby's mother has blue eyes.
(d) A die is rolled and a coin is tossed.
X : The die produces a 4.
Y : The coin produces a tail.
(e) Alice and Karen are sisters.
X : Alice catches the cold.
Y : Karen catches the cold.
(f) X : David is good at Art.
Y : David is good at Physics.
2. (a) For events A and B, P(A) = 0.3 and P(B) = 0.7. If A and B are independent
find P( A ∩ B ).
(b) For two events A and B, P(A) = 0.4 and P( A ∩ B ) = 0.3. Given that A and B
are independent, find P(B).
3. Two fair dice, each numbered 1 to 6, are rolled. Events A, B and C are defined as
follows:
A : The first die scores 1
B : The second die has an even score
C : Their total is 3
Which of the above events are independent ?
4. A coin is tossed and a die is thrown. A is the event that a tail is obtained on the
coin and B is the event that an even number is thrown on the die.
Write down the values of P(A), P(B) and P( A ∩ B ).
5. A card is drawn from a pack of playing cards and a die is thrown. Events X and Y
are as follows:
X - an ace is drawn from the pack
Y - a prime number is thrown on the die .
Write down the values of P(X), P(Y) and P( X ∩ Y ).
6. A die is thrown twice. Find the probability that
(a) two even numbers are obtained
(b) the same two numbers are obtained .
7. A pentagonal spinner has five equal sections; three sections are coloured white and
two sections are coloured black. B is the event that the spinner points to Black
and W is the event that the spinner points to White.
(a) Find the following probabilities :
i) P(B)
ii) P(W)
(b) The spinner is spun twice. Using your answers to (a) find the probabilities of
the following outcomes:
i) Black is obtained both times.
ii) A different colour is obtained on each spin.
iii) The same colour is obtained on each spin.
8. A box contains 8 blue cubes and 2 red cubes. A cube is taken out and replaced. B
is the event that a Blue cube is selected. R is the event that a Red cube is selected.
(a) Find the following probabilities
i) P(B)
ii) P(R)
(b) If two cubes are taken in turn, use your answers to (a) to find the probabilities
of the following outcomes:
i) they are both blue
ii) they are different colours
iii) they are the same colour.
9. In a school 5% of the children have red hair and 25% wear glasses. If a child is
selected at random, what are the probabilities that they:
(a) have red hair and wear glasses
(b) have red hair but do not wear glasses.
10. Five questions in an examination are multiple choice in format with five possible
answers for each question. What is the probability of guessing the correct answers
to all five questions ?
Exercise 5 - Tree Diagrams (With Replacement)
1. Ben and Julie are playing a game. Before they can start they must select a red card
from a standard pack of playing cards. After a card is selected it is returned to the
pack and the pack shuffled before the next player selects their card. Ben starts
first.
(a) Copy and complete the tree diagram by adding the appropriate probabilities to
each branch.
(b) Calculate the probability of each outcome shown on the tree diagram.
[Tree diagram: first branches "Ben's turn" (Red, Black), second branches "Julie's turn" (Red, Black); outcomes: Red and Red, Red and Black, Black and Red, Black and Black]
(c) Find the probability that:
i) both children start the game on their first selection
ii) only one of them starts the game on their first selection
iii) neither of them starts the game on their first selection.
2. A bag contains 10 marbles, 7 are red and 3 blue. A marble is selected, and then
replaced. A second marble is then selected.
(a) Copy and complete the tree diagram by adding the appropriate probabilities to
each branch.
(b) Calculate the probability of each outcome shown on the tree diagram.
[Tree diagram: first branches "First marble" (Red, Blue), second branches "Second marble" (Red, Blue); outcomes: Red and Red, Red and Blue, Blue and Red, Blue and Blue]
(c) Find the probability of the following:
i) both marbles are blue
ii) both marbles are red.
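The multiplication along the branches of a tree diagram can be mirrored in a few lines of code. The sketch below assumes the bag in question 2 (7 red and 3 blue marbles, drawn with replacement). The variable names are illustrative and not part of the original exercise.

```python
from fractions import Fraction
from itertools import product

# Branch probabilities for a single draw from 7 red and 3 blue marbles.
single_draw = {"Red": Fraction(7, 10), "Blue": Fraction(3, 10)}

# With replacement the second draw has the same branch probabilities,
# so the probability of each path is the product along its two branches.
outcomes = {
    (first, second): single_draw[first] * single_draw[second]
    for first, second in product(single_draw, repeat=2)
}

for path, p in outcomes.items():
    print(path, p)
```

The four path probabilities should add up to 1, which is a quick check on any completed tree diagram.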
3. To pass your driving test you must pass a theory test and a practical driving test.
The probability of passing the theory test is 0.85 and the probability of passing the
practical test is 0.65.
(a) Copy and complete the tree diagram below.
[Tree diagram: first branches "Theory" (Pass, Fail), second branches "Practical" (Pass, Fail); outcomes: Pass and Pass, Pass and Fail, Fail and Pass, Fail and Fail]
(b) What is the probability that someone:
i) passes both tests
ii) fails both tests
iii) passes only one of the tests ?
4. A bag contains 8 red balls and 4 green balls. A ball is drawn and then replaced
before a second ball is drawn. Draw a tree diagram to show all the possible
outcomes.
Find the probability that:
(a) two green balls are drawn
(b) the first ball is red and the second is green.
5. Draw a tree diagram to show the possible outcomes when two coins are tossed.
Include the probabilities on your tree diagram.
Find the probability of obtaining:
(a) two heads
(b) no heads
(c) only one head .
6. The probability of a footballer being injured before a match is 0.2. Brian has two
important games in one week.
Find the probability that he is able to play
(a) in both games
(b) in only one game
(c) in neither of the games.
7. Tariq and Ahmed play two sets of tennis together. The probability that Tariq wins
a set is 0.55.
Find the probability that:
(a) Tariq wins two sets
(b) Ahmed wins two sets
(c) they win one set each.
8. (a) Draw a tree diagram to show the possible outcomes when a coin is tossed three
times.
(b) Find the probability of obtaining :
i) 3 tails
ii) at least 2 heads
iii) exactly one head.
9. A fair die is thrown three times. What is the probability of throwing:
(a) 3 sixes
(b) exactly two sixes
(c) at least two sixes ?
10. When a seed is planted the probability that it will grow is 0.6. Three seeds are
planted.
Find the probability that
(a) all three grow
(b) one of them grows.
11. A bag contains 7 black and 3 white marbles. A marble is drawn at random and
then replaced. Two further draws are made, again with replacement. Find the
probability of drawing:
(a) three black marbles
(b) two white marbles
(c) at least two black marbles
(d) at least one white marble .
12. A card is drawn at random from a pack of 52 playing cards. The card is replaced
and a second card is drawn. This card is replaced and a third card is drawn.
What is the probability of drawing :
(a) three spades
(b) at least two spades
(c) exactly one spade ?
Exercise 6 - Tree Diagrams (Without Replacement)
1. A bag contains 4 red discs and 4 green discs. Two discs are taken out of the bag.
Draw a tree diagram to illustrate the above probabilities.
(a) Calculate the probability that both discs are green.
(b) Calculate the probability that the discs are different colours.
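Without replacement, the second set of branches depends on the first draw. The following sketch shows one way to check such a tree, using the bag in question 1 (4 red and 4 green discs). The function `two_draw_paths` is a hypothetical helper, not something defined in these materials.

```python
from fractions import Fraction

def two_draw_paths(counts):
    """Path probabilities for two draws without replacement.

    counts maps each colour to the number of discs of that colour.
    """
    total = sum(counts.values())
    paths = {}
    for first, n_first in counts.items():
        p_first = Fraction(n_first, total)
        for second, n_second in counts.items():
            # One disc of the first colour has already left the bag.
            remaining = n_second - (1 if second == first else 0)
            paths[(first, second)] = p_first * Fraction(remaining, total - 1)
    return paths

paths = two_draw_paths({"Red": 4, "Green": 4})
p_both_green = paths[("Green", "Green")]
p_different = paths[("Red", "Green")] + paths[("Green", "Red")]
print(p_both_green, p_different)
```

The same helper accepts three or more colours, so it can also be used to check questions such as the blue, green and yellow marbles later in this exercise.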
2. Bill's hobby is walking. If it is a sunny day, the probability that he goes for a walk
is 0.95. If the day is not sunny, the probability that he goes for a walk is 0.75. The
probability that tomorrow will be a sunny day is 0.7.
(a) Draw a tree diagram to illustrate these probabilities.
(b) Calculate the probability that Bill will go for a walk tomorrow.
3. Jack and Arnold play three rounds of golf. The probability that Jack wins the first
round is 0.5. If Jack wins a round, the probability of his winning the next round is
0.8. If Arnold wins a round, the probability of his winning the next round is 0.7.
(a) Draw a tree diagram to illustrate the above probabilities.
(b) Calculate the probability that Arnold wins all three rounds.
(c) Calculate the probability that Jack wins less than two rounds.
4. The probability that Lee will pass his driving test at the first attempt is 0.45. If he
fails his first test then the probability that he will pass the test on any subsequent
attempt is 0.8.
(a) Draw a tree diagram to illustrate these probabilities.
(b) Calculate the probability that Lee passes on his second attempt.
(c) Calculate the probability that he passes the test after three attempts .
5. A card is drawn at random from a standard pack of 52 playing cards. It is not
replaced. A second card is then drawn from the same pack.
Find the probability that
(a) both cards are hearts
(b) only one card is a spade
(c) both cards are aces
(d) neither card is a jack .
6. The probability that a tennis player wins her next match is 0.3 if she won her
previous match, but 0.8 if she lost her previous match. Find the probability that, if
a match is lost, the next two will be won.
7. A bag contains 10 pairs of socks - 5 blue pairs and 5 green pairs. Two socks are
taken from the bag at random. What is the probability that a colour match is
obtained ?
8. A box contains 4 blue marbles, 3 green marbles and 3 yellow marbles. Two
marbles are chosen at random from the box. What is the probability that:
(a) both marbles are green
(b) both marbles are the same colour
(c) both marbles are different colours ?
9. There are 5 boys and 10 girls in a Higher Statistics class. Two pupils are chosen at
random.
What is the probability that:
(a) both are boys
(b) both are girls
(c) one is a boy and one is a girl ?
10. In a box of 200 electrical components 5 are known to be defective. Two
components are chosen at random. What is the probability that:
(a) both are defective
(b) neither are defective
(c) just one is defective ?
11. As part of a card trick a magician asks a member of his audience to select two
cards at random from a pack of 32 cards. His pack consists of 8 black cards, 8
white cards, 8 blue cards and 8 green cards. Find the probability that the audience
member selects 2 cards of the same colour.
12. In a batch of 500 packets of cereal 50 are known to be underweight. Three packets
of cereal are chosen at random. What is the probability that:
(a) all three are underweight
(b) none are underweight ?
13. A box of chocolates contains 8 soft centres, 5 hard centres and 7 nutty centres. 3
chocolates are chosen at random. Find the probability that:
(a) no hard centres were chosen
(b) the 3 chocolates had the same centres
(c) one of each kind was chosen.
14. A bag contains 3 black cubes, 4 yellow cubes and 3 white cubes. Three cubes are
chosen without replacement. Find the probability that the three cubes chosen are:
(a) all yellow
(b) all white
(c) one of each colour
If this selection process was repeated 6000 times, how often would you expect to
choose three black cubes ?
15. There are 6 boys and 9 girls in a class. Three children are chosen at random. What
is the probability that:
(a) all three are boys
(b) all three are girls
(c) one is a girl and two are boys ?
16. On a snooker table pocket X contains 3 colours and 6 reds. Pocket Y contains 2
colours and 4 reds. A ball is taken at random from pocket X and placed in pocket
Y. A ball is then chosen from pocket Y. What is the probability that the ball
taken from Y is red ?
17. A student has only managed to revise 50% of the topics for a multiple choice
exam. If a question appears on a topic she has revised she will get that question
right. If she has not revised the topic then she will make a guess at one of the five
possible answers. The paper has 40 questions.
What mark do you expect her to get ?
18. A box of assorted chocolates contains p dark chocolates and q white chocolates.
Two chocolates are selected at random. Find, in terms of p and q, the probability
of choosing:
(a) two dark chocolates
(b) two white chocolates
(c) one of each sort.
Exercise 7 - Combinations
1. Evaluate
(a) $^{7}C_{3}$   (b) $^{6}C_{4}$   (c) $^{8}C_{1}$   (d) $^{5}C_{0}$
2. Verify that
$\binom{n}{n-r} = \binom{n}{r}$
for the cases (a) n = 9 , r = 3
(b) n = 7 , r = 4 .
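Hand calculations for questions 1 and 2 can be checked against Python's standard library, whose `math.comb(n, r)` function (available from Python 3.8) returns the number of ways of choosing r items from n. The helper `symmetric` below is a hypothetical name introduced only for this check.

```python
from math import comb

# Question 1: evaluate each combination.
for n, r in [(7, 3), (6, 4), (8, 1), (5, 0)]:
    print(f"{n}C{r} =", comb(n, r))

def symmetric(n, r):
    """Check that choosing r from n gives the same count as choosing n - r from n."""
    return comb(n, n - r) == comb(n, r)

# Question 2: the two cases to be verified.
print(symmetric(9, 3), symmetric(7, 4))
```

The check confirms the two cases only. It does not replace the written verification using the factorial formula.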
3. A shop stocks 8 different kinds of cereal. In how many ways can 3 packets of
cereal, each of a different variety, be chosen ?
4. How many different combinations of five letters can be chosen from the letters A,
B, C, D, E, F, G, H if each letter is chosen only once ?
5. In how many ways can
(a) 3 stamps be chosen from a book of 10 different stamps
(b) a team of 14 players be selected from a pool of 17 footballers
(c) 6 representatives be chosen from 30 students
(d) a hand of 5 cards be dealt from a standard pack of 52 cards ?
6. Find the number of different combinations of two letters which can be made from
the letters of the word INTEGRAL. How many of these selections do not contain
a vowel ?
7. How many different hands of seven cards can be dealt from a suit of thirteen
cards? If one of the cards dealt is the ace, how many different hands of seven
cards are there ?
8. A team of five children is to be selected from a class of thirty children to compete
in an inter-school quiz competition. In how many ways can the team be chosen if
(a) any five children can be chosen
(b) the five chosen must include the oldest in the class ?
9. A debating team of 4 players is to be selected from 12 pupils. In how many ways
can the team be chosen if
(a) the best debater is to be included
(b) the best debater and the oldest pupil are to be included ?
10. A shop stocks eight different kinds of chocolate biscuits. In how many ways can a
shopper buy four packets of chocolate biscuits if
(a) each packet is a different kind
(b) two packets are the same kind ?
11. A large box of chocolates contains nine different varieties. In how many ways can
four chocolates be chosen if
(a) all four are different varieties
(b) two are the same and the others different
(c) three are the same - and the fourth is different ?
12. A committee of 8 is to be formed from 12 men and 8 women. In how many ways
can the committee be selected given that
(a) it must consist of 5 men and 3 women
(b) it must have at least one member of each sex ?
Exercise 8 - Combinations (Probability)
1. There are only 3 girls in a group of 8 pupils. A group of 5 pupils is to be selected.
Find the probability that all three girls in the group are selected.
2. A bag contains four black discs and one white disc. If two discs are removed at
random, what is the probability that the white disc is not removed ?
3. An exam consists of selecting 4 questions from a choice of 8 questions. The
questions are numbered 1, 2, 3, 4, 5, 6, 7 and 8. Assuming the questions are
selected at random, find the probability that a pupil's selection will include two
even numbered questions.
4. A box contains 10 cubes of which 4 are green and 6 are yellow. If 4 cubes are
selected at random, find the probability that 2 green and 2 yellow cubes are
selected.
5. Four letters are chosen at random from the word COMPLEX. Find the probability
that both vowels are in the group chosen.
6. A random sample of five children is chosen from a class of 8 girls and 12 boys.
What is the probability that the sample contains
(a) all boys
(b) at least one girl ?
7. A bag contains 8 red sweets, 5 yellow sweets and 3 green sweets. Two sweets are
selected. What is the probability that two sweets chosen at random are both red ?
8. Four cards are chosen at random from an ordinary pack of 52 playing cards. What
is the probability that the four cards
(a) are all black
(b) are all kings
(c) contain at least one king ?
9. There are 5 green, 4 yellow and 3 blue discs in a bag from which 4 discs are
chosen at random.
Find the probability that the 4 discs selected will contain
(a) exactly 3 blue discs (b) exactly 3 yellow discs
(c) at least one green disc.
10. From a well shuffled pack of 52 cards a hand of 7 cards is dealt.
Find the probability that the hand will contain
(a) 4 aces
(b) exactly 3 aces
(c) at least 3 aces.
11. A hand of 6 cards is dealt from a shuffled pack of 52 cards. Find the probability
that the hand will contain
(a) all black cards
(b) exactly 5 black cards
(c) at least 5 black cards.
12. Three cards are dealt from a well shuffled pack of eight cards . The cards are
numbered 1, 2, 3, 4, 5, 6, 7, 8. Find the probability that
(a) the three cards are all even
(b) the product of the numbers drawn is odd.
Exercise 9 - Simulation
In questions 1 - 4 use the list of random numbers below.
37057
33724
43737
16929
10131
83986
28633
15929
84478
98571
98419
85953
19659
31341
20877
76401
82213
52804
60265
34585
15412
07827
72335
19404
22353
68418
48740
25208
27881
54505
1. Simulate the results of tossing a coin 20 times. Start at the eleventh number on the
first row and work to the right.
2. Simulate the results of rolling an unbiased die 10 times. Start at the first number
on the third row and work to the right.
3. Simulate the selection of six numbers, from the numbers 1 - 49, for the national
lottery. Start at the sixteenth number on the first row and work to the right.
4. In a school there are 78 pupils in S5. Simulate the selection of 6 S5 pupils. Start
at the first number on the second row and work to the right.
In questions 5 - 8 use the calculator generated list of random numbers below.
0.925 0.312 0.240 0.017 0.118 0.930 0.622 0.817 0.617 0.334
0.043 0.086 0.853 0.012 0.451 0.674 0.881 0.982 0.807 0.455
0.114 0.997 0.374 0.696 0.989 0.798 0.124 0.492 0.773 0.805
0.670 0.198 0.597 0.701 0.700 0.552 0.450 0.404 0.464 0.868
0.985 0.398 0.606 0.882 0.544 0.338 0.467 0.229 0.925 0.257
0.633 0.117 0.077 0.371 0.638 0.219 0.286 0.628 0.624 0.717
5. In a card game, a hand of 7 cards is dealt from a standard pack of 52 cards.
Simulate the 7 cards dealt from the pack.
6. Simulate the selection of 10 dates from any non-leap year.
7. A driver, approaching a roundabout from the North, is equally likely to go South,
East or West. Simulate the directions taken by 10 drivers.
8. Simulate the results of rolling a biased die where P(3) = 0.5.
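The idea behind these simulations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `to_outcome` is hypothetical, and the fair six-sided die used here is an example rather than one of the questions above. A random number u between 0 and 1 is scaled by the number of equally likely outcomes and rounded down.

```python
import math

def to_outcome(u, n):
    """Map a random number u in [0, 1) to one of n equally likely outcomes 1..n."""
    return math.floor(u * n) + 1

# First five calculator-generated random numbers from the list for questions 5 - 8.
numbers = [0.925, 0.312, 0.240, 0.017, 0.118]

# Illustrative case only: a fair six-sided die.
rolls = [to_outcome(u, 6) for u in numbers]
print(rolls)
```

The same mapping works for any number of equally likely outcomes; unequal probabilities need cumulative probabilities instead.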
Exercise 10 - Discrete Probability Distributions
1. Which of the following could describe discrete probability distributions ?
Find the value of k when a probability distribution is defined.
(a)
u           0     1     2     3     4
P(U = u)    1/3   1/6   k     1/6   1/4
(b)
v           -1    0     1
P(V = v)    0.35  k     0.55
(c)
w           2     3     4     5
P(W = w)    k     2/3   1/4   1/6
2. A discrete random variable X has probability distribution:
x           1     2     3     4     5
P(X = x)    k     2k    3k    4k    5k
Find
(a) the value of constant k
(b) P(X ≤ 4)
3. A discrete random variable Y has probability distribution:
y           1     2     3     4     5
P(Y = y)    0.22  0.35  k     0.07  0.29
Find
(a) the value of constant k
(b) P(3 ≤ Y ≤ 5)
(c) P(Y ≥ 2)
4. A discrete random variable S has probability distribution:
s           1     2     3     4
P(S = s)    1/3   1/4   k     1/6
Find
(a) the value of constant k
(b) P(2 ≤ S < 4)
(c) P(S < 3)
5. A discrete random variable T has probability distribution:
t           0     1     2     3
P(T = t)    k/6   k/4   k/3   k/2
Find
(a) the value of constant k
(b) P(T ≥ 2)
(c) P(0 < T < 4).
6. The probability function of a discrete random variable is given by:
P(X = x) = kx , x = 1 , 2 , 3 , 4 .
(a) Tabulate the probability distribution of X and find the value of the constant k.
(b) Find P(X < 3).
7. The probability function of a discrete random variable is given by:
1
P(Y = y) =
ky , y = 1 , 2 , 3 , 4 , 5 .
5
(a) Tabulate the probability distribution of Y and find the value of the constant k.
(b) Find P(2 < Y ≤ 5).
8. The discrete random variable S has the probability function given by:
P(S = s) = k(7 - s) , s = 0 , 1 , 2 , 3 , 4 .
Find
(a) the value of the constant k
(b) P(1 < S ≤ 4) .
9. The random variable X has the following probability distribution:
x           2     6     10
P(X = x)    p     0.25  q
where p and q are constants.
Given that P(X < 5) = P(X > 5) and P(X ≤ 6) = 3P(X > 6), find the values
of p and q.
10. Find the probability distribution for each of the following random variables:
(a) H , the number of heads obtained when two fair coins are tossed.
(b) S , the number of sixes obtained when two normal dice are rolled.
(c) 30% of a population have blue eyes. Two people are selected at random. Find
the probability distribution of B, the number of people with blue eyes.
(d) T , the sum of the scores when two normal dice are rolled.
(e) D , the difference of the scores when two normal dice are rolled .
(f) G , the number of girls in a family of three children .
11. In a game, a fair die is rolled and a fair coin is tossed. If a head occurs then S is
the score on the die minus one. If a tail occurs then S is twice the score on the die.
Find the probability distribution of S.
12. Two tetrahedral dice each numbered 1, 2, 3, 4 are rolled together. Let S = the sum
of the two scores and let D = the difference between the two scores.
(a) Show that P(S = 7) = 1/8.
(b) Find the probability distribution of the random variable S.
(c) Show that P(D = 0) = 1/4.
(d) Find the probability distribution of the random variable D.
13. In a gambling game using a normal pack of playing cards, a heart wins 40p, a
diamond wins 20p, a spade loses 10p and a club loses 40p. Two playing cards are
selected at random, with replacement. The random variable X represents the
profit, in pence, made after each selection.
(a) Show that
(i) P(X = 10) = 1/8
(ii) P(X = -50) = 1/8
(b) Find the probability distribution of X.
14. A fair die is rolled repeatedly until a six appears or 3 rolls of the die have been
made. The random variable R represents the number of rolls of the die.
(a) Show that P(R = 2) = 5/36.
(b) Find the probability distribution of R.
(c) The random variable S represents the number of sixes. Find the probability
distribution of S.
15. A box contains three blue and two red pens. Three pens are taken at random from
the box. The random variable R is the number of red pens obtained. Find the
probability distribution of R.
16. Three committee members are to be selected from 5 men and 4 women. The
random variable M is the number of men appointed to the committee assuming the
selection is done at random. Find the probability distribution of M.
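A quick way to check work of this kind is to let exact fractions do the arithmetic. The sketch below is illustrative only: the function names are hypothetical and the distribution P(X = x) = kx² for x = 1, 2, 3 is a made-up example, not one of the questions. It finds the constant k that makes the probabilities sum to 1 and then confirms the result is a valid distribution.

```python
from fractions import Fraction

def normalising_constant(weights):
    """Return k such that k * weight sums to 1 over all outcomes."""
    total = sum(weights.values())
    return Fraction(1, 1) / total

def is_distribution(probs):
    """A valid distribution has probabilities in [0, 1] that sum to exactly 1."""
    return all(0 <= p <= 1 for p in probs.values()) and sum(probs.values()) == 1

# Hypothetical example: P(X = x) = k * x**2 for x = 1, 2, 3.
weights = {x: Fraction(x * x) for x in (1, 2, 3)}
k = normalising_constant(weights)
probs = {x: k * w for x, w in weights.items()}
print(k, probs, is_distribution(probs))
```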
Exercise 11 - Discrete Probability Distributions (Expectation and Variance)
1. Find the expected value of X for each of the following probability distributions.
(a)
x           1     2     3     4
P(X = x)    0.2   0.4   0.3   0.1
(b)
x           1     2     3     4
P(X = x)    1/12  1/6   1/3   5/12
(c)
x           -2    -1    0     1     2
P(X = x)    0.15  0.05  0.27  0.3   0.23
2. Y is a random variable with probability distribution given in the table below.
y           2     3     5     p     10
P(Y = y)    0.1   0.4   0.2   0.1   0.2
The expected value of Y is 5.2 . Find p.
3. Z is a random variable with probability distribution given in the table below.
z           1     2     3     4     5
P(Z = z)    0.15  x     0.1   y     0.25
The expected value of Z is 2.9 . Find x and y.
4. The probability function of a discrete random variable T is given by
P(T = t) = 2kt , t = 1, 2, 3, 4 and k is a constant.
Find k then E(T).
5. A man buys 10 tickets from a total of 500 tickets in a raffle where there is only
one prize of £40. The price of a ticket is 20p. If all the tickets are sold, calculate
his expected loss.
6. The School Fair runs a stall offering a £25 prize for a £2 stake to anyone who can
roll a total of eleven or more on two dice. Calculate the expected gain or loss
made by the school if 240 people take part.
7. In a multiple-choice examination, a candidate is awarded three marks for a correct
answer but loses one mark for an incorrect answer. Each question has 5
alternative answers. Assuming a candidate selects answers at random, find the
expected marks gained or lost per question.
8. An unbiased tetrahedral die has four faces labelled 1, 2, 3, 4. If the die lands on
the face marked 1, the player has to pay 40p. If it lands on a face marked 2 or 3
the player wins 20p, and if it lands on the face labelled 4 then the player wins 10p.
(a) What is the expected profit or loss of the player on each roll of the die ?
(b) A fair game is one where the expected profit is zero on each roll of the die. To
ensure that the above game is fair , what should be the stake for each roll of the
die ?
9. Two cards are selected at random from a normal set of 52 playing cards (the first
card being replaced before the second card is selected). £1 is paid for selecting a
heart, 50p for selecting a diamond and nothing for selecting a club or a spade. The
entrance stake is also lost regardless of a win or loss. If the game is to be fair, how
much should the entrance stake be ?
10. Find the variance of X for each of the following probability distributions:
(a)
x           0     1     2     3
P(X = x)    0.15  0.3   0.35  0.2
(b)
x           -2    -1    0     1     2
P(X = x)    0.3   0.3   0.2   0.1   0.1
(c)
x           1     2     3
P(X = x)    1/3   1/2   1/6
(d)
x           -10   0     10    20
P(X = x)    1/5   3/10  2/5   1/10
11. Y is a random variable with probability distribution given in the table below.
y           1     a
P(Y = y)    1/4   3/4
The variance of Y is 6.75 . Find a, given that it is positive.
12. The random variable Z has the following probability distribution:
z           3     4     5
P(Z = z)    b     a     b
where a and b are constants.
(a) Write down E(Z).
(b) Given that Var(Z) = 0.8 , find the values of a and b.
13. Birds of a particular species lay either 0, 1, 2, or 3 eggs in their nests. The random
variable N, the number of eggs laid, has the following probability distribution:
n           0     1     2     3
P(N = n)    0.25  0.35  0.3   0.1
Calculate the expectation and variance of N.
14. The number, X, of people queuing at a bus stop has the following probability
function:
P(X = x) = k(7 - x)(x + 1) , x = 1, 2, 3, 4, 5, 6 and k is a constant .
(a) Find k.
(b) Find the expected value and variance of X.
15. X represents the score when a single unbiased cubical die is rolled. Find E(X) and
Var(X).
16. A fair coin is tossed twice and the random variable T represents the number of
tails recorded. Calculate E(T) and Var(T).
17. Two fair cubical dice are rolled. The random variable S is the sum of their scores
and the random variable D is the difference between their scores.
(a) Calculate E(S) and Var(S).
(b) Calculate E(D) and Var(D).
18. A fair cubical die is rolled repeatedly until a six appears or three rolls of the die
have been made. The random variable R represents the number of rolls of the die.
Calculate E(R) and Var(R).
19. A debating team of two has to be chosen from two boys and three girls. The
number of boys in the team is the random variable N. Find E(N) and Var(N).
20. Two counters are drawn without replacement from a box containing three blue and
five red counters. The random variable R represents the number of red counters
selected. Find E(R) and Var(R).
21. A committee of three has to be selected from three men and four women. The
number of women on the committee is the random variable W. Find E(W) and
Var(W).
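The calculations in this exercise all follow the same pattern, E(X) = Σ x P(X = x) and Var(X) = E(X²) - [E(X)]². A minimal Python sketch is given below as a way of checking answers; the function names and the example distribution are hypothetical and are not taken from the questions.

```python
from fractions import Fraction

def expectation(dist):
    """E(X): sum of x * P(X = x) over all values x."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    mean = expectation(dist)
    mean_of_squares = sum(x * x * p for x, p in dist.items())
    return mean_of_squares - mean ** 2

# Hypothetical distribution: P(X=1) = 1/2, P(X=2) = 1/3, P(X=3) = 1/6.
dist = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}
mean = expectation(dist)
var = variance(dist)
print(mean, var)
```

Using `Fraction` keeps the answers exact, which makes them easy to compare with hand working.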
Exercise 12 - Discrete Probability Distributions (Simulation)
In the questions below use the following calculator generated list of random numbers.
0.921 0.836 0.255 0.726 0.247 0.101 0.731 0.222 0.594 0.820
0.934 0.492 0.095 0.402 0.646 0.352 0.815 0.729 0.020 0.389
0.367 0.233 0.187 0.235 0.784 0.451 0.331 0.718 0.942 0.730
1. For each of the following probability distributions simulate 10 observations.
(a)
x           0     1     2     3
P(X = x)    0.3   0.4   0.2   0.1
(b)
x           1     2     3     4     5
P(X = x)    0.2   0.2   0.4   0.1   0.1
(c)
x           1     2     3     4
P(X = x)    1/5   1/4   1/4   3/10
(d)
x           3     4     5     6
P(X = x)    1/10  7/20  3/10  1/4
2. The number of days taken by a builder to complete the construction of a garage is
represented by the discrete random variable N. N has the following probability
distribution:
n           5     6     7     8     9
P(N = n)    0.2   0.2   0.4   0.1   0.1
Simulate the times taken to complete 10 such constructions.
3. An Advanced Higher Maths class has 5 students. The number of students who
attend class on a Friday is represented by the discrete random variable S. S has the
following probability distribution:
s           1     2     3     4     5
P(S = s)    0.05  0.12  0.31  0.45  0.07
Simulate the attendance at the class on 10 such Fridays.
4. The number of consecutive days spent in a hotel by business executives is
represented by the discrete random variable D. D has the following probability
distribution:
d           1      2     3     4     5     6+
P(D = d)    21/50  1/4   1/5   2/25  1/20  0
Simulate the length of stay at the hotel by 10 business executives.
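One common way to carry out these simulations is to compare each random number with the cumulative probabilities. The Python sketch below illustrates that approach under stated assumptions: the function `simulate` and the distribution with values 0, 1, 2 and probabilities 0.5, 0.3, 0.2 are hypothetical, and only the first five random numbers from the list above are used.

```python
from itertools import accumulate

def simulate(values, probs, u):
    """Return the first value whose cumulative probability exceeds u."""
    for value, upper in zip(values, accumulate(probs)):
        if u < upper:
            return value
    return values[-1]  # guard against rounding in the final cumulative total

# Hypothetical distribution (not one of the questions).
values = [0, 1, 2]
probs = [0.5, 0.3, 0.2]

# First five random numbers from the list at the start of this exercise.
numbers = [0.921, 0.836, 0.255, 0.726, 0.247]
observations = [simulate(values, probs, u) for u in numbers]
print(observations)
```

Here a random number below 0.5 gives 0, one from 0.5 up to 0.8 gives 1, and anything larger gives 2.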
Exercise 13 - Continuous Probability Distributions
1. $f(x) = \begin{cases} \frac{x}{2} & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(a) Sketch the graph of f(x) and verify that it is a probability density function.
(b) Find P(X > 1).
2. $f(x) = \begin{cases} \frac{1}{5}(4x + 3) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(a) Sketch the graph of f(x) and verify that it is a probability density function.
(b) Find P(0 ≤ X ≤ 0.75).
3. $f(x) = \begin{cases} \frac{3}{4}x(2 - x) & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(a) Sketch the graph of f(x) and verify that it is a probability density function.
(b) Find P(1 < X < 2).
4. The continuous random variable X has probability density function given by:
$f(x) = \begin{cases} kx^2 & \text{for } -2 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(a) Find k and sketch the graph of f(x).
(b) Calculate P(-1 < X < 0).
5. The continuous random variable X has probability density function given by:
$f(x) = \begin{cases} \frac{1}{3}x & \text{for } 1 \le x \le k \\ 0 & \text{elsewhere} \end{cases}$
(a) Find k and sketch the graph of f(x).
(b) Calculate (i) P(3/2 < X < 2) (ii) P(2 < X < 3)
6. The lifetime, X years, of a light bulb has a continuous probability distribution with
the following probability density function:
$f(x) = \begin{cases} kx(4 - x) & \text{for } 0 \le x \le 4 \\ 0 & \text{elsewhere} \end{cases}$
(a) Find the value of constant k.
(b) Find the probability that the light bulb will last for less than one year.
7. The length, X metres, of a certain species of snake has a continuous probability
distribution with the following probability density function:
$f(x) = \begin{cases} kx^2(3 - x) & \text{for } 0 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
(a) Find the value of constant k.
(b) i) Find the proportion of snakes under 2 metres.
ii) Find the proportion of snakes between 1 metre and 3 metres in length.
8. When a boy throws a discus, the distance, X metres, it travels has a continuous
probability distribution with probability density function given by:
$f(x) = \begin{cases} k(900 - x^2) & \text{for } 0 \le x \le 30 \\ 0 & \text{elsewhere} \end{cases}$
(a) Show that k = 1/18000 and sketch the graph of f(x).
(b) Find the probability that he throws the discus further than twenty metres.
9. At a garage the volume of weekly sales, X, in thousands of gallons, has a
continuous probability distribution with probability density function given by:
$f(x) = \begin{cases} kx(2 - x)^2 & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(a) Find the value of the constant k.
(b) Find the probability that less than 1500 gallons are sold.
10. For each of the following random variables, sketch the probability density function
and calculate the modal value of X.
(a) $f(x) = \begin{cases} \frac{1}{3} - \frac{x}{18} & \text{for } 0 \le x \le 6 \\ 0 & \text{elsewhere} \end{cases}$
(b) $f(x) = \begin{cases} \frac{2}{9}x(3 - x) & \text{for } 0 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
(c) $f(x) = \begin{cases} 12x^2(1 - x) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(d) $f(x) = \begin{cases} 0.0064x^3(5 - x) & \text{for } 0 \le x \le 5 \\ 0 & \text{elsewhere} \end{cases}$
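Answers to questions of this type can be checked numerically. The following Python sketch is a rough check rather than a replacement for the integration: it uses the midpoint rule, and both the helper `integrate` and the density f(x) = 3x² on 0 ≤ x ≤ 1 are hypothetical examples, not taken from the questions. A valid density should give a total area of 1, and a probability is the area over the interval of interest.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Hypothetical density: 3x^2 on [0, 1], zero elsewhere."""
    return 3 * x ** 2 if 0 <= x <= 1 else 0.0

total = integrate(f, 0, 1)      # should be very close to 1
p_upper = integrate(f, 0.5, 1)  # P(X > 0.5), exact value 0.875
print(round(total, 6), round(p_upper, 6))
```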
Exercise 14 - Continuous Probability Distributions (Expectation and Variance)
1. A continuous random variable X has probability density function given by:
$f(x) = \begin{cases} kx^2 & \text{for } 0 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
where k is a positive constant. Find the values of
(a) k
(b) E(X)
(c) Var(X).
2. A continuous random variable X has probability density function given by:
$f(x) = \begin{cases} \frac{2(x + 1)}{3} & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
Calculate
(a) E(X)
(b) Var(X).
3. A continuous random variable X has probability density function given by:
$f(x) = \begin{cases} \frac{3}{4}(1 - x^2) & \text{for } -1 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
Calculate
(a) E(X)
(b) Var(X).
4. A continuous random variable X has probability density function given by:
$f(x) = \begin{cases} 1 & \text{for } -\frac{1}{2} \le x \le \frac{1}{2} \\ 0 & \text{elsewhere} \end{cases}$
Calculate
(a) E(X)
(b) Var(X).
5. A continuous random variable X has probability density function given by:
$f(x) = \begin{cases} kx^3 & \text{for } 2 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
where k is a positive constant. Find the values of
(a) k
(b) E(X)
(c) Var(X).
6. A continuous random variable X has probability density function given by:
$f(x) = \begin{cases} \frac{k}{x^4} & \text{for } 1 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
where k is a positive constant. Find the values of
(a) k
(b) E(X)
(c) Var(X).
7. The incubation period, X days, for a particular disease is a continuous random
variable with probability density function given by:
$f(x) = \begin{cases} k(25 - x^2) & \text{for } 0 \le x \le 5 \\ 0 & \text{elsewhere} \end{cases}$
where k is a positive constant.
(a) Show that k = 3/250.
(b) Calculate
i) the expected incubation time
ii) the probability that a particular individual will catch the disease during the
third day.
8. The lifetime of an electrical component is X years, where X is a continuous
random variable with probability density function given by:
$f(x) = \begin{cases} kx^2(6 - x) & \text{for } 0 \le x \le 6 \\ 0 & \text{elsewhere} \end{cases}$
where k is a positive constant. Find the values of
(a) k
(b) the expected lifetime µ
(c) P(X < µ).
9. In a Greek holiday resort the number of hours, X, of sunshine per day from 7.00 a.m.
to 7.00 p.m. is a continuous random variable with probability density function
given by:
$f(x) = \begin{cases} \frac{1}{300}\left[(x - 3)^2 + k\right] & \text{for } 0 \le x \le 12 \\ 0 & \text{elsewhere} \end{cases}$
where k is a positive constant. Calculate
(a) k
(b) E(X)
(c) Var(X)
(d) the probability that, on any randomly chosen day, there will be
more than ten hours of sunshine.
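For a continuous random variable, E(X) is the integral of x f(x) and Var(X) is the integral of x² f(x) minus [E(X)]². The Python sketch below checks such results numerically with the midpoint rule. It is an illustration only: the helper names and the density f(x) = 2x on 0 ≤ x ≤ 1 are hypothetical and are not taken from the questions.

```python
def integrate(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Hypothetical density: 2x on [0, 1]."""
    return 2 * x

a, b = 0.0, 1.0
mean = integrate(lambda x: x * f(x), a, b)            # exact value 2/3
mean_of_squares = integrate(lambda x: x * x * f(x), a, b)
var = mean_of_squares - mean ** 2                     # exact value 1/18
print(round(mean, 4), round(var, 4))
```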
Exercise 15 - Cumulative Distribution Function
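1. For each of the following, find the cumulative distribution function, F(x), of the
random variable X with probability density function given by:
(a) $f(x) = \begin{cases} 1 & \text{for } 2 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
(b) $f(x) = \begin{cases} \frac{1}{2}(2 - x) & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(c) $f(x) = \begin{cases} \frac{3}{2}x^2 & \text{for } -1 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(d) $f(x) = \begin{cases} 6x - 1 - 3x^2 & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(e) $f(x) = \begin{cases} \frac{3}{4}x(2 - x) & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(f) $f(x) = \begin{cases} 20x^3(1 - x) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$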
2. For each of the following, calculate the median of the random variable X with
probability density function given by:
(a) $f(x) = \begin{cases} \frac{1}{72}x & \text{for } 0 \le x \le 12 \\ 0 & \text{elsewhere} \end{cases}$
(b) $f(x) = \begin{cases} \frac{x}{8} + \frac{1}{4} & \text{for } 1 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
(c) $f(x) = \begin{cases} \frac{3}{2}x^2 & \text{for } -1 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(d) $f(x) = \begin{cases} \frac{2}{5}(x + 2) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(e) $f(x) = \begin{cases} \frac{3}{4}x(2 - x) & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(f) $f(x) = \begin{cases} 4x(1 - x^2) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
3. The probability density function of the random variable X is given by:
$f(x) = \begin{cases} 2(1 - x) & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
Find (a) the median value of X
(b) the interquartile range of X.
4. The continuous random variable X has probability density function given by:
$f(x) = \begin{cases} a + bx & \text{for } 0 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(a) Given that F(0.2) = 0.6, find the values of a and b.
(b) Calculate the median value of X.
5. For each of the following, find and sketch the probability density function of the
random variable X:
(a) $F(x) = \begin{cases} 0 & \text{for } x < 0 \\ x^2 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$
(b) $F(x) = \begin{cases} 0 & \text{for } x < 4 \\ \frac{1}{4}(x - 4) & \text{for } 4 \le x \le 8 \\ 1 & \text{for } x > 8 \end{cases}$
(c) $F(x) = \begin{cases} 0 & \text{for } x < 1 \\ \frac{1}{8}(x^2 - 1) & \text{for } 1 \le x \le 3 \\ 1 & \text{for } x > 3 \end{cases}$
(d) $F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 2x - x^2 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$
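The median m satisfies F(m) = 0.5, and the quartiles satisfy F(q) = 0.25 and F(q) = 0.75. When the equation is awkward to solve by hand, the answer can be checked by bisection, as in the Python sketch below. The function `solve_cdf` and the cumulative distribution function F(x) = x³ on 0 ≤ x ≤ 1 are hypothetical examples, not taken from the questions.

```python
def solve_cdf(F, p, lo, hi, tol=1e-9):
    """Find x in [lo, hi] with F(x) = p by bisection (F must be increasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def F(x):
    """Hypothetical cumulative distribution function on [0, 1]."""
    return x ** 3

median = solve_cdf(F, 0.5, 0.0, 1.0)
lower_quartile = solve_cdf(F, 0.25, 0.0, 1.0)
upper_quartile = solve_cdf(F, 0.75, 0.0, 1.0)
print(round(median, 4), round(upper_quartile - lower_quartile, 4))
```

The interquartile range is then the difference between the two quartiles.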
Exercise 16 - Continuous Probability Distributions (Miscellaneous)
1. The random variable X has probability density function given by:
$f(x) = \begin{cases} 0.5 & \text{for } 1 \le x \le 3 \\ 0 & \text{elsewhere} \end{cases}$
(a) Sketch the probability density function of X.
(b) Find P(1.5 ≤ X ≤ 2.5).
(c) Find the cumulative distribution function of X.
(d) Find the mean and median of X.
2. The random variable X has probability density function given by:
$f(x) = \begin{cases} k - 10 + x & \text{for } 10 \le x \le 11 \\ 0 & \text{elsewhere} \end{cases}$
where k is a positive constant.
(a) Find k and sketch the probability density function of X.
(b) Find the cumulative distribution function of X.
(c) Find the mode and median of X.
3. The random variable X has probability density function given by:
$f(x) = \begin{cases} \frac{x^2}{21} & \text{for } 1 \le x \le 4 \\ 0 & \text{elsewhere} \end{cases}$
(a) Sketch the probability density function of X.
(b) Find E(X) and Var(X).
(c) Find the cumulative distribution function of X.
(d) Find the mode and median of X.
4. The random variable X has probability density function given by:
$f(x) = \begin{cases} \frac{3}{16}(4 - x^2) & \text{for } 0 \le x \le 2 \\ 0 & \text{elsewhere} \end{cases}$
(a) Sketch the probability density function of X.
(b) Find E(X) and Var(X).
(c) Find the cumulative distribution function of X.
(d) Find the mode and median of X.
5. The random variable X has probability density function given by:
$f(x) = \begin{cases} \frac{3x^2 + 1}{4} & \text{for } -1 \le x \le 1 \\ 0 & \text{elsewhere} \end{cases}$
(a) Sketch the probability density function of X.
(b) Find E(X) and Var(X).
(c) Find the cumulative distribution function of X.
(d) Find the mode and median of X.
6. The height, X metres, of a particular type of tree is a continuous random variable
with probability density function given by:
$f(x) = \begin{cases} kx^2(6 - x) & \text{for } 0 \le x \le 6 \\ 0 & \text{elsewhere} \end{cases}$
(a) Find the value of the constant k.
(b) Find the expected height.
(c) Calculate the probability that any tree chosen at random will be less than 4
metres high.
(d) Find the cumulative distribution function of X.
(e) Find the mode of X.
7. The age, X years, to which a newborn infant will live is a continuous random
variable with a probability density function given by:
$f(x) = \begin{cases} kx^3(100 - x) & \text{for } 0 \le x \le 100 \\ 0 & \text{elsewhere} \end{cases}$
(a) Find the value of the constant k.
(b) Find the expected lifespan of the infant.
(c) Calculate the probability that any infant chosen at random will live longer than
80 years.
(d) Find the cumulative distribution function of X.
(e) Find the mode of X.
8. The cumulative distribution function of a continuous random variable X is given
by:
$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{x}{12}(4 + x) & \text{for } 0 \le x \le 2 \\ 1 & \text{for } x > 2 \end{cases}$
(a) Find the probability density function of X.
(b) Find the mode of X.
(c) Find the median of X.
(d) Find the lower quartile of X.
9. The cumulative distribution function of a continuous random variable X is given
by:
$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ 2x^2 - x^4 & \text{for } 0 \le x \le 1 \\ 1 & \text{for } x > 1 \end{cases}$
(a) Find the probability density function of X.
(b) Find the mode of X.
(c) Find P(0.3 < X < 0.6).
10. On any day the amount of time, X hours, that a person spends watching television
is a continuous random variable with cumulative distribution function given by:
$F(x) = \begin{cases} 0 & \text{for } x < 0 \\ \frac{20x - x^2}{100} & \text{for } 0 \le x \le 10 \\ 1 & \text{for } x > 10 \end{cases}$
(a) Find the probability density function of X.
(b) Find the probability that on any day a person chosen at random will spend
between 6 and 8 hours watching television.
(c) Find the mode, mean and median of X.
(d) Find the variance of X.
(e) Find the interquartile range of X.
CORRELATION AND LINEAR REGRESSION
Exercise 1 - Correlation
Note that in the following questions “correlation coefficient” refers to Pearson’s
Product Moment Correlation Coefficient.
1. For each of the following sets of data:
(a) Plot a scattergraph.
(b) Calculate the correlation coefficient.
(c) Comment on the relationship between x and y.
A
x    1    2    3    4    5    6
y    4    11   11   18   20   24
B
x    1    2    3    4    5    6
y    20   12   16   12   8    8
C
x    1    2    3    4    5    6
y    2    5    9    10   4    1
D
x    1    2    3    4    5    6
y    0    9    4    2    7    3
2. The marks for 9 students in a maths and a physics test are given below.
Maths mark      55   66   42   73   81   57   64   74   37
Physics mark    70   48   60   73   79   72   69   81   70
(a) Plot a scattergraph for this data and calculate the correlation coefficient.
(b) Comment on the significance of both.
3. The data below relates to the systolic blood pressure and percentage body fat of
six heart patients.
Blood Pressure    110   120   135   135   140   150
Body Fat          16    9     20    22    25    23
(a) Plot a scattergraph for this data and calculate the correlation coefficient.
(b) Comment on the significance of both.
4. The heights of 8 sons and their fathers were measured in an attempt to establish a
link.
Fathers’ height (cm)    162   170   176   180   187   190   192   192
Sons’ height (cm)       180   182   179   182   187   185   199   175
(a) Plot a scattergraph for this data and calculate the correlation coefficient.
(b) Comment on the significance of both.
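Pearson's product moment correlation coefficient is r = Sxy / √(Sxx Syy), where Sxx = Σx² - (Σx)²/n, Syy = Σy² - (Σy)²/n and Sxy = Σxy - (Σx)(Σy)/n. A short Python sketch for checking hand calculations follows. The function name and the five data points are hypothetical and are not one of the data sets above.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient from raw data."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 3))
```

A value close to +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship, which is why the scattergraph should always be inspected as well.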
Exercise 2 - Linear Regression
1.
(a) Find the regression line, using the method of least squares, for the following
data.
x    1    1    5    5
y    1    3    2    4
(b) Use the regression equation to predict y when x = 3.
2.
(a) Find the least squares regression line for the following data points.
x    0    2    2    5    6
y    4    2    0    -4   -4
(b) Plot a scattergraph for these data points and the regression line.
3. A scientist is investigating the effectiveness of a particular weed killer. He uses
different concentrations of weed killer and counts the number of surviving weeds
in a fixed area. The results were as follows:
Concentration (mg/litre)     1    3    5    7    9    11   13   15
No. of Weeds (per 10 m²)     30   24   22   19   16   13   10   6
(a) Plot a scattergraph for this data.
(b) Obtain the least squares regression equation and plot it on your graph.
(c) Use the regression equation to predict the number of weeds when the
concentration of weed killer is 10 mg/litre.
4. Maximum heart rate decreases with age and is used as a guide for exercising
safely. The data below was obtained from a treadmill experiment.
Age (x)                26    40    43    44    41    27    40    39    40    26
Max. Heart Rate (y)    192   178   172   175   173   191   173   175   179   191
(a) Plot a scattergraph of these data and comment on any relationship.
(b) Determine the least squares regression equation.
(Σx = 366, Σy = 1799, Σx² = 13688, Σxy = 65329)
(c) Predict the maximum heart rate of a person who is 35 years old.
5. A government survey obtained the following data about the money spent annually
by a family of four on food.
Income (x) (£1000s)                22   20   24   27   16   24   19   25
Expenditure on food (y) (£100s)    50   45   42   44   37   26   39   43
(a) Determine the least squares regression equation for the data.
(Σx = 177, Σy = 326, Σx² = 4007, Σxy = 7228)
(b) Predict the annual food expenditure of a family whose (disposable) income is
£21 000.
6. A Maths class has two tests per session, Christmas and summer. The marks for
ten randomly selected pupils are given below.
Christmas (x)    69   42   43   40   100   80   90   77   47   68
Summer (y)       75   66   63   63   78    73   73   68   62   65
(a) Plot a scattergraph of this data. Comment on the relationship between the
marks in both tests.
(b) Determine the equation of the least squares regression line.
(Σx = 656, Σy = 686, Σx² = 47236, Σxy = 45956)
(c) Predict what a pupil who scored 50 at Christmas would get in the summer test.
7. A factory manager records the cost of production and the number of units
produced over a 6 month period.
Units produced (x) (1000s)         13.1   13.2   13.3   14.1   14.2   15.0
Cost of Production (y) (£1000s)    29.3   29.2   29.4   30.4   31.0   32.4
(a) Plot a scattergraph of this data and comment upon the nature of the
relationship between the number of units produced and the production costs.
(b) Obtain the least squares regression equation.
(Σx = 82.9, Σy = 181.7, Σx² = 1148.19, Σxy = 2315.13)
(c) Suggest a practical interpretation of the slope estimate.
(d) Estimate the production costs of 14 000 units.
8. When a metal bar is heated it expands. The amount by which it expands and the
increase in temperature of such a bar are given below.
Temperature rise (x) (°C)    50     100    150    200    250    300
Expansion (y) (cm)           0.35   0.85   1.20   1.54   1.92   2.32
(a) Plot a scattergraph for this data.
(b) Obtain the least squares regression equation.
(Σx = 1050, Σy = 8.18, Σx² = 227500, Σxy = 1766.5)
(c) Use this equation to predict the expansion of the rod when its temperature is
increased by 175 °C.
9. The stopping distance and velocity of six cars is given below.
Velocity (m/s) (x)           9    14   18   23   27   32
Stopping distance (m) (y)    15   22   37   50   77   93
(a) Plot a scattergraph for this data and comment on the nature of the relationship.
(b) Calculate the least squares regression equation.
(Σx = 123, Σy = 294, Σx² = 2883, Σxy = 7314)
(c) Comment on the practical significance of your answer.
(d) Can you predict the stopping distance of a car travelling at 120m/s ?
10. In a particular chemical reaction the volume of gas produced and the concentration
of acid used was measured.
Acid concentration (moles/litre) (x)    0.10   0.12   0.15   0.17   0.20
Volume of gas (cm³) (y)                 200    270    320    391    440
(a) Plot a scattergraph for this data.
(b) Obtain the least squares regression equation.
(Σx = 0.74, Σy = 1621, Σx² = 0.1158, Σxy = 254.87)
(c) Predict the volume of gas produced when the acid concentration is 0.16
moles/litre.
11. The height and weight of 10 randomly selected male students are as follows:
Height (cm)    164   170   180   180   167   190   170   177   180   175
Weight (kg)    80    60    84    74    57    90    70    72    69    70
(a) Determine the least squares regression equation for the data.
(Σx = 1753, Σy = 726, Σx² = 307839, Σxy = 127693)
(b) Plot the data points and the regression line.
(c) Use the regression equation to predict the weight of a student who is:
(i) 172 cm
(ii) 185 cm.
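The summary totals quoted in brackets are all that is needed for the least squares line y = a + bx, with b = Sxy / Sxx and a = ȳ - b x̄, where Sxy = Σxy - (Σx)(Σy)/n and Sxx = Σx² - (Σx)²/n. The Python sketch below shows the calculation. The function name and the totals used (n = 5, Σx = 15, Σy = 20, Σx² = 55, Σxy = 66) are hypothetical and do not come from any question above.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (a, b) for the regression line y = a + b*x from summary totals."""
    sxx = sum_x2 - sum_x ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    b = sxy / sxx                    # slope
    a = sum_y / n - b * sum_x / n    # intercept: y-bar minus b times x-bar
    return a, b

# Hypothetical summary totals, for illustration only.
a, b = least_squares_from_sums(n=5, sum_x=15, sum_y=20, sum_x2=55, sum_xy=66)
prediction = a + b * 3.5  # predicted y when x = 3.5
print(round(a, 2), round(b, 2), round(prediction, 2))
```

Predictions are only reliable for x values inside the range of the data used to fit the line.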
ANSWERS - PREVIOUS KNOWLEDGE
Exercise 1 - Average/Variability
1 Set A: mean = 101; Q1 = 98; Q3 = 105; IQR = 7.
Set B: mean = 101; Q1 = 90; Q3 = 110; IQR = 20.
The means of both groups are very similar. However the spread of B is much
greater.
2 Judge A: mean = 9.47; Q1 = 9.1; Q3 = 9.7; IQR = 0.6.
Judge B: mean = 9.23; Q1 = 9.0; Q3 = 9.4; IQR = 1.0.
Judge A has a higher mean, but his scores cover a smaller range of values.
3 n = 10; Σx = 2006; Σx² = 402428; mean = 200.6
sample standard deviation, $s = \sqrt{\frac{1}{n-1}\left(\Sigma x^2 - \frac{(\Sigma x)^2}{n}\right)} = \sqrt{\frac{1}{9}\left(402428 - \frac{2006^2}{10}\right)} = 1.65$
4 n = 9; Σx = 577.9; Σx² = 37109.77; mean = 64.21
sample standard deviation $= \sqrt{\frac{1}{n-1}\left(\Sigma x^2 - \frac{(\Sigma x)^2}{n}\right)} = \sqrt{\frac{1}{8}\left(37109.77 - \frac{577.9^2}{9}\right)} = 0.521$
5 (a) mean = 74.4; sample standard deviation = 5.18
(b) 1000/74.4 = 13.4 so a minimum of 13 people but 12 would be safer
6 n = 5; Σx = 131; Σx² = 3451; mean = 26.2
sample standard deviation $= \sqrt{\frac{1}{n-1}\left(\Sigma x^2 - \frac{(\Sigma x)^2}{n}\right)} = \sqrt{\frac{1}{4}\left(3451 - \frac{131^2}{5}\right)} = 2.17$
7 mean = 45; sample standard deviation = 3.16
8 mean = 30.6; sample standard deviation = 3.53
9 ȳ = 12; sample standard deviation = 3.39
10 ȳ = 91; sample standard deviation = 7.38
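The sample standard deviation formula used in answers 3, 4 and 6 can be checked with a few lines of Python. This is a sketch for verification only; the function name is hypothetical, and the figures are those quoted in answer 3 (n = 10, Σx = 2006, Σx² = 402428).

```python
import math

def sample_sd_from_sums(n, sum_x, sum_x2):
    """Sample standard deviation from n, the sum of x and the sum of x squared."""
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

# Figures quoted in answer 3.
s = sample_sd_from_sums(10, 2006, 402428)
print(round(s, 2))
```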
Exercise 2 - Exploratory Data Analysis
1 (a) [Dot plot; axis labels: 59, 65, 68, 72, 75, 79, 81]
(b) median = 72 kg; mode = 75 kg
Mathematics: Statistics (Higher) Student Exercise Answers
1
2
[Dot plot; axis from 2.2 to 3.9 in steps of 0.1]
median = 3.4; mode = 3.4
3 [Stem-and-leaf diagram; key: 6 | 4 means 64 seconds]
median = 60 seconds; range = 83 - 35 = 48 seconds.
4 (a) [Back-to-back stem-and-leaf diagram of the Age 7 and Age 8 scores, stems 29 to 42; key: 35 | 6 means 356]
(b) Age 7: median = 326; mode = 326; lower quartile = 310.75; upper quartile = 333.25
Age 8: median = 384; mode = none; lower quartile = 367; upper quartile = 399.25
(c) The higher median and the higher lower and upper quartiles suggest an improvement in
reading from age 7 to age 8.
5 [Back-to-back stem-and-leaf diagram for Women and Men; key: 7 | 2 means 72]
          Women   Men
median    79      70
mode      72      66
Q1        72      66
Q3        86      74.5
The data suggest that the men’s heartbeats are, on average, slower than those of
the women and have less variability.
6. [Boxplots for Girls and Boys on a common axis from 4 to 19]
Although the medians are the same, the boys appear to be quicker than the girls for
whom the maximum, minimum, and interquartile range are all smaller.
7 [Stem-and-leaf diagram with stems 3 to 8]
median = 65.5; mode = 61; Q1 = 54; Q3 = 70
8 [Stem-and-leaf diagram with split stems; key: 7 | 2 means 72°F]
median = 72.5; mode = 70; Q1 = 65; Q3 = 73
As 96 > Q3 + 1.5 × IQR, this is an outlier. This could be an instrument error or an error in
recording.
[Boxplot with Q1 = 65 and Q3 = 73; the outlier is marked with an asterisk]
The boxplot shows a distribution skewed towards the lower end of the temperature
range. The interquartile range is 8, indicating that fifty percent of the temperatures lie
in this range.
9 (a) Interquartile range = 7 days
(b) min = 1; Q1 = 4; median = 7; mode = 1; Q3 = 11; max = 55
(c) 23 and 55 days are both outliers.
(d) [Boxplot with the two outliers marked by asterisks]
This is a fairly symmetrical distribution with two significant outliers. These could be
patients with unusual conditions or, for example, elderly patients needing nursing care
when none is available.
10 (a) IQR = 19
(b) Min = 66; Q1 = 95.5; Median = 102; Q3 = 115.5; Max = 129
(c) There are no outliers.
(d) [Boxplot; labelled values: 66, 95.5, 115, 129]
11
Min=16, Q1=22, Median=35, Q3=51, Max=97
Mathematics: Statistics (Higher) Student Exercise Answers
4
*
16
22
35
51
There is one outlier at 97 seconds. The boxplot shows a distribution skewed towards
the lower end. Probably some explanation other than experimental/ human error .
12 (a) The interquartile range is 16.5
(b) Min=39, Q1=51, Median=61.5, Q3=67.5, Max=93
(c) The value 93 is an outlier.
(d)
*
39
51
67.5
This is a distribution skewed towards the high end with an outlier at 93.
13 (a) The interquartile range is 2.5
(b) Min=1, Q1=11.5, Median=13, Q3=14, Max=28
(c) The values 1 and 28 represent outliers.
(d)
*
11.5 14
There are a variety of possible explanations for these two outliers.
*
Exercise 3 - Statistical Graphs
1 The median for the age group 20-25 is lower than for the older age group,
suggesting that for the 15 countries the death rate is lower among this age group.
The range and interquartile range for the older age group are higher than for the
20-25 group. This suggests that, although the death rate in the older group is higher, there is
a wider variation between the 15 countries for this age group. This could be
because of poverty, variation in welfare provision, exposure to disease,
environmental effects, etc.
2 The plants grown in soil A appear to be taller than those grown in soil B. Both
soils show similar distributions, with similar ranges and interquartile ranges, but
with soil A having a higher median. The outlier shown could arise from a
measurement error or the plant may have grown in a favourable position, receiving,
for example, more sunlight, fertiliser, etc., than the others.
3 (a) For the boys the median is 31, and the quartiles are 25.5 and 35.5. The
corresponding statistics for the girls are 27, 22 and 30.5
(b) The girls completed this task more quickly. Their median time was lower and
the spread (interquartile range) was smaller.
4 There appears to be little difference between the sales of Type A and Type B. The
medians and IQRs of the two types are very similar.
5
The median of group A is lower than that of Group B, suggesting that the drug has
made a difference. However there is a greater spread of weights in this group
shown by the larger interquartile range.
ANSWERS - PROBABILITY
Exercise 1 - Simple Probability
1
1
3
(c)
(d)
13
2
13
1
2
1
2
(b)
(c)
(d)
2
3
3
1
3
(b)
2
5
4
4 (a) (i) 0.6 (ii) 0.4 (b) (i)
(ii)
9
9
5
4
5 (a) (i) 0.6 (ii) 0.4 (b) (i)
(ii)
9
9
1
1
2
3
1
6 (a)
(b)
(c)
(d)
(e)
5
5
5
5
5
3
7
16
1
1
1
7
5
8 (a)
(b)
(c)
(d)
(e)
8
4
2
8
8
7
13
9 (a)
(b)
20
20
32
7
10 (a)
(b)
195
10
53
45
43
11 (a)
(b)
(c)
220
88
176
1
1
52
1
(a)
6
1
(a)
4
(a)
(b)
Exercise 2 - Sample Spaces & Further Simple Probability
1
JC
JB
JV
MC
MB
MV
TC
TB
TV
2
VR SR
VM SM
VW SW
CR
CM
CW
3
(a) AA MM LL
AM MA LM
AL ML LA
4
AA
MA
LA
BA
NA
AM
MM
LM
BM
NM
5
(a) BB BC BG
CB CC CG
GB GC GG
6
11
21
31
41
51
61
(a)
7
11
21
31
41
51
(a)
8
12
22
32
42
52
62
5
36
12
22
32
42
52
1
10
AL
ML
LL
BL
NL
(b) AA MA
AM MM
AB
MB
LB
BB
NB
(b)
14
24
34
44
54
64
1
12
13
23
33
43
53
(b)
(c)
14
24
34
44
54
1
2
(c)
MA LA
MM LM
ML LA
MJ LJ
JA
JM
JL
JJ
AN
MN
LN
BN
NN
(b) BBB BBC BBG
BCB BCC BCG
BGB BGC BGG
13
23
33
43
53
63
(c) AA
AM
AL
AJ
15
25
35
45
55
65
5
9
CBB CBC CBG GBB GBC GBG
CCB CCC CCG GCB GCC GCG
CGB CGC CGG GGB GGC GGG
16
26
36
46
56
66
(d)
1
18
15
25
35
45
55
16
26
36
46
56
1
30
(d)
(e)
1
6
1
12
(e)
1
2
HHH HHT HTH THH HTT THT TTH TTT
(a) 1/8 (b) 1/8 (c) 3/8 (d) 7/8
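Probabilities of this kind can be checked by listing the sample space and counting. The sketch below does this for three fair coins; the three events shown are illustrative choices, not necessarily the events asked for in the exercise, and the helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# All 2**3 equally likely outcomes of tossing three fair coins, e.g. "HHT".
sample_space = ["".join(t) for t in product("HT", repeat=3)]


def prob(event):
    """Probability of an event, given as a true/false test on an outcome string."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))


p_three_heads = prob(lambda o: o.count("H") == 3)
p_exactly_two_heads = prob(lambda o: o.count("H") == 2)
p_at_least_one_head = prob(lambda o: "H" in o)
```

Counting favourable outcomes over eight equally likely outcomes gives fractions with denominator 8, as in the answers.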
9
HHHH
HTHH
THHH
TTHH
3
(a)
8
HHHT
HHTH
HHTT
HTHT
HTTH
HTTT
THHT
THTH
THTT
TTHT
TTTH
TTTT
1
15
1
(b)
(c)
(d)
16
16
4
10
(a) 32
(b)
1
32
Exercise 3 - Mutually Exclusive and Exhaustive Events
1
(a) PQ , PS , QR , RS
(b) PR , QS
2
(a)
(b)
(c)
(d)
(e)
3
4
5
mutually exclusive
yes
no
yes
no
no
exhaustive
yes
no
no
no
no
1
8
0.3
2
3
1
,
,
5 10 5
3
7
4
(b)
,
,
5 10 5
1 1 1
( c)
,
,
2 2 2
(d) B and C are mutually exclusive
(a)
(e) 3 , 5 , 9 and 10 are in S but not in either of A , B or C so A , B and C are not exhaustive .
6
A and B are neither mutually exclusive nor exhaustive .
7
(a)
8
21
25
1
(a)
2
4
25
4
(b)
5
(b)
9
(a) 0.74
10
(i) (a) 0.2
(ii) (a) 40
(c)
7
10
(d)
1
2
(e)
1
5
(b) 0.74
(b) 0.57
(b) 86
11
(a)
12
1
3
13
14
5
5
( b)
36
18
(a) 0.1
(b) 0.5
P(Post) =
1
1
1
, P(Herald) =
, P(Moon) =
2
3
6
Exercise 4 - Independent Events
1
(a) unlikely
(d) definitely
(b) likely
(e) unlikely
2
(a) 0.21
(b) 0.75
3
A & B and B & C are pairs of independent events .
4
5
1 1 1
,
,
2 2 4
1
1
1
,
,
13 2 26
6
(a)
1
4
(b)
1
6
7
(i)
(a)
2
5
(b)
8
(i)
9
10
(c) unlikely
(f) likely
4
5
16
(ii) (a)
25
(a)
3
5
1
5
8
(b)
25
(b)
(c)
17
25
(a) 0.0125
(b) 0.0375
1/3125 = 0.00032
Exercise 5 - Tree Diagrams (With Replacement)
1
(a) Tree diagram: Ben's turn, then Julie's turn. Each branch (Red or Black) has probability 1/2.
Ben's turn   Julie's turn   Outcome           Probability
Red          Red            Red and Red       P(RR) = 0.25
Red          Black          Red and Black     P(RB) = 0.25
Black        Red            Black and Red     P(BR) = 0.25
Black        Black          Black and Black   P(BB) = 0.25
(b)
(i) 0.25
2
(a) 0.09
3
(a) Tree diagram: Theory (Pass 0.85, Fail 0.15), then Practical (Pass 0.65, Fail 0.35).
Theory   Practical   Outcome         Probability
Pass     Pass        Pass and Pass   P(PP) = 0.5525
Pass     Fail        Pass and Fail   P(PF) = 0.2975
Fail     Pass        Fail and Pass   P(FP) = 0.0975
Fail     Fail        Fail and Fail   P(FF) = 0.0525
(i) 0.5525 (ii) 0.0525 (iii) 0.395
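The branch products in this tree diagram can be reproduced with a short Python sketch. It assumes, as the products in the answer imply, that the practical result is independent of the theory result. Reading part (iii) as "exactly one pass" is also an assumption; the variable names are hypothetical.

```python
from itertools import product

# Branch probabilities read from the tree diagram.
p_theory = {"Pass": 0.85, "Fail": 0.15}
p_practical = {"Pass": 0.65, "Fail": 0.35}

# Multiply along each path of the tree: P(theory result) x P(practical result).
outcomes = {
    (t, p): p_theory[t] * p_practical[p]
    for t, p in product(p_theory, p_practical)
}

p_both_pass = outcomes[("Pass", "Pass")]
p_both_fail = outcomes[("Fail", "Fail")]
# Add the two paths with exactly one pass.
p_exactly_one_pass = outcomes[("Pass", "Fail")] + outcomes[("Fail", "Pass")]
```

The four path probabilities sum to 1, which is a useful check on any tree diagram.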
4
First ball
Second ball
Red
Outcome
Red and Red
1
3
Green
Red and Green
2
3
Red
Green and Red
Green
Green and Green
2
3
Red
2
3
1
3
Green
1
3
(a)
(iii) 0.25
(b) 0.49
0.85
(b)
(ii) 0.5
1
9
(b)
2
9
5
First coin
Second coin
Head
Outcome
Head and Head
1
2
Tail
Head and Tail
1
2
Head
Tail and Head
Tail
Tail and Tail
1
2
Head
1
2
1
2
Tail
1
2
(a) 1/4 (b) 1/4 (c) 1/2
6
(a) 0.64
7
(a) 0.3025 (b) 0.2025 (c) 0.495
8
(a)
(b) 0.32
(c) 0.04
First coin
Second coin
1
2
1
2
Third Coin
Head
Outcome
HHH
1
2
Tail
Head
HHT
HTH
Tail
Head
HTT
THH
Tail
Head
THT
TTH
Tail
TTT
Head
Head
1
2
1
2
Tail
1
2
1
2
1
2
1
2
1
2
Head
Tail
1
2
1
2
1
2
Tail
1
2
(b) (i)
1
8
1
216
(ii)
1
2
(iii)
3
8
5
72
(c)
2
27
9
(a)
10
(a) 0.216 (b) 0.288
11
(a) 0.343 (b) 0.189 (c) 0.784 (d) 0.657
12
(a)
1
64
(b)
(b)
5
32
(c)
27
64
Exercise 6 - Tree Diagrams (Without Replacement)
1
(a)
First disc
Second disc
Red
Outcome
Red and Red
Green
Red and Green
Red
Green and Red
Green
Green and Green
Goes for walk ?
Yes
Outcome
Yes and Yes
No
Yes and No
0.75
Yes
No and Yes
0.25
No
No and No
3
7
Red
1
2
4
7
4
7
1
2
Green
3
7
(b)
2
3
14
(a)
0.7
0.3
(c)
4
7
Sunny day ?
0.95
Yes
0.05
No
(b) 0.89
3
(a)
First round
0.7
Arnold
0.5
0.3
0.5
0.2
Jack
0.8
(b) 0.245
Second round
0.7
Arnold
0.3
0.2
Jack
0.8
0.7
Arnold
0.3
0.2
Jack
0.8
Third Round
Arnold
Outcome
AAA
Jack
Arnold
AAJ
AJA
Jack
Arnold
AJJ
JAA
Jack
Arnold
JAJ
JJA
Jack
JJJ
(c) 0.45
4
(a)
1st attempt
2nd attempt
Pass .................................................. Pass at 2nd attempt
0.8
0.55
Fail
0.2
13
34
(c)
1
221
(a)
1
15
(b)
4
15
(c)
11
15
(a)
2
21
(b)
3
7
6
0.24
7
9
19
8
9
Pass ......................... Pass at 3rd attempt
0.2
Fail .......................... Fail at 3rd attempt
(c) 0.088
(b)
(a)
0.8
Fail
3
51
5
(c)
(d)
188
221
10
21
1
≈ 0.0005 (b)
1990
3783
≈ 0.951
3980
10
(a)
11
7
31
12
(a) 0.00095
13
(a)
75
101
≈ 0.493 (b)
≈ 0.0886
152
1140
14
(a)
1
30
(b)
1
120
15
(a)
4
91
(b)
12
65
(c)
39
≈ 0.049
796
(b) 0.729
17
2
3
24 correct out of 40
18
(a)
16
Outcome
Pass .......................................................................... Pass at 1st attempt
0.45
(b) 0.44
3rd attempt
(c)
(c)
$\frac{p(p - 1)}{(p + q)(p + q - 1)}$
(c)
14
≈ 0.246
57
3
; on 50 occasions
10
27
91
(b)
$\frac{q(q - 1)}{(p + q)(p + q - 1)}$
(c)
$\frac{2pq}{(p + q)(p + q - 1)}$
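The three algebraic answers can be checked numerically. The sketch below assumes a container holding p items of one kind and q of another, with two items drawn without replacement; that setting, and the function name, are assumptions made for illustration.

```python
from fractions import Fraction


def two_draw_probs(p, q):
    """Probabilities for two draws without replacement from p + q items."""
    total = p + q
    denom = total * (total - 1)
    both_first_kind = Fraction(p * (p - 1), denom)
    both_second_kind = Fraction(q * (q - 1), denom)
    one_of_each = Fraction(2 * p * q, denom)
    return both_first_kind, both_second_kind, one_of_each


# Example values chosen for illustration: p = 3 and q = 4.
both_p, both_q, mixed = two_draw_probs(3, 4)
```

The three expressions always sum to 1, because p(p − 1) + q(q − 1) + 2pq = (p + q)(p + q − 1).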
Exercise 7 - Combinations
1
2
3
4
5
6
7
8
9
10
11
12
(a) 35
(a) $\binom{9}{6}$ =
56
56
(a) 120
28 , 10
1716 , 924
(a) 142506
(a) 165
(a) 70
(a) 126
(a) 44352
(b) 15
(c) 8
(d) 1
$\binom{9}{3}$ = 84 (b) $\binom{7}{3}$ = $\binom{7}{4}$ = 35
(b) 680
(b)
(b)
(b)
(b)
(c) 593775
(d) 2598960
23751
45 (assuming best debater and oldest pupil are different)
168
252
(c) 72
(b) 125474
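The binomial coefficients written out in the answers above can be verified with Python's standard library. This is a minimal check of only those values that the answers state explicitly; `math.comb` requires Python 3.8 or later.

```python
from math import comb

# Values stated explicitly in the answers: C(9,6) = C(9,3) = 84 and C(7,3) = C(7,4) = 35.
c_9_6 = comb(9, 6)
c_9_3 = comb(9, 3)
c_7_3 = comb(7, 3)
c_7_4 = comb(7, 4)

# Symmetry: choosing r items to take is the same as choosing n - r items to leave.
symmetric = all(comb(9, r) == comb(9, 9 - r) for r in range(10))
```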
Exercise 8 - Combinations (Probability)
1
2
3
4
5
5
28
3
5
18
35
3
7
2
5
6
(a)
7
7
30
8
(a)
9
(a)
10
(a)
11
(a)
12
(a)
33
646
46
833
1
55
1
7735
253
22372
1
14
(b)
613
646
(b)
(b)
(b)
(b)
(b)
1
270725
32
495
9
1547
3289
39151
1
2
15229
54145
92
(c)
99
46
(c)
7735
14927
(c)
156604
(c)
Exercise 9 - Simulation
1
Let Heads be represented by the digits 0 , 1 , 2 , 3 and 4 and Tails by the digits
5 , 6 , 7 , 8 and 9 .
Digit   9 8 4 1 9 7 6 4 0 1 1 5 4 1 2 6 8 4 1 8
Toss    T T H H T T T H H H H T H H H T T H H T
This simulation produced 11 Heads and 9 Tails .
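The coding rule above is easy to reproduce in Python. The sketch uses the 20 random digits listed in the answer and the stated rule (0 to 4 for Heads, 5 to 9 for Tails); the variable names are hypothetical.

```python
# The 20 random digits used in the answer above.
digits = [9, 8, 4, 1, 9, 7, 6, 4, 0, 1, 1, 5, 4, 1, 2, 6, 8, 4, 1, 8]

# Digits 0-4 represent Heads and digits 5-9 represent Tails.
tosses = ["H" if d <= 4 else "T" for d in digits]

heads = tosses.count("H")
tails = tosses.count("T")
```

Each digit is equally likely, so each rule covers five of the ten digits and models a fair coin.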
2
Ignoring the digits 0 , 7 , 8 , 9 , the 10 rolls of the unbiased die are simulated as
follows :
4, 3, 3, 1, 5, 2, 1, 6, 5, 5.
After dividing the data into pairs and ignoring two digit numbers of 50 and
above (and any duplication which occurs) , the following selection is obtained :
40, 4, 3, 11, 12, 41, 37, 24.
Again, dividing the data into pairs and ignoring two digit numbers of 79 and
above (and any duplication which occurs), the following selection is obtained: pupils numbered
33, 72, 42, 38, 22, 13.
Exercise 10 - Discrete Probability Distributions
1
1
12
(b) probability distribution with k = 0.1
(a) probability distribution with k =
(c) not a probability distribution since
2
3
4
5
6
1
15
(a) 0.07
1
(a)
4
4
(a)
5
2
3
(b) 0.43
1
(b)
2
2
(b)
3
x
1
(a)
(c) 0.78
7
(c)
12
13
(c)
15
2
3
9
, k =
k
2k
1
k
P(Y = y)
5
1
12
(a)
(b)
25
25
p = 0.5 , q = 0.25
3k
4k
2
2k
5
3
3k
54
0
1
2
1
4
0
1
2
1
1
4
25
36
10
36
(a)
h
10
> 1
4
(a)
y
8
i
(b)
P(X = x)
7
∑p
1
10
4
4k
5
(b)
3
10
5
, k =
k
1
3
(b)
4
5
(a)
P(H = h)
s
2
(b)
P(S = s)
1
36
b          0      1      2
P(B = b)   0.49   0.42   0.09
(c)
(d)
t          2      3      4      5     6      7     8      9     10     11     12
P(T = t)   1/36   1/18   1/12   1/9   5/36   1/6   5/36   1/9   1/12   1/18   1/36
d          0     1      2     3     4     5
P(D = d)   1/6   5/18   2/9   1/6   1/9   1/18
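Distributions like these can be built by enumerating equally likely outcomes. The sketch below assumes that T is the total and D the (positive) difference of the scores on two fair six-sided dice. That reading is inferred from the probabilities rather than stated here, and the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# The 36 equally likely ordered pairs of scores on two fair dice.
rolls = list(product(range(1, 7), repeat=2))


def distribution(statistic):
    """Probability distribution of statistic(a, b) over all 36 rolls."""
    counts = Counter(statistic(a, b) for a, b in rolls)
    return {value: Fraction(n, len(rolls)) for value, n in sorted(counts.items())}


dist_total = distribution(lambda a, b: a + b)
dist_difference = distribution(lambda a, b: abs(a - b))
```

`Fraction` reduces automatically, so 10/36 is reported as 5/18, matching the form used in the table.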
g          0     1     2     3
P(G = g)   1/8   3/8   3/8   1/8
1
12
4
1
6
(e)
(f)
11
s
P(S= s)
0
1
12
1
1
12
S
P(S= s)
10
1
12
12
12
2
1
6
3
5
6
8
1
12
1
12
1
12
1
12
s          2      3     4      5     6      7     8
P(S = s)   1/16   1/8   3/16   1/4   3/16   1/8   1/16
d          0     1     2     3
P(D = d)   1/4   3/8   1/4   1/8
(b)
(d)
s          -80    -50   -20    0     10    30    40     60    80
P(S = s)   1/16   1/8   3/16   1/8   1/8   1/8   1/16   1/8   1/16
13
14
r          1     2      3
P(R = r)   1/6   5/36   25/36
(b)
s          0         1
P(S = s)   125/216   91/216
(c)
r          0      1      2
P(R = r)   1/10   6/10   3/10
15
m          0      1      2       3
P(M = m)   1/21   5/14   10/21   5/42
16
Exercise 11 - Discrete Probability Distributions (Expectation and Variance)
1
(a) 2.3
2
p=8
x = 0.4 , y = 0.1
1
k=
, E(T) = 3
20
a loss of £1.20
3
4
5
6
7
8
1
(b) 3 12
(c) 0.41
15
a loss of £20
a loss per question of 0.2 marks
(a) 2.5p (b) 2.5p
75p
17
(a) 0.94 (b) 1.64 (c)
(d) 84
36
a = 7
(a) E(Z) = 4a + 8b (b) a = 0.2 , b = 0.4
E(N) = 1.25 , Var(N) = 0.8875
1
3
46
(a) k =
(b) E(X) = 3 11 , Var(X) = 2 121
77
11
E(X) = 3 12 , Var(X) = 2 12
16
E(T) = 1 , Var(T) = 12
17
(a) E(S) = 7 , Var(S) = 5 5/6
9
10
11
12
13
14
(b) E(D) = 1 17/18 , Var(D) = 2 17/324
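Expectation and variance follow directly from a probability table using E(X) = Σ x p and Var(X) = Σ x² p − [E(X)]². The sketch below assumes S is the total and D the difference of two fair dice, a reading inferred from the stated values rather than given here. The helper names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls


def distribution(statistic):
    """Probability distribution of statistic(a, b) over two fair dice."""
    counts = Counter(statistic(a, b) for a, b in rolls)
    return {value: Fraction(n, len(rolls)) for value, n in counts.items()}


def mean_and_variance(dist):
    """E(X) = sum of x*p; Var(X) = sum of x^2*p minus E(X)^2."""
    mean = sum(x * p for x, p in dist.items())
    variance = sum(x * x * p for x, p in dist.items()) - mean ** 2
    return mean, variance


mean_s, var_s = mean_and_variance(distribution(lambda a, b: a + b))
mean_d, var_d = mean_and_variance(distribution(lambda a, b: abs(a - b)))
```

Written as mixed numbers, 35/6 = 5 5/6, 35/18 = 1 17/18 and 665/324 = 2 17/324.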
18
E(R) = 2 19/36 , Var(R) = 755/1296
19
E(N) = 0.8 , Var(N) = 0.36
20
45
E(R) = 1 14 , Var(R) = 112
21
E(W) = 1 75 , Var(W) =
24
49
Exercise 12 - Discrete Probability Distributions (Simulation)
(Random numbers have been selected from the top left hand corner and then to the
right using the first digit of each triple.)
1
(a)
(b)
(c)
(d)
2
Simulations are 9 , 8 , 6 , 7 , 6 , 5 , 7 , 6 , 7 , 8 .
3
Simulations are 4 , 4 , 3 , 4 , 3 , 2 , 4 , 3 , 4 , 4 .
4
Simulations are 4 , 3 , 1 , 3 , 1 , 1 , 3 , 1 , 2 , 3 .
Simulations are 3 , 2 , 0 , 2 , 0 , 0 , 2 , 0 , 1 , 2
Simulations are 5 , 4 , 2 , 3 , 2 , 1 , 3 , 2 , 3 , 4
Simulations are 4 , 4 , 2 , 4 , 2 , 1 , 4 , 2 , 3 , 4
Simulations are 6 , 6 , 4 , 5 , 4 , 4 , 5 , 4 , 5 , 6
.
.
.
.
Exercise 13 - Continuous Probability Distributions
1
(a)
f(x)
(b)
3
4
(b)
0.675
1
0
∫
2
(a)
2
2
0
x
f(x) dx = 1
f(x)
1.4
0
∫
1
1
0
x
f(x) dx = 1
3
(a)
(b)
∫
2
0
4
f(x) dx = 1
(a)
(b)
k =
5
(a)
0.5
1
16
3
16
f(x)
(b) (i)
7
24
(ii)
5
6
7
3
1
3
6
7
8
0
1
k =
7
3
32
4
(a) k =
27
(a) k =
(a)
(b)
x
7
5
32
(b) (i)
16
27
(ii)
8
9
f(x)
(b)
13
8
20
0
9
(a) k =
3
4
15
(b)
30
4
27
x
243
256
10
(a) f(x)
(b) f(x)
1
3
1
2
0
6
x
Modal value is 0 .
(d) f(x)
16
9
27
64
2
3
Modal value is 23
3
x
Modal value is 1.5 .
(c) f(x)
0
1.5
0
3
1 x
0
.
Modal value is 3 34 .
3
4
5
x
Exercise 14 - Continuous Probability Distributions (Expectation and Variance)
1
9
1
1
(a) k =
2
(a) E(X) =
3
(a) E(X) = 0
4
(a) E(X) = 0
5
(a) k =
6
(a) k = 3 7
(b) E(X) = 1 7
7
(b) (i) 1.875
(ii) 0.224
8
(a) k =
9
(a) k = 4
(b) E(X) = 2 4
5
9
4
65
3
1
108
(c) Var(X) =
27
80
13
162
1
(b) Var(X) =
5
1
(b) Var(X) =
12
(b) Var(X) =
(b) E(X) = 2 325 ≈ 2.6
194
2
(c) Var(X) =
(c) Var(X) =
≈ 0.0765
3
49
µ3
(8 - µ )
432
(b) µ = 3.6
(c)
(b) E(X) = 8.88
(c) Var(X) = 8.3136
24242
316875
(d)
41
90
Exercise 15 - Cumulative Distribution Function
1
0

(a) F(x) = x
1

for x < 2
for 2 ≤ x ≤ 3
for x > 3

0
1
(b) F(x) =  4 x(4 - x)

1

 0
1
(c) F(x) =  2 x 3
 1

for x < 0
for 0 ≤ x ≤ 2
for x > 2
for x < -1
for -1 ≤ x ≤ 1
for x > 1

0
 2
(d) F(x) = 3x - x - x 3

1


0
1 2
(e) F(x) =  4 x (3 - x)

1


0
 4
(f) F(x) = x (5 - 4x)

1

2
3
(a) 8.49
(a) 0.29
4
(a) a = 7 , b = - 5 (b) 0.16
2
5
(a)
(b) 2.12
(b) 0.37
for x < 0
for 0 ≤ x ≤ 1
for x > 1
for x < 0
for 0 ≤ x ≤ 2
for x > 2
for x < 0
for 0 ≤ x ≤ 1
for x > 1
(c) 0
(d) 0.55
for 0 ≤ x ≤ 1
elsewhere
2x
f(x) = 
0
f(x)
(e) 1
 1
(b) f(x) =  4
 0
f(x)
for 4 ≤ x ≤ 8
elsewhere
1
4
2
0
(f) 0.54
1
x
0
4
8
x
1
 4 x
(c) f(x) = 
 0
5
for 1 ≤ x ≤ 3
elsewhere
for 0 ≤ x ≤ 1
2 - 2x
(d) f(x) = 
 0
f(x)
elsewhere
f(x)
2
3
4
1
4
1
0
3 x
0
1
x
Exercise 16 - Continuous Probability Distributions (Miscellaneous)
1
(a) f(x)
(b) 0.5
0

(c) F(x) = 05
.x
1

1
2
0
2
1
3
x
1
k =
2
(a) f(x)
3
2
for x < 1
for 1 ≤ x ≤ 3
for x > 3
mode 0.5 , median 2
(d) mean
(b)

0
1
F(x) =  2 x(x -19)

1

(c)
mode 11 , median 10.62
for x < 10
for 10 ≤ x ≤ 11
for x > 11
1
2
0
3
10
11
x
(b) E(X) = 3 1 , Var(X) = 2067 ≈ 0.527
(a) f(x)
28
16
21
1
21
0
1
4
x
(c)
 0
1 3
F(x) =  x
63
 1

(d)
mode 4 , median 3.19
3920
for x < 1
for 1 ≤ x ≤ 4
for x > 4
4
19
3
, Var(X) =
80
4

0
1
(c) F(x) =  16 x(12 - x 2 )

1

(a) f(x)
(b) E(X) =
3
4
2
0
5
(d) mode 0 , median 0.69
x
f(x)
(a)
(b) E(X) = 0 , Var(X) =
3
4
1
4
6
(a) k =
1
108
1
(b) E(X) = 3.6
0

 1 3
(d) F(x) =  432 x (8 - x)

1

(e) mode is 4
7
1
(a) k = 500 000
000
7
15

0
1
2
(c) F(x) =  4 x(x + 1)

1

-1
for x < 0
for 0 ≤ x ≤ 2
for x > 2
for x < -1
for -1 ≤ x ≤ 1
for x > 1
(d) mode -1 or 1 , median 0
x
16
27
for x < 0
for 0 ≤ x ≤ 6
(c)
for x > 6
(b) E(X) = 66 23
(c) 0.26272
0


1
(d) F(x) =  2 500 000 000 x 4 (125 - x)

1

for x < 0
for 0 ≤ x ≤ 100
for x > 100
(e) mode is 75
8
9
 1 (2 + x)
for 0 ≤ x ≤ 2
(a) f(x) =  6
 0
elsewhere
(b) mode is 2 (c) 1.16 (d) 0.65
4 x(1 - x 2 )
for 0 ≤ x ≤ 1
(a) f(x) = 
0
elsewhere

1
(b) mode is
(c) 0.4185
3
 501 (10 - x)
10 (a) f(x) = 

0
(b) 0.12
5
(d) 5 9
for 0 ≤ x ≤ 10
elsewhere
1
(c) mode is 0 , mean is 3 3 , median is 2.93
(e) IQR is 3.66
ANSWERS - CORRELATION AND LINEAR REGRESSION
Exercise 1 - Correlation
1 (a) Strong linear relationship with narrow spread or scatter- data is highly
correlated.
$S_{XX} = 91 - \frac{1}{6}(21)^2 = 17.5$, $S_{YY} = 1558 - \frac{1}{6}(88)^2 = 267.3$, $S_{XY} = 375 - \frac{1}{6}(21)(88) = 67$
$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{67}{\sqrt{(17.5)(267.33)}} = 0.980$
(b) Data appears to be linearly related with negative slope. Linear model seems
appropriate.
$S_{XX} = 91 - \frac{1}{6}(21)^2 = 17.5$, $S_{YY} = 1072 - \frac{1}{6}(76)^2 = 109.3$, $S_{XY} = 228 - \frac{1}{6}(21)(76) = -38$
$r = \frac{S_{XY}}{\sqrt{S_{XX} S_{YY}}} = \frac{-38}{\sqrt{(17.5)(109.33)}} = -0.869$
i.e. a strong negative correlation.
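The two calculations above use only the summary totals, so they can be reproduced with a small Python function. This is a sketch under that assumption; the function and argument names are hypothetical, and n = 6 is read from the 1/6 factors in the working.

```python
from math import sqrt


def correlation_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Product moment correlation coefficient r from summary totals."""
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


# Totals taken from parts (a) and (b) above.
r_a = correlation_from_sums(6, 21, 88, 91, 1558, 375)
r_b = correlation_from_sums(6, 21, 76, 91, 1072, 228)
```

Rounded to three decimal places these give 0.980 and −0.869, in agreement with the answers.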
(c) Scattergraph shows a curvilinear relationship. Data are not independent although
r is close to zero (r = -0.102). There is a quadratic relationship in this example.
(d) There is no obvious relationship between the variables, and the correlation coefficient
is close to zero (r = 0.113).
2
(a) Graph shows a weak positive relationship with a fair degree of scatter.
r = 0.365
(b) Both suggest that there is little evidence for claiming a linear
relationship between the maths and physics marks.
3 (a) Graph looks roughly linear; r = 0.758
(b) There is some evidence for a linear relationship between body fat and high
blood pressure.
4
(a) r = 0.389
(b) There is evidence of a linear relationship between father and son’s height up
to 190 cm.
Exercise 2 - Linear Regression
1
x         y         xy         x²         y²
1         1         1          1          1
1         3         3          1          9
5         2         10         25         4
5         4         20         25         16
Σx = 12   Σy = 10   Σxy = 34   Σx² = 52   Σy² = 30

Question 2, squared columns:
x²         y²
0          16
4          4
4          0
25         16
36         16
Σx² = 69   Σy² = 52
$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2 = 52 - \frac{1}{4}(12)^2 = 16$
$S_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right) = 34 - \frac{1}{4}(12)(10) = 4$
The least squares regression line is $y = \alpha + \beta x$, where
$\beta$ is estimated by $b = \frac{S_{xy}}{S_{xx}} = \frac{4}{16} = 0.25$ and
$\alpha$ is estimated by $a = \bar{y} - b\bar{x} = 2.5 - 0.25 \times 3 = 1.75$.
Thus the fitted least squares model is
$\hat{y} = 0.25x + 1.75$
(b) 0.25 × 3 + 1.75 = 2.5
2
x         y         xy
0         4         0
2         2         4
2         0         0
5         -4        -20
6         -4        -24
Σx = 15   Σy = -2   Σxy = -40
$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2 = 69 - \frac{1}{5}(15)^2 = 24$
$S_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right) = -40 - \frac{1}{5}(15)(-2) = -34$
$b = \frac{-34}{24} = -1.42$ and $a = -0.4 - \left(\frac{-34}{24}\right)(3) = 3.85$
Thus the fitted least squares model is
y = 3.85 - 1.42x
3.
x         y          xy          x²          y²
1         30         30          1           900
3         24         72          9           576
5         22         110         25          484
7         19         133         49          361
9         16         144         81          256
11        13         143         121         169
13        10         130         169         100
15        6          90          225         36
Σx = 64   Σy = 140   Σxy = 852   Σx² = 680   Σy² = 2882
$S_{xx} = \sum x^2 - \frac{1}{n}\left(\sum x\right)^2 = 680 - \frac{1}{8}(64)^2 = 168$
$S_{xy} = \sum xy - \frac{1}{n}\left(\sum x\right)\left(\sum y\right) = 852 - \frac{1}{8}(64)(140) = -268$
$b = \frac{-268}{168} = -1.60$ and $a = 17.5 - (-1.60)(8) = 30.3$
Thus the fitted least squares model is
y = 30.3 - 1.60 x
(c) The predicted number of plants is 14.3.
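The least squares working for questions 1 and 3 can be reproduced from the raw data with a short Python sketch. The function name is hypothetical. The code keeps full precision for b, whereas the answers round b before calculating a, so intermediate values may differ slightly in later decimal places.

```python
def least_squares(xs, ys):
    """Return (a, b) for the fitted line y = a + b*x using Sxy / Sxx."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = s_xy / s_xx
    a = sum(ys) / n - b * sum(xs) / n  # a = y-bar minus b times x-bar
    return a, b


# Question 1 data.
a1, b1 = least_squares([1, 1, 5, 5], [1, 3, 2, 4])

# Question 3 data.
a3, b3 = least_squares(
    [1, 3, 5, 7, 9, 11, 13, 15],
    [30, 24, 22, 19, 16, 13, 10, 6],
)
```

A fitted line always passes through the point of means, which is why substituting x = 3 in question 1 returns the mean of y, 2.5.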
4 (a) There appears to be a linear relationship between Maximum Heart Rate
and Age.
(b)
$S_{XX} = 13868 - \frac{1}{10}(366)^2 = 472.4$
$S_{XY} = 65329 - \frac{1}{10}(366)(1799) = -514.4$
$b = \frac{-514.4}{472.4} = -1.09$
$a = 179.9 - (-1.09)(36.6) = 220$
Equation is y = 220 - 1.09 x
(c) Max. Heart Rate of a 35 year old = 182
5
(a) y = 0.17 x + 37.0
(b) Family with income of £21 000 spends £4057 annually on food.
6
(a) There appears to be a positive linear relationship
(b) y = 0.23 x + 54
(c) Pupil scores 66 in summer test.
7
(a) There appears to be a strong positive linear relationship
(b) y = 1.66 x + 7.28
(c) cost per unit
(d) Production costs = £30 500
8
(b) y = 0.008 x + 0.023
(c) Expansion is 1.42 mm
9
(a) There appears to be a strong positive linear relationship
(b) y = 3.56 x - 24.0
(c) Stopping distance can be predicted for speeds between 9 and 32m/s.
(d) No, speed is too great (>32m/s).
10 (b) y = 2380 x - 28.4
(c) Volume of gas produced is 352 cm3.
11 (a) y = 0.790 x - 65.9
(b) weight is 69.98 kg ; weight is 80.25 kg.