statistics ai hl [614 marks]
1.
[Maximum mark: 6]
SPM.1.AHL.TZ0.17
Mr Burke teaches a mathematics class with 15 students. In this class there are 6
female students and 9 male students.
Each day Mr Burke randomly chooses one student to answer a homework
question.
In the first month, Mr Burke will teach his class 20 times.
(a)
Find the probability he will choose a female student 8 times.
[2]
(b)
The Head of Year, Mrs Smith, decides to select a student at random
from the year group to read the notices in assembly. There are 80
students in total in the year group. Mrs Smith calculates the
probability of picking a male student 8 times in the first 20
assemblies is 0.153357 correct to 6 decimal places.
Find the number of male students in the year group.
[4]
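One way to check both parts numerically is sketched below. It assumes each daily choice is independent and made from the whole group, so the count follows a binomial distribution; the helper name `binom_pmf` and the brute-force search over possible numbers of male students are illustrative choices, not part of the question.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Part (a): 6 of the 15 students are female; 20 lessons, 8 female choices.
prob_a = binom_pmf(8, 20, 6 / 15)

# Part (b): try every possible number of male students m out of 80 and keep
# those whose probability agrees with the quoted 0.153357 to 6 decimal places.
target = 0.153357
candidates = [m for m in range(81)
              if abs(binom_pmf(8, 20, m / 80) - target) < 5e-7]

print(round(prob_a, 4), candidates)
```

Searching over integers avoids solving the degree-20 equation in p directly, which would normally be done on a graphing calculator.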
2.
[Maximum mark: 27]
SPM.3.AHL.TZ0.1
Two IB schools, A and B, follow the IB Diploma Programme but have different
teaching methods. A research group tested whether the different teaching
methods lead to a similar final result.
For the test, a group of eight students were randomly selected from each school.
Both samples were given a standardized test at the start of the course and a
prediction for total IB points was made based on that test; this was then
compared to their points total at the end of the course.
Previous results indicate that both the predictions from the standardized tests
and the final IB points can be modelled by a normal distribution.
It can be assumed that:
the standardized test is a valid method for predicting the final IB points
that variations from the prediction can be explained through the
circumstances of the student or school.
(a)
Identify a test that might have been used to verify the null
hypothesis that the predictions from the standardized test can be
modelled by a normal distribution.
[1]
(b)
State why comparing only the final IB points of the students from
the two schools would not be a valid test for the effectiveness of the
two different teaching methods.
[1]
The data for school A is shown in the following table.
For each student, the change from the predicted points to the final points
(f − p) was calculated.
(c.i)
Find the mean change.
[1]
(c.ii)
Find the standard deviation of the changes.
[2]
(d)
Use a paired t-test to determine whether there is significant evidence
that the students in school A have improved their IB points since the
start of the course.
[4]
The data for school B is shown in the following table.
(e.i)
Use an appropriate test to determine whether there is evidence, at
the 5% significance level, that the students in school B have
improved more than those in school A.
[5]
(e.ii)
State why it was important to test that both sets of points were
normally distributed.
[1]
School A also gives each student a score for effort in each subject. This effort
score is based on a scale of 1 to 5 where 5 is regarded as outstanding effort.
It is claimed that the effort put in by a student is an important factor in improving
upon their predicted IB points.
(f.i)
Perform a test on the data from school A to show it is reasonable to
assume a linear relationship between effort scores and
improvements in IB points. You may assume effort scores follow a
normal distribution.
[3]
(f.ii)
Hence, find the expected improvement between predicted and final
points for an increase of one unit in effort grades, giving your
answer to one decimal place.
[1]
A mathematics teacher in school A claims that the comparison between the two
schools is not valid because the sample for school B contained mainly girls and
that for school A, mainly boys. She believes that girls are likely to show a greater
improvement from their predicted points to their final points.
She collects more data from other schools, asking them to class their results into
four categories as shown in the following table.
(g)
Use an appropriate test to determine whether showing an
improvement is independent of gender.
[6]
(h)
If you were to repeat the test performed in part (e) intending to
compare the quality of the teaching between the two schools,
suggest two ways in which you might choose your sample to
improve the validity of the test.
[2]
3.
[Maximum mark: 7]
EXN.1.AHL.TZ0.12
It is believed that the power P of a signal at a point d km from an antenna is
inversely proportional to $d^n$ where $n \in \mathbb{Z}^+$.
The value of P is recorded at distances of 1 m to 5 m and the values of
$\log_{10} d$ and $\log_{10} P$ are plotted on the graph below.
(a)
Explain why this graph indicates that P is inversely proportional to $d^n$.
[2]
The values of $\log_{10} d$ and $\log_{10} P$ are shown in the table below.
(b)
Find the equation of the least squares regression line of $\log_{10} P$
against $\log_{10} d$.
[2]
(c.i)
Use your answer to part (b) to write down the value of n to the
nearest integer.
[1]
(c.ii)
Find an expression for P in terms of d.
[2]
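The link between the inverse power law and the straight-line plot can be sketched as follows. The constant of proportionality $k$, and the symbols $m$ and $c$ for the gradient and intercept of the regression line, are names introduced here for illustration and do not appear in the question.

```latex
P = \frac{k}{d^{n}}
\;\Longrightarrow\;
\log_{10} P = \log_{10} k - n \log_{10} d
```

Under this model the log-log plot is a straight line with gradient $m = -n$ and intercept $c = \log_{10} k$, so a fitted line gives $n \approx -m$ and $P = 10^{c}\, d^{m}$.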
4.
[Maximum mark: 26]
EXN.3.AHL.TZ0.1
An estate manager is responsible for stocking a small lake with fish. He begins by
introducing 1000 fish into the lake and monitors their population growth to
determine the likely carrying capacity of the lake.
After one year an accurate assessment of the number of fish in the lake is taken
and it is found to be 1200.
Let N be the number of fish t years after the fish have been introduced to the
lake.
Initially it is assumed that the rate of increase of N will be constant.
(a)
Use this model to predict the number of fish in the lake when t = 8.
[2]
When t = 8 the estate manager again decides to estimate the number of fish in
the lake. To do this he first catches 300 fish and marks them, so they can be
recognized if caught again. These fish are then released back into the lake. A few
days later he catches another 300 fish, releasing each fish after it has been
checked, and finds 45 of them are marked.
(b)
Assuming the proportion of marked fish in the second sample is
equal to the proportion of marked fish in the lake, show that the
estate manager will estimate there are now 2000 fish in the lake.
[2]
Let X be the number of marked fish caught in the second sample, where X is
considered to be distributed as B(n, p). Assume the number of fish in the lake is
2000.
(c.i)
Write down the value of n and the value of p.
[2]
(c.ii)
State an assumption that is being made for X to be considered as
following a binomial distribution.
[1]
The estate manager decides that he needs bounds for the total number of fish in
the lake.
(d.i)
Show that an estimate for Var(X) is 38.25.
[2]
(d.ii)
Hence show that the variance of the proportion of marked fish in the
sample, $\mathrm{Var}\left(\frac{X}{300}\right)$, is 0.000425.
[2]
The estate manager feels confident that the proportion of marked fish in the lake
will be within 1.5 standard deviations of the proportion of marked fish in the
sample and decides these will form the upper and lower bounds of his estimate.
(e.i)
Taking the value for the variance given in (d) (ii) as a good
approximation for the true variance, find the upper and lower
bounds for the proportion of marked fish in the lake.
[2]
(e.ii)
Hence find upper and lower bounds for the number of fish in the
lake when t = 8.
[2]
(f)
Given this result, comment on the validity of the linear model used
in part (a).
[2]
The estate manager now believes the population of fish will follow the logistic
model $N(t) = \frac{L}{1 + Ce^{-kt}}$ where L is the carrying capacity and C, k > 0.
The estate manager would like to know if the population of fish in the lake will
eventually reach 5000.
(g.i)
Assuming a carrying capacity of 5000 use the given values of N(0)
and N(1) to calculate the parameters C and k.
[5]
(g.ii)
Use these parameters to calculate the value of N(8) predicted by
this model.
[2]
(h)
Comment on the likelihood of the fish population reaching 5000.
[2]
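The arithmetic behind parts (b) to (g) can be checked with a short script. This is only a sketch under the assumptions stated in the question (equal proportions in the sample and the lake, a binomial count, and a carrying capacity of 5000); the variable names are illustrative.

```python
from math import exp, log, sqrt

# Part (b): capture-recapture estimate, assuming 45/300 = 300/N.
marked, second_sample, recaught = 300, 300, 45
N_est = marked * second_sample / recaught

# Parts (c)-(d): X ~ B(n, p) with the lake population taken as 2000.
n, p = second_sample, marked / 2000
var_X = n * p * (1 - p)
var_prop = var_X / n**2            # Var(X / 300)

# Part (e): bounds 1.5 standard deviations either side of the proportion.
sd_prop = sqrt(var_prop)
p_low, p_high = p - 1.5 * sd_prop, p + 1.5 * sd_prop
N_low, N_high = marked / p_high, marked / p_low   # larger proportion -> fewer fish

# Part (g): logistic model N(t) = L / (1 + C e^{-kt}) with L = 5000,
# N(0) = 1000 and N(1) = 1200.
L = 5000
C = L / 1000 - 1
k = -log((L / 1200 - 1) / C)
N8 = L / (1 + C * exp(-k * 8))

print(N_est, var_X, var_prop, (N_low, N_high), (C, k), N8)
```

Note that the bounds on the population are obtained by dividing the 300 marked fish by the opposite bound on the proportion.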
5.
[Maximum mark: 10]
EXM.1.AHL.TZ0.15
Adesh wants to model the cooling of a metal rod. He heats the rod and records
its temperature as it cools.
He believes the temperature can be modeled by $T(t) = ae^{bt} + 25$, where
$a, b \in \mathbb{R}$.
(a)
Show that $\ln(T - 25) = bt + \ln a$.
[2]
(b)
Find the equation of the regression line of $\ln(T - 25)$ on $t$.
[3]
Hence
(c.i)
find the value of a and of b.
[3]
(c.ii)
predict the temperature of the metal rod after 3 minutes.
[2]
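The linearization used in this question can be sketched in three steps, assuming $a > 0$ and $T > 25$ so that both logarithms are defined:

```latex
T = ae^{bt} + 25
\;\Longrightarrow\;
T - 25 = ae^{bt}
\;\Longrightarrow\;
\ln(T - 25) = \ln a + bt
```

A regression line of $\ln(T - 25)$ on $t$ therefore has gradient $b$ and intercept $\ln a$, so $a$ is recovered as $e^{\text{intercept}}$.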
6.
[Maximum mark: 10]
EXM.1.AHL.TZ0.55
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The
owner believes that the number of brown eggs in a box can be modelled by a
binomial distribution. He examines 100 boxes and obtains the following data.
(a.i)
Calculate the mean number of brown eggs in a box.
[1]
(a.ii)
Hence estimate p, the probability that a randomly chosen egg is
brown.
[1]
(b)
By calculating an appropriate $\chi^2$ statistic, test, at the 5% significance
level, whether or not the binomial distribution gives a good fit to
these data.
[8]
7.
[Maximum mark: 15]
EXM.1.AHL.TZ0.59
The heights, x metres, of the 241 new entrants to a men’s college were measured
and the following statistics calculated.
$\sum x = 412.11$, $\sum x^2 = 705.5721$
(a)
Calculate unbiased estimates of the population mean and the
population variance.
[3]
The Head of Mathematics decided to use a $\chi^2$ test to determine whether or not
these heights could be modelled by a normal distribution. He therefore divided
the data into classes as follows.
(b.i)
State suitable hypotheses.
[1]
(b.ii)
Calculate the value of the $\chi^2$ statistic and state your conclusion
using a 10% level of significance.
[11]
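For part (a), the unbiased estimates follow directly from the two summary statistics. A minimal sketch, using the standard formula that divides the corrected sum of squares by n - 1 (variable names are illustrative):

```python
n = 241
sum_x = 412.11
sum_x2 = 705.5721

# Unbiased estimate of the population mean.
mean = sum_x / n

# Unbiased estimate of the population variance:
# corrected sum of squares divided by n - 1.
var_unbiased = (sum_x2 - sum_x**2 / n) / (n - 1)

print(mean, var_unbiased)
```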
8.
[Maximum mark: 14]
EXM.1.AHL.TZ0.58
The number of telephone calls received by a helpline over 80 one-minute
periods are summarized in the table below.
(a)
Find the exact value of the mean of this distribution.
[2]
(b)
Test, at the 5% level of significance, whether or not the data can be
modelled by a Poisson distribution.
[12]
9.
[Maximum mark: 10]
EXM.1.AHL.TZ0.15
Adesh wants to model the cooling of a metal rod. He heats the rod and records
its temperature as it cools.
He believes the temperature can be modeled by $T(t) = ae^{bt} + 25$, where
$a, b \in \mathbb{R}$.
(a)
Show that $\ln(T - 25) = bt + \ln a$.
[2]
(b)
Find the equation of the regression line of $\ln(T - 25)$ on $t$.
[3]
Hence
(c.i)
find the value of a and of b.
[3]
(c.ii)
predict the temperature of the metal rod after 3 minutes.
[2]
10.
[Maximum mark: 10]
EXM.1.AHL.TZ0.55
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The
owner believes that the number of brown eggs in a box can be modelled by a
binomial distribution. He examines 100 boxes and obtains the following data.
(a.i)
Calculate the mean number of brown eggs in a box.
[1]
(a.ii)
Hence estimate p, the probability that a randomly chosen egg is
brown.
[1]
(b)
By calculating an appropriate $\chi^2$ statistic, test, at the 5% significance
level, whether or not the binomial distribution gives a good fit to
these data.
[8]
11.
[Maximum mark: 12]
EXM.2.AHL.TZ0.24
The hens on a farm lay either white or brown eggs. The eggs are put into boxes
of six. The farmer claims that the number of brown eggs in a box can be
modelled by the binomial distribution, B(6, p). By inspecting the contents of 150
boxes of eggs she obtains the following data.
(a)
Show that this data leads to an estimated value of p = 0.4.
[1]
(b)
Stating null and alternative hypotheses, carry out an appropriate
test at the 5% level to decide whether the farmer’s claim can be
justified.
[11]
12.
[Maximum mark: 20]
EXM.2.AHL.TZ0.29
(a)
A horse breeder records the number of births for each of 100 horses
during the past eight years. The results are summarized in the
following table:
Stating null and alternative hypotheses carry out an appropriate test
at the 5% significance level to decide whether the results can be
modelled by B(6, 0.5).
[10]
(b)
Without doing any further calculations, explain briefly how you
would carry out a test, at the 5% significance level, to decide if the
data can be modelled by B(6, p), where p is unspecified.
[2]
(c)
A different horse breeder collected data on the time and outcome of
births. The data are summarized in the following table:
Carry out an appropriate test at the 5% significance level to decide
whether there is an association between time and outcome.
[8]
13.
[Maximum mark: 27]
EXM.3.AHL.TZ0.9
In this question you will explore possible models for the spread of an infectious disease.
An infectious disease has begun spreading in a country. The National Disease
Control Centre (NDCC) has compiled the following data after receiving alerts
from hospitals.
A graph of n against d is shown below.
The NDCC want to find a model to predict the total number of people infected,
so they can plan for medicine and hospital facilities. After looking at the data,
they think an exponential function in the form $n = ab^d$ could be used as a
model.
(a)
Use an exponential regression to find the value of a and of b, correct
to 4 decimal places.
[3]
Use your answer to part (a) to predict
(b.i)
the number of new people infected on day 6.
[3]
(b.ii)
the day when the total number of people infected will be greater
than 1000.
[2]
The NDCC want to verify the accuracy of these predictions. They decide to
perform a $\chi^2$ goodness of fit test.
(c)
Use your answer to part (a) to show that the model predicts 16.7
people will be infected on the first day.
[1]
The predictions given by the model for the first five days are shown in the table.
(d.i)
Explain why the number of degrees of freedom is 2.
[2]
(d.ii)
Perform a $\chi^2$ goodness of fit test at the 5% significance level. You
should clearly state your hypotheses, the p-value, and your
conclusion.
[5]
In fact, the first day when the total number of people infected is greater than
1000 is day 14, when a total of 1015 people are infected.
(e)
Give two reasons why the prediction in part (b)(ii) might be lower
than 14.
[2]
Based on this new data, the NDCC decide to try a logistic model in the form
$n = \frac{L}{1 + ce^{-kd}}$.
Use the data from days 1–5, together with day 14, to find the value of
(f.i)
L.
[2]
(f.ii)
c.
[1]
(f.iii)
k.
[1]
(g)
Hence predict the total number of people infected by this disease
after several months.
[2]
(h)
Use the logistic model to find the day when the rate of increase of
people infected is greatest.
[3]
14.
[Maximum mark: 29]
22N.3.AHL.TZ0.1
In this question, you will explore possible approaches to using historical
sports results for making predictions about future sports matches.
Two friends, Peter and Helen, are discussing ways of predicting the outcomes of
international football matches involving Argentina.
Peter suggests analysing historical data to help make predictions. He lists the
results of the most recent 240 matches in which Argentina played, in
chronological order, then considers blocks of four matches at a time. He counts
how many times Argentina has won in each block. The following table shows his
results for the 60 blocks of four matches.
(a)
Determine the mean number of wins per block of four matches for
Argentina.
[2]
Peter thinks that this data can be modelled by a binomial distribution with
n = 4 and decides to carry out a $\chi^2$ goodness of fit test.
(b)
Use Peter’s data to write down an estimate for the probability p for
this binomial model.
[1]
(c.i)
Use the binomial model to find the probability that Argentina win
zero matches in a block of four matches.
[1]
(c.ii)
Find the expected frequency for zero wins.
[2]
As some expected frequencies are less than 5, Peter combines rows in his table to
produce the following observed frequencies. He then uses his binomial model to
find appropriate expected frequencies, correct to one decimal place.
Peter uses this table to carry out a $\chi^2$ goodness of fit test, to test the hypothesis
that the data follows a binomial distribution with n = 4, at the 5% significance
level.
For this test, state
(d.i)
the null hypothesis;
[1]
(d.ii)
the number of degrees of freedom;
[1]
(d.iii) the p-value;
[2]
(d.iv) the conclusion, justifying your answer.
[2]
(e)
Using Peter’s binomial model, find the probability that Argentina will
win at least one of their next four international football matches.
[2]
Helen thinks that a better prediction might be made by considering the
transition between matches. To keep the model simple, she decides to use only
two states: Argentina won (A) or Argentina did not win (B). Helen looks at Peter’s
list of results and counts the number of times that:
Argentina won, twice in succession (AA),
Argentina won, then did not win (AB),
Argentina did not win, then won (BA),
Argentina did not win, twice in succession (BB).
She recorded the following results.
Helen uses the relative frequencies to estimate the probabilities in a transition
matrix.
(f.i)
Given that Argentina won the previous match, show that Helen’s
estimate for the probability of Argentina winning the next match is
$\frac{17}{29}$.
[2]
(f.ii)
Write down the transition matrix, T, for Helen’s model.
[2]
(g.i)
Show that the characteristic polynomial of T is
$1363\lambda^2 - 1263\lambda - 100 = 0$.
[3]
(g.ii)
Hence or otherwise, find the eigenvalues of T.
[1]
(g.iii)
Find the corresponding eigenvectors.
[3]
(h)
In her retirement, many years from now, Helen is planning to travel
to three consecutive international football matches involving
Argentina. Use Helen’s model to find the probability that Argentina
will win all three matches.
[4]
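The eigenvalues in part (g)(ii) are the roots of the quadratic quoted in part (g)(i), so they can be checked without the (missing) frequency table. The sketch below simply applies the quadratic formula to the stated coefficients; one root should equal 1, as is always the case for a transition matrix.

```python
from math import sqrt

# Coefficients of 1363*lam**2 - 1263*lam - 100 = 0, as quoted in part (g)(i).
a, b, c = 1363, -1263, -100

disc = b * b - 4 * a * c
roots = sorted([(-b - sqrt(disc)) / (2 * a),
                (-b + sqrt(disc)) / (2 * a)])

print(roots)  # smaller root first
```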
15.
[Maximum mark: 8]
22M.1.AHL.TZ2.9
A psychologist records the number of digits (d) of π that a sample of IB
Mathematics higher level candidates could recall.
(a)
Find an unbiased estimate of the population mean of d.
[1]
(b)
Find an unbiased estimate of the population variance of d.
[2]
The psychologist has read that in the general population people can remember
an average of 4.4 digits of π. The psychologist wants to perform a statistical test
to see if IB Mathematics higher level candidates can remember more digits than
the general population.
$H_0: \mu = 4.4$ is the null hypothesis for this test.
(c.i)
State the alternative hypothesis.
[1]
(c.ii)
Given that all assumptions for this test are satisfied, carry out an
appropriate hypothesis test. State and justify your conclusion. Use a
5% significance level.
[4]
16.
[Maximum mark: 13]
22M.2.AHL.TZ1.3
A Principal would like to compare the students in his school with a national
standard. He decides to give a test to eight students made up of four boys and
four girls. One of the teachers offers to find the volunteers from his class.
(a)
Name the type of sampling that best describes the method used by
the Principal.
[1]
The marks out of 40, for the students who took the test, are:
25, 29, 38, 37, 12, 18, 27, 31.
For the eight students find
(b.i)
the mean mark.
[2]
(b.ii)
the standard deviation of the marks.
[1]
The national standard mark is 25.2 out of 40.
(c)
Perform an appropriate test at the 5% significance level to see if the
mean marks achieved by the students in the school are higher than
the national standard. It can be assumed that the marks come from a
normal population.
[5]
(d)
State one reason why the test might not be valid.
[1]
Two additional students take the test at a later date and the mean mark for all
ten students is 28.1 and the standard deviation is 8.4.
For further analysis, a standardized score out of 100 for the ten students is
obtained by multiplying the scores by 2 and adding 20.
For the ten students, find
(e.i)
their mean standardized score.
[1]
(e.ii)
the standard deviation of their standardized score.
[2]
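The summary statistics and the effect of the linear rescaling can be sketched with Python's standard library. Whether part (b)(ii) expects the population form (dividing by n) or the sample form (dividing by n - 1) of the standard deviation is an assumption left open here, so both are computed; the t statistic uses the sample form.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

marks = [25, 29, 38, 37, 12, 18, 27, 31]

m = mean(marks)
sd_population = pstdev(marks)   # divides by n
sd_sample = stdev(marks)        # divides by n - 1

# Part (c): one-sample t statistic against the national standard of 25.2
# (the p-value would come from a t distribution with 7 degrees of freedom).
t_stat = (m - 25.2) / (sd_sample / sqrt(len(marks)))

# Part (e): for y = 2x + 20 the mean is transformed in the same way,
# while the standard deviation is only scaled by the factor 2.
mean_standardized = 2 * 28.1 + 20
sd_standardized = 2 * 8.4

print(m, sd_population, sd_sample, t_stat, mean_standardized, sd_standardized)
```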
17.
[Maximum mark: 28]
22M.3.AHL.TZ2.2
This question compares possible designs for a new computer network
between multiple school buildings, and whether they meet specific
requirements.
A school’s administration team decides to install new fibre-optic internet cables
underground. The school has eight buildings that need to be connected by these
cables. A map of the school is shown below, with the internet access point of
each building labelled A–H.
Jonas is planning where to install the underground cables. He begins by
determining the distances, in metres, between the underground access points in
each of the buildings.
He finds AD = 89.2 m, DF = 104.9 m and AD̂F = 83°.
(a)
Find AF.
[3]
The cost for installing the cable directly between A and F is $21 310.
(b)
Find the cost per metre of installing this cable.
[2]
Jonas estimates that it will cost $110 per metre to install the cables between all
the other buildings.
(c)
State why the cost for installing the cable between A and F would
be higher than between the other buildings.
[1]
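Parts (a) and (b) amount to one application of the cosine rule followed by a division. A minimal sketch, assuming the 83° angle is the angle at D between sides AD and DF:

```python
from math import cos, radians, sqrt

AD, DF = 89.2, 104.9        # metres
angle_ADF = 83.0            # degrees, angle at D

# Cosine rule: AF^2 = AD^2 + DF^2 - 2 * AD * DF * cos(ADF)
AF = sqrt(AD**2 + DF**2 - 2 * AD * DF * cos(radians(angle_ADF)))

# Total quoted cost of 21 310 dollars spread over the length AF.
cost_per_metre = 21310 / AF

print(AF, cost_per_metre)
```

The result can be compared with the $110 per metre quoted for the other connections.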
Jonas creates the following graph, S , using the cost of installing the cables
between two buildings as the weight of each edge.
The computer network could be designed such that each building is directly
connected to at least one other building and hence all buildings are indirectly
connected.
(d.i)
By using Kruskal’s algorithm, find the minimum spanning tree for S,
showing clearly the order in which edges are added.
[3]
(d.ii)
Hence find the minimum installation cost for the cables that would
allow all the buildings to be part of the computer network.
[2]
The computer network fails if any part of it becomes unreachable from any other
part. To help protect the network from failing, every building could be
connected to at least two other buildings. In this way if one connection breaks,
the building is still part of the computer network. Jonas can achieve this by
finding a Hamiltonian cycle within the graph.
(e)
State why a path that forms a Hamiltonian cycle does not always
form an Eulerian circuit.
[1]
(f)
Starting at D, use the nearest neighbour algorithm to find the upper
bound for the installation cost of a computer network in the form of
a Hamiltonian cycle.
Note: Although the graph is not complete, in this instance it is not
necessary to form a table of least distances.
[5]
(g)
By deleting D, use the deleted vertex algorithm to find the lower
bound for the installation cost of the cycle.
[6]
After more research, Jonas decides to install the cables as shown in the diagram
below.
Each individual cable is installed such that each end of the cable is connected to
a building’s access point. The connection between each end of a cable and an
access point has a 1.4% probability of failing after a power surge.
For the network to be successful, each building in the network must be able to
communicate with every other building in the network. In other words, there
must be a path that connects any two buildings in the network. Jonas would like
the network to have less than a 2% probability of failing to operate after a
power surge.
(h)
Show that Jonas’s network satisfies the requirement of there being
less than a 2% probability of the network failing after a power surge.
[5]
18.
[Maximum mark: 5]
21N.1.AHL.TZ0.12
The following table shows the time, in days, from December 1st and the
percentage of Christmas trees in stock at a shop on the beginning of that day.
The following table shows the natural logarithm of both d and x on these days
to 2 decimal places.
(a)
Use the data in the second table to find the value of m and the value
of b for the regression line, ln x = m(ln d) + b.
[2]
(b)
Assuming that the model found in part (a) remains valid, estimate
the percentage of trees in stock when d = 25.
[3]
19.
[Maximum mark: 7]
21N.1.AHL.TZ0.14
On Paul’s farm, potatoes are packed in sacks labelled 50 kg. The weights of the
sacks of potatoes can be modelled by a normal distribution with mean weight
49.8 kg and standard deviation 0.9 kg.
(a)
Find the probability that a sack is under its labelled weight.
[2]
(b)
Find the lower quartile of the weights of the sacks of potatoes.
[2]
(c)
The sacks of potatoes are transported in crates. There are 10 sacks in
each crate and the weights of the sacks of potatoes are independent
of each other.
Find the probability that the total weight of the sacks of potatoes in
a crate exceeds 500 kg.
[3]
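The three parts can be checked with `statistics.NormalDist` from the Python standard library. The sketch relies on the stated independence of the sacks, so the total weight of 10 sacks is normal with mean 10 × 49.8 and variance 10 × 0.9²; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sack = NormalDist(mu=49.8, sigma=0.9)

# Part (a): probability a sack weighs less than the labelled 50 kg.
p_under = sack.cdf(50)

# Part (b): lower quartile of the sack weights.
lower_quartile = sack.inv_cdf(0.25)

# Part (c): total of 10 independent sacks; variances add, so the
# standard deviation scales by sqrt(10), not by 10.
crate = NormalDist(mu=10 * 49.8, sigma=0.9 * sqrt(10))
p_over_500 = 1 - crate.cdf(500)

print(p_under, lower_quartile, p_over_500)
```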
20.
[Maximum mark: 7]
21M.1.AHL.TZ1.11
A factory, producing plastic gifts for a fast food restaurant’s Jolly meals, claims
that just 1% of the toys produced are faulty.
A restaurant manager wants to test this claim. A box of 200 toys is delivered to
the restaurant. The manager checks all the toys in this box and four toys are
found to be faulty.
(a)
Identify the type of sampling used by the restaurant manager.
[1]
The restaurant manager performs a one-tailed hypothesis test, at the 10%
significance level, to determine whether the factory’s claim is reasonable. It is
known that faults in the toys occur independently.
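The test described above can be sketched numerically. The hypotheses used here, H0: p = 0.01 against H1: p > 0.01, are an assumption about the intended answer, and the number of faulty toys is modelled as binomial because the faults occur independently.

```python
from math import comb

n, p0, observed = 200, 0.01, 4

# p-value = P(X >= 4) under H0, computed as 1 - P(X <= 3).
p_value = 1 - sum(comb(n, k) * p0**k * (1 - p0)**(n - k)
                  for k in range(observed))

# One-tailed test at the 10% significance level.
reject_h0 = p_value < 0.10

print(p_value, reject_h0)
```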
(b)
Write down the null and alternative hypotheses.
[2]
(c)
Find the p-value for the test.
[2]
(d)
State the conclusion of the test. Give a reason for your answer.
[2]
21.
[Maximum mark: 6]
21M.1.AHL.TZ1.14
The weights of apples from Tony’s farm follow a normal distribution with mean
158 g and standard deviation 13 g . The apples are sold in bags that contain six
apples.
(a)
Find the mean weight of a bag of apples.
[2]
(b)
Find the standard deviation of the weights of these bags of apples.
[2]
(c)
Find the probability that a bag selected at random weighs more
than 1 kg.
[2]
22.
[Maximum mark: 8]
21M.1.AHL.TZ2.9
A newspaper vendor in Singapore is trying to predict how many copies of The
Straits Times they will sell. The vendor forms a model to predict the number of
copies sold each weekday. According to this model, they expect the same
number of copies will be sold each day.
To test the model, they record the number of copies sold each weekday during a
particular week. This data is shown in the table.
A goodness of fit test at the 5% significance level is used on this data to
determine whether the vendor’s model is suitable. The critical value for the test is
9.49.
(a)
Find an estimate for how many copies the vendor expects to sell
each day.
[1]
(b.i)
State the null and alternative hypotheses for this test.
[2]
(b.ii)
Write down the degrees of freedom for this test.
[1]
(b.iii) Write down the conclusion to the test. Give a reason for your answer.
[4]
23.
[Maximum mark: 18]
21M.2.AHL.TZ2.4
In a small village there are two doctors’ clinics, one owned by Doctor Black and
the other owned by Doctor Green. It was noted after each year that 3.5% of
Doctor Black’s patients moved to Doctor Green’s clinic and 5% of Doctor Green’s
patients moved to Doctor Black’s clinic. All additional losses and gains of
patients by the clinics may be ignored.
At the start of a particular year, it was noted that Doctor Black had 2100 patients
on their register, compared to Doctor Green’s 3500 patients.
(a)
Write down a transition matrix T indicating the annual population
movement between clinics.
[2]
(b)
Find a prediction for the ratio of the number of patients Doctor Black
will have, compared to Doctor Green, after two years.
[2]
(c)
Find a matrix P, with integer elements, such that $T = PDP^{-1}$,
where D is a diagonal matrix.
[6]
(d)
Hence, show that the long-term transition matrix $T^\infty$ is given by
$T^\infty = \begin{pmatrix} \frac{10}{17} & \frac{10}{17} \\ \frac{7}{17} & \frac{7}{17} \end{pmatrix}$.
[6]
(e)
Hence, or otherwise, determine the expected ratio of the number of
patients Doctor Black would have compared to Doctor Green in the
long term.
[2]
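The behaviour of this model can be explored by repeatedly applying the transition matrix. The sketch below assumes a column-stochastic convention (columns are the current clinic, in the order Black, Green; rows are next year's clinic) and approximates the long-term state by iterating many times rather than by diagonalization, which is what parts (c) and (d) actually ask for.

```python
# Columns: current clinic (Black, Green); rows: clinic one year later.
T = [[0.965, 0.05],
     [0.035, 0.95]]

def step(T, v):
    """Apply the 2x2 transition matrix T to the state vector v."""
    return [T[0][0] * v[0] + T[0][1] * v[1],
            T[1][0] * v[0] + T[1][1] * v[1]]

v = [2100, 3500]            # patients with (Black, Green) at the start
for _ in range(2):
    v = step(T, v)
two_years = v[:]            # state after two years, part (b)

for _ in range(2000):       # many further years to approximate the limit
    v = step(T, v)
long_run_share = [x / sum(v) for x in v]

print(two_years, long_run_share)
```

The long-run shares can be compared with the entries 10/17 and 7/17 of the matrix given in part (d).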
24.
[Maximum mark: 16]
21M.2.AHL.TZ2.2
It is known that the weights of male Persian cats are normally distributed with
mean 6.1 kg and variance $0.5^2 \text{ kg}^2$.
(a)
Sketch a diagram showing the above information.
[2]
(b)
Find the proportion of male Persian cats weighing between 5.5 kg
and 6.5 kg.
[2]
A group of 80 male Persian cats are drawn from this population.
(c)
Determine the expected number of cats in this group that have a
weight of less than 5.3 kg.
[3]
The male cats are now joined by 80 female Persian cats. The female cats are
drawn from a population whose weights are normally distributed with mean
4.5 kg and standard deviation 0.45 kg.
Ten female cats are chosen at random.
(d.i)
Find the probability that exactly one of them weighs over 4.62 kg.
[4]
(d.ii)
Let N be the number of cats weighing over 4.62 kg.
Find the variance of N.
[1]
(e)
A cat is selected at random from all 160 cats.
Find the probability that the cat was female, given that its weight
was over 4.7 kg.
[4]
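A sketch of the calculations in this question, using `statistics.NormalDist`. It assumes that N in part (d)(ii) counts cats among the ten chosen females, so that N is binomial with n = 10, and that in part (e) a male and a female are equally likely to be selected because there are 80 of each.

```python
from statistics import NormalDist

male = NormalDist(mu=6.1, sigma=0.5)
female = NormalDist(mu=4.5, sigma=0.45)

# Part (b): proportion of males between 5.5 kg and 6.5 kg.
p_between = male.cdf(6.5) - male.cdf(5.5)

# Part (c): expected number of the 80 males weighing under 5.3 kg.
expected_light = 80 * male.cdf(5.3)

# Part (d): q = P(a female weighs over 4.62 kg), then N ~ B(10, q).
q = 1 - female.cdf(4.62)
p_exactly_one = 10 * q * (1 - q) ** 9
var_N = 10 * q * (1 - q)

# Part (e): Bayes' rule with equal prior probability 0.5 for each sex.
p_m_heavy = 1 - male.cdf(4.7)
p_f_heavy = 1 - female.cdf(4.7)
p_female_given_heavy = 0.5 * p_f_heavy / (0.5 * p_f_heavy + 0.5 * p_m_heavy)

print(p_between, expected_light, p_exactly_one, var_N, p_female_given_heavy)
```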
25.
[Maximum mark: 28]
21M.3.AHL.TZ1.2
A firm wishes to review its recruitment processes. This question considers the
validity and reliability of the methods used.
Every year an accountancy firm recruits new employees for a trial period of one
year from a large group of applicants.
At the start, all applicants are interviewed and given a rating. Those with a rating
of either Excellent, Very good or Good are recruited for the trial period. At the end of this
period, some of the new employees will stay with the firm.
It is decided to test how valid the interview rating is as a way of predicting which
of the new employees will stay with the firm.
Data is collected and recorded in a contingency table.
(a)
Use an appropriate test, at the 5% significance level, to determine
whether a new employee staying with the firm is independent of
their interview rating. State the null and alternative hypotheses, the
p-value and the conclusion of the test.
[6]
The next year’s group of applicants are asked to complete a written assessment
which is then analysed. From those recruited as new employees, a random
sample of size 18 is selected.
The sample is stratified by department. Of the 91 new employees recruited that
year, 55 were placed in the national department and 36 in the international
department.
(b)
Show that 11 employees are selected for the sample from the
national department.
[2]
At the end of their first year, the level of performance of each of the 18
employees in the sample is assessed by their department manager. They are
awarded a score between 1 (low performance) and 10 (high performance).
The marks in the written assessment and the scores given by the managers are
shown in both the table and the scatter diagram.
The firm decides to find a Spearman’s rank correlation coefficient, $r_s$, for this data.
(c.i)
Without calculation, explain why it might not be appropriate to
calculate a correlation coefficient for the whole sample of 18
employees.
[2]
(c.ii)
Find $r_s$ for the seven employees working in the international
department.
[4]
(c.iii)
Hence comment on the validity of the written assessment as a
measure of the level of performance of employees in this
department. Justify your answer.
[2]
The same seven employees are given the written assessment a second time, at
the end of the first year, to measure its reliability. Their marks are shown in the
table below.
(d.i)
State the name of this type of test for reliability.
[1]
(d.ii)
For the data in this table, test the null hypothesis, $H_0: \rho = 0$,
against the alternative hypothesis, $H_1: \rho > 0$, at the 5%
significance level. You may assume that all the requirements for
carrying out the test have been met.
[4]
(d.iii)
Hence comment on the reliability of the written assessment.
[1]
The written assessment is in five sections, numbered 1 to 5. At the end of the
year, the employees are also given a score for each of five professional
attributes: V, W, X, Y and Z.
The firm decides to test the hypothesis that there is a correlation between the
mark in a section and the score for an attribute.
They compare marks in each of the sections with scores for each of the attributes.
(e.i)
Write down the number of tests they carry out.
[1]
(e.ii)
The tests are performed at the 5% significance level.
Assuming that:
there is no correlation between the marks in any of the sections
and scores in any of the attributes,
the outcome of each hypothesis test is independent of the
outcome of the other hypothesis tests,
find the probability that at least one of the tests will be significant.
[4]
(e.iii)
The firm obtains a significant result when comparing section 2 of the
written assessment and attribute X. Interpret this result.
[1]
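The multiple-testing calculation in part (e) can be sketched in a few lines. It assumes every section is compared with every attribute, and uses the stated assumptions that no real correlation exists and that the test outcomes are independent.

```python
sections, attributes = 5, 5
alpha = 0.05

# Each section is paired with each attribute.
n_tests = sections * attributes

# P(at least one significant) = 1 - P(no test is significant).
p_at_least_one = 1 - (1 - alpha) ** n_tests

print(n_tests, round(p_at_least_one, 4))
```

A probability this large suggests that a single significant result among many tests may well be a false positive.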
26.
[Maximum mark: 24]
21M.3.AHL.TZ2.1
Juliet is a sociologist who wants to investigate if income affects happiness
amongst doctors. This question asks you to review Juliet’s methods and
conclusions.
Juliet obtained a list of email addresses of doctors who work in her city. She
contacted them and asked them to fill in an anonymous questionnaire.
Participants were asked to state their annual income and to respond to a set of
questions. The responses were used to determine a happiness score out of 100. Of the
415 doctors on the list, 11 replied.
(a.i)
Describe one way in which Juliet could improve the reliability of her
investigation.
[1]
(a.ii)
Describe one criticism that can be made about the validity of Juliet’s
investigation.
[1]
Juliet’s results are summarized in the following table.
(b)
Juliet classifies response K as an outlier and removes it from the
data. Suggest one possible justification for her decision to remove it.
[1]
For the remaining ten responses in the table, Juliet calculates the mean
happiness score to be 52.5.
(c.i)
Calculate the mean annual income for these remaining responses.
[2]
(c.ii)
Determine the value of r, Pearson’s product-moment correlation
coefficient, for these remaining responses.
[2]
Juliet decides to carry out a hypothesis test on the correlation coefficient to
investigate whether increased annual income is associated with greater
happiness.
(d.i)
State why the hypothesis test should be one-tailed.
[1]
(d.ii)
State the null and alternative hypotheses for this test.
[2]
(d.iii) The critical value for this test, at the 5% significance level, is 0.549.
Juliet assumes that the population is bivariate normal.
Determine whether there is significant evidence of a positive
correlation between annual income and happiness. Justify your
answer.
[2]
Juliet wants to create a model to predict how changing annual income might
affect happiness scores. To do this, she assumes that annual income in dollars, X,
is the independent variable and the happiness score, Y , is the dependent
variable.
She first considers a linear model of the form Y = aX + b.
(e.i)
Use Juliet’s data to find the value of a and of b.
[1]
(e.ii)
Interpret, referring to income and happiness, what the value of a
represents.
[1]
Juliet then considers a quadratic model of the form Y = cX² + dX + e.
(e.iii) Find the value of c, of d and of e.
[1]
(e.iv) Find the coefficient of determination for each of the two models she
considers.
[2]
(e.v)
Hence compare the two models.
[1]
(e.vi) Juliet decides to use the coefficient of determination to choose
between these two models.
Comment on the validity of her decision.
[1]
After presenting the results of her investigation, a colleague questions whether
Juliet’s sample is representative of all doctors in the city.
A report states that the mean annual income of doctors in the city is $80 000.
Juliet decides to carry out a test to determine whether her sample could
realistically be taken from a population with a mean of $80 000.
(f.i)
State the name of the test which Juliet should use.
[1]
(f.ii)
State the null and alternative hypotheses for this test.
[1]
(f.iii)
Perform the test, using a 5% significance level, and state your
conclusion in context.
[3]
27.
[Maximum mark: 7]
19N.3.AHL.TZ0.Hsp_1
Peter, the Principal of a college, believes that there is an association between the
score in a Mathematics test, X, and the time taken to run 500 m, Y seconds, of his
students. The following paired data are collected.
It can be assumed that (X, Y ) follow a bivariate normal distribution with
product moment correlation coefficient ρ.
(a.i)
State suitable hypotheses H₀ and H₁ to test Peter’s claim, using a
two-tailed test.
[1]
(a.ii)
Carry out a suitable test at the 5 % significance level. With reference
to the p-value, state your conclusion in the context of Peter’s claim.
[4]
(b)
Peter uses the regression line of y on x as y = 0.248x + 83.0 and
calculates that a student with a Mathematics test score of 73 will
have a running time of 101 seconds. Comment on the validity of his
calculation.
[2]
28.
[Maximum mark: 7]
19M.1.AHL.TZ1.H_6
Let X be a random variable which follows a normal distribution with mean μ .
Given that P (X < μ − 5) = 0.2 , find
(a)
P(X > μ + 5).
[2]
(b)
P(X < μ + 5 | X > μ − 5).
[5]
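Both parts follow from the symmetry of the normal curve about μ. The sketch below checks this numerically; it is only an illustration, and the function name check and the trial value of μ are hypothetical. The standard deviation is not given in the question, so the code derives the value that makes P(X < μ − 5) = 0.2.

```python
from statistics import NormalDist


def check(mu):
    """Return (P(X > mu + 5), P(X < mu + 5 | X > mu - 5)) for the given mean."""
    # Choose sigma so that P(X < mu - 5) = 0.2; inv_cdf(0.2) is negative.
    sigma = -5 / NormalDist().inv_cdf(0.2)
    x = NormalDist(mu, sigma)

    # (a) By symmetry this equals P(X < mu - 5).
    p_upper_tail = 1 - x.cdf(mu + 5)

    # (b) Conditional probability: P(mu - 5 < X < mu + 5) / P(X > mu - 5).
    p_between = x.cdf(mu + 5) - x.cdf(mu - 5)
    p_conditional = p_between / (1 - x.cdf(mu - 5))
    return p_upper_tail, p_conditional


# The trial mean is arbitrary; the results do not depend on it.
p_a, p_b = check(mu=50.0)
print(round(p_a, 4), round(p_b, 4))
```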
29.
[Maximum mark: 13]
19M.2.AHL.TZ1.H_9
A café serves sandwiches and cakes. Each customer will choose one of the
following three options; buy only a sandwich, buy only a cake or buy both a
sandwich and a cake.
The probability that a customer buys a sandwich is 0.72 and the probability that
a customer buys a cake is 0.45.
Find the probability that a customer chosen at random will buy
(a.i)
both a sandwich and a cake.
[3]
(a.ii)
only a sandwich.
[1]
On a typical day 200 customers come to the café.
(b.i)
Find the expected number of cakes sold on a typical day.
[1]
(b.ii)
Find the probability that more than 100 cakes will be sold on a
typical day.
[3]
It is known that 46 % of the customers who come to the café are male, and that
80 % of these buy a sandwich.
(c.i)
A customer is selected at random. Find the probability that the
customer is male and buys a sandwich.
[1]
(c.ii)
A female customer is selected at random. Find the probability that
she buys a sandwich.
[4]
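The calculations in this question can be sketched as below. This is an illustration under two assumptions that the code labels: every customer buys at least one item (as the question states), and the 200 customers choose independently, so the number of cakes sold is modelled as binomial. Variable names are illustrative.

```python
from math import comb

p_sandwich, p_cake = 0.72, 0.45

# (a) Every customer buys at least one item, so P(sandwich or cake) = 1.
p_both = p_sandwich + p_cake - 1
p_only_sandwich = p_sandwich - p_both

# (b) Assumption: customers act independently, so cakes sold ~ B(200, 0.45).
n = 200
expected_cakes = n * p_cake
p_more_than_100 = sum(
    comb(n, k) * p_cake**k * (1 - p_cake) ** (n - k) for k in range(101, n + 1)
)

# (c) Split sandwich buyers by sex using the law of total probability.
p_male, p_sandwich_given_male = 0.46, 0.80
p_male_and_sandwich = p_male * p_sandwich_given_male
p_sandwich_given_female = (p_sandwich - p_male_and_sandwich) / (1 - p_male)

print(round(p_both, 2), round(p_only_sandwich, 2), round(expected_cakes, 1))
print(round(p_more_than_100, 4))
print(round(p_male_and_sandwich, 3), round(p_sandwich_given_female, 4))
```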
30.
[Maximum mark: 1]
19M.2.AHL.TZ1.H_3
The marks achieved by eight students in a class test are given in the following
list.
The teacher increases all the marks by 2. Write down the new value for
(b.ii)
the standard deviation.
[1]
31.
[Maximum mark: 8]
19M.2.AHL.TZ2.H_3
Iqbal attempts three practice papers in mathematics. The probability that he
passes the first paper is 0.6. Whenever he gains a pass in a paper, his confidence
increases so that the probability of him passing the next paper increases by 0.1.
Whenever he fails a paper the probability of him passing the next paper is 0.6.
(a)
Complete the given probability tree diagram for Iqbal’s three
attempts, labelling each branch with the correct probability.
[3]
(b)
Calculate the probability that Iqbal passes at least two of the papers
he attempts.
[2]
(c)
Find the probability that Iqbal passes his third paper, given that he
passed only one previous paper.
[3]
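The tree diagram can be enumerated directly, as in the sketch below. It assumes one reading of the rule in the question: after a pass, the next pass probability is 0.1 higher than the current one, and after a fail it returns to 0.6. The function name path_probability is hypothetical.

```python
from itertools import product


def path_probability(path):
    """Probability of one pass/fail sequence, e.g. (True, False, True)."""
    p_pass = 0.6
    prob = 1.0
    for passed in path:
        prob *= p_pass if passed else 1 - p_pass
        # Assumed rule: a pass raises the next probability by 0.1, a fail resets it to 0.6.
        p_pass = p_pass + 0.1 if passed else 0.6
    return prob


# All 8 branches of the three-paper tree.
paths = {path: path_probability(path) for path in product([True, False], repeat=3)}

# (b) At least two passes out of three.
p_at_least_two = sum(p for path, p in paths.items() if sum(path) >= 2)

# (c) Condition on exactly one pass in the first two papers.
one_before = [path for path in paths if sum(path[:2]) == 1]
p_third_given_one = sum(paths[path] for path in one_before if path[2]) / sum(
    paths[path] for path in one_before
)

print(round(p_at_least_two, 3), round(p_third_given_one, 4))
```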
32.
[Maximum mark: 16]
19M.2.AHL.TZ2.H_10
Steffi the stray cat often visits Will’s house in search of food. Let X be the discrete
random variable “the number of times per day that Steffi visits Will’s house”.
The random variable X can be modelled by a Poisson distribution with mean
2.1.
(a)
Find the probability that on a randomly selected day, Steffi does not
visit Will’s house.
[2]
Let Y be the discrete random variable “the number of times per day that Steffi is
fed at Will’s house”. Steffi is only fed on the first four occasions that she visits each
day.
(b)
Copy and complete the probability distribution table for Y.
[4]
(c)
Hence find the expected number of times per day that Steffi is fed at
Will’s house.
[3]
(d)
In any given year of 365 days, the probability that Steffi does not
visit Will for at most n days in total is 0.5 (to one decimal place). Find
the value of n.
[3]
(e)
Show that the expected number of occasions per year on which
Steffi visits Will’s house and is not fed is at least 30.
[4]
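Parts (a), (b), (c) and (e) can be checked numerically with the sketch below. It treats Y as min(X, 4), which is how the question describes the feeding rule, and uses illustrative names. Part (d) is not attempted here.

```python
from math import exp, factorial

lam = 2.1  # mean number of visits per day


def poisson_pmf(k):
    """P(X = k) for X ~ Po(2.1)."""
    return exp(-lam) * lam**k / factorial(k)


# (a) Probability of no visit on a given day.
p_no_visit = poisson_pmf(0)

# (b) Y = min(X, 4): Y matches X for 0..3, and P(Y = 4) = P(X >= 4).
dist_y = {k: poisson_pmf(k) for k in range(4)}
dist_y[4] = 1 - sum(dist_y.values())

# (c) Expected number of times Steffi is fed per day.
expected_y = sum(k * p for k, p in dist_y.items())

# (e) Visits without food per day are X - Y, so the yearly expectation is 365 (E(X) - E(Y)).
unfed_per_year = 365 * (lam - expected_y)

print(round(p_no_visit, 4), round(expected_y, 4), round(unfed_per_year, 2))
```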
33.
[Maximum mark: 5]
19M.2.AHL.TZ2.H_2
Timmy owns a shop. His daily income from selling his goods can be modelled as
a normal distribution, with a mean daily income of $820, and a standard
deviation of $230. To make a profit, Timmy’s daily income needs to be greater
than $1000.
(a)
Calculate the probability that, on a randomly selected day, Timmy
makes a profit.
[2]
The shop is open for 24 days every month.
(b)
Calculate the probability that, in a randomly selected month, Timmy
makes a profit on between 5 and 10 days (inclusive).
[3]
34.
[Maximum mark: 6]
18N.1.AHL.TZ0.H_1
Consider two events, A and B, such that P(A) = P(A′ ∩ B) = 0.4 and
P(A ∩ B) = 0.1.
(a)
By drawing a Venn diagram, or otherwise, find P(A ∪ B).
[3]
(b)
Show that the events A and B are not independent.
[3]
35.
[Maximum mark: 18]
18N.2.AHL.TZ0.H_10
Willow finds that she receives approximately 70 emails per working day.
She decides to model the number of emails received per working day using the
random variable X, where X follows a Poisson distribution with mean 70.
(a.i)
Using this distribution model, find P(X < 60).
[2]
(a.ii)
Using this distribution model, find the standard deviation of X.
[2]
In order to test her model, Willow records the number of emails she receives per
working day over a period of 6 months. The results are shown in the following
table.
From the table, calculate
(b.i)
an estimate for the mean number of emails received per working
day.
[3]
(b.ii)
an estimate for the standard deviation of the number of emails
received per working day.
[2]
(c)
Give one piece of evidence that suggests Willow’s Poisson
distribution model is not a good fit.
[1]
Archie works for a different company and knows that he receives emails
according to a Poisson distribution, with a mean of λ emails per day.
(d)
Suppose that the probability of Archie receiving more than 10
emails in total on any one day is 0.99. Find the value of λ.
[3]
(e)
Now suppose that Archie received exactly 20 emails in total in a
consecutive two day period. Show that the probability that he
received exactly 10 of them on the first day is independent of λ.
[5]
36.
[Maximum mark: 8]
18N.2.AHL.TZ0.H_3
It is known that 56 % of Infiglow batteries have a life of less than 16 hours, and
94 % have a life less than 17 hours. It can be assumed that battery life is modelled
by the normal distribution N(μ, σ²).
(a)
Find the value of μ and the value of σ.
[6]
(b)
Find the probability that a randomly selected Infiglow battery will
have a life of at least 15 hours.
[2]
37.
[Maximum mark: 5]
18M.1.AHL.TZ1.H_3
Two unbiased tetrahedral (four-sided) dice with faces labelled 1, 2, 3, 4 are
thrown and the scores recorded. Let the random variable T be the maximum of
these two scores.
The probability distribution of T is given in the following table.
(a)
Find the value of a and the value of b.
[3]
(b)
Find the expected value of T.
[2]
38.
[Maximum mark: 6]
18M.1.AHL.TZ2.H_3
The discrete random variable X has the following probability distribution, where
p is a constant.
(a)
Find the value of p.
[2]
(b.i)
Find μ, the expected value of X.
[2]
(b.ii)
Find P(X > μ).
[2]
39.
[Maximum mark: 5]
18M.2.AHL.TZ1.H_4
The age, L, in years, of a wolf can be modelled by the normal distribution
L ~ N(8, 5).
(a)
Find the probability that a wolf selected at random is at least 5 years
old.
[2]
(b)
Eight wolves are independently selected at random and their ages
recorded.
Find the probability that more than six of these wolves are at least 5
years old.
[3]
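A sketch of the two-stage calculation for this question is given below. It assumes that the second parameter in N(8, 5) is the variance, which is the usual convention for this notation but is not stated explicitly in the question. Variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

# Assumption: N(8, 5) means mean 8 and variance 5, so sigma = sqrt(5).
age = NormalDist(mu=8, sigma=sqrt(5))

# (a) Probability that one wolf is at least 5 years old.
p_old = 1 - age.cdf(5)

# (b) Number of such wolves among 8 independent selections ~ B(8, p_old);
#     "more than six" means 7 or 8.
n = 8
p_more_than_six = sum(
    comb(n, k) * p_old**k * (1 - p_old) ** (n - k) for k in (7, 8)
)

print(round(p_old, 3), round(p_more_than_six, 3))
```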
40.
[Maximum mark: 5]
18M.2.AHL.TZ1.H_6
The mean number of squirrels in a certain area is known to be 3.2 squirrels
per hectare of woodland. Within this area, there is a 56 hectare woodland
nature reserve. It is known that there are currently at least 168 squirrels in
this reserve.
Assuming the population of squirrels follows a Poisson distribution,
calculate the probability that there are more than 190 squirrels in the
reserve.
[5]
41.
[Maximum mark: 7]
18M.2.AHL.TZ1.H_8
Each of the 25 students in a class is asked how many pets they own. Two
students own three pets and no students own more than three pets. The
mean and standard deviation of the number of pets owned by students in
the class are 18/25 and 24/25 respectively.
Find the number of students in the class who do not own a pet.
[7]
42.
[Maximum mark: 7]
18M.2.AHL.TZ2.H_8
The random variable X has a binomial distribution with parameters n and p.
It is given that E(X) = 3.5.
(a)
Find the least possible value of n.
[2]
(b)
It is further given that P(X ≤ 1) = 0.09478 correct to 4 significant
figures.
Determine the value of n and the value of p.
[5]
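One way to approach part (b) is a direct search over n, using p = 3.5/n from E(X) = np. The sketch below is an illustration only: the search range and the tolerance used to match the 4 significant figure value are assumptions, and the function name is hypothetical.

```python
def p_at_most_one(n, p):
    """P(X <= 1) for X ~ B(n, p)."""
    return (1 - p) ** n + n * p * (1 - p) ** (n - 1)


mean = 3.5
target = 0.09478

# p = 3.5 / n must be less than 1, so the search starts at n = 4.
# Assumption: an upper limit of 100 and a tolerance of 5e-5 are enough to isolate the answer.
matches = [
    (n, mean / n)
    for n in range(4, 101)
    if abs(p_at_most_one(n, mean / n) - target) < 5e-5
]

print(matches)
```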
43.
[Maximum mark: 6]
18M.2.AHL.TZ2.H_3
The random variable X has a normal distribution with mean μ = 50 and variance
σ² = 16.
(a)
Sketch the probability density function for X, and shade the region
representing P(μ − 2σ < X < μ + σ).
[2]
(b)
Find the value of P(μ − 2σ < X < μ + σ).
[2]
(c)
Find the value of k for which P(μ − kσ < X < μ + kσ) = 0.5.
[2]
44.
[Maximum mark: 11]
17N.1.AHL.TZ0.H_10
Chloe and Selena play a game where each have four cards showing capital
letters A, B, C and D.
Chloe lays her cards face up on the table in order A, B, C, D as shown in the
following diagram.
Selena shuffles her cards and lays them face down on the table. She then turns
them over one by one to see if her card matches with Chloe’s card directly above.
Chloe wins if no matches occur; otherwise Selena wins.
(a)
Show that the probability that Chloe wins the game is 3/8.
[6]
Chloe and Selena repeat their game so that they play a total of 50 times.
Suppose the discrete random variable X represents the number of times Chloe
wins.
(b.i)
Determine the mean of X.
[3]
(b.ii)
Determine the variance of X.
[2]
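The result in part (a) can be confirmed by listing all 24 orders of Selena's cards and counting those with no match. The sketch below does this and then applies the binomial mean and variance for part (b), assuming the 50 games are independent. It is a numerical check rather than the proof the question asks for.

```python
from fractions import Fraction
from itertools import permutations

cards = "ABCD"  # Chloe's fixed order

# All equally likely orders of Selena's shuffled cards.
orders = list(permutations(cards))

# Chloe wins when no position shows the same letter on both cards.
no_match = [o for o in orders if all(s != c for s, c in zip(o, cards))]
p_chloe_wins = Fraction(len(no_match), len(orders))

# (b) Assumption: games are independent, so X ~ B(50, p_chloe_wins).
n_games = 50
mean_x = n_games * p_chloe_wins
var_x = n_games * p_chloe_wins * (1 - p_chloe_wins)

print(p_chloe_wins, mean_x, var_x)
```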
45.
[Maximum mark: 6]
17N.2.AHL.TZ0.H_6
The number of bananas that Lucca eats during any particular day follows a
Poisson distribution with mean 0.2.
(a)
Find the probability that Lucca eats at least one banana in a
particular day.
[2]
(b)
Find the expected number of weeks in the year in which Lucca eats
no bananas.
[4]
46.
[Maximum mark: 6]
17N.2.AHL.TZ0.H_2
Events A and B are such that P(A ∪ B) = 0.95, P(A ∩ B) = 0.6 and
P(A|B) = 0.75.
(a)
Find P(B).
[2]
(b)
Find P(A).
[2]
(c)
Hence show that events A′ and B are independent.
[2]
47.
[Maximum mark: 6]
17N.2.AHL.TZ0.H_4
It is given that one in five cups of coffee contain more than 120 mg of
caffeine.
It is also known that three in five cups contain more than 110 mg of caffeine.
Assume that the caffeine content of coffee is modelled by a normal
distribution.
Find the mean and standard deviation of the caffeine content of coffee.
[6]
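The two tail probabilities give two equations in μ and σ once they are converted to standardized values. A minimal sketch of that calculation follows, using the inverse normal function from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

standard = NormalDist()  # standard normal, mean 0 and standard deviation 1

# P(X > 120) = 1/5 means P(X < 120) = 0.8; P(X > 110) = 3/5 means P(X < 110) = 0.4.
z_120 = standard.inv_cdf(1 - 1 / 5)
z_110 = standard.inv_cdf(1 - 3 / 5)

# Solve (120 - mu) / sigma = z_120 and (110 - mu) / sigma = z_110 simultaneously.
sigma = (120 - 110) / (z_120 - z_110)
mu = 120 - z_120 * sigma

print(round(mu, 1), round(sigma, 2))
```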
48.
[Maximum mark: 6]
17M.2.AHL.TZ1.H_1
Consider two events A and B such that
P(A) = k, P(B) = 3k, P(A ∩ B) = k² and P(A ∪ B) = 0.5.
(a)
Calculate k;
[3]
(b)
Find P(A′ ∩ B).
[3]
49.
[Maximum mark: 8]
17M.2.AHL.TZ1.H_9
The times taken for male runners to complete a marathon can be modelled by a
normal distribution with a mean 196 minutes and a standard deviation 24
minutes.
(a)
Find the probability that a runner selected at random will complete
the marathon in less than 3 hours.
[2]
It is found that 5% of the male runners complete the marathon in less than T₁
minutes.
(b)
Calculate T₁.
[2]
The times taken for female runners to complete the marathon can be modelled
by a normal distribution with a mean 210 minutes. It is found that 58% of female
runners complete the marathon between 185 and 235 minutes.
(c)
Find the standard deviation of the times taken by female runners.
[4]
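The three parts of this question can be sketched as below. Part (c) relies on the observation that 185 and 235 are both 25 minutes from the mean of 210, so the 58% is split symmetrically about the mean. Variable names are illustrative.

```python
from statistics import NormalDist

# Male runners: mean 196 minutes, standard deviation 24 minutes.
male = NormalDist(mu=196, sigma=24)

# (a) 3 hours = 180 minutes.
p_under_3h = male.cdf(180)

# (b) T1 is the 5th percentile of the male times.
t1 = male.inv_cdf(0.05)

# (c) Female runners: 185 and 235 are symmetric about 210,
#     so P(X < 235) = 0.5 + 0.58 / 2 = 0.79 and (235 - 210) / sigma = z.
z = NormalDist().inv_cdf(0.5 + 0.58 / 2)
sigma_female = (235 - 210) / z

print(round(p_under_3h, 3), round(t1, 1), round(sigma_female, 1))
```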
50.
[Maximum mark: 4]
17M.2.AHL.TZ2.H_1
There are 75 players in a golf club who take part in a golf tournament. The scores
obtained on the 18th hole are as shown in the following table.
(a)
One of the players is chosen at random. Find the probability that this
player’s score was 5 or more.
[2]
(b)
Calculate the mean score.
[2]
51.
[Maximum mark: 9]
17M.2.AHL.TZ2.H_5
John likes to go sailing every day in July. To help him make a decision on
whether it is safe to go sailing he classifies each day in July as windy or calm.
Given that a day in July is calm, the probability that the next day is calm is 0.9.
Given that a day in July is windy, the probability that the next day is calm is 0.3.
The weather forecast for the 1st July predicts that the probability that it will be
calm is 0.8.
(a)
Draw a tree diagram to represent this information for the first three
days of July.
[3]
(b)
Find the probability that the 3rd July is calm.
[2]
(c)
Find the probability that the 1st July was calm given that the 3rd July
is windy.
[4]
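The tree diagram for this question is a two-state chain, so the probabilities can be propagated one day at a time. The sketch below does this for parts (b) and (c); the function name next_day is hypothetical.

```python
def next_day(p_calm):
    """P(next day is calm), given the probability that today is calm."""
    # Calm follows calm with probability 0.9 and follows windy with probability 0.3.
    return p_calm * 0.9 + (1 - p_calm) * 0.3


# (b) Propagate the forecast for 1st July forward two days.
p_calm_1 = 0.8
p_calm_2 = next_day(p_calm_1)
p_calm_3 = next_day(p_calm_2)

# (c) Bayes' rule: P(1st calm | 3rd windy) = P(1st calm and 3rd windy) / P(3rd windy).
p_calm_3_given_calm_1 = next_day(next_day(1.0))
p_calm_1_and_windy_3 = p_calm_1 * (1 - p_calm_3_given_calm_1)
p_calm_1_given_windy_3 = p_calm_1_and_windy_3 / (1 - p_calm_3)

print(round(p_calm_3, 3), round(p_calm_1_given_windy_3, 4))
```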
52.
[Maximum mark: 7]
17M.2.AHL.TZ2.H_3
Packets of biscuits are produced by a machine. The weights X, in grams, of
packets of biscuits can be modelled by a normal distribution where
X ∼ N(μ, σ²). A packet of biscuits is considered to be underweight if it weighs
less than 250 grams.
(a)
Given that μ = 253 and σ = 1.5 find the probability that a
randomly chosen packet of biscuits is underweight.
[2]
The manufacturer makes the decision that the probability that a packet is
underweight should be 0.002. To do this μ is increased and σ remains
unchanged.
(b)
Calculate the new value of μ giving your answer correct to two
decimal places.
[3]
The manufacturer is happy with the decision that the probability that a packet is
underweight should be 0.002, but is unhappy with the way in which this was
achieved. The machine is now adjusted to reduce σ and return μ to 253.
(c)
Calculate the new value of σ.
[2]
53.
[Maximum mark: 4]
17M.2.AHL.TZ2.H_1
There are 75 players in a golf club who take part in a golf tournament. The scores
obtained on the 18th hole are as shown in the following table.
(a)
One of the players is chosen at random. Find the probability that this
player’s score was 5 or more.
[2]
(b)
Calculate the mean score.
[2]
54.
[Maximum mark: 9]
16N.1.AHL.TZ0.H_10
Consider two events A and B defined in the same sample space.
(a)
Show that P(A ∪ B) = P(A) + P(A′ ∩ B).
[3]
(b)
Given that P(A ∪ B) = 4/9, P(B|A) = 1/3 and P(B|A′) = 1/6,
(i) show that P(A) = 1/3;
(ii) hence find P(B).
[6]
55.
[Maximum mark: 4]
16N.1.AHL.TZ0.H_2
The faces of a fair six-sided die are numbered 1, 2, 2, 4, 4, 6. Let X be the discrete
random variable that models the score obtained when this die is rolled.
(a)
Complete the probability distribution table for X.
[2]
(b)
Find the expected value of X.
[2]
56.
[Maximum mark: 5]
16N.2.AHL.TZ0.H_1
A random variable X has a probability distribution given in the following table.
(a)
Determine the value of E(X²).
[2]
(b)
Find the value of Var(X).
[3]
57.
[Maximum mark: 8]
16N.2.AHL.TZ0.H_8
A random variable X is normally distributed with mean μ and standard
deviation σ, such that P(X < 30.31) = 0.1180 and
P(X > 42.52) = 0.3060 .
(a)
Find μ and σ.
[6]
(b)
Find P(|X − μ| < 1.2σ).
[2]
© International Baccalaureate Organization, 2023