Statistics AI HL [614 marks]

1. [Maximum mark: 6] SPM.1.AHL.TZ0.17
Mr Burke teaches a mathematics class with 15 students. In this class there are 6 female students and 9 male students. Each day Mr Burke randomly chooses one student to answer a homework question. In the first month, Mr Burke will teach his class 20 times.
(a) Find the probability he will choose a female student 8 times. [2]
(b) The Head of Year, Mrs Smith, decides to select a student at random from the year group to read the notices in assembly. There are 80 students in total in the year group. Mrs Smith calculates the probability of picking a male student 8 times in the first 20 assemblies is 0.153357 correct to 6 decimal places. Find the number of male students in the year group. [4]

2. [Maximum mark: 27] SPM.3.AHL.TZ0.1
Two IB schools, A and B, follow the IB Diploma Programme but have different teaching methods. A research group tested whether the different teaching methods lead to a similar final result. For the test, a group of eight students were randomly selected from each school. Both samples were given a standardized test at the start of the course and a prediction for total IB points was made based on that test; this was then compared to their points total at the end of the course. Previous results indicate that both the predictions from the standardized tests and the final IB points can be modelled by a normal distribution.
It can be assumed that:
- the standardized test is a valid method for predicting the final IB points
- variations from the prediction can be explained through the circumstances of the student or school.
(a) Identify a test that might have been used to verify the null hypothesis that the predictions from the standardized test can be modelled by a normal distribution. [1]
(b) State why comparing only the final IB points of the students from the two schools would not be a valid test for the effectiveness of the two different teaching methods.
[1]
The data for school A is shown in the following table. For each student, the change from the predicted points to the final points (f − p) was calculated.
(c.i) Find the mean change. [1]
(c.ii) Find the standard deviation of the changes. [2]
(d) Use a paired t-test to determine whether there is significant evidence that the students in school A have improved their IB points since the start of the course. [4]
The data for school B is shown in the following table.
(e.i) Use an appropriate test to determine whether there is evidence, at the 5% significance level, that the students in school B have improved more than those in school A. [5]
(e.ii) State why it was important to test that both sets of points were normally distributed. [1]
School A also gives each student a score for effort in each subject. This effort score is based on a scale of 1 to 5 where 5 is regarded as outstanding effort. It is claimed that the effort put in by a student is an important factor in improving upon their predicted IB points.
(f.i) Perform a test on the data from school A to show it is reasonable to assume a linear relationship between effort scores and improvements in IB points. You may assume effort scores follow a normal distribution. [3]
(f.ii) Hence, find the expected improvement between predicted and final points for an increase of one unit in effort grades, giving your answer to one decimal place. [1]
A mathematics teacher in school A claims that the comparison between the two schools is not valid because the sample for school B contained mainly girls and that for school A, mainly boys. She believes that girls are likely to show a greater improvement from their predicted points to their final points. She collects more data from other schools, asking them to class their results into four categories as shown in the following table.
(g) Use an appropriate test to determine whether showing an improvement is independent of gender. [6]
(h) If you were to repeat the test performed in part (e) intending to compare the quality of the teaching between the two schools, suggest two ways in which you might choose your sample to improve the validity of the test. [2]

3. [Maximum mark: 7] EXN.1.AHL.TZ0.12
It is believed that the power P of a signal at a point d km from an antenna is inversely proportional to $d^n$ where $n \in \mathbb{Z}^+$.
The value of P is recorded at distances of 1 m to 5 m and the values of $\log_{10} d$ and $\log_{10} P$ are plotted on the graph below.
(a) Explain why this graph indicates that P is inversely proportional to $d^n$. [2]
The values of $\log_{10} d$ and $\log_{10} P$ are shown in the table below.
(b) Find the equation of the least squares regression line of $\log_{10} P$ against $\log_{10} d$. [2]
(c.i) Use your answer to part (b) to write down the value of n to the nearest integer. [1]
(c.ii) Find an expression for P in terms of d. [2]

4. [Maximum mark: 26] EXN.3.AHL.TZ0.1
An estate manager is responsible for stocking a small lake with fish. He begins by introducing 1000 fish into the lake and monitors their population growth to determine the likely carrying capacity of the lake. After one year an accurate assessment of the number of fish in the lake is taken and it is found to be 1200.
Let N be the number of fish t years after the fish have been introduced to the lake. Initially it is assumed that the rate of increase of N will be constant.
(a) Use this model to predict the number of fish in the lake when t = 8. [2]
When t = 8 the estate manager again decides to estimate the number of fish in the lake. To do this he first catches 300 fish and marks them, so they can be recognized if caught again. These fish are then released back into the lake. A few days later he catches another 300 fish, releasing each fish after it has been checked, and finds 45 of them are marked.
(b) Assuming the proportion of marked fish in the second sample is equal to the proportion of marked fish in the lake, show that the estate manager will estimate there are now 2000 fish in the lake. [2]
Let X be the number of marked fish caught in the second sample, where X is considered to be distributed as B(n, p). Assume the number of fish in the lake is 2000.
(c.i) Write down the value of n and the value of p. [2]
(c.ii) State an assumption that is being made for X to be considered as following a binomial distribution. [1]
The estate manager decides that he needs bounds for the total number of fish in the lake.
(d.i) Show that an estimate for Var(X) is 38.25. [2]
(d.ii) Hence show that the variance of the proportion of marked fish in the sample, $\mathrm{Var}\left(\frac{X}{300}\right)$, is 0.000425. [2]
The estate manager feels confident that the proportion of marked fish in the lake will be within 1.5 standard deviations of the proportion of marked fish in the sample and decides these will form the upper and lower bounds of his estimate.
(e.i) Taking the value for the variance given in (d)(ii) as a good approximation for the true variance, find the upper and lower bounds for the proportion of marked fish in the lake. [2]
(e.ii) Hence find upper and lower bounds for the number of fish in the lake when t = 8. [2]
(f) Given this result, comment on the validity of the linear model used in part (a). [2]
The estate manager now believes the population of fish will follow the logistic model $N(t) = \frac{L}{1 + Ce^{-kt}}$ where L is the carrying capacity and C, k > 0.
The estate manager would like to know if the population of fish in the lake will eventually reach 5000.
(g.i) Assuming a carrying capacity of 5000 use the given values of N(0) and N(1) to calculate the parameters C and k. [5]
(g.ii) Use these parameters to calculate the value of N(8) predicted by this model. [2]
(h) Comment on the likelihood of the fish population reaching 5000. [2]

5.
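As a rough numerical check on the logistic model in question 4, parts (g.i) and (g.ii), the parameters can be obtained by substituting N(0) = 1000 and N(1) = 1200 into N(t) = L / (1 + C e^(−kt)) with L = 5000. The sketch below is only an illustration and not a mark scheme; the function names `fit_logistic` and `logistic` are hypothetical helpers introduced here.

```python
import math

def fit_logistic(L, n0, n1):
    """Return (C, k) for N(t) = L / (1 + C*exp(-k*t)), given N(0) = n0 and N(1) = n1.

    Hypothetical helper: it simply rearranges the model at t = 0 and t = 1.
    """
    C = L / n0 - 1                      # from N(0) = L / (1 + C)
    k = -math.log((L / n1 - 1) / C)     # from N(1) = L / (1 + C*exp(-k))
    return C, k

def logistic(t, L, C, k):
    """Evaluate the logistic model at time t."""
    return L / (1 + C * math.exp(-k * t))

L_CAP = 5000.0                          # carrying capacity assumed in part (g.i)
C, k = fit_logistic(L_CAP, 1000.0, 1200.0)
n8 = logistic(8, L_CAP, C, k)

print(f"C = {C:.4f}, k = {k:.4f}, N(8) = {n8:.0f}")
```

Comparing the printed N(8) with the bounds from part (e.ii) is one way to approach the comment asked for in part (h).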
[Maximum mark: 10] EXM.1.AHL.TZ0.15
Adesh wants to model the cooling of a metal rod. He heats the rod and records its temperature as it cools.
He believes the temperature can be modeled by $T(t) = ae^{bt} + 25$, where $a, b \in \mathbb{R}$.
(a) Show that $\ln(T - 25) = bt + \ln a$. [2]
(b) Find the equation of the regression line of $\ln(T - 25)$ on t. [3]
Hence
(c.i) find the value of a and of b. [3]
(c.ii) predict the temperature of the metal rod after 3 minutes. [2]

6. [Maximum mark: 10] EXM.1.AHL.TZ0.55
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The owner believes that the number of brown eggs in a box can be modelled by a binomial distribution. He examines 100 boxes and obtains the following data.
(a.i) Calculate the mean number of brown eggs in a box. [1]
(a.ii) Hence estimate p, the probability that a randomly chosen egg is brown. [1]
(b) By calculating an appropriate $\chi^2$ statistic, test, at the 5% significance level, whether or not the binomial distribution gives a good fit to these data. [8]

7. [Maximum mark: 15] EXM.1.AHL.TZ0.59
The heights, x metres, of the 241 new entrants to a men’s college were measured and the following statistics calculated.
$\sum x = 412.11$, $\sum x^2 = 705.5721$
(a) Calculate unbiased estimates of the population mean and the population variance. [3]
The Head of Mathematics decided to use a $\chi^2$ test to determine whether or not these heights could be modelled by a normal distribution. He therefore divided the data into classes as follows.
(b.i) State suitable hypotheses. [1]
(b.ii) Calculate the value of the $\chi^2$ statistic and state your conclusion using a 10% level of significance. [11]

8. [Maximum mark: 14] EXM.1.AHL.TZ0.58
The number of telephone calls received by a helpline over 80 one-minute periods are summarized in the table below.
(a) Find the exact value of the mean of this distribution. [2]
(b) Test, at the 5% level of significance, whether or not the data can be modelled by a Poisson distribution. [12]

9. [Maximum mark: 10] EXM.1.AHL.TZ0.15
Adesh wants to model the cooling of a metal rod. He heats the rod and records its temperature as it cools.
He believes the temperature can be modeled by $T(t) = ae^{bt} + 25$, where $a, b \in \mathbb{R}$.
(a) Show that $\ln(T - 25) = bt + \ln a$. [2]
(b) Find the equation of the regression line of $\ln(T - 25)$ on t. [3]
Hence
(c.i) find the value of a and of b. [3]
(c.ii) predict the temperature of the metal rod after 3 minutes. [2]

10. [Maximum mark: 10] EXM.1.AHL.TZ0.55
Eggs at a farm are sold in boxes of six. Each egg is either brown or white. The owner believes that the number of brown eggs in a box can be modelled by a binomial distribution. He examines 100 boxes and obtains the following data.
(a.i) Calculate the mean number of brown eggs in a box. [1]
(a.ii) Hence estimate p, the probability that a randomly chosen egg is brown. [1]
(b) By calculating an appropriate $\chi^2$ statistic, test, at the 5% significance level, whether or not the binomial distribution gives a good fit to these data. [8]

11. [Maximum mark: 12] EXM.2.AHL.TZ0.24
The hens on a farm lay either white or brown eggs. The eggs are put into boxes of six. The farmer claims that the number of brown eggs in a box can be modelled by the binomial distribution, B(6, p). By inspecting the contents of 150 boxes of eggs she obtains the following data.
(a) Show that this data leads to an estimated value of p = 0.4. [1]
(b) Stating null and alternative hypotheses, carry out an appropriate test at the 5% level to decide whether the farmer’s claim can be justified. [11]

12. [Maximum mark: 20] EXM.2.AHL.TZ0.29
(a) A horse breeder records the number of births for each of 100 horses during the past eight years. The results are summarized in the following table:
Stating null and alternative hypotheses carry out an appropriate test at the 5% significance level to decide whether the results can be modelled by B(6, 0.5). [10]
(b) Without doing any further calculations, explain briefly how you would carry out a test, at the 5% significance level, to decide if the data can be modelled by B(6, p), where p is unspecified. [2]
(c) A different horse breeder collected data on the time and outcome of births. The data are summarized in the following table:
Carry out an appropriate test at the 5% significance level to decide whether there is an association between time and outcome. [8]

13. [Maximum mark: 27] EXM.3.AHL.TZ0.9
In this question you will explore possible models for the spread of an infectious disease.
An infectious disease has begun spreading in a country. The National Disease Control Centre (NDCC) has compiled the following data after receiving alerts from hospitals.
A graph of n against d is shown below.
The NDCC want to find a model to predict the total number of people infected, so they can plan for medicine and hospital facilities. After looking at the data, they think an exponential function in the form $n = ab^d$ could be used as a model.
(a) Use an exponential regression to find the value of a and of b, correct to 4 decimal places. [3]
Use your answer to part (a) to predict
(b.i) the number of new people infected on day 6. [3]
(b.ii) the day when the total number of people infected will be greater than 1000. [2]
The NDCC want to verify the accuracy of these predictions. They decide to perform a $\chi^2$ goodness of fit test.
(c) Use your answer to part (a) to show that the model predicts 16.7 people will be infected on the first day. [1]
The predictions given by the model for the first five days are shown in the table.
(d.i) Explain why the number of degrees of freedom is 2. [2]
(d.ii) Perform a $\chi^2$ goodness of fit test at the 5% significance level. You should clearly state your hypotheses, the p-value, and your conclusion. [5]
In fact, the first day when the total number of people infected is greater than 1000 is day 14, when a total of 1015 people are infected.
(e) Give two reasons why the prediction in part (b)(ii) might be lower than 14. [2]
Based on this new data, the NDCC decide to try a logistic model in the form $n = \frac{L}{1 + ce^{-kd}}$.
Use the data from days 1–5, together with day 14, to find the value of
(f.i) L. [2]
(f.ii) c. [1]
(f.iii) k. [1]
(g) Hence predict the total number of people infected by this disease after several months. [2]
(h) Use the logistic model to find the day when the rate of increase of people infected is greatest. [3]

14. [Maximum mark: 29] 22N.3.AHL.TZ0.1
In this question, you will explore possible approaches to using historical sports results for making predictions about future sports matches.
Two friends, Peter and Helen, are discussing ways of predicting the outcomes of international football matches involving Argentina.
Peter suggests analysing historical data to help make predictions. He lists the results of the most recent 240 matches in which Argentina played, in chronological order, then considers blocks of four matches at a time. He counts how many times Argentina has won in each block. The following table shows his results for the 60 blocks of four matches.
(a) Determine the mean number of wins per block of four matches for Argentina. [2]
Peter thinks that this data can be modelled by a binomial distribution with n = 4 and decides to carry out a $\chi^2$ goodness of fit test.
(b) Use Peter’s data to write down an estimate for the probability p for this binomial model. [1]
(c.i) Use the binomial model to find the probability that Argentina win zero matches in a block of four matches. [1]
(c.ii) Find the expected frequency for zero wins. [2]
As some expected frequencies are less than 5, Peter combines rows in his table to produce the following observed frequencies. He then uses his binomial model to find appropriate expected frequencies, correct to one decimal place.
Peter uses this table to carry out a $\chi^2$ goodness of fit test, to test the hypothesis that the data follows a binomial distribution with n = 4, at the 5% significance level.
For this test, state
(d.i) the null hypothesis; [1]
(d.ii) the number of degrees of freedom; [1]
(d.iii) the p-value; [2]
(d.iv) the conclusion, justifying your answer. [2]
(e) Using Peter’s binomial model, find the probability that Argentina will win at least one of their next four international football matches. [2]
Helen thinks that a better prediction might be made by considering the transition between matches. To keep the model simple, she decides to use only two states: Argentina won (A) or Argentina did not win (B).
Helen looks at Peter’s list of results and counts the number of times that:
- Argentina won, twice in succession (AA),
- Argentina won, then did not win (AB),
- Argentina did not win, then won (BA),
- Argentina did not win, twice in succession (BB).
She recorded the following results.
Helen uses the relative frequencies to estimate the probabilities in a transition matrix.
(f.i) Given that Argentina won the previous match, show that Helen’s estimate for the probability of Argentina winning the next match is $\frac{17}{29}$. [2]
(f.ii) Write down the transition matrix, T, for Helen’s model. [2]
(g.i) Show that the characteristic polynomial of T is $1363\lambda^2 - 1263\lambda - 100 = 0$. [3]
(g.ii) Hence or otherwise, find the eigenvalues of T. [1]
(g.iii) Find the corresponding eigenvectors. [3]
(h) In her retirement, many years from now, Helen is planning to travel to three consecutive international football matches involving Argentina. Use Helen’s model to find the probability that Argentina will win all three matches. [4]

15. [Maximum mark: 8] 22M.1.AHL.TZ2.9
A psychologist records the number of digits (d) of π that a sample of IB Mathematics higher level candidates could recall.
(a) Find an unbiased estimate of the population mean of d.
[1]
(b) Find an unbiased estimate of the population variance of d. [2]
The psychologist has read that in the general population people can remember an average of 4.4 digits of π. The psychologist wants to perform a statistical test to see if IB Mathematics higher level candidates can remember more digits than the general population.
$H_0: \mu = 4.4$ is the null hypothesis for this test.
(c.i) State the alternative hypothesis. [1]
(c.ii) Given that all assumptions for this test are satisfied, carry out an appropriate hypothesis test. State and justify your conclusion. Use a 5% significance level. [4]

16. [Maximum mark: 13] 22M.2.AHL.TZ1.3
A Principal would like to compare the students in his school with a national standard. He decides to give a test to eight students made up of four boys and four girls. One of the teachers offers to find the volunteers from his class.
(a) Name the type of sampling that best describes the method used by the Principal. [1]
The marks out of 40, for the students who took the test, are: 25, 29, 38, 37, 12, 18, 27, 31.
For the eight students find
(b.i) the mean mark. [2]
(b.ii) the standard deviation of the marks. [1]
The national standard mark is 25.2 out of 40.
(c) Perform an appropriate test at the 5% significance level to see if the mean marks achieved by the students in the school are higher than the national standard. It can be assumed that the marks come from a normal population. [5]
(d) State one reason why the test might not be valid. [1]
Two additional students take the test at a later date and the mean mark for all ten students is 28.1 and the standard deviation is 8.4.
For further analysis, a standardized score out of 100 for the ten students is obtained by multiplying the scores by 2 and adding 20.
For the ten students, find
(e.i) their mean standardized score. [1]
(e.ii) the standard deviation of their standardized score. [2]

17.
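For question 16, the summary statistics, the one-sample t statistic and the effect of the linear rescaling can be sketched with the Python standard library alone. This is an illustrative sketch rather than a mark scheme. Two assumptions are made: the constant `T_CRIT` is the tabulated one-tailed 5% critical value for 7 degrees of freedom, and "standard deviation" in part (b.ii) is read as the population form (`pstdev`), with the sample form (`stdev`) used inside the t statistic.

```python
import math
import statistics

marks = [25, 29, 38, 37, 12, 18, 27, 31]   # data given in question 16
n = len(marks)

mean_mark = statistics.mean(marks)          # part (b.i)
sd_population = statistics.pstdev(marks)    # part (b.ii), population form (assumption)
sd_sample = statistics.stdev(marks)         # unbiased form, used in the t-test

# Part (c): one-sample, one-tailed t-test of H0: mu = 25.2 against H1: mu > 25.2.
NATIONAL_MEAN = 25.2
t_stat = (mean_mark - NATIONAL_MEAN) / (sd_sample / math.sqrt(n))
T_CRIT = 1.895                              # table value, 5% one-tailed, 7 d.f. (assumption)
reject_h0 = t_stat > T_CRIT

# Part (e): standardized score = 2 * mark + 20 for the ten students.
std_mean = 2 * 28.1 + 20                    # the mean is scaled and shifted
std_sd = 2 * 8.4                            # the standard deviation is only scaled

print(f"mean = {mean_mark:.3f}, sd = {sd_population:.2f}, t = {t_stat:.3f}, reject H0: {reject_h0}")
print(f"standardized mean = {std_mean:.1f}, standardized sd = {std_sd:.1f}")
```

A calculator's built-in t-test would report a p-value instead of a comparison with a critical value; both routes lead to the same decision at a fixed significance level.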
[Maximum mark: 28] 22M.3.AHL.TZ2.2
This question compares possible designs for a new computer network between multiple school buildings, and whether they meet specific requirements.
A school’s administration team decides to install new fibre-optic internet cables underground. The school has eight buildings that need to be connected by these cables. A map of the school is shown below, with the internet access point of each building labelled A–H.
Jonas is planning where to install the underground cables. He begins by determining the distances, in metres, between the underground access points in each of the buildings.
He finds AD = 89.2 m, DF = 104.9 m and AD̂F = 83°.
(a) Find AF. [3]
The cost for installing the cable directly between A and F is $21 310.
(b) Find the cost per metre of installing this cable. [2]
Jonas estimates that it will cost $110 per metre to install the cables between all the other buildings.
(c) State why the cost for installing the cable between A and F would be higher than between the other buildings. [1]
Jonas creates the following graph, S, using the cost of installing the cables between two buildings as the weight of each edge.
The computer network could be designed such that each building is directly connected to at least one other building and hence all buildings are indirectly connected.
(d.i) By using Kruskal’s algorithm, find the minimum spanning tree for S, showing clearly the order in which edges are added. [3]
(d.ii) Hence find the minimum installation cost for the cables that would allow all the buildings to be part of the computer network. [2]
The computer network fails if any part of it becomes unreachable from any other part. To help protect the network from failing, every building could be connected to at least two other buildings. In this way if one connection breaks, the building is still part of the computer network. Jonas can achieve this by finding a Hamiltonian cycle within the graph.
(e) State why a path that forms a Hamiltonian cycle does not always form an Eulerian circuit. [1]
(f) Starting at D, use the nearest neighbour algorithm to find the upper bound for the installation cost of a computer network in the form of a Hamiltonian cycle.
Note: Although the graph is not complete, in this instance it is not necessary to form a table of least distances. [5]
(g) By deleting D, use the deleted vertex algorithm to find the lower bound for the installation cost of the cycle. [6]
After more research, Jonas decides to install the cables as shown in the diagram below.
Each individual cable is installed such that each end of the cable is connected to a building’s access point. The connection between each end of a cable and an access point has a 1.4% probability of failing after a power surge.
For the network to be successful, each building in the network must be able to communicate with every other building in the network. In other words, there must be a path that connects any two buildings in the network. Jonas would like the network to have less than a 2% probability of failing to operate after a power surge.
(h) Show that Jonas’s network satisfies the requirement of there being less than a 2% probability of the network failing after a power surge. [5]

18. [Maximum mark: 5] 21N.1.AHL.TZ0.12
The following table shows the time, in days, from December 1st and the percentage of Christmas trees in stock at a shop on the beginning of that day.
The following table shows the natural logarithm of both d and x on these days to 2 decimal places.
(a) Use the data in the second table to find the value of m and the value of b for the regression line, ln x = m(ln d) + b. [2]
(b) Assuming that the model found in part (a) remains valid, estimate the percentage of trees in stock when d = 25. [3]

19. [Maximum mark: 7] 21N.1.AHL.TZ0.14
On Paul’s farm, potatoes are packed in sacks labelled 50 kg.
The weights of the sacks of potatoes can be modelled by a normal distribution with mean weight 49.8 kg and standard deviation 0.9 kg.
(a) Find the probability that a sack is under its labelled weight. [2]
(b) Find the lower quartile of the weights of the sacks of potatoes. [2]
(c) The sacks of potatoes are transported in crates. There are 10 sacks in each crate and the weights of the sacks of potatoes are independent of each other. Find the probability that the total weight of the sacks of potatoes in a crate exceeds 500 kg. [3]

20. [Maximum mark: 7] 21M.1.AHL.TZ1.11
A factory, producing plastic gifts for a fast food restaurant’s Jolly meals, claims that just 1% of the toys produced are faulty.
A restaurant manager wants to test this claim. A box of 200 toys is delivered to the restaurant. The manager checks all the toys in this box and four toys are found to be faulty.
(a) Identify the type of sampling used by the restaurant manager. [1]
The restaurant manager performs a one-tailed hypothesis test, at the 10% significance level, to determine whether the factory’s claim is reasonable. It is known that faults in the toys occur independently.
(b) Write down the null and alternative hypotheses. [2]
(c) Find the p-value for the test. [2]
(d) State the conclusion of the test. Give a reason for your answer. [2]

21. [Maximum mark: 6] 21M.1.AHL.TZ1.14
The weights of apples from Tony’s farm follow a normal distribution with mean 158 g and standard deviation 13 g. The apples are sold in bags that contain six apples.
(a) Find the mean weight of a bag of apples. [2]
(b) Find the standard deviation of the weights of these bags of apples. [2]
(c) Find the probability that a bag selected at random weighs more than 1 kg. [2]

22. [Maximum mark: 8] 21M.1.AHL.TZ2.9
A newspaper vendor in Singapore is trying to predict how many copies of The Straits Times they will sell. The vendor forms a model to predict the number of copies sold each weekday.
According to this model, they expect the same number of copies will be sold each day.
To test the model, they record the number of copies sold each weekday during a particular week. This data is shown in the table.
A goodness of fit test at the 5% significance level is used on this data to determine whether the vendor’s model is suitable.
The critical value for the test is 9.49.
(a) Find an estimate for how many copies the vendor expects to sell each day. [1]
(b.i) State the null and alternative hypotheses for this test. [2]
(b.ii) Write down the degrees of freedom for this test. [1]
(b.iii) Write down the conclusion to the test. Give a reason for your answer. [4]

23. [Maximum mark: 18] 21M.2.AHL.TZ2.4
In a small village there are two doctors’ clinics, one owned by Doctor Black and the other owned by Doctor Green. It was noted after each year that 3.5% of Doctor Black’s patients moved to Doctor Green’s clinic and 5% of Doctor Green’s patients moved to Doctor Black’s clinic. All additional losses and gains of patients by the clinics may be ignored.
At the start of a particular year, it was noted that Doctor Black had 2100 patients on their register, compared to Doctor Green’s 3500 patients.
(a) Write down a transition matrix T indicating the annual population movement between clinics. [2]
(b) Find a prediction for the ratio of the number of patients Doctor Black will have, compared to Doctor Green, after two years. [2]
(c) Find a matrix P, with integer elements, such that $T = PDP^{-1}$, where D is a diagonal matrix. [6]
(d) Hence, show that the long-term transition matrix $T^\infty$ is given by $T^\infty = \begin{pmatrix} \frac{10}{17} & \frac{10}{17} \\ \frac{7}{17} & \frac{7}{17} \end{pmatrix}$. [6]
(e) Hence, or otherwise, determine the expected ratio of the number of patients Doctor Black would have compared to Doctor Green in the long term. [2]

24. [Maximum mark: 16] 21M.2.AHL.TZ2.2
It is known that the weights of male Persian cats are normally distributed with mean 6.1 kg and variance $0.5^2\ \mathrm{kg}^2$.
(a) Sketch a diagram showing the above information. [2]
(b) Find the proportion of male Persian cats weighing between 5.5 kg and 6.5 kg. [2]
A group of 80 male Persian cats are drawn from this population.
(c) Determine the expected number of cats in this group that have a weight of less than 5.3 kg. [3]
The male cats are now joined by 80 female Persian cats. The female cats are drawn from a population whose weights are normally distributed with mean 4.5 kg and standard deviation 0.45 kg.
Ten female cats are chosen at random.
(d.i) Find the probability that exactly one of them weighs over 4.62 kg. [4]
(d.ii) Let N be the number of cats weighing over 4.62 kg. Find the variance of N. [1]
(e) A cat is selected at random from all 160 cats. Find the probability that the cat was female, given that its weight was over 4.7 kg. [4]

25. [Maximum mark: 28] 21M.3.AHL.TZ1.2
A firm wishes to review its recruitment processes. This question considers the validity and reliability of the methods used.
Every year an accountancy firm recruits new employees for a trial period of one year from a large group of applicants.
At the start, all applicants are interviewed and given a rating. Those with a rating of either Excellent, Very good or Good are recruited for the trial period. At the end of this period, some of the new employees will stay with the firm.
It is decided to test how valid the interview rating is as a way of predicting which of the new employees will stay with the firm.
Data is collected and recorded in a contingency table.
(a) Use an appropriate test, at the 5% significance level, to determine whether a new employee staying with the firm is independent of their interview rating. State the null and alternative hypotheses, the p-value and the conclusion of the test. [6]
The next year’s group of applicants are asked to complete a written assessment which is then analysed. From those recruited as new employees, a random sample of size 18 is selected.
The sample is stratified by department. Of the 91 new employees recruited that year, 55 were placed in the national department and 36 in the international department.
(b) Show that 11 employees are selected for the sample from the national department. [2]
At the end of their first year, the level of performance of each of the 18 employees in the sample is assessed by their department manager. They are awarded a score between 1 (low performance) and 10 (high performance).
The marks in the written assessment and the scores given by the managers are shown in both the table and the scatter diagram.
The firm decides to find a Spearman’s rank correlation coefficient, $r_s$, for this data.
(c.i) Without calculation, explain why it might not be appropriate to calculate a correlation coefficient for the whole sample of 18 employees. [2]
(c.ii) Find $r_s$ for the seven employees working in the international department. [4]
(c.iii) Hence comment on the validity of the written assessment as a measure of the level of performance of employees in this department. Justify your answer. [2]
The same seven employees are given the written assessment a second time, at the end of the first year, to measure its reliability. Their marks are shown in the table below.
(d.i) State the name of this type of test for reliability. [1]
(d.ii) For the data in this table, test the null hypothesis, $H_0: \rho = 0$, against the alternative hypothesis, $H_1: \rho > 0$, at the 5% significance level. You may assume that all the requirements for carrying out the test have been met. [4]
(d.iii) Hence comment on the reliability of the written assessment. [1]
The written assessment is in five sections, numbered 1 to 5. At the end of the year, the employees are also given a score for each of five professional attributes: V, W, X, Y and Z.
The firm decides to test the hypothesis that there is a correlation between the mark in a section and the score for an attribute.
They compare marks in each of the sections with scores for each of the attributes.
(e.i) Write down the number of tests they carry out. [1]
(e.ii) The tests are performed at the 5% significance level.
Assuming that:
- there is no correlation between the marks in any of the sections and scores in any of the attributes,
- the outcome of each hypothesis test is independent of the outcome of the other hypothesis tests,
find the probability that at least one of the tests will be significant. [4]
(e.iii) The firm obtains a significant result when comparing section 2 of the written assessment and attribute X. Interpret this result. [1]

26. [Maximum mark: 24] 21M.3.AHL.TZ2.1
Juliet is a sociologist who wants to investigate if income affects happiness amongst doctors. This question asks you to review Juliet’s methods and conclusions.
Juliet obtained a list of email addresses of doctors who work in her city. She contacted them and asked them to fill in an anonymous questionnaire. Participants were asked to state their annual income and to respond to a set of questions. The responses were used to determine a happiness score out of 100. Of the 415 doctors on the list, 11 replied.
(a.i) Describe one way in which Juliet could improve the reliability of her investigation. [1]
(a.ii) Describe one criticism that can be made about the validity of Juliet’s investigation. [1]
Juliet’s results are summarized in the following table.
(b) Juliet classifies response K as an outlier and removes it from the data. Suggest one possible justification for her decision to remove it. [1]
For the remaining ten responses in the table, Juliet calculates the mean happiness score to be 52.5.
(c.i) Calculate the mean annual income for these remaining responses. [2]
(c.ii) Determine the value of r, Pearson’s product-moment correlation coefficient, for these remaining responses.
[2]

Juliet decides to carry out a hypothesis test on the correlation coefficient to investigate whether increased annual income is associated with greater happiness.

(d.i) State why the hypothesis test should be one-tailed. [1]
(d.ii) State the null and alternative hypotheses for this test. [2]
(d.iii) The critical value for this test, at the 5% significance level, is 0.549. Juliet assumes that the population is bivariate normal. Determine whether there is significant evidence of a positive correlation between annual income and happiness. Justify your answer. [2]

Juliet wants to create a model to predict how changing annual income might affect happiness scores. To do this, she assumes that annual income in dollars, X, is the independent variable and the happiness score, Y, is the dependent variable. She first considers a linear model of the form Y = aX + b.

(e.i) Use Juliet’s data to find the value of a and of b. [1]
(e.ii) Interpret, referring to income and happiness, what the value of a represents. [1]

Juliet then considers a quadratic model of the form Y = cX² + dX + e.

(e.iii) Find the value of c, of d and of e. [1]
(e.iv) Find the coefficient of determination for each of the two models she considers. [2]
(e.v) Hence compare the two models. [1]
(e.vi) Juliet decides to use the coefficient of determination to choose between these two models. Comment on the validity of her decision. [1]

After presenting the results of her investigation, a colleague questions whether Juliet’s sample is representative of all doctors in the city. A report states that the mean annual income of doctors in the city is $80 000. Juliet decides to carry out a test to determine whether her sample could realistically be taken from a population with a mean of $80 000.

(f.i) State the name of the test which Juliet should use. [1]
(f.ii) State the null and alternative hypotheses for this test.
[1]
(f.iii) Perform the test, using a 5% significance level, and state your conclusion in context. [3]

27. [Maximum mark: 7] 19N.3.AHL.TZ0.Hsp_1
Peter, the Principal of a college, believes that there is an association between the score in a Mathematics test, X, and the time taken to run 500 m, Y seconds, of his students. The following paired data are collected.

It can be assumed that (X, Y) follow a bivariate normal distribution with product moment correlation coefficient ρ.

(a.i) State suitable hypotheses H₀ and H₁ to test Peter’s claim, using a two-tailed test. [1]
(a.ii) Carry out a suitable test at the 5 % significance level. With reference to the p-value, state your conclusion in the context of Peter’s claim. [4]
(b) Peter uses the regression line of y on x as y = 0.248x + 83.0 and calculates that a student with a Mathematics test score of 73 will have a running time of 101 seconds. Comment on the validity of his calculation. [2]

28. [Maximum mark: 7] 19M.1.AHL.TZ1.H_6
Let X be a random variable which follows a normal distribution with mean μ. Given that P(X < μ − 5) = 0.2, find

(a) P(X > μ + 5). [2]
(b) P(X < μ + 5 | X > μ − 5). [5]

29. [Maximum mark: 13] 19M.2.AHL.TZ1.H_9
A café serves sandwiches and cakes. Each customer will choose one of the following three options; buy only a sandwich, buy only a cake or buy both a sandwich and a cake.

The probability that a customer buys a sandwich is 0.72 and the probability that a customer buys a cake is 0.45.

Find the probability that a customer chosen at random will buy
(a.i) both a sandwich and a cake. [3]
(a.ii) only a sandwich. [1]

On a typical day 200 customers come to the café.

(b.i) Find the expected number of cakes sold on a typical day. [1]
(b.ii) Find the probability that more than 100 cakes will be sold on a typical day. [3]

It is known that 46 % of the customers who come to the café are male, and that 80 % of these buy a sandwich.

(c.i) A customer is selected at random. Find the probability that the customer is male and buys a sandwich. [1]
(c.ii) A female customer is selected at random. Find the probability that she buys a sandwich. [4]

30. [Maximum mark: 1] 19M.2.AHL.TZ1.H_3
The marks achieved by eight students in a class test are given in the following list.

The teacher increases all the marks by 2. Write down the new value for
(b.ii) the standard deviation. [1]

31. [Maximum mark: 8] 19M.2.AHL.TZ2.H_3
Iqbal attempts three practice papers in mathematics. The probability that he passes the first paper is 0.6. Whenever he gains a pass in a paper, his confidence increases so that the probability of him passing the next paper increases by 0.1. Whenever he fails a paper the probability of him passing the next paper is 0.6.

(a) Complete the given probability tree diagram for Iqbal’s three attempts, labelling each branch with the correct probability. [3]
(b) Calculate the probability that Iqbal passes at least two of the papers he attempts. [2]
(c) Find the probability that Iqbal passes his third paper, given that he passed only one previous paper. [3]

32. [Maximum mark: 16] 19M.2.AHL.TZ2.H_10
Steffi the stray cat often visits Will’s house in search of food. Let X be the discrete random variable “the number of times per day that Steffi visits Will’s house”.

The random variable X can be modelled by a Poisson distribution with mean 2.1.

(a) Find the probability that on a randomly selected day, Steffi does not visit Will’s house. [2]

Let Y be the discrete random variable “the number of times per day that Steffi is fed at Will’s house”. Steffi is only fed on the first four occasions that she visits each day.

(b) Copy and complete the probability distribution table for Y. [4]
(c) Hence find the expected number of times per day that Steffi is fed at Will’s house. [3]
(d) In any given year of 365 days, the probability that Steffi does not visit Will for at most n days in total is 0.5 (to one decimal place). Find the value of n. [3]
(e) Show that the expected number of occasions per year on which Steffi visits Will’s house and is not fed is at least 30. [4]

33. [Maximum mark: 5] 19M.2.AHL.TZ2.H_2
Timmy owns a shop. His daily income from selling his goods can be modelled as a normal distribution, with a mean daily income of $820, and a standard deviation of $230. To make a profit, Timmy’s daily income needs to be greater than $1000.

(a) Calculate the probability that, on a randomly selected day, Timmy makes a profit. [2]

The shop is open for 24 days every month.

(b) Calculate the probability that, in a randomly selected month, Timmy makes a profit on between 5 and 10 days (inclusive). [3]

34. [Maximum mark: 6] 18N.1.AHL.TZ0.H_1
Consider two events, A and B, such that P(A) = P(A′ ∩ B) = 0.4 and P(A ∩ B) = 0.1.

(a) By drawing a Venn diagram, or otherwise, find P(A ∪ B). [3]
(b) Show that the events A and B are not independent. [3]

35. [Maximum mark: 18] 18N.2.AHL.TZ0.H_10
Willow finds that she receives approximately 70 emails per working day.

She decides to model the number of emails received per working day using the random variable X, where X follows a Poisson distribution with mean 70.

(a.i) Using this distribution model, find P(X < 60). [2]
(a.ii) Using this distribution model, find the standard deviation of X. [2]

In order to test her model, Willow records the number of emails she receives per working day over a period of 6 months. The results are shown in the following table.

From the table, calculate
(b.i) an estimate for the mean number of emails received per working day. [3]
(b.ii) an estimate for the standard deviation of the number of emails received per working day. [2]
(c) Give one piece of evidence that suggests Willow’s Poisson distribution model is not a good fit. [1]

Archie works for a different company and knows that he receives emails according to a Poisson distribution, with a mean of λ emails per day.

(d) Suppose that the probability of Archie receiving more than 10 emails in total on any one day is 0.99. Find the value of λ. [3]
(e) Now suppose that Archie received exactly 20 emails in total in a consecutive two day period. Show that the probability that he received exactly 10 of them on the first day is independent of λ. [5]

36. [Maximum mark: 8] 18N.2.AHL.TZ0.H_3
It is known that 56 % of Infiglow batteries have a life of less than 16 hours, and 94 % have a life less than 17 hours. It can be assumed that battery life is modelled by the normal distribution N(μ, σ²).

(a) Find the value of μ and the value of σ. [6]
(b) Find the probability that a randomly selected Infiglow battery will have a life of at least 15 hours. [2]

37. [Maximum mark: 5] 18M.1.AHL.TZ1.H_3
Two unbiased tetrahedral (four-sided) dice with faces labelled 1, 2, 3, 4 are thrown and the scores recorded. Let the random variable T be the maximum of these two scores. The probability distribution of T is given in the following table.

(a) Find the value of a and the value of b. [3]
(b) Find the expected value of T. [2]

38. [Maximum mark: 6] 18M.1.AHL.TZ2.H_3
The discrete random variable X has the following probability distribution, where p is a constant.

(a) Find the value of p. [2]
(b.i) Find μ, the expected value of X. [2]
(b.ii) Find P(X > μ). [2]

39. [Maximum mark: 5] 18M.2.AHL.TZ1.H_4
The age, L, in years, of a wolf can be modelled by the normal distribution L ~ N(8, 5).

(a) Find the probability that a wolf selected at random is at least 5 years old. [2]
(b) Eight wolves are independently selected at random and their ages recorded. Find the probability that more than six of these wolves are at least 5 years old. [3]

40. [Maximum mark: 5] 18M.2.AHL.TZ1.H_6
The mean number of squirrels in a certain area is known to be 3.2 squirrels per hectare of woodland. Within this area, there is a 56 hectare woodland nature reserve. It is known that there are currently at least 168 squirrels in this reserve.
Assuming the population of squirrels follow a Poisson distribution, calculate the probability that there are more than 190 squirrels in the reserve. [5]

41. [Maximum mark: 7] 18M.2.AHL.TZ1.H_8
Each of the 25 students in a class are asked how many pets they own. Two students own three pets and no students own more than three pets. The mean and standard deviation of the number of pets owned by students in the class are 18/25 and 24/25 respectively.

Find the number of students in the class who do not own a pet. [7]

42. [Maximum mark: 7] 18M.2.AHL.TZ2.H_8
The random variable X has a binomial distribution with parameters n and p. It is given that E(X) = 3.5.

(a) Find the least possible value of n. [2]
(b) It is further given that P(X ≤ 1) = 0.09478 correct to 4 significant figures. Determine the value of n and the value of p. [5]

43. [Maximum mark: 6] 18M.2.AHL.TZ2.H_3
The random variable X has a normal distribution with mean μ = 50 and variance σ² = 16.

(a) Sketch the probability density function for X, and shade the region representing P(μ − 2σ < X < μ + σ). [2]
(b) Find the value of P(μ − 2σ < X < μ + σ). [2]
(c) Find the value of k for which P(μ − kσ < X < μ + kσ) = 0.5. [2]

44. [Maximum mark: 11] 17N.1.AHL.TZ0.H_10
Chloe and Selena play a game where each have four cards showing capital letters A, B, C and D. Chloe lays her cards face up on the table in order A, B, C, D as shown in the following diagram.

Selena shuffles her cards and lays them face down on the table. She then turns them over one by one to see if her card matches with Chloe’s card directly above. Chloe wins if no matches occur; otherwise Selena wins.

(a) Show that the probability that Chloe wins the game is 3/8. [6]

Chloe and Selena repeat their game so that they play a total of 50 times. Suppose the discrete random variable X represents the number of times Chloe wins.

(b.i) Determine the mean of X. [3]
(b.ii) Determine the variance of X. [2]

45.
[Maximum mark: 6] 17N.2.AHL.TZ0.H_6
The number of bananas that Lucca eats during any particular day follows a Poisson distribution with mean 0.2.

(a) Find the probability that Lucca eats at least one banana in a particular day. [2]
(b) Find the expected number of weeks in the year in which Lucca eats no bananas. [4]

46. [Maximum mark: 6] 17N.2.AHL.TZ0.H_2
Events A and B are such that P(A ∪ B) = 0.95, P(A ∩ B) = 0.6 and P(A|B) = 0.75.

(a) Find P(B). [2]
(b) Find P(A). [2]
(c) Hence show that events A′ and B are independent. [2]

47. [Maximum mark: 6] 17N.2.AHL.TZ0.H_4
It is given that one in five cups of coffee contain more than 120 mg of caffeine. It is also known that three in five cups contain more than 110 mg of caffeine.

Assume that the caffeine content of coffee is modelled by a normal distribution. Find the mean and standard deviation of the caffeine content of coffee. [6]

48. [Maximum mark: 6] 17M.2.AHL.TZ1.H_1
Consider two events A and B such that P(A) = k, P(B) = 3k, P(A ∩ B) = k² and P(A ∪ B) = 0.5.

(a) Calculate k; [3]
(b) Find P(A′ ∩ B). [3]

49. [Maximum mark: 8] 17M.2.AHL.TZ1.H_9
The times taken for male runners to complete a marathon can be modelled by a normal distribution with a mean 196 minutes and a standard deviation 24 minutes.

(a) Find the probability that a runner selected at random will complete the marathon in less than 3 hours. [2]

It is found that 5% of the male runners complete the marathon in less than T₁ minutes.

(b) Calculate T₁. [2]

The times taken for female runners to complete the marathon can be modelled by a normal distribution with a mean 210 minutes. It is found that 58% of female runners complete the marathon between 185 and 235 minutes.

(c) Find the standard deviation of the times taken by female runners. [4]

50. [Maximum mark: 4] 17M.2.AHL.TZ2.H_1
There are 75 players in a golf club who take part in a golf tournament. The scores obtained on the 18th hole are as shown in the following table.

(a) One of the players is chosen at random. Find the probability that this player’s score was 5 or more. [2]
(b) Calculate the mean score. [2]

51. [Maximum mark: 9] 17M.2.AHL.TZ2.H_5
John likes to go sailing every day in July. To help him make a decision on whether it is safe to go sailing he classifies each day in July as windy or calm. Given that a day in July is calm, the probability that the next day is calm is 0.9. Given that a day in July is windy, the probability that the next day is calm is 0.3. The weather forecast for the 1st July predicts that the probability that it will be calm is 0.8.

(a) Draw a tree diagram to represent this information for the first three days of July. [3]
(b) Find the probability that the 3rd July is calm. [2]
(c) Find the probability that the 1st July was calm given that the 3rd July is windy. [4]

52. [Maximum mark: 7] 17M.2.AHL.TZ2.H_3
Packets of biscuits are produced by a machine. The weights X, in grams, of packets of biscuits can be modelled by a normal distribution where X ∼ N(μ, σ²). A packet of biscuits is considered to be underweight if it weighs less than 250 grams.

(a) Given that μ = 253 and σ = 1.5 find the probability that a randomly chosen packet of biscuits is underweight. [2]

The manufacturer makes the decision that the probability that a packet is underweight should be 0.002. To do this μ is increased and σ remains unchanged.

(b) Calculate the new value of μ giving your answer correct to two decimal places. [3]

The manufacturer is happy with the decision that the probability that a packet is underweight should be 0.002, but is unhappy with the way in which this was achieved. The machine is now adjusted to reduce σ and return μ to 253.

(c) Calculate the new value of σ. [2]

53. [Maximum mark: 4] 17M.2.AHL.TZ2.H_1
There are 75 players in a golf club who take part in a golf tournament. The scores obtained on the 18th hole are as shown in the following table.

(a) One of the players is chosen at random. Find the probability that this player’s score was 5 or more. [2]
(b) Calculate the mean score. [2]

54. [Maximum mark: 9] 16N.1.AHL.TZ0.H_10
Consider two events A and B defined in the same sample space.

(a) Show that P(A ∪ B) = P(A) + P(A′ ∩ B). [3]

Given that P(A ∪ B) = 4/9, P(B|A) = 1/3 and P(B|A′) = 1/6,

(b) (i) show that P(A) = 1/3;
(ii) hence find P(B). [6]

55. [Maximum mark: 4] 16N.1.AHL.TZ0.H_2
The faces of a fair six-sided die are numbered 1, 2, 2, 4, 4, 6. Let X be the discrete random variable that models the score obtained when this die is rolled.

(a) Complete the probability distribution table for X. [2]
(b) Find the expected value of X. [2]

56. [Maximum mark: 5] 16N.2.AHL.TZ0.H_1
A random variable X has a probability distribution given in the following table.

(a) Determine the value of E(X²). [2]
(b) Find the value of Var(X). [3]

57. [Maximum mark: 8] 16N.2.AHL.TZ0.H_8
A random variable X is normally distributed with mean μ and standard deviation σ, such that P(X < 30.31) = 0.1180 and P(X > 42.52) = 0.3060.

(a) Find μ and σ. [6]
(b) Find P(|X − μ| < 1.2σ). [2]

© International Baccalaureate Organization, 2023