1. The following table shows the cost in AUD of seven paperback books chosen at random, together with the number of pages in each book.

Book                  1     2     3     4     5     6     7
Number of pages (x)   50    120   200   330   400   450   630
Cost (y AUD)          6.00  5.40  7.20  4.60  7.60  5.80  5.20

(a) Plot these pairs of values on a scatter diagram. Use a scale of 1 cm to represent 50 pages on the horizontal axis and 1 cm to represent 1 AUD on the vertical axis. (3)
(b) Write down the linear correlation coefficient, r, for the data. (2)
(c) Stephen wishes to buy a paperback book which has 350 pages in it. He plans to draw a line of best fit to determine the price. State whether or not this is an appropriate method in this case and justify your answer. (2)
(Total 7 marks)

2. 200 people of different ages were asked to choose their favourite type of music from the choices Popular, Country and Western, and Heavy Metal. The results are shown in the table below.

Age / Music choice   Popular   Country and Western   Heavy Metal   Totals
11–25                35        5                     50            90
26–40                30        10                    20            60
41–60                20        25                    5             50
Totals               85        40                    75            200

It was decided to perform a chi-squared test for independence at the 5% level on the data.
(a) Write down the null hypothesis. (1)
(b) Write down the number of degrees of freedom. (1)
(c) Write down the chi-squared value. (2)
(d) State whether or not you will reject the null hypothesis, giving a clear reason for your answer. (2)
(Total 6 marks)

3. Manuel conducts a survey on a random sample of 751 people to see which television programme type they watch most from the following: Drama, Comedy, Film, News. The results are as follows.

                      Drama   Comedy   Film   News
Males under 25        22      65       90     35
Males 25 and over     36      54       67     17
Females under 25      22      59       82     15
Females 25 and over   64      39       38     46

Manuel decides to ignore the ages and to test at the 5% level of significance whether the most watched programme type is independent of gender.
(a) Draw a table with 2 rows and 4 columns of data so that Manuel can perform a chi-squared test.
(3)
(b) State Manuel’s null hypothesis and alternative hypothesis. (1)
(c) Find the expected frequency for the number of females who had “Comedy” as their most-watched programme type. Give your answer to the nearest whole number. (2)
(d) Using your graphic display calculator, or otherwise, find the chi-squared statistic for Manuel’s data. (3)
(e) (i) State the number of degrees of freedom available for this calculation.
    (ii) State the critical value for Manuel’s test.
    (iii) State his conclusion. (3)
(Total 12 marks)

4. Tania wishes to see whether there is any correlation between a person’s age and the number of objects on a tray which could be remembered after looking at them for a certain time. She obtains the following table of results.

Age (x years)                      15   21   36   40   44   55
Number of objects remembered (y)   17   20   15   16   17   12

(a) Use your graphic display calculator to find the equation of the regression line of y on x. (2)
(b) Use your equation to estimate the number of objects remembered by a person aged 28 years. (1)
(c) Use your graphic display calculator to find the correlation coefficient r. (1)
(d) Comment on your value for r. (2)
(Total 6 marks)

5. The local park is used for walking dogs. The sizes of the dogs are observed at different times of the day. The table below shows the numbers of dogs present, classified by size, at three different times last Sunday.

            Small   Medium   Large
Morning     9       18       2
Afternoon   11      6        13
Evening     7       8        9

(a) Write a suitable null hypothesis for a χ² test on this data.
(b) Write down the value of χ² for this data.
(c) The number of degrees of freedom is 4. Show how this value is calculated.
The critical value, at the 5% level of significance, is 9.488.
(d) What conclusion can be drawn from this test? Give a reason for your answer.
(Total 6 marks)

6. (a) For his Mathematical Studies project, Marty set out to discover if stress was related to the amount of time that students spent travelling to or from school.
The results of one of his surveys are shown in the table below.

                       Number of students
Travel time (t mins)   high stress   moderate stress   low stress
t ≤ 15                 9             5                 18
15 < t ≤ 30            17            8                 28
30 < t                 18            6                 7

He used a χ² test at the 5% level of significance to find out if there was any relationship between student stress and travel time.
(i) Write down the null and alternative hypotheses for this test. (2)
(ii) Write down the table of expected values. Give values to the nearest integer. (3)
(iii) Show that there are 4 degrees of freedom. (1)
(iv) Calculate the χ² statistic for this data. (2)
The χ² critical value for 4 degrees of freedom at the 5% level of significance is 9.488.
(v) What conclusion can Marty draw from this test? Give a reason for your answer. (2)

(b) Marty asked some of his classmates to rate their level of stress out of 10, with 10 being very high. He also asked them to measure the number of minutes it took them to get from home to school. A random selection of his results is listed below.

Travel time (x)     13  24  22  18  36  16  14  20  6  12
Stress rating (y)   3   7   5   4   8   8   4   8   2  6

(i) Write down the value of the (linear) coefficient of correlation for this information. (1)
(ii) Explain what a positive value for the coefficient of correlation indicates. (1)
(iii) Write down the linear regression equation of y on x in the form y = ax + b. (2)
(iv) Use your equation in part (iii) to determine the stress rating for a student who takes three quarters of an hour to travel to school. (2)
(v) Can your answer in part (iv) be considered reliable? Give a reason for your answer. (2)
(Total 18 marks)

7. Several candy bars were purchased and the following table shows the weight and the cost of each bar.

               Yummy  Chox  Marz  Twin  Chunx  Lite  BigC  Bite
Weight (g)     60     85    80    65    95     50    100   45
Cost (Euros)   1.10   1.50  1.40  1.20  1.80   1.00  1.70  0.90

(a) Given that $s_x = 19.2$, $s_y = 0.307$ and $s_{xy} = 5.81$, find the correlation coefficient, r, giving your answer correct to 3 decimal places.
(2)
(b) Describe the correlation between the weight of a candy bar and its cost. (1)
(c) Calculate the equation of the regression line for y on x. (3)
(d) Use your equation to estimate the cost of a candy bar weighing 109 g. (2)
(Total 8 marks)

8. In a competition the number of males and females taking part in different swimming races is given in the table of observed values below.

         Backstroke  Freestyle  Butterfly  Breaststroke  Relay
         (100 m)     (100 m)    (100 m)    (100 m)       (4 × 100 m)
Male     30          90         31         29            20
Female   28          63         20         37            12

The Swimming Committee decides to perform a χ² test at the 5% significance level in order to test if the number of entries for the various strokes is related to gender.
(a) State the null hypothesis. (1)
(b) Write down the number of degrees of freedom. (1)
(c) Write down the critical value of χ². (1)
The expected values are given in the table below:

         Backstroke  Freestyle  Butterfly  Breaststroke  Relay
         (100 m)     (100 m)    (100 m)    (100 m)       (4 × 100 m)
Male     32          a          28         37            18
Female   26          68         23         b             14

(d) Calculate the values of a and b. (2)
(e) Calculate the χ² value. (3)
(f) State whether or not you accept your null hypothesis and give a reason for your answer. (2)
(Total 10 marks)

9. It is thought that the breaststroke time for 200 m depends on the length of the arm of the swimmer. Eight students swim 200 m breaststroke. Their times (y) in seconds and arm lengths (x) in cm are shown in the table below.

Student                   1      2      3      4      5      6      7      8
Length of arm, x cm       79     74     72     70     77     73     64     69
Breaststroke, y seconds   135.1  135.7  139.3  141.0  132.8  137.0  152.9  144.0

(a) Calculate the mean and standard deviation of x and y. (4)
(b) Given that $s_{xy} = -24.82$, calculate the correlation coefficient, r. (2)
(c) Comment on your value for r. (2)
(d) Calculate the equation of the regression line of y on x. (3)
(e) Using your regression line, estimate how many seconds it will take a student with an arm length of 75 cm to swim the 200 m breaststroke. (1)
(Total 12 marks)

1.
(a) (A1)(A1)(A1) 3
Notes: (A1) for labels and scales, (A2) for all points correct, (A1) for 5 or 6 correct. Award a maximum of (A2) if points are joined.
(b) r = −0.141 (G2) 2
Note: If the negative sign is missing award (G1)(G0).
(c) “The coefficient of correlation is too low, (very) weak (linear) relationship.” (R1)
Not a sensible thing to do; accept “no”. (A1) 2
Note: Do not award (R0)(A1). The correlation coefficient has to be mentioned in their reasoning.
[7]

2. (a) Choice of music is independent of age. (A1)(C1)
(b) (3 − 1)(3 − 1) = 4 (A1)(C1)
(c) χ² = 51.6 (A2)(C2)
Note: 52 is an accuracy penalty (A1)(A0)(AP).
(d) p-value < 0.05 for 5% level of significance, or 51.6 > $\chi^2_{\text{crit}}$ (R1)(ft)
Reject the null hypothesis (do not accept the null hypothesis). (A1)(ft)(C2)
Note: Do not award (R0)(A1).
[6]

3. (a)
          Drama   Comedy   Film   News
Males     58      119      157    52
Females   86      98       120    61
(M1)(M1)(A1) 3
(b) H0: favourite TV programme is independent of gender (or no association between favourite TV programme and gender)
H1: favourite TV programme is dependent on gender (must have both) (A1) 1
(c) $\frac{365 \times 217}{751}$ (M1) = 105 (A1)(ft)(G2) 2
(d) 12.6 (accept 12.558) (G3) 3
(e) (i) 3 (A1)
(ii) 7.815 (accept 7.82) ((ft) from their (i)) (A1)(ft)
(iii) Reject H0 or equivalent statement (e.g. accept H1) (A1)(ft) 3
[12]

4. (a) a = −0.134, b = 20.9 (A1)
y = 20.9 − 0.134x (A1)(C2)
(b) 17 objects (A1)(ft)(C1)
Note: Accept only 17.
(c) r = −0.756 (A1)(C1)
(d) Negative and moderately strong (A1)(ft)(A1)(ft)
[6]

5. (a) H0: The size of dog is independent of the time of day (or equivalent). (A1)(C1)
Note: Award (A0) for “no correlation”.
(b) χ² = 4.33 (accept 4.328) (M1)(A1)
Note: GDC use is anticipated but candidates might calculate this by hand. (M1) can be awarded for a reasonable attempt to use the formula.
(c) (3 − 1)(3 − 1) = 4
Note: Award mark for left hand side seen.
(A1)(C1)
(d) The hypothesis should not be rejected (allow “accept H0”), OR the size of dog is independent of the time of day. (A1)(ft)
4.33 < 9.488 or 0.363 > 0.05 (R1)(ft)(C2)
Notes: Allow $\chi^2_{\text{calc}} < \chi^2_{\text{crit}}$ only if a value for $\chi^2_{\text{calc}}$ is seen somewhere. Award (R1)(ft) for comparing the values and (A1)(ft) if the conclusion is valid according to the comparison given. If no reason is given, or if the reason is wrong, both marks are lost. Note that (A0)(R1)(ft) can be awarded but (A1)(R0) cannot.
[6]

6. (a) (i) H0: level of stress is independent of travel time (A1)
H1: level of stress is not independent of travel time (A1)(ft) 2
(or reasonable equivalents)
(ii)
12.1   5.24   14.6
20.1   8.68   24.2
11.8   5.08   14.2
(M1)(A1)(G2)
Note: (M1) for attempting to calculate expected values by hand, e.g. $\frac{44 \times 32}{116} = 12.1$ etc.
Nearest integers:
12   5   15
20   9   24
12   5   14
(A1)(G3) 3
(iii) df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4 (M1)(AG) 1
(iv) χ² = 9.83(1) (G2)
OR χ² = 9.277... if calculated from integer values (M1)(A1) 2
(v) For χ² = 9.83: Do not accept H0 (level of stress is not independent of travel time, or reasonable equivalent) (A1)(ft) because $\chi^2_{\text{calc}} > \chi^2_{\text{crit}}$ or p-value < 0.05 (R1)(ft)
OR for χ² = 9.278: Accept H0 (A1)(ft) because $\chi^2_{\text{calc}} < \chi^2_{\text{crit}}$ or p-value > 0.05 (R1)(ft) 2
Note: A correct reason must be given for the (A1) to be awarded.
(b) (i) r = 0.667 (A1) 1
(ii) Stress rating increases as travel time increases (or reasonable equivalent, e.g. y increases as x increases). (R1) 1
Note: Do not accept “positive correlation”.
(iii) y = 0.181x + 2.22 (A1) for 0.181x and (A1) for 2.22 2
Note: For y = 2.22x + 0.181, award (A0)(A1)(ft).
(iv) Putting x = 45 (M1)
0.181 × 45 + 2.22 = 10.365 (10.4) (ft)(G2)
Notes: Allow 10 or 11 only if the method is shown and is correct. Allow follow through only if method shown.
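The GDC values for 6(b)(i), (iii) and (iv) come from the standard least-squares formulas: slope $S_{xy}/S_{xx}$, intercept $\bar{y} - \text{slope}\cdot\bar{x}$, and $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$. The Python sketch below reproduces them from the Question 6(b) data; the helper name `least_squares` is hypothetical and chosen only for illustration, and the sketch is a check on the arithmetic rather than part of the markscheme.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (slope, intercept, r) for the regression line of y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Question 6(b): travel time in minutes (x) and stress rating (y)
travel = [13, 24, 22, 18, 36, 16, 14, 20, 6, 12]
stress = [3, 7, 5, 4, 8, 8, 4, 8, 2, 6]

a, b, r = least_squares(travel, stress)
print(f"r = {r:.3f}")                     # r = 0.667
print(f"y = {a:.3f}x + {b:.2f}")          # y = 0.181x + 2.22
print(f"x = 45 -> y = {a * 45 + b:.1f}")  # x = 45 -> y = 10.4
```

Carrying full precision gives about 10.37 at x = 45, which rounds to the same 10.4 as the markscheme's 10.365 from rounded coefficients. On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` return the same r, slope and intercept.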
(v) Not reliable … (A1)
because the result is outside the data range, or because the correlation coefficient is not high, or the sample is small, or responses are subjective. (R1) 2
Note: Award (R1) for any of the above. A correct reason must be given to award the (A1).
[18]

7. (a) $r = \frac{s_{xy}}{s_x s_y} = \frac{5.81}{19.2 \times 0.307}$ (M1)
= 0.986 (A1) 2
Note: Award (G2) for 0.985 from GDC.
(b) Strong, positive correlation (A1) 1
(c) y = 0.182 + 0.0158x (G3)
OR $y - 1.325 = \frac{5.81}{19.2^2}(x - 72.5)$ (M1)(A1)
y = 0.0158x + 0.182 (A1) 3
(d) y = 0.0158 × 109 + 0.182 (M1)
= 1.90 euros (A1) 2
[8]

8. (a) H0: number of entries is independent of gender. (A1) 1
(b) 4 (A1) 1
(c) 9.488 (A1) 1
(d) a = 85, b = 29 (A1)(A1)
(e) $\frac{(30 - 32)^2}{32} + \dots$ (M1)(A1)
= 6.10 (using given values) (A1)
OR 5.80 (from calculator) (G3) 3
(f) Do not reject the null hypothesis as the χ² value is less than the critical value. So, gender and stroke are independent. (Also allow “accept”.) (A1)(R1)
[10]

9. (a) mean of x = 72.25 (A1)
sd of x = 4.41 (A1)
mean of y = 139.7 (140) (A1)
sd of y = 5.99 (A1) 4
(b) r = −0.940 (G2)
OR $r = \frac{-24.82}{4.41 \times 5.99} = -0.9396\ (= -0.94)$ (M1)(A1)
(c) Strong, negative correlation (A2) 2
Note: Award (A1) for negative, (A1) for strong.
(d) y = 232 − 1.28x (G3)
OR $y - 139.7 = \frac{-24.82}{4.41^2}(x - 72.25)$ (M1)(A1)(A1)
y = −1.28x + 232
(e) y = 232 − 1.28 × 75 = 136 seconds (A1)
[12]
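The χ² questions above (2, 3, 5, 6 and 8) all rest on one calculation: each expected frequency is (row total × column total) / grand total, the statistic is the sum of $(O - E)^2/E$ over every cell, and there are $(r - 1)(c - 1)$ degrees of freedom. A minimal Python sketch of that calculation, checked on the Question 8 data, follows; the function name `chi_squared_independence` is hypothetical, and the critical value 9.488 is taken from the question text rather than computed.

```python
def chi_squared_independence(observed):
    """Chi-squared test of independence for a contingency table.

    `observed` is a list of rows of counts. Returns the statistic,
    the degrees of freedom and the table of expected frequencies.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    # Expected count for each cell: row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

    # Sum of (O - E)^2 / E over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Question 8: rows are Male, Female; columns are Backstroke, Freestyle,
# Butterfly, Breaststroke, Relay
swimming = [
    [30, 90, 31, 29, 20],
    [28, 63, 20, 37, 12],
]
CRITICAL_VALUE_5PCT_DF4 = 9.488  # quoted in the question, not computed here

stat, dof, expected = chi_squared_independence(swimming)
print(f"a = {expected[0][1]:.0f}, b = {expected[1][3]:.0f}")  # a = 85, b = 29
print(f"chi2 = {stat:.2f}, df = {dof}")                         # chi2 = 5.80, df = 4
if stat > CRITICAL_VALUE_5PCT_DF4:
    print("Reject H0")
else:
    print("Do not reject H0")  # this branch runs for the Question 8 data
```

Because the sketch keeps full-precision expected values, it reproduces the calculator answer 5.80 rather than the 6.10 obtained from the rounded expected values in the question. Passing the Question 2 table `[[35, 5, 50], [30, 10, 20], [20, 25, 5]]` gives χ² ≈ 51.6 with 4 degrees of freedom, in line with that markscheme.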