A LEVEL MATHEMATICS QUESTIONBANKS
REGRESSION AND CORRELATION
GOPIMATHS

1. Sketch scatter diagrams with at least 5 points to illustrate the following:
a) Data with a product moment correlation coefficient of –1 [1]
b) Data with a rank correlation coefficient of 1, but a product moment correlation coefficient less than 1 [1]
c) Data with a product moment correlation coefficient of 0.1 [1]

2. The following data were obtained on the heights (in cm) and masses (in kg) of 10 children:

Child  Height (H)  Mass (M)
A      143         37
B      120         34
C      131         30
D      128         38
E      118         29
F      106         25
G      118         50
H      138         42
I      144         38
J      101         18

ΣH = 1247; ΣM = 341; ΣH² = 157 459; ΣHM = 43 223; ΣM² = 12 367

a) Plot a scatter diagram of M on H [2]
One child is significantly overweight.
b) Use your diagram to identify this child, explaining the reasons for your choice [2]
c) Omitting the child identified in b), calculate the equation of a suitable regression line for estimating the mass of a child of height 124 cm, giving all values correct to three significant figures [9]
d) Explain why your line would not be suitable for estimating the weight of a baby of height 54 cm [1]

3. [Scatter diagram: cost (£) against length of trip (days), with points labelled A to H]

The scatter diagram shows the cost (in pounds) and length of trip (in days) for the business trips taken by the employees (A to H) of a certain firm last year.
a) Two of the trips were abroad. Identify, with reasons, the two employees who made these trips [2]
A regression line of cost on length of trip is found.
b) Explain the significance of the gradient and intercept of this line in terms of trip costs, giving examples to illustrate your answer [4]

4.
X    Y
-2   -3
-1   -2
0    0
1    1
2    5
3    14
4    31
5    60
6    112

a) Plot a scatter diagram of the data in the table above.
[2]
b) Without further calculation, state the value of Spearman's rank correlation coefficient for this data [1]
c) Calculate the value of the product moment correlation coefficient of this data (Σx² = 96; Σy² = 17340; Σxy = 1157; Σx = 18; Σy = 218) [4]
It is suggested that there would be a higher correlation between U and Y, where U = X³.
d) Comment on this suggestion with reference to your graph. [1]

5. In a biology experiment a student applied different volumes of water (W) to seven different tomato plants. He then measured the yield (Y) of tomatoes produced. He obtained the following results.

Plant  W   Y
A      0   0
B      10  3
C      20  8
D      30  12
E      40  9
F      50  5
G      60  0

a) Draw a scatter diagram of Y on W [2]
b) Explain why you would not expect a high value of the product moment correlation coefficient between W and Y [2]
The student carries out a revised experiment in which he only varies W between 0 and 30. He again uses 7 plants and obtains:
ΣW = 105; ΣY = 41; ΣW² = 2275; ΣY² = 375; ΣWY = 920
c) Calculate the product moment correlation coefficient between W and Y and comment on your value [5]
d) Explain why it is not true to say “the more water, the higher the yield” [2]

6. x and y are the scores obtained by 8 children in tests on English and Mathematics respectively.
Σx = 544; Σx² = 39904; Σy = 513; Σy² = 34691; Σxy = 36946
a) Calculate the product moment correlation coefficient between x and y [4]
b) Test whether there is significant positive correlation between x and y at the 5% level. [3]
c) It is suggested that pupils who are good at English are rarely good at maths. Use your results to comment on this assertion. [2]

7. A geography student is investigating the relationship between the size of a shopping centre (measured by the number of shops it contains) and the mean distance travelled by shoppers to reach the shopping centre.
She obtains the following data:

Centre  No. of shops  Mean distance (km)
A       6             0.5
B       20            2.1
C       15            2.4
D       30            4.1
E       30            3.9
F       60            6.2

a) Calculate Spearman's rank correlation coefficient between mean distance travelled and number of shops [7]
The student later obtains additional data for shopping centres G, H and I, as shown below:

Centre  No. of shops  Mean distance (km)
G       7             4.2
H       15            3.8
I       17            4.0

b) Without doing any further calculations, explain with reasons whether this additional data will cause the rank correlation coefficient to increase, decrease or remain unaltered [2]

8. The following table shows the ranks given to the 10 contestants in a beauty contest by two judges:

Contestant  Judge X  Judge Y
A           1        2
B           6        5
C           2        2
D           9        8
E           5        4
F           3        1
G           8        9
H           10       10
I           7        7
J           4        6

a) Calculate Spearman's rank correlation coefficient for this data [7]
b) Test, at the 1% level, whether there is significant agreement between the judges. [3]

9. a) Give two circumstances when it would be appropriate to use Spearman's rank correlation coefficient instead of the product moment correlation coefficient. [2]
The following are the times taken (in seconds) by 8 children to run two different races:

Child  Race 1  Race 2
A      12.1    21.4
B      13.6    23.0
C      14.2    32.2
D      13.8    26.2
E      12.4    24.4
F      12.9    23.0
G      12.8    23.0
H      13.6    27.7

b) Calculate Spearman's rank correlation coefficient for this data, and test at the 5% level whether it is significantly greater than zero. [11]
Two children make the following assertions:
Andrea: “A graph of time in race 1 against time in race 2 would be close to a straight line”
Bijal: “If you get a high place in one race you will usually get a high place in the other race”
c) State whether either of these statements is justified solely on the basis of the calculations you have already carried out. Explain your answer. [4]

10. Spearman's rank correlation coefficient was calculated as –0.46, based on 16 pairs of data.
a) Test whether this is significantly less than zero using a 5% level of significance [3]
b) Test whether this is significantly different to zero using a 5% level of significance [3]

11. The diagram below shows the relationship between inflation (I) and unemployment (U) for various countries:

[Scatter diagram with axes labelled I and U]

a) Estimate Spearman's rank correlation coefficient between U and I [2]
b) Comment on the assertion that “It is impossible to have low inflation with low unemployment” [2]
c) Comment on the suitability of the model I = A – BU, where A and B are positive constants [2]
d) Using this model, give an interpretation of the values A and A/B [2]
e) Suggest an improved model [1]

12. The table below gives data on pollution levels (L) and distance from the city centre (D)

D    L
0.6  18
1.3  15
1.6  13
2.1  16
2.5  11
3.3  9
4.1  7
4.6  5

ΣD = 20.1; ΣD² = 64.13; ΣL = 94; ΣL² = 1250; ΣDL = 193.6

a) Calculate the equation of the regression line of L on D [7]
b) Use your equation to estimate the pollution level at a point 3 km away from the city centre. [2]
c) Explain the significance of the coefficients of your regression line. [2]

13. The table below shows the test marks in science (S) (out of 100) and Maths (M) (out of 120) for 8 pupils.

S   M
60  120
65  109
70  97
75  85
80  72
85  59
90  45
95  46

A student is required to calculate the regression line of M on S. To make the calculations easier, he decides to use variables U and V instead of S and M, where U = 0.2(S − 80) and V = M − 90
a) Calculate the values of U and V [2]
b) Find the equation of the regression line of V on U, given that ΣU² = 44 and ΣUV = −439 [7]
c) Hence find the regression line of M on S [3]
d) One student is absent for the mathematics test. He obtained 40 on the science test. Use your regression line to obtain an estimate for his mathematics score, and comment on the reliability of your estimate.
[4]

14. a) Explain the difference between the regression line of y on x and the regression line of x on y, and show in a sketch the deviations that are to be minimised in each case. [4]
An investigation is being carried out into the relationship between house price (H) and earnings (E). Data is collected from people who have bought their own houses. The equation of the regression line of H on E is found to be H = 3.1E + 8000
b) Explain why this is the most appropriate regression line to use [1]
c) Explain the significance of the coefficients of the regression line [2]
Mr Windsor has inherited his house.
d) Explain why the use of this regression line may give an inaccurate estimate for the value of Mr Windsor's house. State whether you expect it to be an over- or under-estimate and explain your answer. [3]
e) What would be the new equation if, within the sample:
i) House prices all increase by £5000 [1]
ii) House prices all increase by 10% [3]

15. The following data were recorded in a science experiment to investigate the relationship between the length of a pendulum (L cm) and the time taken for one oscillation (T s):

L    T
20   0.89
25   1.01
30   1.10
35   1.20
40   1.25
45   1.36
50   1.40
60   1.53
70   1.66
80   1.80
90   1.89
100  2.01

It is suggested that there is a relationship between T² and L
a) Calculate the product moment correlation coefficient between T² and L (ΣT² = 25.783; ΣT⁴ = 67.56280342; ΣL = 645; ΣL² = 42275; ΣT²L = 1689.8215) [4]
b) Calculate the regression line of T² on L [4]
c) Give a physical interpretation of the gradient of your line [1]
d) Suggest why it would not be appropriate to use this equation to find the oscillation time for a pendulum of length 1 cm. [2]

16.
A student is investigating the relationship between the average Mathematics test mark obtained by sixth-formers (M) and the number of hours per week they watch TV (T). She obtains the following data:

Sixth-former  M   T
A             44  21
B             72  5
C             91  23
D             33  28
E             66  10
F             65  12
G             80  4
H             55  14
I             47  17
J             53  17
K             77  7
L             22  28

a) Draw a scatter diagram to illustrate this data [2]
b) Without any calculations explain why you would not expect a particularly high value for the product moment correlation coefficient of this data. [1]
The student decides to “adjust” her results by removing one sixth-former's data to improve the correlation.
c) Which sixth-former's data does she remove? [1]
d) Calculate the product moment correlation coefficient for the data with this individual removed. [7]
The student says in her conclusion “The more hours TV watched, the lower the test mark”.
e) Using your result from d), test at the 1% level whether this assertion is justified. [3]
The student decides to test her findings by using a regression line to predict the average test mark of Janice, who watches 17 hours of TV per week.
f) Omitting the same individual as before, calculate the equation of the appropriate regression line, and estimate Janice's average test mark. [8]
g) State with reasons whether this line could be used to do the following:
i) Estimate the average test mark for Mike, who watches 35 hours of TV per week [2]
ii) Estimate the hours of TV watched by Saleem, who has an average test mark of 68. [2]

17.
The following data were obtained for variables X and Y

X    Y
0.5  43
1    22
2    11
4    7
5    6
8    4
10   4
16   3
20   2.5
25   2
40   1.5

a) Draw a scatter diagram to represent this data [2]
The model Y = A + BX was suggested for this data, where A and B are constants
b) With reference to your diagram, explain why this model is not suitable [1]
It is suggested that the model Y = P + QZ (where Z = 1/X) would be more suitable
c) With reference to your diagram, comment on this suggestion [1]
d) By calculating the equation of an appropriate regression line, obtain estimates for the values of P and Q (ΣY = 106; Σ(1/X) = 4.3525; Σ(1/X²) = 5.38675625; Σ(Y/X) = 117.78) [6]

18. The data below were obtained from observations of the radioactivity (as measured by a Geiger counter) of a sample of a chemical (R) and the time (t) since the beginning of the experiment.

t  R
0  401
1  280
2  200
3  142
4  98
5  68
6  50
7  34
8  23

a) Draw a scatter diagram of R against t [2]
A student suggests using the equation R = At + B, where A and B are constants, to model the data.
b) With reference to your diagram, explain why this would not be a suitable choice. [2]
An alternative model suggested is ln R = P + Qt, where P and Q are constants.
Given Σt = 36; Σ ln R = 41.26; Σt² = 204; Σ(ln R)² = 196.71; Σt ln R = 143.78
c) Calculate the equation of a suitable regression line to obtain the values of P and Q [7]
d) Estimate the value of the radioactivity count obtained at t = 4.5, giving your answer to the nearest whole number. [3]

19.
The following table gives data on how the price per leaflet (P) varies with the number of leaflets produced (N) by a printer

N      P (pence)
100    150
500    100
1000   75
1500   70
2000   63
3000   50
5000   43
10000  35

ΣN = 23 100; ΣN² = 141 510 000; ΣP = 586; ΣP² = 52 568; ΣNP = 1 086 000

a) Calculate the equation of the regression line of price per leaflet on number of leaflets, giving the coefficients correct to 3 significant figures [7]
b) Use your equation to find an estimate of the total price for 2500 leaflets [3]
c) Find the points at which the regression line crosses the coordinate axes. [2]
d) Give an interpretation of these coordinates [2]
e) Comment on the limitations of this model [2]
f) The company purchasing the leaflets estimate they can spend no more than £2570 in total on leaflets. Find the maximum number of leaflets they can order. [4]

20. The table shows data on the mean annual temperature (T), the electricity consumption (E) and the gas consumption (G) for some households in various countries:

T     E    G
10.0  400  110
11.3  150  330
6.5   640  340
15.2  300  100
4.1   700  400
19.6  150  160
9.4   300  320
21.0  340  20
8.4   460  300
7.2   400  420

ΣT = 112.7; ΣT² = 1553.71; ΣE = 3840; ΣE² = 1771800; ΣS = 6340; ΣG = 2500; ΣST = 59236; ΣG² = 801400; ΣEG = 1053900

a) Without carrying out any calculations, explain, giving your reasons, between which of the following pairs of variables you would expect the highest correlation:
T and E
T and G
T and S, where S = E + G
[2]
b) Calculate the equation of the regression line of S on T [7]
c) Explain why it would not be appropriate to calculate the regression line of T on S [1]
d) Explain why the regression line might not give correct predictions for large values of T [2]
e) Calculate the product moment correlation coefficient between E and G, and test at the 5% level whether there is any significant correlation [7]
f) Explain why regression was used in b) and correlation in e), not vice versa. [2]
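Many of the questions in this bank quote summary statistics (Σx, Σy, Σx², Σy², Σxy) instead of leaving them to be computed. As a rough aid for checking working, the sketch below shows how the product moment correlation coefficient and the least-squares regression line of y on x can be obtained from those sums alone. The function name `summary_fit` and the toy data are illustrative assumptions and are not taken from any question; the toy points are chosen to lie exactly on y = 2x + 1.

```python
import math


def summary_fit(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (r, gradient, intercept) for the regression line of y on x.

    Hypothetical helper: it works only from summary statistics.
    """
    # Corrected sums of squares and products.
    s_xx = sum_x2 - sum_x ** 2 / n
    s_yy = sum_y2 - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n

    # Product moment correlation coefficient.
    r = s_xy / math.sqrt(s_xx * s_yy)

    # Least-squares line y = intercept + gradient * x.
    gradient = s_xy / s_xx
    intercept = sum_y / n - gradient * sum_x / n
    return r, gradient, intercept


# Toy data (assumed purely for illustration): x = 1..5 and y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [2 * x + 1 for x in xs]

r, b, a = summary_fit(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_x2=sum(x * x for x in xs),
    sum_y2=sum(y * y for y in ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
)
print(f"r = {r:.3f}, y = {a:.3f} + {b:.3f}x")
```

To check an answer, pass the sums printed in a question together with the number of data pairs. A transformed model such as Y = P + QZ can be handled in the same way by treating Z as x.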