A Level Mathematics Questionbanks

A LEVEL MATHEMATICS QUESTIONBANKS
REGRESSION AND CORRELATION
1. Sketch scatter diagrams with at least 5 points to illustrate the following:
a) Data with a product moment correlation coefficient of –1
[1]
b) Data with a rank correlation coefficient of 1, but product moment correlation coefficient less than 1
[1]
c) Data with a product moment correlation coefficient of 0.1
[1]
2. The following data were obtained on the heights (in cm) and masses (in kg) of 10 children:
Child   Height (H)   Mass (M)
A       143          37
B       120          34
C       131          30
D       128          38
E       118          29
F       106          25
G       118          50
H       138          42
I       144          38
J       101          18
ΣH = 1247; ΣM = 341; ΣH² = 157 459; ΣHM = 43 223; ΣM² = 12 367
a) Plot a scatter diagram of M on H
[2]
One child is significantly overweight.
b) Use your diagram to identify this child, explaining the reasons for your choice
[2]
c) Omitting the child identified in b), calculate the equation of a suitable regression line for estimating
the mass of a child of height 124 cm, giving all values correct to three significant figures
[9]
d) Explain why your line would not be suitable for estimating the mass of a baby of height 54 cm
[1]
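The calculation in part c) can be checked by subtracting the omitted child's values from the quoted totals before applying the least-squares formulas. The sketch below is illustrative only: it assumes the child identified in b) is G (height 118 cm, mass 50 kg), and the function name `regression_from_sums` is a made-up helper, not part of the question.

```python
def regression_from_sums(n, sum_x, sum_y, sum_xx, sum_xy):
    """Return (a, b) for the least-squares line y = a + b*x from summary sums."""
    s_xx = sum_xx - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx
    a = sum_y / n - b * sum_x / n
    return a, b


# Totals for all 10 children, as quoted in the question.
n, sum_h, sum_m, sum_hh, sum_hm = 10, 1247, 341, 157459, 43223

# Assumption: child G (H = 118, M = 50) is the one left out.
h_out, m_out = 118, 50

a, b = regression_from_sums(
    n - 1,
    sum_h - h_out,
    sum_m - m_out,
    sum_hh - h_out ** 2,
    sum_hm - h_out * m_out,
)

estimate = a + b * 124  # estimated mass (kg) at a height of 124 cm
print(f"M = {a:.3g} + {b:.3g}H; estimate at 124 cm = {estimate:.3g} kg")
```

Mass is regressed on height because height is the variable used to make the estimate. If a different child were identified in b), only `h_out` and `m_out` would change.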
3. [Scatter diagram: Cost (£) against Length of trip (days), with points labelled A to H]
The scatter diagram shows the cost (in pounds) and length of trip (in days) for the business trips taken by the
employees (A to H) of a certain firm last year.
a) Two of the trips were abroad. Identify, with reasons, the two employees who made these trips
[2]
A regression line of cost on length of trip is found.
b) Explain the significance of the gradient and intercept of this line in terms of trip costs, giving examples
to illustrate your answer
[4]
4.
X     Y
-2    -3
-1    -2
0     0
1     1
2     5
3     14
4     31
5     60
6     112
a) Plot a scatter diagram of the data in the table above.
[2]
b) Without further calculation, state the value of Spearman's rank correlation coefficient for this data
[1]
c) Calculate the value of the product moment correlation coefficient of this data
(Σx² = 96; Σy² = 17340; Σxy = 1157; Σx = 18; Σy = 218)
[4]
It is suggested that there would be a higher correlation between U and Y, where U = X³.
d) Comment on this suggestion with reference to your graph.
[1]
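As a numerical cross-check on parts c) and d), the product moment correlation coefficient can be computed from the summary sums, and the same routine can then be applied to the transformed variable U = X³. This is only a sketch: the helper names `pmcc_from_sums` and `pmcc` are illustrative, and the question itself asks for d) to be answered from the graph rather than by calculation.

```python
from math import sqrt


def pmcc_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy):
    """Product moment correlation coefficient from summary sums."""
    s_xx = sum_xx - sum_x ** 2 / n
    s_yy = sum_yy - sum_y ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    return s_xy / sqrt(s_xx * s_yy)


def pmcc(xs, ys):
    """PMCC of paired data, built on the summary-sum formula above."""
    return pmcc_from_sums(
        len(xs),
        sum(xs),
        sum(ys),
        sum(x * x for x in xs),
        sum(y * y for y in ys),
        sum(x * y for x, y in zip(xs, ys)),
    )


xs = [-2, -1, 0, 1, 2, 3, 4, 5, 6]
ys = [-3, -2, 0, 1, 5, 14, 31, 60, 112]

r_quoted = pmcc_from_sums(9, 18, 218, 96, 17340, 1157)  # sums quoted in c)
r_xy = pmcc(xs, ys)                                      # same value from raw data
r_uy = pmcc([x ** 3 for x in xs], ys)                    # after U = X**3

print(f"r(X, Y) = {r_xy:.4f}, r(U, Y) = {r_uy:.4f}")
```

The value for U and Y comes out noticeably closer to 1 than the value for X and Y, which is consistent with the curved shape of the scatter diagram.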
5. In a biology experiment a student applies different volumes of water (W) to seven different tomato plants.
He then measures the yield (Y) of tomatoes produced and obtains the following results.
Plant   W     Y
A       0     0
B       10    3
C       20    8
D       30    12
E       40    9
F       50    5
G       60    0
a) Draw a scatter diagram of Y on W
[2]
b) Explain why you would not expect a high value of the product-moment correlation coefficient between
W and Y
[2]
The student carries out a revised experiment in which he only varies W between 0 and 30.
He again uses 7 plants and obtains: ΣW = 105; ΣY = 41; ΣW² = 2275; ΣY² = 375; ΣWY = 920
c) Calculate the product-moment correlation coefficient between W and Y and comment on your value
[5]
d) Explain why it is not true to say “the more water, the higher the yield”
[2]
6. x and y are the scores obtained by 8 children in tests on English and Mathematics respectively.
Σx = 544; Σx² = 39904; Σy = 513; Σy² = 34691; Σxy = 36946
a) Calculate the product moment correlation coefficient between x and y
[4]
b) Test whether there is significant positive correlation between x and y at the 5% level.
[3]
c) It is suggested that pupils who are good at English are rarely good at maths.
Use your results to comment on this assertion.
[2]
7. A geography student is investigating the relationship between the size of a shopping centre (measured by
the number of shops it contains) and the mean distance travelled by shoppers to reach the shopping centre.
She obtains the following data:
Centre   No. of shops   Mean distance (km)
A        6              0.5
B        20             2.1
C        15             2.4
D        30             4.1
E        30             3.9
F        60             6.2
a) Calculate Spearman’s rank correlation coefficient between mean distance travelled and number of shops
[7]
The student later obtains additional data for shopping centres G, H and I, as shown below:
Centre   No. of shops   Mean distance (km)
G        7              4.2
H        15             3.8
I        17             4.0
b) Without doing any further calculations, explain with reasons whether this additional data will cause the rank
correlation coefficient to increase, decrease or remain unaltered
[2]
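Part a) involves a tie (centres D and E both have 30 shops), so the tied values share the mean of the ranks they would otherwise occupy. The sketch below uses the usual formula 1 – 6Σd²/(n(n² – 1)) with averaged ranks; with ties present this formula is an approximation, and some courses instead apply the product moment formula to the ranks. The function names are illustrative.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's coefficient via 1 - 6*sum(d^2) / (n(n^2 - 1)) on averaged ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


shops = [6, 20, 15, 30, 30, 60]               # centres A to F
distance = [0.5, 2.1, 2.4, 4.1, 3.9, 6.2]

r_s = spearman(shops, distance)
print(average_ranks(shops), average_ranks(distance), round(r_s, 3))
```

Ranking both variables in the same direction (here, smallest first) is what matters; ranking both from largest first gives the same coefficient.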
8. The following table shows the ranks given to the 10 contestants in a beauty contest by two judges:
Contestant   Judge X   Judge Y
A            1         2
B            6         5
C            2         2
D            9         8
E            5         4
F            3         1
G            8         9
H            10        10
I            7         7
J            4         6
a) Calculate Spearman's rank correlation coefficient for this data
[7]
b) Test, at the 1% level, whether there is significant agreement between the judges.
[3]
9. a) Give two circumstances when it would be appropriate to use Spearman's rank correlation coefficient
instead of the product moment correlation coefficient.
[2]
The following are the times taken (in seconds) by 8 children to run two different races:
Child   Race 1   Race 2
A       12.1     21.4
B       13.6     23.0
C       14.2     32.2
D       13.8     26.2
E       12.4     24.4
F       12.9     23.0
G       12.8     23.0
H       13.6     27.7
b) Calculate Spearman's rank correlation coefficient for this data, and test at the 5% level whether it is
significantly greater than zero.
[11]
Two children make the following assertions:
Andrea: “A graph of time in race 1 against time in race 2 would be close to a straight line”
Bijal: “If you get a high place in one race you will usually get a high place in the other race”
c) State whether either of these statements is justified solely on the basis of the calculations you have already
carried out. Explain your answer.
[4]
10. Spearman’s rank correlation coefficient was calculated as –0.46, based on 16 pairs of data.
a) Test whether this is significantly less than zero using a 5% level of significance
[3]
b) Test whether this is significantly different to zero using a 5% level of significance
[3]
11. The diagram below shows the relationship between inflation (I) and unemployment (U) for various countries:
[Scatter diagram of I against U]
a) Estimate Spearman's rank correlation coefficient between U and I
[2]
b) Comment on the assertion that “It is impossible to have low inflation with low unemployment”
[2]
c) Comment on the suitability of the model I = A – BU, where A and B are positive constants
[2]
d) Using this model, give an interpretation of the values A and A/B
[2]
e) Suggest an improved model
[1]
12. The table below gives data on pollution levels (L) and distance from the city centre (D)
D     L
0.6   18
1.3   15
1.6   13
2.1   16
2.5   11
3.3   9
4.1   7
4.6   5
ΣD = 20.1; ΣD² = 64.13; ΣL = 94; ΣL² = 1250; ΣDL = 193.6
a) Calculate the equation of the regression line of L on D
[7]
b) Use your equation to estimate the pollution level at a point 3 km away from the city centre.
[2]
c) Explain the significance of the coefficients of your regression line.
[2]
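For parts a) and b), the regression line of L on D can also be built directly from the raw data using deviations from the means, which gives the same line as the summary-sum formulas. The sketch below is a minimal illustration with made-up variable names, not a model answer.

```python
ds = [0.6, 1.3, 1.6, 2.1, 2.5, 3.3, 4.1, 4.6]   # distance from the city centre
ls = [18, 15, 13, 16, 11, 9, 7, 5]              # pollution level

n = len(ds)
mean_d = sum(ds) / n
mean_l = sum(ls) / n

# Sums of squared deviations and cross-deviations about the means.
s_dd = sum((d - mean_d) ** 2 for d in ds)
s_dl = sum((d - mean_d) * (l - mean_l) for d, l in zip(ds, ls))

b = s_dl / s_dd          # gradient: change in L per unit increase in D
a = mean_l - b * mean_d  # intercept: fitted L at D = 0

estimate = a + b * 3     # fitted pollution level at D = 3
print(f"L = {a:.3g} + ({b:.3g})D; estimate at D = 3: {estimate:.3g}")
```

D = 3 lies inside the observed range of distances (0.6 to 4.6), so this estimate is an interpolation.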
13. The table below shows the test marks in science (S) (out of 100) and Maths (M) (out of 120) for 8 pupils.
S     M
60    120
65    109
70    97
75    85
80    72
85    59
90    45
95    46
A student is required to calculate the regression line of M on S. To make the calculations easier, he decides to
use variables U and V instead of S and M, where U = 0.2(S – 80) and V = M – 90
a) Calculate the values of U and V
[2]
b) Find the equation of the regression line of V on U, given that ΣU² = 44 and ΣUV = –439
[7]
c) Hence find the regression line of M on S
[3]
d) One student is absent for the mathematics test. He obtains 40 on the science test. Use your regression line to
obtain an estimate for his mathematics score, and comment on the reliability of your estimate.
[4]
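The coding in this question can be traced numerically: compute U and V, fit V on U, then substitute U = 0.2(S – 80) and V = M – 90 back in to recover the line of M on S. The sketch below is illustrative, with hypothetical variable names; it writes 0.2(S – 80) as (S – 80)/5 purely to keep the coded values exact in floating point.

```python
science = [60, 65, 70, 75, 80, 85, 90, 95]
maths = [120, 109, 97, 85, 72, 59, 45, 46]

# Coded variables: U = 0.2(S - 80), V = M - 90.
us = [(s - 80) / 5 for s in science]
vs = [m - 90 for m in maths]

n = len(us)
sum_u, sum_v = sum(us), sum(vs)
sum_uu = sum(u * u for u in us)
sum_uv = sum(u * v for u, v in zip(us, vs))

# Regression line of V on U: V = a_vu + b_vu * U.
s_uu = sum_uu - sum_u ** 2 / n
s_uv = sum_uv - sum_u * sum_v / n
b_vu = s_uv / s_uu
a_vu = sum_v / n - b_vu * sum_u / n

# Decode: M - 90 = a_vu + b_vu * 0.2 * (S - 80).
slope_ms = 0.2 * b_vu
intercept_ms = 90 + a_vu - 16 * b_vu

predicted = intercept_ms + slope_ms * 40  # science mark of 40
print(f"M = {intercept_ms:.1f} + ({slope_ms:.3f})S; prediction at S = 40: {predicted:.0f}")
```

The prediction at S = 40 lies well outside the observed science marks (60 to 95) and exceeds the maximum possible mathematics mark of 120, which is one way to see why the estimate in d) is unreliable.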
14. a) Explain the difference between the regression line of y on x and the regression line of x on y, and show in a
sketch the deviations that are to be minimised in each case.
[4]
An investigation is being carried out into the relationship between house-price (H) and earnings (E).
Data is collected from people who have bought their own houses.
The equation of the regression line of H on E is found to be H = 3.1E + 8000
b) Explain why this is the most appropriate regression line to use
[1]
c) Explain the significance of the coefficients of the regression line
[2]
Mr Windsor has inherited his house.
d) Explain why the use of this regression line may give an inaccurate estimate for the value of Mr Windsor’s
house. State whether you expect it to be an over- or under-estimate and explain your answer.
[3]
e) What would be the new equation if, within the sample:
i) House prices all increase by £5000
[1]
ii) House prices all increase by 10%
[3]
15. The following data were recorded in a science experiment to investigate the relationship between the
length of a pendulum (L cm) and the time taken for one oscillation (T s):
L     T
20    0.89
25    1.01
30    1.10
35    1.20
40    1.25
45    1.36
50    1.40
60    1.53
70    1.66
80    1.80
90    1.89
100   2.01
It is suggested that there is a relationship between T² and L
a) Calculate the product moment correlation coefficient between T² and L
(ΣT² = 25.783; ΣT⁴ = 67.56280342; ΣL = 645; ΣL² = 42275; ΣT²L = 1689.8215)
[4]
b) Calculate the regression line of T² on L
[4]
c) Give a physical interpretation of the gradient of your line
[1]
d) Suggest why it would not be appropriate to use this equation to find the oscillation time for
a pendulum of length 1 cm.
[2]
16. A student is investigating the relationship between the average Mathematics test mark obtained by
sixth-formers (M) and the number of hours per week they watch TV (T).
She obtains the following data:
Sixth-former   M     T
A              44    21
B              72    5
C              91    23
D              33    28
E              66    10
F              65    12
G              80    4
H              55    14
I              47    17
J              53    17
K              77    7
L              22    28
a) Draw a scatter diagram to illustrate this data
[2]
b) Without any calculations explain why you would not expect a particularly high value for the
product moment correlation coefficient of this data.
[1]
The student decides to “adjust” her results by removing one sixth-former’s data to improve the correlation.
c) Which sixth-former’s data does she remove?
[1]
d) Calculate the product moment correlation coefficient for the data with this individual removed.
[7]
The student says in her conclusion “The more hours TV watched, the lower the test mark”.
e) Using your result from d), test at the 1% level whether this assertion is justified.
[3]
The student decides to test her findings by using a regression line to predict the average test mark of Janice,
who watches 17 hours TV per week
f) Omitting the same individual as before, calculate the equation of the appropriate regression line,
and estimate Janice’s average test mark.
[8]
g) State with reasons whether this line could be used to do the following:
i) Estimate the average test mark for Mike, who watches 35 hours TV per week
[2]
ii) Estimate the hours of TV watched by Saleem, who has an average test mark of 68.
[2]
17. The following data were obtained for variables X and Y
X     Y
0.5   43
1     22
2     11
4     7
5     6
8     4
10    4
16    3
20    2.5
25    2
40    1.5
a) Draw a scatter diagram to represent this data
[2]
The model Y = A + BX was suggested for this data, where A and B are constants
b) With reference to your diagram, explain why this model is not suitable
[1]
It is suggested that the model Y = P + QZ (where Z = 1/X) would be more suitable
c) With reference to your diagram, comment on this suggestion
[1]
d) By calculating the equation of an appropriate regression line, obtain estimates for the values of P and Q
(ΣY = 106; Σ(1/X) = 4.3525; Σ(1/X²) = 5.38675625; Σ(Y/X) = 117.78)
[6]
18. The data below were obtained from observations of the radioactivity (as measured by a Geiger counter) of a
sample of a chemical (R) and the time (t) since the beginning of the experiment.
t     R
0     401
1     280
2     200
3     142
4     98
5     68
6     50
7     34
8     23
a) Draw a scatter diagram of R against t
[2]
A student suggests using the equation R = At + B, where A and B are constants, to model the data.
b) With reference to your diagram, explain why this would not be a suitable choice.
[2]
An alternative model suggested is lnR = P + Qt, where P and Q are constants.
Given t = 36; lnR = 41.26; t2 = 204; (lnR)2 = 196.71; tlnR = 143.78
c) Calculate the equation of a suitable regression line to obtain the values of P and Q
[7]
d) Estimate the value of the radioactivity count obtained at t = 4.5, giving your answer to the
nearest whole number.
[3]
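The model lnR = P + Qt is linear in t once the counts are replaced by their natural logarithms, so an ordinary least-squares fit of lnR on t gives P and Q, and exponentiating the fitted value returns an estimate of R. The sketch below works from the raw data rather than the rounded sums quoted above, so its coefficients may differ slightly in the last decimal place; the variable names are illustrative.

```python
from math import exp, log

ts = list(range(9))                                   # t = 0, 1, ..., 8
counts = [401, 280, 200, 142, 98, 68, 50, 34, 23]     # R
ys = [log(r) for r in counts]                         # y = ln R

n = len(ts)
sum_t, sum_y = sum(ts), sum(ys)
s_tt = sum(t * t for t in ts) - sum_t ** 2 / n
s_ty = sum(t * y for t, y in zip(ts, ys)) - sum_t * sum_y / n

q = s_ty / s_tt                  # gradient of ln R on t
p = sum_y / n - q * sum_t / n    # intercept, i.e. fitted ln R at t = 0

r_hat = exp(p + q * 4.5)         # undo the logarithm to estimate R at t = 4.5
print(f"ln R = {p:.3f} + ({q:.3f})t; estimated count at t = 4.5: {round(r_hat)}")
```

Since t is the controlled variable and R is being estimated, lnR is regressed on t rather than the other way round.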
19. The following table gives data on how the price per leaflet (P) varies with the number of leaflets produced (N)
by a printer
N        P (pence)
100      150
500      100
1000     75
1500     70
2000     63
3000     50
5000     43
10000    35
ΣN = 23 100; ΣN² = 141 510 000; ΣP = 586; ΣP² = 52 568; ΣNP = 1 086 000
a) Calculate the equation of the regression line of price per leaflet on number of leaflets, giving the coefficients
correct to 3 significant figures
[7]
b) Use your equation to find an estimate of the total price for 2500 leaflets
[3]
c) Find the points at which the regression line crosses the coordinate axes.
[2]
d) Give an interpretation of these coordinates
[2]
e) Comment on the limitations of this model
[2]
f) The company purchasing the leaflets estimate they can spend no more than £2570 in total on leaflets.
Find the maximum number of leaflets they can order.
[4]
20. The table shows data on the mean annual temperature (T), the electricity consumption (E) and the gas
consumption (G) for some households in various countries:
T      E     G
10.0   400   110
11.3   150   330
6.5    640   340
15.2   300   100
4.1    700   400
19.6   150   160
9.4    300   320
21.0   340   20
8.4    460   300
7.2    400   420
ΣT = 112.7; ΣT² = 1553.71; ΣE = 3840; ΣE² = 1771800; ΣG = 2500; ΣG² = 801400;
ΣS = 6340; ΣST = 59236; ΣEG = 1053900
a) Without carrying out any calculations, explain, giving your reasons, between which of the following
pairs of variables would you expect the highest correlation:
T and E
T and G
T and S, where S=E+G
[2]
b) Calculate the equation of the regression line of S on T
[7]
c) Explain why it would not be appropriate to calculate the regression line of T on S
[1]
d) Explain why the regression line might not give correct predictions for large values of T
[2]
e) Calculate the product moment correlation coefficient between E and G, and test at the 5% level whether
there is any significant correlation
[7]
f) Explain why regression was used in b) and correlation in e), not vice versa.
[2]