36 Calculating residuals

“Teach A Level Maths”
Vol. 2: A2 Core Modules
Calculating Residuals
© Christine Crisp
Statistics 1 (AQA, EDEXCEL, OCR)
"Certain images and/or photos on this presentation are the copyrighted property of JupiterImages and are being used with
permission under license. These images and/or photos may not be copied or downloaded without permission from JupiterImages"
Once we have found a regression line, we may need to
know how close any particular observation is to the line.
To do this, we find a residual. For the height and foot
length data . . .
[Figure: "Foot length and height of UK children". Scatter diagram of foot length (cm) and height (cm), showing the y on x regression line and the point $(x_A, y_A)$.]
To find the residual for the point $(x_A, y_A)$ we find $y_A - y$, where $y$ is the y-coordinate of the point on the regression line with the same x-value.
e.g. The marks for 10 students in Maths and Physics are
as follows:

Student      A   B   C   D   E   F   G   H   I   J
Maths, x    41  37  38  39  47  42  34  35  48  49
Physics, y  36  20  31  24  35  42  26  27  29  37
The regression line for y on x is $y = 1.81 + 0.70x$.
Residual of point A $= y_A - y$
(The residual is negative if the point is below the line.)
To find y, substitute the value of x at point A into the
regression line:
$y = 1.81 + 0.70(41) = 30.51$
$\Rightarrow y_A - y = 36 - 30.51 = 5.49$
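The arithmetic for point A can be checked with a short Python sketch. The function name `residual` is an illustrative choice, and the coefficients are the rounded values quoted for the regression line, so the result carries the same rounding as the worked example.

```python
def residual(a, b, x_obs, y_obs):
    """Return the observed y minus the y-value on the line y = a + b*x."""
    y_line = a + b * x_obs
    return y_obs - y_line

# Rounded coefficients of the y on x regression line, as quoted in the example
a, b = 1.81, 0.70

# Point A: Maths mark 41, Physics mark 36
res_A = residual(a, b, 41, 36)
print(round(res_A, 2))
```

A positive result means the point lies above the line.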
SUMMARY
To find the residual for a particular observation, A,
• calculate the y-coordinate on the regression line corresponding to the x-value at A,
• find $y_A - y$.
• Since $y = a + bx$, the residual at A is also given by $y_A - a - bx_A$.
• The residual is negative if the point is below the line.
[Figure: the observation $(x_A, y_A)$ and the point $(x_A, y)$ on the regression line with the same x-value.]
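The sign rule in the summary can be illustrated with the Maths and Physics marks. This is a sketch only: it applies $y_A - a - bx_A$ to every student using the rounded coefficients quoted in the example, and the variable names are illustrative.

```python
# Marks from the Maths/Physics example (students A to J)
students = "ABCDEFGHIJ"
maths = [41, 37, 38, 39, 47, 42, 34, 35, 48, 49]
physics = [36, 20, 31, 24, 35, 42, 26, 27, 29, 37]

# Rounded regression coefficients quoted in the example: y = a + b*x
a, b = 1.81, 0.70

# Residual at each point: y_A - a - b*x_A
residuals = {s: y - a - b * x for s, x, y in zip(students, maths, physics)}

# Points with a negative residual lie below the regression line
below = [s for s, r in residuals.items() if r < 0]

for s, r in residuals.items():
    position = "below" if r < 0 else "above"
    print(f"{s}: residual {r:+.2f} ({position} the line)")
```

With these coefficients, students B, D and I have negative residuals, so their points lie below the line.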
Outliers
Outliers are points that lie well away from the
regression line.
Since a residual measures the vertical distance of a point from the
regression line, residuals are used to identify outliers.
Outliers can have a considerable effect on a regression
line and make it unreliable.
e.g. The diagram is a scatter diagram of the data shown in
the table.

x   1   2   3   4   5   6   7   8
y   5  18  12  14  12  11   7   3

If we were to draw the line “by eye”, the 1st point would lie well
away from the line we would want to draw.
However, the calculation of the regression line includes
the 1st point and distorts the position of the line.
The diagram shows the y on x regression line for all the data,
$y = 14.21 - 0.88x$. The residuals are shown by the red lines.
The left-hand end of the line is further down than it would be
without the 1st point.
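The quoted line can be reproduced from the table. The sketch below assumes the standard least-squares formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$, which are not stated on the slide. The helper name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares y on x regression line; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_mean - b * x_mean
    return a, b

# Data from the table
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 18, 12, 14, 12, 11, 7, 3]

a, b = fit_line(xs, ys)
print(f"y = {a:.2f} + ({b:.2f})x")

# Residual at each point, and the point whose residual is largest in size
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
worst = max(range(len(xs)), key=lambda i: abs(residuals[i]))
print(f"largest residual: point {worst + 1}, residual {residuals[worst]:.2f}")
```

The point picked out is the 1st one, which lies well below the fitted line.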
Removing the 1st point gives $y = 21.36 - 2.07x$, compared with
$y = 14.21 - 0.88x$ for all the data.
The sum of the squares of the residuals is $\sum R^2 = 139$ for the
line through all the data, $y = 14.21 - 0.88x$, and $\sum R^2 = 19.9$
for the line without the 1st point, $y = 21.36 - 2.07x$.
Without the 1st point, we have a regression line that is a much
better fit.
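The two sums of squares can be checked numerically. This is a sketch that assumes the standard least-squares fit and uses unrounded coefficients. The helper names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares y on x regression line; returns (a, b) for y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_mean - b * x_mean, b

def sum_squared_residuals(xs, ys):
    """Fit the line to the given points and add up the squared residuals."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Data from the table
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [5, 18, 12, 14, 12, 11, 7, 3]

all_points = sum_squared_residuals(xs, ys)
without_first = sum_squared_residuals(xs[1:], ys[1:])  # drop the point (1, 5)

print(round(all_points, 1), round(without_first, 1))
```

The smaller total for the second fit is what "a much better fit" means here: the remaining points lie closer to their line.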
Exercise
1. The table shows the number of accidents to children
as a percentage of those to adults, y, in 9 areas of
London together with the percentage of open space in
those areas, x.
Area                         A     B     C     D     E     F     G     H     I
Open space (%), x            5     1.3   1.4   7     4.5   5.2   6.3   14.6  14.8
Children's accidents (%), y  46.3  42.9  40    38.2  37    33.6  30.8  23.8  17.1
(a) Find the equation of the regression line of y on x
(b) Estimate the percentage of accidents to children in
an area with 10% open space.
(c) Find the residual for A.
Solutions
(a) The equation of the regression line for y on x is
$y = 45.40 - 1.65x$
(b) $x = 10 \Rightarrow y = 45.40 - 1.65(10) = 28.9$
Accidents to children are estimated at nearly 29% of those to adults.
(c) At A(5, 46.3),
$x = 5 \Rightarrow y = 45.40 - 1.65(5) = 37.15$
Residual $= y_A - y = 46.3 - 37.15 = 9.15$
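The three answers can be checked with a short sketch. It assumes the exercise table pairs the values as A(5, 46.3), B(1.3, 42.9), C(1.4, 40), D(7, 38.2), E(4.5, 37), F(5.2, 33.6), G(6.3, 30.8), H(14.6, 23.8), I(14.8, 17.1), and it assumes the standard least-squares formulas. Coefficients are rounded to 2 decimal places before use, as in the solution.

```python
# Assumed pairing of the exercise data: open space (%) and children's accidents (%)
open_space = [5, 1.3, 1.4, 7, 4.5, 5.2, 6.3, 14.6, 14.8]
accidents = [46.3, 42.9, 40, 38.2, 37, 33.6, 30.8, 23.8, 17.1]

# (a) Least-squares y on x regression line, y = a + b*x
n = len(open_space)
x_mean = sum(open_space) / n
y_mean = sum(accidents) / n
s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(open_space, accidents))
s_xx = sum((x - x_mean) ** 2 for x in open_space)
b = round(s_xy / s_xx, 2)
a = round(y_mean - (s_xy / s_xx) * x_mean, 2)
print(f"(a) y = {a:.2f} + ({b:.2f})x")

# (b) Estimate for an area with 10% open space
estimate = a + b * 10
print(f"(b) estimate at x = 10: {estimate:.1f}")

# (c) Residual for area A, the first pair in the lists
residual_A = accidents[0] - (a + b * open_space[0])
print(f"(c) residual for A: {residual_A:.2f}")
```

That the fitted coefficients agree with the line given in the solution supports the assumed pairing of the table values.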
The following slides contain repeats of
information on earlier slides, shown without
colour, so that they can be printed and
photocopied.
For most purposes the slides can be printed
as “Handouts” with up to 6 slides per sheet.
e.g. The marks for 10 students in Maths and Physics are
as follows:

Student      A   B   C   D   E   F   G   H   I   J
Maths, x    41  37  38  39  47  42  34  35  48  49
Physics, y  36  20  31  24  35  42  26  27  29  37

The regression line for y on x is $y = 1.81 + 0.70x$.
Residual of point A $= y_A - y$
(The residual is negative if the point is below the line.)
To find y, substitute the value of x at point A into the
regression line:
$y = 1.81 + 0.70(41) = 30.51$
$\Rightarrow y_A - y = 36 - 30.51 = 5.49$
e.g. The diagram shows the y on x regression line for the
data in the table. The residuals are shown by the lines
parallel to the y-axis.
For all the data, $y = 14.21 - 0.88x$ and the sum of the squares
of the residuals is $\sum R^2 = 139$.
The 1st point has the largest residual.
Without the 1st point, $y = 21.36 - 2.07x$ and the sum of the squares
of the residuals is $\sum R^2 = 19.9$, so we have a regression
line that is a much better fit.