Chapter 65
Linear regression
65.1 Introduction to linear regression
Regression analysis, usually termed regression, is used to draw the line of ‘best fit’ through co-ordinates on a graph. The techniques used enable a mathematical equation of the straight-line form y = mx + c to be deduced for a given set of co-ordinate values, the line being such that the sum of the squares of the deviations of the co-ordinate values from the line is a minimum, i.e. it is the line of ‘best fit’. When a regression analysis is made, it is possible to obtain two lines of best fit, depending on which variable is selected as the dependent variable and which as the independent variable. For example, in a resistive electrical circuit, the current flowing is directly proportional to the voltage applied to the circuit. There are two ways of obtaining experimental values relating the current and voltage. Either certain voltages are applied to the circuit and the current values are measured, in which case the voltage is the independent variable and the current is the dependent variable; or the voltage is adjusted until a desired value of current is flowing and the value of voltage is measured, in which case the current is the independent variable and the voltage is the dependent variable.
65.2 The least-squares regression lines
For a given set of co-ordinate values (X1, Y1), (X2, Y2), . . . , (Xn, Yn), let the X-values be the independent variable and the Y-values the dependent variable. Also let D1, . . . , Dn be the vertical distances between the line shown as PQ in Fig. 65.1 and the points representing the co-ordinate values. The least-squares regression line, i.e. the line of best fit, is the line which makes the value of
$$D_1^2 + D_2^2 + \cdots + D_n^2$$
a minimum.
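[Figure 65.1: the line PQ drawn through the points (X1, Y1), (X2, Y2), . . . , (Xn, Yn), with the vertical deviations D1, D2, . . . , Dn and the horizontal deviations H3, H4 marked; axes X and Y]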
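The least-squares criterion can be evaluated directly for any trial line. The Python sketch below is an illustration only: it computes D1² + D2² + · · · + Dn² for a line Y = a0 + a1X using the frequency and inductive reactance readings of Problem 1 (Section 65.3). The values a0 = 5 and a1 = 41/70 (about 0.586) are the full-precision least-squares coefficients for those readings; the function name `sum_sq_vertical` is hypothetical.

```python
# Readings from Problem 1 (Section 65.3).
FREQ = [50, 100, 150, 200, 250, 300, 350]   # X, frequency in Hz
REACT = [30, 65, 90, 130, 150, 190, 200]    # Y, inductive reactance in ohms

# Full-precision least-squares coefficients for these readings.
BEST_A0, BEST_A1 = 5.0, 41 / 70


def sum_sq_vertical(a0, a1, xs, ys):
    """Return D1^2 + ... + Dn^2, where Di = Yi - (a0 + a1*Xi) is the
    vertical deviation of point i from the line Y = a0 + a1*X."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    best = sum_sq_vertical(BEST_A0, BEST_A1, FREQ, REACT)
    print(f"least-squares line: sum of D^2 = {best:.2f}")
    # Moving either coefficient away from the least-squares values
    # increases the sum of the squared vertical deviations.
    for da0, da1 in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.01), (0.0, -0.01)]:
        trial = sum_sq_vertical(BEST_A0 + da0, BEST_A1 + da1, FREQ, REACT)
        print(f"a0 = {BEST_A0 + da0:.2f}, a1 = {BEST_A1 + da1:.4f}: "
              f"sum of D^2 = {trial:.2f}")
```

Because the sum of squares is a quadratic in a0 and a1 with a single minimum (provided the X-values are not all equal), every trial line other than the least-squares one gives a larger total.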
The equation of the least-squares regression line is usually written as Y = a0 + a1X, where a0 is the Y-axis intercept value and a1 is the gradient of the line (analogous to c and m in the equation y = mx + c). The values of a0 and a1 that make the sum of the ‘deviations squared’ a minimum can be obtained from the two equations:
$$\sum Y = a_0 N + a_1 \sum X \qquad (1)$$
$$\sum XY = a_0 \sum X + a_1 \sum X^2 \qquad (2)$$
where X and Y are the co-ordinate values, N is the
number of co-ordinates and a0 and a1 are called the
regression coefficients of Y on X. Equations (1) and (2)
are called the normal equations of the regression line
of Y on X. The regression line of Y on X is used to
estimate values of Y for given values of X.
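Normal equations (1) and (2) are two linear equations in a0 and a1, so they can be solved directly once the summations are known. The Python sketch below forms the summations and solves the pair by Cramer's rule, which is simply one convenient way of solving a 2 × 2 system; the function name `regression_y_on_x` is hypothetical. With the readings of Problem 1 it gives a0 = 5.00 and a1 = 0.586 (Problem 1 quotes a0 = 4.94 because a1 is rounded before being substituted back).

```python
def regression_y_on_x(xs, ys):
    """Return (a0, a1) for the least-squares line Y = a0 + a1*X.

    Solves the normal equations
        (1)  sum(Y)  = a0*N      + a1*sum(X)
        (2)  sum(XY) = a0*sum(X) + a1*sum(X^2)
    by Cramer's rule.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (X, Y) pairs")
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero only when every X value is the same
    if det == 0:
        raise ValueError("all X values are equal, so the gradient is undefined")
    a0 = (sy * sxx - sx * sxy) / det
    a1 = (n * sxy - sx * sy) / det
    return a0, a1


if __name__ == "__main__":
    freq = [50, 100, 150, 200, 250, 300, 350]   # X, Hz (Problem 1)
    react = [30, 65, 90, 130, 150, 190, 200]    # Y, ohms (Problem 1)
    a0, a1 = regression_y_on_x(freq, react)
    print(f"Y = {a0:.2f} + {a1:.3f}X")  # Y = 5.00 + 0.586X
```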
Copyright © 2010 John Bird. Published by Elsevier Ltd. All rights reserved.

If the Y-values (vertical axis) are selected as the independent variable, the horizontal distances between the line shown as PQ in Fig. 65.1 and the co-ordinate values (H3, H4, etc.) are taken as the deviations. The equation of the regression line is then of the form X = b0 + b1Y and the normal equations become:
$$\sum X = b_0 N + b_1 \sum Y \qquad (3)$$
$$\sum XY = b_0 \sum Y + b_1 \sum Y^2 \qquad (4)$$
where X and Y are the co-ordinate values, b0 and b1 are the regression coefficients of X on Y and N is the number of co-ordinates. These are the normal equations of the regression line of X on Y, which in general is slightly different from the regression line of Y on X. The regression line of X on Y is used to estimate values of X for given values of Y.

The regression line of Y on X is used to determine any value of Y corresponding to a given value of X. If the value of X lies within the range of X-values of the extreme co-ordinates, the process of finding the corresponding value of Y is called linear interpolation. If it lies outside the range of X-values of the extreme co-ordinates, the process is called linear extrapolation, and the assumption must be made that the line of best fit extends outside the range of the co-ordinate values given.

By using the regression line of X on Y, values of X corresponding to given values of Y may be found by either interpolation or extrapolation.
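Equations (3) and (4) are equations (1) and (2) with the roles of X and Y exchanged, so a single solver serves for both regression lines. The Python sketch below makes that explicit and adds a hypothetical helper, `estimate_kind`, which reports whether an estimate is an interpolation or an extrapolation by comparing the given value with the range of the extreme co-ordinates. All function names are illustrative, and the readings are those of Problem 1.

```python
def least_squares(us, vs):
    """(c0, c1) for v = c0 + c1*u, solving the two normal equations by Cramer's rule."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    det = n * suu - su * su
    return (sv * suu - su * suv) / det, (n * suv - su * sv) / det


def regression_y_on_x(xs, ys):
    """(a0, a1) for Y = a0 + a1*X, from normal equations (1) and (2)."""
    return least_squares(xs, ys)


def regression_x_on_y(xs, ys):
    """(b0, b1) for X = b0 + b1*Y, from normal equations (3) and (4)."""
    return least_squares(ys, xs)


def estimate_kind(value, given_values):
    """'interpolation' if value lies within the range of the given values, else 'extrapolation'."""
    lo, hi = min(given_values), max(given_values)
    return "interpolation" if lo <= value <= hi else "extrapolation"


if __name__ == "__main__":
    freq = [50, 100, 150, 200, 250, 300, 350]   # X, Hz
    react = [30, 65, 90, 130, 150, 190, 200]    # Y, ohms
    a0, a1 = regression_y_on_x(freq, react)
    b0, b1 = regression_x_on_y(freq, react)
    print(f"Y on X: Y = {a0:.2f} + {a1:.3f}X")
    print(f"X on Y: X = {b0:.2f} + {b1:.3f}Y")
    # Rewriting X = b0 + b1*Y as Y = -b0/b1 + X/b1 shows the gradients differ slightly.
    print(f"gradients in the X-Y plane: {a1:.4f} and {1 / b1:.4f}")
    print(estimate_kind(175, freq))   # interpolation: 175 Hz lies within 50 to 350 Hz
    print(estimate_kind(250, react))  # extrapolation: 250 ohms lies outside 30 to 200 ohms
```

Both lines pass through the mean point (mean of X, mean of Y), which is why they nearly coincide when the points lie close to a straight line.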
65.3 Worked problems on linear regression
Problem 1. In an experiment to determine the relationship between frequency and the inductive reactance of an electrical circuit, the following results were obtained:

Frequency (Hz)                50    100    150    200    250    300    350
Inductive reactance (ohms)    30     65     90    130    150    190    200

Determine the equation of the regression line of inductive reactance on frequency, assuming a linear relationship.

Since the regression line of inductive reactance on frequency is required, the frequency is the independent variable, X, and the inductive reactance is the dependent variable, Y. The equation of the regression line of Y on X is:
Y = a0 + a1X
and the regression coefficients a0 and a1 are obtained by using the normal equations
$$\sum Y = a_0 N + a_1 \sum X \quad\text{and}\quad \sum XY = a_0 \sum X + a_1 \sum X^2$$
(from equations (1) and (2)).

A tabular approach is used to determine the summed quantities.

Frequency, X    Inductive reactance, Y    X²         XY         Y²
50              30                        2500       1500       900
100             65                        10 000     6500       4225
150             90                        22 500     13 500     8100
200             130                       40 000     26 000     16 900
250             150                       62 500     37 500     22 500
300             190                       90 000     57 000     36 100
350             200                       122 500    70 000     40 000

ΣX = 1400, ΣY = 855, ΣX² = 350 000, ΣXY = 212 000, ΣY² = 128 725
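The tabular approach is easy to mechanise. The short Python sketch below (the name `tabulate` is hypothetical) builds the rows X, Y, X², XY and Y² for the readings of Problem 1 and totals each column, reproducing the summations used in the normal equations.

```python
def tabulate(xs, ys):
    """Return the rows (X, Y, X^2, XY, Y^2) and the column totals."""
    rows = [(x, y, x * x, x * y, y * y) for x, y in zip(xs, ys)]
    totals = tuple(sum(column) for column in zip(*rows))
    return rows, totals


if __name__ == "__main__":
    freq = [50, 100, 150, 200, 250, 300, 350]   # X, Hz
    react = [30, 65, 90, 130, 150, 190, 200]    # Y, ohms
    rows, totals = tabulate(freq, react)
    headings = ("X", "Y", "X^2", "XY", "Y^2")
    print("".join(f"{h:>10}" for h in headings))
    for row in rows:
        print("".join(f"{v:>10}" for v in row))
    print("".join(f"{t:>10}" for t in totals))  # 1400, 855, 350000, 212000, 128725
```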
8 Engineering Mathematics
The number of co-ordinate values given, N, is 7. Substituting in the normal equations gives:
$$
\begin{aligned}
855 &= 7a_0 + 1400a_1 && (1)\\
212\,000 &= 1400a_0 + 350\,000a_1 && (2)
\end{aligned}
$$
7 × (2) gives:
$$1\,484\,000 = 9800a_0 + 2\,450\,000a_1 \qquad (3)$$
1400 × (1) gives:
$$1\,197\,000 = 9800a_0 + 1\,960\,000a_1 \qquad (4)$$
(3) − (4) gives:
$$287\,000 = 0 + 490\,000a_1 \qquad (5)$$
from which,
$$a_1 = \frac{287\,000}{490\,000} = 0.586$$
Substituting a1 = 0.586 in equation (1) gives:
$$855 = 7a_0 + 1400(0.586)$$
i.e.
$$a_0 = \frac{855 - 820.4}{7} = 4.94$$
Thus the equation of the regression line of inductive reactance on frequency is:
Y = 4.94 + 0.586X

Problem 2. For the data given in Problem 1, determine the equation of the regression line of frequency on inductive reactance, assuming a linear relationship

In this case, the inductive reactance is the independent variable and the frequency is the dependent variable. Keeping X for the frequency and Y for the inductive reactance, as in Problem 1, the equation of the regression line of X on Y is:
X = b0 + b1Y
and, from equations (3) and (4) of Section 65.2, the normal equations are
$$\sum X = b_0 N + b_1 \sum Y \quad\text{and}\quad \sum XY = b_0 \sum Y + b_1 \sum Y^2$$
From the table shown in Problem 1, the simultaneous equations are:
$$1400 = 7b_0 + 855b_1 \quad\text{and}\quad 212\,000 = 855b_0 + 128\,725b_1$$
Solving these equations in a similar way to that in Problem 1 gives:
b0 = −6.15 and b1 = 1.69, correct to 3 significant figures.
Thus the equation of the regression line of frequency on inductive reactance is:
X = −6.15 + 1.69Y
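The elimination used in Problems 1 and 2 can be checked with exact fractions. The Python sketch below (the function name `eliminate` is hypothetical) multiplies the two equations so that the first unknown has the same coefficient in each, subtracts, and back-substitutes, as in the working of Problem 1. Carrying a1 = 287 000/490 000 exactly gives a0 = 5; the sketch also reproduces the a0 = 4.94 obtained when a1 is rounded to 0.586 before back-substitution.

```python
from fractions import Fraction


def eliminate(p, q, r, s, t, u):
    """Solve p*c0 + q*c1 = r and s*c0 + t*c1 = u exactly.

    Multiplying the second equation by p and the first by s makes the c0
    terms equal; subtracting leaves (p*t - s*q)*c1 = p*u - s*r.
    """
    c1 = Fraction(p * u - s * r, p * t - s * q)
    c0 = (r - q * c1) / p
    return c0, c1


if __name__ == "__main__":
    # Problem 1: 855 = 7a0 + 1400a1 and 212 000 = 1400a0 + 350 000a1
    a0, a1 = eliminate(7, 1400, 855, 1400, 350_000, 212_000)
    print(f"a1 = {a1} = {float(a1):.3f}, a0 = {float(a0):.2f}")
    # Rounding a1 to 0.586 before back-substitution, as in Problem 1
    a0_rounded = (855 - 1400 * Fraction("0.586")) / 7
    print(f"a0 with a1 rounded first = {float(a0_rounded):.2f}")
    # Problem 2: 1400 = 7b0 + 855b1 and 212 000 = 855b0 + 128 725b1
    b0, b1 = eliminate(7, 855, 1400, 855, 128_725, 212_000)
    print(f"b0 = {float(b0):.2f}, b1 = {float(b1):.2f}")
```

Rounding an intermediate coefficient before back-substitution can shift the last quoted figure of the other coefficient, so it is safer to carry full precision until the final step.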
Problem 3. Use the regression equations
calculated in Problems 1 and 2 to find (a) the value
of inductive reactance when the frequency is
175 Hz, and (b) the value of frequency when the
inductive reactance is 250 ohms, assuming the line
of best fit extends outside of the given co-ordinate
values. Draw a graph showing the two regression
lines
(a) From Problem 1, the regression equation of inductive reactance on frequency is:
Y = 4.94 + 0.586X. When the frequency, X, is
175 Hz, Y = 4.94 + 0.586(175) = 107.5, correct
to 4 significant figures, i.e. the inductive reactance
is 107.5 ohms when the frequency is 175 Hz.
(b) From Problem 2, the regression equation of frequency on inductive reactance is:
X = −6.15 + 1.69Y. When the inductive reactance, Y, is 250 ohms, X = −6.15 + 1.69(250) =
416.4 Hz, correct to 4 significant figures, i.e.
the frequency is 416.4 Hz when the inductive
reactance is 250 ohms.
The graph depicting the two regression lines is shown
in Fig. 65.2. To obtain the regression line of inductive reactance on frequency the regression line equation
Y = 4.94 + 0.586X is used, and X (frequency) values of
100 and 300 have been selected in order to find the corresponding Y values. These values gave the co-ordinates
as (100, 63.5) and (300, 180.7), shown as points A
and B in Fig. 65.2. Two co-ordinates for the regression
line of frequency on inductive reactance are calculated
using the equation X = −6.15 + 1.69Y , the values of
inductive reactance of 50 and 150 being used to obtain
the co-ordinate values. These values gave co-ordinates
(78.4, 50) and (247.4, 150), shown as points C and D
in Fig. 65.2.
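The evaluations in Problem 3, and the plotting points A to D, can be reproduced from the rounded equations quoted in Problems 1 and 2. The Python sketch below uses `decimal.Decimal` with half-up rounding so that a value such as 416.35 rounds to 416.4 as it would by hand; 416.35 has no exact binary floating-point representation, so float rounding of such a tie is not guaranteed to go up. The helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

A0, A1 = Decimal("4.94"), Decimal("0.586")   # Y = a0 + a1X (Problem 1)
B0, B1 = Decimal("-6.15"), Decimal("1.69")   # X = b0 + b1Y (Problem 2)
FREQ_RANGE = (50, 350)    # X-values of the extreme co-ordinates, Hz
REACT_RANGE = (30, 200)   # Y-values of the extreme co-ordinates, ohms


def reactance(freq_hz):
    """Inductive reactance (ohms) from the regression line of Y on X."""
    return A0 + A1 * Decimal(freq_hz)


def frequency(react_ohm):
    """Frequency (Hz) from the regression line of X on Y."""
    return B0 + B1 * Decimal(react_ohm)


def to_1dp(value):
    """Round half-up to one decimal place."""
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def within(value, bounds):
    """True if value lies inside the range of the given co-ordinates."""
    return bounds[0] <= value <= bounds[1]


if __name__ == "__main__":
    print(f"(a) {to_1dp(reactance(175))} ohms; interpolation: {within(175, FREQ_RANGE)}")
    print(f"(b) {to_1dp(frequency(250))} Hz; interpolation: {within(250, REACT_RANGE)}")
    # Points used to draw the two regression lines in Fig. 65.2
    for label, x in (("A", 100), ("B", 300)):
        print(f"{label} = ({x}, {to_1dp(reactance(x))})")
    for label, y in (("C", 50), ("D", 150)):
        print(f"{label} = ({to_1dp(frequency(y))}, {y})")
```

Part (b) is an extrapolation, since 250 ohms lies outside the range 30 to 200 ohms of the given readings.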
It can be seen from Fig. 65.2 that, to the scale drawn, the two regression lines coincide. Although it is not necessary to do so, the co-ordinate values are also shown to indicate that the regression lines do appear to be the lines of best fit. A graph showing co-ordinate values is called a scatter diagram in statistics.

[Figure 65.2: the regression line of inductive reactance on frequency (through A and B) and of frequency on inductive reactance (through C and D), with the co-ordinate values plotted; inductive reactance in ohms (Y, 0 to 300) against frequency in hertz (X, 0 to 500)]

Problem 4. The experimental values relating centripetal force and radius, for a mass travelling at constant velocity in a circle, are as shown:

Force (N)      5    10    15    20    25    30    35    40
Radius (cm)   55    30    16    12    11     9     7     5

Determine the equations of (a) the regression line of force on radius and (b) the regression line of radius on force. Hence, calculate the force at a radius of 40 cm and the radius corresponding to a force of 32 N

Let the radius be the independent variable X, and the force be the dependent variable Y. (This decision is usually based on a ‘cause’ corresponding to X and an ‘effect’ corresponding to Y.)

(a) The equation of the regression line of force on radius is of the form Y = a0 + a1X and the constants a0 and a1 are determined from the normal equations:
$$\sum Y = a_0 N + a_1 \sum X \quad\text{and}\quad \sum XY = a_0 \sum X + a_1 \sum X^2$$
(from equations (1) and (2))
Using a tabular approach to determine the values of the summations gives:

Radius, X    Force, Y    X²      XY     Y²
55           5           3025    275    25
30           10          900     300    100
16           15          256     240    225
12           20          144     240    400
11           25          121     275    625
9            30          81      270    900
7            35          49      245    1225
5            40          25      200    1600

ΣX = 145, ΣY = 180, ΣX² = 4601, ΣXY = 2045, ΣY² = 5100

Thus
$$180 = 8a_0 + 145a_1 \quad\text{and}\quad 2045 = 145a_0 + 4601a_1$$
Solving these simultaneous equations gives a0 = 33.7 and a1 = −0.617, correct to 3 significant figures. Thus the equation of the regression line of force on radius is:
Y = 33.7 − 0.617X
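Problem 4 can be checked the same way. The Python sketch below (function names `sums` and `fit` are hypothetical) forms the summations from the readings, solves normal equations (1) and (2) for part (a) and, by exchanging the roles of the two lists, equations (3) and (4) for part (b). Because it carries full precision, it gives a force of 9.00 N at 40 cm and a radius of 7.11 cm at 32 N; the values 9.02 N and 7.08 cm in the worked solution come from the coefficients rounded to 3 significant figures.

```python
RADIUS = [55, 30, 16, 12, 11, 9, 7, 5]    # X, radius in cm
FORCE = [5, 10, 15, 20, 25, 30, 35, 40]   # Y, force in N


def sums(xs, ys):
    """Column totals of the table: (sum X, sum Y, sum X^2, sum XY, sum Y^2)."""
    return (sum(xs), sum(ys), sum(x * x for x in xs),
            sum(x * y for x, y in zip(xs, ys)), sum(y * y for y in ys))


def fit(us, vs):
    """(c0, c1) for v = c0 + c1*u from the two normal equations (Cramer's rule)."""
    n = len(us)
    su, sv, suu, suv, _ = sums(us, vs)
    det = n * suu - su * su
    return (sv * suu - su * suv) / det, (n * suv - su * sv) / det


if __name__ == "__main__":
    print("sums:", sums(RADIUS, FORCE))      # (145, 180, 4601, 2045, 5100)
    a0, a1 = fit(RADIUS, FORCE)              # (a) force on radius: Y = a0 + a1X
    b0, b1 = fit(FORCE, RADIUS)              # (b) radius on force: X = b0 + b1Y
    print(f"(a) Y = {a0:.1f} + ({a1:.3f})X")
    print(f"(b) X = {b0:.1f} + ({b1:.2f})Y")
    print(f"force at 40 cm = {a0 + a1 * 40:.2f} N")
    print(f"radius at 32 N = {b0 + b1 * 32:.2f} cm")
```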
(b) The equation of the regression line of radius on force is of the form X = b0 + b1Y and the constants b0 and b1 are determined from the normal equations:
$$\sum X = b_0 N + b_1 \sum Y \quad\text{and}\quad \sum XY = b_0 \sum Y + b_1 \sum Y^2$$
(from equations (3) and (4))
The values of the summations have been obtained in part (a), giving:
$$145 = 8b_0 + 180b_1 \quad\text{and}\quad 2045 = 180b_0 + 5100b_1$$
Solving these simultaneous equations gives b0 = 44.2 and b1 = −1.16, correct to 3 significant figures. Thus the equation of the regression line of radius on force is:
X = 44.2 − 1.16Y

The force, Y, at a radius of 40 cm, is obtained from the regression line of force on radius, i.e.
Y = 33.7 − 0.617(40) = 9.02,
i.e. the force at a radius of 40 cm is 9.02 N

The radius, X, when the force is 32 N is obtained from the regression line of radius on force, i.e. X = 44.2 − 1.16(32) = 7.08,
i.e. the radius when the force is 32 N is 7.08 cm

Now try the following exercise

Exercise 222 Further problems on linear regression

In Problems 1 and 2, determine the equation of the regression line of Y on X, correct to 3 significant figures.

1.
X     14     18     23     30     50
Y    900   1200   1600   2100   3800
[Y = −256 + 80.6X]

2.
X      6      3      9     15      2     14     21     13
Y    1.3    0.7    2.0    3.7    0.5    2.9    4.5    2.7
[Y = 0.0477 + 0.216X]

In Problems 3 and 4, determine the equations of the regression lines of X on Y for the data stated, correct to 3 significant figures.

3. The data given in Problem 1.
[X = 3.20 + 0.0124Y]

4. The data given in Problem 2.
[X = −0.0472 + 4.56Y]

5. The relationship between the voltage applied to an electrical circuit and the current flowing is as shown:
Current (mA)           2    4    6    8   10   12   14
Applied voltage (V)    5   11   15   19   24   28   33
Assuming a linear relationship, determine the equation of the regression line of applied voltage, Y, on current, X, correct to 4 significant figures.
[Y = 1.143 + 2.268X]

6. For the data given in Problem 5, determine the equation of the regression line of current on applied voltage, correct to 3 significant figures.
[X = −0.483 + 0.440Y]

7. Draw the scatter diagram for the data given in Problem 5 and show the regression lines of applied voltage on current and current on applied voltage. Hence determine the values of (a) the applied voltage needed to give a current of 3 mA and (b) the current flowing when the applied voltage is 40 volts, assuming the regression lines are still true outside the range of values given.
[(a) 7.92 V (b) 17.1 mA]

8. In an experiment to determine the relationship between force and momentum, a force, X, is applied to a mass by placing the mass on an inclined plane, and the time, Y, for the velocity to change from u m/s to v m/s is measured. The results obtained are as follows:
Force (N)    11.4   18.7   11.7   12.3   14.7   18.8   19.6
Time (s)     0.56   0.35   0.55   0.52   0.43   0.34   0.31
Determine the equation of the regression line of time on force, assuming a linear relationship between the quantities, correct to 3 significant figures.
[Y = 0.881 − 0.0290X]

9. Find the equation for the regression line of force on time for the data given in Problem 8, correct to 3 decimal places.
[X = 30.194 − 34.039Y]

10. Draw a scatter diagram for the data given in Problem 8 and show the regression lines of time on force and force on time. Hence find (a) the time corresponding to a force of 16 N, and (b) the force at a time of 0.25 s, assuming the relationship is linear outside the range of values given.
[(a) 0.417 s (b) 21.7 N]
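Answers to Exercise 222 can be checked with a short routine. The Python sketch below (all names hypothetical) returns both regression lines for a data set by solving the normal equations at full precision, and applies it to the data of Problems 2, 5 and 8. Small differences in the last quoted figure can arise if a gradient is rounded before the intercept is calculated.

```python
def fit(us, vs):
    """(c0, c1) for v = c0 + c1*u from the two normal equations (Cramer's rule)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    det = n * suu - su * su
    return (sv * suu - su * suv) / det, (n * suv - su * sv) / det


def both_lines(xs, ys):
    """((a0, a1), (b0, b1)) for Y = a0 + a1*X and X = b0 + b1*Y."""
    return fit(xs, ys), fit(ys, xs)


DATA = {
    "Problems 2 and 4": ([6, 3, 9, 15, 2, 14, 21, 13],
                         [1.3, 0.7, 2.0, 3.7, 0.5, 2.9, 4.5, 2.7]),
    "Problems 5 and 6": ([2, 4, 6, 8, 10, 12, 14],                    # current, mA
                         [5, 11, 15, 19, 24, 28, 33]),                # voltage, V
    "Problems 8 and 9": ([11.4, 18.7, 11.7, 12.3, 14.7, 18.8, 19.6],  # force, N
                         [0.56, 0.35, 0.55, 0.52, 0.43, 0.34, 0.31]), # time, s
}

if __name__ == "__main__":
    for name, (xs, ys) in DATA.items():
        (a0, a1), (b0, b1) = both_lines(xs, ys)
        print(f"{name}: Y = {a0:.4g} + ({a1:.4g})X,  X = {b0:.4g} + ({b1:.4g})Y")
```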