7) Regression

Regression
• This chapter is on regression
• We will learn the difference between dependent and independent variables
• We will be looking at the line of best fit
• We are going to see how to calculate the equation of the line of best fit (regression equation), and interpret it
Variables, and the line of best fit
The equation of a straight line is usually given in the form y = a + bx.
If y = a + bx, then a is the y-intercept (where the line cuts the y-axis) and b is the gradient of the line.
You can draw any line like this by choosing values for x and substituting them into the equation.
→ Sketch the line y = 2x + 3
x = 0, y = 3
x = 1, y = 5
x = 2, y = 7
[Figure: graph of the line y = 2x + 3, with x from 0 to 3 and y from 0 to 8]
7A
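The substitution step can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original slides; the helper name `line_points` is hypothetical.

```python
def line_points(a, b, xs):
    """Return (x, y) pairs on the line y = a + b*x for the given x values."""
    return [(x, a + b * x) for x in xs]

# y = 2x + 3 has intercept a = 3 and gradient b = 2
points = line_points(3, 2, [0, 1, 2])
print(points)  # [(0, 3), (1, 5), (2, 7)]
```

Plotting these three points and joining them gives the sketch of the line.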
The independent (explanatory) variable is the one that is set or chosen independently of the other variable. It is plotted on the x-axis.
The dependent (response) variable is the one whose values are determined by the independent variable. It is plotted on the y-axis.
For example:
→ If we are looking at album sales and stores that stock albums…
→ The album sales will be dependent on the number of stores selling them
→ So album sales are the dependent variable, and the number of stores is the independent variable
The formula for the line of best fit will be in the form:
y = a + bx
$b = \frac{S_{xy}}{S_{xx}}$
$a = \bar{y} - b\bar{x}$
So you must always calculate b first!
The regression line goes through the middle of the points plotted.
Mathematically, each point is a vertical distance ‘e’ from the line.
Each of these distances is known as a residual.
The regression line will minimise the sum of the squares of these residuals:
$\sum e^2 = \text{Minimum}$
[Figure: scatter plot of y against x showing the regression line, with the residuals e1 to e5 drawn as vertical distances from each point to the line]
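The least squares idea can be checked numerically. The sketch below is an illustration only: it borrows the mass and length values and the line y = 43.89 + 0.2305x from the spring example later in this chapter, and the two comparison lines are arbitrary choices made for the demonstration.

```python
# Data from the spring example in this chapter: mass x (kg), length y (cm)
xs = [20, 40, 60, 80, 100]
ys = [48, 55.1, 56.3, 61.2, 68]

def sum_squared_residuals(a, b):
    """Sum of e^2, where e = y - (a + b*x) is the vertical distance to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

best = sum_squared_residuals(43.89, 0.2305)    # the regression line
steeper = sum_squared_residuals(43.89, 0.25)   # same intercept, arbitrary other gradient
higher = sum_squared_residuals(45.0, 0.2305)   # same gradient, arbitrary other intercept

print(best < steeper, best < higher)  # True True
```

Any other choice of a and b gives a larger sum of squared residuals than the regression line does.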
For the following set of data:
a) Calculate Sxx and Sxy.
b) Work out the equation of the regression line.
$n = 5$, $\sum x = 300$, $\sum x^2 = 22000$, $\bar{x} = 60$
$\sum y = 288.6$, $\sum y^2 = 16879.14$, $\bar{y} = 57.72$, $\sum xy = 18238$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$S_{xx} = 22000 - \frac{(300)^2}{5}$
$S_{xx} = 4000$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$
$S_{xy} = 18238 - \frac{300 \times 288.6}{5}$
$S_{xy} = 922$
b) Work out the equation of the regression line.
Using $S_{xx} = 4000$, $S_{xy} = 922$, $\bar{x} = 60$ and $\bar{y} = 57.72$:
y = a + bx
$b = \frac{S_{xy}}{S_{xx}} = \frac{922}{4000}$
$b = 0.2305$
$a = \bar{y} - b\bar{x}$
$a = 57.72 - (0.2305 \times 60)$
$a = 43.89$
y = 43.89 + 0.2305x
Give answers in full, or if rounded, to 3sf
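The same calculation can be sketched in Python, working only from the summary statistics given in this example. The function name `regression_from_summaries` is a hypothetical helper, not something from the original slides.

```python
def regression_from_summaries(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (Sxx, Sxy, a, b) for the regression line y = a + b*x."""
    s_xx = sum_x2 - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    b = s_xy / s_xx                  # calculate b first...
    a = sum_y / n - b * sum_x / n    # ...because a = y-bar - b * x-bar needs it
    return s_xx, s_xy, a, b

s_xx, s_xy, a, b = regression_from_summaries(5, 300, 288.6, 22000, 18238)
print(round(s_xx, 2), round(s_xy, 2), round(a, 2), round(b, 4))
```

The rounded values agree with the working above: Sxx = 4000, Sxy = 922, a = 43.89 and b = 0.2305.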
Coding and Regression Equations
As with other topics we have looked at, coding can be used to make the
numbers easier to work with.
However, the coded regression line will most likely be different from
the actual regression line
To calculate the actual regression line, you must substitute the codes
for x and y into the coded regression formula…
7B
The following coding was used to alter a set of data:
$r = \frac{x - 2}{10}$
$t = 5y$
This is the formula for the coded regression line:
$t = 2r + 5$
Calculate the actual regression line for the original data, x and y.
$t = 2r + 5$
Substitute the codes for t and r:
$5y = 2\left(\frac{x - 2}{10}\right) + 5$
Multiply all parts by 10 to cancel the divide by 10:
$50y = 2(x - 2) + 50$
Expand the bracket:
$50y = 2x - 4 + 50$
Simplify by grouping:
$50y = 2x + 46$
Divide by 50 to leave y on its own:
$y = \frac{2x + 46}{50}$
OR: y = 0.04x + 0.92
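A quick numerical check of this result, as a sketch: code x into r, apply the coded line t = 2r + 5, then undo t = 5y, and compare with y = 0.04x + 0.92. The function name and the test values of x are arbitrary choices for illustration.

```python
def y_from_coded_line(x):
    """Apply r = (x - 2)/10, then t = 2r + 5, then undo the coding t = 5y."""
    r = (x - 2) / 10
    t = 2 * r + 5
    return t / 5

for x in [0, 2, 12, 50]:
    # The two values should agree, up to floating-point rounding
    print(x, y_from_coded_line(x), 0.04 * x + 0.92)
```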
Eight samples of carbon steel were produced with different percentages (c) of carbon in them. Each sample was heated until it melted and the temperature (m) recorded. The results were coded so that:
$x = 10c$
$y = \frac{m - 700}{5}$
The following table shows the coded results:
Carbon (x)          1   2   3   4   5   6   7   8
Melting Point (y)  35  28  24  16  15  12   8   6
Calculate Sxy and Sxx.
$\sum x = 36$, $\sum y = 144$, $\sum x^2 = 204$, $\sum xy = 478$, $n = 8$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$
$S_{xx} = 204 - \frac{(36)^2}{8}$
$S_{xx} = 42$
$S_{xy} = \sum xy - \frac{\sum x \sum y}{n}$
$S_{xy} = 478 - \frac{36 \times 144}{8}$
$S_{xy} = -170$
Calculate the regression line of y on x.
$\sum x = 36$, $\sum y = 144$, $\sum x^2 = 204$, $\sum xy = 478$, $n = 8$, $S_{xx} = 42$, $S_{xy} = -170$
y = a + bx
$b = \frac{S_{xy}}{S_{xx}} = \frac{-170}{42} = -\frac{85}{21}$
$b = -4.048$
$a = \bar{y} - b\bar{x} = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$
$a = \frac{144}{8} - \left(-\frac{85}{21} \times \frac{36}{8}\right) = \frac{507}{14}$
$a = 36.21$
y = 36.21 - 4.048x
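The coded calculation can be reproduced exactly with Python's `fractions` module, which keeps b and a as fractions until the final rounding. This is a sketch using the coded table values for the carbon steel samples; the variable names are illustrative.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5, 6, 7, 8]          # coded carbon content
y = [35, 28, 24, 16, 15, 12, 8, 6]    # coded melting point
n = len(x)

s_xx = sum(v * v for v in x) - Fraction(sum(x) ** 2, n)
s_xy = sum(p * q for p, q in zip(x, y)) - Fraction(sum(x) * sum(y), n)

b = s_xy / s_xx                                     # gradient, calculated first
a = Fraction(sum(y), n) - b * Fraction(sum(x), n)   # intercept

print(s_xx, s_xy)                               # 42 -170
print(b, a)                                     # -85/21 507/14
print(round(float(b), 3), round(float(a), 2))   # -4.048 36.21
```

The negative Sxy gives a negative gradient, which matches the table: the coded melting point falls as the coded carbon content rises.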
y = 36.21 - 4.048x
Calculate the regression line of m on c.
$x = 10c$
$y = \frac{m - 700}{5}$
$y = 36.21 - 4.048x$
Substitute the codes for y and x:
$\frac{m - 700}{5} = 36.21 - 4.048(10c)$
Multiply out the bracket:
$\frac{m - 700}{5} = 36.21 - 40.48c$
Multiply by 5 to cancel the division:
$m - 700 = 181.07 - 202.4c$
(181.07 comes from the exact intercept 507/14; the rounded 36.21 would give 181.05)
Add 700:
$m = 881.07 - 202.4c$
Remember, with longer decimals, make a note of the fraction your calculator gives, so you can get the exact value later on…
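Undoing the coding can also be sketched with exact fractions, which avoids carrying rounding error through the working. The values 507/14 and -85/21 are the exact intercept and gradient of the coded line y = 36.21 - 4.048x; the variable names are illustrative.

```python
from fractions import Fraction

a = Fraction(507, 14)   # exact intercept of the coded line
b = Fraction(-85, 21)   # exact gradient of the coded line

# (m - 700)/5 = a + b(10c)  rearranges to  m = (700 + 5a) + (50b)c
intercept = 700 + 5 * a
gradient = 50 * b

print(round(float(intercept), 2), round(float(gradient), 1))  # 881.07 -202.4
```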
Applying and Interpreting the Regression Equation
A regression equation can be used to predict the dependent variable,
based on a chosen value of the independent variable.
Interpolation → Estimating a value that is within the data range you have.
Extrapolation → Estimating a value outside the data range you have. As it is outside the data you have, extrapolated values can be unreliable.
Generally, avoid extrapolating values unless asked, and even then treat answers ‘with caution’…
7C
The results from an experiment in which different masses were placed on a
spring and the resulting length of the spring measured, are shown below.
Mass, x (kg)      20    40    60    80    100
Length, y (cm)    48    55.1  56.3  61.2  68
The regression line was calculated to be:
y = 43.89 + 0.2305x
Estimate the value for y when x = 35kg. Is this Interpolation or Extrapolation?
y = 43.89 + 0.2305x
y = 43.89 + (0.2305 × 35)
y = 51.96cm
Include the unit!
Interpolation, as x = 35 is within the data range we have…
Estimate the value for y when x = 120kg. Is this Interpolation or Extrapolation?
y = 43.89 + 0.2305x
y = 43.89 + (0.2305 × 120)
y = 71.55cm
Include the unit!
Extrapolation, as x = 120 is outside the data range we have…
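Both estimates, and the interpolation/extrapolation check, can be sketched as follows. The function names are hypothetical helpers; the masses and the regression line are those of the spring experiment.

```python
MASSES = [20, 40, 60, 80, 100]  # observed x values (kg)

def predict_length(mass):
    """Estimate spring length (cm) from mass (kg) using y = 43.89 + 0.2305x."""
    return 43.89 + 0.2305 * mass

def estimate_type(mass):
    """Classify an estimate by whether the mass lies inside the observed range."""
    if min(MASSES) <= mass <= max(MASSES):
        return "interpolation"
    return "extrapolation"

for mass in (35, 120):
    print(mass, round(predict_length(mass), 2), estimate_type(mass))
```

The check only compares the chosen x with the smallest and largest observed masses; it says nothing about how unreliable an extrapolated value actually is.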
Interpret the ‘43.89’ in the equation.
The x represents mass and the y represents spring length.
→ If x = 0, y = 43.89
→ If the mass is 0kg, the length of the spring is 43.89cm
→ So the 43.89 represents the starting length of the spring!
Interpret the ‘0.2305’ in the equation.
→ If we increase x by 1, y increases by 0.2305
→ If the mass increases by 1kg, the length of the spring increases by 0.2305cm
→ So the 0.2305 represents the length increase of the spring after adding on an extra kilogram of mass
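Why the gradient is the increase in y per unit of x can be seen by comparing the predictions at x and x + 1. This short worked step is added for illustration and uses the line from this example:

```latex
\begin{aligned}
\bigl(43.89 + 0.2305(x + 1)\bigr) - \bigl(43.89 + 0.2305x\bigr)
  &= 0.2305(x + 1) - 0.2305x \\
  &= 0.2305
\end{aligned}
```

The 43.89 cancels, so the increase is 0.2305cm per extra kilogram whatever the starting mass.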
Summary
• We have learnt how to calculate a line of best fit
• We have used coding and learnt how to ‘undo’ it by substitution
• We have learnt how to interpret a regression equation
• We have looked at Interpolation and Extrapolation