Statistics: Scatter Diagram and Regression Line

Given N pairs of data $(x_i, y_i)$, a scatter diagram is the set of points $(x_i, y_i)$ plotted on a pair of axes.
Example:
The growth of a bacterial population y is related to the surrounding
temperature x °C by the equation $y = a + bx$.
The following data were collected in an experiment to examine the above
relationship.
x (°C)    y
5         21
7         30
10        43
11        50
12        61
17        128
20        205
22        271
Scatter diagram:
The relation between the temperature x and the bacterial population y does
not look linear, meaning that it would be hard to draw a line that passes very
close to each of the points $(x_i, y_i)$.
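One quick way to see this curvature without a plot is to look at the slope between consecutive data points: on a straight line these slopes would all be equal. The sketch below is an illustrative check only, using the data listed above; the function name `consecutive_slopes` is an assumption made for this example.

```python
def consecutive_slopes(xs, ys):
    """Return the slope (change in y over change in x) between neighbouring points."""
    return [
        (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        for i in range(len(xs) - 1)
    ]


# Data from the bacterial population example.
temperature = [5, 7, 10, 11, 12, 17, 20, 22]
population = [21, 30, 43, 50, 61, 128, 205, 271]

slopes = consecutive_slopes(temperature, population)
for x, s in zip(temperature[1:], slopes):
    print(f"slope up to x = {x}: {s:.2f}")
```

The slopes rise from about 4.5 at the lowest temperatures to 33 at the highest, which is why no single straight line can pass close to every point.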
Nonetheless, we could decide to model the relationship between x and y by
a line because it is the easiest way to model a relationship. The line that
passes closest to the points taken as a whole is called the regression line.
The linear regression equation of y on x is the equation $y = a + bx$, where

$$b = \frac{N \sum (x_i y_i) - \sum x_i \sum y_i}{N \sum (x_i^2) - \left(\sum x_i\right)^2}$$

and

$$a = \frac{\sum y_i}{N} - b\,\frac{\sum x_i}{N}.$$

- x is the independent variable, e.g. rainfall in mm, age in years.
- y is the dependent variable (i.e. it depends on x), e.g. the growth of a certain vegetable depending on the rainfall, or the cost of a medical treatment depending on the age of the patient.
- N is the number of pairs $(x_i, y_i)$.
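The two formulae translate almost line by line into code. The following is a minimal sketch in plain Python, applied to the bacterial population data from the first example; the function name `regression_line` and the error checks are assumptions added for illustration.

```python
def regression_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x of y on x."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        # Happens when every x value is the same: no slope can be computed.
        raise ValueError("all x values are equal; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = sum_y / n - b * sum_x / n
    return a, b


# Data from the bacterial population example.
temperature = [5, 7, 10, 11, 12, 17, 20, 22]
population = [21, 30, 43, 50, 61, 128, 205, 271]

a, b = regression_line(temperature, population)
print(f"y = {a:.2f} + {b:.2f} x")
```

For these data the slope comes out at about 14.4 and the intercept at about -86.3. The negative intercept is one sign that a straight line is only a rough model for this curved relationship.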
Let us examine another example:
Example 2:
In a field, the growth of the lettuces (in cm) is measured every week, together
with the rainfall (in mm). The data collected are as follows:
x = Rainfall (mm)    y = Growth (cm)
2                    3
1                    6
0.5                  8
1                    5
1.5                  4
2                    2
Scatter diagram:
This time the relationship looks linear. Let’s work out the equation of the
regression line from the formulae above:
$N = 6$, $\sum x_i = 8$, $\left(\sum x_i\right)^2 = 64$, $\sum y_i = 28$, $\sum x_i \sum y_i = 224$, $\sum (x_i y_i) = 31$, and $\sum (x_i^2) = 12.5$,
so that
$b \approx -3.45$, $a \approx 9.26$, and finally $y = -3.45x + 9.26$.
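The sums and coefficients for the lettuce data can be checked with a few lines of Python. This is only a verification sketch, and the variable names are chosen for this example.

```python
rainfall = [2, 1, 0.5, 1, 1.5, 2]  # x, in mm
growth = [3, 6, 8, 5, 4, 2]        # y, in cm

n = len(rainfall)
sum_x = sum(rainfall)
sum_y = sum(growth)
sum_xy = sum(x * y for x, y in zip(rainfall, growth))
sum_x2 = sum(x * x for x in rainfall)

# Slope and intercept from the regression formulae.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(n, sum_x, sum_y, sum_xy, sum_x2)
print(f"b = {b:.4f}, a = {a:.4f}")
```

Carried to four decimal places this gives b = -3.4545 and a = 9.2727. The small difference from the value 9.26 quoted in the text presumably comes from rounding in the hand calculation.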
Let’s compare the graph of the regression line to the scatter diagram we
obtained:
Not bad, is it?
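How close the line passes to each point can also be checked numerically by comparing every observed growth with the value the line gives for the same rainfall. The sketch below assumes the rounded coefficients from the worked example (intercept 9.26, slope -3.45); the name `residuals` is chosen for this illustration.

```python
rainfall = [2, 1, 0.5, 1, 1.5, 2]  # x, in mm
growth = [3, 6, 8, 5, 4, 2]        # y, in cm

# Rounded coefficients of the regression line y = a + b*x from the worked example.
a, b = 9.26, -3.45

# Residual = observed growth minus the growth given by the line.
residuals = [y - (a + b * x) for x, y in zip(rainfall, growth)]

for x, y, r in zip(rainfall, growth, residuals):
    print(f"x = {x}: observed {y}, fitted {a + b * x:.2f}, residual {r:+.2f}")

largest = max(abs(r) for r in residuals)
print(f"largest distance from the line: {largest:.2f} cm")
```

Every observed growth lies within about 0.8 cm of the line, which matches the visual impression of a good fit.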
You can also use the regression line to predict what the growth
would be if the rainfall were, say, 2.5 mm:
For $x = 2.5$ we have, from the equation of the regression line,
$y = 9.26 - 3.45 \times 2.5 = 0.635$
so that if the rainfall were equal to 2.5 mm, the growth of the lettuces would
be 0.635 cm.
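The same prediction can be written as a small function. This is a sketch that assumes the rounded coefficients from the worked example as defaults; the name `predict_growth` is hypothetical.

```python
def predict_growth(rainfall_mm, a=9.26, b=-3.45):
    """Predicted lettuce growth in cm from the regression line y = a + b*x."""
    return a + b * rainfall_mm


prediction = predict_growth(2.5)
print(f"{prediction:.3f} cm")
```

Passing other rainfall values to `predict_growth` gives the corresponding points on the regression line.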