- The Teachers` Beehive

Having learnt how to calculate the equation for a
least squares regression line, you are well on
your way to performing a regression analysis.
A full regression analysis involves several
processes that include:

- Constructing a scatterplot to investigate the nature of the relationship between the variables
- Calculating the correlation coefficient (r) to give a measure of the strength of the relationship
- Determining the equation of the regression line
- Interpreting the coefficients (a, the y-intercept, and b, the gradient) of the least-squares line y = a + bx
- Using the regression line to make predictions
- Using the coefficient of determination (r²) to give a measure of the predictive power of the linear relationship
- Using a residual plot to test the assumption of linearity
- Writing a report on your findings.
Life expectancies (in years) and birth rates
(number of births per 1000 people) have been
determined for 10 countries, as given below:

Birth rate (per 1000):    30  38  38  43  34  42  31  32  26  34
Life expectancy (years):  66  54  43  42  49  45  64  61  61  66
1. Construct a scatterplot (use calc.). Show a
scaled graph complete with labelled axes
and a title.
2. Calculate r and comment on the strength of
the relationship.
r = -0.8069

There appears to be a strong, negative linear
association between life expectancy and birth rate.
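The calculator value of r can be checked by hand. The following is a minimal sketch, assuming the birth rate and life expectancy values pair up in the order they are listed in the table; the function name `pearson_r` is chosen here for illustration only.

```python
import math

# Data from the table above (assumed to pair up in listed order).
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]


def pearson_r(xs, ys):
    """Return the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(birth_rate, life_expectancy)
print(round(r, 4))  # -0.8069
```

The negative sign shows that life expectancy tends to fall as birth rate rises, and the size (about 0.81) is what the comment above describes as strong.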
3. Find the least-squares regression line (using calc.)
y = a + bx
a = 105.37
b = -1.44
y = 105.37 – 1.44x
Life expectancy = 105.37 – (1.44 x birth rate)
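As a check on the calculator output, the coefficients can be computed from the usual least-squares formulas: b is the sum of products of deviations divided by the sum of squared x deviations, and a is the mean of y minus b times the mean of x. This is a sketch only, assuming the table values pair up in listed order; `fit_line` is an illustrative name.

```python
# Data from the table above (assumed to pair up in listed order).
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # y-intercept
    return a, b


a, b = fit_line(birth_rate, life_expectancy)
print(round(a, 2), round(b, 2))  # 105.37 -1.44
```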
For the regression equation: y = a + bx

The slope b predicts the change in y when x
changes by one unit.
If b is positive, then y increases as x increases.
If b is negative, then y decreases as x increases.

The y-intercept represents the value of y
when x = 0.
4. Interpret coefficients
Slope: on average, life expectancy (y)
decreases by 1.44 years for an increase in
birth rate (x) of one birth per 1000 people.
Intercept: on average, the life expectancy for
countries with a zero birth rate is 105.37
years. A birth rate of zero lies well outside
the observed range (26 to 43 per 1000), so this
value has no practical meaning here.
Regression lines are used to predict y values
given x.

For example, find the life expectancy for a
country with a birth rate of 35 births per
1000 people.
5. Use regression line for predictions
Life expectancy = 105.37 – (1.44 x birth rate)
If x = 35;
y = 105.37 – (1.44 x 35)
= 54.97

On average, a country with a birth rate of 35
per 1000 people would have a life expectancy
of approximately 55 years.
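The prediction step can be written as a small function. This sketch uses the rounded coefficients from step 3; the function names and the range check are illustrative additions, not part of the original worked example. The range check simply flags whether a birth rate falls inside the observed values (26 to 43), since predictions outside that range are extrapolations.

```python
# Rounded coefficients from step 3 of the worked example.
A = 105.37   # y-intercept
B = -1.44    # gradient

# Observed birth rates from the table, used only for the range check.
OBSERVED_BIRTH_RATES = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]


def predict_life_expectancy(birth_rate):
    """Predicted life expectancy (years) from y = a + bx."""
    return A + B * birth_rate


def within_observed_range(birth_rate):
    """True if the birth rate lies inside the range of the data."""
    return min(OBSERVED_BIRTH_RATES) <= birth_rate <= max(OBSERVED_BIRTH_RATES)


prediction = predict_life_expectancy(35)
print(round(prediction, 2))        # 54.97
print(within_observed_range(35))   # True
```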
6. Find r² x 100
If r² = 0.651
r² x 100 = 65.1%
Therefore:
65.1% of the variation in life expectancy can
be explained by variation in birth rate.
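One way to see why r² measures "variation explained" is to compute it as 1 minus the ratio of the variation left over after fitting the line to the total variation in y. The sketch below assumes the table values pair up in listed order and refits the line without rounding; all names are illustrative.

```python
# Data from the table above (assumed to pair up in listed order).
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]

n = len(birth_rate)
mean_x = sum(birth_rate) / n
mean_y = sum(life_expectancy) / n

# Least-squares coefficients (unrounded).
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(birth_rate, life_expectancy))
sxx = sum((x - mean_x) ** 2 for x in birth_rate)
b = sxy / sxx
a = mean_y - b * mean_x

# Total variation in y, and variation left unexplained by the line.
ss_total = sum((y - mean_y) ** 2 for y in life_expectancy)
ss_residual = sum((y - (a + b * x)) ** 2
                  for x, y in zip(birth_rate, life_expectancy))

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))        # 0.651
print(round(100 * r_squared, 1))  # 65.1
```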


When fitting a regression line using the
least-squares method, the key assumption
made is that the relationship in the original
data is linear. Linearity is checked by
investigating the scatterplot and by
performing a residual analysis, which compares
the predicted values with the actual values.
7. For a birth rate of 31, use the equation to
find the predicted life expectancy.
y = 105.37 – (1.44 x 31)
y = 60.73 (Predicted value when x = 31)
Actual value when x = 31 is 64.
Residual = 64 – 60.73
Residual = 3.27


A key assumption made when calculating a
least squares regression line is that the
relationship between the variables is linear.
Residual value = data value – predicted value
For country A:
Predicted life expectancy = 105.4 – 1.44(34) = 56.4 yrs
Actual life expectancy = 66 yrs (from table)
Residual value = data value – predicted value
= 66 – 56.4
= 9.6 yrs
The residual is positive because the actual data value
lies above the prediction line.
For country B:
Predicted life expectancy = 105.4 – 1.44(34) = 56.4 yrs
Actual life expectancy = 49 yrs (from table)
Residual value = data value – predicted value
= 49 – 56.4
= -7.4 yrs
The residual is negative because the actual data value
lies below the prediction line.
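The residual calculations above all follow the same rule, so they can be reproduced with one small function. This is a sketch with an illustrative function name; the intercept is passed in explicitly because the worked examples use 105.37 in step 7 and the rounded value 105.4 for countries A and B.

```python
def residual(birth_rate, actual, a, b=-1.44):
    """Residual = data value - predicted value, for the line y = a + bx."""
    predicted = a + b * birth_rate
    return actual - predicted


# Step 7: birth rate 31, actual life expectancy 64.
step_7 = residual(31, 64, a=105.37)

# Countries A and B: both have a birth rate of 34.
country_a = residual(34, 66, a=105.4)
country_b = residual(34, 49, a=105.4)

print(round(step_7, 2))     # 3.27
print(round(country_a, 1))  # 9.6
print(round(country_b, 1))  # -7.4
```

A positive result means the point lies above the line and a negative result means it lies below.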

Conclusion: The residual plot shows no clear
pattern. The residual coordinates are
randomly scattered around the x-axis. This
supports the use of a linear equation to
describe the relationship between life
expectancy and birth rate.


If a residual plot shows points randomly
scattered above and below zero, then the
original data was linear.
If a pattern is present, then a relationship
exists but is not linear.
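The values behind a residual plot can be listed without a graphing tool. The sketch below assumes the table values pair up in listed order, refits the line without rounding, and prints each residual in order of birth rate; the names are illustrative. Judging whether a pattern is present is still done by eye, so the code only supplies the numbers.

```python
# Data from the table above (assumed to pair up in listed order).
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]

n = len(birth_rate)
mean_x = sum(birth_rate) / n
mean_y = sum(life_expectancy) / n
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(birth_rate, life_expectancy))
sxx = sum((x - mean_x) ** 2 for x in birth_rate)
b = sxy / sxx
a = mean_y - b * mean_x

# One (birth rate, residual) pair per country, sorted by birth rate.
points = sorted(
    (x, y - (a + b * x)) for x, y in zip(birth_rate, life_expectancy)
)
residuals = [res for _, res in points]

for x, res in points:
    side = "above" if res > 0 else "below"
    print(f"birth rate {x}: residual {res:+.2f} ({side} the line)")

# For a least-squares line the residuals always sum to (almost) zero.
print(abs(sum(residuals)) < 1e-9)  # True
```

In this data set the positive and negative residuals are mixed across the range of birth rates rather than grouped at one end, which is what "no clear pattern" refers to.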
From the scatterplot, we see there is a __________
(strong/moderate/weak) ____________ (positive/negative)
___________ (linear/non-linear) relationship between
__________ (y variable [DV]) and _________ (x variable [IV])
for this sample. The correlation coefficient is r =
___________ . There are _________ (no?) outliers. The
equation of the least-squares regression line is:
____ [DV] = ____ (a) + (___ (b) x ___ [IV])
The slope of the regression line predicts that on
average, ______(DV) ________ (decreases/increases) by
________ for an (decrease/increase) in ______ (IV) (units).
The coefficient of determination indicates that
____ (x 100) % of the variation in the ____ [DV]
can be explained by the variation in _____ [IV]. The
residual plot shows _________ (a/no pattern) indicating
the original data was ________ (not linear/linear).
8. Report
From the scatterplot, we see there is a strong, negative, linear relationship
between life expectancy and birth rate for this sample. The correlation
coefficient is r = -0.807. There are no apparent outliers. The equation
of the least-squares regression line is:
Life Expectancy = 105.4 – (1.44 x Birth Rate)
The slope of the regression line predicts that on average, life expectancy
decreases by 1.44 years for an increase in birth rate of one per 1000
people.
The coefficient of determination indicates that 65.1% of the
variation in life expectancy can be explained by the variation in birth
rate. The residual plot shows no clear pattern, indicating the original
data was linear.