Regression Line and Prediction. - Department of Mathematics and

MATH 2560 C F03
Elementary Statistics I
LECTURE 8: Least-Squares Regression:
Regression Line and Prediction.
1
Outline
⇒ regression line;
⇒ fitting a line to data;
⇒ prediction;
⇒ regression line with Excel;
2
Regression Line
⇒ A regression line summarizes the relationship between two variables
in the setting when one of the variables helps explain or predict the other.
⇒ Regression describes a relationship between an explanatory variable
and a response variable.
Regression Line
A regression line is a straight line that describes
how a response variable y changes as an explanatory variable x changes.
A regression line is used to predict the value of y for a given value of x.
Regression, unlike correlation, requires that we have an explanatory variable
and a response variable.
Example 2.10: Heights of Children in Kalama, Egypt (Table 2.7
and Figure 2.11)
The data were obtained by measuring the heights of 161 children from the
village each month from 18 to 29 months of age. Figure 2.11 is a scatterplot
of the data in Table 2.7.
Age is the explanatory variable, which is plotted on the x axis.
Mean height (in cm) is the response variable.
We can see on the plot a strong positive linear association with no outliers.
The correlation is r=0.994, close to the r = 1 of points that lie exactly
on a line.
A line drawn through the points will describe these data very well.
This line is called the regression line.
3
Fitting a Line to Data
The overall pattern can be described by drawing a straight line through the
points (note that the scatterplot displays a linear pattern).
⇒ Fitting a Line to data means drawing a line that comes as close as
possible to the points.
(Of course, no straight line passes exactly through all of the points.)
⇒ The equation of a line fitted to the data gives a compact description of the dependence of the response variable y on the explanatory variable
x.
⇒ It is a mathematical model for the straight-line relationship.
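One common way to make "as close as possible" precise is the least-squares criterion named in the lecture title: choose the intercept and slope that make the sum of squared vertical distances from the points to the line as small as possible. The sketch below is an illustration under that assumption, using the standard closed-form formulas. The function name fit_line and the data values are hypothetical; the numbers are made up for illustration and are not the Table 2.7 measurements.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_mean - b * x_mean  # intercept: the line passes through (x_mean, y_mean)
    return a, b


# Made-up illustration data (NOT the Table 2.7 measurements).
ages = [18, 20, 22, 24, 26, 28]                 # months
heights = [76.0, 78.1, 78.9, 80.0, 81.6, 82.8]  # cm

a, b = fit_line(ages, heights)
print(f"height = {a:.2f} + ({b:.3f} x age)")
```

Points that lie exactly on a line are recovered exactly: fit_line([0, 1, 2], [1, 3, 5]) gives intercept 1 and slope 2.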
Straight Line
Let y be a response variable and x an explanatory variable.
A straight line relating y to x has an equation of the form
y = a + bx.
In this equation, b is the slope, the amount by which y changes when x increases
by one unit.
The number a is the intercept, the value of y when x = 0.
Example 2.10. (Table 2.7, Figure 2.11 and Figure 2.12).
The straight line describing the Kalama data has the form
height = a + (b × age).
In Figure 2.12 the regression line has been drawn with the following equation:
height = 64.93 + (0.635 × age).
⇒ The figure shows that this line fits the data well.
The slope b = 0.635 tells us that the height of Kalama children increases
by about 0.6 cm for each month of age.
The slope b of a line y = a + bx is the rate of change in the response y as
the explanatory variable x changes.
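The meaning of the slope can be checked numerically. The sketch below uses the fitted Kalama equation quoted above; the function name height is only an illustrative choice. Increasing the age by one month changes the value on the line by b, and increasing it by six months changes it by 6b.

```python
A = 64.93   # intercept of the fitted line quoted in the lecture (cm)
B = 0.635   # slope of the fitted line quoted in the lecture (cm per month)


def height(age_months):
    """Value on the fitted line at the given age in months."""
    return A + B * age_months


one_month = height(25) - height(24)    # change for a 1-month increase
six_months = height(29) - height(23)   # change for a 6-month increase

print(round(one_month, 3))   # 0.635
print(round(six_months, 2))  # 3.81
```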
⇒ The slope of a regression line is an important numerical description
of the relationship between the two variables.
4
Prediction
⇒ A regression line is used to predict the response y for a specific value
of the explanatory variable x.
Example 1. Predict the mean height of Kalama children at 32 months of age.
Using Figure 2.12: from age 32 months on the x axis, go up to the fitted
line and then over to the y axis. The predicted height is a bit more than 85 cm.
It is faster and more accurate to substitute 32 for the age in the equation
of the regression line.
Our predicted height is
height = 64.93 + (0.635 × 32) = 85.25 cm.
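The substitution can also be written as a small function. This is a minimal sketch using the equation of the fitted line from Example 2.10; the name predict_height and its default arguments are illustrative assumptions.

```python
def predict_height(age_months, a=64.93, b=0.635):
    """Predicted mean height (cm) from the line height = a + b * age."""
    return a + b * age_months


print(f"{predict_height(32):.2f} cm")  # 85.25 cm
```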
Important Remark: The accuracy of predictions from a regression line
depends on how much scatter about the line the data show.
Kalama example: the data points are all very close to the line, so we are
confident that our prediction is accurate.
If the data show a linear pattern with considerable spread, we may still use a
regression line, but we will put less confidence in predictions based on the line.
Example 2. Predict the mean height of Kalama children at 20 years of age.
20 years is 240 months, so we substitute 240 for the age. The prediction is:
height = 64.93 + (0.635 × 240) = 217.33 cm.
Blind calculation has produced an unreasonable result.
The data cover only ages from 18 to 29 months.
As people grow older, they gain height more slowly, so our fitted line is not a good
model at ages far removed from the data that produced it.
Extrapolation
Extrapolation is the use of a regression line for prediction
far outside the range of values of the explanatory variable x that you used to obtain
the line.
Such predictions are often not accurate.
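One way to guard against blind calculation is to compare each requested x with the range of the data before predicting. The sketch below assumes the Kalama line and the 18 to 29 month range stated above; the names predict_height, AGE_MIN and AGE_MAX are hypothetical. This simple check flags any age outside the observed range, including the 32 months of Example 1, so deciding how far outside is too far remains a judgment call.

```python
import warnings

A, B = 64.93, 0.635          # fitted line quoted in the lecture
AGE_MIN, AGE_MAX = 18, 29    # ages (months) covered by the data


def predict_height(age_months):
    """Predict mean height (cm); warn when the age is outside the data range."""
    if not AGE_MIN <= age_months <= AGE_MAX:
        warnings.warn(
            f"age {age_months} months is outside the observed range "
            f"{AGE_MIN}-{AGE_MAX}; this is extrapolation",
            stacklevel=2,
        )
    return A + B * age_months


print(f"{predict_height(24):.2f}")   # inside the range, no warning
print(f"{predict_height(240):.2f}")  # 217.33, issued with an extrapolation warning
```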
5
Summary
1. A regression line is a straight line that describes how a response variable
y changes as an explanatory variable x changes.
2. A regression line is used to predict the value of y for any value of x by
substituting this x into the equation of the line.
Extrapolation beyond the range of x values spanned by the data is
risky.
3. The slope b of a regression line ŷ = a + bx is the rate at which the
predicted response ŷ changes along the line as the explanatory variable x
changes.
Specifically, b is the change in ŷ when x increases by 1.
4. The intercept a of a regression line ŷ = a + bx is the predicted response
ŷ when the explanatory variable x = 0.
This prediction is of no statistical use unless x can actually take values
near 0.