EXPLORATION

Plot one point on the graph and then click Show Line. Why do you think a line is not
graphed?
Clear the graph and plot two points that have whole-number coordinates.


On your own, find an equation for the line through these two points.
Click Show Line. Compare the equation for the line drawn to the equation that
you calculated. Explain and resolve any differences.
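If you would like to check your hand calculation, the short Python sketch below computes the slope and y-intercept of the line through two points. The function name line_through and the sample points (1, 2) and (4, 8) are made up for illustration; they are not part of the applet. The sketch assumes the two points have different x-coordinates.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (slope, intercept) of the line through points p and q.

    Hypothetical helper for checking a hand calculation. Assumes
    whole-number coordinates and two different x-values.
    """
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        # A vertical line has no slope-intercept form.
        raise ValueError("vertical line: slope is undefined")
    slope = Fraction(y2 - y1, x2 - x1)   # rise over run, kept exact
    intercept = y1 - slope * x1          # solve y1 = slope * x1 + b for b
    return slope, intercept

slope, intercept = line_through((1, 2), (4, 8))
print(f"y = {slope}x + {intercept}")     # prints: y = 2x + 0
```

Fractions are used instead of decimals so that a slope such as 1/3 is shown exactly.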
Clear the graph and plot three points. Think about a line that "fits" these three points as
closely as possible.



Is it possible for a single straight line to contain all three of the points you plotted?
On a piece of paper, plot these same three points, and sketch a line that you think
best fits the three points.
Click Show Line. Do you think that the line graphed fits the points well? How
does it compare to the line you drew?
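One way to test whether three points lie on a single straight line is sketched below in Python. The function name are_collinear and the sample points are illustrative assumptions, and the exact comparison with zero assumes whole-number coordinates.

```python
def are_collinear(p, q, r):
    """Return True when the three points lie on one straight line.

    Hypothetical helper. The expression below is twice the signed area
    of the triangle with corners p, q, r; it is zero exactly when the
    triangle is flat, that is, when the points are collinear.
    """
    (x1, y1), (x2, y2), (x3, y3) = p, q, r
    return (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1) == 0

print(are_collinear((0, 0), (1, 2), (2, 4)))  # prints: True
print(are_collinear((0, 0), (1, 2), (2, 5)))  # prints: False
```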
Clear the graph. Place several points on the graph that lie roughly in a straight line, then
hit Show Line. The line that appears is the regression line, which is sometimes known as
the "line of best fit."



What is the r-value for the line?
Place just one additional point on the graph that lies far away from the line. What
effect does this point have on the r-value? What effect does it have on the line of
best fit?
Move several of the points. How do the r-value and the line change as points are
moved?
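The effect of a single far-away point on the r-value can also be explored away from the applet. The Python sketch below is one possible way to do this; the function name pearson_r and the sample points are assumptions chosen for illustration, and the function uses the standard formula for the correlation coefficient rather than anything taken from the applet.

```python
from math import sqrt

def pearson_r(points):
    """Correlation coefficient r for a list of (x, y) points.

    Hypothetical helper. Assumes at least two points and that neither
    the x-values nor the y-values are all identical.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# Five made-up points that lie exactly on the line y = x.
on_a_line = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
# The same points plus one point far from that line.
with_outlier = on_a_line + [(6, 0)]

r_before = pearson_r(on_a_line)
r_after = pearson_r(with_outlier)
print(round(r_before, 3), round(r_after, 3))
```

Try replacing (6, 0) with other far-away points and compare the results with what you see in the applet.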
The line that is drawn is called the "least-squares regression line." Basically, the
least-squares regression line is the line that minimizes the sum of the squared "errors"
(the vertical distances) between the actual points and the points on the line. In this
sense, it is the line that fits the points most closely. To get a better feel for the
regression line, try the following tasks.


Plot four points so that the regression line is horizontal. Do this in several
different ways. What do you notice about the regression line and the r-value?
Plot three points (not all on a straight line) so that the regression line is horizontal.
What do you notice about the regression line and the r-value?
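The least-squares idea described above can be sketched in a few lines of Python. The function names least_squares_line and sum_squared_errors and the four sample points are illustrative assumptions; the slope and intercept come from the standard least-squares formulas, which may or may not be how the applet computes its line.

```python
def least_squares_line(points):
    """Return (slope, intercept) of the least-squares regression line.

    Hypothetical helper. Assumes at least two points with x-values
    that are not all the same.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return slope, intercept

def sum_squared_errors(points, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

points = [(1, 2), (2, 3), (3, 5), (4, 4)]          # made-up sample data
slope, intercept = least_squares_line(points)       # about 0.8 and 1.5
best_sse = sum_squared_errors(points, slope, intercept)
other_sse = sum_squared_errors(points, 1, 1)        # a different line, y = x + 1
print(best_sse < other_sse)                         # prints: True
```

Any other line you try in place of y = x + 1 should give a sum of squared errors at least as large as the one for the regression line.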
Explore the Relationship Between Correlation and Linear Association
Use the interactive math applet below to help you answer these questions:
1. Compare the r-values for the following three situations.
a. Create a scatterplot that you think shows a strong positive linear association
between the two variables. Sketch a picture of this scatterplot. What is the r-value?
b. Create a scatterplot that you think shows a strong negative linear association
between the two variables. Sketch a picture of this scatterplot. What is the r-value?
c. Create a scatterplot that you think shows no linear association between the
two variables. Sketch a picture of this scatterplot. What is the r-value?
2. For each r-value below, create a scatterplot that has that exact r-value. Sketch a picture
of that graph.
a. r = 1
b. r = -1
c. r = 0
3. Plot several points that exhibit a strong positive linear trend, and then plot one outlier.
a. Overall, is this scatterplot roughly linear?
b. Is the r-value close to 1?
c. What is the r-value?
4. In the lower left corner of the coordinate plane, plot 10 points that exhibit no trend
(this is sometimes called a "cloud" of points). Then plot one point in the upper right
corner.
a. Overall, is this scatterplot linear?
b. Is the r-value close to 1?
c. What is the r-value?
5. Does a high r-value necessarily mean that the data are generally linear?
6. Does an r-value close to zero always mean that the data are not linear?
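After you have answered questions 4 through 6 with the applet, you may want to compare your answers with a calculation. The Python sketch below is one way to do so. The function name pearson_r and both data sets are made up for illustration: a small "cloud" with one far-away point added, and a set of points that lie exactly on the curve y = x^2.

```python
from math import sqrt

def pearson_r(points):
    """Correlation coefficient r for a list of (x, y) points.

    Hypothetical helper using the standard formula. Assumes the
    x-values are not all equal and the y-values are not all equal.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# Ten points in the lower left with no trend (a "cloud").
cloud = [(1, 1), (1, 3), (3, 1), (3, 3), (2, 2),
         (1, 2), (2, 1), (2, 3), (3, 2), (2, 2)]
# The same cloud plus one point in the upper right.
cloud_plus_one = cloud + [(20, 20)]
# Points that follow a clear pattern, but a curved one.
parabola = [(x, x * x) for x in range(-3, 4)]

r_cloud = pearson_r(cloud)
r_cloud_plus_one = pearson_r(cloud_plus_one)
r_parabola = pearson_r(parabola)
print(round(r_cloud, 2), round(r_cloud_plus_one, 2), round(r_parabola, 2))
```

Sketch each of the three data sets on paper before you look at the printed r-values, and decide whether each picture looks linear.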
Journal: Explain what you have learned about r, the correlation coefficient.
SUMMARY
Pearson Correlation Coefficient
An important question that comes up in determining a curve to fit our
data points is: How scattered can the points be and still have a shape that can
be represented by a curve? The idea of correlation helps to measure this. The
value r is Pearson's correlation coefficient. It is a measure of the linear
association between the horizontal variable and the vertical variable. It gives
information about how tightly packed the data points are about the
regression line. It thereby also gives information about how well the
regression line fits the data. The r-values can range from -1 (perfect negative
linear association) through 0 (no linear association) to +1 (perfect positive linear
association); values close to -1 or +1 indicate a strong linear association. But
beware! The correlation coefficient, r, is sometimes
misleading. You should always look at the scatterplot and combine that
knowledge with the r-value in order to draw valid conclusions about the
strength of the linear association.
The moral is that the correlation coefficient, r, is a valuable tool for studying
the linear association between two variables, but it does not fully explain the
association (in fact, no statistic does).
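This summary does not state a formula for r. For reference, the standard definition is usually written as shown below for n data points (x_i, y_i), where x-bar and y-bar are the means of the horizontal and vertical values. The symbols are notation assumed here; they do not come from the applet.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when larger x-values tend to go with larger y-values and negative when they tend to go with smaller ones, which is why the sign of r matches the direction of the association.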