Statistics 101 - Take Home Laboratory 4

Exploring Correlation and Regression on the World Wide Web
In the exercises below, you will be looking at some computer simulations involving correlation and regression.
You can work alone or in groups, but each person must hand in their own answers. The simulations
are designed to help you understand how least squares regression lines fit through a set of points and to
gauge your ability to estimate the value of the correlation coefficient from a scatter plot.
1. Simulation 1: Putting Points
Go to a computer that has access to the World Wide Web. Start your web browser and go to the
following address (if you have an old version of a Web browser, or if your browser is not Java-enabled
or does not support frames, you may encounter difficulties):
http://streaming.stat.iastate.edu/~stat101/homepage.html
Under the Computing - Websites heading, click on the Putting points and regression lines link.
A Data Applet will appear. Click on Scatter Plots in the middle of the top row. Click on Blank
Plot on the left-hand side of the display. Then click on Update at the bottom right. A set of axes
will appear. Both the X (horizontal) and Y (vertical) axes run from 0 to 100. You can add points
to the plot by placing your cursor anywhere in the area bounded by the X and Y axes and clicking.
As you add points, the correlation coefficient and least squares regression line are calculated
and displayed in the blue shaded area at the top left.
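The numbers the applet reports can be checked by hand with the standard textbook formulas: slope b = Sxy / Sxx, intercept a = mean(y) - b * mean(x), correlation r = Sxy / sqrt(Sxx * Syy), and residual = observed y - predicted y. The Python sketch below computes these for a list of (x, y) points. It assumes, without having checked, that the applet uses these same standard formulas; the function names fit_line and residuals and the ten sample points are hypothetical, chosen only for illustration.

```python
import math


def fit_line(points):
    """Return (r, slope, intercept) for a list of (x, y) pairs.

    Standard least squares formulas (assumed to match the applet):
      slope     b = Sxy / Sxx
      intercept a = mean(y) - b * mean(x)
      r           = Sxy / sqrt(Sxx * Syy)
    Note: r is undefined if all x or all y values are equal (division by zero).
    """
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return r, slope, intercept


def residuals(points, slope, intercept):
    """Residual = observed y minus the y predicted by the line."""
    return [y - (intercept + slope * x) for x, y in points]


# Hypothetical points with a strong positive linear trend on the 0-100 axes
points = [(10, 12), (20, 25), (30, 29), (40, 45), (50, 52),
          (60, 58), (70, 73), (80, 79), (90, 92), (95, 97)]

r, b, a = fit_line(points)
print(f"r = {r:.3f}")
print(f"y-hat = {a:.3f} + {b:.3f} x")
print("residuals:", [round(e, 3) for e in residuals(points, b, a)])
```

Because the line is fitted by least squares with an intercept, the residuals always sum to zero (up to rounding), which is a quick check on any hand calculation.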
(a) Put 10 points on the graph that illustrate a strong positive linear relationship between two
variables. What are the correlation and the equation of the least squares regression line? You only
need to report values to 3 decimal places. Click on the Regression box to put the least squares
regression line on the graph.
(b) Ask for a residual plot by clicking on Residuals. What does the residual plot look like?
(c) Now put a point on the graph in the bottom middle of the plot. What is the correlation now
and how does it compare to the previous one? What is the new regression equation and how
has it changed? What effect did adding this point have on the residual plot? The additional
point should have a large residual. Such a point (with a large residual) is called an outlier in
regression.
(d) Clear the plot by clicking on Update. Put 10 points on the graph that follow a curved (non-linear)
relationship. Looking at the regression line on the plot and the plot of residuals, how can you tell
the points do not have a linear relationship?
(e) Clear the plot by clicking on Update. In the lower left-hand corner of the plot, put points on
the graph that illustrate a strong negative linear relationship. In the upper right-hand corner
of the plot, put a single point. Where is the regression line? What does the residual plot look
like? Is the point in the upper right-hand corner an outlier in regression? Now remove the point
in the upper right-hand corner (click on the arrow next to Add Point, change it to Remove Point,
and then click on the point to be removed). What do the regression line and residual plot look
like now? Note: points like the one you just removed are called influential points; they cause
noticeable changes in the slope, the intercept, or both when they are added or removed.
(f) Clear the plot by clicking on Update. Try to create plots (use about 25 points) with correlations
close to the following: 0.70, 0.90, -0.30.
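The effect described in (e) can also be seen numerically. The Python sketch below fits a least squares line to a made-up cluster of ten points with a strong negative trend in the lower-left corner, then refits after adding one point at (95, 95) in the upper-right corner. The coordinates and the helper fit_line are hypothetical, not taken from the applet; the points you place will give different numbers.

```python
import math


def fit_line(points):
    """Return (r, slope, intercept) of the least squares line through (x, y) pairs."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return sxy / math.sqrt(sxx * syy), slope, y_bar - slope * x_bar


# Hypothetical cluster: strong negative trend in the lower-left corner
cluster = [(5, 30), (8, 27), (10, 26), (13, 22), (15, 21),
           (18, 17), (20, 16), (23, 12), (25, 11), (28, 7)]
# Hypothetical single point far away in the upper-right corner
extreme = (95, 95)

# Fit the line without and with the extreme point and compare
for label, pts in [("without point", cluster),
                   ("with point", cluster + [extreme])]:
    r, b, a = fit_line(pts)
    print(f"{label:13s}  r = {r:+.3f}  y-hat = {a:.3f} {b:+.3f} x")
```

In this made-up data the slope changes from about -0.99 without the extreme point to about +0.80 with it, which is the kind of noticeable change in slope that marks an influential point.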
2. Simulation 2: Guessing correlations
Go back to the Web address given above but this time click on the Guessing the value of the
correlation coefficient link. Ask for New plots and match the correlation coefficients to the plots.
Do this for 5 rounds (more if you are on a roll; a streak of correct answers can get your name on the
list of top scores). This should help you see how different values of the correlation coefficient
correspond to scatterplots of data.
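One way to get a feel for how scatterplots change with r, and to produce practice targets for part (f) of Simulation 1, is to simulate data with a chosen population correlation rho. A common construction (an assumption here, not necessarily how the applet draws its plots) takes independent standard normal values z1 and z2 and sets x = z1 and y = rho * z1 + sqrt(1 - rho^2) * z2, which gives x and y a population correlation of rho. The Python sketch below does this for 25 points; simulate_points and pearson_r are hypothetical helper names.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient r of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_points(rho, n=25, seed=None):
    """Draw n (x, y) points whose population correlation is rho."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must be between -1 and 1")
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        # var(y) = rho^2 + (1 - rho^2) = 1 and cov(z1, y) = rho, so corr = rho
        y = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        # Shift and stretch onto a 0-100 style window; a positive linear
        # rescaling does not change r (a point may occasionally fall outside 0-100)
        xs.append(50 + 15 * z1)
        ys.append(50 + 15 * y)
    return xs, ys


for target in (0.70, 0.90, -0.30):
    xs, ys = simulate_points(target, n=25, seed=1)
    print(f"target r = {target:+.2f}   sample r = {pearson_r(xs, ys):+.3f}")
```

With only 25 points the sample r will usually differ somewhat from the target, sometimes by 0.1 or more; rerunning with different seeds shows how much r varies from sample to sample.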
Statistics 101 Laboratory 4
Take Home Lab Answer Sheet
Due: Day and Time specified by your course instructor.
Name:
Section:
1. Simulation: Putting Points
(a) 10 points with strong positive relationship:
Correlation:
Least Squares Regression Equation:
(b) Residual Plot
What is the formula for residuals?
Describe the plot:
Linear relationship adequate?
Any outliers?
Will predictions be poor in any region (e.g., a megaphone shape)?
(c) Add another point to the graph (bottom middle of plot):
What is the new correlation?
How does it compare to the previous correlation?
Least squares regression equation:
How did the regression line change?
What happened to the residual plot?
Is the point you added an outlier?
(d) 10 points that have a non-linear relationship:
Correlation:
Least squares regression equation:
Looking at the residual plot, what do you notice, and how can you tell the relationship is not linear?
(e) 10 points that have a strong negative linear relationship and one point in the upper
right corner:
Correlation:
Least squares regression equation:
What does the residual plot look like?
Is the point in the upper right an outlier?
Erase the point:
New Correlation:
New regression equation:
Did the regression line change?
Is the point that you removed an influential point?
(f) Using about 25 points, make plots with the following correlations and draw them in the
space provided.
i. r=0.70
ii. r=0.90
iii. r=-0.30
2. Simulation: Guessing correlations
(a) Score on the first round:
(b) Score on the second round:
(c) Score on the third round:
(d) Score on the fourth round:
(e) Score on the fifth round: