Statistics 101 - Take Home Laboratory 4

Exploring Correlation and Regression on the World Wide Web
In the exercises below, you will be looking at some computer simulations involving correlation and regression.
You can work alone or in groups, but each person must hand in their own answers. The simulations
are designed to help you understand how a least squares regression line fits through a set of points
and to test your ability to estimate the value of the correlation coefficient from a scatter plot.
1. Simulation 1: Putting Points
Go to a computer that has access to the World Wide Web. Start your web browser and go to the
following address (if you have an old version of a Web browser or if your browser is not Java enabled
or does not support frames, you may encounter difficulties):
http://www.stat.iastate.edu/courses/stat101.html
Under the Webpages heading, click on the Putting points and regression lines link. There are
brief instructions on how to use this simulation at the top of the screen. In the center of the screen is a
blue area where you can Put Points by clicking on the left mouse button. The correlation coefficient
and least squares regression line are calculated and displayed in the margin above this blue area.
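The applet does these calculations for you. Below is a minimal sketch of the standard textbook formulas, using made-up coordinates in place of clicks in the blue area. With x̄ and ȳ the means, let Sxy = Σ(x - x̄)(y - ȳ), Sxx = Σ(x - x̄)² and Syy = Σ(y - ȳ)². Then the slope is b = Sxy/Sxx, the intercept is a = ȳ - b·x̄, the correlation is r = Sxy/√(Sxx·Syy), and each residual is y - ŷ. The helper names fit_line and residuals are hypothetical and are not part of the applet.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least squares fit: return (intercept a, slope b, correlation r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n                       # means x-bar, y-bar
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx                 # slope
    a = my - b * mx               # intercept: the line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)     # Pearson correlation coefficient
    return a, b, r

def residuals(xs, ys, a, b):
    """Residual = observed y minus predicted y-hat = y - (a + b*x)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Ten hypothetical points with a strong positive linear trend
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.2, 14.8, 17.1, 19.0, 21.2]

a, b, r = fit_line(xs, ys)
print(f"r = {r:.3f}")
print(f"y-hat = {a:.3f} + {b:.3f}x")

# A residual plot graphs each residual against its x value
for x, e in zip(xs, residuals(xs, ys, a, b)):
    print(f"x = {x:2d}   residual = {e:+.3f}")
```

For a good linear fit, the residuals scatter around zero with no pattern. For any least squares line with an intercept, they also sum to zero.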
(a) Put 10 points on the graph that illustrate a strong positive linear relationship between two variables. What are the correlation and the equation of the least squares regression line? Report these values
to 3 decimal places. Click on the Regression line box to add the least squares regression line to
the graph.
(b) Ask for a residual plot by clicking on the Residuals box. What does the residual plot look like?
(c) Now put a point on the graph in the bottom middle of the blue area. What is the correlation
now and how does it compare to the previous one? What is the new regression equation and
how has it changed? What effect did adding this point have on the residual plot? The additional
point should have a large residual. Such a point (with a large residual) is called an outlier in
regression.
(d) Clear the plot. Put 10 points on the graph that follow a curved, rather than linear, relationship.
Looking at the regression line on the plot and at the residual plot, how can you tell the points do
not have a linear relationship?
(e) Clear the plot. In the lower left-hand corner of the blue area, put points on the graph that
illustrate a strong negative linear relationship. In the upper right-hand corner of the blue area,
put a single point. Where is the regression line? What does the residual plot look like? Is the
point in the upper right-hand corner an outlier in regression? Now remove the point in the upper
right-hand corner (click on Erase Point, then click on the point to be removed). What do the
regression line and residual plot look like now? Note: points like the one you just removed are
called influential points; adding or removing them causes a noticeable change in the slope, the
intercept, or both.
(f) Clear the plot. Try to create plots (use about 25 points) with correlations close to the following:
0.70, 0.90, -0.30. If you are having difficulty doing this, let the program help you by putting 25 in
the # of points box and a correlation in the Target correlation box. Then click on Random
points. Be sure to clear the plot before resetting the target correlation.
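Parts (c), (d) and (e) each change one feature of the data. The Python below is a rough sketch that refits the least squares line in each situation. It uses made-up coordinates in place of clicks in the blue area, because the applet's own scale is not given. The names fit_line, residual and report are hypothetical helpers. In (c), the added point sits exactly at the mean of the x values, so the slope does not change and only the intercept and r move. A point placed a little off center would also tilt the line.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least squares fit: return (intercept a, slope b, correlation r)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    return my - b * mx, b, sxy / sqrt(sxx * syy)

def residual(x, y, a, b):
    """Observed y minus the value predicted by the line."""
    return y - (a + b * x)

def report(label, xs, ys):
    """Print and return the fitted line and r for one data set."""
    a, b, r = fit_line(xs, ys)
    print(f"{label}: y-hat = {a:.3f} + {b:.3f}x, r = {r:.3f}")
    return a, b, r

# (c) Outlier: a strong positive trend, then one point at the bottom middle.
#     x = 5.5 is the mean of xs_c, so the slope is unchanged; r drops.
xs_c = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys_c = [3.1, 4.9, 7.2, 8.8, 11.1, 13.2, 14.8, 17.1, 19.0, 21.2]
report("(c) before", xs_c, ys_c)
a, b, r = report("(c) after ", xs_c + [5.5], ys_c + [0.0])
print(f"    residual of added point = {residual(5.5, 0.0, a, b):+.3f}")

# (d) Curved pattern: y = x^2 still gives a high r, but the residuals
#     run positive, then negative, then positive instead of scattering randomly.
xs_d = list(range(1, 11))
ys_d = [x ** 2 for x in xs_d]
a, b, r = report("(d)", xs_d, ys_d)
print("    residuals:", [round(residual(x, y, a, b), 1) for x, y in zip(xs_d, ys_d)])

# (e) Influential point: a negative cluster near the lower left plus one far point.
#     The far point flips the slope from negative to positive, even though
#     its own residual is smaller than those of several cluster points.
xs_e = [1, 2, 3, 4, 5]
ys_e = [5.0, 4.2, 3.1, 1.9, 1.0]
a, b, r = report("(e) with far point   ", xs_e + [20], ys_e + [20.0])
print(f"    residual of far point = {residual(20, 20.0, a, b):+.3f}")
report("(e) without far point", xs_e, ys_e)
```

In this sketch, the curved data in (d) still give r of about 0.97. A large r alone therefore does not show that a relationship is linear. The residual plot is what reveals the curve.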
2. Simulation 2: Guessing correlations
Go back to the Web address given above, but this time click on the Guessing the value of the
correlation coefficient link. Ask for New plots and match the correlation coefficients to the plots.
Do this for 5 rounds (more if you are on a roll: a streak of correct answers will get your name on
the list of top scores). This should help you see how different values of the correlation coefficient
correspond to scatter plots of data.
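The Random points button in Simulation 1 and the plots in this game both pair a scatter plot with a value of r. How the applet generates its points is not described here. One common approach is assumed below: draw x and independent noise z from a standard normal distribution, then set y = r·x + √(1 - r²)·z. This gives x and y a population correlation of exactly r. The helpers correlation and random_points are hypothetical.

```python
import random
from math import sqrt

def correlation(xs, ys):
    """Sample (Pearson) correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def random_points(n, target_r, seed=None):
    """Hypothetical helper: n points whose population correlation is target_r.

    x and the noise z are independent standard normals, so
    y = target_r * x + sqrt(1 - target_r**2) * z has variance 1 and
    correlation target_r with x.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(target_r * x + sqrt(1 - target_r ** 2) * z)
    return xs, ys

# Sample r for 25 points at each target value used in part (f)
for target in (0.70, 0.90, -0.30):
    xs, ys = random_points(25, target, seed=1)
    print(f"target r = {target:+.2f}   sample r = {correlation(xs, ys):+.3f}")
```

With only 25 points, the sample r wanders noticeably from its target from one run to the next. That is why part (f) asks only for correlations close to the given values.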
Statistics 101 Laboratory 4
Take Home Lab Answer Sheet
Due: Day and Time specified by your course instructor.
Name:
Section:
1. Simulation: Putting Points
(a) 10 points with strong positive relationship:
Correlation:
Least Squares Regression Equation:
(b) Residual Plot
What is the formula for residuals?
Describe the plot:
Linear relationship adequate?
Any outliers?
Will prediction be poor in any areas (e.g., where the residuals fan out in a megaphone shape)?
(c) Add another point to the graph (bottom middle of blue area):
What is the new correlation?
How does it compare to the previous correlation?
Least squares regression equation:
How did the regression line change?
What happened to the residual plot?
Is the point you added an outlier?
(d) 10 points that have a non-linear relationship:
Correlation:
Least squares regression equation:
Looking at the residual plot, what do you notice? How can you tell the relationship is not linear?
(e) 10 points that have a strong negative linear relationship and one point in the upper
right corner:
Correlation:
Least squares regression equation:
What does the residual plot look like?
Is the point in the upper right an outlier?
Erase the point:
New Correlation:
New regression equation:
Did the regression line change?
Is the point that you removed an influential point?
(f) Using 25 points, make plots with the following correlations and draw them in the
space provided.
i. r=0.70
ii. r=0.90
iii. r=-0.30
2. Simulation: Guessing correlations
(a) Score on the first round:
(b) Score on the second round:
(c) Score on the third round:
(d) Score on the fourth round:
(e) Score on the fifth round: