Statistics 101 - Take Home Laboratory 4
Exploring Correlation and Regression on the World Wide Web

In the exercises below, you will be looking at some computer simulations involving correlation and regression. You can work alone or in groups, but each person must hand in their answers separately. The simulations focus on understanding how least squares regression lines fit through a set of points and on gauging your ability to estimate the value of the correlation coefficient from a scatter plot.

1. Simulation 1: Putting Points

Go to a computer that has access to the World Wide Web. Start your web browser and go to the following address (if you have an old version of a Web browser, or if your browser is not Java enabled or does not support frames, you may encounter difficulties):

http://www.stat.iastate.edu/courses/stat101.html

Under the Webpages heading, click on the Putting points and regression lines link. There are brief instructions on how to use this simulation at the top of the screen. In the center of the screen is a blue area where you can Put Points by clicking the left mouse button. The correlation coefficient and least squares regression line are calculated and displayed in the margin above this blue area.

(a) Put 10 points on the graph that illustrate a strong positive linear relationship between two variables. What are the correlation and the equation of the least squares regression line? You only need to report the coefficients to 3 decimal places. Click on the Regression line box to put the least squares regression line on the graph.

(b) Ask for a residual plot by clicking on Residuals. What does the residual plot look like?

(c) Now put a point on the graph in the bottom middle of the blue area. What is the correlation now, and how does it compare to the previous one? What is the new regression equation, and how has it changed? What effect did adding this point have on the residual plot? The additional point should have a large residual.
Such a point (with a large residual) is called an outlier in regression.

(d) Clear the plot. Put 10 points on the graph that have a curved, not linear, relationship. Looking at the regression line on the plot and at the plot of residuals, how can you tell the points do not have a linear relationship?

(e) Clear the plot. In the lower left-hand corner of the blue area, put points on the graph that illustrate a strong negative linear relationship. In the upper right-hand corner of the blue area, put a single point. Where is the regression line? What does the residual plot look like? Is the point in the upper right-hand corner an outlier in regression? Now remove the point in the upper right-hand corner (you can do this by clicking on Erase Point and then clicking on the point to be removed). What do the regression line and residual plot look like now? Note: points like the one you just removed are called influential points; they cause noticeable changes in the slope, the intercept, or both when they are added or removed.

(f) Clear the plot. Try to create plots (use about 25 points) with correlations close to each of the following: 0.70, 0.90, -0.30. If you are having difficulty doing this, let the program help you by entering 25 in the # of points box and a correlation in the Target correlation box. Then click on Random points. Be sure to clear the plot before resetting the target correlation.

2. Simulation 2: Guessing Correlations

Go back to the Web address given above, but this time click on the Guessing the value of the correlation coefficient link. Ask for New plots and match the correlation coefficients to the plots. Do this for 5 rounds (more if you are on a roll; a streak of correct answers will get your name on the list of top scores). This should help you see how different values of the correlation coefficient correspond to scatterplots of data.


Statistics 101 Laboratory 4
Take Home Lab Answer Sheet
Due: Day and time specified by your course instructor.

Name:
Section:

1. Simulation: Putting Points

(a) 10 points with a strong positive relationship:
Correlation:
Least squares regression equation:

(b) Residual plot
What is the formula for residuals?
Describe the plot: Is a linear relationship adequate? Are there any outliers? Will prediction be bad in any areas (e.g., a megaphone shape)?

(c) Add another point to the graph (bottom middle of blue area):
What is the new correlation?
How does it compare to the previous correlation?
Least squares regression equation:
How did the regression line change?
What happened to the residual plot?
Is the point you added an outlier?

(d) 10 points that have a non-linear relationship:
Correlation:
Least squares regression equation:
Looking at the residual plot, what do you notice, or how can you tell the relationship is not linear?

(e) 10 points that have a strong negative linear relationship and one point in the upper right corner:
Correlation:
Least squares regression equation:
What does the residual plot look like?
Is the point in the upper right an outlier?
Erase the point:
New correlation:
New regression equation:
Did the regression line change?
Is the point that you removed an influential point?

(f) Using 25 points, make plots with the following correlations and draw them in the space provided.
i. r = 0.70
ii. r = 0.90
iii. r = -0.30

2. Simulation: Guessing Correlations
(a) Score on the first round:
(b) Score on the second round:
(c) Score on the third round:
(d) Score on the fourth round:
(e) Score on the fifth round:
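The applet does the arithmetic for you, but the numbers it reports follow standard formulas: with Sxy = sum of (x - x̄)(y - ȳ), Sxx = sum of (x - x̄)^2 and Syy = sum of (y - ȳ)^2, the correlation is r = Sxy / sqrt(Sxx * Syy), the least squares slope is b = Sxy / Sxx, the intercept is a = ȳ - b * x̄, and each residual is y - ŷ. The Python sketch below is a minimal illustration under stated assumptions, not the applet's own code: the helper names, the example coordinates (the applet works in screen positions, so its numbers will differ), and the random_points generator standing in for the Random points button are all hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the fitted line y-hat = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


def residuals(xs, ys):
    """Residual = observed y minus predicted y-hat, one per point."""
    a, b = least_squares_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def random_points(n, target_r, seed=None):
    """Hypothetical stand-in for the applet's Random points button.

    Draws x and noise e from a standard normal and sets
    y = r * x + sqrt(1 - r^2) * e, so the *population* correlation is r.
    The sample correlation of n points only lands near r.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [target_r * x + math.sqrt(1 - target_r ** 2) * rng.gauss(0, 1) for x in xs]
    return xs, ys


# (a) Made-up coordinates with a strong positive linear pattern.
xs_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys_a = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 20.2]

# (c) The same points plus one at the bottom middle (x = 5.5, y = 0).
xs_c = xs_a + [5.5]
ys_c = ys_a + [0.0]

# (e) A negative cluster in the lower left, then one point far to the upper right.
xs_e = [1.0, 1.5, 2.0, 2.5, 3.0]
ys_e = [3.0, 2.6, 2.1, 1.4, 1.0]
xs_e_plus = xs_e + [10.0]
ys_e_plus = ys_e + [10.0]

for label, xs, ys in [("(a)", xs_a, ys_a), ("(c)", xs_c, ys_c),
                      ("(e) without", xs_e, ys_e), ("(e) with", xs_e_plus, ys_e_plus)]:
    a, b = least_squares_line(xs, ys)
    print(f"{label}: r = {correlation(xs, ys):.3f}, y-hat = {a:.3f} + {b:.3f}x")

print("(c) residual of the added point:", round(residuals(xs_c, ys_c)[-1], 3))
```

With these made-up points, the point added in (c) sits at the mean x-value, so it lowers r and the intercept but leaves the slope unchanged, while the single far-away point in (e) flips the slope from negative to positive, which is what makes it influential. With only 25 points, as in part (f), the sample r from random_points can wander noticeably from the target.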