Statistics 101 - Take Home Laboratory 4
Exploring Correlation and Regression on the World Wide Web

In the exercises below, you will look at some computer simulations involving correlation and regression. You can work alone or in groups, but each person must hand in their own answers. The simulations focus on how least squares regression lines fit through a set of points and on gauging your ability to estimate the value of the correlation coefficient from a scatter plot.

1. Simulation 1: Putting Points

Go to a computer that has access to the World Wide Web. Start your web browser and go to the following address (if you have an old version of a web browser, or if your browser is not Java-enabled or does not support frames, you may encounter difficulties):

http://streaming.stat.iastate.edu/~stat101/homepage.html

Under the Computing - Websites heading, click on the Putting points and regression lines link. A Data Applet will appear. Click on Scatter Plots in the middle of the top row. Click on Blank Plot on the left-hand side of the display. Then click on Update at the bottom right. A set of axes will appear; both the Y (vertical) axis and the X (horizontal) axis run from 0 to 100. By placing your cursor anywhere in the area bounded by the X and Y axes, you can add points to the plot. As you add points, the correlation coefficient and the least squares regression line are calculated and displayed in the blue shaded area at the top left.

(a) Put 10 points on the graph that illustrate a strong positive linear relationship between two variables. What are the correlation and the equation of the least squares regression line? You only need to report the coefficients to 3 decimal places. Click on the Regression box to put the least squares regression line on the graph.

(b) Ask for a residual plot by clicking on Residuals. What does the residual plot look like?

(c) Now put a point on the graph in the bottom middle of the plot.
What is the correlation now, and how does it compare to the previous one? What is the new regression equation, and how has it changed? What effect did adding this point have on the residual plot? The additional point should have a large residual. Such a point (one with a large residual) is called an outlier in regression.

(d) Clear the plot by clicking on Update. Put 10 points on the graph that have a curved, not linear, relationship. Looking at the regression line on the plot and at the residual plot, how can you tell the points do not have a linear relationship?

(e) Clear the plot by clicking on Update. In the lower left-hand corner of the plot, put points on the graph that illustrate a strong negative linear relationship. In the upper right-hand corner of the plot, put a single point. Where is the regression line? What does the residual plot look like? Is the point in the upper right-hand corner an outlier in regression? Now remove the point in the upper right-hand corner (click on the arrow next to Add Point, change it to Remove Point, and then click on the point to be removed). What do the regression line and residual plot look like now? Note: points like the one you just removed are called influential points: adding or removing them causes noticeable changes in the slope, the intercept, or both.

(f) Clear the plot by clicking on Update. Try to create plots (use about 25 points) with correlations close to each of the following: 0.70, 0.90, and -0.30.

2. Simulation 2: Guessing correlations

Go back to the web address given above, but this time click on the Guessing the value of the correlation coefficient link. Ask for New plots and match the correlation coefficients to the plots. Do this for 5 rounds (more if you are on a roll; a streak of correct answers can get your name on the list of top scores). This should help you see how different values of the correlation coefficient correspond to scatter plots of data.
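The applet computes the correlation and the least squares line for you. As a rough sketch of the arithmetic behind those displays, the Python below uses the standard textbook formulas: r = Sxy / sqrt(Sxx * Syy), slope b1 = Sxy / Sxx, intercept b0 = y-bar - b1 * x-bar, and residual = observed y - predicted y-hat. We assume, but have not confirmed, that the applet uses these same formulas. The function names (`least_squares`, `residuals`, `report`) and all point coordinates are hypothetical, chosen only to mimic parts (a), (c), and (e); they are not data from the applet.

```python
import math


def least_squares(xs, ys):
    """Return (r, b0, b1): the correlation and the least squares line y-hat = b0 + b1*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when all x (or all y) values are equal")
    r = sxy / math.sqrt(sxx * syy)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return r, b0, b1


def residuals(xs, ys, b0, b1):
    """Residual = observed y minus predicted y-hat, one per point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def report(label, xs, ys):
    """Print r and the fitted line to 3 decimal places, as the lab asks."""
    r, b0, b1 = least_squares(xs, ys)
    sign = "+" if b1 >= 0 else "-"
    print(f"{label}: r = {r:.3f}, y-hat = {b0:.3f} {sign} {abs(b1):.3f}x")
    return r, b0, b1


# Part (a): ten hypothetical points with a strong positive linear trend.
xs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
ys = [12, 19, 31, 42, 49, 61, 68, 81, 90, 99]
r_a, b0_a, b1_a = report("(a)", xs, ys)

# Part (c): add one hypothetical point in the bottom middle of the plot.
xs_c, ys_c = xs + [50], ys + [5]
r_c, b0_c, b1_c = report("(c)", xs_c, ys_c)
res_c = residuals(xs_c, ys_c, b0_c, b1_c)
print(f"residual of the added point: {res_c[-1]:.3f}")

# Part (e): a hypothetical negative cluster in the lower left,
# fitted with and without a single point in the upper right corner.
xs_e, ys_e = [5, 10, 15, 20, 25], [25, 20, 16, 10, 5]
r_e_with, _, b1_e_with = report("(e) with corner point", xs_e + [95], ys_e + [95])
r_e_without, _, b1_e_without = report("(e) without corner point", xs_e, ys_e)
```

Running it prints r and the fitted line to 3 decimal places for each case, plus the residual of the point added in part (c). Comparing the two part (e) lines shows how much a single far-away point can move the slope for this made-up data.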
Statistics 101 Laboratory 4 Take Home Lab Answer Sheet
Due: Day and time specified by your course instructor.

Name:
Section:

1. Simulation: Putting Points

(a) 10 points with a strong positive relationship:
Correlation:
Least squares regression equation:

(b) Residual plot
What is the formula for residuals?
Describe the plot:
Is a linear relationship adequate?
Any outliers?
Will prediction be poor in any areas (e.g., a megaphone shape)?

(c) Add another point to the graph (bottom middle of plot):
What is the new correlation?
How does it compare to the previous correlation?
Least squares regression equation:
How did the regression line change?
What happened to the residual plot?
Is the point you added an outlier?

(d) 10 points that have a non-linear relationship:
Correlation:
Least squares regression equation:
Looking at the residual plot, what do you notice, or how can you tell the relationship is not linear?

(e) 10 points that have a strong negative linear relationship and one point in the upper right corner:
Correlation:
Least squares regression equation:
What does the residual plot look like?
Is the point in the upper right an outlier?
Erase the point:
New correlation:
New regression equation:
Did the regression line change?
Is the point that you removed an influential point?

(f) Using 25 points, make plots with the following correlations and draw them in the space provided.
i. r = 0.70
ii. r = 0.90
iii. r = -0.30

2. Simulation: Guessing correlations

(a) Score on the first round:
(b) Score on the second round:
(c) Score on the third round:
(d) Score on the fourth round:
(e) Score on the fifth round: