Statistics 101: Section L - Laboratory 5 Take Home

Laboratory 5 is a take home laboratory consisting of three activities. It is due at the end of lab on
Tuesday, February 17. For the first two activities, you will be looking at some computer simulations
involving correlation and regression. These simulations focus on understanding how least squares
regression lines fit through a set of points and gauge your ability to estimate the value of the
correlation coefficient from a scatter plot. The third activity involves using JMP to fit the least
squares regression line to the forearm length and foot length data collected in Lab 2.
1. Simulation 1: Putting Points
Go to the Stat 101L course webpage
http://www.public.iastate.edu/~wrstephe/stat101L.html
Under Computing click on the Putting points and regression lines link. There are
brief instructions on how to use this simulation at the top of the screen. In the center of the
screen is a blue area where you can Put Points by clicking on the left mouse button. The
correlation coefficient and least squares regression line are calculated and displayed in the
margin above this blue area.
(a) Put 10 points on the graph that illustrate a strong positive linear relationship between
two variables. What are the correlation and the equation of the least squares regression
line? You only need to report the coefficients up to 3 decimal places. Click on the box
Regression line to put the least squares regression line on the graph.
(b) Ask for a residual plot by clicking on Residuals. What does the residual plot, the one
on the left, look like? What does this tell you about the adequacy of the least squares
fit? You do not have to comment on the histogram of residuals, the plot on the right.
(c) Now put a point on the graph in the bottom middle of the blue area. What is the
correlation now and how does it compare to the previous one? What is the new regression
equation and how has it changed? What effect did adding this point have on the residual
plot? The additional point should have a large residual. Such a point (with a large
residual) is called an outlier in regression.
(d) Clear the plot. Put 10 points on the graph that have a curved relationship, not linear.
Looking at the regression line on the plot and the plot of residuals, how can you tell the
points do not have a linear relationship?
(e) Clear the plot. In the lower-left hand corner of the blue area, put points on the graph
that illustrate a strong negative linear relationship. In the upper right hand corner of
the blue area, put a single point. Where is the regression line? What does the residual
plot look like? Is the point in the upper right hand corner an outlier in regression?
Now remove the point in the upper right hand corner (you can do this by clicking on
Erase Point and clicking on the point to be removed). What do the regression line
and residual plot look like now? Note: points like the one you just removed are called
influential points - they cause noticeable changes in the slope or intercept, or both,
when they are added or removed.
(f) Clear the plot. Create plots (use about 25 points) with correlations close to the following:
0.70, 0.90, -0.30. You can try to do this by individually adding the 25 points or let the
program help you by putting 25 in the # of points box and a value for the correlation
in the Target correlation box. Then click on Random points. Be sure to clear the plot
before resetting the next target correlation. You do not need to give an answer for this part;
it is just preparation for Simulation 2.
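The quantities the simulation displays can also be computed by hand. The following is a minimal sketch, not the applet's own code: the function names (fit_line, residuals, report) and every data point are invented for illustration. It fits the least squares line to a strong positive pattern as in part (a), adds a low point in the middle as in part (c), and then compares a negative cluster with and without a far-away point as in part (e).

```python
from math import sqrt


def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


def residuals(xs, ys, intercept, slope):
    """Observed y minus the y predicted by the line, for each point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def report(label, xs, ys):
    """Print the correlation and fitted equation to 3 decimal places."""
    a, b, r = fit_line(xs, ys)
    print(f"{label}: r = {r:.3f}, y = {a:.3f} + {b:.3f} x")
    return a, b, r


# Hypothetical points, invented for illustration only.
strong_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
strong_y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 9.2, 9.9]
a1, b1, r1 = report("(a) strong positive", strong_x, strong_y)

# Part (c): add one point low in the middle of the x range.
out_x = strong_x + [5.5]
out_y = strong_y + [0.0]
a2, b2, r2 = report("(c) low middle point added", out_x, out_y)
res_c = residuals(out_x, out_y, a2, b2)
print(f"    residual of the added point = {res_c[-1]:.3f}")

# Part (e): a negative cluster in the lower left plus one far upper-right point.
cluster_x = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5]
cluster_y = [5.4, 5.1, 4.4, 4.1, 3.6, 2.9, 2.6, 1.9, 1.6, 0.9]
a3, b3, r3 = report("(e) far point included", cluster_x + [20.0], cluster_y + [20.0])
a4, b4, r4 = report("(e) far point erased", cluster_x, cluster_y)
```

In this particular sketch the point added in part (c) sits exactly at the mean of the x values, so it lowers the correlation and shifts the intercept but leaves the slope unchanged. The far point in part (e) flips the sign of the slope, which is the behavior the lab calls influential.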
2. Simulation 2: Guessing correlations
Go back to the Web address given above but this time click on the Guessing the value
of the correlation coefficient link under Computing. Ask for New plots and match the
correlation coefficients to the plots. Do this for 5 rounds, or more if you are on a roll; a streak
of correct answers will get your name on the list of top scores. This should help you see how
the different values for the correlation coefficient correspond to scatterplots of data.
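One way to build intuition for how a target correlation relates to a cloud of points is to generate random data yourself. The sketch below is an assumption about one common technique, not a description of how the course applets work: it sets y = r*x + sqrt(1 - r^2)*z for independent standard normal x and z, which gives a long-run correlation of r. The names random_points and sample_correlation are hypothetical.

```python
import random
from math import sqrt


def sample_correlation(xs, ys):
    """Correlation coefficient r of the paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def random_points(n, target_r, seed=None):
    """Draw n points whose long-run correlation is target_r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)  # independent noise
        xs.append(x)
        ys.append(target_r * x + sqrt(1.0 - target_r ** 2) * z)
    return xs, ys


# With only 25 points the sample correlation can miss the target noticeably.
for target in (0.70, 0.90, -0.30):
    xs, ys = random_points(25, target, seed=101)
    print(f"target {target:+.2f}, sample r = {sample_correlation(xs, ys):+.3f}")
```

With 25 points the sample value typically lands near the target rather than on it, which is part of why estimating a correlation by eye from a small scatterplot is hard.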
3. Regression and JMP
In laboratory 4, we looked at a “visual regression” line for predicting foot length from forearm
length. One drawback of “visual regression” is that everyone sees the plot a little differently
and so there is no consensus on one line that could be used to describe the general relationship
between forearm length and foot length. We will now use JMP to calculate the least squares
regression line for predicting foot length from forearm length.
(a) Go to the course website
http://www.public.iastate.edu/~wrstephe/stat101L.html.
Under the Computing heading, click on the link for Forearm Length and Foot
Length. Follow the instructions for downloading a JMP file (if you are using Internet
Explorer) or TXT file (if you are using Netscape).
(b) To calculate the least squares regression line, select Analyze → Fit Y by X from the
JMP menu. Select the column foot and click on the button Y, Response. Then select
the column arm and click on the button X, Factor. Press OK.
(c) You will now have a scatterplot of the two variables, arm on the x-axis and foot on the
y-axis. To add the regression line to the output, click on the inverted red triangle next
to the words Bivariate Fit of foot By arm and select Fit Line. You should now
have a regression line on your scatterplot and regression statistics below.
(d) Click on the inverted red triangle next to the words Linear Fit and select Plot Residuals.
You should now have a residual plot added to the bottom of the window.
(e) From the JMP menu, select File → Print. Use this output to answer the following
questions.
i. Give the intercept and slope for your least squares regression.
ii. Give an interpretation of each value, within the context of the problem.
iii. Give the least squares regression equation for predicting foot length from forearm
length.
iv. Find the predicted foot length for someone with a forearm length of 26 cm.
v. Three people in the class had a forearm length of 26 cm. Calculate a residual for
each of these people.
vi. Describe the residual plot. What does this residual plot tell you about the fit of the
regression line?
vii. Give the value of R^2 for this regression. Give an interpretation of this value.
viii. Based on the data, do you think a linear regression is a good approximation for the
relationship between these two variables? Explain your answer.
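The arithmetic behind questions iii through vii can be sketched as follows. This is only an illustration under stated assumptions: the arm and foot values are made up (they are not the class data from Lab 2), and the code is not JMP output. It shows how a prediction at 26 cm, the residuals for three people with that forearm length, and R^2 follow from the fitted intercept and slope.

```python
from math import sqrt

# Hypothetical measurements in cm, invented for illustration; NOT the class data.
arm = [23, 24, 25, 26, 26, 26, 27, 28, 29, 30]
foot = [22.0, 23.5, 23.0, 24.0, 25.5, 24.5, 25.0, 26.5, 26.0, 27.5]

n = len(arm)
mean_arm = sum(arm) / n
mean_foot = sum(foot) / n
sxx = sum((x - mean_arm) ** 2 for x in arm)
syy = sum((y - mean_foot) ** 2 for y in foot)
sxy = sum((x - mean_arm) * (y - mean_foot) for x, y in zip(arm, foot))

# Least squares estimates for: predicted foot = intercept + slope * arm
slope = sxy / sxx
intercept = mean_foot - slope * mean_arm
r = sxy / sqrt(sxx * syy)


def predict(arm_length):
    """Predicted foot length for a given forearm length."""
    return intercept + slope * arm_length


# Residual = observed foot length minus predicted foot length.
all_residuals = [y - predict(x) for x, y in zip(arm, foot)]
residuals_at_26 = [y - predict(26) for x, y in zip(arm, foot) if x == 26]

# R^2 = 1 - SSE/SST: the fraction of variation in foot length explained by the line.
sse = sum(e ** 2 for e in all_residuals)
r_squared = 1.0 - sse / syy

print(f"predicted foot = {intercept:.3f} + {slope:.3f} * arm")
print(f"prediction at 26 cm: {predict(26):.3f}")
print("residuals at 26 cm:", [round(e, 3) for e in residuals_at_26])
print(f"R^2 = {r_squared:.3f}")
```

For a straight-line fit with one explanatory variable, R^2 equals the square of the correlation coefficient, so the value can be cross-checked against r.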
Statistics 101 Laboratory 5
Take Home Lab Answer Sheet
Name:
1. Simulation: Putting Points
(a) 10 points with strong positive relationship:
Correlation:
Least Squares Regression Equation:
(b) Residual Plot
Describe the plot:
Is the linear relationship adequate?
Any outliers?
Will prediction be poor in any areas? (e.g., is there a megaphone shape?)
(c) Add another point to the graph (bottom middle of blue area):
What is the new correlation?
How does it compare to the previous correlation?
Least squares regression equation:
How did the regression line change?
What happened to the residual plot?
Is the point you added an outlier?
(d) 10 points that have a non-linear relationship:
Correlation:
Least squares regression equation:
In looking at the residual plot, what do you notice or how can you tell the relationship
is not linear?
(e) 10 points that have a strong negative linear relationship and one point in the upper right
corner:
Correlation:
Least squares regression equation:
What does the residual plot look like?
Is the point in the upper right an outlier?
Erase the point:
New Correlation:
New regression equation:
Did the regression line change?
Is the point that you removed an influential point?
2. Simulation: Guessing correlations
(a) Score on the first round:
(b) Score on the second round:
(c) Score on the third round:
(d) Score on the fourth round:
(e) Score on the fifth round:
3. Regression using JMP
(a) Intercept and slope of the least squares regression line.
(b) Interpretation of intercept and slope within the context of the problem.
(c) Least Squares Regression Equation
(d) Prediction of foot length for a forearm length of 26 cm.
(e) Calculating Residuals.
(f) Residual Plot
(g) R^2 and interpretation
(h) Is linear regression OK for relationship?