Assignment #1

Assignment 1 - sink or swim time!
For each of the equations below, complete the following:
1. Create a random dataset representing the equation in Excel
a. Create 100 x-values that follow a uniform distribution ranging from 0 to 10
b. Create 100 error values that follow a normal distribution with error specified
below
c. Create 100 y-values as predicted by the true equation below
d. Save the data in comma-delimited format (csv) to the desktop or other easily
accessible location
e. HINT: don’t close Excel; you can re-use your work (with slight modifications) for
each equation below. Alternatively, save a separate file with the xls extension,
which will preserve the equations (csv saves just values).
2. Import the data into R
3. Use ‘head’ to verify that the import worked
4. Create an object that is a regression between y and x. Note that if you name your
objects different things (e.g., results1, results2, etc.), you can go back and compare
different models at a later time.
5. Examine the ‘summary’ for the regression object
a. Are estimates of slope, intercept, and standard error close to truth?
b. What is the p-value? Keep in mind that we know the effect (slope) is real,
regardless of the p-value provided.
6. Plot the data and the regression line to test for linearity of data; examine other
assumptions (normality of residuals, homoscedasticity, etc.)
a. If you are examining an equation with a known violation of regression
assumptions, can you see that violation in this plot?
7. Plot the residuals in order of x values
a. Examine regression assumptions (normality, homoscedasticity, linearity, etc.)
b. If you are examining an equation with a known violation of regression
assumptions, can you see that violation in this plot?
8. Plot a histogram of the residuals
a. Examine assumption of normality; if you are examining an equation with a known
violation of normality, can you see that violation in this plot?
9. Run an examination of autocorrelation in the residuals, with the residuals sorted in order of x-values: use acf(residuals(results)[order(x)]), where ‘results’ is the name of your
regression object and ‘x’ is the name of your x data.
a. Is there evidence of autocorrelation of residuals?
10. Send me a plot that you feel best depicts that the assumption has been violated. Copy and
paste each plot into a Word file (all plots in one file) and then e-mail me the file. Make
sure I can tell what the plot is (if it’s not obvious). All I want from you for this
assignment is the plots!
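Steps 2 through 9 can be sketched in R roughly as follows. This is a minimal sketch, not the required solution. To keep it runnable on its own, it first writes a simulated stand-in for the Equation 1 CSV to a temporary folder; the seed, the file name equation1.csv, the column names x and y, and reading N(0,2) as a standard deviation of 2 are all assumptions. With your own data, skip the stand-in part and point read.csv() at the file you saved from Excel.

```r
# Stand-in for the CSV exported from Excel (Equation 1). Everything in this
# first part is an assumption made so the sketch runs by itself.
set.seed(1)                                    # arbitrary seed
x <- runif(100, min = 0, max = 10)             # step 1a: uniform x-values
error <- rnorm(100, mean = 0, sd = 2)          # step 1b: assumes N(0,2) means SD = 2
y <- 2 + 3 * x + error                         # step 1c: true equation plus error
csv_path <- file.path(tempdir(), "equation1.csv")   # hypothetical file name
write.csv(data.frame(x = x, y = y), csv_path, row.names = FALSE)

# Steps 2-3: import the data and check the first few rows
dat <- read.csv(csv_path)
head(dat)

# Step 4: regression of y on x, stored under its own name
results1 <- lm(y ~ x, data = dat)

# Step 5: estimates, standard errors, and p-values
summary(results1)

# Step 6: data with the fitted regression line
plot(dat$x, dat$y, xlab = "x", ylab = "y")
abline(results1)

# Step 7: residuals in order of x, with a reference line at zero
ord <- order(dat$x)
plot(dat$x[ord], residuals(results1)[ord], xlab = "x", ylab = "residual")
abline(h = 0, lty = 2)

# Step 8: histogram of the residuals
hist(residuals(results1), main = "", xlab = "residual")

# Step 9: autocorrelation of the residuals sorted by x
acf(residuals(results1)[ord])
```

Naming the object results1 (then results2, and so on for the other equations) keeps every fitted model available for later comparison.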
Finally, for equation 1 (and equation 1 only), write a sentence (or two) describing the
relationship between x and y as you might see in the published literature. Assume that x is
rainfall (cm) and y is biomass (kg/km^2). Be sure to include confidence intervals and p-values.
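One way to pull the numbers for that sentence out of R is sketched here on simulated stand-in data. The seed, the variable names rainfall and biomass, the 95% level, and reading N(0,2) as a standard deviation of 2 are assumptions; the wording of the printed sentence is only an illustration, not a required template.

```r
# Simulated stand-in for Equation 1 (assumed: SD = 2, arbitrary seed)
set.seed(2)
rainfall <- runif(100, min = 0, max = 10)                 # x, in cm
biomass <- 2 + 3 * rainfall + rnorm(100, mean = 0, sd = 2) # y, in kg/km^2
results1 <- lm(biomass ~ rainfall)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
est <- coef(summary(results1))

# 95% confidence intervals for the intercept and the slope
ci <- confint(results1, level = 0.95)

slope <- est["rainfall", "Estimate"]
p_value <- est["rainfall", "Pr(>|t|)"]

# Assemble an example sentence from the fitted values
cat(sprintf(
  "Biomass increased by %.2f kg/km^2 per cm of rainfall (95%% CI: %.2f to %.2f; p = %.3g).\n",
  slope, ci["rainfall", 1], ci["rainfall", 2], p_value
))
```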
Equation 1 – No assumptions violated (redo of Lecture 3)
Y = 2 + 3x + N(0,2)
Equation 2 – Nonlinearity in equation
Y = 2 + 3x - 0.2x^2 + N(0,1)
Note: you should be able to see evidence of the non-linearity in the plot of y against x, the plot of the
residuals, and the acf, but not in the histogram.
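If you want to check what the violation should look like before building it in Excel, Equation 2 can be simulated directly in R. This is a sketch under assumptions: the seed is arbitrary, the object name results2 is a suggestion, and N(0,1) is read as a standard deviation of 1.

```r
# Equation 2 simulated in R (assumed: SD = 1, arbitrary seed)
set.seed(3)
x <- runif(100, min = 0, max = 10)
y <- 2 + 3 * x - 0.2 * x^2 + rnorm(100, mean = 0, sd = 1)

# Fit a straight line even though the truth is curved
results2 <- lm(y ~ x)

# Residuals in order of x: a curved (arched) pattern indicates non-linearity
res_sorted <- residuals(results2)[order(x)]
plot(sort(x), res_sorted, xlab = "x", ylab = "residual")
abline(h = 0, lty = 2)

# Neighbouring residuals share the same sign along the curve,
# so the acf picks the curvature up as positive autocorrelation
acf_out <- acf(res_sorted)
```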
Equation 3 – Non-normally distributed residuals
Y = 2 + 3x + exp(N(0,1))
Note: you should be able to see evidence of the non-normality in the plot of y against x (not many
points near the line compared to data with normality), the plot of the residuals (again not
many points near zero compared to data with normality), and the histogram, but not in the acf.
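Equation 3 can also be simulated in R as a check on your Excel version. Again this is only a sketch: the seed and the object name results3 are assumptions, and N(0,1) is read as a standard deviation of 1. One thing to expect: the error term exp(N(0,1)) is always positive and has a mean of exp(0.5), about 1.65, so the fitted intercept will tend to sit above 2.

```r
# Equation 3 simulated in R (assumed: SD = 1, arbitrary seed)
set.seed(4)
x <- runif(100, min = 0, max = 10)
error <- exp(rnorm(100, mean = 0, sd = 1))   # log-normal: positive and right-skewed
y <- 2 + 3 * x + error

results3 <- lm(y ~ x)
res <- residuals(results3)

# Histogram of residuals: expect a long right tail rather than a bell shape
hist(res, main = "", xlab = "residual")

# Residuals in order of x: a few large positive values, many small negative ones
plot(sort(x), res[order(x)], xlab = "x", ylab = "residual")
abline(h = 0, lty = 2)
```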
Equation 4 – Heteroscedasticity
Y = 2 + 3x + N(0,x)
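A sketch of Equation 4 in R, for comparison with your Excel data. The assumption here is that N(0,x) means the standard deviation of each error equals that point's x-value; the seed and the name results4 are also assumptions. rnorm() accepts a vector for sd, so each error gets its own spread.

```r
# Equation 4 simulated in R (assumed: error SD equals x, arbitrary seed)
set.seed(5)
x <- runif(100, min = 0, max = 10)
error <- rnorm(100, mean = 0, sd = x)   # one SD per point, growing with x
y <- 2 + 3 * x + error

results4 <- lm(y ~ x)
res <- residuals(results4)

# Residuals in order of x: expect a funnel that widens as x increases
plot(sort(x), res[order(x)], xlab = "x", ylab = "residual")
abline(h = 0, lty = 2)

# Rough numeric check: residual spread in the lower and upper halves of x
sd_low <- sd(res[x <= 5])
sd_high <- sd(res[x > 5])
c(low = sd_low, high = sd_high)
```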
Equation 5 – Autocorrelated residuals
NOTE: first copy the x-values, then “paste special” the “values” in a new column. Then sort
these new x-values from smallest to largest. Use these x-values to generate your y-values.
Y = 2 + 3x + N(previous residual,1)
If you’ve done this right, your points should jump around the regression line erratically, and your
acf analysis should show significant positive autocorrelation for the first 5+ lags.
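The same construction can be sketched in R. Assumptions: the first error has no previous residual, so it is drawn with mean 0; the 1 in N(previous residual,1) is read as a standard deviation; the seed and the name results5 are arbitrary. The x-values are sorted first, matching the NOTE above.

```r
# Equation 5 simulated in R (assumed: first error has mean 0, SD = 1, arbitrary seed)
set.seed(6)
x <- sort(runif(100, min = 0, max = 10))   # sorted x-values, as in the NOTE

# Each error is centred on the previous one
error <- numeric(100)
error[1] <- rnorm(1, mean = 0, sd = 1)
for (i in 2:100) {
  error[i] <- rnorm(1, mean = error[i - 1], sd = 1)
}
y <- 2 + 3 * x + error

results5 <- lm(y ~ x)

# x is already sorted, but order(x) keeps the call identical to step 9
acf_out <- acf(residuals(results5)[order(x)])
```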