Assignment 1 - sink or swim time!

For each of the equations below, complete the following:

1. Create a random dataset representing the equation in Excel
   a. Create 100 x-values that follow a uniform distribution ranging from 0 to 10
   b. Create 100 error values that follow a normal distribution with the error specified below
   c. Create 100 y-values as predicted by the true equation below
   d. Save the data in comma-delimited format (csv) to the desktop or other easily accessible location
   e. HINT: don’t close Excel; you can re-use your work (with slight modifications) for each equation below. Alternatively, save a separate file with the xls extension, which will preserve the equations (csv saves just values).
2. Import the data into R
3. Use ‘head’ to verify that the import worked
4. Create an object that is a regression between y and x. Note that if you give your objects different names (e.g., results1, results2, etc.), you can go back and compare different models at a later time.
5. Examine the ‘summary’ for the regression object
   a. Are the estimates of slope, intercept, and standard error close to truth?
   b. What is the p-value? Keep in mind that we know the effect (slope) is real, regardless of the p-value provided.
6. Plot the data and the regression line to test for linearity of the data; examine other assumptions (normality of residuals, homoscedasticity, etc.)
   a. If you are examining an equation with a known violation of regression assumptions, can you see that violation in this plot?
7. Plot the residuals in order of x-values
   a. Examine regression assumptions (normality, homoscedasticity, linearity, etc.)
   b. If you are examining an equation with a known violation of regression assumptions, can you see that violation in this plot?
8. Plot a histogram of the residuals
   a. Examine the assumption of normality; if you are examining an equation with a known violation of normality, can you see that violation in this plot?
9. Run an examination of autocorrelation in the residuals, with the residuals sorted in order of x-values. Use acf(residuals(results)[order(x)]), where ‘results’ is the name of your regression object and ‘x’ is the name of your x data.
   a. Is there evidence of autocorrelation of residuals?
10. Send me a plot that you feel best depicts that the assumption has been violated.

Copy and paste each plot into a Word file (all plots in one file) and then e-mail me the file. Make sure I can tell what each plot is (if it’s not obvious). All I want from you for this assignment is the plots!

Finally, for equation 1 (and equation 1 only), write a sentence (or two) describing the relationship between x and y as you might see it in the published literature. Assume that x is rainfall (cm) and y is biomass (kg/km2). Be sure to include confidence intervals and p-values.

Equation 1 - No assumptions violated (redo of Lecture 3)
Y = 2 + 3x + N(0,2)

Equation 2 - Nonlinearity in equation
Y = 2 + 3x - 0.2x^2 + N(0,1)
Note: you should be able to see evidence of the non-linearity in the plot of y on x, the plot of the residuals, and the acf, but not in the histogram.

Equation 3 - Non-normally distributed residuals
Y = 2 + 3x + exp(N(0,1))
Note: you should be able to see evidence of the non-normality in the plot of y on x (not many points near the line compared to data with normality), the plot of the residuals (again, not many points near zero compared to data with normality), and the histogram, but not in the acf.

Equation 4 - Heteroscedasticity
Y = 2 + 3x + N(0,x)

Equation 5 - Autocorrelated residuals
NOTE: first copy the x-values, then “paste special” the “values” into a new column. Then sort these new x-values from smallest to largest. Use these sorted x-values to generate your y-values.
Y = 2 + 3x + N(previous residual,1)
If you’ve done this right, your points should jump around the regression line erratically, and your acf analysis should show significant positive autocorrelation for the first 5+ lags.
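
For reference, steps 2 through 9 might look roughly like the following in R for Equation 1. This is only a sketch under stated assumptions, not the required solution: the data are simulated in R as a stand-in for the Excel step, the 2 in N(0,2) is assumed to be a standard deviation, and the column names (x, y), the temporary file path, and the object names (dat, results1, ord) are placeholders for whatever you use in your own work.

```r
set.seed(1)  # arbitrary seed, only so this sketch is reproducible

# Stand-in for the Excel step (step 1): Equation 1, Y = 2 + 3x + N(0,2)
n   <- 100
x   <- runif(n, min = 0, max = 10)   # 100 uniform x-values on [0, 10]
err <- rnorm(n, mean = 0, sd = 2)    # ASSUMPTION: the 2 is a standard deviation
y   <- 2 + 3 * x + err

# Hypothetical file location; in the assignment this is the csv saved from Excel
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(x = x, y = y), csv_path, row.names = FALSE)

# Steps 2-3: import the data and check the first rows
dat <- read.csv(csv_path)
head(dat)

# Steps 4-5: fit the regression and examine the summary
results1 <- lm(y ~ x, data = dat)
summary(results1)    # estimates, standard errors, p-values
confint(results1)    # 95% confidence intervals for intercept and slope

# Step 6: data with the fitted regression line
plot(dat$x, dat$y, xlab = "x", ylab = "y", main = "Equation 1: data and fit")
abline(results1)

# Step 7: residuals in order of x-values
ord <- order(dat$x)
plot(dat$x[ord], residuals(results1)[ord],
     xlab = "x", ylab = "Residual", main = "Equation 1: residuals by x")
abline(h = 0, lty = 2)

# Step 8: histogram of the residuals
hist(residuals(results1), xlab = "Residual", main = "Equation 1: residuals")

# Step 9: autocorrelation of residuals sorted by x
acf(residuals(results1)[ord], main = "Equation 1: acf of residuals")
```

For the other equations, only the line that builds y (and the object name, e.g., results2) would need to change; the import, fit, and plotting calls stay the same.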
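
The error term in Equation 5 is the least obvious to build, so here is one possible reading of N(previous residual,1) sketched in R: each error is drawn from a normal distribution centered on the previous error. The assumptions here are that the 1 is a standard deviation, that the "previous residual" for the very first point is taken to be 0, and that the data are simulated in R rather than Excel; the names err, prev, results5, and res_acf are illustrative only.

```r
set.seed(5)  # arbitrary seed, only so this sketch is reproducible

n <- 100
x <- sort(runif(n, min = 0, max = 10))  # x-values sorted smallest to largest

# Build the errors one at a time: each is N(previous error, 1)
err  <- numeric(n)
prev <- 0                               # ASSUMPTION: no error before the first point
for (i in seq_len(n)) {
  err[i] <- rnorm(1, mean = prev, sd = 1)  # ASSUMPTION: the 1 is a standard deviation
  prev   <- err[i]
}

y <- 2 + 3 * x + err

# Fit the regression and check the residuals for autocorrelation
results5 <- lm(y ~ x)
res_acf  <- acf(residuals(results5)[order(x)],
                main = "Equation 5: acf of residuals")

# Lag-1 autocorrelation (element 1 of the acf array is lag 0)
res_acf$acf[2]
```

Because each error starts from the previous one, neighboring residuals tend to share a sign, which is what the acf plot picks up as positive autocorrelation at short lags.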