Assignment #4

Case 1 – Dataset1
In this example you have two continuous factors (e.g., plant food abundance and understory
cover) that might influence the density of bunnies. In this case, the two continuous variables are
collinear and are potentially confounding (more bunnies if more food because they have to eat,
but more bunnies if more understory cover because bunnies have to hide from predators). The
data was created such that bunny density is truly affected by both understory cover and plant
food abundance. Understory cover ranges from 0 to 1 (proportion of a density board seen from 15 meters
away). Food is kg of browse per square meter and is collinear with understory cover (bunnies
tend to eat their cover). The response (bunnies) is bunnies per hectare. Examine the equations to see
how the data was made and to find the true parameter values ("truth").
1. Write a script that:
a. Imports the data (saved as a .csv file)
b. Plots the relationship between understory cover (x) and food (y)
c. Calculates the r^2 between food and understory
i. Using a comment, report the r^2
d. Plots the relationship between bunny density (y) and understory cover (x)
e. Plots the relationship between bunny density (y) and food (x)
f. Runs a simple regression between bunny density and food
i. Notice – are the coefficient estimates and residual standard error close to
truth?
g. Runs a simple regression between bunny density and cover
i. Notice – are the coefficient estimates and residual standard error close to
truth?
h. Runs a multiple regression between bunny density (y) and both cover (x) and food
(x)
i. Calculates the VIF (variance inflation factor) for food and understory
i. Using a comment, report the VIF
j. Using comments, describe what happened in f, g, and h above. Be sure to discuss:
i. The coefficient estimates of the explanatory (x) variable(s) relative to truth
and other models run
ii. What your final model would be in this analysis (be sure to explain why)
and how you would deal with the collinearity among x variables.
iii. What you’ve learned from the exercise.
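As a rough illustration of steps c and i, the sketch below computes the r^2 between two collinear predictors and the matching VIF. It is not the assignment solution: the data are simulated, the slope and noise values are made up, and the names pearson_r and vif_two_predictors are hypothetical helpers. With exactly two predictors, the VIF of each equals 1 / (1 - r^2), where r^2 is the squared correlation between them.

```python
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors when a model has exactly two of them."""
    r2 = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r2)


# Simulated stand-ins for the real columns (values are assumptions,
# not taken from Dataset1).
rng = random.Random(1)
cover = [rng.uniform(0, 1) for _ in range(200)]
food = [2.0 * c + rng.gauss(0, 0.3) for c in cover]

r2 = pearson_r(cover, food) ** 2
vif = vif_two_predictors(cover, food)
print(f"r^2 between cover and food: {r2:.3f}")
print(f"VIF for cover and food:     {vif:.2f}")
```

With more than two predictors the shortcut no longer applies: each predictor's VIF then comes from the r^2 of regressing that predictor on all of the others.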
Case 2 – Dataset2
In this example, you have two continuous factors (sediment load and amount of organic material)
that might influence water clarity. In this case, the two continuous variables are collinear, but are
potentially redundant (both basically indicators of run-off, which is really driving water clarity).
The data was created such that water clarity is a function of run-off. This run-off data is in the
Excel file, but pretend it’s data you didn’t collect and thus don’t have; DO NOT INCLUDE
RUNOFF IN ANY OF YOUR ANALYSES! Sediment and organic material are both closely
correlated with run-off, but sediment is more closely related to run-off and thus is a better
‘index’ of run-off. Sediment and organic matter might be in grams per cubic meter; run-off might
be in cubic feet per minute, and clarity might be the depth at which a Secchi disk can still be seen (in centimeters),
although I’m not a limnologist, so don’t expect the relationships or numbers to be realistic.
Examine the equations to see how the data was made and to find the true parameter values ("truth").
1. Write a script that:
a. Imports the data (saved as a .csv file)
b. Plots the relationship between sediment (x) and organic matter (y)
c. Calculates the r^2 between sediment and organic matter
i. Using a comment, report the r^2
d. Plots the relationship between clarity (y) and sediment (x)
e. Plots the relationship between clarity (y) and organic matter (x)
f. Runs a simple regression between clarity and sediment
i. Take note of the coefficient estimates and significance
g. Runs a simple regression between clarity and organic matter
i. Take note of the coefficient estimates and significance
h. Runs a multiple regression between clarity (y) and both sediment (x) and organic
matter (x)
i. Calculates the VIF (variance inflation factor) for sediment and organic matter
i. Using a comment, report the VIF
j. Using comments, describe what happened in f, g, and h above. Be sure to discuss:
i. The coefficient estimates of the explanatory (x) variable(s) and how they
changed between models
ii. The significance of the variables and how it changed between models
iii. What your final model would be in this case – be sure to explain why!
iv. What you’ve learned from the exercise.
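To see why coefficients can shift between the simple and multiple regressions in steps f through h, here is a small sketch on simulated data. It is not the assignment solution: the hidden driver, slopes, and noise levels are invented for illustration, and simple_slope and multiple_slopes are hypothetical helpers that solve the least-squares normal equations directly. Two noisy indicators of one unobserved driver are each fit alone, then together.

```python
import random


def _mean(v):
    return sum(v) / len(v)


def _sxy(x, y):
    """Sum of cross-products of deviations from the means."""
    mx, my = _mean(x), _mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))


def simple_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    return _sxy(x, y) / _sxy(x, x)


def multiple_slopes(x1, x2, y):
    """Least-squares slopes of y on x1 and x2 (two-predictor normal equations)."""
    s11, s22, s12 = _sxy(x1, x1), _sxy(x2, x2), _sxy(x1, x2)
    s1y, s2y = _sxy(x1, y), _sxy(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Simulated stand-ins (all numbers are assumptions, not taken from Dataset2).
rng = random.Random(42)
runoff = [rng.uniform(0, 10) for _ in range(300)]    # hidden driver, never modeled
sediment = [r + rng.gauss(0, 0.5) for r in runoff]   # tight index of run-off
organic = [r + rng.gauss(0, 2.0) for r in runoff]    # looser index of run-off
clarity = [100 - 5 * r + rng.gauss(0, 2.0) for r in runoff]

sed_simple = simple_slope(sediment, clarity)
org_simple = simple_slope(organic, clarity)
sed_multi, org_multi = multiple_slopes(sediment, organic, clarity)

print(f"simple:   sediment {sed_simple:.2f}, organic {org_simple:.2f}")
print(f"multiple: sediment {sed_multi:.2f}, organic {org_multi:.2f}")
```

In this setup both predictors look influential on their own, but once they share a model the looser index contributes little beyond what the tighter one already explains, so its coefficient shrinks toward zero.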