Stat 406 – Spring 2016
Exam 2, due 5 pm Friday, 22 April -- to my office or mailbox
Short answer questions: A few sentences should be sufficient to answer each question.
1) You have collected areal data and are using a newly proposed test to evaluate whether your data
show positive spatial correlation (regions are similar to their neighbors). The test statistic for this new
test is like Moran’s I, but the new test statistic has the value of 4 when the data have no spatial
correlation, values larger than 4 when the data indicate positive spatial correlation, and values less than
4 when the data indicate negative spatial correlation. The test statistic computed from your data is
5.82. You use Monte-Carlo simulation to compute the p-value, so you simulate 20 data sets with no
spatial correlation and compute the value of the test statistic for each data set. The 20 values, sorted
from smallest to largest are:
3.13, 3.17, 3.19, 3.24, 3.36, 3.91, 4.09, 4.44, 4.48, 4.65, 4.82, 4.90, 4.90, 5.38, 5.55, 5.55, 5.57, 5.69, 5.76,
5.92
a) (4 pts) You are interested only in positive spatial dependence, so you want the one-sided p-value (i.e.,
you don’t care whether the data show a large negative correlation). Based on the 20
simulations reported above, what is the one-sided p-value? If you can’t compute this without further
information, what information do you need?
b) (4 pts) For this test statistic, would it be reasonable to use a normal approximation to compute the
p-value? Briefly explain why, why not, or why you can’t tell.
2) You are studying an interesting ecological response across central Iowa. You have divided central
Iowa into a 12 x 4 grid of regions (48 in all). Each region is approximately one county in size, and you
measure the response in each region. The data are plotted here:
[Figure: plot of the response in each of the 48 regions; the scale runs from 4 to 11.]
a) (4 pts) When you do a Moran’s test, the estimated I = -0.066 and the two-sided p-value for the test of
no spatial correlation is 0.68. Based on this information, what can you conclude about the spatial
correlation of your response?
b) (4 pts) You decide to do a local Moran’s I analysis. The plot of the Z scores for each region is:
[Figure: plot of the local Moran’s I Z score for each region; the scale runs from -2 to 3.]
Remember that a Z score is computed as $Z_i = (I_i - \text{expected value}) / \text{sd}$,
so positive Z values indicate a local I
statistic larger than the expected value, and negative Z values indicate a local I statistic below the
expected value. Also remember that when a Z score is larger than 2 or less than -2, the p-value (using a
normal approximation) is < 0.05. Based on the local Moran’s I analysis, what can you say about the
spatial correlation of your response?
c) (4 pts) You decide to smooth the data, using the Fay-Herriot method discussed in class. Below is a
plot of the observed (raw) values and their smoothed values (eblup’s) for each of the 48 observations.
Superimposed on the plot are a solid line along which the smoothed value is the same as the raw value,
and a dotted line along which the smoothed value equals the overall mean (7.3). Two observations are
numbered in red (1 and 2). These have very similar raw values but quite different smoothed values.
Why are their smoothed values so different?
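[Figure: smoothed value (eblup) plotted against raw value for each of the 48 observations, with the solid and dotted reference lines and observations 1 and 2 labeled.]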
3) (3 pts) Consider a CSR (complete spatial randomness) spatial point process with intensity $\lambda$. The notes
derived the expected value of Ripley’s K(x) function for this process. Remember K(x) is the average
number of additional events in a circle of radius x centered on an event. You decide it would be
informative to study a new function, E(x) (empty circle function), defined as the average number of
events in a circle of radius x centered on a randomly chosen location (i.e. not centered on an event).
Derive the expected value of E(x) for a CSR process.
4) Shown below are plots of $\hat{g}(x)$ and $\hat{L}_2(x) = \hat{L}(x) - x$ computed from one point pattern. There are
234 points in a 4 x 4 square. Both plots show the observed function and 95% pointwise confidence
envelopes. Both plots are calculated from the same data. One plot shows evidence of clustering; the
other does not.
a) (3 pts) Which plot shows evidence of clustering?
b) (4 pts) Explain why the two plots can legitimately tell different stories.
[Figure: two panels, each plotted against r from 0 to 1 and each showing the observed function (obs), the theoretical value (theo), and the upper (hi) and lower (lo) envelope limits. One panel shows L(r) - r; the other shows g(r).]
5) Data analysis problem (45 pts):
The data for this question are based on a study that I worked on a few years ago. The goal of the study,
and of your data analysis, is to describe and understand the spatial pattern of farms growing continuous
corn. The data are motivated by a real study, but the actual numbers are simulated.
Background:
The dominant cropping pattern in central Iowa, since the 1950’s, is a corn/soybean rotation. A field will
be planted to corn one year, then soybeans the next, and corn again the 3rd year. Rotation has multiple
advantages, including Nitrogen returned to the soil by the soybeans and lower pest pressure because
corn pests don’t do well on soybeans. However, a farmer (most years) makes a much larger profit from
corn than from soybeans. So some farmers choose continuous corn, where corn is planted every year,
without any soybean crop. One factor that may influence this decision is the presence nearby of a pig
CAFO (concentrated animal feeding operation). These generate large quantities of liquid manure. The
easiest way to dispose of this is by land application in the spring. This is very beneficial for corn but not
for soybeans (and may even be detrimental to them). Liquid manure is very costly to transport, so it tends to get
applied close to CAFO’s; land application also requires a large acreage of corn every year. Hence, it is
reasonable to suppose that farms with a continuous corn cropping pattern are more frequent near a
CAFO. (As an aside, poultry manure is dry and quite light, so poultry manure from operations in NE IA is
frequently trucked long distances). Although you suspect that continuous corn farms are more frequent
(higher intensity) close to CAFO’s, you do not know the form of the relationship between intensity and
distance to the nearest CAFO. The relationship could be a log linear regression on distance (log intensity
= $\beta_0 + \beta_1\,\text{distance}$ + error), or it could be one intensity for locations within 5km of a CAFO and another
intensity for locations more than 5km from a CAFO (log intensity = $\beta_0 + \beta_1\,\text{close}$ + error), where close
has the value 1 for locations within 5km and 0 otherwise.
The data: two files
cc.csv: locations of the 191 farms with a continuous corn cropping system in the study area
cafo.csv: locations of the 10 CAFO’s in the study area
The study region is a rectangular area of Iowa, 100km EW and 20km NS. This can be specified to ppp by
window=owin(c(0,100), c(0,20)).
Examine the data, analyze the data and provide answers for the following scientific questions:
Produce a map of the estimated intensity of continuous corn farms throughout the study area.
Are continuous corn farms clustered, segregated, or randomly located throughout the study area?
If they are clustered: estimate characteristics of the clusters.
If they are segregated: describe the characteristics of the inhibition.
Is there a linear association between distance to the nearest CAFO (see R notes, below) and the log
intensity of continuous corn farms? If so, estimate the regression coefficients and describe how
intensity changes with distance from a CAFO.
Which model for log intensity (constant, linear function of distance, or within 5km/not) best fits the
data?
For each scientific question, give me a bulleted list with:
• Your answer to the question (or map).
• What you did to answer the question.
• If you had to make choices to answer the question, what did you choose and why did you make
that choice.
As always, ask if you do not understand one of my questions. I will try to clarify what I am looking for or
give you a hint. If I can’t answer your question, I’ll tell you. If you don’t know or remember how to do
something in R, ask. I generally won’t write code for you, but I will give suggestions and help debug.
R Notes:
1) To create an image with the distance to the nearest CAFO for any point in the study area, create a ppp
for the CAFO locations, then use distmap(). You should get an image that looks like this:
[Figure: image of the distance to the nearest CAFO over the study area; legend ticks at 5, 10, and 20.]
2) To convert that image to an image with 0/1 for distance < 5km or not, create a copy of the image,
then manipulate the v variable, e.g.:
    cafo.dist2 <- cafo.dist
    cafo.dist2$v <- (cafo.dist2$v < 5) + 0
(The v variable contains the value of the image at every location). Or use eval.im(), which does math on
images: cafo.dist2 <- eval.im((cafo.dist < 5) + 0). You should get an image that looks like:
[Figure: 0/1 image marking locations within 5km of a CAFO; legend ticks at 0, 0.4, and 0.8.]
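As a minimal sketch of how the two notes fit together, the R code below builds a ppp for the CAFO locations, computes the distance image with distmap(), and derives the 0/1 image with eval.im(). The coordinates are made up for illustration only (the real ones come from cafo.csv, whose column names are not stated here), and the object names study.win, cafo and cafo.ppp are assumptions, not names you are required to use.

```r
library(spatstat)

# Study region: 100km EW by 20km NS, as specified above
study.win <- owin(c(0, 100), c(0, 20))

# Hypothetical CAFO coordinates, for illustration only.
# With the real data you would instead read the file, e.g.
#   cafo <- read.csv("cafo.csv")
# and use whatever its coordinate columns are called (assumed x and y here).
cafo <- data.frame(x = c(10, 35, 60, 85), y = c(5, 15, 8, 12))

# Point pattern of CAFO locations in the study window
cafo.ppp <- ppp(cafo$x, cafo$y, window = study.win)

# Pixel image: distance from each pixel to the nearest CAFO
cafo.dist <- distmap(cafo.ppp)

# 0/1 image: 1 where the nearest CAFO is within 5km, 0 otherwise
cafo.dist2 <- eval.im((cafo.dist < 5) + 0)

# Visual check against the images shown in the notes
plot(cafo.dist)
plot(cafo.dist2)
```

Plotting both images is a quick way to confirm that the 0/1 image is 1 exactly in the discs of radius 5km around each CAFO.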
Check the appropriate statement and sign where indicated.
___ This exam is my work. I received no assistance except perhaps from Dr. Dixon.
___ I received the following assistance. Provide a short summary of who helped and what
help you received.
Signed: _________________________________________