Stat 406 – Spring 2016
Exam 2, due 5 pm Friday, 22 April, to my office or mailbox

Short answer questions: A few sentences should be sufficient to answer each question.

1) You have collected areal data and are using a newly proposed test to evaluate whether your data show positive spatial correlation (regions are similar to their neighbors). The test statistic for this new test is like Moran’s I, but it has the value 4 when the data have no spatial correlation, values larger than 4 when the data indicate positive spatial correlation, and values less than 4 when the data indicate negative spatial correlation. The test statistic computed from your data is 5.82. You use Monte-Carlo simulation to compute the p-value, so you simulate 20 data sets with no spatial correlation and compute the value of the test statistic for each data set. The 20 values, sorted from smallest to largest, are:

3.13, 3.17, 3.19, 3.24, 3.36, 3.91, 4.09, 4.44, 4.48, 4.65, 4.82, 4.90, 4.90, 5.38, 5.55, 5.55, 5.57, 5.69, 5.76, 5.92

a) (4 pts) You are interested only in positive spatial dependence, so you want the one-sided p-value (i.e. you don’t care whether the data show large negative correlation). Based on the 20 simulations reported above, what is the one-sided p-value? If you can’t compute this without further information, what information do you need?

b) (4 pts) For this test statistic, would it be reasonable to use a normal approximation to compute the p-value? Briefly explain why, why not, or why you can’t tell.

2) You are studying an interesting ecological response across central Iowa. You have divided central Iowa into a 12 x 4 grid of regions (48 in all). Each region is approximately one county in size, and you measure the response in each region. The data are plotted here:

[Figure: map of the response in each of the 48 regions; color scale from 4 to 11]

a) (4 pts) When you do a Moran’s test, the estimated I = -0.066 and the two-sided p-value for the test of no spatial correlation is 0.68.
Based on this information, what can you conclude about the spatial correlation of your response?

b) (4 pts) You decide to do a local Moran’s I analysis. The plot of the Z scores for each region is:

[Figure: map of the local Moran’s I Z scores for each region; color scale from -2 to 3]

Remember that a Z score is computed as $Z_i = (I_i - \text{expected value}) / s_i$, so positive Z values indicate a local I statistic larger than the expected value, and negative Z values indicate a local I statistic below the expected value. Also remember that when a Z score is larger than 2 or less than -2, the p-value (using a normal approximation) is < 0.05. Based on the local Moran’s I analysis, what can you say about the spatial correlation of your response?

c) (4 pts) You decide to smooth the data, using the Fay-Herriot method discussed in class. Below is a plot of the observed (raw) values and their smoothed values (eblup’s) for each of the 48 observations. Superimposed on the plot are a solid line along which the smoothed value is the same as the raw value, and a dotted line along which the smoothed value equals the overall mean (7.3). Two observations are numbered in red (1 and 2). These have very similar raw values but quite different smoothed values. Why are their smoothed values so different?

[Figure: scatterplot of smoothed value (vertical axis) against raw value (horizontal axis), both axes running from about 4 to 10, with observations 1 and 2 labeled]

3) (3 pts) Consider a CSR (complete spatial randomness) spatial point process with intensity $\lambda$. The notes derived the expected value of Ripley’s K(x) function for this process. Remember K(x) is the average number of additional events in a circle of radius x centered on an event. You decide it would be informative to study a new function, E(x) (empty circle function), defined as the average number of events in a circle of radius x centered on a randomly chosen location (i.e. not centered on an event). Derive the expected value of E(x) for a CSR process.

4) Shown below are plots of $\hat{g}(x)$ and $\hat{L}_2(x) = \hat{L}(x) - x$ computed from one point pattern. There are 234 points in a 4 x 4 square.
Both plots show the observed function and 95% pointwise confidence envelopes. Both plots are calculated from the same data. One plot shows evidence of clustering; the other does not.

a) (3 pts) Which plot shows evidence of clustering?

b) (4 pts) Explain why the two plots can legitimately tell different stories.

[Figure: L(r) - r against r, for r from 0 to 1, showing the observed curve (obs), the theoretical curve (theo), and the upper (hi) and lower (lo) envelope curves]

[Figure: g(r) against r, for r from 0 to 1, showing the observed curve (obs), the theoretical curve (theo), and the upper (hi) and lower (lo) envelope curves]

5) Data analysis problem (45 pts): The data for this question are based on a study that I worked on a few years ago. The goal of the study, and of your data analysis, is to describe and understand the spatial pattern of farms growing continuous corn. The data are motivated by a real study, but the actual numbers are simulated.

Background: The dominant cropping pattern in central Iowa, since the 1950’s, is a corn/soybean rotation. A field will be planted to corn one year, then soybeans the next, and corn again the 3rd year. Rotation has multiple advantages, including nitrogen returned to the soil by the soybeans and lower pest pressure because corn pests don’t do well on soybeans. However, a farmer (most years) makes a much larger profit from corn than from soybeans. So some farmers choose continuous corn, where corn is planted every year, without any soybean crop. One factor that may influence this decision is the presence nearby of a pig CAFO (concentrated animal feeding operation). These generate large quantities of liquid manure. The easiest way to dispose of this is by land application in the spring. This is very beneficial for corn but not for soybeans (and may be detrimental to them). Liquid manure is very costly to transport, so it tends to get applied close to CAFO’s; land application also requires a large acreage of corn every year. Hence, it is reasonable to suppose that farms with a continuous corn cropping pattern are more frequent near a CAFO.
(As an aside, poultry manure is dry and quite light, so poultry manure from operations in NE Iowa is frequently trucked long distances.)

Although you suspect that continuous corn farms are more frequent (higher intensity) close to CAFO’s, you do not know the form of the relationship between intensity and distance to the nearest CAFO. The relationship could be a log-linear regression on distance ($\log \text{intensity} = \beta_0 + \beta_1\,\text{distance} + \text{error}$), or it could be one intensity for locations within 5 km of a CAFO and another intensity for locations more than 5 km from a CAFO ($\log \text{intensity} = \beta_0 + \beta_1\,\text{close} + \text{error}$), where close has the value 1 for locations within 5 km and 0 otherwise.

The data: two files
cc.csv: locations of the 191 farms with a continuous corn cropping system in the study area
cafo.csv: locations of the 10 CAFO’s in the study area

The study region is a rectangular area of Iowa, 100 km EW and 20 km NS. This can be specified to ppp by window=owin(c(0,100), c(0,20)).

Examine the data, analyze the data, and provide answers for the following scientific questions:

Produce a map of the estimated intensity of continuous corn farms throughout the study area.

Are continuous corn farms clustered, segregated, or randomly located throughout the study area? If they are clustered: estimate characteristics of the clusters. If they are segregated: describe the characteristics of the inhibition.

Is there a linear association between distance to the nearest CAFO (see R notes, below) and the log intensity of continuous corn farms? If so, estimate the regression coefficients and describe how intensity changes with distance from a CAFO.

Which model for log intensity (constant, linear function of distance, or within 5 km/not) best fits the data?

For each scientific question, give me a bulleted list with:
• Your answer to the question (or map).
• What you did to answer the question.
• If you had to make choices to answer the question, what you chose and why you made that choice.

As always, ask if you do not understand one of my questions. I will try to clarify what I am looking for or give you a hint. If I can’t answer your question, I’ll tell you. If you don’t know or remember how to do something in R, ask. I generally won’t write code for you, but I will give suggestions and help debug.

R Notes:

1) To create an image with the distance to the nearest CAFO for any point in the study area, create a ppp for the CAFO locations, then use distmap(). You should get an image that looks like this:

[Figure: image of the distance to the nearest CAFO across the study area, with color scale]

2) To convert that image to an image with 0/1 for distance < 5 km or not, create a copy of the image, then manipulate the v variable, e.g.:

    cafo.dist2 <- cafo.dist
    cafo.dist2$v <- (cafo.dist2$v < 5) + 0

(The v variable contains the value of the image at every location.) Or use eval.im(), which does math on images:

    cafo.dist2 <- eval.im((cafo.dist < 5) + 0)

You should get an image that looks like:

[Figure: 0/1 image marking locations within 5 km of a CAFO, with color scale from 0 to 1]

Check the appropriate statement and sign where indicated.

___ This exam is my work. I received no assistance except perhaps from Dr. Dixon.

___ I received the following assistance. Provide a short summary of who helped and what help you received.

Signed: _________________________________________