S2. Statistical Appendix.
Several distinct statistical procedures are described here. All statistical analyses were conducted in R (version 2.11.1; [1]). Bayesian analyses were conducted using the BRugs package, which interfaces R with OpenBUGS [2]. All approaches follow methods described by McCarthy [3].

This appendix is presented in three parts: a description of the model selection procedure (A), a description of the procedure used to generate Figures 4 and 5 (B), and a description of the process used to generate the means and 95% credible intervals presented in Figure 3 (C).

A. Model selection
In this model selection procedure, a comma-separated values (.csv) file of the data (which corresponds to a spreadsheet in the Data Appendix) is read into R and attached, so that R can 'see' the individual columns as variables. In the example code below, AllAg (all forms of agricultural land cover) is natural-log transformed, as log(AllAg + 1), prior to analysis. As a result, the DIC and parameter estimates in this example are for a model relating consumer tissue δ15N (FF15N) to log-transformed agricultural land cover. The general form of the regression equation was:

Consumer tissue δ15N = β × WatershedProperty + Intercept
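As a minimal sketch of the regression equation above (not the authors' code), the base-R functions below compute log(AllAg + 1), the expected δ15N for a given intercept a and slope b, and simulated observations from the normal model. The helper names (`log_ag`, `predict_d15n`, `simulate_d15n`) and the parameter values in the example are hypothetical. Note that BUGS `dnorm(mean, prec)` takes a precision, so the equivalent standard deviation for R's `rnorm()` is 1/sqrt(prec).

```r
## Hypothetical helpers (not from the original analysis) illustrating the
## model: delta15N ~ Normal(a + b * log(AllAg + 1), precision = prec).

## Natural-log transform used for AllAg; identical to log1p(all_ag)
log_ag <- function(all_ag) log(all_ag + 1)

## Expected consumer tissue delta15N for intercept a and slope b
predict_d15n <- function(all_ag, a, b) a + b * log_ag(all_ag)

## Simulate observations; BUGS dnorm() uses precision, R rnorm() uses sd
simulate_d15n <- function(all_ag, a, b, prec) {
  rnorm(length(all_ag), mean = predict_d15n(all_ag, a, b), sd = 1 / sqrt(prec))
}

## Example with made-up parameter values
set.seed(1)
all_ag <- c(0, 5, 20, 60)
predict_d15n(all_ag, a = 5, b = 1.5)
simulate_d15n(all_ag, a = 5, b = 1.5, prec = 3)
```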
library("BRugs")  ## This loads the BRugs package
Year1RiverFF <- read.csv("C:/Users/jhlarson/Desktop/USGS Science/Stable Isotopes 2011/Analysis/R Model Selection/Year1RiverFF.csv")  ## Loading the dataset
attach(Year1RiverFF)  ## Attaching the imported dataset
LogAllAg <- log(AllAg + 1)
## Regression model - this creates a function that BRugs can use in OpenBUGS
regressionmodel <- function(){
  a ~ dnorm(0, 1.0E-6)         ## Non-informative prior: y-intercept
  b ~ dnorm(0, 1.0E-6)         ## Non-informative prior: slope
  prec ~ dgamma(0.001, 0.001)  ## Non-informative prior: model precision
  sy2 <- pow(sd(y[]), 2)
  R2B <- 1 - 1/(prec*sy2)      ## Bayesian R2
  for (i in 1:11) {            ## 1:N, where N is the number of observations
    mean[i] <- a + b*x[i]
    y[i] ~ dnorm(mean[i], prec)
  }
}
regressionmodelfile <- file.path(tempdir(), "regressionmodel.txt")
model <- writeModel(regressionmodel, regressionmodelfile)
inits <- "C:\\Users\\jhlarson\\Desktop\\USGS Science\\OpenBugs Code\\Regression\\Initials.txt"
## Location of the initials file: a=0, b=0, prec=100
## Test for AllAg.
x <- LogAllAg  ## For convenience, the variable is renamed x to match the model
y <- FF15N     ## For convenience, the variable is renamed y to match the model
bdata <- bugsData(c("y", "x"), digits = 5)  ## This places the data into a form OpenBUGS can read
modelCheck(regressionmodelfile)  ## Tells OpenBUGS to check the model
modelData(bdata)                 ## Tells OpenBUGS to load the data
modelCompile(numChains = 1)      ## Compiles the model with the given number of chains
modelInits(inits)                ## Loads the initials
modelUpdate(50000)               ## Updates the model 50000 times as a burn-in
samplesSet(c("b", "R2B", "a", "prec"))  ## Tells OpenBUGS to keep samples of these variables
dicSet()                         ## Tells OpenBUGS to keep data on DIC
modelUpdate(50000)               ## Updates the model 50000 times to collect samples
YR1R15NLogAllAg <- samplesStats("*")  ## Store the parameter estimates in a designated object
YR1R15NLogAllAgDIC <- dicStats()      ## Store the DIC estimates in a designated object
Larson et al. 1 of 8
This basic approach was repeated for every model tested. The model above could be used for any 1-parameter model. However, for 2-parameter models, this procedure was slightly modified by the addition of a new model:
## Two-parameter regression model
regressionmodel <- function(){
  a ~ dnorm(0, 1.0E-6)
  b1 ~ dnorm(0, 1.0E-6)
  b2 ~ dnorm(0, 1.0E-6)
  prec ~ dgamma(0.001, 0.001)
  sy2 <- pow(sd(y[]), 2)
  R2B <- 1 - 1/(prec*sy2)
  for (i in 1:11) {
    mean[i] <- a + b1*x1[i] + b2*x2[i]
    y[i] ~ dnorm(mean[i], prec)
  }
}
regressionmodelfile <- file.path(tempdir(), "regressionmodel.txt")
model <- writeModel(regressionmodel, regressionmodelfile)
inits <- "C:\\Users\\jhlarson\\Desktop\\USGS Science\\OpenBugs Code\\Regression\\Initials2variables.txt"
x1 <- Wdep
x2 <- LogAllAg
y <- FF15N
bdata <- bugsData(c("y", "x1", "x2"), digits = 5)
In this example, initials were set at a=0, b1=0, b2=0 and prec=100. Other aspects of the procedure were identical, except that parameters a, b1, b2, R2B and prec were monitored.
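The node R2B monitored in these models is a Bayesian R²: 1/prec is the residual variance, so R2B = 1 - (residual variance)/(variance of y). A minimal base-R sketch, assuming a vector of posterior draws of prec is available, shows how each draw of prec maps to a draw of R2B. The function name `bayes_r2` and the example values are hypothetical. R's `sd()` uses the n - 1 denominator; whether OpenBUGS's `sd()` does the same is an assumption not checked here.

```r
## Hypothetical sketch: Bayesian R2 as defined by the R2B node,
## R2B = 1 - 1 / (prec * sy2), with sy2 the sample variance of y.
bayes_r2 <- function(prec_draws, y) {
  sy2 <- sd(y)^2                 # analogous to pow(sd(y[]), 2) in the BUGS code
  1 - 1 / (prec_draws * sy2)     # one R2 value per posterior draw of prec
}

## Example with made-up values
y <- c(1, 2, 3, 4, 5)            # sample variance 2.5
r2 <- bayes_r2(c(1, 0.4), y)
r2
```

Because 1/prec can exceed the variance of y, individual draws of this quantity can be negative for poorly fitting models.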
The models above use non-informative prior distributions. They were fitted to the data from Larson et al. [4] to generate informative prior distributions for the analysis of the new data; this re-analysis of the earlier data is summarized in Appendix Table 1. In the example below, values from Appendix Table 1 are used to specify informative prior distributions for the model parameters and the model precision.
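Appendix Table 1 is not reproduced here, so the exact conversion from its summaries to the prior parameters is not shown. As a hedged sketch, assuming the table reports posterior means and standard deviations, a common method-of-moments conversion would be: BUGS `dnorm(mean, precision)` with precision = 1/sd², and BUGS `dgamma(shape, rate)` with shape = mean²/sd² and rate = mean/sd². The helper names are hypothetical. For reference, the prior `b ~ dnorm(1.497, 5.4641)` in the example that follows corresponds to a standard deviation of about 0.43, and `prec ~ dgamma(13.164, 4.5087)` has a mean of about 2.92.

```r
## Hypothetical helpers: method-of-moments conversion of a posterior summary
## (mean m, standard deviation s) into BUGS prior parameters. This is an
## assumed procedure, not necessarily the one used for Appendix Table 1.

## BUGS dnorm(mean, precision): precision = 1 / s^2
normal_prior_from_summary <- function(m, s) {
  c(mean = m, precision = 1 / s^2)
}

## BUGS dgamma(shape, rate): mean = shape / rate, variance = shape / rate^2
gamma_prior_from_summary <- function(m, s) {
  c(shape = m^2 / s^2, rate = m / s^2)
}

## Round trip: recover dgamma(13.164, 4.5087) from its mean and sd
shape <- 13.164
rate <- 4.5087
gamma_prior_from_summary(m = shape / rate, s = sqrt(shape) / rate)

## Implied standard deviation of the slope prior dnorm(1.497, 5.4641)
1 / sqrt(5.4641)
```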
RiverFF <- read.csv("C:/Users/jhlarson/Desktop/USGS Science/Stable Isotopes 2011/Analysis/R Model Selection/RiverFF.csv")
attach(RiverFF)
LogAllAg <- log(AllAg + 1)
## Regression model with informative priors (values from Appendix Table 1)
regressionmodel <- function(){
  a ~ dnorm(5.213, 0.5259)
  b ~ dnorm(1.497, 5.4641)
  prec ~ dgamma(13.164, 4.5087)
  sy2 <- pow(sd(y[]), 2)
  R2B <- 1 - 1/(prec*sy2)
  for (i in 1:22) {
    mean[i] <- a + b*x[i]
    y[i] ~ dnorm(mean[i], prec)
  }
}
regressionmodelfile <- file.path(tempdir(), "regressionmodel.txt")
model <- writeModel(regressionmodel, regressionmodelfile)
inits <- "C:\\Users\\jhlarson\\Desktop\\USGS Science\\OpenBugs Code\\Regression\\Initials.txt"
## Test for AllAg.
x <- LogAllAg
y <- FF15N
bdata <- bugsData(c("y", "x"), digits = 5)
modelCheck(regressionmodelfile)
modelData(bdata)
modelCompile(numChains = 1)
modelInits(inits)
modelUpdate(50000)
samplesSet(c("a", "b", "R2B", "prec"))
dicSet()                          ## Tells OpenBUGS to keep data on DIC
modelUpdate(50000)
R15NLogAllAg <- samplesStats("*")
R15NLogAllAgDIC <- dicStats()     ## Store the DIC estimates in a designated object
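`dicStats()` in the code above returns the DIC computed by OpenBUGS. To show what that quantity measures, here is a minimal base-R sketch of the standard definition for this normal regression: Dbar is the posterior mean deviance, Dhat the deviance at the posterior means of a, b and prec, pD = Dbar - Dhat the effective number of parameters, and DIC = Dbar + pD; lower DIC indicates a better balance of fit and complexity. The functions, simulated data and simulated "posterior" draws are hypothetical, and OpenBUGS's own computation may differ in detail.

```r
## Hypothetical sketch of DIC for y ~ Normal(a + b * x, precision = prec),
## computed from posterior draws.

## Deviance = -2 * log-likelihood (sd = 1 / sqrt(prec), as in BUGS dnorm)
normal_deviance <- function(y, x, a, b, prec) {
  -2 * sum(dnorm(y, mean = a + b * x, sd = 1 / sqrt(prec), log = TRUE))
}

## draws: data frame with columns a, b, prec (one row per MCMC iteration)
dic_normal_regression <- function(y, x, draws) {
  dev <- mapply(function(a, b, prec) normal_deviance(y, x, a, b, prec),
                draws$a, draws$b, draws$prec)
  dbar <- mean(dev)                                   # posterior mean deviance
  dhat <- normal_deviance(y, x, mean(draws$a), mean(draws$b), mean(draws$prec))
  pd <- dbar - dhat                                   # effective no. of parameters
  c(Dbar = dbar, Dhat = dhat, pD = pd, DIC = dbar + pd)
}

## Example with simulated data and simulated draws (illustration only)
set.seed(42)
x <- seq(0, 4.6, length.out = 22)
y <- 5 + 1.5 * x + rnorm(22, sd = 0.6)
draws <- data.frame(a = rnorm(2000, 5, 0.2),
                    b = rnorm(2000, 1.5, 0.08),
                    prec = rgamma(2000, shape = 11, rate = 4))
dic_normal_regression(y, x, draws)
```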
B. Visualizing the model and 95% credible intervals
Creating a visual representation of the model plus 95% credible intervals can be done by generating predictions from the model across a range of predictor values and displaying those predictions in a graphic. Although not necessarily the best way to do this, we made these estimates in R using BRugs, then exported the resulting estimates to Excel to build a figure. In this example, we calculated predictions across the range of possible values, following a procedure suggested by McCarthy [3].
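The BRugs code that follows monitors one node per predictor value. Equivalently, once posterior draws of a and b are available, the prediction a + b·x and its 95% credible limits can be computed for a whole grid of x values in base R. This sketch assumes the draws are already available as numeric vectors (extracting them from OpenBUGS is not shown); the function name, the simulated draws and the output file name are hypothetical. The resulting table can be written to .csv for plotting in a spreadsheet, as was done here with Excel.

```r
## Hypothetical sketch: posterior mean, median and 95% credible limits of
## the regression line a + b * x over a grid of x = log(AllAg + 1) values.
prediction_band <- function(a_draws, b_draws, x_grid,
                            probs = c(0.025, 0.5, 0.975)) {
  ## rows = MCMC draws, columns = grid values; entry = a + b * x
  pred <- outer(a_draws, rep(1, length(x_grid))) + outer(b_draws, x_grid)
  q <- apply(pred, 2, quantile, probs = probs)   # 3 x length(x_grid) matrix
  data.frame(x = x_grid, mean = colMeans(pred),
             lower = q[1, ], median = q[2, ], upper = q[3, ])
}

## Example with made-up draws standing in for MCMC output
set.seed(7)
a_draws <- rnorm(5000, mean = 5.2, sd = 0.4)
b_draws <- rnorm(5000, mean = 1.5, sd = 0.1)
band <- prediction_band(a_draws, b_draws, x_grid = seq(0, 5, by = 0.25))
head(band)

## Export for plotting elsewhere (e.g. a spreadsheet)
write.csv(band, file.path(tempdir(), "prediction_band.csv"), row.names = FALSE)
```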
RiverFF <- read.csv("C:/Users/jhlarson/Desktop/USGS Science/Stable Isotopes 2011/Analysis/R Model Selection/RiverFF.csv")
attach(RiverFF)
LogAllAg <- log(AllAg + 1)
## Regression model
regressionmodel <- function(){
  a ~ dnorm(5.213, 0.5259)
  b ~ dnorm(1.497, 5.4641)
  prec ~ dgamma(13.164, 4.5087)
  prediction0.15 <- a + b*0.15  ## This generates the prediction for a particular value
  prediction0.25 <- a + b*0.25
  prediction0.35 <- a + b*0.35
  prediction0.5 <- a + b*0.5
  prediction1 <- a + b*1
  prediction1.6 <- a + b*1.6
  prediction1.8 <- a + b*1.8
  prediction2 <- a + b*2
  prediction2.2 <- a + b*2.2
  prediction2.4 <- a + b*2.4
  prediction2.6 <- a + b*2.6
  prediction3 <- a + b*3
  prediction3.1 <- a + b*3.1
  prediction3.2 <- a + b*3.2
  prediction3.3 <- a + b*3.3
  prediction3.4 <- a + b*3.4
  prediction3.5 <- a + b*3.5
  prediction3.6 <- a + b*3.6
  prediction3.7 <- a + b*3.7
  prediction3.8 <- a + b*3.8
  prediction3.9 <- a + b*3.9
  prediction4 <- a + b*4
  prediction4.1 <- a + b*4.1
  prediction4.2 <- a + b*4.2
  prediction4.3 <- a + b*4.3
  prediction4.4 <- a + b*4.4
  prediction4.45 <- a + b*4.45
  prediction4.5 <- a + b*4.5
  prediction4.55 <- a + b*4.55
  prediction4.6 <- a + b*4.6
  prediction4.65 <- a + b*4.65
  prediction5 <- a + b*5
  sy2 <- pow(sd(y[]), 2)
  for (i in 1:22) {
    mean[i] <- a + b*x[i]
    y[i] ~ dnorm(mean[i], prec)
  }
}
regressionmodelfile <- file.path(tempdir(), "regressionmodel.txt")
model <- writeModel(regressionmodel, regressionmodelfile)
inits <- "C:\\Users\\jhlarson\\Desktop\\USGS Science\\OpenBugs Code\\Regression\\Initials.txt"
## Test for AllAg.
x <- LogAllAg
y <- FF15N
bdata <- bugsData(c("y", "x"), digits = 5)
modelCheck(regressionmodelfile)
modelData(bdata)
modelCompile(numChains = 1)
modelInits(inits)
modelUpdate(50000)
samplesSet(c("a", "b",
             "prediction0.15", "prediction0.25", "prediction0.35", "prediction0.5",
             "prediction1", "prediction1.6", "prediction1.8", "prediction2",
             "prediction2.2", "prediction2.4", "prediction2.6", "prediction3",
             "prediction3.1", "prediction3.2", "prediction3.3", "prediction3.4",
             "prediction3.5", "prediction3.6", "prediction3.7", "prediction3.8",
             "prediction3.9", "prediction4", "prediction4.1", "prediction4.2",
             "prediction4.3", "prediction4.4", "prediction4.45", "prediction4.5",
             "prediction4.55", "prediction4.6", "prediction4.65", "prediction5"))
modelUpdate(50000)
LogAgPredictions <- samplesStats("*")
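Typing 32 prediction nodes by hand is error-prone. As an optional convenience (not part of the original workflow), the node names and their BUGS definitions can be generated from a single vector of x values with `paste0()`; the object names here are hypothetical. The resulting `pred_nodes` vector could be passed as `samplesSet(c("a", "b", pred_nodes))` in place of the long literal list, although this has not been tested against OpenBUGS.

```r
## Hypothetical convenience: build the prediction node names and the matching
## BUGS lines from the x values used above.
x_grid <- c(0.15, 0.25, 0.35, 0.5, 1, 1.6, 1.8, 2, 2.2, 2.4, 2.6,
            3, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9,
            4, 4.1, 4.2, 4.3, 4.4, 4.45, 4.5, 4.55, 4.6, 4.65, 5)

## as.character() drops trailing zeros, so 1 becomes "prediction1"
pred_nodes <- paste0("prediction", x_grid)

## Lines that could be pasted into the model definition
bugs_lines <- paste0(pred_nodes, " <- a + b*", x_grid)
cat(head(bugs_lines, 3), sep = "\n")
```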
C. Descriptive statistics
Estimating a mean using a Bayesian approach includes estimating its 95% credible interval [3]. This allows a simple (and conservative) comparison among means: where the 95% credible intervals of two means overlapped, the means were not considered different. The following code was used to estimate the mean and 95% credible interval of consumer tissue δ15N.
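A minimal base-R sketch of the comparison described above, assuming posterior draws of each mean are available as numeric vectors: the 95% credible interval is taken from the 2.5% and 97.5% quantiles of the draws, and two intervals are then checked for overlap. The function names and simulated draws are hypothetical.

```r
## Hypothetical sketch: equal-tailed credible interval from posterior draws
credible_interval <- function(draws, level = 0.95) {
  tail_prob <- (1 - level) / 2
  unname(quantile(draws, probs = c(tail_prob, 1 - tail_prob)))
}

## TRUE when two intervals c(lower, upper) share at least one point
intervals_overlap <- function(ci1, ci2) {
  ci1[1] <= ci2[2] && ci2[1] <= ci1[2]
}

## Example with simulated draws for two site groups (illustration only)
set.seed(3)
ci_group1 <- credible_interval(rnorm(10000, mean = 9, sd = 0.3))
ci_group2 <- credible_interval(rnorm(10000, mean = 11, sd = 0.3))
intervals_overlap(ci_group1, ci_group2)   # FALSE: the intervals are well separated
```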
## First the data file is loaded.
RiverFF <- read.csv("C:/Users/jhlarson/Desktop/USGS Science/Stable Isotopes 2011/Analysis/R Model Selection/RiverFF.csv")
attach(RiverFF)  ## This allows R to read columns as variables
## This defines the function for OpenBUGS (the naming of this function is arbitrary)
regressionmodel <- function(){
  for (i in 1:22) {
    x[i] ~ dnorm(mu[1], tau[1])
  }
  mu[1] ~ dnorm(0, 0.0001)       ## Non-informative prior distributions were used
  tau[1] ~ dgamma(0.001, 0.001)
}
regressionmodelfile <- file.path(tempdir(), "regressionmodel.txt")
model <- writeModel(regressionmodel, regressionmodelfile)
inits <- "C:\\Users\\jhlarson\\Desktop\\USGS Science\\OpenBugs Code\\1meansinitials.txt"  ## Location of the initials file
x <- FF15N  ## Renaming the variable to the term used in the function
bdata <- bugsData(c("x"), digits = 5)  ## This prepares the data in the format OpenBUGS uses
## The following steps are the same as described above.
modelCheck(regressionmodelfile)
modelData(bdata)
modelCompile(numChains = 1)
modelInits(inits)
modelUpdate(50000)
samplesSet(c("mu", "tau"))
modelUpdate(50000)
MeanFF15N <- samplesStats("*")
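OpenBUGS estimates mu and tau in this model by MCMC. Because the model (normal data, normal prior on the mean, gamma prior on the precision) is semi-conjugate, both full conditional distributions are standard, so a short Gibbs sampler in base R can illustrate what is being computed. This is a sketch for intuition and checking, not a reproduction of the sampler OpenBUGS uses internally; the function name, the defaults (chosen to mirror the priors dnorm(0, 0.0001) and dgamma(0.001, 0.001) above) and the simulated data are assumptions.

```r
## Hypothetical Gibbs sampler for x[i] ~ Normal(mu, precision tau),
## mu ~ Normal(m0, precision t0), tau ~ Gamma(shape a0, rate b0).
gibbs_normal_mean <- function(x, n_iter = 20000, burn_in = 5000,
                              m0 = 0, t0 = 1e-4, a0 = 0.001, b0 = 0.001) {
  n <- length(x)
  mu <- mean(x)                      # starting values
  tau <- 1 / var(x)
  out <- matrix(NA_real_, nrow = n_iter, ncol = 2,
                dimnames = list(NULL, c("mu", "tau")))
  for (it in seq_len(n_iter)) {
    ## mu | tau, x is normal
    prec_mu <- t0 + n * tau
    mu <- rnorm(1, mean = (t0 * m0 + tau * sum(x)) / prec_mu,
                sd = 1 / sqrt(prec_mu))
    ## tau | mu, x is gamma
    tau <- rgamma(1, shape = a0 + n / 2, rate = b0 + sum((x - mu)^2) / 2)
    out[it, ] <- c(mu, tau)
  }
  out[-seq_len(burn_in), , drop = FALSE]   # discard burn-in draws
}

## Example with simulated data (22 values, as for the river sites)
set.seed(11)
x_sim <- rnorm(22, mean = 10, sd = 1.2)
post <- gibbs_normal_mean(x_sim)
mean(post[, "mu"])                                 # close to mean(x_sim)
quantile(post[, "mu"], probs = c(0.025, 0.975))    # 95% credible interval
```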
## The same process is repeated for the RM sites below
RMFF <- read.csv("C:/Users/jhlarson/Desktop/USGS Science/Stable Isotopes 2011/Analysis/RM Model Selection/RMFF.csv")
attach(RMFF)
## Regression model
regressionmodel <- function(){
  for (i in 1:21) {
    x[i] ~ dnorm(mu[1], tau[1])
  }
  mu[1] ~ dnorm(0, 0.0001)
  tau[1] ~ dgamma(0.001, 0.001)
}
regressionmodelfile <- file.path(tempdir(), "regressionmodel.txt")
model <- writeModel(regressionmodel, regressionmodelfile)
inits <- "C:\\Users\\jhlarson\\Desktop\\USGS Science\\OpenBugs Code\\1meansinitials.txt"
x <- FF15N
bdata <- bugsData(c("x"), digits = 5)
modelCheck(regressionmodelfile)
modelData(bdata)
modelCompile(numChains = 1)
modelInits(inits)
modelUpdate(50000)
samplesSet(c("mu", "tau"))
modelUpdate(50000)
RMMeanFF15N <- samplesStats("*")

References
1. R Development Core Team (2010) R: A language and environment for statistical computing.
2. Openbugs T, Best N, Lunn D (2007) The BRugs Package.
3. McCarthy M (2007) Bayesian methods for ecology. New York, New York, USA: Cambridge University Press.
4. Larson JH, Richardson WB, Vallazza JM, Nelson JC (2012) An exploratory investigation of the landscape-lake interface: Land cover controls over consumer N and C isotopic composition in Lake Michigan rivermouths. Journal of Great Lakes Research 38: 610–619.