Predictor variable preparation - Springer Static Content Server

Appendix S1. Further details on analysis methods
Variable transformations and Factor Analysis
The factor analyses were justified as indicated by the Kaiser-Meyer-Olkin
measure of sampling adequacy (human impact variables: 0.86; environmental variables:
0.59; a minimum of 0.5 is usually considered indicative of data being appropriate for FA,
McGregor 1992) as well as Bartlett's test of sphericity (human impact variables:
χ²=1189.9, df=19, P<0.001; environmental variables: χ²=1147.2, df=28, P<0.001;
McGregor 1992).
The third environmental factor (EFactor3) and the second human impact factor
(HFactor2) revealed from the FA were square root transformed to achieve more
symmetrical distributions before using them as predictors in the GLM.

Accounting for spatial autocorrelation
The abundances investigated were likely to show some spatial autocorrelation (i.e.,
neighboring transects revealing more similar values than more distant ones) unexplained
by the predictor variables included in the model. Such autocorrelation leads to spatially
non-independent residuals, violating a crucial prerequisite of any linear model and
hence reducing its reliability. We thus explicitly incorporated spatial autocorrelation into
the model. We did this using the following approach: First, we ran the full model with all
environmental gradients, the squared terms, the species, the interactions and the offset
term (log transformed transect length) and transect ID included and derived the residuals
from it. Then we calculated, separately for each individual data point (i.e.,
combination of species and individual transect), the weighted average of the residuals of
all other data points of the same species. The weight used was inversely related to the
distance between two transects and followed a Gaussian function. The mean of this
function was set to zero (i.e., maximum weight at a distance of zero) and its standard
deviation was determined by maximizing the likelihood of the full model with the derived
autocorrelation term included.

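The weighting scheme described above can be illustrated with a minimal Python sketch. It is not the authors' self-written R script. It assumes planar transect coordinates with Euclidean distances, and the names autocorrelation_term, select_sd and refit_loglik are hypothetical. The refit_loglik callback stands in for refitting the full model with the autocorrelation term for a given standard deviation and returning its log-likelihood.

```python
import math


def gaussian_weight(distance, sd):
    """Gaussian weight with mean zero: largest at distance 0, decaying with distance."""
    return math.exp(-0.5 * (distance / sd) ** 2)


def autocorrelation_term(coords, residuals, species, sd):
    """For each data point, the weighted mean residual of all OTHER data points
    of the same species (assumption: Euclidean distance on planar coordinates)."""
    terms = []
    for i, (xi, yi) in enumerate(coords):
        weighted_sum = 0.0
        weight_total = 0.0
        for j, (xj, yj) in enumerate(coords):
            if i == j or species[i] != species[j]:
                continue  # skip the point itself and points of other species
            w = gaussian_weight(math.hypot(xi - xj, yi - yj), sd)
            weighted_sum += w * residuals[j]
            weight_total += w
        # Assumption: a point with no same-species neighbours gets a term of 0.
        terms.append(weighted_sum / weight_total if weight_total > 0 else 0.0)
    return terms


def select_sd(candidates, refit_loglik):
    """Pick the candidate standard deviation with the highest log-likelihood.
    refit_loglik is a hypothetical caller-supplied function."""
    return max(candidates, key=refit_loglik)
```

In practice the candidate standard deviations could come from a grid or a one-dimensional optimizer; the text does not say which the authors used.
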
Species-specific analyses
To assess the impact of the different covariates on the abundances of the different species
and to understand the complex patterns of species-specific impacts of the investigated
gradients, we ran separate models for each species and investigated their results with
regard to direction, magnitude and significance of estimates. From these models we
removed squared terms which were clearly not significant (p>0.5). The autocorrelation
term included in these models was that derived from the full model run for all species,
and transect length was again included as an offset variable.

Species model predictions used for identifying core distributional range areas
After extracting the variables for each cell in the country (cell size: 0.05 degrees) and
transforming them when necessary to achieve approximately symmetrical distributions
(Table S1, S2) we subjected them to two separate factor analyses with varimax rotation
(conducted using the function fa of the R-package psych; Revelle 2012), one for the
environmental variables and one for the human impact variables. These were justified as
indicated by the Kaiser-Meyer-Olkin measure of sampling adequacy (environmental
variables: 0.421; human impact variables: 0.755) as well as Bartlett's test of sphericity
(environmental variables: χ²=588.46, df=15, P<0.001; human impact variables:
χ²=550.05, df=21, P<0.001; McGregor 1992). Both revealed two factors with
Eigenvalues larger than one, together explaining roughly 50% of the total variance in the
set of variables (Table S1, S2). We extracted the respective scores (hereafter 'factor
scores') and used them as predictors in the models.
Out of the four factor scores and their squares we constructed all subsets (i.e., a total of
81 models; models that included a squared covariate also included the respective
unsquared covariate). We included the squared terms to allow for non-linear impacts of
the predictors on species abundances (i.e., higher abundance at intermediate values of a
covariate). We fitted the models to the transect data (N=266 transects) using Generalized
Linear Models (McCullagh & Nelder 2008) with negative binomial error structure and
log link function (function glm.nb of the R package MASS; Venables & Ripley 2002). In
some cases a given model could not be fitted. This concerned chimpanzees and cane rat
(one model each), Spot-nosed monkeys (five models), Giant rat (seven models), Warthog
(12 models), Bay duiker (26 models), Buffalo (30 models), and Campbell's monkey (32
models). We discarded the respective models from further consideration. We then
determined for each model the predicted values for all the grid cells in the country and
also its Akaike weight (Burnham & Anderson 2002), and then averaged the predictions,
weighting their contribution by the respective Akaike weights. We used these
estimated abundances per species and grid cell as the basis of the spatial prioritization.

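The model set and the weighted averaging described above can be sketched in Python as follows. This is an illustration only, not the authors' R code. The factor names in FACTORS are hypothetical placeholders, the count of 81 includes the intercept-only model (each of the four factors is absent, linear, or linear plus squared, giving 3^4 = 81), and the weights are assumed to be computed from plain AIC values over the models that could be fitted.

```python
import math
from itertools import product

# Hypothetical names for the two environmental and two human impact factor scores.
FACTORS = ["EFactor1", "EFactor2", "HFactor1", "HFactor2"]


def candidate_models(factors=FACTORS):
    """Enumerate all term sets in which a squared term never occurs
    without its unsquared covariate."""
    models = []
    for states in product((0, 1, 2), repeat=len(factors)):
        terms = []
        for name, state in zip(factors, states):
            if state >= 1:
                terms.append(name)            # linear term
            if state == 2:
                terms.append(name + "^2")     # squared term, only with the linear one
        models.append(terms)
    return models


def akaike_weights(aics):
    """Akaike weights: exp(-0.5 * delta AIC), normalised to sum to one."""
    best = min(aics)
    relative = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(relative)
    return [r / total for r in relative]


def average_predictions(predictions, aics):
    """predictions[m][c] is the abundance predicted by model m for grid cell c.
    Returns the Akaike-weighted mean prediction per grid cell."""
    weights = akaike_weights(aics)
    n_cells = len(predictions[0])
    return [sum(w * p[c] for w, p in zip(weights, predictions)) for c in range(n_cells)]
```

Models that could not be fitted would simply be left out of both lists before calling average_predictions, so that the weights of the remaining models still sum to one.
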
Table S1: Results of the factor analysis for the human impact variables. For each variable
the largest absolute loading is shown in bold.

variable                              transformation   factor 1   factor 2
tr.ht_30                              square root        -0.630      0.177
distance to nearest village           square root         0.782      0.198
distance to major roads               square root         0.609     -0.363
distance to minor roads               square root         0.720     -0.211
human population density              log                -0.399      0.633
minimum distance to protected area    square root        -0.151     -0.613
human population change               square root        -0.295      0.538
Eigenvalue                                                2.167      1.311
prop. variance explained                                  0.310      0.187

Table S2: Results of the factor analysis for the environmental variables. For each variable
the largest absolute loading is shown in bold.

variable                    transformation   factor 1   factor 2
elevation                   square root         0.961     -0.267
tr.ht_40                    square root         0.354      0.143
CTI                         square root        -0.667     -0.016
ht_60                                          -0.210     -0.549
Precipitation seasonality   square root        -0.342      0.504
mean precipitation                             -0.021      0.997
Eigenvalue                                      1.655      1.642
prop. variance explained                        0.276      0.274

Algorithm for identifying core distributional ranges
We first determined as starting point Qtot for the configuration of the selected 20% of grid
cells with maximum relative abundance (Qtot_actual; Appendix S2, Fig. S4). In a second
step we identified those selected cells that had at least one adjacent unselected cell
(adjacent cells were considered those four pixels immediately above, below, to the left
and to the right of the pixel, not diagonal). These cells could potentially be dropped from
the CDRA ('pixels to be dropped'). In the third step we identified those pixels that were
not selected, but that were adjacent to at least one selected pixel following the same
criteria as in step two. These were the pixels that could potentially be included in the
CDRA ('pixels to be included'). In the fourth step we then calculated Qtot for each
combination of pixels to be dropped and pixels to be included and chose that combination
that maximized Qtot (Qtot_new) by keeping 20% of all pixels. In the fifth step we evaluated
if Qtot_new>Qtot_actual, and if this was the case we chose the configuration of the new area as
updated CDRA and repeated steps one to five; if Qtot_new<Qtot_actual we terminated the
search and used the actual configuration as final CDRA. The algorithm was implemented
in R (R Core Team 2012) and applied separately for each species.

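The search described above might be sketched in Python roughly as follows. This is not the authors' R implementation and rests on several assumptions: grid cells are (row, column) tuples, Qtot is supplied by the caller as a function q_tot (its definition is in Appendix S2 and is not reproduced here), each 'combination' is taken to be a one-for-one exchange of a pixel to be dropped for a pixel to be included (the text does not state this), and the search stops when no exchange strictly improves Qtot. All function names are hypothetical.

```python
def neighbours(cell):
    """The four adjacent cells: above, below, left and right (no diagonals)."""
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def boundary_sets(selected, all_cells):
    """Steps two and three: selected cells with an unselected neighbour
    ('pixels to be dropped') and unselected cells with a selected neighbour
    ('pixels to be included')."""
    unselected = all_cells - selected
    to_drop = {cell for cell in selected
               if any(n in unselected for n in neighbours(cell))}
    to_include = {cell for cell in unselected
                  if any(n in selected for n in neighbours(cell))}
    return to_drop, to_include


def improve_once(selected, all_cells, q_tot):
    """Step four (assumed one-for-one exchanges): return the best configuration
    found and its Qtot; the number of selected cells stays constant."""
    to_drop, to_include = boundary_sets(selected, all_cells)
    best, best_q = selected, q_tot(selected)
    for dropped in to_drop:
        for added in to_include:
            candidate = (selected - {dropped}) | {added}
            q = q_tot(candidate)
            if q > best_q:
                best, best_q = candidate, q
    return best, best_q


def find_core_range(selected, all_cells, q_tot):
    """Step five: repeat while Qtot improves, then return the configuration."""
    current, current_q = set(selected), q_tot(selected)
    while True:
        new, new_q = improve_once(current, all_cells, q_tot)
        if new_q <= current_q:
            return current
        current, current_q = new, new_q
```

Because every accepted exchange strictly increases Qtot and the number of configurations is finite, the loop always terminates, although only at a local optimum.
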
Implementation
All analyses were conducted in R (version 2.11; R Development Core Team 2010 or
version 3.1.2; R Core Team 2014). GLMMs were fitted using the function glmmadmb of
the R package glmmADMB (Fournier et al. 2012; Skaug et al. 2014), the FAs were
conducted using the function factanal or the function fa of the package psych (Revelle
2012), and diagnostics of the applicability of FAs were derived using the function paf of
the package rela (Chajewski 2009). The autocorrelation term and the fitting of the
standard deviation of the respective weight function were derived using a self-written
R script. The CCA was conducted and its results were plotted using the functions cca and
plot.cca of the R package vegan (Oksanen et al. 2010).
The identification of species core ranges turned out to be computationally intensive.
We therefore parallelized the algorithm (i.e., ran multiple computations at the same
time) using the R package 'parallel' to achieve reasonable computation times. The
algorithm finished after 8 to 106 iterations (mean = 50, median = 42; average number of
arrangements tested per iteration: 66070) and needed on average 14.9 hours (arithmetic
mean) per species to finish on a quad-core processor with 2.83 GHz.

References
Burnham, KP & Anderson, DR. (2002). Model Selection and Multimodel Inference. 2nd ed. Berlin: Springer.
Chajewski, M. 2009. rela: Scale item analysis. R package version 4.1.
Fournier DA, Skaug HJ, Ancheta J, Ianelli J, Magnusson A, Maunder M, Nielsen A & Sibert J. 2012. AD Model Builder: using automatic differentiation for statistical inference of highly parameterized complex nonlinear models. Optim. Methods Softw., 27, 233-249.
McGregor PK. 1992. Quantifying responses to playback: one, many, or composite multivariate measures? In: McGregor PK. (Ed.): Playback and Studies of Animal Communication. Plenum Press. New York, London.
Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., O'Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H. & Wagner, H. 2010. vegan: Community Ecology Package. R package version 1.17-4.
R Development Core Team. 2010. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria.
R Core Team. 2014. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria.
Skaug H, Fournier D, Bolker B, Magnusson A & Nielsen A. 2014. Generalized Linear Mixed Models using AD Model Builder. R package version 0.8.0.
Revelle, W. 2012. psych: Procedures for Personality and Psychological Research.
Venables, WN & Ripley, BD. 2002. Modern Applied Statistics with S. Fourth Edition. Springer, New York.