A continental risk assessment of West Nile virus under climate change
Ryan J. Harrigan, Henri A. Thomassen, Wolfgang Buermann, Thomas B. Smith

Supplementary Methods and Results

Single Model Parameters Used Within BIOMOD and Maxent
Models were run in BIOMOD (or in Maxent) using the following parameters:
ANN: ANN = T, CV.ann = 3
CTA: CTA = T, CV.tree = 50
MDA: MDA = T
GAM: GAM = T, Spline = 3
GBM: GBM = T, No.trees = 2000
GLM: GLM = T, TypeGLM = "poly", Test = "AIC"
MARS: MARS = T
MAXENT: 10000 background points, regularization multiplier = 1.0, maximum iterations = 500, convergence threshold = 0.00005
SRE: SRE = T, quant = 0.025
RF: RF = T

Single Model Performance
Among the species distribution models used to make these predictions, significant differences in performance were found, with Maxent (HSD = 0.09, p < 0.01) and two of the nonlinear recursive bifurcating models, Random Forests (HSD = 0.048, p = 0.22) and Generalized Boosting Models (HSD = 0.053, p = 0.17), ranking among the best performers in terms of Receiver Operating Characteristic (ROC) scores (Fig. 2A). Another group of models, including MARS, GAM, GLM, and MDA, performed moderately well, whereas two model types, CTA and ANN, generally had significantly lower ROC scores than the other models (Fig. 2B). Although mean ROC scores were comparable among all models, three of the nine models (Maxent, Generalized Boosting Models, and Random Forests) performed well across all datasets, comprising the top three models according to ROC on over 70% of the datasets and runs performed. In addition, at least two of these three models were ranked among the top three models according to ROC in 100% of all runs on all datasets (Table S4).
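ROC scores of this kind are conventionally the area under the ROC curve (AUC). The following is a minimal sketch that assumes that convention and does not reproduce BIOMOD's own evaluation code. It computes AUC as the probability that a randomly chosen presence record receives a higher predicted probability than a randomly chosen absence (or background) record, with ties counting one half. The function `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(presence_scores, absence_scores):
    """Area under the ROC curve via the Mann-Whitney formulation.

    Every presence prediction is compared with every absence prediction:
    a correctly ranked pair scores 1, a tie scores 0.5. O(n*m), fine for illustration.
    """
    if not presence_scores or not absence_scores:
        raise ValueError("need at least one presence and one absence score")
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted probabilities at presence and absence sites.
print(roc_auc([0.9, 0.8], [0.1, 0.85]))  # 3 of 4 pairs ranked correctly -> 0.75
```

A value of 0.5 corresponds to ranking no better than random, and 1.0 to perfect separation of presence and absence sites.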
Current Probability of Presence
Annual variation in the probability of West Nile virus presence depended on whether presence was measured in vectors, primary (avian) hosts, or secondary (human) hosts, and on how long the virus had occurred in an area (Fig. 3, Figs S1, S2), but patterns remained broadly consistent. Earlier years, particularly 2003 and 2004, represent seasons in which WNV had not yet reached, or had only just reached, much of the West Coast and Pacific Northwest; as a result, these areas were not identified as particularly suitable for WNV presence (Fig. 3). The low predicted probability of WNV presence in this area reflects an inability of our models to predict future outbreaks in regions that are ecologically distinct from those where the disease has previously occurred, i.e., areas where the disease is infecting naïve hosts and expanding its niche breadth. Some of the year-to-year variation in predictions also reflects variation in screening and reporting once West Nile virus had become established in an area. In particular, the precipitous decrease in West Nile virus predicted in primary hosts (birds) likely reflects the fact that many vector control agencies discontinued screening for West Nile virus in birds across the country (2,161 counties reported WNV positives or negatives in birds in 2003, compared with 286 in 2011; Fig. S1). Although vector and human screening for West Nile virus are likely to give less biased estimates, we also recognize the possibility of variation in the ability to detect and report human cases of WNV at a continental scale, as well as differences in vector screening efforts among local control agencies across the country.
Occurrence Changes Across Study Years
Increases in the probability of West Nile virus presence from the earliest season analyzed (2003) to the most recent (2011) were found in the western United States, particularly California and the Pacific Northwest, as well as in many regions of the eastern and northeastern United States (Fig. S3). In contrast, the probability of presence decreased in the Midwestern United States, a region that often records high numbers of West Nile virus cases (CDC, 2012). Other areas, such as the Gulf Coast and southern Texas, showed no change in predicted probability of presence. However, the consistently high probabilities in these areas suggest that proper management and control of vectors should be of paramount importance there, as these regions are rarely spared annual West Nile virus outbreaks. Increases in the probability of presence indicate potential targets for management in future WNV seasons, as changes in climate and/or anthropogenic activity may lead to higher numbers of WNV infections in these areas.
Table S1. Bioclimatic variables used in ensemble models and future projections of West Nile virus in North America.
Variable   Description
____________________________________________________________
BIO1       Annual Mean Temperature
BIO2       Mean Diurnal Range (Mean of Monthly (Max Temperature – Min Temperature))
BIO4       Temperature Standard Deviation (Seasonality)
BIO5       Maximum Temperature of Warmest Month
BIO6       Minimum Temperature of Coldest Month
BIO12      Annual Precipitation
BIO15      Precipitation Coefficient of Variation (Seasonality)
BIO19      Precipitation of the Coldest Quarter
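The eight variables in Table S1 are WorldClim-style summaries of monthly climate. The minimal sketch below derives them from 12 monthly values of minimum temperature, maximum temperature, and precipitation. It assumes common conventions: monthly mean temperature = (Tmin + Tmax)/2, sample standard deviations, BIO4 and BIO15 scaled by 100, and quarters defined as three consecutive months that wrap from December to January. Gridded climate products may differ in units and scaling. The names `bioclim_subset` and `rolling_quarter_sums` are hypothetical.

```python
import statistics


def rolling_quarter_sums(values):
    """Sums over the 12 windows of three consecutive months (December wraps to January)."""
    n = len(values)
    return [sum(values[(i + k) % n] for k in range(3)) for i in range(n)]


def bioclim_subset(tmin, tmax, prec):
    """Compute BIO1, 2, 4, 5, 6, 12, 15, and 19 from 12 monthly values.

    tmin, tmax: monthly mean minimum / maximum temperature (deg C)
    prec: monthly total precipitation (mm)
    """
    if not len(tmin) == len(tmax) == len(prec) == 12:
        raise ValueError("expected 12 monthly values for each input")
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    # The coldest quarter is the 3-month window with the lowest mean temperature;
    # comparing window sums gives the same ordering as comparing window means.
    t_quarters = rolling_quarter_sums(tavg)
    p_quarters = rolling_quarter_sums(prec)
    coldest = min(range(12), key=lambda i: t_quarters[i])
    mean_prec = statistics.mean(prec)
    return {
        "BIO1": statistics.mean(tavg),
        "BIO2": statistics.mean([hi - lo for lo, hi in zip(tmin, tmax)]),
        "BIO4": statistics.stdev(tavg) * 100,
        "BIO5": max(tmax),
        "BIO6": min(tmin),
        "BIO12": sum(prec),
        # Coefficient of variation in percent; guard against an all-zero year.
        "BIO15": statistics.stdev(prec) / mean_prec * 100 if mean_prec else 0.0,
        "BIO19": p_quarters[coldest],
    }


# Hypothetical monthly climate for one grid cell (January first).
print(bioclim_subset(list(range(12)), [i + 10 for i in range(12)], list(range(1, 13))))
```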
Table S2. Importance rankings, measured as the mean decrease in accuracy when a variable is removed from the model, for the environmental variables used in ensemble modeling. Variables are listed in order of importance for Maxent and for the other two top-ranked SDMs implemented in the BIOMOD package (Thuiller et al., 2009); together, these were the top three ranked models in over 70% of all model runs. Results are shown separately because variable importance scores are calculated differently for Maxent than for the other two SDMs.

Rank   Maxent   GBM/RF
1.     Bio5     Bio5
2.     Bio19    Bio12
3.     Bio6     Bio2
4.     Bio1     Bio15
5.     Bio2     Bio4
6.     Bio4     Bio19
7.     Bio12    Bio1
8.     Bio15    Bio6
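Table S2 ranks variables by the drop in accuracy when a variable's information is removed. Permutation importance is one generic way to estimate such a drop. It shuffles one predictor across records, re-scores the fitted model, and averages the loss in accuracy over several shuffles. The sketch below uses only the Python standard library and a hypothetical threshold "model". It illustrates the idea and is not necessarily the exact computation performed by BIOMOD or Maxent.

```python
import random


def accuracy(model, X, y):
    """Fraction of records whose predicted class matches the observed class."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean decrease in accuracy when each predictor column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between predictor j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical data: the two columns stand in for Bio5 (deg C) and Bio12 (mm).
data_rng = random.Random(1)
X = [[data_rng.uniform(0, 40), data_rng.uniform(0, 2000)] for _ in range(200)]


def threshold_model(row):
    """Hypothetical classifier: presence when the first predictor exceeds 30."""
    return int(row[0] > 30)


y = [threshold_model(row) for row in X]
# The model ignores the second predictor, so its importance is exactly 0.0.
print(permutation_importance(threshold_model, X, y))
```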
Table S3. Ten species distribution models, with a short description of each, used in model performance comparisons and in the final weighted ensemble models for vector, primary host, and human WNV occurrence in North America for the years 2003–2011.

Abbreviation   Model Name                                  Description and References
ANN            Artificial Neural Networks                  Non-linear regression models with multiple parameters, allowing for smoothed relationships (Ripley 1996)
CTA            Classification Tree Analysis                Bifurcating recursive partitioning model that uses a single run to split a response into groups with minimal deviance (Ripley 1996)
MDA            Multiple Discriminant Analysis              Supervised classification model that can use a variety of regression models to optimize predictive power (Hastie et al. 1994)
GAM            Generalized Additive Models                 Nonparametric model using a "smoother" algorithm, useful when relationships between predictors and response are expected to be complex (Hastie and Tibshirani 1990)
GBM            Generalized Boosting Models                 Model using iterations of regression trees, each attempting to predict the residuals left by previous trees (Friedman 2001)
GLM            Generalized Linear Models                   Linear models allowing for polynomial predictor terms (McCullagh and Nelder 1989)
MARS           Multivariate Adaptive Regression Splines    Adaptive regression technique allowing the relationship between response and predictor to vary across different ranges of the predictor (Friedman 1991)
MAXENT         Maximum Entropy                             Models presence and pseudo-absence (background) data using maximum entropy of point locations (Phillips et al. 2006)
RF             Random Forests                              Models using a series of tree regressions whereby random sets of both predictors and records are selected for training and tested against "out-of-bag" data (Breiman 2001)
SRE            Surface Range Envelopes                     Envelope based on minimum and maximum values of predictors at occurrence points (Busby 1991)
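The SRE description, together with the `quant = 0.025` setting listed under Single Model Parameters, suggests a trimmed envelope. The minimal sketch below assumes that `quant` is the fraction trimmed from each tail of every predictor's distribution at occurrence points, so the envelope spans roughly the 2.5th to 97.5th percentiles. Under that assumption, a site is predicted suitable only if all of its predictor values fall inside the envelope. The functions and data are hypothetical, and the quantile uses linear interpolation between order statistics.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def fit_envelope(occurrences, quant=0.025):
    """Per-predictor (lower, upper) bounds after trimming `quant` from each tail."""
    bounds = []
    for j in range(len(occurrences[0])):
        column = sorted(row[j] for row in occurrences)
        bounds.append((quantile(column, quant), quantile(column, 1 - quant)))
    return bounds


def predict_envelope(bounds, site):
    """1 if every predictor lies inside its bounds (a rectilinear envelope), else 0."""
    return int(all(lo <= v <= hi for v, (lo, hi) in zip(site, bounds)))


# Hypothetical occurrence records: (Bio5 in deg C, Bio12 in mm).
occurrences = [(t, 10 * t) for t in range(101)]
bounds = fit_envelope(occurrences)
print(bounds)
print(predict_envelope(bounds, (50, 500)), predict_envelope(bounds, (1, 500)))  # 1 0
```

Setting `quant=0.0` recovers the plain minimum-to-maximum envelope described in the table.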
Table S4. Performance of nine species distribution models based on Receiver Operating Characteristic (ROC) for each of 27 different datasets used in final ensemble models. SRE models received no ROC as they do not generate testable models but rather represent a rectilinear envelope (Thuiller et al. 2009).

Vectors
Year   ANN     CTA     GAM     GBM     GLM     MARS    MAXENT  MDA     RF
2003   0.63    0.746   0.777   0.813   0.787   0.794   0.832   0.784   0.85
2004   0.58    0.683   0.698   0.749   0.717   0.726   0.807   0.697   0.78
2005   0.657   0.725   0.78    0.785   0.784   0.784   0.803   0.764   0.814
2006   0.678   0.737   0.815   0.833   0.802   0.821   0.868   0.757   0.867
2007   0.609   0.726   0.772   0.78    0.759   0.772   0.855   0.755   0.818
2008   0.542   0.629   0.677   0.688   0.694   0.681   0.871   0.612   0.751
2009   0.654   0.629   0.734   0.71    0.717   0.715   0.835   0.723   0.658
2010   0.557   0.696   0.735   0.756   0.744   0.743   0.911   0.694   0.772
2011   0.55    0.704   0.709   0.758   0.738   0.738   0.877   0.713   0.814

Primary Hosts
Year   ANN     CTA     GAM     GBM     GLM     MARS    MAXENT  MDA     RF
2003   0.703   0.773   0.819   0.836   0.802   0.823   0.838   0.784   0.849
2004   0.59    0.702   0.713   0.737   0.704   0.717   0.805   0.709   0.781
2005   0.683   0.721   0.769   0.788   0.766   0.768   0.864   0.746   0.79
2006   0.618   0.724   0.779   0.786   0.778   0.785   0.836   0.743   0.793
2007   0.675   0.707   0.777   0.789   0.765   0.79    0.797   0.776   0.825
2008   0.622   0.696   0.744   0.767   0.739   0.764   0.832   0.739   0.803
2009   0.603   0.658   0.755   0.772   0.705   0.753   0.851   0.665   0.763
2010   0.684   0.723   0.763   0.818   0.745   0.762   0.923   0.753   0.803
2011   0.61    0.603   0.8     0.714   0.777   0.749   0.935   0.746   0.775

Secondary Hosts
Year   ANN     CTA     GAM     GBM     GLM     MARS    MAXENT  MDA     RF
2003   0.705   0.762   0.795   0.816   0.779   0.802   0.82    0.771   0.85
2004   0.624   0.697   0.729   0.756   0.721   0.738   0.813   0.735   0.77
2005   0.71    0.75    0.798   0.805   0.799   0.808   0.816   0.761   0.829
2006   0.667   0.723   0.729   0.769   0.715   0.754   0.799   0.7     0.81
2007   0.711   0.718   0.77    0.777   0.769   0.764   0.805   0.743   0.814
2008   0.63    0.654   0.677   0.704   0.649   0.705   0.823   0.667   0.775
2009   0.673   0.686   0.742   0.762   0.738   0.769   0.785   0.69    0.791
2010   0.62    0.672   0.739   0.751   0.729   0.725   0.796   0.693   0.816
2011   0.554   0.601   0.652   0.705   0.631   0.662   0.854   0.574   0.777
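The top-three rankings discussed under Single Model Performance can be checked mechanically. The sketch below transcribes five rows of Table S4 and counts how often Maxent, GBM, and RF occupy the top three ROC positions. The table gives one score per dataset, whereas the percentages in the text refer to individual runs, so counts on these rows are not expected to reproduce the reported 70% and 100% figures. The names `ROC`, `FOCAL`, `top_k`, and `top_k_summary` are hypothetical.

```python
# Five rows transcribed from Table S4: ROC per model for one (dataset, year).
ROC = {
    ("Vectors", 2003): {"ANN": 0.63, "CTA": 0.746, "GAM": 0.777, "GBM": 0.813,
                        "GLM": 0.787, "MARS": 0.794, "MAXENT": 0.832,
                        "MDA": 0.784, "RF": 0.85},
    ("Vectors", 2009): {"ANN": 0.654, "CTA": 0.629, "GAM": 0.734, "GBM": 0.71,
                        "GLM": 0.717, "MARS": 0.715, "MAXENT": 0.835,
                        "MDA": 0.723, "RF": 0.658},
    ("Primary Hosts", 2009): {"ANN": 0.603, "CTA": 0.658, "GAM": 0.755, "GBM": 0.772,
                              "GLM": 0.705, "MARS": 0.753, "MAXENT": 0.851,
                              "MDA": 0.665, "RF": 0.763},
    ("Primary Hosts", 2011): {"ANN": 0.61, "CTA": 0.603, "GAM": 0.8, "GBM": 0.714,
                              "GLM": 0.777, "MARS": 0.749, "MAXENT": 0.935,
                              "MDA": 0.746, "RF": 0.775},
    ("Secondary Hosts", 2005): {"ANN": 0.71, "CTA": 0.75, "GAM": 0.798, "GBM": 0.805,
                                "GLM": 0.799, "MARS": 0.808, "MAXENT": 0.816,
                                "MDA": 0.761, "RF": 0.829},
}
FOCAL = {"MAXENT", "GBM", "RF"}


def top_k(scores, k=3):
    """Model names with the k highest ROC scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def top_k_summary(table, focal=FOCAL, k=3):
    """Fractions of datasets in which all, or at least two, focal models are in the top k."""
    counts = [len(set(top_k(scores, k)) & focal) for scores in table.values()]
    n = len(counts)
    return (sum(c == len(focal) for c in counts) / n,
            sum(c >= 2 for c in counts) / n)


print(top_k_summary(ROC))  # (0.4, 0.6)
```

On these five rows, all three focal models are in the top three for two datasets, and at least two of them are for three datasets. Vectors 2009 and Primary Hosts 2011 are cases where only Maxent makes the top three.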
Table S5. Seven model projections, with a short description of each, used in the mathematical ensemble of future climate conditions for North America.

Model           Group Name, Location                                                  Model Name, version
CCCma           Canadian Centre for Climate Modelling and Analysis, Canada            Coupled Global Climate Model (CGCM), version 3
CSIRO           Commonwealth Scientific and Industrial Research Organisation, Australia   CSIRO, mark 3.0
IPSL CM4        Institut Pierre Simon Laplace, France                                 CM4
MPI             Max Planck Institute for Meteorology, Germany                         ECHAM5
NCAR CCSM30     National Center for Atmospheric Research, United States               Community Climate System Model (CCSM), version 3.0
UKMO-HADCM3     Hadley Centre for Climate Prediction and Research, United Kingdom     Hadley Centre Climate Model, version 3
UKMO-HADGEM1    Hadley Centre for Climate Prediction and Research, United Kingdom     Hadley Centre Global Environmental Model, version 1.0
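Figs S5 and S6 summarize these seven projections as an A1B ensemble. The source does not state how the projections were combined. The minimal sketch below assumes an unweighted per-cell mean across GCMs. Under that assumption, the change shown for a variable is the ensemble-mean future value minus the current value. The function name and all numbers are hypothetical.

```python
import statistics


def ensemble_change(current, projections):
    """Per-cell change: unweighted mean of future projections minus current climate.

    current: present-day values, one per grid cell
    projections: dict mapping a GCM name to future values for the same cells
    """
    n_cells = len(current)
    if any(len(values) != n_cells for values in projections.values()):
        raise ValueError("every projection must cover the same grid cells")
    return [
        statistics.mean(values[i] for values in projections.values()) - current[i]
        for i in range(n_cells)
    ]


# Hypothetical Bio5 values (deg C) for two grid cells and two of the seven GCMs.
current_bio5 = [30.0, 25.0]
future_bio5 = {"UKMO-HADCM3": [32.0, 26.0], "NCAR CCSM30": [34.0, 28.0]}
print(ensemble_change(current_bio5, future_bio5))  # [3.0, 2.0]
```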
Fig. S1 Relationship of environmental predictors to the probability of presence of West Nile virus. For clarity, response curves are shown only for the three top explanatory variables. In addition, each model for each year and for each response variable (vector, bird, and human cases) produced different response curves; the curves shown represent only the relationships for the probability of presence in vectors in 2011. Nevertheless, general relationships remained consistent, with regions experiencing higher temperatures and lower precipitation (both annual and in the coldest quarter) predicted to have higher probabilities of presence of West Nile virus infection.
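Response curves like those in Fig. S1 are often drawn by varying one predictor across its observed range while holding the others fixed. The sketch below assumes the other predictors are held at their mean values. This is one common choice; the source does not say which was used. A hypothetical, clipped linear `toy_predict` stands in for a fitted SDM.

```python
import statistics


def response_curve(predict, X, j, n_points=5):
    """Vary predictor j across its observed range, holding the others at their means.

    predict: callable mapping a list of predictor values to a probability
    X: records (lists of predictor values) used to fix the means and the range
    """
    if n_points < 2:
        raise ValueError("need at least two points")
    means = [statistics.mean(row[k] for row in X) for k in range(len(X[0]))]
    lo = min(row[j] for row in X)
    hi = max(row[j] for row in X)
    step = (hi - lo) / (n_points - 1)
    curve = []
    for i in range(n_points):
        site = list(means)
        site[j] = lo + i * step
        curve.append((site[j], predict(site)))
    return curve


def toy_predict(site):
    """Hypothetical model: probability rises with Bio5, falls with Bio12, clipped to [0, 1]."""
    p = 0.02 * site[0] - 0.0005 * site[1]
    return min(1.0, max(0.0, p))


X = [[20.0, 400.0], [30.0, 800.0], [40.0, 600.0]]  # hypothetical (Bio5, Bio12) records
for bio5, prob in response_curve(toy_predict, X, j=0, n_points=3):
    print(bio5, round(prob, 3))
```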
Fig. S2 Probability of presence of West Nile virus in primary hosts (birds) from 2003 to 2011, as determined by ensemble modeling of 60 runs across 10 separate models. Areas in red indicate high probability of presence (maximum probability = 1), while those in green represent areas of low predicted probability of presence (minimum probability = 0).
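Figs S2 and S3 (and the caption of Table S3) describe weighted ensembles of many model runs. The minimal sketch below assumes each model's predicted probability is weighted in proportion to its ROC score. The exact weighting rule and any pre-selection threshold used in BIOMOD are not given here. Under that assumption, the ensemble probability for a grid cell is the score-weighted mean of the model probabilities. Models without an ROC score, such as SRE, would need a separate rule; this sketch simply requires a score for every model. All names and numbers are hypothetical.

```python
def weighted_ensemble(predictions, scores):
    """Score-weighted mean of per-cell probabilities across models.

    predictions: dict model name -> list of probabilities (one per grid cell)
    scores: dict model name -> evaluation score (e.g. ROC) used as the weight
    """
    models = list(predictions)
    total = sum(scores[m] for m in models)  # raises KeyError if a model has no score
    n_cells = len(predictions[models[0]])
    return [
        sum(scores[m] * predictions[m][i] for m in models) / total
        for i in range(n_cells)
    ]


# Hypothetical probabilities for three grid cells from two models.
preds = {"MAXENT": [0.9, 0.2, 0.0], "RF": [0.5, 0.4, 0.0]}
roc = {"MAXENT": 0.8, "RF": 0.8}
print(weighted_ensemble(preds, roc))  # equal weights reduce to a simple average
```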
Fig. S3 Probability of presence of West Nile virus in secondary hosts (humans) from 2003 to 2011, as determined by ensemble modeling of 60 runs across 10 separate models. Areas in red indicate high probability of presence (maximum probability = 1), while those in green represent areas of low predicted probability of presence (minimum probability = 0).
Fig. S4 Change in probability of presence of West Nile virus in vectors, primary hosts, and secondary hosts from 2003 to 2011, as determined by ensemble models across years. Some increases, particularly in the western United States, represent the initial spread of the disease during the 2003–2004 WNV seasons. Other areas, such as the northeastern United States, represent increases in WNV during the three most recent WNV seasons (2009–2011).
Fig. S5 Changes in the eight environmental variables under an A1B ensemble model by the year 2050. In general, warmer conditions are expected across the continent, with wetter conditions in the north and drier conditions in the south (U.S. Global Change Research Program 2009).
Fig. S6 Changes in the eight environmental variables under an A1B ensemble model from 2050 to 2080. In general, predicted patterns in 2080 are consistent in direction with those in 2050 but greater in magnitude. For instance, northern latitudes receive increased precipitation, whereas southern latitudes continue to dry.
References
Breiman L (2001) Random Forests. Machine Learning 45, 5-32.
Busby JR (1991) BIOCLIM - a bioclimatic analysis and prediction system. CSIRO, Canberra, Australia.
Friedman JH (1991) Multivariate Adaptive Regression Splines. Annals of Statistics 19, 1-67.
Friedman JH (2001) Greedy function approximation: A gradient boosting machine. Annals of Statistics 29, 1189-1232.
Hastie TJ, Tibshirani RJ (1990) Generalized Additive Models. Chapman & Hall.
Hastie TJ, Tibshirani RJ, Buja A (1994) Flexible Discriminant Analysis by Optimal Scoring. Journal of the American Statistical Association, 1255-1270.
McCullagh P, Nelder JA (1989) Generalized Linear Models, 2nd edition. Chapman & Hall.
Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231-259, doi:10.1016/J.Ecolmodel.2005.03.026.
Ripley BD (1996) Pattern Recognition and Neural Networks. Cambridge University Press.
Thuiller W, Lafourcade B, Engler R, Araujo MB (2009) BIOMOD - a platform for ensemble forecasting of species distributions. Ecography 32, 369-373, doi:10.1111/J.1600-0587.2008.05742.X.
U.S. Global Change Research Program (2009) Global Climate Change Impacts in the United States. New York, New York.