gcb12534-sup-0001-DataS1

A continental risk assessment of West Nile virus under climate change

Ryan J. Harrigan, Henri A. Thomassen, Wolfgang Buermann, Thomas B. Smith

Supplementary Methods and Results

Single Model Parameters Used Within BIOMOD and Maxent

Models were run in BIOMOD (or in Maxent) using the following parameters:

ANN: ANN = T, CV.ann = 3

CTA: CTA = T, CV.tree = 50

MDA: MDA = T

GAM: GAM = T, Spline = 3

GBM: GBM = T, No.trees = 2000

GLM: GLM = T, TypeGLM = "poly", Test = "AIC"

MARS: MARS = T

MAXENT: 10000 background points, regularization multiplier = 1.0, maximum iterations = 500, convergence threshold = 0.00005

SRE: SRE = T, quant = 0.025

RF: RF = T

Single Model Performance

Among the species models used to make these predictions, significant differences in performance were found across models, with Maxent (HSD = 0.09, p < 0.01) and two of the nonlinear recursive bifurcating models, Random Forests (HSD = 0.048, p = 0.22) and Generalized Boosting Models (HSD = 0.053, p = 0.17), ranking among the best performers in terms of Receiver Operating Characteristic (ROC) score (Fig. 2A). Another group of models, including MARS, GAM, GLM, and MDA, performed moderately well, whereas two model types, CTA and ANN, generally had significantly lower ROC scores than other models (Fig. 2B). Although mean ROC scores were comparable amongst all models, three of the nine models, Maxent, Generalized Boosting Models, and Random Forests, performed well across all datasets, comprising the top three models according to ROC on over 70% of the datasets and runs performed. In addition, any two of these three models were ranked among the top three models according to ROC in 100% of all runs on all datasets (Table S4).
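The ROC scores compared here are areas under the ROC curve. As a minimal sketch of how such a score can be obtained, assuming presence/absence labels and predicted probabilities for a held-out evaluation set, the area can be computed from pairwise comparisons of presences and absences. The function name `roc_auc` and the toy data are hypothetical and are not taken from the BIOMOD or Maxent output.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for presence, 0 for absence.
    scores: predicted probabilities of presence, in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    # A presence scored above an absence counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical evaluation data, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A score of 0.5 corresponds to a model that ranks presences no better than chance, and 1.0 to a model that ranks every presence above every absence.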

Current Probability of Presence

Annual variation in West Nile virus probability of presence depended on whether presence was modeled in vectors, primary avian hosts, or secondary human hosts, and on how long the virus had occurred in an area (Fig. 3, Figs S1, S2), but patterns remained broadly consistent. Earlier years, particularly 2003 and 2004, represent seasons where WNV had not yet reached, or had just reached, much of the West Coast and Pacific Northwest, and as such these areas were not identified as being particularly suitable for WNV presence (Fig. 3). The low predicted probability of WNV presence in this area reflects an inability of our models to predict future outbreaks in regions ecologically distinct from those where the disease has previously occurred, i.e., areas where the disease is infecting naïve hosts and expanding its niche breadth. Some year-to-year variation in predictions also reflects variation in screening and reporting once West Nile virus had become established in an area; in particular, the precipitous decrease in West Nile virus predicted in primary hosts (birds) likely reflects the fact that many vector control agencies discontinued screening of West Nile virus in birds across the country (2,161 counties reported WNV positives or negatives in birds in 2003, as compared to 286 in 2011, Fig. S1). Although vector and human screening for West Nile virus are likely to give less biased estimates, we also recognize the possibility of variation in the ability to detect and report human cases of WNV at a continental scale, as well as differences in vector screening efforts at local control agencies across the country.

Occurrence Changes Across Study Years

Increases in the probability of West Nile virus presence from the earliest season analyzed (2003) to the most recent (2011) were found in the western United States, particularly California and the Pacific Northwest, as well as many regions within the eastern and northeastern United States (Fig. S3). In contrast, probability of presence decreased in the Midwestern United States, a region often recording high numbers of West Nile virus cases (CDC, 2012). Other areas, such as the Gulf Coast and southern Texas, show no change in predicted probability of presence. However, consistently high probabilities in these areas suggest that proper management and control of vectors should be of paramount importance, as these regions are rarely spared annual West Nile virus outbreaks. Increases in the probability of presence indicate potential targets for management in future WNV seasons, as changes in climate and/or anthropogenic activity may lead to higher numbers of WNV infections in these areas.

Table S1. Bioclimatic variables used in ensemble models and future projections of West Nile virus in North America.

Variable  Description
____________________________________________________________
BIO1      Annual Mean Temperature
BIO2      Mean Diurnal Range (Mean of Monthly (Max Temp – Min Temp))
BIO4      Temperature Standard Deviation (Seasonality)
BIO5      Maximum Temperature of Warmest Month
BIO6      Minimum Temperature of Coldest Month
BIO12     Annual Precipitation
BIO15     Precipitation Coefficient of Variation (Seasonality)
BIO19     Precipitation of the Coldest Quarter
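To make the definitions in Table S1 concrete, the following is a minimal sketch of how the eight variables could be derived for a single location from twelve monthly values. It is an illustration under stated assumptions, not the procedure used to build the layers in this study: monthly mean temperature is approximated as the midpoint of the monthly minimum and maximum, BIO2 is taken as the mean of monthly (maximum minus minimum) temperature, quarters wrap from December to January, population standard deviations are used without rescaling, and the function name `bioclim_subset` and the monthly values are hypothetical.

```python
from statistics import mean, pstdev


def bioclim_subset(tmin, tmax, prec):
    """Derive the Table S1 variables from 12 monthly values per input.

    tmin, tmax: monthly minimum and maximum temperature.
    prec: monthly precipitation totals.
    """
    if not (len(tmin) == len(tmax) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per input")
    # Assumption: monthly mean temperature is the min/max midpoint.
    tavg = [(lo + hi) / 2 for lo, hi in zip(tmin, tmax)]
    # Quarters are 3 consecutive months, wrapping from December to January.
    quarters = [[m % 12 for m in range(s, s + 3)] for s in range(12)]
    coldest = min(quarters, key=lambda q: sum(tavg[m] for m in q))
    return {
        "BIO1": mean(tavg),
        "BIO2": mean([hi - lo for lo, hi in zip(tmin, tmax)]),
        "BIO4": pstdev(tavg),  # assumption: population SD, unscaled
        "BIO5": max(tmax),
        "BIO6": min(tmin),
        "BIO12": sum(prec),
        "BIO15": 100 * pstdev(prec) / mean(prec),  # assumption: CV in percent
        "BIO19": sum(prec[m] for m in coldest),
    }


# Hypothetical monthly climate for one grid cell (January to December).
tmin = [-5, -3, 1, 5, 10, 15, 18, 17, 12, 6, 1, -4]
tmax = [5, 7, 11, 15, 20, 25, 28, 27, 22, 16, 11, 6]
prec = [50, 40, 45, 60, 70, 80, 30, 20, 40, 60, 70, 55]
bio = bioclim_subset(tmin, tmax, prec)
for name, value in bio.items():
    print(name, round(value, 2))
```

In this toy cell the coldest quarter is December through February, so BIO19 sums precipitation across the year boundary.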

Table S2. Importance rankings, as measured by mean decrease in accuracy when a variable is removed from the model, for environmental variables used in ensemble modeling. Variables are listed in order of importance, with results from Maxent and from the other two top-ranked SDMs implemented in the BIOMOD package (Thuiller et al., 2009), which together were the top three ranked models in over 70% of all model runs. Models are separated because variable importance scores are calculated differently for Maxent than for the other two SDMs.

Rank  Maxent  GBM/RF
1.    Bio5    Bio5
2.    Bio19   Bio12
3.    Bio6    Bio2
4.    Bio1    Bio15
5.    Bio2    Bio4
6.    Bio4    Bio19
7.    Bio12   Bio1
8.    Bio15   Bio6

Table S3. Ten species distribution models and a short description of each, used in model performance comparisons and final weighted ensemble models for vector, primary host, and human WNV occurrence in North America for the years 2003–2011.

Abbreviation  Model Name                                Description and References
ANN           Artificial Neural Networks                Non-linear regression models with multiple parameters, allowing for smoothed relationships (Ripley 1996)
CTA           Classification Tree Analysis              Bifurcating recursive partitioning model using a single run attempting to split a response into groups with minimal deviance (Ripley 1996)
MDA           Multiple Discriminant Analysis            Supervised classification model that can use a variety of regression models to optimize predictive power (Hastie et al. 1994)
GAM           Generalized Additive Models               Nonparametric model using a "smoother" algorithm, useful when the relationship between predictors and response is expected to be complex (Hastie and Tibshirani 1990)
GBM           Generalized Boosting Models               Model using iterations of regression trees, each attempting to predict residuals left from previous trees (Friedman 2001)
GLM           Generalized Linear Models                 Linear models allowing for polynomial predictor terms (McCullagh and Nelder 1989)
MARS          Multivariate Adaptive Regression Splines  Adaptive regression techniques allowing for variable relationships between response and predictor along different ranges of the response (Friedman 1991)
MAXENT        Maximum Entropy                           Models presence and pseudo-absence data using maximum entropy of point locations (Phillips et al. 2006)
RF            Random Forests                            Models using a series of tree regressions whereby random sets of both predictors and records are selected for training and tested against "out-of-bag" data (Breiman 2001)
SRE           Surface Range Envelopes                   Envelope based on minimum and maximum values of predictors at occurrence points (Busby 1991)
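Of the models in Table S3, the surface range envelope is simple enough to sketch in a few lines. The example below is an illustration only, assuming that the envelope trims a fraction `quant` (0.025 in the parameter list) from each tail of every predictor at the presence records; the quantile interpolation rule, the function names, and the presence data are hypothetical and may differ from what BIOMOD does internally.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list (0 <= q <= 1)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def fit_envelope(presences, quant=0.025):
    """Per-predictor (lower, upper) bounds from presence records.

    presences: list of dicts mapping predictor name -> value.
    quant: fraction trimmed from each tail of every predictor.
    """
    names = list(presences[0].keys())
    return {
        name: (
            quantile([p[name] for p in presences], quant),
            quantile([p[name] for p in presences], 1 - quant),
        )
        for name in names
    }


def in_envelope(site, envelope):
    """True when every predictor at the site falls inside its bounds."""
    return all(lo <= site[name] <= hi for name, (lo, hi) in envelope.items())


# Hypothetical presence records described by two of the Table S1 predictors.
presences = [{"BIO5": 20 + i, "BIO12": 300 + 40 * i} for i in range(21)]
envelope = fit_envelope(presences, quant=0.025)
print(envelope)
print(in_envelope({"BIO5": 30, "BIO12": 700}, envelope))
print(in_envelope({"BIO5": 41, "BIO12": 700}, envelope))
```

Because the output is a yes/no envelope rather than a continuous score, a model of this kind has no ranking of sites from which a ROC curve could be built.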

Table S4. Performance of nine species distribution models based on Receiver Operating Characteristic (ROC) for each of 27 different datasets used in final ensemble models. SRE models received no ROC as they do not generate testable models but rather represent a rectilinear envelope (Thuiller et al. 2009).

Model ROC Score

Vectors
Year    ANN     CTA     GAM     GBM     GLM     MARS    MAXENT  MDA     RF
2003    0.63    0.746   0.777   0.813   0.787   0.794   0.832   0.784   0.85
2004    0.58    0.683   0.698   0.749   0.717   0.726   0.807   0.697   0.78
2005    0.657   0.725   0.78    0.785   0.784   0.784   0.803   0.764   0.814
2006    0.678   0.737   0.815   0.833   0.802   0.821   0.868   0.757   0.867
2007    0.609   0.726   0.772   0.78    0.759   0.772   0.855   0.755   0.818
2008    0.542   0.629   0.677   0.688   0.694   0.681   0.871   0.612   0.751
2009    0.654   0.629   0.734   0.71    0.717   0.715   0.835   0.723   0.658
2010    0.557   0.696   0.735   0.756   0.744   0.743   0.911   0.694   0.772
2011    0.55    0.704   0.709   0.758   0.738   0.738   0.877   0.713   0.814

Primary Hosts
Year    ANN     CTA     GAM     GBM     GLM     MARS    MAXENT  MDA     RF
2003    0.703   0.773   0.819   0.836   0.802   0.823   0.838   0.784   0.849
2004    0.59    0.702   0.713   0.737   0.704   0.717   0.805   0.709   0.781
2005    0.683   0.721   0.769   0.788   0.766   0.768   0.864   0.746   0.79
2006    0.618   0.724   0.779   0.786   0.778   0.785   0.836   0.743   0.793
2007    0.675   0.707   0.777   0.789   0.765   0.79    0.797   0.776   0.825
2008    0.622   0.696   0.744   0.767   0.739   0.764   0.832   0.739   0.803
2009    0.603   0.658   0.755   0.772   0.705   0.753   0.851   0.665   0.763
2010    0.684   0.723   0.763   0.818   0.745   0.762   0.923   0.753   0.803
2011    0.61    0.603   0.8     0.714   0.777   0.749   0.935   0.746   0.775

Secondary Hosts
Year    ANN     CTA     GAM     GBM     GLM     MARS    MAXENT  MDA     RF
2003    0.705   0.762   0.795   0.816   0.779   0.802   0.82    0.771   0.85
2004    0.624   0.697   0.729   0.756   0.721   0.738   0.813   0.735   0.77
2005    0.71    0.75    0.798   0.805   0.799   0.808   0.816   0.761   0.829
2006    0.667   0.723   0.729   0.769   0.715   0.754   0.799   0.7     0.81
2007    0.711   0.718   0.77    0.777   0.769   0.764   0.805   0.743   0.814
2008    0.63    0.654   0.677   0.704   0.649   0.705   0.823   0.667   0.775
2009    0.673   0.686   0.742   0.762   0.738   0.769   0.785   0.69    0.791
2010    0.62    0.672   0.739   0.751   0.729   0.725   0.796   0.693   0.816
2011    0.554   0.601   0.652   0.705   0.631   0.662   0.854   0.574   0.777
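The ranking described in the Single Model Performance section can be reproduced at the dataset level from the values in Table S4. The sketch below is one way to do so: it transcribes the 27 rows, ranks the nine models within each dataset, and counts how often Maxent, GBM, and RF together occupy the top three positions. The variable and function names are hypothetical, and the count covers only the per-dataset scores tabulated here, not the individual runs, which are not reported in this supplement.

```python
MODELS = ["ANN", "CTA", "GAM", "GBM", "GLM", "MARS", "MAXENT", "MDA", "RF"]
YEARS = range(2003, 2012)

# ROC scores transcribed from Table S4 (rows are years 2003-2011).
ROC = {
    "Vectors": [
        [0.63, 0.746, 0.777, 0.813, 0.787, 0.794, 0.832, 0.784, 0.85],
        [0.58, 0.683, 0.698, 0.749, 0.717, 0.726, 0.807, 0.697, 0.78],
        [0.657, 0.725, 0.78, 0.785, 0.784, 0.784, 0.803, 0.764, 0.814],
        [0.678, 0.737, 0.815, 0.833, 0.802, 0.821, 0.868, 0.757, 0.867],
        [0.609, 0.726, 0.772, 0.78, 0.759, 0.772, 0.855, 0.755, 0.818],
        [0.542, 0.629, 0.677, 0.688, 0.694, 0.681, 0.871, 0.612, 0.751],
        [0.654, 0.629, 0.734, 0.71, 0.717, 0.715, 0.835, 0.723, 0.658],
        [0.557, 0.696, 0.735, 0.756, 0.744, 0.743, 0.911, 0.694, 0.772],
        [0.55, 0.704, 0.709, 0.758, 0.738, 0.738, 0.877, 0.713, 0.814],
    ],
    "Primary Hosts": [
        [0.703, 0.773, 0.819, 0.836, 0.802, 0.823, 0.838, 0.784, 0.849],
        [0.59, 0.702, 0.713, 0.737, 0.704, 0.717, 0.805, 0.709, 0.781],
        [0.683, 0.721, 0.769, 0.788, 0.766, 0.768, 0.864, 0.746, 0.79],
        [0.618, 0.724, 0.779, 0.786, 0.778, 0.785, 0.836, 0.743, 0.793],
        [0.675, 0.707, 0.777, 0.789, 0.765, 0.79, 0.797, 0.776, 0.825],
        [0.622, 0.696, 0.744, 0.767, 0.739, 0.764, 0.832, 0.739, 0.803],
        [0.603, 0.658, 0.755, 0.772, 0.705, 0.753, 0.851, 0.665, 0.763],
        [0.684, 0.723, 0.763, 0.818, 0.745, 0.762, 0.923, 0.753, 0.803],
        [0.61, 0.603, 0.8, 0.714, 0.777, 0.749, 0.935, 0.746, 0.775],
    ],
    "Secondary Hosts": [
        [0.705, 0.762, 0.795, 0.816, 0.779, 0.802, 0.82, 0.771, 0.85],
        [0.624, 0.697, 0.729, 0.756, 0.721, 0.738, 0.813, 0.735, 0.77],
        [0.71, 0.75, 0.798, 0.805, 0.799, 0.808, 0.816, 0.761, 0.829],
        [0.667, 0.723, 0.729, 0.769, 0.715, 0.754, 0.799, 0.7, 0.81],
        [0.711, 0.718, 0.77, 0.777, 0.769, 0.764, 0.805, 0.743, 0.814],
        [0.63, 0.654, 0.677, 0.704, 0.649, 0.705, 0.823, 0.667, 0.775],
        [0.673, 0.686, 0.742, 0.762, 0.738, 0.769, 0.785, 0.69, 0.791],
        [0.62, 0.672, 0.739, 0.751, 0.729, 0.725, 0.796, 0.693, 0.816],
        [0.554, 0.601, 0.652, 0.705, 0.631, 0.662, 0.854, 0.574, 0.777],
    ],
}

TRIO = {"MAXENT", "GBM", "RF"}


def top_three(scores):
    """Names of the three models with the highest ROC in one dataset."""
    ranked = sorted(zip(MODELS, scores), key=lambda ms: ms[1], reverse=True)
    return {name for name, _ in ranked[:3]}


# Datasets in which the top three models are exactly Maxent, GBM, and RF.
hits = [
    (group, year)
    for group, rows in ROC.items()
    for year, row in zip(YEARS, rows)
    if top_three(row) == TRIO
]
total = sum(len(rows) for rows in ROC.values())
print(f"{len(hits)} of {total} datasets ({len(hits) / total:.0%})")
```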

Table S5. Seven model projections and a short description of each used in the mathematical ensemble of future climate conditions for North America.

Model         Group Name, Location                                                     Model Name, version
CCCma         Canadian Centre for Climate Modelling and Analysis, Canada               Coupled Global Climate Model (CGCM), version 3
CSIRO         Commonwealth Scientific and Industrial Research Organisation, Australia  CSIRO, mark 3.0
IPSL CM4      Institut Pierre Simon Laplace, France                                    CM4
MPI           Max Planck Institute for Meteorology, Germany                            ECHAM5
NCAR CCSM30   National Center for Atmospheric Research, United States                  Community Climate System Model (CCSM), version 3.0
UKMO-HADCM3   Hadley Centre for Climate Prediction and Research, United Kingdom        Hadley Centre Climate Model, version 3
UKMO-HADGEM1  Hadley Centre for Climate Prediction and Research, United Kingdom        Hadley Centre Global Environmental Model, version 1.0
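The caption of Table S5 does not spell out how the seven projections were combined. Purely as an illustration, and assuming an unweighted cell-by-cell mean across models (an assumption, not a statement of the method used here), an ensemble of gridded projections could be formed as follows. The function name and the temperature values are hypothetical, and only three of the seven models are shown to keep the example short.

```python
from statistics import mean


def ensemble_mean(projections):
    """Cell-by-cell unweighted mean across climate-model grids.

    projections: dict of model name -> list of cell values, with every
    model listing the same grid cells in the same order.
    """
    grids = list(projections.values())
    n_cells = len(grids[0])
    if any(len(g) != n_cells for g in grids):
        raise ValueError("all grids must cover the same cells")
    # zip(*grids) yields, for each cell, the values from every model.
    return [mean(cell) for cell in zip(*grids)]


# Hypothetical projected BIO5 values (degrees C) for three grid cells.
projections = {
    "CCCma": [20.0, 25.0, 30.0],
    "CSIRO": [22.0, 24.0, 33.0],
    "MPI": [21.0, 26.0, 30.0],
}
ensemble = ensemble_mean(projections)
print(ensemble)
```

Averaging across models damps the idiosyncrasies of any single projection, at the cost of hiding the spread between them.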

Fig. S1 Relationship of environmental predictors to the probability of presence of West Nile virus.

The response curves for only the three top explanatory variables are shown for clarity. In addition, each model for each year and for each response variable (vector, bird, and human cases) produced different response curves; the responses shown represent only the relationship between probabilities of presence in vectors for the year 2011. Yet general relationships remained consistent, with regions experiencing higher temperatures and lower precipitation (both annual and in the coldest quarter) predicted to have higher probabilities of presence of West Nile virus infection.

Fig. S2 Probability of presence of West Nile virus in primary hosts (birds) from years 2003–2011, as determined by ensemble modeling of 60 runs across 10 separate models.

Areas in red indicate high probability of presence (maximum probability = 1), while those in green represent areas of low predicted prevalence (minimum probability = 0).

Fig. S3 Probability of presence of West Nile virus in secondary hosts from years 2003–2011, as determined by ensemble modeling of 60 runs across 10 separate models.

Areas in red indicate high probability of presence (maximum probability = 1), while those in green represent areas of low predicted prevalence (minimum probability = 0).

Fig. S4 Change in probability of presence of West Nile virus in vectors, primary hosts, and secondary hosts from the year 2003 to the year 2011, as determined by ensemble models across years.

Some increases, particularly in the western United States, represent the initial spread of the disease during the 2003–2004 WNV seasons. Other areas, such as the northeast United States, represent increases of WNV during the past three WNV seasons (years 2009–2011).

Fig. S5 Changes in the eight environmental variables under an A1B ensembled model by the year 2050.

In general, warmer conditions are expected continentally, with wetter conditions in the north and drier in the south (Program, USGCR 2009).

Fig. S6 Changes in the eight environmental variables under an A1B ensembled model from the year 2050 to 2080.

In general, predicted patterns in 2080 are consistent in direction but with increased magnitude as compared to 2050. For instance, northern latitudes receive increased precipitation whereas southern latitudes continue to dry.

References

Breiman L (2001) Random Forests. Machine Learning 45, 5-32.

Busby JR (1991) BIOCLIM - a bioclimatic analysis and prediction system. CSIRO, Canberra, Australia.

Friedman JH (1991) Multivariate Adaptive Regression Splines. Annals of Statistics 19, 1-67.

Friedman JH (2001) Greedy function approximation: A gradient boosting machine. Annals of Statistics 29, 1189-1232.

Hastie TJ, Tibshirani RJ (1990) Generalized Additive Models. Chapman & Hall.

Hastie TJ, Tibshirani RJ, Buja A (1994) Flexible Discriminant Analysis by Optimal Scoring. Journal of the American Statistical Association, 1255-1270.

McCullagh P, Nelder JA (1989) Generalized Linear Models. 2nd edition. Chapman & Hall.

Phillips SJ, Anderson RP, Schapire RE (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231-259, doi:10.1016/J.Ecolmodel.2005.03.026.

Program, U. S. G. C. R. (2009) Global Climate Change Impacts in the United States, New York, New York.

Ripley BD (1996) Pattern Recognition and Neural Networks. Cambridge University Press.

Thuiller W, Lafourcade B, Engler R, Araujo MB (2009) BIOMOD - a platform for ensemble forecasting of species distributions. Ecography 32, 369-373, doi:10.1111/J.1600-0587.2008.05742.X.
