

AN ABSTRACT OF THE THESIS OF

Andrea M. Havron for the degree of Master of Science in Marine Resource Management presented on May 13, 2015.

Title: Habitat Suitability and Uncertainty: A Bayesian Approach to Mapping Benthic Invertebrate Distributions

Abstract approved:

_____________________________________________________________________

Chris Goldfinger

Efforts to mitigate increasing human impacts on the seafloor from resource extraction and renewable energy development can benefit from an understanding of the distribution of sensitive marine benthic species. Habitat suitability predictive modeling is a cost-effective statistical tool for inferring species distribution patterns from a limited set of sampling locations. However, uncertainties related to the data, collection methods, and statistical process can carry over into final maps, and displaying the spatial uncertainty of habitat suitability models in a way that is transparent and easy to interpret remains a challenge. This thesis uses Bayesian networks to develop habitat suitability maps for several species of benthic invertebrates along the shelf and slope of the continental US west coast. In addition to predictive maps, methods were developed to create two complementary maps communicating model prediction uncertainty and experience, a measure of equivalent sample size. Species modeled include three species of benthic macrofauna (the marine bivalve Axinopsida serricata, the marine gastropod Astyris gausapata, and the marine polychaete Sternaspis fossor) and three species of benthic megafauna from the dictyonine glass sponge assemblage (Aphrocallistes vastus, Heterochone calyx, and Farrea occa). Benthic macrofauna models were learned from benthic sampling data collected at sites along the Pacific Northwest shelf, from Northern California to Washington State, south of the Olympic Coast National Marine Sanctuary. Benthic megafauna models were learned from NOAA’s historical sponge and coral observations dataset, collected from bottom trawls conducted at 100 to 1300 m water depth along the US west coast continental shelf and upper slope. Netica® software was used to implement the design and analysis of the statistical models. Final macrofauna models were selected using cross-validation. A generalized model structure for benthic macrofauna living within marine sediment was developed so that it can be reused and updated. With additional maps of uncertainty, marine resource managers and decision makers are better equipped to interpret habitat suitability maps in light of the best available science, and the maps become more usable for marine spatial planning. The methods developed and presented here are broadly applicable to a wide range of other species and ecosystems, particularly in settings with limited sampling effort.

©Copyright by Andrea M. Havron

May 13, 2015

All Rights Reserved

Habitat Suitability and Uncertainty: A Bayesian Approach to Mapping Benthic Invertebrate Distributions

by

Andrea M. Havron

A THESIS submitted to

Oregon State University in partial fulfillment of the requirements for the degree of

Master of Science

Presented May 13, 2015

Commencement June 2015

Master of Science thesis of Andrea M. Havron presented on May 13, 2015

APPROVED:

Major Professor, representing Marine Resource Management

Dean of the College of Earth, Ocean, and Atmospheric Sciences

Dean of the Graduate School

I understand that my thesis will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my thesis to any reader upon request.

Andrea M. Havron, Author

ACKNOWLEDGEMENTS

The completion of my Master’s thesis would not have been possible without the guidance of the following people:

First and foremost, I would like to express my deepest appreciation for my advisor, Dr. Chris Goldfinger, who recognized my potential and was an excellent resource on habitat mapping and Bayesian analysis. I would also like to thank my committee, Dr. Sarah Henkel and Dr. Bruce G. Marcot. Sarah expanded my knowledge of benthic invertebrate ecology. Bruce was a steadfast resource on Bayesian networks, habitat suitability modeling, and scientific writing. My minor advisor, Dr. Charlotte Wickham, introduced me to R and, along with my other statistics professors, helped deepen my love for math in general and statistics in particular.

The CEOAS community has been an incredible support system over the past two years, especially Flaxen Conway, Robert Allen, and Lori Hartline. My fellow lab mates contributed much time and energy to assisting me with my thesis. Chris Romsos provided countless hours of GIS assistance and advice on modeling methodologies. Morgan Erhardt and Daniel Lockett provided my first introduction to Bayesian networks. Daniel also provided assistance in the initial stages of model development.

I received much help and support from the greater scientific community. I would like to thank the captains and crews of the R/V Pacific Storm, R/V Elakha, Miss Linda, and Derek M. Baylis; Marine Applied Research and Exploration; and David Evans for assisting with the data collection used in my research; and Kristen Politano for processing invertebrate and sediment samples. NOAA and the Geological Survey of Canada assisted by providing data and information on deep-sea corals and sponges, especially Curt Whitmire, Chris Rooper, Bob Stone, Tom Laidig, and Kim Conway. Norsys offered software support for their program, Netica.

I would like to recognize the people who have guided me along my academic journey. Doug Dahms inspired a lifelong love of learning and started me on my career trajectory in the biological sciences. Dr. Harold Heatwole introduced me to the core concepts of ecology and zoology. The following mentors enriched my understanding of marine ecology through unique opportunities, typically from small boats on high seas: John Calambokidis, Dr. Martin Raphael, Tom Bloxton, Gregory Campbell, Dr. Lisa Munger, Rick Wood, Annie Douglas, and Mason Weinrich. I would also like to thank my mentors in geospatial analysis and GIS: Dr. Gregory Stewart, Dr. Dawn Wright, Kuuipo Walsh, and Dominique Wiley-Camacho.

This thesis would not have been possible without the love and support of my family and friends. I am very fortunate in life to have an amazing partner and husband, Parker Havron, who has encouraged me along every step of this journey. Our daughter, Lila, has provided countless hours of joy and happiness, offering much respite from stressful days over the past two years.

Finally, I would like to acknowledge the support of the Bureau of Ocean Energy Management (BOEM) and our collaborator and program manager Lisa Gilbane under Cooperative Agreement award M10AC20002, which supported both the benthic macrofauna and glass sponge model work. I would also like to acknowledge the National Oceanic and Atmospheric Administration (NOAA) and Mary Yoklavich at the NOAA Southwest Fisheries Science Center for continued support of the glass sponge model work under Cooperative Agreement to the Cooperative Institute for Marine Resources Studies (CIMRS) at Oregon State University.

CONTRIBUTION OF AUTHORS

Chris Goldfinger and Sarah Henkel were co-principal investigators of the research project detailed in Chapter 2. Their contributions included the formation of the research proposal and sampling effort plan. Chris Goldfinger led mapping efforts that coalesced into many of the environmental layers used in habitat suitability models. Sarah Henkel led benthic macrofauna sampling efforts and provided the macrofauna dataset used to train models. Bruce G. Marcot provided guidance in developing the Bayesian network methodology. Chris Romsos processed bathymetric data and created the 100-meter bathymetric grid and the mean grain size grid used in habitat suitability models. Lisa Gilbane, as the Bureau of Ocean Energy Management project manager, provided feedback throughout the project. All authors assisted in editing the final manuscript.

TABLE OF CONTENTS

Chapter 1. Introduction
1.1 Summary
1.2 Habitat suitability of benthic invertebrates
1.3 A Primer on Bayesian statistics and graphical networks
Chapter 2. Mapping marine habitat suitability and uncertainty using Bayesian networks: a case study of northeastern Pacific benthic macrofauna
2.1 Introduction
2.2 Literature Review
2.3 Materials and Methods
2.3.1 Benthic macrofauna data
2.3.2 Environmental data
2.3.3 Model development
2.3.3.1 Variable discretization
2.3.3.2 Network structure development
2.3.3.4 Model prediction, selection, and validation
2.3.4 Map Creation
2.4 Results
2.5 Discussion
Chapter 3. Habitat suitability of the dictyonine glass sponge assemblage using Bayesian networks
3.1 Literature Review
3.2 Materials and Methods
3.2.1 Dictyonine sponge data
3.2.2 Environmental data
3.3.3 Model Development
3.3 Results
3.4 Discussion
Chapter 4. Conclusions
4.1 Future Recommendations
4.1.1 Improve Environmental Data
4.1.2 Increase Spatially Explicit Sampling Effort
4.1.3 Accessibility of Model Predictions
4.2 Implications for Marine Resource Management
Bibliography

LIST OF FIGURES

Figure 1.1. A sample Bayesian network modeling structure.
Figure 1.2. Bayesian Network Conditional Probability Table (CPT).
Figure 1.3. Integrating presence probabilities.
Figure 2.1. Study Area.
Figure 2.2. Benthic macrofauna species chosen for habitat suitability models.
Figure 2.3. An example Bayesian Network with respective conditional probability table.
Figure 2.4. Supervised discretization technique to select breakpoints and state parameters for building the BN model.
Figure 2.5. Supervised BN Structure.
Figure 2.6. Regional raster and local in situ variables.
Figure 2.7. Example of a re-useable and updateable Bayesian network for benthic macrofauna living within marine sediment.
Figure 2.8. Study area of field validation.
Figure 2.9. Bayesian network of Axinopsida serricata.
Figure 2.10. Bayesian network of Astyris gausapata.
Figure 2.11. Bayesian network of Sternaspis fossor.
Figure 2.12. Axinopsida serricata.
Figure 2.13. Astyris gausapata.
Figure 2.14. Sternaspis fossor.
Figure 3.1. Dictyonine glass sponge species.
Figure 3.2. Dictyonine glass sponge influence diagram.
Figure 3.3. Bayesian network of dictyonine glass sponge assemblage.
Figure 3.4. Dictyonine sponge assemblage.
Figure 3.5. Newport embayment offshore of Oregon.

LIST OF TABLES

Table 1.1. Steps for calculating expected values (𝑿) of nodes displayed in Figure 1.1.
Table 2.1. Environmental Variables.
Table 2.2. Model Tests conducted for each species.
Table 2.3. Axinopsida serricata model results.
Table 2.4. Astyris gausapata model results.
Table 2.5. Sternaspis fossor model results.
Table 2.6. Sensitivity analysis results.
Table 3.1. Environmental variables.
Table 3.2. Sensitivity analysis results.

In loving memory of my grandparents

Anna Victoria and Raymond Adam Siatkowski,

Prudella Louise and Robert Arnold Brocklesby

Habitat Suitability and Uncertainty: A Bayesian Approach to Mapping Benthic Invertebrate Distributions

Chapter 1. Introduction

1.1 Summary

This thesis explores the use of Bayesian networks to derive habitat suitability maps of several species of benthic macrofauna and megafauna found throughout the shelf and slope of the continental United States west coast. Benthic invertebrates live within an environment that is difficult to observe, measure, and predict. For these reasons, statistical predictive approaches are necessary to provide useful information for marine management purposes.

This thesis outlines a novel and robust Bayesian network approach that addresses multiple issues encountered during habitat suitability modeling of benthic organisms.

Methods are provided for developing two additional map products that spatially describe model uncertainty and experience (a measure of equivalent sample size) to complement habitat suitability maps. Such additional map products aid in the interpretation of habitat suitability maps for spatial planning purposes.

Results from this work are presented as a set of three maps per species modeled, describing habitat suitability, model uncertainty, and, when methods allowed, experience. Model performance metrics are reported for each species, and field validation results are reported where data allowed. The resulting maps are available online for use in marine spatial planning.


1.2 Habitat suitability of benthic invertebrates

Habitat is best defined as an area with a set of environmental resources, conditions, and biological factors that promote species-specific occupancy, including the support of an organism’s survival, reproduction, and growth (Morrison, Marcot & Mannan 1992; Hall, Krausman & Morrison 1997). Establishing habitat quality requires an understanding of the area’s ability to sustain a healthy population of organisms, which in turn requires a time series of demographic and life-history data (Hall, Krausman & Morrison 1997). Such information can be difficult to obtain for marine benthic invertebrates given the cost, time, and technology required to gather data and perform experimental manipulations at depth (Snelgrove 1999; Robison 2004). Advances in technology have aided collection methods (Hessler & Jumars 1974; Gage & Tyler 1991; Robison 2004; Eleftheriou & McIntyre 2005; Rengstorf et al. 2013), yet data on benthic systems remain scarce compared with their terrestrial counterparts (Snelgrove 1999; Shackeroff, Hazen & Crowder 2009).

In lieu of mapping habitat quality, habitat suitability probability (HSP) modeling maps species distribution patterns by analyzing statistical associations between environmental conditions and the presence or absence of a given species (Guisan & Zimmermann 2000). HSP modeling can be particularly useful when studying and managing elusive species (Sauer et al. 2013), and multiple habitat suitability studies of benthic invertebrates have been reported (Laine 2003; Degraer et al. 2007; Glockzin & Zettler 2008; Rattray et al. 2009; Tittensor et al. 2009; Gogina, Glockzin & Zettler 2010; Vierod, Guinotte & Davies 2014; Guinotte & Davies 2014). Noted limitations of previous studies include small spatial scope (Degraer et al. 2007; Glockzin & Zettler 2008; Gogina, Glockzin & Zettler 2010), issues related to parametric modeling such as multicollinearity (Tittensor et al. 2009; Gogina, Glockzin & Zettler 2010; Vierod, Guinotte & Davies 2014; Guinotte & Davies 2014), limitations of presence-only data (Tittensor et al. 2009; Vierod, Guinotte & Davies 2014; Guinotte & Davies 2014), and modeling of community assemblages instead of individual species due to small sample sizes (Degraer et al. 2007; Glockzin & Zettler 2008; Rattray et al. 2009; Tittensor et al. 2009; Guinotte & Davies 2014).

HSP models perform best when strong species-environment associations exist and when the environmental cues related to species occupancy are easy to map (Elith & Graham 2009). Benthic invertebrates are known to organize around sediment and substrata patterns and depth contours (Sanders 1968; Gray 1974). Additional associations with sediment characteristics, water column properties, and fluid dynamics are expected (Snelgrove & Butman 1994; Snelgrove 1999).

The continental shelf of the US Pacific Northwest is a high-energy wave environment, which produces broad-scale sedimentation patterns. Shallow regions affected by wave energy are dominated by coarse, sandy sediment with low organic content; as depth increases and wave disturbance decreases, particle size decreases to finer, muddier sediment with high organic content (Snelgrove & Butman 1994; Snelgrove 1999; Byers & Grabowski 2013). Because the region is an active margin, rocky outcrops occur throughout the continental shelf and slope: tectonic activity and uplift work alongside landslides, sea level change, and erosional processes to expose bedrock at the seafloor (Kulm & Fowler 1974). Regional maps have been developed for the US west coast continental shelf and slope describing depth at 100 m resolution, mean grain size at 250 m resolution, sediment type, and the probability of rock outcrop at 500 m resolution (Goldfinger et al. 2014).


This thesis attempts to use regional maps representing environmental cues believed to drive distributional patterns of benthic invertebrates to model the habitat suitability of selected benthic invertebrate species.

This thesis focuses on two major groups of benthic invertebrates: macrofauna (> 1 mm) living within soft sediment substrata and megafauna (>> 1 mm) attached to hard substrata. Chapter 2 focuses on three macrofauna species of the continental shelf: Axinopsida serricata (Carpenter, 1864), a marine bivalve in the family Thyasiridae within the order Lucinoida; Astyris gausapata (Gould, 1850), a marine gastropod in the family Columbellidae within the order Neogastropoda; and Sternaspis fossor (Stimpson, 1854), a polychaete in the family Sternaspidae within the order Terebellida. Chapter 3 focuses on the dictyonine glass sponge assemblage of the continental shelf and slope. This group consists of three species within the class Hexactinellida: Aphrocallistes vastus, Heterochone calyx, and Farrea occa.

Despite unknowns in benthic invertebrate ecology, human use of the benthic environment is increasing. This raises the risk that benthic invertebrates will be disturbed by human impacts such as resource extraction, energy device development, and climate change. Two impacts of particular concern for the species in this thesis are discussed here: renewable energy development and bottom trawling.


As a high-energy wave environment, the coastal waters off Washington and Oregon are being considered for renewable wave energy development (Parkinson et al. 2015; Reikard, Robertson & Bidlot 2015). The installation of renewable energy devices and the presence of anchors maintained on the seafloor are expected to alter sedimentation patterns (Amoudry et al. 2009; Neill et al. 2009; Coates, Vanaverbeke & Vincx 2013).

While it is unknown how renewable energy devices will affect benthic macrofauna, they are anticipated to disturb communities through direct mechanisms (e.g., anchor attachment and cable laying) or indirect mechanisms (e.g., changes to local current and sediment patterns, and acoustic and electromagnetic effects) (Boehlert & Gill 2010; Miller et al. 2013). A preliminary assessment of likely suitable macrofauna habitat before multiple renewable energy devices are installed will therefore aid management and future spatial planning scenarios.

Advances in fishing gear technology have increased bottom trawl fisheries’ access to the seafloor. Current regulations in US waters limit bottom trawl fisheries to depths from 100 meters out to 700 fathoms (~1280 m) (NMFS Pacific Fishery Management Council (PFMC) 2014). Bottom trawl gear has been reported to impact benthic communities (Rijnsdorp et al. 1998; McConnaughey 2000; Chuenpagdee et al. 2003; Hiddink, Jennings & Kaiser 2006; Hiddink et al. 2009; Dransfield et al. 2014) through surface scouring, sediment resuspension, destruction of biological and physical structures, and removal and scattering of benthic organisms (Jones 1992).

The Sustainable Fisheries Act of 1996 amended the US Magnuson-Stevens Fishery Conservation and Management Act, requiring a reduction in bycatch (Section 301) and the designation of essential fish habitat (EFH), including actions to conserve such habitat, such as restrictions on bottom trawl gear (Chuenpagdee et al. 2003).

Current research suggests associations between groundfish and deep-sea coral and sponge communities (Krautter, Conway & Barrie 2006; Cook, Conway & Burd 2008; Stone et al. 2013), yet these communities are not currently listed as EFH. Observations of sponges and corals in continental US west coast waters suggest isolated patches of organisms that are not dense enough to warrant listing as EFH. Further, although steps are being taken to protect dictyonine sponge reefs in Canada (Jamieson & Chew 2002) and deep-sea coral and sponge regions in the Mid-Atlantic as Habitat Areas of Particular Concern (Mid-Atlantic Fishery Management Council 2015), such protection is not currently offered to coral and sponge species living in US Pacific waters. Improved understanding of deep-sea coral and sponge ecology, distribution, and impacts from human activities may lead to their protection in the future. Initial HSP models of individual species or functional groups are a first step toward a better understanding of these elusive benthic megafauna.

Given the importance of HSP maps in marine spatial planning, maps need to be explicit and transparent about underlying assumptions and uncertainties. While the HSP statistical approach provides a feasible tool for assessing species distribution patterns, the uncertainties inherent in the modeling process often go unreported (Rocchini et al. 2011). Attempts have been made to incorporate uncertainty into HSP mapping (Elith, Burgman & Regan 2002; Fotheringham, Brunsdon & Charlton 2002; Johnson & Gillingham 2004; Bierman et al. 2010), yet such cases remain the exception rather than the norm. Chapter 2 presents an assessment of uncertainty in HSP modeling and details methods that use Bayesian networks to develop maps expressing uncertainty and equivalent sample size, using macrofauna as the study example. Chapter 3 details methods used to incorporate spatial uncertainty associated with species records, using glass sponges. Chapter 4 discusses future research directions and the implications of map uncertainty for marine resource management and spatial planning.

1.3 A Primer on Bayesian statistics and graphical networks

Bayesian networks are a graphical modeling tool that applies Bayes’ theorem to a network of linked variables to calculate posterior probabilities of outcome states (Jensen & Nielsen 2007). Bayes’ theorem states that the conditional probability of event A given event B equals the product of the conditional probability of B given A and the unconditional probability of A, divided by the unconditional probability of B (de Laplace 1812):

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

In its application to habitat suitability modeling, Bayes’ theorem expresses habitat suitability as the probability that a species is present ($Sp = \mathrm{Pres}$) given a set of environmental variables ($E_1 = e_1 \cap E_2 = e_2 \cap \dots \cap E_k = e_k$):

$$P(Sp = \mathrm{Pres} \mid E_1 = e_1 \cap E_2 = e_2 \cap \dots \cap E_k = e_k) = \frac{P(E_1 = e_1 \cap E_2 = e_2 \cap \dots \cap E_k = e_k \mid Sp = \mathrm{Pres})\,P(Sp = \mathrm{Pres})}{P(E_1 = e_1 \cap E_2 = e_2 \cap \dots \cap E_k = e_k)}$$
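To make the formula concrete, the following minimal Python sketch estimates habitat suitability from frequency counts in two ways and shows that they agree: once through Bayes’ theorem (likelihood × prior ÷ evidence) and once by reading the conditional frequency directly from the matching records. The records, the single combined environmental state standing in for the joint condition, and the function names `hsp_bayes` and `hsp_direct` are hypothetical illustrations, not data or code from this thesis.

```python
from fractions import Fraction

# Hypothetical training records (illustrative, not thesis data): each pairs an
# environmental state, standing in for E1 = e1 ∩ ... ∩ Ek = ek, with a species state.
records = [
    ("fine", "Pres"), ("fine", "Abs"), ("fine", "Pres"), ("fine", "Abs"),
    ("coarse", "Abs"), ("coarse", "Abs"), ("coarse", "Pres"), ("coarse", "Abs"),
]

def hsp_bayes(records, env_state, sp_state="Pres"):
    """P(Sp = sp_state | E = env_state) via Bayes' theorem from frequency counts."""
    n = len(records)
    n_sp = sum(1 for _, s in records if s == sp_state)
    n_env = sum(1 for e, _ in records if e == env_state)
    n_both = sum(1 for e, s in records if e == env_state and s == sp_state)
    likelihood = Fraction(n_both, n_sp)  # P(E = e | Sp)
    prior = Fraction(n_sp, n)            # P(Sp)
    evidence = Fraction(n_env, n)        # P(E = e)
    return likelihood * prior / evidence

def hsp_direct(records, env_state, sp_state="Pres"):
    """The same conditional probability read directly from the matching records."""
    matching = [s for e, s in records if e == env_state]
    return Fraction(matching.count(sp_state), len(matching))

print(hsp_bayes(records, "fine"), hsp_direct(records, "fine"))  # 1/2 1/2
```

With counts alone the two routes are algebraically identical; Bayes’ form becomes useful when the likelihood, prior, and evidence come from different sources or are stored separately, as they are in a network’s conditional probability tables.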

A Bayesian network consists of nodes containing explanatory (covariate or prediction) variables and a response variable (Figure 1.1). In a habitat suitability model, environmental predictive variables are the explanatory nodes and the species of interest is the response node. Bayesian network nodes are connected by linkages (arrows) signifying correlative or assumed causal relationships, or logical dependence. In models developed from supervised expert judgment, linkages typically point from a prediction node (the “parent”) to a response node (the “child”). When two prediction variables are correlated but not causally related, the direction of the linkage between them is less important (Norsys, Netica®).

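[Figure 1.1 node values, as percent probability per state followed by expected value ± standard deviation. Depth (m): -55 to -20, 24.8; -80 to -55, 41.7; -110 to -80, 17.9; -130 to -110, 15.6; -73.2 ± 29. Mean Grain Size (phi): 0 to 2.5, 60.5; 2.5 to 3.75, 12.4; 3.75 to 10.5, 27.0; 3.07 ± 2.8. Sternaspis fossor: Absent, 75.7; Present, 24.3; 0.243 ± 0.43.]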

Figure 1.1. A sample Bayesian network modeling structure.

This model describes an example relationship between species and environment, where mean grain size (right node) is informed by depth (left node) and the macrofauna species (S. fossor; bottom node) is dependent on both depth and mean grain size.
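A Bayesian network encodes the joint distribution of its nodes as the product of each node’s probability given its parents. The sketch below assumes the three-node structure of Figure 1.1 (depth informs mean grain size, and both inform species presence) but uses hypothetical two-state nodes and made-up probabilities; it shows how an overall probability of presence can be obtained by summing that factorized joint distribution over all parent states.

```python
from itertools import product

# Hypothetical CPTs mirroring the structure of Figure 1.1:
# Depth -> MGS, Depth -> Sp, MGS -> Sp. All numbers are illustrative only.
p_depth = {"shallow": 0.6, "deep": 0.4}
p_mgs_given_depth = {
    "shallow": {"coarse": 0.8, "fine": 0.2},
    "deep": {"coarse": 0.3, "fine": 0.7},
}
p_pres_given = {  # P(Sp = Pres | Depth, MGS); P(Sp = Abs) is 1 minus this value
    ("shallow", "coarse"): 0.1,
    ("shallow", "fine"): 0.4,
    ("deep", "coarse"): 0.2,
    ("deep", "fine"): 0.5,
}

def joint(depth, mgs, sp):
    """P(Depth, MGS, Sp) factorized along the arrows: P(D) * P(M | D) * P(S | D, M)."""
    p_pres = p_pres_given[(depth, mgs)]
    p_sp = p_pres if sp == "Pres" else 1.0 - p_pres
    return p_depth[depth] * p_mgs_given_depth[depth][mgs] * p_sp

# Overall P(Sp = Pres): sum the joint over every combination of parent states.
p_present = sum(joint(d, m, "Pres") for d, m in product(p_depth, ["coarse", "fine"]))
print(round(p_present, 3))  # 0.26
```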

Specifically within Netica®, each node is discretized into a number of states, or bins. For example, the Sternaspis fossor node in Figure 1.1 has two states: absent (observations in which species abundance was zero) and present (observations in which species abundance was greater than zero). The depth node has four states, with depth values ranging from -20 to -55 m, -55 to -80 m, -80 to -110 m, and -110 to -130 m. The mean grain size node has three states, with grain size values ranging from 0 to 2.5 phi, 2.5 to 3.75 phi, and 3.75 to 10.5 phi. The numbers to the right of the states indicate their respective probabilities; child nodes contain posterior probabilities calculated using Bayes’ theorem given the data (Norsys, Netica®). Posterior probabilities are the probabilities after the model has been updated with (has “learned from”) data. The probabilities within a node sum to 100, and the numbers represent overall probabilities for the entire data set.
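Discretization can be sketched in a few lines. The example below assumes the depth breakpoints listed for Figure 1.1 and the absent/present rule described above; the helper names, the sample values, and the choice to assign a value lying exactly on a breakpoint to the shallower state are assumptions made for illustration, not a description of Netica’s behavior.

```python
from collections import Counter

# Depth states (m) from Figure 1.1 as (upper, lower) bounds, shallow to deep.
DEPTH_BINS = [(-20, -55), (-55, -80), (-80, -110), (-110, -130)]

def depth_state(depth_m):
    """Label of the depth state containing depth_m (hypothetical helper).

    A value exactly on a breakpoint goes to the shallower state here; that
    tie-breaking rule is an assumption for illustration.
    """
    for upper, lower in DEPTH_BINS:
        if lower <= depth_m <= upper:
            return f"{lower} to {upper}"
    raise ValueError(f"depth {depth_m} m is outside the discretized range")

def species_state(abundance):
    """Absent when abundance is zero, present when it is greater than zero."""
    return "Present" if abundance > 0 else "Absent"

def state_percentages(states):
    """Percentage of records in each state; the values within a node sum to 100."""
    counts = Counter(states)
    total = sum(counts.values())
    return {state: 100.0 * n / total for state, n in counts.items()}

# Hypothetical samples (not thesis data).
depths = [-30, -60, -75, -90, -120, -62]
abundances = [0, 3, 0, 1, 0, 0]
print(state_percentages([depth_state(d) for d in depths]))
print(state_percentages([species_state(a) for a in abundances]))
```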

Within each child node is a conditional probability table (CPT) that contains all possible combinations of parent states (Figure 1.2). For each unique combination of parent states, a probability is calculated for each child state, using Bayes’ theorem when the probabilities are learned with machine-learning algorithms. In supervised models, CPT values can instead be calculated from frequencies of outcomes given observed conditions, or assigned based on best professional judgment.

Figure 1.2. Bayesian Network Conditional Probability Table (CPT).

For absent and present columns (right side), each row must add up to 100% probability.
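For supervised models learned from frequencies, each CPT row can be estimated by counting child outcomes for each observed combination of parent states. The minimal sketch below uses hypothetical records and a hypothetical `learn_cpt` helper; it leaves out parent combinations that never occur in the data, which a real model would still need to fill (for example, with a uniform row).

```python
from collections import Counter, defaultdict

def learn_cpt(records, parents, child, child_states):
    """Frequency-based CPT: P(child state | parent states) per observed parent combination."""
    counts = defaultdict(Counter)
    for record in records:
        key = tuple(record[p] for p in parents)
        counts[key][record[child]] += 1
    cpt = {}
    for key, outcome_counts in counts.items():
        total = sum(outcome_counts.values())
        # Counter returns 0 for unseen child states, so every state gets a column.
        cpt[key] = {state: outcome_counts[state] / total for state in child_states}
    return cpt

# Hypothetical records using the state labels of Figure 1.1.
records = [
    {"Depth": "-80 to -55", "MGS": "2.5 to 3.75", "Sp": "Present"},
    {"Depth": "-80 to -55", "MGS": "2.5 to 3.75", "Sp": "Absent"},
    {"Depth": "-80 to -55", "MGS": "2.5 to 3.75", "Sp": "Absent"},
    {"Depth": "-55 to -20", "MGS": "0 to 2.5", "Sp": "Absent"},
]
cpt = learn_cpt(records, ["Depth", "MGS"], "Sp", ["Absent", "Present"])
for parent_states, row in cpt.items():
    print(parent_states, row)  # each row sums to 1 (100%)
```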

To illustrate Bayes’ theorem, consider the conditional probability of species presence given a depth between -55 and -80 m and a mean grain size between 2.5 and 3.75 phi, using the network in Figure 1.1. Habitat suitability for the species can be stated mathematically as:

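$$P(\mathrm{Sp}=\mathrm{Pres} \mid [2.5<\mathrm{MGS}<3.75] \cap [-80<\mathrm{depth}<-55]) = \frac{P([2.5<\mathrm{MGS}<3.75] \cap [-80<\mathrm{depth}<-55] \mid \mathrm{Sp}=\mathrm{Pres})\,P(\mathrm{Sp}=\mathrm{Pres})}{P([2.5<\mathrm{MGS}<3.75] \cap [-80<\mathrm{depth}<-55])}$$

Since $P(A \cap B) = P(B \mid A)\,P(A)$, with A the grain size condition and B the depth condition, the denominator can be factored and the equation written as:

$$P(\mathrm{Sp}=\mathrm{Pres} \mid [2.5<\mathrm{MGS}<3.75] \cap [-80<\mathrm{depth}<-55]) = \frac{P([2.5<\mathrm{MGS}<3.75] \cap [-80<\mathrm{depth}<-55] \mid \mathrm{Sp}=\mathrm{Pres})\,P(\mathrm{Sp}=\mathrm{Pres})}{P([-80<\mathrm{depth}<-55] \mid [2.5<\mathrm{MGS}<3.75])\,P(2.5<\mathrm{MGS}<3.75)}$$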

The dataset used to train the model contained 218 total records, of which 10 records were missing values for mean grain size. The condition (2.5 < MGS < 3.75) was met in 26 of 208 records, and the condition (-80 < depth < -55) was met in 91 of 218 records. Of the 26 records where the MGS condition was met, 12 also met the depth condition (-80 < depth < -55). The species was present in 52 of 208 records, of which 5 met both conditions, (2.5 < MGS < 3.75) ∩ (-80 < depth < -55). Therefore,

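$$P(\mathrm{Sp}=\mathrm{Pres} \mid [2.5<\mathrm{MGS}<3.75] \cap [-80<\mathrm{depth}<-55]) = \frac{\left(\frac{5}{52}\right)\left(\frac{52}{208}\right)}{\left(\frac{12}{26}\right)\left(\frac{26}{208}\right)} = \frac{5}{12} \approx 0.417$$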
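The arithmetic can be checked with exact fractions. The counts below are those reported in the text; the variable names are illustrative only.

```python
from fractions import Fraction

# Counts reported in the text for the example S. fossor model
n_mgs = 208          # records with a mean grain size value
n_mgs_cond = 26      # records with 2.5 < MGS < 3.75
n_both = 12          # of those, records that also have -80 < depth < -55
n_present = 52       # records where the species was present
n_present_both = 5   # present records meeting both conditions

likelihood = Fraction(n_present_both, n_present)  # P(MGS and depth | Pres)
prior = Fraction(n_present, n_mgs)                # P(Pres)
depth_given_mgs = Fraction(n_both, n_mgs_cond)    # P(depth | MGS)
p_mgs = Fraction(n_mgs_cond, n_mgs)               # P(MGS)

posterior = likelihood * prior / (depth_given_mgs * p_mgs)
print(posterior, round(float(posterior), 3))  # 5/12 0.417
```

The 208 terms cancel, so the result reduces to 5 of the 12 records that meet both conditions.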

This value differs from that reported by the Netica® CPT because the program applies an expectation-maximization (EM) algorithm. EM learning maximizes the likelihood of the network given the data when missing values are present.

If N = net and D = data, the EM algorithm calculates the most likely values for the CPT by iteratively evaluating

$$P(N \mid D) = \frac{P(D \mid N)\,P(N)}{P(D)}.$$

Since P(D) remains the same for each candidate network, the EM algorithm maximizes the logarithm of the numerator: $\log P(D \mid N) + \log P(N)$.

The term $\log P(N)$ is the log of the prior probability of each net, that is, the net's probability before it sees any data. When the prior network starts with uniform probabilities, this term acts as a constant. The term $\log P(D \mid N)$ is the net's log likelihood; assuming the k records in the training dataset are independent, it is the sum of the log likelihoods of the individual records:

$$\log P(D \mid N) = \sum_{i=1}^{k} \log P(d_i \mid N) = \log P(d_1 \mid N) + \log P(d_2 \mid N) + \cdots + \log P(d_k \mid N).$$

The EM algorithm starts with a candidate net, calculates its log likelihood, and then processes the entire data set to calculate a better net. The algorithm continues this process until there is no longer an improvement in the log likelihood.
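The EM loop can be illustrated on the smallest case with missing data: a parent node A (for example, a mean grain size state that is missing in some records) with one child B (species absent/present). This is a minimal Python sketch, not Netica's implementation; the function `em_two_node` and the records are hypothetical. Because it starts from uniform tables, the log P(N) term is constant, so the sketch maximizes the log likelihood alone, summing the log likelihoods of the individual records.

```python
import math


def em_two_node(records, a_states, b_states, tol=1e-9, max_iter=500):
    """EM estimation of P(A) and P(B | A) for a two-node network A -> B.

    records: list of (a, b) pairs; a is None when the parent value is missing,
        b (e.g. species absent/present) is always observed.
    Starts from uniform tables and stops when the log likelihood no longer
    improves by more than tol.
    """
    p_a = {a: 1.0 / len(a_states) for a in a_states}
    p_b = {a: {b: 1.0 / len(b_states) for b in b_states} for a in a_states}
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(max_iter):
        # E-step: expected counts under the current net; a missing parent
        # value is spread over its states in proportion to P(a) * P(b | a).
        n_a = {a: 0.0 for a in a_states}
        n_ab = {a: {b: 0.0 for b in b_states} for a in a_states}
        ll = 0.0
        for a_obs, b in records:
            if a_obs is None:
                joint = {a: p_a[a] * p_b[a][b] for a in a_states}
                total = sum(joint.values())
                ll += math.log(total)
                for a in a_states:
                    n_a[a] += joint[a] / total
                    n_ab[a][b] += joint[a] / total
            else:
                ll += math.log(p_a[a_obs] * p_b[a_obs][b])
                n_a[a_obs] += 1.0
                n_ab[a_obs][b] += 1.0
        # Stop once the current net's log likelihood stops improving
        if ll - prev_ll < tol:
            break
        prev_ll = ll
        # M-step: build a better net from the expected counts
        p_a = {a: n_a[a] / len(records) for a in a_states}
        p_b = {
            a: {b: (n_ab[a][b] / n_a[a]) if n_a[a] > 0 else 1.0 / len(b_states)
                for b in b_states}
            for a in a_states
        }
    return p_a, p_b, ll


# Hypothetical records: (grain size state, or None if missing; species state)
data = [("fine", "present"), ("fine", "absent"), ("coarse", "absent"),
        ("coarse", "absent"), (None, "present")]
p_a, p_b, ll = em_two_node(data, ["fine", "coarse"], ["present", "absent"])
print(p_a, p_b, round(ll, 3))
```

With no missing values the loop settles on plain frequency estimates after one M-step; the missing record is what makes iteration necessary.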

After CPTs are calculated, an expected value ($\bar{X}$) plus or minus its standard deviation ($\sigma$) is reported for each node. The expected value is the probability-weighted mean of the node's states (Norsys, Netica®): the midpoint of each state is weighted by that state's probability, and the weighted values are summed. This expected value is not necessarily the value most likely to occur (the most probable state), but rather the center of the probability distribution bounded by the start and end state values and weighted by the respective probabilities of each state. The standard deviation, equal to the square root of the variance, describes the spread of this distribution around the expected value. Table 1.1 calculates the expected value for each node from Figure 1.1.


Table 1.1. Steps for calculating expected values ($\bar{X}$) of nodes displayed in Figure 1.1.

The expected mean is the bottom number in each continuous-value node. The midpoint of each state is multiplied by its posterior probability, and the weighted values are summed to derive the node's expected value.

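| Node | State midpoint | Posterior probability | Weighted value |
|---|---|---|---|
| Depth (m) | -37.5 | 0.248 | -9.3 |
| | -67.5 | 0.417 | -28.15 |
| | -95 | 0.179 | -17.01 |
| | -120 | 0.156 | -18.72 |
| | | Expected value: | -73.18 |
| Mean grain size (phi) | 1.25 | 0.605 | 0.76 |
| | 3.125 | 0.124 | 0.39 |
| | 7.125 | 0.27 | 1.92 |
| | | Expected value: | 3.07 |
| Sternaspis fossor | 0 (absent) | 0.757 | 0 |
| | 1 (present) | 0.243 | 0.243 |
| | | Expected value: | 0.243 |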
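The calculation in Table 1.1 can be reproduced with a short function. This sketch treats each state as a point mass at its midpoint, as the table does, and renormalizes probabilities to absorb display rounding (the mean grain size probabilities sum to 0.999); the function name is ours, and Netica's own standard deviation calculation may differ (for example, if it accounts for spread within each state).

```python
import math


def expected_value_and_sd(midpoints, probs):
    """Probability-weighted mean and standard deviation of a discretized node.

    Each state is treated as a point mass at its midpoint, as in Table 1.1.
    Probabilities are renormalized to absorb display rounding (e.g. 0.999).
    """
    total = sum(probs)
    weights = [p / total for p in probs]
    mean = sum(m * w for m, w in zip(midpoints, weights))
    variance = sum(w * (m - mean) ** 2 for m, w in zip(midpoints, weights))
    return mean, math.sqrt(variance)


depth = expected_value_and_sd([-37.5, -67.5, -95, -120], [0.248, 0.417, 0.179, 0.156])
mgs = expected_value_and_sd([1.25, 3.125, 7.125], [0.605, 0.124, 0.27])
species = expected_value_and_sd([0, 1], [0.757, 0.243])
print(depth, mgs, species)
```

The depth mean comes out at about -73.17 m rather than -73.18 because Table 1.1 sums weighted values that were already rounded. For the binary species node the standard deviation reduces to sqrt(p(1 - p)) with p = 0.243.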

When an environmental condition is entered into the Bayesian network, the network reports the resultant probabilities of absent and present from the CPT (Figure 1.3). When models represent a binary outcome of present or absent, the probability of presence translates scientifically to the probability of suitable habitat (i.e., in the context of the models in this thesis, conditions providing for the presence of a given species). When networks are trained using abundance data, expected values and their standard deviations can be used to predict abundance.


Figure 1.3 Integrating presence probabilities.

When environmental conditions are entered into the Bayesian network, presence probabilities are pulled from the CPT. The “probability of present” translates scientifically to the “probability of suitable habitat” since probabilities were trained from a minimal dataset establishing relationships between species presence and physical parameters. This is an example network and not a final model for S. fossor.

To create final predictive maps, Netica® processes a case file in which each row is a cell on the map and each column represents a raster layer. Netica enters each raster value into the net for every cell location and reports the final probability of habitat suitability as a table. This table can be converted into a continuous floating-point raster by first converting the table into a point shapefile and then applying the Point to Raster tool within ArcGIS 10.1. The resulting raster indicates the probability of habitat suitability given known environmental conditions.
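The thesis performs the table-to-raster step with the ArcGIS Point to Raster tool. As a stand-in that shows the idea, the numpy sketch below places per-cell probabilities onto a regular grid; it assumes each output row carries the x, y coordinates of its cell centre on a regular grid (the case file layout is not specified in the text), and the function name and values are hypothetical.

```python
import numpy as np


def points_to_grid(x, y, values, cell_size, nodata=np.nan):
    """Place point values (cell centres) onto a regular grid.

    Assumes x, y are cell-centre coordinates spaced cell_size apart.
    Returns (grid, x_min, y_max); row 0 is the northernmost row.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    x_min, y_max = x.min(), y.max()
    cols = np.rint((x - x_min) / cell_size).astype(int)
    rows = np.rint((y_max - y) / cell_size).astype(int)
    grid = np.full((rows.max() + 1, cols.max() + 1), nodata, dtype=float)
    grid[rows, cols] = values  # cells without a prediction stay nodata
    return grid, x_min, y_max


# Hypothetical suitability probabilities on a 250-m grid
grid, x0, y0 = points_to_grid([0, 250, 0], [250, 250, 0], [0.1, 0.2, 0.3], 250)
print(grid)
```

Cells with no prediction (for example, masked rock or gravel) remain NaN, which plays the role of NoData in a GIS raster.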


Chapter 2. Mapping marine habitat suitability and uncertainty using Bayesian networks: a case study of northeastern Pacific benthic macrofauna

Authors: Andrea Havron¹, Chris Goldfinger¹, Sarah Henkel², Bruce G. Marcot³, Chris Romsos¹, Lisa Gilbane⁴

1. Active Tectonics and Seafloor Mapping Laboratory, College of Earth, Ocean, and Atmospheric Sciences, Oregon State University. 2. Benthic Ecology Laboratory, Department of Integrative Biology, Hatfield Marine Science Center, Oregon State University. 3. Pacific Northwest Research Station, US Forest Service, USDA. 4. Bureau of Ocean Energy Management.


2.1 Introduction

Habitat suitability modeling of data-poor systems requires careful attention to assumptions and limitations. Interpretation of habitat suitability maps can be improved with maps that visually communicate model uncertainty. We applied Bayesian networks (BNs) to a benthic macrofauna dataset to illustrate their use in developing habitat suitability models from a small marine dataset. BNs were also used to create maps displaying model uncertainty and data limitations. We describe BN modeling of three macrofauna species: a marine gastropod, Aystris gausapata; a marine bivalve, Axinopsida serricata; and a marine worm, Sternaspis fossor.

Three map products were produced for each species: a habitat suitability map displaying suitable habitat based on the BN habitat model of regional predictor variables; an uncertainty map, displaying statistical uncertainty of model predictions of habitat suitability; and an experience map, displaying the empirical basis for habitat suitability predictions (equivalent sample size).

As habitat suitability modeling is an inherently uncertain process, managers using map products to formulate conservation-oriented strategies and decisions should be informed of the underlying limitations and uncertainty in habitat suitability projections. Visually describing statistical model uncertainty and equivalent sample size in map format allows for improved interpretation of habitat suitability map predictions.

2.2 Literature Review

Predictive habitat suitability modeling (HSM), founded in Hutchinson's (1957) and Whittaker's (1960) theories of the ecological niche and the environmental gradient, entails statistical analyses to extrapolate probabilities of a species' suitable habitat from known species-environment correlations. HSM is advantageous when habitat maps are needed for management but limitations of cost or time prevent rigorous sampling efforts. Such savings are even more critical in data-poor systems, such as the marine benthos.

Multiple statistical techniques have been developed for HSM (Guisan & Zimmermann 2000; Franklin 2009). Choosing the most appropriate model comes down to selecting the approach that provides an acceptable level of prediction accuracy given the uncertainty and constraints of the data being modeled. Understanding the limitations and assumptions underlying different statistical modeling approaches is essential, not only in selecting the most appropriate modeling tool, but also in communicating model results.

Several limitations are known to occur for most modeling frameworks currently in use. One limitation is multicollinearity (Graham 2003; Ahmadi-Nedushan et al. 2006). Most principal-component and related data-reduction methods cannot handle environmental parameters that are highly correlated (Heckerman 1995). Multicollinearity leads to inaccurate model parameterization, decreased statistical power, and the exclusion of significant predictor variables (Graham 2003). This can become a problem when modeling in the marine benthic environment, where many of the environmental variables used to predict species' habitats are highly correlated (Snelgrove & Butman 1994).

Another limitation is missing data. For most modeling methods, if a location is missing the measurement of even one covariate, the location record cannot be used to develop the overall model. Likewise, if the regional environmental dataset is missing a single covariate at a location, a prediction cannot reliably be made for that location, because the model would produce inaccurate predictions (Thuiller, Araujo & Lavorel 2004). The alternative would be to remove the covariate that contains the missing information from the analysis. This, however, is disadvantageous if the covariate is an important predictor for a given species (Barry & Elith 2006).

A final limitation discussed in this chapter is uncertainty. Sources of prediction uncertainty in an ecological study include, but are not limited to, model design and implementation bias, precision of measurements, chance variability in the system under study, and modeling and parameter uncertainty (Ioannidis 2005; Guisan & Thuiller 2005; Cressie et al. 2009; Franklin 2009). Visualizing the spatial distribution of uncertainty inherent in habitat suitability modeling improves user understanding of the limitations of the underlying habitat suitability maps (Barry & Elith 2006; Rocchini et al. 2011).

With the increasing importance of habitat suitability modeling in environmental management scenarios (e.g., invasive species management, reserve design, climate change impact assessment) (Guisan & Thuiller 2005; Franklin 2009; Lele et al. 2013), an appropriate assessment of uncertainty is warranted so that managers understand the limitations behind maps and models used to inform management decisions and strategies. Clearly communicating uncertainty is essential so that managers may better understand the spatial distribution of uncaptured variability and error, allowing for more scientifically informed decisions (Cleaves 1995; Barry & Elith 2006; Rocchini et al. 2011). We argue that the best approach is to create additional, supporting map products that clearly communicate the spatial distribution of uncertainty and bias underlying habitat suitability maps.


We used Netica® (Norsys, Inc., http://www.norsys.com) to build Bayesian networks (BNs), directed acyclic graphical models that apply Bayes' theorem to a network of variables linked by probabilities (Marcot 2006; Jensen & Nielson 2007). The underlying Bayesian statistics allow for the development of habitat suitability models that can be updated as new data are collected, thereby improving model performance over time. BNs remain robust to small datasets, multicollinearity and missing data (Heckerman 1995; Kontkanen et al. 1997; Myllymäki et al. 2002; Uusitalo 2007). BNs are also designed to track and propagate uncertainty through the system (Sivia & Skilling 1996; Uusitalo 2007; Gelman et al. 2013) and, in the context of this chapter, can provide a final habitat suitability map along with maps visualizing prediction uncertainty and equivalent sample size.

The goals of this chapter are to 1) introduce the application of Bayesian networks to the modeling of habitat suitability within the context of a small marine dataset; and 2) describe a novel approach of using a BN model to create maps of prediction uncertainty and of the empirical basis for the habitat suitability map (equivalent sample size). The two additional map products are intended to improve managers' interpretation of habitat suitability maps, enhancing map usability for spatial planning purposes. We present a case study using marine macrofauna found within soft sediment substrata along the Pacific Northwest continental shelf off the western United States, although our techniques are applicable to other systems and species.


2.3 Materials and Methods

2.3.1 Benthic macrofauna data

Motivated by plans to develop renewable energy along the Pacific Northwest continental shelf and slope, the Bureau of Ocean Energy Management (BOEM), U.S. Department of the Interior, implemented a project to characterize benthic habitat at a regional scale along the continental shelf (Goldfinger et al. 2014; Henkel et al. 2014). Box core grab samples of bottom sediment were collected at 153 stations across 8 sites on the continental shelf from Northern California to south of the Olympic Coast National Marine Sanctuary of Washington (Figure 2.1). A sub-sample of sediment was collected to determine grain size, percent silt, and percent sand (using an LD-PSA) as well as percent organic carbon and nitrogen (using acid composition). The remaining sediment from each grab sample was then filtered through a 1.0-mm screen, and all benthic macrofauna left behind were preserved for identification (Henkel et al. 2014). If a species was identified within a sample, the sample was assigned a "present" value; otherwise an "absent" value was assigned. As benthic macrofauna species (animals larger than 1.0 mm) were the focal group for models within this chapter, this "absent" value represented true absence, since all samples were thoroughly examined for remaining organisms on the 1.0-mm screen. CTD (conductivity, temperature, depth, plus additional sensors) casts were conducted at each box core sampling station to analyze temperature, dissolved oxygen, pH, fluorescence, and turbidity.

We selected seven benthic macrofauna species for modeling habitat suitability based on a Primer SIMPER analysis; species selected for modeling were those whose variations in density contributed strongly to distinctions between assemblages at multiple sampling sites (see Henkel et al. 2014 for a full description of methods and analysis).

Three of these seven modeled species are highlighted in this chapter as a case study of modeling methods: a marine gastropod, Aystris gausapata (Gould, 1850); a marine bivalve, Axinopsida serricata (Carpenter, 1864); and a marine polychaete worm, Sternaspis fossor (Stimpson, 1854) (Figure 2.2).

Figure 2.1 Study Area

Southern and northern bounds defined by macrofauna sampling locations (39° 30′ 25.668″ N to 47° 01′ 2.64″ N). Eastern and western bounds defined by shallowest and deepest samples (-20 to -130 meters). Areas of hard rock (dark gray), cobble and gravel (white) are masked from the final habitat suitability map.


Figure 2.2. Benthic macrofauna species chosen for habitat suitability models.

From left to right: Axinopsida serricata, Aystris gausapata, and Sternaspis fossor. Two species, A. gausapata and S. fossor, were chosen because their distributions are expected to change in response to sediment changes caused by offshore installations. A. serricata was chosen because of unique characteristics of its distribution, which warranted further investigation with habitat suitability models to help determine the utility of the tool across a spectrum of species.

2.3.2 Environmental data

Multiple sediment characteristics and water column properties were initially considered for habitat suitability modeling (Table 2.1). A preliminary exploration of variables using frequency histograms indicated low variability in salinity (range: 33.1–33.9 PSU) and temperature (range: 7.3–9.8 °C). We judged that such variability is unlikely to be biologically significant, so we excluded these two variables from habitat modeling. Further, we excluded pH, fluorescence and turbidity due to unacceptable measurement errors likely encountered during the equipment calibration process. High-resolution bathymetry data were derived for each sampling location from existing theme data (Goldfinger et al. 2014) in ArcGIS 10.1. An additional parameter, the distance from each benthic sample station to shore, was calculated in GIS using Euclidean distance analysis on a polyline shoreline.
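The thesis computed distance to shore with Euclidean distance analysis in ArcGIS. For a single station, the underlying geometry is the shortest planar distance from a point to a polyline, sketched below under the assumption of projected coordinates in meters; the function names and coordinates are hypothetical.

```python
import math


def point_segment_distance(px, py, ax, ay, bx, by):
    """Planar distance from point P to the segment AB."""
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB and clamp the projection to the segment's end points
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_shoreline(px, py, shoreline):
    """Shortest distance from a station to a polyline given as (x, y) vertices."""
    return min(
        point_segment_distance(px, py, *a, *b)
        for a, b in zip(shoreline, shoreline[1:])
    )


# Hypothetical projected coordinates in meters
shoreline = [(0.0, 0.0), (0.0, 1000.0), (200.0, 2000.0)]
print(distance_to_shoreline(300.0, 500.0, shoreline))  # 300.0
```

A raster Euclidean distance tool returns the same quantity for every cell at once, discretized to the cell size.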

Spatial data on presence and absence of benthic macrofauna species were joined to the raster data on high-resolution bathymetry and distance to shore. The result was a dataset of species presence, absence, and environmental parameters at each sample location, including the remotely sensed variables depth and distance to shore and the in situ substrate variables mean grain size (MGS), total organic carbon (TOC), total nitrogen (TN), percent silt, and percent sand. We then constructed the BN models of habitat suitability for each of the three species using this full data set.

We used a regional 250-meter resolution MGS raster data set (Goldfinger et al. 2014) in GIS to represent coarse regional information, using a 250-meter grid of points to match the scale of the MGS raster. We extracted values for MGS, depth, and distance to shore at these points for use in the BN model.


Table 2.1. Environmental Variables.

Data were either collected in situ with species or calculated as a raster in ArcGIS. Models were parameterized with in situ variables and high resolution rasters. Variables not included in the model were excluded either due to being insignificant or having insufficient data. Variables used for prediction were all generalized to a 250 meter cell size or predicted within the network.

| Variable | Data source | Model parameterization | Model prediction | Units |
|---|---|---|---|---|
| Mean grain size | Collected in situ; Regional raster (Goldfinger et al. 2014) | In situ | Regional raster – 250 m | phi |
| Latitude | Collected in situ | In situ | Regional raster – 250 m | degrees |
| Percent silt | Collected in situ | In situ | Predicted in network | percent |
| Percent sand | Collected in situ | In situ | Predicted in network | percent |
| Total organic carbon (TOC) | Collected in situ | In situ | Predicted in network | percent by weight |
| Total nitrogen (TN) | Collected in situ | In situ | Predicted in network | percent by weight |
| Salinity | Collected in situ | Excluded – insignificant | Insignificant variable | PSU |
| Temperature | Collected in situ | Excluded – insignificant | Insignificant variable | degrees Celsius |
| pH | Collected in situ | Excluded – insufficient data | Insufficient data | pH |
| Fluorescence | Collected in situ | Excluded – insufficient data | Insufficient data | µg/L |
| Turbidity | Collected in situ | Excluded – insufficient data | Insufficient data | NTU |
| Depth | High-resolution raster (Goldfinger et al. 2014) | High-resolution raster | Regional raster – 250 m | meters |
| Distance to shore | ArcGIS raster | ArcGIS raster | Regional raster – 250 m | meters |

2.3.3 Model development

We designed the benthic macrofauna BN models following the guidelines of Marcot et al. (2006) and Uusitalo (2007). The modeling steps included: 1) variable discretization, 2) network structure development, 3) model parameterization, 4) model calibration, and 5) model prediction, selection, and validation.


2.3.3.1 Variable discretization

Variable discretization entailed identifying quantitative ranges of state values of continuous variables (Figure 2.3). Although several automated (unsupervised, machine-learning) techniques exist for variable discretization, none is standard in the context of ecological datasets. Myllymäki et al. (2002) recommended methods that use ecologically significant breakpoints and that minimize the number of states so that each interval contains enough data to parameterize and run the model.

Figure 2.3. An example Bayesian Network with respective conditional probability table.

This model describes an example relationship between species (child node) and environment (parent nodes), where Mean Grain Size (right node) is informed by Depth (left node) and the macrofauna species (S. fossor; bottom node) is dependent on both Depth and Mean Grain Size. Each node is composed of states, which categorize the data into different bins. The conditional probability table describes the probability of each child state for each possible combination of parent states.

We used a supervised technique by visually inspecting frequency-value histograms and comparing presence and absence of each species in relation to values of each variable (Figure 2.4). We initially estimated breakpoints by selecting values at the minimum and maximum range of a species' presence response in the frequency-value histograms, or where histogram density shifted between present and absent.

We structured a simple model linking the species to a single environmental variable within Netica® based on the initial estimated breakpoints, parameterized CPTs, and calculated the resulting percent error. We then incrementally increased or decreased the number of cutoff points, adjusted breakpoint values, re-parameterized CPTs, and recalculated percent error. Through this iterative process, multiple discretization schemes were compared to optimize breakpoint locations and the number of states.
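The breakpoint search can be illustrated with a simplified stand-in for the Netica procedure: bin one environmental variable with candidate breakpoints, estimate P(present | state) from frequencies, assign each sample the more probable state, and report the percent of samples misclassified. This is a sketch under those assumptions; the function name and data are hypothetical, and the error is computed on the same samples used to build the table.

```python
import numpy as np


def percent_error_for_breakpoints(values, present, breakpoints):
    """Percent error of a single-variable species model for one discretization.

    values: environmental values (e.g. mean grain size, phi) at sample sites.
    present: booleans, True where the species was observed.
    breakpoints: interior cut points separating the states.
    The CPT is estimated from frequencies and each case is assigned the more
    probable state of its bin (ties resolved toward absent).
    """
    values = np.asarray(values, dtype=float)
    present = np.asarray(present, dtype=bool)
    bins = np.digitize(values, breakpoints)  # state index for each case
    errors = 0
    for b in np.unique(bins):
        in_bin = bins == b
        predicted = present[in_bin].mean() > 0.5
        errors += int(np.sum(present[in_bin] != predicted))
    return 100.0 * errors / len(values)


# Hypothetical grain size samples and species observations
values = [1.0, 1.5, 2.0, 3.0, 3.2, 3.5, 5.0, 6.0]
present = [False, False, False, True, True, False, False, False]
for cuts in ([2.5], [2.5, 3.75]):
    print(cuts, percent_error_for_breakpoints(values, present, cuts))
```

Here adding the 3.75 phi cut isolates the bin where presences concentrate and lowers the error from 25% to 12.5%; with real data, more states also means fewer cases per CPT row, which is the trade-off Myllymäki et al. (2002) warn about.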

Figure 2.4. Supervised discretization technique to select breakpoints and state parameters for building the BN model.

Supervised breakpoints (red lines) are determined by visually inspecting histograms of mean grain size at sites where a species was absent (top graph) and present (bottom graph). This is an example using S. fossor and does not represent this species’ final model.


2.3.3.2 Network structure development

In this step, we identified direct correlations or causal linkages among variables (Figure 2.3). Previous benthic macrofauna BN models (Locket 2012) used the tree augmented naïve (TAN) algorithm (Friedman, Geiger & Goldszmidt 1997), as also used by Aguilera et al. (2010) and Dlamini (2011), for unsupervised learning of the BN network structure. However, the TAN structure learns relationships directly from the sample dataset. When no other information is available, this is an appropriate technique.

When prior knowledge exists about correlations among variables, however, a more robust model structure may result from applying supervised techniques to identify and designate link structures around known interdependencies (Uusitalo 2007). Further, as the TAN structure is based on the original sampling dataset alone, incorporating new information or data into the network requires the TAN structure to be rebuilt with each update, which can lead to significant changes in the network structure. Using a supervised structure based on known environmental relationships allows for the development of a more reusable network that can be updated with new information without significant changes to the network.

For these reasons, we used a supervised link structure in which a combination of expert knowledge and correlations between variables within the dataset guided the design. Grain size, percent silt and percent sand essentially measure the same phenomenon and therefore are highly correlated (Folk & Ward 1957). Total organic carbon (TOC) and total nitrogen (TN) are known to be highly correlated with percent silt (Hedges & Keil 1995). Therefore, we designed the BN models to explicitly link correlated sediment variables. Further, we inspected the sample data for correlations using Pearson's correlation coefficient. This step confirmed the high correlation between sediment variables and also revealed correlation between depth and the sediment variables MGS, percent silt, percent sand, TOC, and TN (Figure 2.5). Links were then added between depth and sediment variables to account for all such correlations.
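The correlation screen can be sketched with numpy. Because the thesis set links from expert knowledge confirmed by the data, this sketch only flags candidate links; the 0.7 threshold echoes the Figure 2.5 caption, the comparison on absolute values is our assumption, and the function name and samples are hypothetical.

```python
import numpy as np


def correlated_pairs(data, names, threshold=0.7):
    """Variable pairs whose absolute Pearson correlation exceeds threshold.

    data: 2-D array-like, one row per sample and one column per variable
        (complete cases only; np.corrcoef does not skip missing values).
    """
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(r[i, j]) > threshold:
                pairs.append((names[i], names[j], round(float(r[i, j]), 3)))
    return pairs


# Hypothetical samples: depth (m), mean grain size (phi), percent silt
samples = [
    [-30, 1.5, 5],
    [-50, 2.0, 9],
    [-70, 2.9, 20],
    [-90, 3.6, 31],
    [-110, 4.4, 44],
]
for a, b, r in correlated_pairs(samples, ["depth", "mgs", "silt"]):
    print(f"candidate link: {a} - {b} (r = {r})")
```

Depth is negative downward in these data, so deeper, finer sediments show up as negative correlations with depth; the sign matters for interpretation but not for deciding whether to draw a link.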

Figure 2.5. Supervised BN Structure.

The links between nodes in this structure were developed based on a prior understanding of the environment and confirmed by correlations in the data. Mean grain size, percent sand, percent silt, total organic carbon and total nitrogen were all highly correlated, with a Pearson correlation coefficient greater than 0.9. Depth was correlated with each of the above variables with a Pearson correlation coefficient greater than 0.7.

We identified two scales of explanatory environmental variables: regional variables, or those that correspond with regional raster datasets and represent continuous coverage throughout the region of interest; and in situ variables, or those whose values are known only at sediment sample locations. We designed a network structure to facilitate the prediction of in situ variables by inserting intermediate nodes into the networks (Figure 2.6). These intermediate nodes re-discretized regional variables (distance to shore, depth, latitude, and MGS) to best predict the correlated in situ variables (percent silt, percent sand, TOC, and TN). The uncertainty in the predicted in situ variable values was carried through the network into the uncertainty of the final suitability prediction, expressed as the posterior probability distribution of the benthic macrofauna absence and presence states. The result of this process was a benthic macrofauna model framework for invertebrates living within marine sediment that is both adaptable to new species and updateable (Figure 2.7).

The BN network structure can be adapted to other benthic macrofauna species that are influenced by the suite of environmental variables presented here, with only slight modifications to species-environment discretization breakpoints. The model can be updated with new information about either a species of interest or relationships between explanatory variables (e.g., percent silt and TOC).


Figure 2.6. Regional raster and local in situ variables

Regional raster variables predicted local in situ variables, which all combined to predict habitat suitability for a given species. Blue circles indicate intermediate nodes, their function being to re-discretize the parent node to best predict the child node.


2.3.3.3 Model parameterization

After initial models were established with uniform prior probability distributions, conditional probability tables (CPTs) were parameterized using the expectation-maximization (EM) algorithm. The EM algorithm is a robust method for updating prior and conditional probabilities, particularly when missing data are present (Dempster, Laird & Rubin 1977; Watanabe & Yamaguchi 2003; Marcot 2006). Prior to learning CPTs from species sampling data, CPTs of MGS, depth, percent silt and percent sand were learned from the U.S. Seabed sampling database (Reid et al. 2006) using the EM learning algorithm. This step used prior knowledge of the relationships between depth and sediment size measurements throughout the study region as a prelude to learning the species-environment relationships.

2.3.3.4 Model prediction, selection, and validation

Final models were selected by evaluating three model performance metrics: confusion matrix error rates, spherical payoff (SP), and true skill statistic (TSS) (Marcot 2012). Confusion error (range 0–100%) was chosen as it represents the combination of Type I and Type II (false positive and false negative) error rates. A false positive error, in the context of habitat suitability models, occurs if a model predicts that a species is present in a particular habitat when, in fact, it is absent; a false negative error occurs if a model predicts that a species is absent in a particular habitat when, in fact, it is present. Best-performing models have 0% confusion error. Spherical payoff (range 0–1) was chosen as it outperforms AUC, the area under the receiver operating characteristic curve (Marcot 2012). Best-performing models have an SP score of 1. TSS (range -1 to 1) was chosen as it is independent of prevalence (Allouche, Tsoar & Kadmon 2006; Marcot 2012). A TSS score of 1 represents a model with no error, a score of 0 represents a model performing no better than random, and a score of -1 represents a model with total error. Each metric has different assumptions. Confusion error is based on the highest-probability state and conflates error types, which may oversimplify the utility of the model; SP is influenced by the number of states in the response variable; and TSS conflates error types (Marcot 2012). For these reasons, all three metrics were compared when selecting final models.
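The three metrics can be sketched for a binary species node. The definitions below are the standard forms as we understand them, not code from the thesis or Netica: confusion error is the percent of cases whose most probable state is wrong; spherical payoff averages P_c / sqrt(sum of P_j squared), where P_c is the probability given to the observed state; and TSS is sensitivity + specificity - 1. The function name and predictions are hypothetical.

```python
import math


def performance_metrics(p_present, observed):
    """Confusion error (%), spherical payoff, and TSS for a binary species node.

    p_present: predicted probability of the 'present' state for each case.
    observed: True where the species was observed present.
    The predicted state is the most probable one (p > 0.5 means present).
    Assumes the cases include at least one presence and one absence.
    """
    tp = fn = fp = tn = 0
    payoff = 0.0
    for p, obs in zip(p_present, observed):
        probs = (1.0 - p, p)  # (absent, present)
        p_correct = probs[1] if obs else probs[0]
        payoff += p_correct / math.sqrt(probs[0] ** 2 + probs[1] ** 2)
        predicted = p > 0.5
        if obs and predicted:
            tp += 1
        elif obs:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    n = tp + fn + fp + tn
    confusion_error = 100.0 * (fn + fp) / n
    spherical_payoff = payoff / n
    tss = tp / (tp + fn) + tn / (tn + fp) - 1.0  # sensitivity + specificity - 1
    return confusion_error, spherical_payoff, tss


# Hypothetical cross-validation predictions
print(performance_metrics([0.8, 0.3, 0.6, 0.2], [True, True, False, False]))
```

In this example half the cases are misclassified and TSS is 0, yet SP stays well above 0 because it credits the probability assigned to the correct state rather than only the winning state.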

We built and tested 12 models for each species (Table 2.2). Macrofaunal invertebrates are known to have physiological constraints related to pressure and a reliance on a sediment-associated food supply (Snelgrove 1999). We therefore kept the variables depth and MGS in each model tested. Models varied in the inclusion or exclusion of the other two regional variables, latitude and distance to shore, and of the in situ sediment characteristics percent silt, percent sand, TOC, and TN. We also tested the inclusion or exclusion of intermediate nodes for predicting in situ variables. Sensitivity analysis was performed on each network to determine the degree to which each environmental variable explained the variation in the species posterior probability distributions (Marcot 2012). Final models were selected by evaluating performance metrics from 4-fold cross-validation tests to avoid selecting an overfit model.
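The text does not say how cases were assigned to the four folds. Assuming random assignment, the splitting step of 4-fold cross-validation can be sketched as below: each fold serves once as the test set while the model is learned from the other three, and the metrics are computed on the held-out cases. The function name and seed are hypothetical.

```python
import random


def k_fold_splits(n_cases, k=4, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # random fold assignment (assumed)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield sorted(train), sorted(test)


for train, test in k_fold_splits(10, k=4):
    print(len(train), len(test))
```

Every case is tested exactly once, so performance reflects cases the model did not learn from, which is what guards against selecting an overfit model.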


Table 2.2. Model Tests conducted for each species.

In total, there were 12 model permutations for each species as models 5-8 were tested with and without intermediate nodes. Mean Grain Size (MGS) and depth were included in each model test as previous research has established their importance to benthic invertebrate distribution patterns.

Columns: MGS | Depth | Latitude | Distance to Shore | TOC | TN | Percent Silt | Percent Sand. Rows: Models 1–8, with an X marking each variable included in a model; Model 1 included MGS and Depth only.

An additional 14 box core grab samples were collected from the Northwest National Marine Renewable Energy Center's (NNMREC's) Pacific Marine Energy Center South Energy Test Site (SETS; Figure 2.8) in August and again in October 2013 (Pers. Comm. Henkel, Sarah, OSU 2014). We compared observations of species presence or absence with model results from the SETS region using models developed without SETS data. If the species was observed in a SETS sample, the model prediction at the same location was considered a true positive if its suitability score was above 0.5, and a false negative (Type II error) if its suitability score was below 0.5. If the species was absent from a SETS sample, the model prediction at the same location was considered a true negative if its suitability score was below 0.5, and a false positive (Type I error) if its suitability score was above 0.5. Performance metrics were then calculated from these outcomes.
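The classification rule above can be written as a small function. The text does not cover a score of exactly 0.5; this sketch treats it as a prediction of absence, which is an assumption. The function name and sample values are hypothetical.

```python
from collections import Counter


def classify_validation_case(suitability, observed_present, threshold=0.5):
    """Label one field-validation sample as a true/false positive/negative.

    A score exactly equal to the threshold is treated as a prediction of
    absence (an assumption; the text does not cover this case).
    """
    predicted_present = suitability > threshold
    if observed_present:
        return "true positive" if predicted_present else "false negative"
    return "false positive" if predicted_present else "true negative"


# Hypothetical SETS samples: (model suitability score, species observed?)
samples = [(0.72, True), (0.41, True), (0.18, False), (0.63, False)]
tally = Counter(classify_validation_case(s, obs) for s, obs in samples)
print(tally)
```

The resulting counts are the confusion-matrix cells from which the validation performance metrics are computed.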


Figure 2.8. Study area of field validation

SETS sampling stations (black dots) indicate box core samples taken in August and October 2013 and used in field validation. Grey dots indicate the closest sampling sites, Newport and Cape Perpetua, used to train the predictive models.

2.3.4 Map Creation

Three map products were created for each species describing habitat suitability, uncertainty, and experience for the prediction region. Habitat suitability maps provided regional predictions of suitable habitat given the regional environmental variables. We used a database of regional latitude, depth, MGS, and distance to shore with a 250-m resolution cell size to generate habitat suitability predictions. Depth and MGS regional rasters were developed using methods described by Goldfinger et al. (2014). The MGS raster, being a product of a spatial kriging analysis, had an associated mean square error of 0.8 phi. This error was included with MGS values as input into the network when making predictions, allowing the error to be incorporated into the uncertainty of the final HSP prediction. Regions of rock, cobble, and gravel were masked from the final predictive maps because the models were developed only for soft sediment species. For each raster cell, the models calculated a probability of suitable habitat for each species.
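One way to let the kriging error flow into the HSP is to spread each cell's MGS value over the network's MGS states using a normal error distribution, then average the conditional suitability over those states. The sketch below assumes that treatment: the reported 0.8 is taken as the error variance, the error distribution is applied directly as the distribution over MGS states (a Jeffrey-style soft finding) with the other parent nodes held at fixed states, and the bin edges follow the A. serricata network in Figure 2.9. The conditional probabilities are hypothetical placeholders, and Netica's own handling of uncertain inputs may differ.

```python
from math import erf, sqrt

# MGS states (phi) as in the A. serricata network; other parents held fixed.
MGS_BINS = [(-5.0, 1.75), (1.75, 2.3), (2.3, 10.5)]
# HYPOTHETICAL P(suitable | MGS state); not the fitted CPT values.
P_SUITABLE_GIVEN_BIN = [0.55, 0.70, 0.85]


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


def bin_weights(mgs, mse, bins=MGS_BINS):
    """Share of a normal error distribution (variance = MSE) falling in each MGS state."""
    sd = sqrt(mse)
    weights = [normal_cdf(hi, mgs, sd) - normal_cdf(lo, mgs, sd) for lo, hi in bins]
    total = sum(weights)
    return [w / total for w in weights]  # renormalise: mass outside the bins is dropped


def hsp_with_mgs_error(mgs, mse=0.8):
    """Expected P(suitable) after averaging over the uncertain MGS state."""
    return sum(w * p for w, p in zip(bin_weights(mgs, mse), P_SUITABLE_GIVEN_BIN))
```

A cell whose kriged MGS lies near a bin edge (for example 2.0 phi) receives a blend of neighbouring states, whereas a cell deep inside one state (for example 6.0 phi) behaves almost as if the MGS were known exactly.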

Uncertainty maps communicated the spread of the posterior distribution of the habitat suitability prediction, a measure of certainty in the probability predictions. We created uncertainty maps by mapping the standard deviation of this posterior distribution around its expected value, with the habitat suitability states coded as 0 = fully unsuitable and 1 = fully suitable.
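Because the suitability node has two states coded 0 and 1, the standard deviation mapped in the uncertainty layer reduces to sqrt(p(1 - p)), where p is the posterior probability of suitable habitat. The short sketch below is an illustration rather than the Netica computation itself; it reproduces the 0.76 ± 0.43 and 0.403 ± 0.49 shown for the species nodes in Figures 2.9 and 2.10, and shows why the uncertainty scale tops out at 0.5.

```python
from math import sqrt


def hsp_uncertainty(p_suitable):
    """Standard deviation of a two-state node coded 0 (unsuitable) / 1 (suitable)."""
    if not 0.0 <= p_suitable <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    mean = p_suitable  # expected value = 0 * (1 - p) + 1 * p
    variance = (0 - mean) ** 2 * (1 - p_suitable) + (1 - mean) ** 2 * p_suitable
    return sqrt(variance)  # equals sqrt(p * (1 - p)); maximum of 0.5 at p = 0.5
```

Predictions near 0 or 1 therefore map as low uncertainty, and predictions near 0.5 as the highest possible uncertainty, regardless of how much data supports them.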

Experience maps described the percentage of the benthic macrofauna sampling dataset used to inform each unique state probability. We created these maps by accessing Netica® experience tables, which report the number of cases used by the EM learning algorithm to parameterize each row of the CPT in the model, that is, the equivalent sample size for each unique combination of environmental states. We used an ArcPy Con (Spatial Analyst) statement within Python to build an if-else evaluation over the environmental rasters and the experience table. The code stepped through each row of the experience table and assigned that row's experience value to a raster cell if and only if the overlying environmental raster values matched the states on that row. Values were reported as a percentage of the overall sample size.
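The cell-by-cell matching that the Con statement performed can be sketched without ArcPy. The version below is a plain-Python stand-in, not the script used in this study: rasters are nested lists, the experience table is a dictionary keyed by a tuple of discretized states, and the choice of depth and MGS as the two variables and their bin edges are hypothetical. Looking up each cell's state combination in a dictionary is equivalent to testing every table row, because each cell matches at most one row.

```python
from bisect import bisect_right

# HYPOTHETICAL bin edges for two environmental rasters
DEPTH_EDGES = [-85.0, -50.0]  # states: 0 = below -85, 1 = -85 up to -50, 2 = -50 and above
MGS_EDGES = [1.75, 2.3]       # states: 0 = below 1.75, 1 = 1.75 up to 2.3, 2 = 2.3 and above


def state(value, edges):
    """Index of the discretized state that a continuous raster value falls into."""
    return bisect_right(edges, value)


def experience_raster(depth, mgs, experience_table, n_cases):
    """Assign each cell the experience (% of all cases) of its matching CPT row.

    experience_table maps (depth_state, mgs_state) -> number of cases.
    Cells whose state combination is missing from the table receive 0.
    """
    out = []
    for depth_row, mgs_row in zip(depth, mgs):
        out_row = []
        for d, m in zip(depth_row, mgs_row):
            key = (state(d, DEPTH_EDGES), state(m, MGS_EDGES))
            out_row.append(100.0 * experience_table.get(key, 0) / n_cases)
        out.append(out_row)
    return out
```

For example, with 120 training cases and a table entry of 30 cases for the (deep, fine-grained) combination, every cell in that combination maps to 25% experience, wherever it lies in the prediction region.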

Experience values were associated with unique combinations of environmental parameters. High experience can therefore occur outside sampling locations if environmental conditions mirror those of heavily sampled regions. Experience values differ for each species, as they depend on the environmental parameters important to the species of interest.

2.4 Results

The following results are reported for each species: 1) model performance metrics; 2) HSP maps; 3) uncertainty maps; 4) experience maps; and 5) SETS field validation results.

Axinopsida serricata (Carpenter, 1864) is a marine bivalve in the family Thyasiridae within the Lucinoida order. A ubiquitous species, it was present in 83% of samples. The final BN model (Figure 2.9), selected with the 4-fold cross validation approach (Table 2.3), included depth, MGS, and latitude. The final model trained using all data had a TSS score of 0.58, an SP score of 0.91, and an 11% error rate. Sensitivity analysis indicated that bivalve habitat suitability was most sensitive to MGS, followed by depth and latitude (Table 2.6).

Aystris gausapata (Gould, 1850) is a marine gastropod in the Columbellidae family within the Neogastropoda order. This species of snail was present in 43% of samples. The final BN model (Figure 2.10), selected with the 4-fold cross validation approach (Table 2.4), included MGS, depth, and distance to shore. The final model trained using all data had a TSS score of 0.43, an SP score of 0.79, and a 29% error rate. Sensitivity analysis indicated that snail habitat suitability was most sensitive to MGS, followed by distance to shore and depth (Table 2.6).

Sternaspis fossor (Stimpson, 1854) is a polychaete in the family Sternaspidae within the Terebellida order. This species of marine worm was present in 24% of samples. The final BN model (Figure 2.11), selected with the 4-fold cross validation approach (Table 2.5), included all variables except distance to shore. The final model trained using all data had a TSS score of 0.89, an SP score of 0.96, and a 5% error rate. Sensitivity analysis indicated that polychaete habitat suitability was most sensitive to percent silt, followed by MGS, TOC, percent sand, TN, depth, and latitude (Table 2.6).

Figure 2.9. Bayesian network of Axinopsida serricata

[Network diagram. Nodes: Latitude, Depth, Mean Grain Size, and Axinopsida serricata (Absent 24.0%, Present 76.0%; 0.76 ± 0.43).]

Figure 2.10. Bayesian network of Aystris gausapata

[Network diagram. Nodes: Dist2Shore, Depth, MeanGSphi, and the species node, labeled Alia gausapata (Absent 59.7%, Present 40.3%; 0.403 ± 0.49).]


Table 2.3. Axinopsida serricata model results.

A good performance score has a low confusion matrix error rate (Error Rate), high spherical payoff (SP) and True Skill Statistic (TSS). Simple models refer to ones without intermediate nodes while complex models refer to ones with intermediate nodes. The scores from the final selected model are boxed.

Model    Test       TSS             SP              Error Rate (%)
                    Simple Complex  Simple Complex  Simple Complex
Model 1  4-fold cv  0.45   -        0.87   -        15     -
Model 2  4-fold cv  0.51   -        0.85   -        15     -
Model 3  4-fold cv  0.33   -        0.83   -        24     -
Model 4  4-fold cv  0.44   -        0.83   -        20     -
Model 5  4-fold cv  0.45   0.46     0.87   0.46     15     14
Model 6  4-fold cv  0.43   0.44     0.86   0.44     17     17
Model 7  4-fold cv  0.44   0.48     0.84   0.48     20     17
Model 8  4-fold cv  0.44   0.46     0.82   0.46     20     20
Model 1  SETS       0.51   -        0.79   -        26     -
Model 2  SETS       0.51   -        0.79   -        26     -
Model 3  SETS       0.24   -        0.78   -        33     -
Model 4  SETS       0.24   -        0.78   -        33     -
Model 5  SETS       0.41   0.41     0.74   0.79     30     30
Model 6  SETS       0.41   0.41     0.74   0.79     30     30
Model 7  SETS       0.21   0.21     0.76   0.78     33     33
Model 8  SETS       0.21   0.21     0.76   0.78     33     33

Table 2.4. Aystris gausapata model results.

A good performance score has a low confusion matrix error rate (Error Rate), high spherical payoff (SP) and True Skill Statistic (TSS). Simple models refer to ones without intermediate nodes while complex models refer to ones with intermediate nodes. The scores from the final selected model are boxed.

Model Test TSS SP Error Rate

Simple Complex Simple Complex Simple Complex

Model 1 4-fold cv 0.02

Model 2 4-fold cv 0.29

-

-

0.71

0.75

-

-

48

35

-

-

Model 3 4-fold cv 0.36

Model 4 4-fold cv 0.33

Model 5 4-fold cv 0.19

Model 6 4-fold cv 0.29

Model 7 4-fold cv 0.24

Model 8 4-fold cv 0.33

Model 1 sets 0.48

Model 2 sets 0.11

Model 3 sets 0.25

Model 4 sets

Model 5 sets

Model 6 sets

Model 7 sets

Model 8 sets

-

-

0.15

0.15

0.17

0.23

-

-

-

0.72

0.41

0.03

0.35

-

0.4

0.03

0.73

0.74

0.72

0.17 0.09 0.72

0.48 -0.05 0.69

0.74

0.71

0.74

0.71

0.75

0.77

0.74

0.74

-

-

0.7

0.73

0.7

0.74

-

-

-

-

0.74

0.71

0.68

0.67

30

31

39

35

37

33

26

33

41

48

30

44

37

48

-

-

43

43

40

37

-

-

-

-

30

48

44

52


Table 2.5. Sternaspis fossor model results.

A good performance score has a low confusion matrix error rate (Error Rate), high spherical payoff (SP) and True Skill Statistic (TSS). Simple models refer to ones without intermediate nodes while complex models refer to ones with intermediate nodes. Scores from final selected models are boxed.

Model    Test       TSS             SP              Error Rate (%)
                    Simple Complex  Simple Complex  Simple Complex
Model 1  4-fold cv  0.65   -        0.91   -        11     -
Model 2  4-fold cv  0.67   -        0.91   -        11     -
Model 3  4-fold cv  0.56   -        0.89   -        15     -
Model 4  4-fold cv  0.63   -        0.91   -        12     -
Model 5  4-fold cv  0.65   0.58     0.91   0.9      11     12
Model 6  4-fold cv  0.64   0.72     0.91   0.91     12     10
Model 7  4-fold cv  0.56   0.56     0.9    0.89     15     14
Model 8  4-fold cv  0.66   0.62     0.92   0.9      12     13
Model 1  SETS       1      -        1      -        0      0
Model 2  SETS       1      -        1      -        0      0
Model 3  SETS       1      -        1      -        0      0
Model 4  SETS       1      -        1      -        0      0
Model 5  SETS       1      1        1      1        0      0
Model 6  SETS       1      1        1      1        0      0
Model 7  SETS       1      1        1      1        0      0
Model 8  SETS       1      1        1      1        0      0


Table 2.6. Sensitivity analysis results.

Percent variance reduction represents the degree to which the variance of the species posterior probability distribution is reduced by knowing each environmental variable. Variables are listed in order of importance for each species.

       Axinopsida serricata        Aystris gausapata                Sternaspis fossor
       (Model 2)                   (Model 3)                        (Model 6)
Rank   Variable   % Var. Red.      Variable           % Var. Red.   Variable   % Var. Red.
1      MGS        16.8             MGS                6.2           Silt       26.2
2      Depth      12.4             Distance to shore  3.9           MGS        25
3      Latitude   0.1              Depth              1.3           TOC        24.5
4                                                                   Sand       23.5
5                                                                   TN         23.2
6                                                                   Depth      16
7                                                                   Latitude   5.1
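For a two-state species node, variance reduction can be computed as the prior variance p(1 - p) minus the expected variance once the state of one environmental node is known, expressed as a percentage of the prior variance. The sketch below assumes that common formulation and a single environmental node; it is not the Netica sensitivity routine itself, and the probabilities in the usage note are hypothetical rather than values from the fitted networks.

```python
def percent_variance_reduction(p_state, p_present_given_state):
    """Percent reduction in the variance of a 0/1 species node from knowing one variable.

    p_state: marginal probabilities of the environmental node's states.
    p_present_given_state: P(present | state) for each of those states.
    """
    p_present = sum(ps * pq for ps, pq in zip(p_state, p_present_given_state))
    var_before = p_present * (1.0 - p_present)  # variance of a 0/1 node
    # expected variance of the species node once the environmental state is known
    var_after = sum(ps * pq * (1.0 - pq) for ps, pq in zip(p_state, p_present_given_state))
    return 100.0 * (var_before - var_after) / var_before
```

A variable whose states leave P(present) unchanged yields 0%, one whose states separate presence from absence perfectly yields 100%, and two equally likely states with P(present) of 0.2 and 0.8 yield 36%.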

The HSP map of A. serricata (Figure 2.12a) depicts a high probability of suitable habitat throughout most of the region, with small pockets of moderately unsuitable habitat near shore in shallow regions. The uncertainty map (Figure 2.12b) reflects the patterns in the HSP map: regions of highly suitable habitat correspond with higher precision (low uncertainty) around the posterior probability, whereas areas with predicted suitability around 0.5 correspond with lower precision (high uncertainty). Uncertainty is highest in the nearshore, shallow regions, particularly off the mouth of the Columbia River between Oregon and Washington, a unique sedimentary environment that was not sampled in this study. The experience map (Figure 2.12c) indicates that the largest percentage of data informed probabilities in the southern, deep region, followed by the shallow to mid-depth, mid-latitude region; the least data informed probabilities in the northern, deep region and in the shallow to mid-depth regions at the northern and southern extents.

Field validation analysis at the SETS site (Figure 2.12d) indicates that Axinopsida serricata was more frequently present at deeper stations and more frequently absent at shallow stations, following patterns similar to the HSP predictions. Seven of the 28 model predictions compared with SETS observations were in error (Figure 2.12d): five Type I errors (the BN model predicted presence but the species was absent in SETS sample data) and two Type II errors (the BN model predicted absence but the species was present in SETS sample data). Three of the five Type I errors occurred at stations where the species was present in one season and absent in the other. This temporal difference may indicate a region with an even probability of suitable habitat, but further temporal sampling would be needed to verify this trend. The remaining model prediction errors occurred on the boundary between suitable and unsuitable habitat.


Figure 2.12. Axinopsida serricata

Habitat Suitability (a) ranges from 0 (blue: very low probability of suitable habitat) to 0.5 (yellow: unknown or even probability) to 1 (red: very high probability of suitable habitat). Uncertainty (b) ranges from 0 (light: high precision and low uncertainty in probability estimate) to 0.5 (dark: low precision and high uncertainty in probability estimate). Experience (c) ranges from 1 (light: high percentage of data informing probabilities) to 0 (dark: low percentage of data informing probabilities). Field validation (d) compares observed species presence and absence with HSP model predictions.


The HSP map of A. gausapata (Figure 2.13a) depicts low probabilities of suitable habitat in deeper offshore areas. In addition, large regions have weak probability predictions (HSP scores between 0.4 and 0.6), with approximately even probabilities of unsuitable and suitable habitat, including the region off the Columbia River mouth. The uncertainty map (Figure 2.13b) expresses low precision (high uncertainty) throughout most of the region. The lowest uncertainty values are found in the deeper, offshore areas, where the species is predicted to be absent. The experience map (Figure 2.13c) indicates small pockets in the shallow, mid-latitude regions where probabilities were informed by a higher percentage of data. In general, the majority of predictions were informed by little to no data.

Field validation analysis at the SETS site (Figure 2.13d) indicates that Aystris gausapata was more likely to be present at deeper stations and absent at shallower stations, although the agreement between observations and habitat suitability patterns was weaker than for the other species. Eleven of the 28 model predictions compared with SETS observations were in error (Figure 2.13d): three Type I errors (the BN model predicted presence but the species was absent in SETS sample data) and eight Type II errors (the BN model predicted absence but the species was present in SETS sample data).

Six of the Type II errors occurred in areas predicted to be somewhat unsuitable (HSP score ~0.47), and the remaining two in areas predicted to be moderately unsuitable (HSP scores 0.2-0.4). One Type I error occurred in an area predicted to be somewhat suitable (HSP ~0.59); the other two occurred in areas predicted to be moderately suitable (HSP scores 0.6-0.8).

Overall, metric results from cross validation and field validation (Table 2.4) indicate that this model performs worse than the models for the other species. The corresponding maps help to communicate this error, uncertainty, and the lack of data informing the model.


Figure 2.13. Aystris gausapata

Habitat Suitability (a) ranges from 0 (blue: very low probability of suitable habitat) to 0.5 (yellow: unknown or even probability) to 1 (red: very high probability of suitable habitat). Uncertainty (b) ranges from 0 (light: high precision and low uncertainty in probability estimate) to 0.5 (dark: low precision and high uncertainty in probability estimate). Experience (c) ranges from 1 (light: high percentage of data informing probabilities) to 0 (dark: low percentage of data informing probabilities). Field validation (d) compares observed species presence and absence with HSP model predictions.


The HSP map of S. fossor (Figure 2.14a) depicts a high probability of unsuitable habitat throughout shallow, sandy regions, transitioning to suitable habitat in deeper, silty regions. The uncertainty map (Figure 2.14b) follows the patterns of the HSP map: regions of highly suitable or highly unsuitable habitat correspond with higher precision (low uncertainty) around the posterior probability, whereas areas with intermediate probabilities of suitable habitat correspond with lower precision (high uncertainty). Probabilities near 0.5 occur in the southern nearshore regions and in the northern deeper regions. The southern nearshore environment has unique, unsampled conditions, where the shelf drops off steeply close to shore. The experience map (Figure 2.14c) indicates that the largest percentage of data informed probabilities associated with the mid-latitude, shallow, sandy habitat regions, while much of the remaining area had little to no experience. Probabilities of suitable habitat were informed by a lower percentage of data, likely because this less frequently observed species occupies a more specialized niche, representing an under-sampled combination of habitat parameters within this study.

Field validation analysis at the SETS site (Figure 2.14d) indicates that Sternaspis fossor was absent throughout, consistent with the model's prediction of absence throughout the area. Therefore, no error was observed (Figure 2.14d) between SETS sample observations and model predictions. Overall, metric results from cross validation and field validation (Table 2.5) indicate that this model performs well.


Figure 2.14. Sternaspis fossor

Habitat Suitability (a) ranges from 0 (blue: very low probability of suitable habitat) to 0.5 (yellow: unknown or even probability) to 1 (red: very high probability of suitable habitat). Uncertainty (b) ranges from 0 (light: high precision and low uncertainty in probability estimate) to 0.5 (dark: low precision and high uncertainty in probability estimate). Experience (c) ranges from 1 (light: high percentage of data informing probabilities) to 0 (dark: low percentage of data informing probabilities). Field validation (d) compares observed species presence and absence with HSP model predictions.


2.5 Discussion

HSP maps reflect static probability predictions of suitable habitat for benthic macrofauna species given regional raster information. Raster data used to calculate regional probabilities include depth, mean grain size, distance to shore, and latitude. A habitat suitability probability of 0.5 can mean either that no data were available to update the prior probability of 0.5, or that case data produced a posterior probability of 0.5. The experience map can help differentiate between these two possibilities, whose implications are quite different: the first reflects an absence of knowledge, while the second reflects knowledge that suitability is evenly balanced.
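A small count-based example shows why the experience value is needed to read an HSP of 0.5. The sketch assumes a simple Dirichlet (beta) count update for one CPT row with a hypothetical prior experience of 1 case; EM learning with missing data works with expected rather than observed counts, so this illustrates the idea rather than reproducing the Netica learning step.

```python
def update_row(prior_p, prior_experience, n_present, n_absent):
    """Count-based update of one CPT row.

    Returns (posterior P(suitable), experience), where experience is the
    equivalent sample size behind the estimate.
    """
    experience = prior_experience + n_present + n_absent
    posterior = (prior_p * prior_experience + n_present) / experience
    return posterior, experience


# Cell A: no cases ever matched this combination of environmental states.
p_a, exp_a = update_row(0.5, 1.0, 0, 0)
# Cell B: 20 cases matched, half with the species present.
p_b, exp_b = update_row(0.5, 1.0, 10, 10)
# Both cells map to HSP = 0.5, but only cell B's value is backed by data.
```

Both cells would look identical on the HSP and uncertainty maps; only the experience values (1 versus 21 cases) separate an uninformed prior from an evenly balanced, data-supported estimate.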

HSP models within this report represent likely suitable habitat for benthic macrofauna species, not species abundance. These models are limited by knowledge of individual species-environment associations. Models are predominantly learned from species relationships to geomorphic variables that are easily measured on a regional scale and therefore do not capture variability associated with biological, chemical, or demographic dynamics. Improved knowledge of species-environment associations with geomorphic, biological, and chemical features of the landscape can be inserted into the BN, allowing for future model improvements. Using species counts (representing species abundance) or ordinal habitat preferences from a wider sampling effort could enhance confidence in predictions.

The visual representation of uncertainty improves the interpretability of habitat suitability maps. Low uncertainty means greater confidence in the prediction, given the selected, validated model. For these binary-outcome models, high uncertainty signals that the associated experience maps should be consulted to determine how much confidence to place in the outcome, given the amount of supporting data.

Experience maps are a novel product that communicates the percentage of data informing probabilities in the model. These maps suggest the degree of regional confidence in predictions arising from the sampling effort, or equivalent sample size, used to build the model. Experience values do not necessarily mirror uncertainty values.

A region can be high in experience and low in precision if a large percentage of data were collected from somewhat suitable habitat. In contrast, a region can have high precision and low experience if a small percentage of data were collected from highly suitable habitat. For example, in the offshore area of central Oregon, results for Axinopsida serricata suggested high habitat suitability and high precision, yet very low experience (Figure 2.12).

Because of these properties, experience maps can also be used to identify regions to target for future sampling effort. Larger sample sizes over both time and space lead to better habitat suitability predictions. However, sampling is costly in the marine environment, so directed sampling can maximize the gain in information while minimizing cost. Experience maps can guide such directed sampling by highlighting spatial regions low in experience for a given species. The species in this study illustrate how experience maps can inform future sampling effort.

For example, the Aystris gausapata HSP model was based on low experience throughout the study area, likely because this species had weak ecological responses to the environmental variables used in the modeling effort. Models for this species would benefit from increased sampling effort throughout the region, and potentially from identifying new environmental predictor variables that may improve habitat suitability predictions. The Sternaspis fossor HSP model was based on higher experience in the mid-latitude, shallow, sandy region of the study area, yet had higher HSP scores in deeper, silty environments, which were associated with lower experience. Therefore, more samples within deeper regions would likely increase information about this species. The Axinopsida serricata HSP model was based on higher experience in offshore, deeper water in the south than in the north. While this species was highly prevalent throughout the region, model predictions would be improved by increasing sampling effort in the northern, offshore region.
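Turning an experience map into a sampling shortlist can be as simple as filtering cells below an experience threshold and ranking them. The sketch below is one hypothetical prioritization rule, not a procedure used in this study: the 5% threshold is an arbitrary placeholder, and ties in experience are broken by higher prediction uncertainty so that low-experience, ambiguous cells come first.

```python
def sampling_priorities(cells, experience_threshold=5.0, top_n=3):
    """Rank candidate cells for new sampling.

    cells: iterable of (cell_id, experience_percent, uncertainty_sd).
    Cells at or above the (hypothetical) experience threshold are skipped;
    the rest are ordered by lowest experience, then highest uncertainty.
    """
    candidates = [c for c in cells if c[1] < experience_threshold]
    candidates.sort(key=lambda c: (c[1], -c[2]))
    return [c[0] for c in candidates[:top_n]]
```

In practice the candidate list would also be restricted to soft-sediment cells and to logistically reachable areas; the ordering key is the only design choice shown here.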

In addition to increasing the number of samples, efforts to improve model confidence should aim to sample consistent depth ranges uniformly across the latitudinal gradient. The lack of uniform sampling across the latitudinal gradient provided minimal information about this variable, necessitating a simple discretization structure. Such simplification may not have captured biologically significant latitudinal patterns. Further, the simple discretization structure resulted in boundary artifacts, which can be seen in Axinopsida serricata’s experience and uncertainty maps around the 44th parallel (Figure 2.12b, c).

Due to the lack of uniformity in sampling effort, under-sampled regions in this study include the shallowest and deepest extents, the southern portion of the study site where deeper, siltier environments are found closer to shore, and the northern region surrounding the outflow of the Columbia River. The Columbia River plume is a major driver of sedimentary patterns in the nearshore environment and can influence benthic conditions across a broad latitudinal range. A better understanding of this system will improve model predictions, which for this part of the study area predominantly showed somewhat suitable probabilities, high uncertainty, and low experience for all species.

Finally, an opportunity arose during the course of this study to conduct a field validation at the South Energy Test Site (SETS), which is located in an area planned for testing offshore renewable energy devices. While the SETS data provided an opportunity to compare model predictions with a new dataset, the small spatial extent of the SETS site relative to the overall prediction region, together with the small sample size, prevented the data from being used exclusively to select models. Rather, model validation with the SETS data was used to confirm results from the 4-fold cross validation approach and to help select between two models with close results. Experience maps indicate that modeling results for this region have a moderate level of experience with environmental features important for Aystris gausapata and Axinopsida serricata, and high levels with habitat features important for Sternaspis fossor. To improve the models, new sampling sites should be identified to maximize geographic coverage of unique environmental areas. Additional field validation, over both time and space, with subsequent model updating will improve estimates of model prediction success.

Habitat suitability modeling is a process that converts statistical species-environment associations into spatial maps. Because the process is an approximation, developing multiple maps of prediction probabilities, uncertainty, and experience to convey statistical confidence and underlying effort will improve the interpretation of habitat suitability maps. As maps are frequently used to make management decisions regarding conservation issues, understanding the uncertainty and limitations underlying map products will lead to better-informed decisions and strategies.


Chapter 3.

Habitat suitability of the dictyonine glass sponge assemblage using Bayesian networks

3.1 Literature Review

Hexactinellids, or glass sponges, are a unique group of sessile, filter-feeding Porifera whose skeletons are silica based (Leys & Lauzon 1998). Dictyonine sponges are an assemblage of hexactinellid species with siliceous skeletons that do not dissolve post mortem (Krautter, Conway & Barrie 2006), allowing for the development of sponge reefs. Under the proper conditions, sponge reefs trap fine sediment, eventually burying the underlying rock surface; this allows for the development of stable reef complexes, considered important nursery habitat for rockfish (Krautter, Conway & Barrie 2006; Cook, Conway & Burd 2008; Stone et al. 2013). Because these complex reef systems, built from fragile, long-lived organisms, are typically found in deep water habitat, they are known to come into contact with, and be susceptible to, bottom trawling fisheries (Conway et al. 2001; Jamieson & Chew 2002; Austin et al. 2007).

This chapter develops a habitat suitability model of the three dominant species of dictyonine sponges: Aphrocallistes vastus, Heterochone calyx, and Farrea occa (Figure 3.1). Models are built using data from the National Oceanic and Atmospheric Administration’s (NOAA) bottom trawl surveys within continental US waters, conducted by the Alaska Fisheries Science Center (AFSC) from 1975 to 2004. Due to the longevity of the species under study, data from the entire time series were used to build the models.

The three species are treated as one modeling unit, as they can be difficult to tell apart visually and are believed to share similar habitat characteristics (Pers. Comm. Stone, Robert, AFSC 2014). These three species are known to form sponge reef complexes and have been previously described in the northeastern Pacific from British Columbia, Canada to Juneau, Alaska, U.S. (Stone et al. 2013). South of British Columbia, in the region on which this study focuses, the species have been observed in more isolated communities known as sponge grounds. This form of habitat is less complex than sponge reef ecosystems and is strongly associated with hard bottom substrate and high current, where conditions are not optimal for reef formation (Pers. Comm. Stone, Robert, AFSC 2014). Nevertheless, sponge grounds may still provide structure for co-habiting benthic organisms.

Figure 3.1. Dictyonine glass sponge species.

Image from AUV transect survey along Sponge Reef site off Washington State. Photo credit: NOAA 2010.

As dictyonine sponges are filter feeders, they are highly susceptible to sediment smothering and therefore cannot survive in regions of high wave energy and high sedimentation rates (Whitney et al. 2005). This property excludes them from shallow regions affected by wave turbulence, typically shallower than 50 m. Reef complexes north of British Columbia have been found on glacial sediments, moraines, and glacial promontories in regions of low sedimentation (Conway et al. 2004; Krautter, Conway & Barrie 2006) in water depths of 50-550 m where bottom currents are strong (Conway, Barrie & Krautter 2005; Yahel et al. 2007; Cook, Conway & Burd 2008).

Reefs are typically found in linear patterns along ridges (Conway et al. 2007) and near the heads of shelf canyons (Whitney et al. 2005). Bottom water near and around reef complexes is reported to have the following oceanographic conditions: 43-75 μM silicate, 64-152 μM dissolved oxygen, temperatures of 5.5-7.3 °C, and salinity of 33.2-34.2 (Whitney et al. 2005). Glass sponges grow slowly (~1-2 cm per year); therefore, their silica uptake remains low relative to the background silica levels measured at sponge reef sites off British Columbia (Yahel et al. 2007). Considering that silica levels are generally higher in the North Pacific, this variable may not be a limiting factor. We did not include this variable in our initial analysis because a regional map of bottom silica concentration had not yet been identified.

A preliminary spatial examination of glass sponge observations within the study area highlighted a hotspot of dictyonine species in a geographic region previously described as soft sediment habitat. Geologically defined as the “Newport embayment,” this region was previously believed to be devoid of underlying surface structure that would create hard bottom substrate (Pers. Comm. Goldfinger, Chris, OSU 2014). While dictyonine sponges require hard substrate for attachment, substrate surfaces do not need to be large or complex. Several hypotheses were generated to explain the possible source of hard material in this sedimentary basin: 1. carbonate seepage from the seafloor could form a hard layer for sponge attachment; 2. local and regional landslides could deliver hard material from exposed bedrock up to 5 km from the failure zone; 3. underlying faults could extend to the surface, exposing bedrock; 4. paleo-channel systems could bring gravel into deep water sedimentary basins from drainages active during low-stand conditions; and 5. human debris could provide a hard surface for sponge attachment.

Further, localized depressions on the seafloor could allow for the accumulation of hard material moved from the source by gravity or current.

3.2 Materials and Methods

3.2.1 Dictyonine sponge data

I compiled dictyonine sponge records from NOAA’s coral and sponge database (NOAA 2013). Records represented the total weight and the most specific level of taxonomic classification for organisms collected during each NOAA bottom trawl survey from 1975 to 2011. NOAA bottom trawl surveys within continental US waters were conducted by the Alaska Fisheries Science Center (AFSC) from 1975 to 2004, during which time Hexactinellids were identified to species. After this period, the Northwest Fisheries Science Center began conducting the annual surveys, and dictyonine species were recorded as unidentified Hexactinellids. To maintain high taxonomic resolution in the models, I only used dictyonine species records collected by AFSC.

I obtained additional records from AFSC containing all bottom trawl surveys, regardless of sponge and coral observations. I only included records from surveys that used four-seam trawl gear with a footrope developed for hard bottom. Such gear is effective at collecting Hexactinellid species when they are present, so bottom trawl surveys without dictyonine sponge observations could be used as absence records (Pers. Comm. Stone, Robert, AFSC 2014; Pers. Comm. Rooper, Chris, AFSC 2014). While the absence records in this chapter could not be confirmed with the same level of confidence as the absence records in Chapter 2, they still provide greater confidence in absence than generated pseudo-absence values would.

I eliminated any unidentified Porifera from the analysis to ensure that no dictyonine species was counted as an absence. I further eliminated any records with poor performance scores indicating gear damage, gear conflict, or gear performance problems, as such problems could impair the ability to detect the presence or absence of dictyonine sponges. Finally, I limited records to the Washington and Oregon outer coast, including the northern portion of California waters down to Mendocino Ridge, as this region represents the area for which we have developed 100-m resolution environmental raster layers (Figure 3.2). The final database for modeling consisted of 3011 records collected from 1989 to 2004, of which 334 had observations of dictyonine species.
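The record filtering steps above can be sketched as a single pass over haul records. This is a minimal sketch under stated assumptions: the Haul fields, the gear label, and the "Porifera (unidentified)" taxon string are hypothetical stand-ins for the AFSC database fields; dropping any haul that contains unidentified sponge is one reading of how unidentified Porifera were removed; and the study area is simplified to the latitude band between Mendocino Ridge and the US-Canada border given in Figure 3.2.

```python
from dataclasses import dataclass, field

DICTYONINE = {"Aphrocallistes vastus", "Heterochone calyx", "Farrea occa"}
SOUTH_LAT = 40 + 29 / 60 + 49.108 / 3600  # Mendocino Ridge, about 40.497 N
NORTH_LAT = 48.5                          # US-Canada border, 48 deg 30' N


@dataclass
class Haul:
    """HYPOTHETICAL bottom trawl record; field names are illustrative only."""
    haul_id: str
    gear: str
    performance_ok: bool
    lat: float
    taxa: set = field(default_factory=set)


def presence_absence(hauls, gear="four-seam with footrope"):
    """Filter hauls as described in the text and label each as presence (True) or absence."""
    labels = {}
    for h in hauls:
        if h.gear != gear or not h.performance_ok:
            continue  # wrong gear, or gear damage/conflict/performance problems
        if not SOUTH_LAT <= h.lat <= NORTH_LAT:
            continue  # outside the simplified Mendocino Ridge to US border band
        if "Porifera (unidentified)" in h.taxa:
            continue  # could hide a dictyonine sponge, so not a reliable absence
        labels[h.haul_id] = bool(h.taxa & DICTYONINE)
    return labels
```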

Each bottom trawl record contained start and stop coordinates for the ship’s position. I used the wire length of the tow and the water depth to calculate a corrected position for the net on the bottom. I created polyline features in ArcGIS representing the path of the net for each bottom trawl survey. The average length of bottom trawls was 2.6 km, with a standard deviation of 0.6 km. I buffered each polyline based on water depth to account for net drift, which can occur up to 45 degrees from the line of travel. The average buffered distance from the trawl line was 18 m, with a standard deviation of 5.8 m. Buffers and corrections did not account for any deviations from the ship’s course.
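The net-position correction can be sketched with simple trigonometry. The version below is an assumption, since the exact formula is not stated here: it treats the warp as a straight, taut line, so the horizontal distance from ship to net (layback) is sqrt(wire² - depth²), and it shifts both ship positions back along the tow bearing in projected coordinates (metres). Catenary sag, net height off bottom, and currents are ignored, and the 45-degree drift buffer is not modelled.

```python
from math import hypot, sqrt


def layback(wire_out_m, depth_m):
    """Horizontal ship-to-net distance, assuming a straight, taut warp."""
    if wire_out_m < depth_m:
        raise ValueError("wire out must be at least the water depth")
    return sqrt(wire_out_m ** 2 - depth_m ** 2)


def corrected_net_track(start_xy, end_xy, wire_out_m, depth_m):
    """Shift projected (metre) ship start/end positions back along the tow bearing."""
    (x0, y0), (x1, y1) = start_xy, end_xy
    length = hypot(x1 - x0, y1 - y0)
    ux, uy = (x1 - x0) / length, (y1 - y0) / length  # unit vector of travel
    d = layback(wire_out_m, depth_m)
    return (x0 - d * ux, y0 - d * uy), (x1 - d * ux, y1 - d * uy)
```

For a 2.6 km tow heading due north with 500 m of wire in 300 m of water, the layback is 400 m, so the corrected net track starts 400 m south of the ship's recorded start position.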

Figure 3.2. Dictyonine Sponge Study Area

Northern bound defined by the US border (48° 30’ N). Southern bound defined by Mendocino Ridge (40° 29’ 49.108” N). Eastern bound defined by the shoreline and western bound defined by the deepest bottom trawl survey (-1280 meters).

3.2.2 Environmental data

I developed a simple influence diagram representing three variables identified in the literature as potential drivers of dictyonine sponge distribution within continental US waters: depth, availability of hard substrate, and high bottom current (Figure 3.3). To predict habitat suitability throughout the region, I identified environmental layers to act as proxies for these ecological drivers (excluding depth). A probability of rock outcrop map (Goldfinger et al. 2014) was used as a proxy for regional hard substrate patterns. Regional bottom current maps with high spatial and temporal resolution could not be identified for the time period of the sampling data.

Previous research recommends bathymetric derivatives that measure seafloor rugosity (e.g., standard deviation of depth (SDD), slope, standard deviation of slope (SDS), and bathymetric position index (BPI)), as these measures are highly associated with bottom current patterns (Wilson et al. 2007; Dunn & Halpin 2009; Ierodiaconou et al. 2011). Previous habitat suitability models of benthic organisms have relied on rugosity measures of the seafloor as proxies for bottom current (Rattray et al. 2009; Guinotte & Davies 2014).
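The rugosity measures named above are all moving-window statistics on a bathymetry grid. The sketch below shows one way to compute them on a small in-memory grid; it is not the ArcGIS implementation. Assumptions: elevations are in metres (negative down), slope uses Horn's 3×3 finite differences, SDD and SDS use the population standard deviation over a square window, and BPI is simplified to the cell value minus the square-window mean (some tools use an annulus). The window half-width `half` is in cells, so, for example, a 300 m neighborhood on a 100 m grid corresponds roughly to `half = 1`. All function names are hypothetical.

```python
import math
import statistics

def focal(grid, half, fn):
    """Apply fn to the values in a (2*half+1) x (2*half+1) window around each cell.

    Windows are clipped at the grid edge and None (no-data) cells are skipped."""
    nrows, ncols = len(grid), len(grid[0])
    out = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            vals = [grid[i][j]
                    for i in range(max(0, r - half), min(nrows, r + half + 1))
                    for j in range(max(0, c - half), min(ncols, c + half + 1))
                    if grid[i][j] is not None]
            row.append(fn(vals) if vals else None)
        out.append(row)
    return out

def slope_deg(elev, cellsize):
    """Slope in degrees from Horn's 3x3 finite differences; edge cells are None."""
    nrows, ncols = len(elev), len(elev[0])
    out = [[None] * ncols for _ in range(nrows)]
    for r in range(1, nrows - 1):
        for c in range(1, ncols - 1):
            a, b, cc = elev[r - 1][c - 1], elev[r - 1][c], elev[r - 1][c + 1]
            d, f = elev[r][c - 1], elev[r][c + 1]
            g, h, i = elev[r + 1][c - 1], elev[r + 1][c], elev[r + 1][c + 1]
            dzdx = ((cc + 2 * f + i) - (a + 2 * d + g)) / (8 * cellsize)
            dzdy = ((g + 2 * h + i) - (a + 2 * b + cc)) / (8 * cellsize)
            out[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
    return out

def sdd(elev, half):
    """Standard deviation of depth (population SD) within the moving window."""
    return focal(elev, half, statistics.pstdev)

def sds(elev, cellsize, half):
    """Standard deviation of slope within the moving window."""
    return focal(slope_deg(elev, cellsize), half, statistics.pstdev)

def bpi(elev, half):
    """Simplified BPI: cell elevation minus the window mean (assumes no missing cells)."""
    means = focal(elev, half, statistics.fmean)
    return [[e - m for e, m in zip(erow, mrow)] for erow, mrow in zip(elev, means)]

# Hypothetical 5 x 5 grid deepening by 1 m per 100 m cell toward the east.
plane = [[-100.0 - c for c in range(5)] for _ in range(5)]
print(round(slope_deg(plane, 100.0)[2][2], 3))  # 0.573
```

A uniform slope gives zero SDD change in slope (SDS = 0) and zero BPI, which is why these measures respond to breaks in relief rather than to steepness alone.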


Figure 3.3. Dictyonine glass sponge influence diagram.

This diagram illustrates key environmental drivers believed to influence dictyonine glass sponge presence.

I calculated the rugosity measures (slope, SDD, SDS, and BPI) for a range of neighborhoods (scales) using a moving window in ArcGIS 10.1. Neighborhoods ranged from local (300 m) to broad (20 km) scales. I selected the best-performing neighborhood for each measure by comparing AIC scores from single-variable logistic regression models (binomial family, logit link) of dictyonine presence or absence. I selected two neighborhoods for the SDD and slope variables, as both the local and broad scales performed strongly. I also created another bathymetric derivative, flow drop, using the ArcGIS Spatial Analyst Hydrology toolset. This variable, produced as the drop raster output of the Flow Direction tool, is the ratio of the maximum change in elevation from each cell along the direction of flow to the path length between cell centers, expressed as a percentage; it is a bottom current proxy recommended by Dunn and Halpin (2009). Table 3.1 lists the final variables selected for modeling dictyonine sponges.
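The neighborhood selection step can be sketched as follows: fit a one-covariate logistic regression of presence/absence for each candidate neighborhood and keep the neighborhood with the lowest AIC = 2k − 2 ln L (here k = 2: intercept and slope). This is a stand-alone illustration, not the software used in the thesis; the model is fitted by Newton-Raphson, and the data and neighborhood labels below are hypothetical. Perfectly separable data have no finite maximum-likelihood estimate, so the sketch assumes overlapping classes.

```python
import math

def fit_logistic_1d(x, y, iters=50, tol=1e-10):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson.

    Returns (b0, b1, log_likelihood)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                  # observed information matrix X^T W X
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break
        d0 = (h11 * g0 - h01 * g1) / det   # Newton step: (X^T W X)^-1 * score
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    loglik = 0.0
    for xi, yi in zip(x, y):
        eta = b0 + b1 * xi
        # log p = -log(1 + e^-eta); log(1 - p) = -log(1 + e^eta)
        loglik += -math.log1p(math.exp(-eta)) if yi else -math.log1p(math.exp(eta))
    return b0, b1, loglik

def aic(loglik, k=2):
    return 2 * k - 2 * loglik

def best_neighborhood(candidates, presence):
    """candidates maps a neighborhood label to one covariate value per record."""
    scores = {label: aic(fit_logistic_1d(x, presence)[2]) for label, x in candidates.items()}
    return min(scores, key=scores.get), scores

# Hypothetical presence/absence and one rugosity measure at two neighborhood sizes.
presence = [0, 0, 0, 1, 0, 1, 1, 1]
candidates = {
    "300 m": [1, 2, 3, 4, 5, 6, 7, 8],
    "10 km": [1, 8, 4, 2, 5, 7, 3, 6],
}
best, scores = best_neighborhood(candidates, presence)
print(best, {k: round(v, 2) for k, v in scores.items()})
```

Because every candidate has the same number of parameters, comparing AIC here is equivalent to comparing log-likelihoods; AIC becomes more informative when candidate models differ in complexity.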


Table 3.1. Environmental variables.

Environmental variables were chosen for the dictyonine glass sponge model to represent key environmental drivers. Depth is known to influence sponge distribution patterns. The probability of rock outcrop is a proxy for hard bottom substrate patterns. Slope, flow drop, the standard deviation of depth, the standard deviation of slope, and bathymetric position index are all proxies for bottom current patterns.

Variable | Neighborhood | Resolution | Discretization
Depth | 100 m | 100 m | Expert (180, 300, 500)
Probability of rock outcrop | 500 m | 500 m | Expert (0.333, 0.5)
Slope | 100 m, 13 km | 100 m | Expert (5, 10)
Flow drop | 100 m | 100 m | Equal frequency (4, 5, 6 states)
Standard deviation of depth | 300 m, 10 km | 100 m | Equal frequency (4, 5, 6 states)
Standard deviation of slope | 500 m | 100 m | Expert (5)
Bathymetric position index | 2 km | 100 m | Equal frequency (4, 5, 6 states)
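The flow drop variable listed in Table 3.1 can be illustrated with the D8 rule that ArcGIS's Flow Direction tool describes: for each cell, take the largest drop to any of its eight neighbours, divide by the distance between cell centers (one cell width for edge neighbours, √2 cell widths for diagonals), and express it as a percentage. The sketch below is a simplified, hypothetical re-implementation for interior cells only; it does not reproduce ArcGIS's handling of sinks, flats, or grid edges.

```python
import math

def d8_drop(elev, cellsize):
    """Percent drop toward the steepest downslope neighbour (D8) for interior cells.

    drop = max over the 8 neighbours of (z_centre - z_neighbour) / distance * 100,
    with distance = cellsize for edge neighbours and cellsize * sqrt(2) for diagonals.
    Edge cells are None; sinks and flats are not treated specially."""
    nrows, ncols = len(elev), len(elev[0])
    out = [[None] * ncols for _ in range(nrows)]
    for r in range(1, nrows - 1):
        for c in range(1, ncols - 1):
            drops = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    dist = cellsize * (math.sqrt(2) if dr and dc else 1.0)
                    drops.append((elev[r][c] - elev[r + dr][c + dc]) / dist * 100.0)
            out[r][c] = max(drops)
    return out

# Hypothetical 3 x 3 grid (elevation in m) deepening 1 m per 100 m cell toward the east.
grid = [[-100.0 - c for c in range(3)] for _ in range(3)]
print(d8_drop(grid, 100.0)[1][1])  # 1.0 (percent), toward the eastern neighbour
```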

I used ArcGIS 10.1 to convert each benthic trawl segment into a 2 m point grid and extracted the value of each environmental variable at each grid point. I then dissolved each 2 m point grid back to its original trawl segment, summarizing the extracted environmental values as a mean and standard deviation. The result was the presence or absence of dictyonine species for each trawl segment, together with the mean and standard deviation of each environmental raster layer along that segment.
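The extraction step above (points every 2 m along a trawl path, then a mean and standard deviation per segment) can be sketched without GIS software. In this sketch, `sample(x, y)` is a hypothetical caller-supplied raster lookup that stands in for the ArcGIS extraction, coordinates are projected metres, and the standard deviation is the population SD; the thesis does not state which SD definition the dissolve used.

```python
import math
import statistics

def densify(path, spacing=2.0):
    """Return points every `spacing` metres along a polyline of (x, y) vertices."""
    pts = [path[0]]
    since_last = 0.0  # distance from the last emitted point to the current vertex
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = spacing - since_last  # distance along this segment to the next point
        while d <= seg:
            t = d / seg
            pts.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        since_last = seg - (d - spacing)
    return pts

def summarize_tow(path, sample, spacing=2.0):
    """Mean and population SD of a raster sampled along a tow path.

    `sample(x, y)` returns the raster value at a point, or None for no-data."""
    vals = [v for v in (sample(x, y) for x, y in densify(path, spacing)) if v is not None]
    return statistics.fmean(vals), statistics.pstdev(vals)

# Hypothetical raster: depth increasing by 1 m every 100 m toward the east.
depth_at = lambda x, y: 150.0 + 0.01 * x
mean_depth, sd_depth = summarize_tow([(0.0, 0.0), (10.0, 0.0)], depth_at)
print(round(mean_depth, 2), round(sd_depth, 4))
```

The resulting (mean, SD) pair is the form in which each segment's environmental values were later entered into the network.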

3.2.3 Model Development

Variables were discretized based on expert knowledge, or with equal frequency methods when expert knowledge was unavailable. The depth variable was discretized based on knowledge of significant breakpoints along the continental shelf and slope. Breakpoints were identified at 180 and 300 meters depth, as these values represent the range at which the continental slope of Washington and Oregon begins (Kulm & Fowler 1974). An additional breakpoint was added at 500 m to mark the depth at which several basins along the shelf drop off to the middle continental slope. Slope and the standard deviation of slope were discretized at 5 degrees, a significant angle that differentiates between flat and rising surfaces (Weiss 2001). An additional discretization break for slope was included at 10 degrees to differentiate between low and high slopes, as this value was found to separate sediment slopes at their angle of repose from slopes almost certain to have exposed rock (Romsos 2006). This empirical value is likely to be influenced not only by the angle of repose of the sediment cover but also by the frequent seismic shaking expected in this subduction setting (Goldfinger et al. 2012). The probability of rock outcrop variable was discretized at 0.5, the midpoint between high and low probability of rock outcrop. An additional breakpoint was included at 0.333, as the majority of sponge records had probability of rock outcrop values below this breakpoint. The remaining variables (flow drop, standard deviation of depth, and bathymetric position index) were discretized into 4-state, 5-state, and 6-state bins using the equal frequency method.
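The two discretization schemes can be sketched as follows: expert breakpoints are fixed cut points, while equal-frequency breakpoints are chosen from the sorted data so each state holds roughly the same number of records. This is an illustrative sketch, not Netica's implementation; the assignment of a value that falls exactly on a breakpoint to the upper state is an assumption, and heavily tied data can produce duplicate equal-frequency breakpoints.

```python
import bisect

def to_state(value, breaks):
    """Index of the state (bin) containing value; a value equal to a break goes to the upper bin."""
    return bisect.bisect_right(breaks, value)

def equal_frequency_breaks(values, n_states):
    """Cut points that put roughly the same number of records in each of n_states bins."""
    v = sorted(values)
    n = len(v)
    return [v[(k * n) // n_states] for k in range(1, n_states)]

DEPTH_BREAKS = [180, 300, 500]  # expert breakpoints (m), from the text
ROCK_BREAKS = [0.333, 0.5]      # expert breakpoints for probability of rock outcrop
SLOPE_BREAKS = [5, 10]          # expert breakpoints (degrees)

print(to_state(250, DEPTH_BREAKS))  # 1 (the 180-300 m state)
print(equal_frequency_breaks(range(1, 13), 4))  # [4, 7, 10]
```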

Model variables were predominantly proxy derivatives of bathymetric data and were therefore likely to be correlated. As the proxy variables were not direct environmental cues for the species, an expert-driven network structure was not realistic.

Due to this dearth of true expert knowledge about how the covariates might influence glass sponge presence or absence, the tree-augmented naïve (TAN) algorithm (Friedman, Geiger & Goldszmidt 1997) was used to learn the model structure (specifically the correlative relationships of the network) directly from the data. This algorithm, however, limits each covariate node to at most two parents (the species node and one other covariate) in order to minimize model complexity. Therefore, additional correlations between covariates may have existed that were unaccounted for by the TAN algorithm. CPTs of the BN were populated using the EM algorithm (Dempster, Laird & Rubin 1977). Environmental values were entered into the network using Netica®'s uncertainty format (mean ± standard deviation), which accounted for the spatial uncertainty inherent in the benthic trawl sampling dataset.
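The TAN structure search of Friedman, Geiger & Goldszmidt (1997) can be sketched in a few steps: weight every pair of covariates by their conditional mutual information given the class, build a maximum-weight spanning tree over the covariates, direct its edges away from an arbitrary root, and make the class node a parent of every covariate. The sketch below is a minimal re-implementation for discrete data; Netica's internal version may differ in details such as root choice and tie-breaking. Column indices and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cond_mutual_info(rows, i, j, cls):
    """Empirical conditional mutual information I(X_i; X_j | C) in nats."""
    n = len(rows)
    n_ijc = Counter((r[i], r[j], r[cls]) for r in rows)
    n_ic = Counter((r[i], r[cls]) for r in rows)
    n_jc = Counter((r[j], r[cls]) for r in rows)
    n_c = Counter(r[cls] for r in rows)
    return sum(cnt / n * math.log(cnt * n_c[c] / (n_ic[(xi, c)] * n_jc[(xj, c)]))
               for (xi, xj, c), cnt in n_ijc.items())

def learn_tan(rows, features, cls):
    """Return {feature: covariate parent or None}; the class is also a parent of every feature."""
    weight = {(a, b): cond_mutual_info(rows, a, b, cls) for a, b in combinations(features, 2)}
    parent = {features[0]: None}  # arbitrary root of the covariate tree
    while len(parent) < len(features):
        # Prim's step: heaviest edge joining the tree to a feature not yet in it
        w, a, b = max((w, a, b) for (a, b), w in weight.items()
                      if (a in parent) != (b in parent))
        child, par = (b, a) if a in parent else (a, b)
        parent[child] = par
    return parent

# Toy data: column 1 copies column 0, column 2 is unrelated, column 3 is presence.
rows = [(x0, x0, x2, c) for c in (0, 1) for x0 in (0, 1) for x2 in (0, 1)]
print(learn_tan(rows, [0, 1, 2], cls=3))
```

Because the class node is a parent of every covariate in this structure, the species node itself has no parents, which bears on the experience-table limitation discussed below.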

A preliminary model selection process was performed on the bathymetric derivative variables to narrow the selection down to the most significant proxies. All possible combinations of local slope, broad slope, flow drop (4-, 5-, and 6-state nodes), local SDD (4-, 5-, and 6-state nodes), broad SDD (4-, 5-, and 6-state nodes), SDS (4-, 5-, and 6-state nodes), and BPI (4-, 5-, and 6-state nodes) were compared against the presence and absence of dictyonine glass sponges within Netica® using a TAN structure and the EM learning algorithm. Models were compared using TSS and the false negative error rate, as models were more likely to under-predict than over-predict species presence. This initial selection step excluded SDS and BPI from the final model selection process.
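The two comparison metrics follow directly from a presence/absence confusion matrix: TSS = sensitivity + specificity − 1 (Allouche et al. 2006), and the false negative error rate is the share of observed presences predicted as absent. The sketch below assumes predictions have already been reduced to present/absent (for example, the most probable state); the counts in the example are hypothetical, chosen so that sensitivity is 0.38 and specificity 0.96, which is what the final model's reported TSS (0.34) and false negative rate (0.62) together imply.

```python
def confusion_counts(observed, predicted):
    """Counts (tp, fp, fn, tn) for boolean presence observations and predictions."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)
    fp = sum(1 for o, p in pairs if not o and p)
    fn = sum(1 for o, p in pairs if o and not p)
    tn = sum(1 for o, p in pairs if not o and not p)
    return tp, fp, fn, tn

def true_skill_statistic(tp, fp, fn, tn):
    """TSS = sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1.0

def false_negative_rate(tp, fn):
    """Share of observed presences that the model predicted as absent."""
    return fn / (tp + fn)

# Hypothetical counts: 100 observed presences and 100 observed absences.
tp, fp, fn, tn = 38, 4, 62, 96
print(round(true_skill_statistic(tp, fp, fn, tn), 2), false_negative_rate(tp, fn))
```

A random model scores TSS = 0 regardless of prevalence, which makes TSS more suitable than raw accuracy for a rare assemblage.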

Final models were built comparing all possible combinations of depth, probability of rock outcrop, local slope, broad slope, flow drop (5-state node), local SDD (5- and 6-state nodes), and broad SDD (4- and 5-state nodes). Depth and probability of rock outcrop were included in each model, as well as at least one bottom current proxy. Final models were compared using TSS and the false negative error rate. Sensitivity analysis was performed on the final network to determine the degree to which each environmental variable explained the variation in the species posterior probability distributions (Marcot 2012).
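The candidate set for the final comparison can be enumerated mechanically. The sketch assumes, beyond what the text states, that each proxy enters a model at most once (for example, local SDD as either its 5-state or its 6-state version, never both); the variable labels are hypothetical. Under that assumption there are 2 × 2 × 2 × 3 × 3 − 1 = 71 candidate models, the subtracted case being the model with no bottom current proxy.

```python
from itertools import product

FIXED = ["depth", "prob_rock"]  # included in every model
# Each bottom current proxy is either omitted (None) or entered in one discretization.
PROXY_OPTIONS = {
    "slope_local": ["expert"],
    "slope_broad": ["expert"],
    "flow_drop": ["5-state"],
    "sdd_local": ["5-state", "6-state"],
    "sdd_broad": ["4-state", "5-state"],
}

def candidate_models():
    names = list(PROXY_OPTIONS)
    options = [[None] + opts for opts in PROXY_OPTIONS.values()]
    for choice in product(*options):
        proxies = [(n, c) for n, c in zip(names, choice) if c is not None]
        if proxies:  # at least one bottom current proxy is required
            yield FIXED + [f"{n} ({c})" for n, c in proxies]

models = list(candidate_models())
print(len(models))  # 71 under the assumptions above
```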

Habitat suitability and uncertainty maps were developed according to the methods outlined in Chapter 2. Final predictive maps were created at 100 m resolution. Experience maps were not developed due to limitations of the TAN network structure. The networks of the macrofauna models in Chapter 2 were developed using an expert process, with links pointing from the environmental covariates to the species node. Consequently, the species node contained an experience table listing an experience value for every possible combination of environmental covariate states. The TAN algorithm, however, built the network with links pointing from the species node to the environmental covariates. Within this network structure, the experience table of the species node only listed the total number of records used to learn the network (n = 3011). Therefore, the methods developed in Chapter 2 to create the experience map could not be used for the glass sponge model.
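Why the link direction matters can be shown with a small counting sketch. Assuming unit-weight, fully observed cases, a node's experience can be approximated as the number of training records seen for each combination of its parents' states. When the covariates are parents of the species node, this yields one count per covariate combination (mappable back to space); when the species node has no parents, as in the TAN structure, it collapses to a single total. The record fields and state labels below are hypothetical.

```python
from collections import Counter

def experience_table(records, parents):
    """Number of training records seen for each combination of parent-node states."""
    return Counter(tuple(rec[p] for p in parents) for rec in records)

# Hypothetical discretized records.
records = [
    {"depth": "180-300", "rock": "<0.333", "sponge": "present"},
    {"depth": "180-300", "rock": "<0.333", "sponge": "absent"},
    {"depth": "300-500", "rock": "0.333-0.5", "sponge": "absent"},
]
# Chapter 2 style: covariates point to the species node.
print(experience_table(records, ["depth", "rock"]))
# TAN style: the species node has no parents, so only the total remains.
print(experience_table(records, []))
```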

3.3 Results

The following results are reported for the dictyonine sponge assemblage: 1) model performance metrics; 2) HSP maps; and 3) uncertainty maps.

The dictyonine sponge assemblage was rare throughout the region and was observed in 11% of bottom trawl surveys. The final BN model (Figure 3.4) included depth, probability of rock outcrop, both local and broad standard deviation of depth, and flow drop. The final model had the highest TSS (0.34) and the second lowest false negative error rate (0.62). Total error for the final model was 11%. Two other models had false negative error rates of 0.61 but lower TSS scores of 0.33. These two models also included more variables, so the simpler model with the slightly higher false negative error rate was selected as the final model. Sensitivity analysis indicated that dictyonine glass sponges were most sensitive to depth, followed by local SDD, probability of rock outcrop, flow drop, and broad SDD (Table 3.2).

Table 3.2. Sensitivity analysis results.

Percent variance reduction represents the degree to which each environmental variable explains the variation in the species posterior probability distributions. Variables are listed in order of importance for dictyonine sponges.

Rank | Variable | Percent variance reduction
1 | Depth | 7.8
2 | SDD, local | 4.3
3 | Prob. rock | 3.9
4 | Flow drop | 3.7
5 | SDD, broad | 2.1
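The percent variance reduction values in Table 3.2 measure how much knowing a covariate's state is expected to reduce the variance of the species node (Marcot 2012). The sketch below computes VR = Var(Q) − Σ_f P(f) Var(Q | F = f) and reports it as a percentage of Var(Q), starting from a joint probability table. It assumes the species states are scored numerically (present = 1, absent = 0); the joint table in the example is hypothetical and does not reproduce the thesis results.

```python
def variance_reduction(joint, q_values):
    """Expected reduction in the variance of query node Q from observing finding node F.

    joint[q][f] = P(Q=q, F=f); q_values[q] = numeric score of query state q.
    Returns (VR, percent of Var(Q))."""
    def var(probs):
        mean = sum(p * x for p, x in zip(probs, q_values))
        return sum(p * (x - mean) ** 2 for p, x in zip(probs, q_values))

    n_q, n_f = len(joint), len(joint[0])
    v_q = var([sum(row) for row in joint])           # marginal variance of Q
    expected = 0.0
    for f in range(n_f):
        pf = sum(joint[q][f] for q in range(n_q))
        if pf > 0:
            conditional = [joint[q][f] / pf for q in range(n_q)]
            expected += pf * var(conditional)        # E_f[Var(Q | F=f)]
    vr = v_q - expected
    return vr, 100.0 * vr / v_q

# Hypothetical joint table: rows = (present, absent), columns = three depth states.
joint = [[0.01, 0.06, 0.04],
         [0.39, 0.30, 0.20]]
vr, pct = variance_reduction(joint, [1.0, 0.0])
print(round(vr, 4), round(pct, 1))
```

An independent covariate gives 0%, and a covariate that determines presence exactly gives 100%, so the single-digit values in Table 3.2 indicate that no single variable explains most of the variation.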

The HSP map (Figure 3.5a) of the dictyonine sponge assemblage depicted a low probability of suitable habitat throughout most of the region, with small pockets of highly suitable habitat around known ridges and canyons. In addition, a region of highly suitable habitat was predicted within the “Newport embayment.” Isolated cells of high predicted suitability were scattered across the continental shelf. These predictions were likely erroneous, owing to the amplification of noise in bathymetric derivatives. The uncertainty map (Figure 3.5b) reflected moderate uncertainty throughout the region (μ = 0.17 ± 0.04), likely due to the combination of more uniform sampling effort and the consistent spatial uncertainty inherent in the bottom trawl dataset. Uncertainty was slightly higher along the deeper regions of the continental slope.

Figure 3.5. Dictyonine sponge assemblage.

(a) Habitat suitability ranges from 0 (blue: very low probability of suitable habitat) to 0.5 (yellow: unknown/even probability) to 1 (red: very high probability of suitable habitat). (b) Uncertainty ranges from 0 (light: high precision and low uncertainty in probability estimate) to 0.5 (dark: low precision and high uncertainty in probability estimate).

3.4 Discussion

Bottom trawl surveys from the “Newport embayment” recorded dictyonine sponge catches greater than 50 kg (Figure 3.6), indicating significant abundance of these organisms in this region. While the HSP analysis did not attempt to test the different hypotheses for potential sources of hard substrate within this region, model results support the possible mechanism that local depressions provide sites where hard material accumulates. Bottom current proxies, which measure bathymetric rugosity, can highlight local depressions. Model results from the “Newport embayment” show high HSP values in regions considered to be locally flat depressions within the basin. Therefore, in addition to estimating areas of high bottom current, the bottom current proxies could also be providing cues to the location of hard substrate within sedimentary basins at depths of 300-500 m.

The existence of dictyonine sponges within the “Newport embayment” highlights the scale limitations of current substrate maps: biology is likely responding to a finer scale than that of our current surficial geologic habitat maps. Further evidence for this scale-mismatch hypothesis is that most dictyonine sponge observations were associated with probability of rock values below 0.5, suggesting that hard substrate is under-predicted throughout the region.

Figure 3.6. Newport embayment offshore of Oregon.

Figure 3.6a illustrates dictyonine sponge presence bottom trawl surveys overlaid on the Surficial Geologic Habitat map, Version 4 (Goldfinger et al. 2014). Figure 3.6b illustrates dictyonine glass sponge presence bottom trawl surveys overlaid on HSP model results. Dictyonine sponge observations are predominantly found in areas depicted as mud. Bottom trawl segments are included for surveys that observed dictyonine sponges and are categorized into less than 5 kg (indicating a single organism retrieved from the trawl survey), 5-50 kg, and greater than 50 kg (indicating significant abundance encountered by the trawl survey).

The Klamath basin, another site of dictyonine sponge occurrence, is underlain by an extensive low-angle normal fault, and the basin itself is an extensional basin, unusual in a compressional subduction setting. Extension of the cover sequence in the basin is generating numerous normal faults that may act as fluid conduits and, in turn, generate carbonate hard substrates associated with venting. Further exploration in the Newport embayment and Klamath basin is necessary to confirm the presence of dictyonine species and to determine fine-scale substrate patterns that may provide hard material for attachment. Such information can also help test the different hypotheses potentially explaining the source of hard substrate within these sedimentary basins: carbonate seepage, landslides, faults, paleo-channel systems, or human debris.

This work represents a first attempt to model the dictyonine glass sponge species assemblage. The current model is likely over-predicting habitat suitability along the shallow continental shelf due to the amplification of noise in bathymetric derivatives, and under-predicting habitat suitability along the continental slope (as evidenced by the high false negative error rate). Bathymetric noise is high in many areas because modern multibeam data are lacking and the available data consist of relatively sparse soundings or older single-beam surveys. Another source of noise is overlapping multibeam swaths that may or may not have had good sound velocity control, creating vertical artifacts along track.

Model improvement is needed prior to its use for management purposes.

Recommendations to improve the model include: gathering spatially explicit dictyonine sponge observations (such as from ROV and AUV footage) together with direct environmental drivers (such as bottom current, silica concentrations, and other possible oceanographic properties); further exploration of the Newport embayment to better understand the processes producing and accumulating hard bottom surfaces in the region; and refining model probabilities through further consultation with dictyonine sponge experts. After the dictyonine sponge model is refined, it should undergo more rigorous performance testing, review, and validation.


Chapter 4.

Conclusions

4.1 Future Recommendations

Bayesian networks provide a robust analytical tool for modeling benthic invertebrate habitat suitability. Performance metrics presented in this thesis indicate that models performed better than would be expected from a random model (error rate = 50%, TSS = 0, spherical payoff = 0.5) or from an erroneous model (error rate > 50%, TSS < 0, spherical payoff < 0.5). Multicollinearity was observed among predictor variables (Pearson’s correlation > 0.7), indicating the need for a modeling tool that can accommodate highly correlated predictors.
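A multicollinearity screen like the one described can be sketched by computing Pearson's correlation for every pair of predictors and flagging pairs above the 0.7 threshold. The sketch compares the absolute value of r, which is an assumption (the text states only "> 0.7"); the column names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's product-moment correlation for two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.7):
    """Pairs of predictors whose absolute correlation exceeds the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) > threshold:
            flagged.append((a, b, r))
    return flagged

# Hypothetical per-record predictor values.
columns = {
    "sdd_local": [1, 2, 3, 4, 5],
    "slope_local": [2, 4, 6, 8, 10.5],
    "prob_rock": [5, 1, 4, 2, 3],
}
for a, b, r in collinear_pairs(columns):
    print(a, b, round(r, 3))
```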

Map products from Chapter 2 are available online (http://bhc.coas.oregonstate.edu/benthic/) and can be downloaded for marine spatial planning purposes. The data format for model output is ESRI floating point grid format (valid HSP range: 0-1; valid uncertainty range: 0-0.5; valid experience range: 0-100).
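For users working outside a GIS, the sketch below reads a grid and checks it against the valid ranges listed above. It assumes the download uses ESRI's binary floating point layout: a `.hdr` text header (`ncols`, `nrows`, `NODATA_value`, `byteorder`, and so on) plus a `.flt` file of 32-bit floats stored row by row from the top. If the portal delivers a different ESRI raster format, a general raster library would be needed instead. File names and function names here are hypothetical.

```python
import array
import sys

VALID_RANGES = {"hsp": (0.0, 1.0), "uncertainty": (0.0, 0.5), "experience": (0.0, 100.0)}

def parse_header(text):
    """Parse a .hdr file ('key value' per line) into a dict with lower-case keys."""
    header = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            header[parts[0].lower()] = parts[1]
    return header

def decode_grid(raw, header):
    """Decode 32-bit float cells (row-major, top row first); NODATA cells become None."""
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = float(header.get("nodata_value", "-9999"))
    values = array.array("f")
    if values.itemsize != 4:
        raise RuntimeError("this platform's 'f' array type is not 32-bit")
    values.frombytes(raw[: 4 * ncols * nrows])
    file_little = header.get("byteorder", "LSBFIRST").upper() == "LSBFIRST"
    if file_little != (sys.byteorder == "little"):
        values.byteswap()
    return [[None if v == nodata else v for v in values[r * ncols:(r + 1) * ncols]]
            for r in range(nrows)]

def read_float_grid(hdr_path, flt_path):
    with open(hdr_path) as fh:
        header = parse_header(fh.read())
    with open(flt_path, "rb") as fb:
        return header, decode_grid(fb.read(), header)

def count_out_of_range(rows, layer):
    """Number of non-NODATA cells outside the valid range for the given output layer."""
    lo, hi = VALID_RANGES[layer]
    return sum(1 for row in rows for v in row if v is not None and not lo <= v <= hi)

# Usage (hypothetical file names):
# header, rows = read_float_grid("hsp.hdr", "hsp.flt")
# print(count_out_of_range(rows, "hsp"))
```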

Users may interact with the model outputs either through the online viewer or by adding the dataset directly to a desktop application (e.g., ArcGIS, MATLAB, and R) from a local copy or live map service. In this way, users have various mechanisms to apply model outputs to marine spatial planning activities.

Overall results from models presented in this thesis indicate that when appropriate modeling techniques are used, it is possible to model the habitat suitability distribution of benthic invertebrate species. Uncertainty and experience maps do illustrate, however, that models and maps can be improved by acquiring more data and knowledge of benthic mega- and macrofauna. This section lists several recommendations for future studies.

4.1.1 Improve Environmental Data

The success of creating a habitat suitability map of any species depends on the resolution and quantity of regional environmental data from which regional predictions can be made. It must be noted that regional environmental data must either represent ecological drivers of species distribution or proxies to ecological drivers (Franklin 2009).

For example, if a species' distribution is primarily driven by fine-scale patterns of pH on the ocean floor, and no regional raster of bottom pH exists nor is pH related to any environmental factor that can be measured, then regional prediction is not possible.

Improvements to environmental data can be made on two fronts: increase ocean floor coverage of direct environmental drivers and identify new regional scale environmental proxies. Models will be enhanced by a finer scale description of direct ecological driver patterns, such as available organic matter, grain size, hard bottom substrata, and percent silt or sand. For example in Chapter 2, benthic macrofauna may be responding to finer scale processes in grain size than were captured by the 250 meter resolution cell size; in Chapter 3, dictyonine sponge species are likely attaching to hard substrate surfaces that exist on a scale smaller than what is currently captured by geological surficial maps.

The collection of other variables not used in these models, which may be important to mega- and macrofauna, is recommended. These variables may include biological (e.g., larval dispersal and recruitment patterns, species trophic interactions, and food supply), chemical (e.g., dissolved oxygen, pH, and silica concentrations), and physical drivers (e.g., bottom temperature, bottom currents, and topographic habitats).

Identifying new environmental proxies that are strongly correlated with direct ecological drivers is also important. Proxies that can be measured on a regional scale with little effort (e.g., remotely sensed data) can improve model performance at little cost. For example, in lieu of high-resolution spatial and temporal bottom current data, isolating the best regional predictor of such information could enhance the prediction of sedentary or sessile filter-feeding species that rely on bottom currents to carry in their food supply.

4.1.2 Increase Spatially Explicit Sampling Effort

Larger sample sizes over both time and space generally lead to better habitat suitability predictions. Samples identifying species-environment associations should be as spatially accurate as feasible. Increased sampling effort should aim to collect environmental information and species presence and absence data in situ.

However, this is a costly process in the marine environment, and therefore directed sampling effort can maximize the increase in information while minimizing the cost.

Experience maps, detailed for the first time in this study, can be used to recommend future directed sampling by highlighting spatial regions which are low in experience for a given species.


In addition to increasing the number of samples, sampling should cover consistent depth ranges uniformly across the latitudinal gradient. In Chapter 2, for example, the lack of uniformity in sampling effort resulted in under-sampled categories of environmental conditions. Under-sampled regions included the shallowest and deepest extents, the southern portion of the study site where deeper, siltier environments were found closer to shore, and the northern region surrounding the outflow of the Columbia River. The Columbia River plume is a major driver of sedimentary patterns in the nearshore environment and can influence a broad latitudinal range; plume effects can sometimes be seen as far south as Newport. A better understanding of this unique system will improve model predictions.

Improved taxonomic resolution of identified species will lead to larger datasets available for modeling individual species. Sponge specimens currently retrieved by bottom trawl surveys are not identified to a fine enough taxonomic resolution for the data to be usable in habitat suitability analysis. The models developed in Chapter 3 relied on older data from a period when sponges were identified to a finer taxonomic level. Reinstating a more rigorous identification of sponge and coral species retrieved from bottom trawls will improve future modeling efforts.

Finally, future sampling effort will improve the temporal understanding of species ecological distributions. Hierarchical Bayesian modeling (e.g., WinBUGS) can handle temporal data and can provide information regarding both temporal and spatial variability. Acquiring demographic data on benthic invertebrate species will allow a more thorough understanding of species-specific habitat requirements.

4.1.3 Accessibility of Model Predictions

Collaboration and data sharing decrease the overall costs placed on any individual organization. Pooling resources and publishing map products on portals improves map accessibility for marine spatial planning purposes. Examples of data portals include, but are not limited to, the Marine Cadastre (www.marinecadastre.gov), data.gov, and PaCOOS (efh-catalog.coas.oregonstate.edu/overview/). Providing managers with access to models within the framework of decision support tools will improve managers’ ability to use models in marine management and spatial planning.


4.2 Implications for Marine Resource Management

Marine managers are often tasked with coordinating multiple uses of ocean space across the seascape. In assessing trade-offs between multiple social and ecological interests, managers often seek out tangible tools to facilitate the planning process. Marine spatial planning, a tool operating under the philosophy of marine ecosystem-based management (MEBM), offers guidelines and procedures for analyzing social and ecological tradeoffs across a spatial landscape (Ehler & Douvere 2009; Foley et al. 2010). This tool often employs maps to visualize and denote boundaries for the use of marine space (Shucksmith & Kelly 2014).

Maps based on statistical data, such as habitat suitability maps, often report average values and do not communicate the statistical uncertainty inherent in the models used to design the maps (Rocchini et al. 2011). This uncertainty is not uniform across the landscape, and additional uncertainty maps can assist in HSP map interpretation (Elith, Burgman & Regan 2002). When resource managers and decision makers fail to connect a map value with its underlying uncertainty, misinterpretation is more likely to occur, which may lead to negative consequences for MEBM goals.

Errors in decisions that result from interpreting maps as completely certain, when they are not, can have implications for marine species and people. For example, drawing a protected area boundary around a region erroneously believed to be suitable for a species results in a loss of benefit to people who could otherwise use the area economically. Placing a renewable energy device in a location erroneously believed to be unsuitable for keystone macrofauna species could harm the benthic community. The development of supplementary map products that visualize the error, uncertainty, and data limitations underlying model predictions, as detailed in this thesis, promotes transparency in the interpretation of habitat suitability maps.

As statistical maps are developed from an uncertain process, managers must anticipate that maps can change and improve over time. Managers must take into account that the next study of groundfish habitat may change current fishing closure boundaries. The next high-resolution mapping project may alter the boundaries of sedimentary patches, leading to changes in the predictive outputs of benthic habitat suitability maps. Such awareness can be an advantage, as uncertainty can be used to identify data-poor regions and direct future sampling effort, leading to future uncertainty reduction (Lester et al. 2010) and HSP map improvement. Awareness of the evolving map promotes the development of adaptive management strategies and better prepares managers and decision makers for future planning efforts.

Knowledge of marine species is limited due to the costs, time, and challenges related to marine data collection. Marine ecological knowledge remains data poor, especially in comparison to terrestrial counterparts. Statistical models, exemplified in this thesis by habitat suitability models, are a robust way of filling gaps in ecosystem knowledge so that managers can move forward in marine spatial planning with the best available science. Maps are intuitively easy for humans to interpret and understand (Board & Taylor 1977), yet without an expression of underlying uncertainties they can also be misinterpreted. Maps of error, bias, statistical confidence intervals, and data limitations should regularly accompany habitat suitability maps. Such uncertainty maps will enhance managers’ understanding and allow for better interpretation of habitat suitability maps.


Bibliography

Aguilera, P., Fernandez, A., Reche, F. & Rumi, R. (2010) Hybrid Bayesian network classifiers: Application to species distribution models. Environmental Modelling & Software, 25, 1630–1639.

Ahmadi-Nedushan, B., St-Hilaire, A., Bérubé, M., Robichaud, É., Thiémonge, N. & Bobée, B. (2006) A review of statistical methods for the evaluation of aquatic habitat suitability for instream flow assessment. River Research and Applications, 22, 503–523.

Allouche, O., Tsoar, A. & Kadmon, R. (2006) Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). Journal of Applied Ecology, 43, 1223–1232.

Amoudry, L., Bell, P.S., Black, K.S., Gatliff, R.W., Helsby, R., Souza, A.J., Thorne, P.D. & Wolf, J. (2009) A Scoping Study on: Research into Changes in Sediment Dynamics Linked to Marine Renewable Energy Installations. NERC Marine Renewable Energy Theme Action Plan Report, 120 p.

Austin, W.C., Conway, K.W., Barrie, J.V. & Krautter, M. (2007) Growth and morphology of a reef-forming glass sponge, Aphrocallistes vastus (Hexactinellida), and implications for recovery from widespread trawl damage. Porifera Research: Biodiversity, Innovation and Sustainability - 2007, 139–145.

Barry, S. & Elith, J. (2006) Error and uncertainty in habitat models. Journal of Applied Ecology, 43, 413–423.

Bierman, S.M., Butler, A., Marion, G. & Kühn, I. (2010) Bayesian image restoration models for combining expert knowledge on recording activity with species distribution data. Ecography, 33, 451–460.

Board, C. & Taylor, R.M. (1977) Perception and Maps: Human Factors in Map Design and Interpretation. Transactions of the Institute of British Geographers, 2, 19–36.

Boehlert, G.W. & Gill, A.B. (2010) Environmental and Ecological Effects of Ocean Renewable Energy Development: A Current Synthesis. Oceanography, 23, 68–81.

Byers, J.E. & Grabowski, J.H. (2013) Soft-Sediment Communities. Marine Community Ecology and Conservation (eds M.D. Bertness, J.F. Bruno, B.R. Silliman & J.J. Stachowicz), pp. 227–249. Sinauer Associates, Incorporated.

Chuenpagdee, R., Morgan, L.E., Maxwell, S.M., Norse, E.A. & Pauly, D. (2003) Shifting gears: assessing collateral impacts of fishing methods in US waters. Frontiers in Ecology and the Environment, 1, 517–524.

Cleaves, D.A. (1995) Assessing and communicating uncertainty in decision support systems: lessons from an ecosystem policy analysis. AI Applications, 9, 87–102.

Coates, D., Vanaverbeke, J. & Vincx, M. (2013) Enrichment of the soft sediment macrobenthos around a gravity based foundation on the Thorntonbank. Offshore wind farms in the Belgian part of the North Sea: heading for an understanding of environmental impacts (eds S. Degraer, R. Brabant & B. Rumes), pp. 41–54. Royal Belgian Institute of Natural Sciences, Brussels, Belgium.

Conway, K., Barrie, J., Hill, P., Austin, W. & Picard, K. (2007) Mapping sensitive benthic habitats in the Strait of Georgia, coastal British Columbia: deep-water sponge and coral reefs. Geological Survey of Canada, Current Research, 2007-A2, 1–6.

Conway, K.W., Barrie, J.V. & Krautter, M. (2005) Geomorphology of unique reefs on the western Canadian shelf: sponge reefs mapped by multibeam bathymetry. Geo-Marine Letters, 25, 205–213.

Conway, K.W., Krautter, M., Barrie, J.V. & Neuweiler, M. (2001) Hexactinellid Sponge Reefs on the Canadian Continental Shelf: A Unique ‘Living Fossil’. Geoscience Canada, 28, 71–78.

Conway, K.W., Krautter, M., Barrie, J., Whitney, F., Thomson, R., Reiswig, H., Lehnert, H., Mungov, G. & Bertram, M. (2004) Sponge reefs in the Queen Charlotte Basin, Canada: controls on distribution, growth and development. Deep-sea corals and ecosystems. Springer, Berlin Heidelberg New York.

Cook, S.E., Conway, K.W. & Burd, B. (2008) Status of the glass sponge reefs in the Georgia Basin. Marine Environmental Research, 66 Suppl, S80–S86.

Cressie, N., Calder, C.A., Clark, J.S., Ver Hoef, J.M. & Wikle, C.K. (2009) Accounting for Uncertainty in Ecological Analysis: the Strengths and Limitations of Hierarchical Statistical Modeling. Ecological Applications, 19, 553–570.

Degraer, S., Verfaillie, E., Willems, W., Adriaens, E., Vincx, M. & Van Lancker, V. (2007) Habitat suitability modelling as a mapping tool for macrobenthic communities: An example from the Belgian part of the North Sea. Continental Shelf Research.

Dempster, A., Laird, N. & Rubin, D. (1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society , 39 , 1–38.

Dlamini, W. (2011) Bioclimatic modeling of southern African bioregions and biomes using Bayesian networks. Ecosystems, 14, 366–381.

Dransfield, A., Hines, E., McGowan, J., Holzman, B., Nur, N., Elliott, M., Howar, J. & Jahncke, J. (2014) Where the whales are: using habitat modeling to support changes in shipping regulations within National Marine Sanctuaries in Central California. Endangered Species Research, 26, 39–57.

Dunn, D.C. & Halpin, P.N. (2009) Rugosity-based regional modeling of hard-bottom habitat. Marine Ecology Progress Series, 377, 1–11.

Ehler, C. & Douvere, F. (2009) Marine Spatial Planning: a step-by-step approach toward ecosystem-based management. Intergovernmental Oceanographic Commission and Man and the Biosphere Programme. IOC Manual and Guides No. 53, ICAM Dossier No. 6, Paris, UNESCO.

Eleftheriou, A. & McIntyre, A. (2005) Methods for the Study of Marine Benthos, 3rd Edition. Wiley-Blackwell, Oxford.

Elith, J., Burgman, M.A. & Regan, H.M. (2002) Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecological Modelling, 157, 313–329.

Elith, J. & Graham, C.H. (2009) Do they? How do they? WHY do they differ? On finding reasons for differing performances of species distribution models. Ecography, 32, 66–77.

Foley, M.M., Halpern, B.S., Micheli, F., Armsby, M.H., Caldwell, M.R., Crain, C.M., Prahler, E., Rohr, N., Sivas, D., Beck, M.W., Carr, M.H., Crowder, L.B., Emmett Duffy, J., Hacker, S.D., McLeod, K.L., Palumbi, S.R., Peterson, C.H., Regan, H.M., Ruckelshaus, M.H., Sandifer, P.A. & Steneck, R.S. (2010) Guiding ecological principles for marine spatial planning. Marine Policy, 34, 955–966.

Folk, R. & Ward, W. (1957) Brazos River Bar: A Study in the Significance of Grain Size Parameters. SEPM Journal of Sedimentary Research, 27, 3–26.

Fotheringham, A., Brunsdon, C. & Charlton, M. (2002) Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. Wiley, Chichester.

Franklin, J. (2009) Mapping Species Distributions. Cambridge University Press, Cambridge.

Friedman, N., Geiger, D. & Goldszmidt, M. (1997) Bayesian Network Classifiers. Machine Learning, 29, 131–163.

Gage, J.D. & Tyler, P.A. (1991) Deep-Sea Biology: A Natural History of Organisms on the Deep-Sea Floor. Cambridge University Press, Cambridge, UK.

Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A. & Rubin, D. (2013) Bayesian Data Analysis. CRC Press.

Glockzin, M. & Zettler, M.L. (2008) Spatial macrozoobenthic distribution patterns in relation to major environmental factors – A case study from the Pomeranian Bay (southern Baltic Sea). Journal of Sea Research, 59, 144–161.

Gogina, M., Glockzin, M. & Zettler, M.L. (2010) Distribution of benthic macrofaunal communities in the western Baltic Sea with regard to near-bottom environmental parameters. 2. Modelling and prediction. Journal of Marine Systems, 80, 57–70.

Goldfinger, C., Henkel, S., Romsos, C., Havron, A. & Black, B. (2014) Benthic Habitat Characterization: Volume 1 Evaluation of Continental Shelf Geology Offshore the Pacific Northwest. US Dept. of the Interior, Bureau of Ocean Energy Management, Pacific OCS Region. OCS Study BOEM 2014-662, 161 p.

Goldfinger, C., Nelson, C.H., Morey, A.E., Johnson, J.E., Patton, J., Karabanov, E., Gutierrez-Pastor, J., Eriksson, A., Gracia, E., Dunhill, G., Enkin, R., Dallimore, A. & Vallier, T. (2012) Turbidite Event History: Methods and Implications for Holocene Paleoseismicity of the Cascadia Subduction Zone. U.S. Geological Survey Professional Paper 1661-F, 170 p.

Graham, M. (2003) Confronting multicollinearity in ecological multiple regression. Ecology, 84, 2809–2815.

Gray, J.S. (1974) Animal-Sediment Relationships. Oceanography and Marine Biology: An Annual Review, 12, 223–261.

Guinotte, J.M. & Davies, A.J. (2014) Predicted deep-sea coral habitat suitability for the U.S. West Coast. PLoS ONE, 9, e93918.

Guisan, A. & Thuiller, W. (2005) Predicting species distribution: offering more than simple habitat models. Ecology Letters, 8, 993–1009.

Guisan, A. & Zimmermann, N.E. (2000) Predictive habitat distribution models in ecology. Ecological Modelling, 135, 147–186.

Hall, L.S., Krausman, P.R. & Morrison, M.L. (1997) The habitat concept and a plea for standard terminology. Wildlife Society Bulletin, 25, 173–182.

Heckerman, D. (1995) A tutorial on learning with Bayesian networks. Technical report MSR-TR-95-06, Microsoft Research.

Hedges, J.I. & Keil, R.G. (1995) Sedimentary organic matter preservation: an assessment and speculative synthesis. Marine Chemistry, 49, 137–139.

Henkel, S., Goldfinger, C., Romsos, C., Hemery, L., Havron, A. & Politano, K. (2014) Benthic Habitat Characterization: Volume 2 Evaluation of Benthic Communities on the Outer Continental Shelf of the Pacific Northwest. US Dept. of the Interior, Bureau of Ocean Energy Management, Pacific OCS Region. OCS Study BOEM 2014-662, 221 p.

Hessler, R.R. & Jumars, P.A. (1974) Abyssal community analysis from replicate box cores in the central North Pacific. Deep-Sea Research, 21, 185–209.

Hiddink, J.G., Jennings, S. & Kaiser, M.J. (2006) Indicators of the ecological impact of bottom-trawl disturbance on seabed communities. Ecosystems, 9, 1190–1199.

Hiddink, J.G., Jennings, S., Kaiser, M.J., Queiros, A.M., Duplisea, D.E. & Piet, G.J. (2009) Cumulative impacts of seabed trawl disturbance on benthic biomass, production and species richness in different habitats. 736, 721–736.

Hutchinson, G.E. (1957) Concluding Remarks. Cold Spring Harbor Symposia on Quantitative Biology, 22, 415–427.

Ierodiaconou, D., Monk, J., Rattray, A., Laurenson, L. & Versace, V.L. (2011) Comparison of automated classification techniques for predicting benthic biological communities using hydroacoustics and video observations. Continental Shelf Research, 31, S28–S38.

Ioannidis, J.P.A. (2005) Why most published research findings are false. PLoS Medicine, 2, e124.

Jamieson, G.S. & Chew, L. (2002) Hexactinellid Sponge Reefs: Areas of Interest as Marine Protected Areas in the North and Central Coast Areas. Canadian Science Advisory Secretariat, Research Document 2002/122, 78 p.

Jensen, F.V. & Nielsen, T.D. (2007) Bayesian Networks and Decision Graphs, 2nd Edition. Springer, New York.

Johnson, C.J. & Gillingham, M.P. (2004) Mapping uncertainty: Sensitivity of wildlife habitat ratings to expert opinion. Journal of Applied Ecology, 41, 1032–1041.

Kontkanen, P., Myllymäki, P., Silander, T., Tirri, H. & Grünwald, P. (1997) Comparing predictive inference methods for discrete domains. Proceedings of the Sixth International Workshop on Artificial Intelligence and Statistics, pp. 311–318.

Krautter, M., Conway, K.W. & Barrie, J.V. (2006) Recent Hexactinosidan Sponge Reefs (Silicate Mounds) off British Columbia, Canada: Frame-Building Processes. Journal of Paleontology, 80, 38–48.

Kulm, L.D. & Fowler, G.A. (1974) Oregon Continental Margin Structure and Stratigraphy: A Test of the Imbricate Thrust Model. The Geology of Continental Margins (eds C.A. Burk & C.L. Drake), pp. 261–283. Springer-Verlag, New York.

Laine, A.O. (2003) Distribution of soft-bottom macrofauna in the deep open Baltic Sea in relation to environmental variability. Estuarine, Coastal and Shelf Science, 57, 87–97.

De Laplace, P.S. (1812) Théorie Analytique des Probabilités. Courcier, Paris.

Lele, S.R., Merrill, E.H., Keim, J. & Boyce, M.S. (2013) Selection, use, choice and occupancy: Clarifying concepts in resource selection studies. Journal of Animal Ecology, 82, 1183–1191.

Lester, S.E., McLeod, K.L., Tallis, H., Ruckelshaus, M., Halpern, B.S., Levin, P.S., Chavez, F.P., Pomeroy, C., McCay, B.J., Costello, C., Gaines, S.D., Mace, A.J., Barth, J.A., Fluharty, D.L. & Parrish, J.K. (2010) Science in support of ecosystem-based management for the US West Coast and beyond. Biological Conservation, 143, 576–587.

Leys, S.P. & Lauzon, N.R. (1998) Hexactinellid sponge ecology: growth rates and seasonality in deep water sponges. Journal of Experimental Marine Biology and Ecology, 230, 111–129.

Locket, D.E. (2012) A Bayesian Approach to Habitat Suitability Prediction. Oregon State University.

Marcot, B.G. (2006) Characterizing Species at Risk I: Modeling Rare Species Under the Northwest Forest Plan. Ecology and Society, 11, 10 p.

Marcot, B.G. (2012) Metrics for evaluating performance and uncertainty of Bayesian network models. Ecological Modelling, 230, 50–62.

Marcot, B.G., Steventon, J., Sutherland, G. & McCann, R. (2006) Guidelines for developing and updating Bayesian belief networks applied to ecological modeling and conservation. Canadian Journal of Forest Research, 36, 3063–3074.

McConnaughey, R. (2000) An examination of chronic trawling effects on soft-bottom benthos of the eastern Bering Sea. ICES Journal of Marine Science, 57, 1377–1388.

Mid-Atlantic Fishery Management Council. (2015) Deep Sea Coral Amendment to the Atlantic Mackerel, Squid, and Butterfish Fishery Management Plan. Public Document, 1–95.

Miller, R.G., Hutchison, Z.L., Macleod, A.K., Burrows, M.T., Cook, E.J., Last, K.S. & Wilson, B. (2013) Marine renewable energy development: Assessing the Benthic Footprint at multiple scales. Frontiers in Ecology and the Environment, 11, 433–440.

Morrison, M.L., Marcot, B.G. & Mannan, R.W. (1992) Wildlife-Habitat Relationships: Concepts and Applications. University of Wisconsin Press, Madison.

Myllymäki, P., Silander, T., Tirri, H. & Uronen, P. (2002) B-Course: A web-based tool for Bayesian and causal data analysis. International Journal on Artificial Intelligence Tools, 11, 369–387.

Neill, S.P., Litt, E.J., Couch, S.J. & Davies, A.G. (2009) The impact of tidal stream turbines on large-scale sediment dynamics. Renewable Energy, 34, 2803–2812.

NMFS Pacific Fishery Management Council (PFMC). (2014) Pacific Coast Groundfish Fishery Management Plan. National Oceanic and Atmospheric Administration Award Number NA05NMF441008, 146 p.

NOAA Deep Sea Coral Research and Technology. (2013) Deep-Sea Coral National Geographic Database, version 4.

Parkinson, S.C., Dragoon, K., Reikard, G., García-Medina, G., Özkan-Haller, H.T. & Brekken, T.K.A. (2015) Integrating ocean wave energy at large-scales: A study of the US Pacific Northwest. Renewable Energy, 76, 551–559.

Rattray, A., Ierodiaconou, D., Laurenson, L., Burq, S. & Reston, M. (2009) Hydroacoustic remote sensing of benthic biological communities on the shallow South East Australian continental shelf. Estuarine, Coastal and Shelf Science, 84, 237–245.

Reid, J.A., Reid, J.M., Jenkins, C.J., Zimmermann, M., Williams, S.J. & Field, M.E. (2006) usSeabed: Pacific Coast (California, Oregon, Washington) Offshore Surficial-Sediment Data Release. U.S. Geological Survey Data Series 182, Version 1.0.

Reikard, G., Robertson, B. & Bidlot, J.-R. (2015) Combining wave energy with wind and solar: Short-term forecasting. Renewable Energy, 81, 442–456.

Rengstorf, A.M., Yesson, C., Brown, C. & Grehan, A.J. (2013) High-resolution habitat suitability modelling can improve conservation of vulnerable marine ecosystems in the deep sea. Journal of Biogeography, 40, 1702–1714.

Rijnsdorp, A.D., Buys, A.M., Storbeck, F. & Visser, E.G. (1998) Micro-scale distribution of beam trawl effort in the southern North Sea between 1993 and 1996 in relation to the trawling frequency of the sea bed and the impact on benthic organisms. ICES Journal of Marine Science, 55, 403–419.

Robison, B.H. (2004) Deep pelagic biology. Journal of Experimental Marine Biology and Ecology, 300, 253–272.

Rocchini, D., Hortal, J., Lengyel, S., Lobo, J.M., Jiménez-Valverde, A., Ricotta, C., Bacaro, G. & Chiarucci, A. (2011) Accounting for uncertainty when mapping species distributions: The need for maps of ignorance. Progress in Physical Geography, 35, 211–226.

Romsos, C. (2006) Mapping Surficial Geological Habitats of the Oregon Continental Margin Using Integrated Interpretive GIS Techniques. Oregon State University.

Sanders, H.L. (1968) Marine benthic diversity: a comparative study. American Naturalist, 102, 243–282.

Shackeroff, J.M., Hazen, E.L. & Crowder, L.B. (2009) The Oceans as Peopled Seascapes. Ecosystem-based management for the oceans (eds K.L. McLeod & H.M. Leslie), p. 368. Island Press, Washington, D.C.

Shucksmith, R.J. & Kelly, C. (2014) Data collection and mapping – Principles, processes and application in marine spatial planning. Marine Policy, 50, 27–33.

Sivia, D. & Skilling, J. (1996) Data Analysis: A Bayesian Tutorial. Oxford University Press, New York.

Snelgrove, P.V.R. (1999) Getting to the Bottom of Marine Biodiversity: Sedimentary Habitats. BioScience, 49, 129.

Snelgrove, P.V.R. & Butman, C.A. (1994) Animal-Sediment Relationships Revisited: Cause Versus Effect. Oceanography and Marine Biology: An Annual Review, 32, 111–177.

Stone, R.P., Conway, K.W., Csepp, D.J. & Barrie, J.V. (2013) The Boundary Reefs: Glass Sponge (Porifera: Hexactinellida) Reefs on the International Border Between Canada and the United States. U.S. Dep. Commer., NOAA Tech. Memo. NMFS-AFSC, 31 p.

Thuiller, W., Araújo, M. & Lavorel, S. (2004) Do we need land-cover data to model species distributions in Europe? Journal of Biogeography, 31, 353–361.

Tittensor, D.P., Baco, A.R., Brewin, P.E., Clark, M.R., Consalvey, M., Hall-Spencer, J., Rowden, A.A., Schlacher, T., Stocks, K.I. & Rogers, A.D. (2009) Predicting global habitat suitability for stony corals on seamounts. Journal of Biogeography, 36, 1111–1128.

Uusitalo, L. (2007) Advantages and challenges of Bayesian networks in environmental modelling. Ecological Modelling, 203, 312–318.

Vierod, A.D.T., Guinotte, J.M. & Davies, A.J. (2014) Predicting the distribution of vulnerable marine ecosystems in the deep sea using presence-background models. Deep-Sea Research Part II: Topical Studies in Oceanography, 99, 6–18.

Watanabe, M. & Yamaguchi, K. (eds). (2003) The EM Algorithm and Related Statistical Models. Marcel Dekker, New York.

Weiss, A. (2001) Topographic position and landforms analysis. Poster presentation, ESRI User Conference, San Diego.

Whitney, F., Conway, K., Thomson, R., Barrie, V., Krautter, M. & Mungov, G. (2005) Oceanographic habitat of sponge reefs on the Western Canadian Continental Shelf. Continental Shelf Research, 25, 211–226.

Whittaker, R. (1960) Vegetation of the Siskiyou Mountains, Oregon and California. Ecological Monographs, 30, 279–338.

Wilson, M.F.J., O’Connell, B., Brown, C., Guinan, J.C. & Grehan, A.J. (2007) Multiscale Terrain Analysis of Multibeam Bathymetry Data for Habitat Mapping on the Continental Slope. Marine Geodesy, 30, 3–35.

Yahel, G., Whitney, F., Reiswig, H.M., Leys, S.P. & Eerkes-Medrano, D.I. (2007) In situ feeding and metabolism of glass sponges (Hexactinellida, Porifera) studied in a deep temperate fjord with a remotely operated submersible. Limnology and Oceanography, 52, 428–440.
