Multi-scale Analysis: Options for Modeling Presence/Absence of Bird Species

Kathryn M. Georgitis (1), Alix I. Gitelman (1), and Nick Danz (2)
(1) Statistics Department, Oregon State University
(2) Natural Resources Research Institute, University of Minnesota-Duluth

The research described in this presentation has been funded by the U.S. Environmental Protection Agency through the STAR Cooperative Agreement CR82-9096-01 Program on Designs and Models for Aquatic Resource Surveys at Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.

Talk Overview
• Ecological Question of Interest
• Western Great Lakes Breeding Bird Study
• Interesting Features of our Example
• Options for Modeling Species Presence/Absence
  (1) Separate Models for Each Spatial Extent
  (2) One Model for all Spatial Extents
  (3) Model using Functionals of Explanatory Variables
  (4) Graphical Model

Ecological Question of Interest
• How does the relationship between landscape characteristics and presence of a bird species change with scale?
• What scale is the most useful in terms of understanding bird presence/absence?

Concentric Circle Sampling Design
[Figure: concentric circles labeled 100 m, 500 m, and 1000 m.]

Western Great Lakes Breeding Bird Study
• Response Variable:
  – Presence/Absence of Pine Warbler
• Explanatory Variables:
  – % land cover within 4 different spatial extents
  – Ten land cover types

Interesting Features of the Data

Correlation between Explanatory Variables

Spatial Extent   pine and oak-pine /   lowland non-forest /   n. hardwoods /
                 spruce-fir            n. hardwoods           aspen-birch
100m              -0.31 (0.08)          -0.08 (0.08)           -0.07 (0.08)
500m               0.03 (0.08)          -0.17 (0.08)           -0.14 (0.08)
1000m              0.11 (0.08)          -0.24 (0.08)           -0.26 (0.08)
5000m              0.21 (0.08)          -0.58 (0.06)           -0.63 (0.06)

Correlation Between Pine and Oak-Pine Measured at Different Scales

Spatial Extent   100m   500m          1000m         5000m
100m             1      0.81 (0.05)   0.70 (0.06)   0.45 (0.07)
500m                    1             0.95 (0.03)   0.70 (0.06)
1000m                                 1             0.79 (0.05)

Relationship between Land Cover Variables and Spatial Extent
[Figure: percentage of pine and oak-pine (vertical axis, 0 to 60) against spatial extent in m (horizontal axis, 0 to 5000), with separate series for the Chequamegon, Chippewa, St. Croix, and Superior forests.]

Options for Modeling Presence/Absence of Pine Warbler
(1) Separate Models for Each Spatial Extent
(2) One Model for all Spatial Extents
(3) Model using Functionals of Explanatory Variables
(4) Bayesian Network (Graphical) Model

Option 1: Separate Models Approach
(100m)   $M_1: \log\left(p(1-p)^{-1}\right) = C_1 \beta_1$
(500m)   $M_5: \log\left(p(1-p)^{-1}\right) = C_5 \beta_5$
(1000m)  $M_{10}: \log\left(p(1-p)^{-1}\right) = C_{10} \beta_{10}$
(5000m)  $M_{50}: \log\left(p(1-p)^{-1}\right) = C_{50} \beta_{50}$
where Y denotes the n-length vector of binary responses with $\Pr(Y_i = 1) = p_i$, and $C_1$ denotes the matrix of explanatory variables at the 100m scale.

Option 1: Separate Models Approach

Model      Significant explanatory variables selected using BIC
$M_1$      lowland conifer, pine and oak-pine
$M_5$      lowland conifer, pine and oak-pine, spruce-fir, spruce-fir:pine and oak-pine
$M_{10}$   pine and oak-pine, spruce-fir, spruce-fir:pine and oak-pine
$M_{50}$   pine and oak-pine, forest (a), forest (a):spruce-fir, spruce-fir

(a) The forest variable is an indicator for stands located in the Chequamegon National Forest in Wisconsin.
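The separate-models idea can be sketched in code: fit one logistic regression per spatial extent and compare the fits by BIC. The sketch below is only an illustration under loud assumptions. The data are synthetic (not the Western Great Lakes study data), each model has a single covariate rather than the ten land cover types, and the helper names `fit_logistic` and `bic` are hypothetical. It is not the authors' implementation.

```python
import math
import random


def fit_logistic(x, y, iters=25):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; return (b0, b1, loglik)."""
    b0 = b1 = 0.0

    def prob(xi):
        # Clip the linear predictor to keep exp() well inside float range
        eta = max(-30.0, min(30.0, b0 + b1 * xi))
        return 1.0 / (1.0 + math.exp(-eta))

    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = prob(xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det

    loglik = 0.0
    for xi, yi in zip(x, y):
        p = prob(xi)
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return b0, b1, loglik


def bic(loglik, n_params, n_obs):
    """BIC = -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


random.seed(1)
n = 200

# Synthetic stand-in for % pine and oak-pine cover at four extents.
# The mixing weights and noise level are arbitrary assumptions.
cover = {"100m": [random.uniform(0, 60) for _ in range(n)]}
for extent, weight in [("500m", 0.8), ("1000m", 0.7), ("5000m", 0.45)]:
    cover[extent] = [max(0.0, weight * v + random.gauss(0, 5))
                     for v in cover["100m"]]

# Simulated presence/absence, driven by the 100m covariate (an assumption)
y = [1 if random.random() < 1.0 / (1.0 + math.exp(2.0 - 0.06 * v)) else 0
     for v in cover["100m"]]

# One model per spatial extent, each with an intercept and one slope
results = {}
for extent, x in cover.items():
    b0, b1, ll = fit_logistic(x, y)
    results[extent] = {"b0": b0, "b1": b1, "bic": bic(ll, 2, n)}
    print(extent, round(b0, 3), round(b1, 3), round(results[extent]["bic"], 1))
```

Each extent is fitted in isolation, so the four BIC values say nothing about how the extents relate to one another.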
Option 1: Separate Models Approach
• Disadvantages:
  – does not account for possible relationships between spatial extents
  – multi-collinearity of explanatory variables
  – $2^{10}$ possible models for each spatial extent

Option 2: One Model for all Spatial Extents
$M_{all}: \log\left(p(1-p)^{-1}\right) = Z_{all} \beta_{all}$
where Y denotes the n-length vector of binary responses with $\Pr(Y_i = 1) = p_i$, and $Z_{all} = [C_1, C_5, C_{10}]$.

Option 2: One Model for all Spatial Extents

Spatial extent   Explanatory variables selected using BIC for $M_{all}$
100m             aspen-birch, northern hardwoods, pine and oak-pine, spruce-fir
500m             none
1000m            spruce-fir
100m:1000m       pine and oak-pine:spruce-fir

Option 2: One Model for all Spatial Extents
• Advantages:
  – allows for interactions between scales
• Disadvantages:
  – serious multi-collinearity problems
  – $2^{30}$ possible models

Option 3: Model using Functionals of Explanatory Variables
• Difference Model
  $M_{diff}: \log\left(p(1-p)^{-1}\right) = Z_{diff} \beta_{diff}$, where $Z_{diff} = C_5 - C_1$ (element-wise)
• Proportional Model
  $M_{prop}: \log\left(p(1-p)^{-1}\right) = Z_{prop} \beta_{prop}$, where $Z_{prop} = C_5 / C_1$ (element-wise)

Option 3: Model using Functionals of Explanatory Variables

Model        Explanatory variables selected using BIC
$M_{diff}$   pine and oak-pine_diff
$M_{prop}$   aspen-birch_prop, pine and oak-pine_prop

Option 3: Model using Functionals of Explanatory Variables
• Advantages:
  – incorporates two spatial extents
• Disadvantages:
  – biologically meaningful?
  – multi-collinearity
  – model selection

Option 4: Graphical Model
• Think of the explanatory variables and the response holistically (i.e., as a single multivariate observation)
[Figure: two diagrams over the nodes X1, X2, X3, X4, and Y, one labeled "Logistic Regression Model" and the other "Bayesian Network (Graphical) Model".]

Option 4: Graphical Model
For comparison with $M_{all}$, we use the same “explanatory” variables.
[Figure: diagram with nodes pine & oak-pine 100m, aspen-birch 100m, spruce-fir 1000m, n. hardwoods 100m, spruce-fir 100m, and Pine Warbler.]

Option 4: Graphical Model

Diagram of $M_{all}$
[Figure: diagram with nodes n. hardwoods 100m, aspen-birch 100m, spruce-fir 100m, spruce-fir 1000m, pine & oak-pine 100m, and Pine Warbler.]
where Z = variables in $M_{all}$
$\log\left(p(1-p)^{-1}\right) = Z \beta_{all}$; fixed Z

Diagram of Bayesian $M_{all}$
[Figure: diagram with nodes n. hardwoods 100m, aspen-birch 100m, spruce-fir 100m, spruce-fir 1000m, pine & oak-pine 100m, and Pine Warbler.]
$Z \sim \text{Multinomial}(P, 100)$
$\log(\text{spruce-fir}_{1000}) \sim N(\mu, \sigma^2)$
$\log\left(p(1-p)^{-1}\right) = Z\beta + \beta_5 \log(\text{spruce-fir}_{1000})$

Option 4: Graphical Model
Comparison of $M_{all}$ and Bayesian $M_{all}$

Land cover type variable                        $M_{all}$        Bayesian $M_{all}$
intercept                                       -3.87 (1.27)     -4.20 (1.18)
aspen-birch_100                                  0.02 (0.01)      0.03 (0.01)
northern hardwoods_100                           0.03 (0.01)      0.03 (0.01)
pine and oak-pine_100                            0.06 (0.01)      0.10 (0.02)
spruce-fir_100                                   0.02 (0.01)      0.02 (0.01)
log(spruce-fir_1000)                             0.3 (0.44)       0.34 (0.41)
pine and oak-pine_100:log(spruce-fir_1000)      -0.02 (0.008)    -0.02 (0.008)

Option 4: Graphical Model

Bayesian $M_{all}$
[Figure: same nodes as the Bayesian $M_{all}$ diagram on the previous slide.]
where Z = variables in $M_{all}$
$Z \sim \text{Multinomial}(P, 100)$
$\log(\text{spruce-fir}_{1000}) \sim N(\mu, \sigma^2)$
$\log\left(p(1-p)^{-1}\right) = Z\beta + \beta_5 \log(\text{spruce-fir}_{1000})$

Bayesian Network Model
[Figure: diagram with nodes n. hardwoods 100m, aspen-birch 100m, spruce-fir 100m, spruce-fir 1000m, pine & oak-pine 100m, and Pine Warbler.]
$Z_i \sim \text{Multinomial}(P_i, 100)$
$P_i = (P_{i,1}, P_{i,2}, P_{i,3}, P_{i,4}, P_{i,5})$
$\log\left(P_{i,1}/(1 - P_{i,1})\right) = \phi_0 + \phi_1 \log(\text{spruce-fir}_{1000})$
$\log(\text{spruce-fir}_{1000}) \sim N(\mu, \sigma^2)$
$\log\left(p(1-p)^{-1}\right) = \beta_0 + \beta_1\, \text{pine \& oak-pine}_{100}$

Option 4: Graphical Model
Comparison of two Bayesian Network Models

Component      -2 log likelihood for    -2 log likelihood for
               Bayesian $M_{all}$       Bayes Network Model
PIWA           160.9                    179.4
100m Scale     25699.5                  24478
1000m Scale    379.4                    379.4
Total          26239.8                  25036.8
BIC total      26354 (13)               25062 (11)

Option 4: Graphical Model
• Advantages:
  – considers ecological system holistically
  – can eliminate multi-collinearity
  – biologically meaningful
• Disadvantages:
  – model selection
  – implementation issues

Acknowledgements
Don Stevens, OSU
Jerry Niemi, N.R.R.I., Univ. of Minn., Duluth
JoAnn Hanowski, N.R.R.I., Univ. of Minn., Duluth