Random Effects Graphical Models and the Analysis of Compositional Data
Devin S. Johnson and Jennifer A. Hoeting
STARMAP, Department of Statistics, Colorado State University
Developed under the EPA STAR Research Assistance Agreement CR-829095

Motivating Problem
• Various stream sites in the Mid-Atlantic region of the United States were visited in Summer 1994.
  – For each site, each observed fish species was cross-categorized according to several traits.
  – Environmental variables were also measured at each site (e.g. precipitation, chloride concentration, …).
• Relative proportions are more informative (species composition).
• How can we examine complex relationships between and within the covariates and response traits?

Graphical Models (Chain Graphs)
• Graphical models (e.g. log-linear models, lattice spatial models) have been explored for examining conditional dependencies within multivariate random variables.
[Figure: chain graph on the nodes $X_1$, $X_2$, $Y_1$, $Y_2$]
Model: $f(y_1, y_2 \mid x_1, x_2)\, f(x_1)\, f(x_2)$

Probability Model for Individuals
• Response variables
  – Set $F$ of discrete categorical variables
  – Notation: $y$ is a specific cell
• Explanatory variables
  – Set $G$ of explanatory variables (covariates)
  – Notation: $x$ refers to a specific explanatory observation
• Random effects
  – Allow flexibility when sampling many "sites"
  – Act as unobserved covariates
  – Notation: $\varepsilon_f$, $f \subseteq F$, refers to a random effect

Probability Model and Extended Chain Graph, $G_\varepsilon$
• Joint distribution: $f(y, x, \varepsilon) = f(y \mid x, \varepsilon)\, f(x)\, f(\varepsilon)$
• Graph illustrating possible dependence relationships for the full model, $G_\varepsilon$.
[Figure: extended chain graph on the nodes $X_1$, $X_2$, $Y_1$, $Y_2$, $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_{\{1,2\}}$]

Graphical Models for Discrete Compositions
• Sampling many individuals at a site results in cell counts, $C(y)_i$ = number of individuals in cell $y$ at site $i = 1, \ldots, S$.
• Conditional count likelihood: $[C(y)_i]_y \mid x_i, \varepsilon_i \sim \mathrm{multinomial}\big(N_i;\ [f(y \mid x_i, \varepsilon_i)]_y\big)$
• Joint covariate-count likelihood: $\mathrm{multinomial}\big(N_i;\ [f(y \mid x_i, \varepsilon_i)]_y\big) \times \mathrm{MVN}(\mu, \Psi^{-1})$
• Parameter estimation: Gibbs sampler with hierarchical centering.
  – Easier to implement
  – Improved convergence

Fish Species Richness in the Mid-Atlantic Highlands
• 91 stream sites in the Mid-Atlantic region of the United States were visited in an EPA EMAP study.
• Response composition: observed fish species were cross-categorized according to 2 discrete variables:
  1. Habit
     • Column species
     • Benthic species
  2. Pollution tolerance
     • Intolerant
     • Intermediate
     • Tolerant

Fish Species Richness in the Mid-Atlantic Highlands (continued)
• Environmental covariate values were measured at each site:
  1. Mean watershed precipitation (m)
  2. Minimum watershed elevation (m)
  3. Turbidity (ln NTU)
  4. Chloride concentration (ln meq/L)
  5. Sulfate concentration (ln meq/L)
  6. Watershed area (ln km²)

Fish Species Richness Model
• Composition graphical model:
  $f(y \mid x_i, \varepsilon_i) \propto \exp\Big\{ \sum_{f \subseteq F} \Big[ \beta_{f0}(y) + \sum_{\gamma=1}^{6} \beta_{f\gamma}(y)\, x_{i\gamma} + \varepsilon_{fi}(y) \Big] \Big\}$,
  where the normalizing term depends only on $(x_i, \varepsilon_i)$,
  $\varepsilon_{fi} \sim \mathrm{MVN}(0, T_f^{-1})$ and $x_i \sim \mathrm{MVN}(\mu, \Psi^{-1})$
• Prior distributions:
  $\beta_{f\gamma}(y) \overset{iid}{\sim} N(0, \sigma^2)$; $\gamma = 0, \ldots, 6$, $f \subseteq F$
  $T_f \sim \mathrm{Wish}(\nu, R_f)$
  $\Psi \sim \mathrm{Wish}(6, R)$

Fish Species Functional Groups
Posterior suggested chain graph for the independence model (lower DIC than the dependent response model)
[Figure: chain graph on the nodes Precipitation, Elevation, Area, Turbidity, Sulfate, Chloride, Habit, Tolerance]
Edge exclusion was determined from 95% HPD intervals for the $\beta$ parameters and the off-diagonal elements of $\Psi$.

Comments and Conclusions
• Using the proposed state-space model for discrete compositional data,
  – Relationships are evaluated as a Markov random field
  – Multi-way compositions can be analyzed with a specified dependence structure between cells
  – MVN random effects imply that the cell probabilities have a constrained logistic-normal (LN) distribution
• DR models also extend the capabilities of graphical models.
  – Data from many sites can be analyzed.
  – Overdispersion in cell counts can be incorporated.

The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA.
The views expressed here are solely those of the presenter and STARMAP, the program he represents. EPA does not endorse any products or commercial services mentioned in this presentation.
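
Illustrative sketch (not part of the original slides). The count model on the "Graphical Models for Discrete Compositions" slide can be simulated forward in a few lines, which may help make the hierarchy concrete: covariates are drawn from an MVN, random effects from an MVN with precision $T_f$, cell probabilities come from the log-linear predictor, and counts are multinomial. The sketch below assumes the independence form (one term per response variable, no Habit-by-Tolerance interaction) for a 2 x 3 table. The function names, every parameter value, and the fixed site total of 50 individuals are hypothetical choices made for illustration. It simulates data only; it is not the Gibbs sampler used for estimation.

```python
import numpy as np


def cell_probabilities(x, beta0, beta, eps):
    """Cell probabilities f(y | x, eps) for a 2 x 3 (Habit x Tolerance) table.

    Independence form: the log-linear predictor is a sum of one term per
    response variable, with no Habit-by-Tolerance interaction term.
    """
    # One term per response variable: intercept + covariate effects + random effect
    habit = beta0["habit"] + beta["habit"] @ x + eps["habit"]                  # shape (2,)
    tolerance = beta0["tolerance"] + beta["tolerance"] @ x + eps["tolerance"]  # shape (3,)
    eta = habit[:, None] + tolerance[None, :]                                  # shape (2, 3)
    # Normalize over all cells; subtracting the max keeps exp() stable
    w = np.exp(eta - eta.max())
    return w / w.sum()


def simulate_counts(n_sites, n_fish, rng):
    """Forward-simulate cell counts for n_sites sites (all values hypothetical)."""
    n_cov = 6
    mu = np.zeros(n_cov)                 # covariate mean
    psi = np.eye(n_cov)                  # covariate precision matrix Psi
    t_habit = 4.0 * np.eye(2)            # random-effect precision for Habit
    t_tol = 4.0 * np.eye(3)              # random-effect precision for Tolerance
    beta0 = {"habit": np.zeros(2), "tolerance": np.zeros(3)}
    beta = {
        "habit": rng.normal(0.0, 0.5, size=(2, n_cov)),
        "tolerance": rng.normal(0.0, 0.5, size=(3, n_cov)),
    }
    counts = np.empty((n_sites, 2, 3), dtype=int)
    for i in range(n_sites):
        # x_i ~ MVN(mu, Psi^{-1})
        x = rng.multivariate_normal(mu, np.linalg.inv(psi))
        # eps_fi ~ MVN(0, T_f^{-1}) for each response variable f
        eps = {
            "habit": rng.multivariate_normal(np.zeros(2), np.linalg.inv(t_habit)),
            "tolerance": rng.multivariate_normal(np.zeros(3), np.linalg.inv(t_tol)),
        }
        p = cell_probabilities(x, beta0, beta, eps)
        # [C(y)_i]_y | x_i, eps_i ~ multinomial(N_i; [f(y | x_i, eps_i)]_y)
        counts[i] = rng.multinomial(n_fish, p.ravel()).reshape(2, 3)
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = simulate_counts(n_sites=91, n_fish=50, rng=rng)
    # Observed composition (relative proportions) at the first site
    print(counts[0] / counts[0].sum())
```

Because the random effects enter the linear predictor, the site-level cell probabilities vary more than the covariates alone would imply, which is how this kind of model allows for overdispersion in the counts.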