Random Effects Graphical Models and the Analysis of Compositional Data STARMAP

Devin S. Johnson and Jennifer A. Hoeting
STARMAP
Department of Statistics
Colorado State University
Developed under the EPA STAR Research Assistance Agreement
CR-829095
Motivating Problem
• Various stream sites in the Mid-Atlantic region of the
United States were visited in Summer 1994.
– For each site, each observed fish species was cross-categorized according to several traits.
– Environmental variables were also measured at each
site (e.g. precipitation, chloride concentration, …).
• Relative proportions are more informative (species
composition).
• How can we examine complex relationships between
and within the covariates and response traits?
Graphical Models (Chain Graphs)
• Graphical models (e.g. log-linear models, lattice spatial
models) have been explored for examination of
conditional dependencies within multivariate random
variables.
[Figure: chain graph on the nodes X1, X2, Y1, Y2]
Model:
f(y1, y2 | x1, x2) × f(x1) × f(x2)
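As a concrete illustration of the factorization above, the following Python sketch builds a joint distribution over four binary variables from one conditional table and two covariate marginals. All numbers and names (`f_x1`, `f_x2`, `f_y_given_x`, `marginal_x`) are hypothetical and chosen only for illustration; the slides do not specify them.

```python
from itertools import product

# Hypothetical marginals for two binary covariates (illustrative values only).
f_x1 = {0: 0.6, 1: 0.4}
f_x2 = {0: 0.3, 1: 0.7}

def f_y_given_x(y1, y2, x1, x2):
    """Hypothetical conditional table f(y1, y2 | x1, x2).

    Y1 tends to match X1; Y2 tends to match (X2 xor Y1), so the two
    responses are dependent given the covariates.
    """
    p_y1 = 0.8 if y1 == x1 else 0.2
    p_y2 = 0.7 if y2 == (x2 ^ y1) else 0.3
    return p_y1 * p_y2

# Joint distribution assembled from the three factors of the model.
joint = {
    (y1, y2, x1, x2): f_y_given_x(y1, y2, x1, x2) * f_x1[x1] * f_x2[x2]
    for y1, y2, x1, x2 in product((0, 1), repeat=4)
}

total = sum(joint.values())

def marginal_x(x1, x2):
    """Marginal probability of (X1, X2), obtained by summing out Y1 and Y2."""
    return sum(p for (_, _, a, b), p in joint.items() if (a, b) == (x1, x2))
```

Because the covariate part of the model is the product f(x1) × f(x2), summing out the responses recovers independent covariate marginals, e.g. `marginal_x(1, 0)` equals `f_x1[1] * f_x2[0]` up to rounding.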
Probability Model for Individuals
• Response variables
– Set F of discrete categorical variables
– Notation: y is a specific cell
• Explanatory variables
– Set G of explanatory variables (covariates)
– Notation: x refers to a specific explanatory
observation
• Random effects
– Allows flexibility when sampling many “sites”
– Unobserved covariates
– Notation: ε_f, f ⊆ F, refers to a random effect.
Probability Model and Extended Chain Graph, Ge
• Joint distribution
f(y, x, ε) = f(y | x, ε) × f(x) × f(ε)
• Graph illustrating possible dependence relationships for
the full model, Ge.
[Figure: extended chain graph with covariate nodes X1, X2, response nodes Y1, Y2, and random-effect nodes ε1, ε2, ε{1,2}]
Graphical Models for Discrete Compositions
• Sampling many individuals at a site results in cell counts,
C(y)i = # individuals in cell y at site i = 1,…,S.
• Conditional count likelihood
[C(y)i]y | xi ~ multinomial(Ni; [f(y | xi, εi)]y)
• Joint covariate count likelihood
multinomial(Ni; [f(y | xi, εi)]y) × MVN(μ, Ψ⁻¹)
• Parameter estimation:
Gibbs sampler with hierarchical centering.
– Easier to implement
– Improved convergence
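To make the count likelihood above more concrete, here is a minimal Python sketch that simulates cell counts for one site. It does not implement the Gibbs sampler. It only draws data from a simplified version of the hierarchy, written in centered form: each cell score is drawn directly around its covariate-dependent mean rather than as a mean plus a zero-mean effect. The single covariate, the independent normal effects per cell, and every name and value (`simulate_site`, `intercepts`, `slopes`, `effect_sd`) are assumptions made for illustration.

```python
import math
import random

def softmax(scores):
    """Map real-valued cell scores to probabilities that sum to one."""
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def draw_multinomial(rng, n, probs):
    """Draw multinomial cell counts by classifying n individuals one at a time."""
    counts = [0] * len(probs)
    for cell in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[cell] += 1
    return counts

def simulate_site(rng, n_individuals, intercepts, slopes, x, effect_sd):
    """Simulate one site with a single covariate x (an assumption for brevity).

    Centered form: each cell score eta is drawn around its mean
    (intercept + slope * x) instead of adding a zero-mean effect afterwards.
    """
    etas = [rng.gauss(a + b * x, effect_sd) for a, b in zip(intercepts, slopes)]
    probs = softmax(etas)
    return probs, draw_multinomial(rng, n_individuals, probs)

# Hypothetical three-cell example.
rng = random.Random(0)
probs, counts = simulate_site(
    rng, n_individuals=200,
    intercepts=[0.0, 0.5, -0.5], slopes=[1.0, 0.0, -1.0],
    x=0.3, effect_sd=0.5,
)
```

The returned `counts` always sum to the number of individuals sampled at the site, and `probs` is the site-specific cell probability vector implied by the drawn scores.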
Fish Species Richness in the Mid-Atlantic Highlands
• 91 stream sites in the Mid-Atlantic region of the United
States were visited in an EPA EMAP study.
• Response composition:
Observed fish species were cross-categorized
according to 2 discrete variables:
1. Habit
   • Column species
   • Benthic species
2. Pollution tolerance
   • Intolerant
   • Intermediate
   • Tolerant
Fish Species Richness in the Mid-Atlantic Highlands
• Environmental covariates:
values were measured at each site for the following
covariates
1. Mean watershed precipitation (m)
2. Minimum watershed elevation (m)
3. Turbidity (ln NTU)
4. Chloride concentration (ln meq/L)
5. Sulfate concentration (ln meq/L)
6. Watershed area (ln km²)
Fish Species Richness Model
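• Composition Graphical Model
$$
f(y \mid x_i, \varepsilon_i) = \exp\Big\{ \alpha_F(x_i, \varepsilon_i) + \sum_{f \subseteq F} \sum_{\gamma=1}^{6} \beta_{f\gamma}(y)\, x_{i\gamma} + \sum_{f \subseteq F} \varepsilon_{fi}(y) \Big\}
$$
$$
\varepsilon_{fi} \sim \mathrm{MVN}(0, T_f^{-1}) \quad \text{and} \quad x_i \sim \mathrm{MVN}(\mu, \Psi^{-1})
$$
• Prior distributions
$$
\beta_{f\gamma}(y) \overset{iid}{\sim} N(0, \sigma^2); \quad \gamma = 0, \dots, 6, \; f \subseteq F
$$
$$
T_f \sim \mathrm{Wish}(\cdot, R_f), \qquad \Psi \sim \mathrm{Wish}(6, R)
$$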
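The following Python sketch shows how cell probabilities for the habit by tolerance table could be computed from a log-linear model of this general kind, with covariate terms and random effects attached to each response variable and a normalizing term that makes the cells sum to one. It is an illustration under stated assumptions, not the authors' code: the function name `cell_probabilities`, the use of two covariates instead of six, and all coefficient and effect values are hypothetical.

```python
import math
from itertools import product

HABIT = ("column", "benthic")
TOLERANCE = ("intolerant", "intermediate", "tolerant")

def cell_probabilities(x, beta_habit, beta_tol, eps_habit, eps_tol, interaction=None):
    """Return {(habit, tolerance): probability} for one site.

    x           : list of covariate values for the site
    beta_habit  : {habit level: list of coefficients, one per covariate}
    beta_tol    : {tolerance level: list of coefficients, one per covariate}
    eps_habit   : {habit level: random effect}
    eps_tol     : {tolerance level: random effect}
    interaction : optional {(habit, tolerance): term}; None gives the
                  independence model for the two responses
    """
    scores = {}
    for h, t in product(HABIT, TOLERANCE):
        s = sum(b * xg for b, xg in zip(beta_habit[h], x)) + eps_habit[h]
        s += sum(b * xg for b, xg in zip(beta_tol[t], x)) + eps_tol[t]
        if interaction is not None:
            s += interaction[(h, t)]
        scores[(h, t)] = s
    # Log normalizing term, computed stably with the log-sum-exp trick.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {cell: math.exp(s - log_norm) for cell, s in scores.items()}

# Hypothetical inputs: two standardized covariates at one site.
x = [0.2, -1.0]
beta_habit = {"column": [0.5, 0.0], "benthic": [0.0, 0.0]}
beta_tol = {"intolerant": [0.0, 0.3], "intermediate": [0.0, 0.0], "tolerant": [0.4, -0.2]}
eps_habit = {"column": 0.1, "benthic": -0.1}
eps_tol = {"intolerant": 0.0, "intermediate": 0.2, "tolerant": -0.2}

probs = cell_probabilities(x, beta_habit, beta_tol, eps_habit, eps_tol)
p_habit = {h: sum(probs[(h, t)] for t in TOLERANCE) for h in HABIT}
p_tol = {t: sum(probs[(h, t)] for h in HABIT) for t in TOLERANCE}
```

With no interaction term, every cell probability factors as `p_habit[h] * p_tol[t]`, which is what an independence model for the two responses means at the level of a single site.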
Fish Species Functional Groups
Posterior suggested chain graph for independence model
(lower DIC than dependent response model)
[Figure: chain graph with covariate nodes Precipitation, Elevation, Area, Turbidity, Sulfate, Chloride and response nodes Habit, Tolerance]
Edge exclusion was determined from 95% HPD intervals for the covariate-effect
parameters and the off-diagonal elements of the covariate precision matrix Ψ.
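A highest posterior density (HPD) interval can be approximated from posterior draws as the shortest interval containing the required share of the sorted sample. The sketch below does this in Python and then applies a simple edge rule. Two things are assumptions, not statements from the slides: that the posterior is roughly unimodal, and that an edge is dropped exactly when its interval contains zero. The function names and the placeholder draws are hypothetical.

```python
import math

def hpd_interval(samples, mass=0.95):
    """Shortest interval containing `mass` of the draws (unimodal assumption)."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws inside the interval
    # Slide a window of k consecutive order statistics and keep the narrowest.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]

def edge_is_supported(samples, mass=0.95):
    """Assumed rule: keep the edge only if the HPD interval excludes zero."""
    lower, upper = hpd_interval(samples, mass)
    return not (lower <= 0.0 <= upper)

# Placeholder draws standing in for MCMC output (not real results).
draws_away_from_zero = [1.0 + 0.01 * i for i in range(200)]
draws_around_zero = [-1.0 + 0.01 * i for i in range(200)]

keep_first = edge_is_supported(draws_away_from_zero)
keep_second = edge_is_supported(draws_around_zero)
```

In this toy input the first parameter's interval lies entirely above zero, so its edge would be kept, while the second interval straddles zero, so its edge would be excluded.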
Comments and Conclusions
• Using the proposed state-space model for discrete
compositional data,
– Relationships evaluated as a Markov random field
– Multi-way compositions can be analyzed with
specified dependence structure between cells
– MVN random effects imply that the cell probabilities
have a constrained LN distribution
• DR models also extend the capabilities of graphical
models.
– Data from many sites can be analyzed.
– Overdispersion in cell counts can be incorporated.
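The point about overdispersion can be checked with a small simulation. The Python sketch below compares the between-site variance of a cell count with and without a normal random effect on the logit scale, using a two-cell composition as the simplest special case. The site count, sample size, effect standard deviation, and seed are arbitrary illustrative choices, and `simulate_counts` is a hypothetical helper.

```python
import math
import random
from statistics import pvariance

def simulate_counts(rng, n_sites, n_individuals, logit_mean, effect_sd):
    """Count of individuals in cell 1 at each site for a two-cell composition.

    Each site's logit is drawn from a normal distribution; effect_sd = 0
    gives the same cell probability at every site (no random effect).
    """
    counts = []
    for _ in range(n_sites):
        eta = rng.gauss(logit_mean, effect_sd)
        p = 1.0 / (1.0 + math.exp(-eta))
        counts.append(sum(rng.random() < p for _ in range(n_individuals)))
    return counts

rng = random.Random(1994)
fixed = simulate_counts(rng, n_sites=1000, n_individuals=50, logit_mean=0.0, effect_sd=0.0)
mixed = simulate_counts(rng, n_sites=1000, n_individuals=50, logit_mean=0.0, effect_sd=1.0)

var_fixed = pvariance(fixed)  # close to the binomial variance 50 * 0.5 * 0.5
var_mixed = pvariance(mixed)  # inflated by between-site variation in p
```

Without the random effect the variance stays near the binomial value, while the site-level effect inflates it substantially. This is the sense in which random effects let the model absorb overdispersion.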
The work reported here was developed under the STAR
Research Assistance Agreement CR-829095 awarded by
the U.S. Environmental Protection Agency (EPA) to
Colorado State University. This presentation has not been
formally reviewed by EPA. The views expressed here are
solely those of the presenter and STARMAP, the Program
he represents. EPA does not endorse any products or
commercial services mentioned in this presentation.