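• Priors and hyperpriors, for example:
$$\beta_j \mid \alpha, \Sigma_\beta \sim N(1\alpha, \Sigma_\beta)$$
$$p(\alpha, \Sigma_\beta) \propto 1$$
$$\sigma_j^2 \mid a, b \sim \text{Inv-}\chi^2(a, b)$$
• Implied model is
$$y_j \mid \alpha, \sigma_j^2, \Sigma_\beta \sim N(X_j 1\alpha, \; \sigma_j^2 I + X_j \Sigma_\beta X_j^T)$$
• Note: the hierarchy induces a correlation among the observations within experiment j, through the term $X_j \Sigma_\beta X_j^T$.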
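The covariance of the implied model can be checked numerically. The following is a minimal sketch, assuming small made-up inputs; the helper names `matmul`, `transpose` and `implied_cov` and the values of `X_j`, `Sigma_beta` and `sigma2_j` are illustrative and not taken from the slides.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def implied_cov(X_j, Sigma_beta, sigma2_j):
    """Return sigma2_j * I + X_j Sigma_beta X_j^T, the marginal covariance of y_j."""
    V = matmul(matmul(X_j, Sigma_beta), transpose(X_j))
    for i in range(len(V)):
        V[i][i] += sigma2_j
    return V


# Hypothetical design for one experiment: intercept plus one covariate.
X_j = [[1.0, 0.0], [1.0, 1.0]]
Sigma_beta = [[1.0, 0.0], [0.0, 1.0]]
V = implied_cov(X_j, Sigma_beta, 0.5)
print(V)  # the off-diagonal entries are nonzero
```

Even though the sampling model has independent errors, the off-diagonal entries of `V` are nonzero once the coefficients are integrated out, which is the correlation the note refers to.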
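View as J regression experiments
• Model for jth experiment is
$$y_j \mid \beta_j, \sigma_j^2 \sim N(X_j \beta_j, \sigma_j^2 I)$$
with $y_j = (y_{1j}, y_{2j}, \ldots, y_{n_j j})$.
• Putting all regression models together into a single model:
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_J \end{bmatrix}, \quad
X = \begin{bmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & \cdots & X_J \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_J \end{bmatrix}$$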
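Stacking the J experiments with a block-diagonal design matrix can be illustrated with a small example. This is only a sketch: `block_diag` and `matvec` are helper names introduced here, and the two tiny design matrices are made up.

```python
def block_diag(blocks):
    """Place matrices (lists of rows) along the diagonal, zeros elsewhere."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            right = total_cols - offset - width
            out.append([0.0] * offset + list(row) + [0.0] * right)
        offset += width
    return out


def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


# Two hypothetical experiments with 2 and 1 observations.
X1 = [[1.0, 2.0], [1.0, 3.0]]
X2 = [[1.0, 4.0]]
beta1, beta2 = [1.0, 1.0], [2.0, 0.5]

X = block_diag([X1, X2])
stacked = matvec(X, beta1 + beta2)                # mean of the single stacked model
separate = matvec(X1, beta1) + matvec(X2, beta2)  # means of the J separate models
print(stacked, separate)
```

The stacked mean `X @ beta` reproduces the per-experiment means `X_j @ beta_j`, so the single model is just a bookkeeping device for the J regressions.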
Milk production of cows - Mixed model
• Data on milk production from n cows.
• $n_j$ cows are daughters of bull j. There are J sires in the dataset. Sires “group” the cows into J different “genetic” groups.
• Other covariates are herd and age of cow.
• Exchangeability: given sire, age and herd, cows are exchangeable.
• In classical statistics, herd and age are “fixed” effects, and sire is a random effect. For us, all are random, but we allocate flat priors to “fixed” parameters.
• Cows that are sired by the same bull are more similar than those sired by different bulls: intraclass correlation induced by models with random effects.
$$y_{ij} = x_i \beta + s_j + e_{ij}$$
$$y_{ij} \sim N(x_i \beta, \; \sigma_s^2 + \sigma^2)$$
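• Model: $y \sim N(\alpha, \Sigma_y)$
• Let $\mathrm{var}(y_{ij}) = \eta^2$, and let
$$\mathrm{cov}(y_{ij}, y_{kj}) = \rho \eta^2, \text{ for same group}$$
$$\mathrm{cov}(y_{ij}, y_{kl}) = 0, \text{ for different groups}$$
• For $\rho \geq 0$, now consider model $y \sim N(X\beta, \sigma^2 I)$, with X an $n \times J$ matrix of group indicators.
• If $\beta \sim N(\alpha, \sigma_\beta^2 I)$ and if we let $\eta^2 = \sigma^2 + \sigma_\beta^2$, then $\rho = \sigma_\beta^2 / (\sigma^2 + \sigma_\beta^2)$ and the two model formulations are equivalent.
• To see that models are equivalent, do $p(y) = \int p(y, \beta) \, d\beta$.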
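The equivalence of the random effects formulation and the correlated-observations formulation can be worked out without evaluating the integral directly. The derivation below is a sketch under the assumption that $\beta \sim N(\alpha 1_J, \sigma_\beta^2 I)$; the symbols $u$, $e$, $1_J$ and $1_n$ are notation introduced here for the derivation.

$$
\begin{aligned}
y &= X\beta + e, \quad e \sim N(0, \sigma^2 I), \qquad \beta = \alpha 1_J + u, \quad u \sim N(0, \sigma_\beta^2 I) \\
y &= \alpha X 1_J + X u + e = \alpha 1_n + X u + e \\
E(y) &= \alpha 1_n, \qquad \mathrm{var}(y) = \sigma^2 I + \sigma_\beta^2 X X^T \\
(X X^T)_{(ij),(kl)} &= \begin{cases} 1 & \text{if } j = l \text{ (same group)} \\ 0 & \text{otherwise} \end{cases}
\end{aligned}
$$

Each row of X has a single 1, so $X 1_J = 1_n$. Reading off the entries of $\mathrm{var}(y)$ gives $\mathrm{var}(y_{ij}) = \sigma^2 + \sigma_\beta^2 = \eta^2$, $\mathrm{cov}(y_{ij}, y_{kj}) = \sigma_\beta^2 = \rho \eta^2$ within a group, and zero covariance across groups.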
as above.
• Suppose that observations come from J groups or clusters so that $y = (y_1, y_2, \ldots, y_J)$, and $y_j = (y_{1j}, y_{2j}, \ldots, y_{n_j j})$.
Example: growth curves in rats
• From Gelfand et al., 1990, JASA.
• CIBA-GEIGY measured the growth of 30 rats weekly, for five weeks. Interest is in growth curve.
• Assume linear growth (rats are young) and let:
$y_{ij}$: weight of ith rat in jth week, $i = 1, \ldots, 30$, $j = 1, \ldots, 5$
$x_i = (8, 15, 22, 29, 36)$ days
$$y_{ij} = \alpha_i + \beta_i (x_{ij} - \bar{x}_i) + e_{ij}$$
• Each rat gets her “own” curve if $\alpha_i$ and $\beta_i$ are random effects parameters.
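Centering the covariate at $\bar{x}_i = 22$ days makes $\alpha_i$ the expected weight at day 22 and decouples the least-squares intercept from the slope. The sketch below shows this for a single rat. It is a per-rat least-squares fit, not the hierarchical fit, and the `weights` values are hypothetical numbers chosen for illustration, not the Gelfand et al. data.

```python
days = [8, 15, 22, 29, 36]
x_bar = sum(days) / len(days)          # mean measurement day
centered = [x - x_bar for x in days]   # x_ij - x_bar_i


def fit_centered_line(y, xc):
    """Least squares for y = a + b * xc when xc sums to zero."""
    a = sum(y) / len(y)  # intercept is the mean response
    b = sum(x * v for x, v in zip(xc, y)) / sum(x * x for x in xc)
    return a, b


# Hypothetical weights (grams) for one rat; NOT real data.
weights = [150.0, 200.0, 250.0, 300.0, 350.0]
a_hat, b_hat = fit_centered_line(weights, centered)
print(x_bar, a_hat, b_hat)
```

In the hierarchical model the 30 pairs $(\alpha_i, \beta_i)$ are not estimated separately like this; they are shrunk toward their population means.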
Hierarchical Linear Models
• Much of this material already seen in Chapters 5 and
• Hierarchical linear models combine the regression framework with the hierarchical framework
• Unified approach to:
– random effects models
– mixed models
• Set-up:
– Sampling model:
$$y \mid \beta, \Sigma_y \sim N(X\beta, \Sigma_y)$$
Often $\Sigma_y = \sigma^2 I$
– Prior dist. for J regression coefficients
$$\beta \mid \alpha, \Sigma_\beta \sim N(X_\beta \alpha, \Sigma_\beta)$$
Typically, $X_\beta = 1$, $\Sigma_\beta = \sigma_\beta^2 I$
• Random effects introduce correlations
Intra-class correlation
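Cows (cont’d)
• Mixed model:
$$y_{ij} = x_i \beta + s_j + e_{ij}$$
with $s_j \sim N(0, \sigma_s^2)$, $e_{ij} \sim N(0, \sigma^2)$ and $(s, e)$ independent. $x_i = (\text{herd}, \text{age})$ are herd and age effects, and $(\beta, \sigma_s^2, \sigma^2)$ are unknown.
• Likelihood:
$$y_{ij} \sim N(x_i \beta, \; \sigma_s^2 + \sigma^2)$$
• In matrix form:
$$y \sim N(X\beta, \; \sigma^2 I + \sigma_s^2 Z Z^T)$$
with $X: n \times p$, $Z: n \times q$.
• Intra-class correlation: correlation between milk production of cows sired by same bull:
$$\rho = \frac{\sigma_s^2}{\sigma_s^2 + \sigma^2}$$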
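The intra-class correlation can be read directly off the marginal covariance $\sigma^2 I + \sigma_s^2 Z Z^T$. The sketch below assumes Z is the matrix of sire indicators, so that $(ZZ^T)_{ik} = 1$ exactly when cows i and k share a sire; the function names and the variance values are illustrative only.

```python
def icc(sigma_s2, sigma2):
    """Intra-class correlation rho = sigma_s^2 / (sigma_s^2 + sigma^2)."""
    return sigma_s2 / (sigma_s2 + sigma2)


def sire_cov(sire, sigma_s2, sigma2):
    """Marginal covariance sigma^2 I + sigma_s^2 Z Z^T, built from sire labels."""
    n = len(sire)
    return [
        [(sigma2 if i == k else 0.0) + (sigma_s2 if sire[i] == sire[k] else 0.0)
         for k in range(n)]
        for i in range(n)
    ]


# Three hypothetical cows: the first two share a sire.
sire = [0, 0, 1]
V = sire_cov(sire, sigma_s2=1.0, sigma2=3.0)
rho_same = V[0][1] / (V[0][0] * V[1][1]) ** 0.5  # correlation, same sire
rho_diff = V[0][2] / (V[0][0] * V[2][2]) ** 0.5  # correlation, different sires
print(rho_same, rho_diff, icc(1.0, 3.0))
```

Daughters of the same bull have correlation equal to `icc(sigma_s2, sigma2)`, while daughters of different bulls are uncorrelated given the fixed effects.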
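Rats (cont’d)
• Likelihood:
$$y_{ij} \sim N(\mu_{ij}, \sigma^2)$$
with $\mu_{ij} = \alpha_i + \beta_i (x_{ij} - \bar{x}_i)$.
• Population distributions:
$$\alpha_i \sim N(\alpha_0, \sigma_\alpha^2)$$
$$\beta_i \sim N(\beta_0, \sigma_\beta^2)$$
• Priors:
$$\sigma^2 \sim \text{Inv-}\chi^2(\nu, \sigma_0^2)$$
$$\alpha_0, \beta_0 \sim N(0.01, 10000)$$
$$\sigma_\alpha^2, \sigma_\beta^2 \sim \text{Inv-}\chi^2$$
• Priors for $\sigma_\alpha^2$, $\sigma_\beta^2$ can be made as non-informative as possible by choosing a very small degrees of freedom parameter. Same for the prior for $\sigma^2$ if desired.
• A more reasonable formulation is to model $\alpha_i$, $\beta_i$ as dependent in the population distribution.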
Hierarchical Linear Models
• Hyperprior on K parameters $\alpha$:
$$\alpha \mid \alpha_0, \Sigma_\alpha \sim N(\alpha_0, \Sigma_\alpha)$$
with $\alpha_0$, $\Sigma_\alpha$ known, often $p(\alpha) \propto 1$.
• Also need priors for $\Sigma_y$, $\Sigma_\beta$.
• Finally, might need to set priors for the hyperparameters in the priors for $\Sigma_y$, $\Sigma_\beta$. Typically assume known and fixed.
Intra-class correlation
• Positive intra-class correlations can be accommodated with a random effects model where class membership is reflected by indicators whose regression coefficients have the population distribution
$$\beta \sim N(1\alpha, \sigma_\beta^2 I)$$
• This is general formulation for several more general
• Mixed effects models:
$$p(\beta_1, \ldots, \beta_{J_1}) \propto 1 \;\rightarrow\; \text{“fixed” effects}$$
$$p(\beta_{J_1+1}, \ldots, \beta_J) \propto N(1\alpha, \sigma_\beta^2 I) \;\rightarrow\; \text{random effects}$$
Mixed effects models
• A more general version of the mixed model has different random effects that generate different sets of intra-class correlations:
$$p(\beta_i) \propto 1, \quad i = 1, \ldots, I$$
$$\beta_{j_1} \mid \alpha_1, \sigma_1^2 \sim N(1\alpha_1, \sigma_1^2 I), \quad j_1 = 1, \ldots, J_1$$
$$\beta_{j_k} \mid \alpha_k, \sigma_k^2 \sim N(1\alpha_k, \sigma_k^2 I), \quad j_k = 1, \ldots, J_k$$
• The J components of $\beta$ are divided into K clusters
• Exchangeability at the level of the observations is achieved by conditioning on the indicators that define the clusters or groups
• Hierarchical linear models have nice structure for computation
• With conjugate prior, recall that:
– Observations are N
– Regression parameters are N
– Variance components (or variance matrices) are Inv-$\chi^2$ (or Wishart).
• All conditional distributions are of standard form:
– For location parameters (regression coefficients, means of priors and hyperpriors), conditionals are normal
– For scale parameters, conditionals are also Inv-$\chi^2$, even if prior is improper.
• For one example, go back to earlier lecture on Gibbs sampling and example therein.
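Because every full conditional is normal or scaled Inv-$\chi^2$, a Gibbs sampler for the simplest random effects model fits in a few lines. The following is a minimal sketch, not the sampler from the earlier lecture. It assumes the one-way model $y_{ij} \sim N(\beta_j, \sigma^2)$, $\beta_j \sim N(\alpha, \sigma_\beta^2)$, $p(\alpha) \propto 1$, with proper Inv-$\chi^2$ priors on both variances. The function names, the prior settings (`nu0`, `s02`, `nub`, `sb2`) and the simulated data are all assumptions made for illustration.

```python
import random


def scaled_inv_chi2(rng, nu, s2):
    """Draw from scaled Inv-chi^2(nu, s2) as nu * s2 / chi2_nu."""
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu degrees of freedom
    return nu * s2 / chi2


def gibbs(groups, n_iter=500, nu0=1.0, s02=1.0, nub=1.0, sb2=1.0, seed=0):
    """Gibbs sampler for a one-way normal random effects model."""
    rng = random.Random(seed)
    J = len(groups)
    n = sum(len(g) for g in groups)
    beta = [sum(g) / len(g) for g in groups]  # start at the group means
    alpha = sum(beta) / J
    sigma2, sigma_b2 = 1.0, 1.0
    draws = []
    for _ in range(n_iter):
        # beta_j | rest: normal with precision-weighted mean
        for j, g in enumerate(groups):
            prec = len(g) / sigma2 + 1.0 / sigma_b2
            mean = (sum(g) / sigma2 + alpha / sigma_b2) / prec
            beta[j] = rng.gauss(mean, prec ** -0.5)
        # alpha | rest: normal (flat prior on alpha)
        alpha = rng.gauss(sum(beta) / J, (sigma_b2 / J) ** 0.5)
        # sigma^2 | rest: scaled Inv-chi^2
        sse = sum((y - beta[j]) ** 2 for j, g in enumerate(groups) for y in g)
        sigma2 = scaled_inv_chi2(rng, nu0 + n, (nu0 * s02 + sse) / (nu0 + n))
        # sigma_beta^2 | rest: scaled Inv-chi^2
        ssb = sum((b - alpha) ** 2 for b in beta)
        sigma_b2 = scaled_inv_chi2(rng, nub + J, (nub * sb2 + ssb) / (nub + J))
        draws.append((alpha, sigma2, sigma_b2))
    return draws


# Simulated (not real) data: 4 groups of 5 observations each.
data_rng = random.Random(1)
groups = [[data_rng.gauss(mu, 1.0) for _ in range(5)] for mu in (-2.0, 0.0, 1.0, 3.0)]
draws = gibbs(groups, n_iter=500)
rho_draws = [sb2 / (sb2 + s2) for _, s2, sb2 in draws]  # intra-class correlation draws
```

Each stored draw also yields a draw of the intra-class correlation $\rho = \sigma_\beta^2 / (\sigma^2 + \sigma_\beta^2)$, so its posterior comes for free from the sampler output.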