Hierarchical Linear Models

• Much of this material was already seen in Chapters 5 and 14.
• Hierarchical linear models combine the regression framework with the hierarchical framework.
• Unified approach to:
  – random effects models
  – mixed models
• Set-up:
  – Sampling model: y|β, Σy ∼ N(Xβ, Σy). Often Σy = σ²I.
  – Prior distribution for the J regression coefficients: β|α, Σβ ∼ N(Xβ α, Σβ). Typically Xβ = 1, Σβ = σβ² I.

Hierarchical Linear Models

• Hyperprior on the K parameters α: α|α0, Σα ∼ N(α0, Σα), with α0, Σα known; often p(α) ∝ 1.
• Also need priors for Σy, Σβ.
• Finally, we might need to set priors for the hyperparameters in the priors for Σy, Σβ. These are typically assumed known and fixed.

Example: growth curves in rats

• From Gelfand et al., 1990, JASA.
• CIBA-GEIGY measured the growth of 30 rats weekly, for five weeks. Interest is in the growth curve.
• Assume linear growth (rats are young) and let:

  yij : weight of the ith rat in the jth week, i = 1, ..., 30, j = 1, ..., 5
  xi = (8, 15, 22, 29, 36) days
  yij = αi + βi(xij − x̄i) + eij

• Each rat gets her "own" curve if αi and βi are random effects parameters.

Rats (cont'd)

• Likelihood: yij ∼ N(µij, σ²), with µij = αi + βi(xij − x̄i).
• Population distributions:

  αi ∼ N(α0, σα²)
  βi ∼ N(β0, σβ²)

• Priors:

  σ² ∼ Inv-χ²(ν, σ0²)
  α0, β0 ∼ N(0.01, 10000)
  σα², σβ² ∼ Inv-χ²

• Priors for σα², σβ² can be made as non-informative as possible by using a very small degrees-of-freedom parameter. The same goes for the prior for σ², if desired.
• A more reasonable formulation is to model αi, βi as dependent in the population distribution.

View as J regression experiments

• Suppose that observations come from J groups or clusters, so that y = (y1, y2, ..., yJ), with yj = (y1j, y2j, ..., ynj j).
• Model for the jth experiment: yj |βj, σj² ∼ N(Xj βj, σj² I).
• Putting all the regression models together into a single model:

  y = ⎡ y1 ⎤ ,   X = ⎡ X1  0  ···  0  ⎤ ,   β = ⎡ β1 ⎤
      ⎢ y2 ⎥         ⎢ 0   X2 ···  0  ⎥         ⎢ β2 ⎥
      ⎢ ⋮  ⎥         ⎢ ⋮   ⋮   ⋱   ⋮  ⎥         ⎢ ⋮  ⎥
      ⎣ yJ ⎦         ⎣ 0   0  ···  XJ ⎦         ⎣ βJ ⎦

• Priors and hyperpriors, for example:

  σj² | a, b ∼ Inv-χ²(a, b)
  βj | α, Σβ ∼ N(1α, Σβ)
  p(α, Σβ) ∝ 1

• Implied model: yj | α, σj², Σβ ∼ N(Xj α, σj² I + Xj Σβ Xj′).
• Note: the hierarchy induces a correlation.

Milk production of cows - Mixed model example

• Data on milk production from n cows.
• nj cows are daughters of bull j. There are J sires in the dataset. Sires "group" the cows into J different "genetic" groups.
• Other covariates are the herd and the age of the cow.
• Exchangeability: given sire, age and herd, cows are exchangeable.
• In classical statistics, herd and age are "fixed" effects and sire is a random effect. For us, all are random, but we allocate flat priors to the "fixed" parameters.
• Cows sired by the same bull are more similar than those sired by different bulls: an intraclass correlation induced by models with random effects.

Cows (cont'd)

• Mixed model:

  yij = xi β + sj + eij

  with sj ∼ N(0, σs²), eij ∼ N(0, σ²), and (s, e) independent. xi = (herd, age) are the herd and age effects, and (β, σs², σ²) are unknown.
• Likelihood: yij ∼ N(xi β, σs² + σ²).
• In matrix form: y ∼ N(Xβ, σ²I + σs² ZZ′), with X: n × p and Z: n × q.

Intra-class correlation

• Random effects introduce correlations.
• Intra-class correlation: the correlation between the milk production of cows sired by the same bull:

  ρ = σs² / (σs² + σ²)

Intra-class correlation

• Model: y ∼ N(α, Σy) as above.
• Let var(yij) = η², and let

  cov(yij, ykj) = ρη², for the same group,
  cov(yij, ykl) = 0, for different groups.

• For ρ ≥ 0, now consider the model y ∼ N(Xβ, σ²I), with X an n × J matrix of group indicators.
• If β ∼ N(α, σβ² I) and if we let η² = σ² + σβ², then ρ = σβ²/(σ² + σβ²) and the two model formulations are equivalent.
• To see that the models are equivalent, do p(y) = ∫ p(y, β) dβ.

Intra-class correlation

• Positive intra-class correlations can be accommodated with a random effects model where class membership is reflected by indicators whose regression coefficients have the population distribution

  β ∼ N(1α, σβ² I)

• This is the general formulation for several more general models.

Mixed effects models

• Mixed effects models:

  p(β1, ..., βJ1) ∝ 1 → "fixed" effects
  βJ1+1, ..., βJ ∼ N(1α, σβ² I) → random effects

• Exchangeability at the level of the observations is achieved by conditioning on the indicators that define the clusters or groups.

Mixed effects models

• A more general version of the mixed model has different random effects that generate different sets of intra-class correlations:

  p(βi) ∝ 1, i = 1, ..., I
  βj1 | α1, σ1² ∼ N(1α1, σ1² I), j1 = 1, ..., J1
  ⋮
  βjK | αK, σK² ∼ N(1αK, σK² I), jK = 1, ..., JK

• The J components of β are divided into K clusters.

Computation

• Hierarchical linear models have a nice structure for computation.
• With conjugate priors, recall that:
  – observations are N,
  – regression parameters are N,
  – variance components (or variance matrices) are Inv-χ² (or Wishart).
• All conditional distributions are of standard form:
  – For location parameters (regression coefficients, means of priors and hyperpriors), the conditionals are normal.
  – For scale parameters, the conditionals are also Inv-χ², even if the prior is improper.
• For one example, go back to the earlier lecture on Gibbs sampling and the example therein.
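The equivalence step p(y) = ∫ p(y, β) dβ can be made explicit with the law of total expectation and covariance; a standard derivation, written out here in LaTeX:

```latex
% y | beta ~ N(X beta, sigma^2 I),  beta ~ N(alpha 1, sigma_beta^2 I),
% X an n x J matrix of group indicators (so X 1 = 1)
\begin{aligned}
E[y] &= E\big[E[y \mid \beta]\big] = X\,E[\beta] = \alpha X\mathbf{1} = \alpha\mathbf{1},\\
\operatorname{Cov}(y) &= E\big[\operatorname{Cov}(y \mid \beta)\big]
  + \operatorname{Cov}\big(E[y \mid \beta]\big)
  = \sigma^2 I + \sigma_\beta^2\, X X^{\top}.
\end{aligned}
```

Since (XX′)ik is 1 for observations in the same group and 0 otherwise, this gives var(yij) = σ² + σβ² = η², within-group covariance σβ², and hence ρ = σβ²/(σ² + σβ²), matching the stated equivalence.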
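The rats growth-curve model can be simulated directly from its hierarchical specification: draw each rat's (αi, βi) from the population distributions, then draw the weights. A minimal sketch; all numeric hyperparameter values below are illustrative, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameter values (assumptions, not from the slides)
alpha0, beta0 = 100.0, 6.0     # population means of intercepts and slopes
sigma_a, sigma_b = 10.0, 0.5   # population sds of alpha_i, beta_i
sigma = 3.0                    # measurement sd

I = 30
days = np.array([8.0, 15.0, 22.0, 29.0, 36.0])  # x = (8, 15, 22, 29, 36)
xbar = days.mean()

# Population distributions: alpha_i ~ N(alpha0, sigma_a^2), beta_i ~ N(beta0, sigma_b^2)
alpha = rng.normal(alpha0, sigma_a, size=I)
beta = rng.normal(beta0, sigma_b, size=I)

# Likelihood: y_ij ~ N(alpha_i + beta_i (x_j - xbar), sigma^2)
mu = alpha[:, None] + beta[:, None] * (days - xbar)  # 30 x 5 matrix of means
y = rng.normal(mu, sigma)                            # 30 x 5 simulated weights

print(y.shape)  # (30, 5)
```

Because each rat draws her own (αi, βi), every row of `y` follows its "own" line, while the shared population distribution ties the curves together.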
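The "single model" stacking of the J regression experiments, and the implied marginal covariance σj²I + Xj Σβ Xj′ after integrating out βj, can both be checked numerically. A sketch with made-up dimensions and values:

```python
import numpy as np

rng = np.random.default_rng(1)

# J = 3 experiments, each with its own design matrix (n_j x p), p = 2
J, p = 3, 2
Xs = [rng.normal(size=(n_j, p)) for n_j in (4, 5, 6)]

# Stacked design: X = block-diag(X1, ..., XJ)
n = sum(Xj.shape[0] for Xj in Xs)
X = np.zeros((n, J * p))
row = 0
for j, Xj in enumerate(Xs):
    X[row:row + Xj.shape[0], j * p:(j + 1) * p] = Xj
    row += Xj.shape[0]

# Implied marginal covariance for experiment j, beta_j integrated out:
# cov(y_j | alpha) = sigma_j^2 I + X_j Sigma_beta X_j'
sigma2_j = 1.5
Sigma_beta = np.diag([0.5, 0.2])
Xj = Xs[0]
V = sigma2_j * np.eye(Xj.shape[0]) + Xj @ Sigma_beta @ Xj.T

print(X.shape)              # (15, 6)
print(np.allclose(V, V.T))  # True: V is a valid (symmetric) covariance
```

The off-diagonal entries of `V` are generally nonzero, which is exactly the "hierarchy induces a correlation" point: observations within an experiment become correlated once βj is integrated out.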
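The same intra-class-correlation equivalence can be verified numerically: build the marginal covariance σ²I + σβ² XX′ from a group-indicator matrix and read off η² and ρ. A sketch with arbitrary values:

```python
import numpy as np

sigma2, sigma2_b = 2.0, 3.0               # sigma^2 and sigma_beta^2 (arbitrary)
groups = np.array([0, 0, 0, 1, 1, 2])     # group label of each observation
n, J = len(groups), 3
X = np.eye(J)[groups]                     # n x J matrix of group indicators

# Marginal covariance after integrating beta ~ N(alpha 1, sigma2_b I) out of
# y | beta ~ N(X beta, sigma2 I):  Cov(y) = sigma2 I + sigma2_b X X'
V = sigma2 * np.eye(n) + sigma2_b * (X @ X.T)

eta2 = sigma2 + sigma2_b                  # var(y_ij)
rho = sigma2_b / eta2                     # intra-class correlation

print(np.allclose(np.diag(V), eta2))      # True: every variance is eta^2
print(np.isclose(V[0, 1] / eta2, rho))    # True: same-group correlation is rho
print(V[0, 5] == 0.0)                     # True: different groups uncorrelated
```

With these values ρ = 3/5 = 0.6; the random-effects formulation and the direct (η², ρ) formulation describe the same marginal distribution.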
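The computational point, that with conjugate priors every full conditional is normal or Inv-χ², can be illustrated with a small Gibbs sampler. The sketch below uses a one-way model yij ∼ N(βj, σ²), βj ∼ N(α, σβ²), with p(α) ∝ 1, p(σ²) ∝ 1/σ², and σβ² held fixed for brevity; it is an illustration under these simplifying assumptions, not the exact sampler from the earlier lecture:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated one-way data: y_ij ~ N(beta_j, sigma^2), true sigma^2 = 4
J, n_per = 8, 20
true_beta = rng.normal(50.0, 5.0, size=J)
g = np.repeat(np.arange(J), n_per)           # group index of each observation
y = rng.normal(true_beta[g], 2.0)
n = len(y)

ybar = np.array([y[g == j].mean() for j in range(J)])  # group means (fixed)
sigma2_b = 25.0                              # between-group variance, held fixed
alpha, sigma2 = 0.0, 1.0
draws = []

for it in range(2000):
    # beta_j | rest: normal (normal likelihood x normal prior)
    prec = n_per / sigma2 + 1.0 / sigma2_b
    mean = (n_per * ybar / sigma2 + alpha / sigma2_b) / prec
    beta = rng.normal(mean, np.sqrt(1.0 / prec))

    # alpha | rest: normal, even under the flat prior p(alpha) propto 1
    alpha = rng.normal(beta.mean(), np.sqrt(sigma2_b / J))

    # sigma^2 | rest: scaled Inv-chi^2(n, ss/n), even under improper p(sigma^2)
    ss = np.sum((y - beta[g]) ** 2)
    sigma2 = ss / rng.chisquare(n)

    if it >= 1000:
        draws.append(sigma2)

print(round(float(np.mean(draws)), 1))  # posterior mean of sigma^2 (truth: 4.0)
```

Every update draws from a standard distribution, which is what makes the Gibbs sampler for hierarchical linear models so convenient: no tuning, no accept/reject step.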