'Counting chickens and other tales' Using random

advertisement
Using complex random effect
models in epidemiology and
ecology
Dr William J. Browne
School of Mathematical Sciences
University of Nottingham
Outline
• Background to my research, random effect and
multilevel models and MCMC estimation.
• Random effect models for complex data
structures including artificial insemination and
Danish chicken examples.
• Multivariate random effect models and great tit
nesting behaviour.
• Efficient MCMC algorithms.
• Conclusions and future work.
Background
• 1995-1998 – PhD in Statistics, University of Bath.
“Applying MCMC methods to multilevel models.”
• 1998-2003 – Postdoctoral research positions at the
Centre for Multilevel Modelling at the Institute of
Education, London.
• 2003-2006 – Lecturer in Statistics at University of
Nottingham.
• 2006-2007 Associate professor of Statistics at University
of Nottingham.
• 2007- Professor in Biostatistics, University of Bristol.
Research interests:
Multilevel modelling, complex random effect modelling,
applied statistics, Bayesian statistics and MCMC
estimation.
Random effect models
• Models that account for the underlying structure in the dataset.
• Originally developed for nested structures (multilevel models), for
example in education, pupils nested within schools.
• An extension of linear modelling with the inclusion of random effects.
A typical 2-level model is
yij   0  xij 1  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
Here i might index pupils and j index schools.
Alternatively in another example i might index cows and j index herds.
The important thing is that the model and statistical methods used are
the same!
Estimation Methods for Multilevel
Models
Due to additional random effects no simple matrix formulae
exist for finding estimates in multilevel models.
Two alternative approaches exist:
1. Iterative algorithms e.g. IGLS, RIGLS, that alternate
between estimating fixed and random effects until
convergence. Can produce ML and REML estimates.
2. Simulation-based Bayesian methods e.g. MCMC that
attempt to draw samples from the posterior distribution
of the model.
One possible computer program to use for multilevel
models which incorporates both approaches is MLwiN.
MLwiN
• Software package designed specifically for fitting
multilevel models.
• Developed by a team led by Harvey Goldstein and Jon
Rasbash at the Institute of Education in London over
past 15 years or so. Earlier incarnations ML2, ML3,
MLN.
• Originally contained ‘classical’ estimation methods
(IGLS) for fitting models.
• MLwiN launched in 1998 also included MCMC
estimation.
• My role in the team was as developer of the MCMC
functionality in MLwiN in my time at Bath and during 4.5
years at the IOE.
Note: MLwiN core team relocated to Bristol in 2005.
MCMC Algorithm
• Consider the 2-level normal response model
yij   0  xij 1  u j  eij
u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
• MCMC algorithms usually work in a Bayesian
framework and so we need to add prior
distributions for the unknown parameters.
• Here there are 4 sets of unknown parameters:
 , u,  , 
2
u
2
e
• We will add prior distributions
2
2
p(  ), p( u ), p( e )
MCMC Algorithm (2)
One possible MCMC algorithm for this model then
involves simulating in turn from the 4 sets of
conditional distributions. Such an algorithm is known
as Gibbs Sampling. MLwiN uses Gibbs sampling for
all normal response models.
Firstly we set starting values for each group of unknown
2
2
parameters,  (0) , u(0) ,  u (0) ,  e (0)
Then sample from the following conditional distributions,
firstly
p( | y, u ,  2 ,  2 )
( 0)
To get  (1) .
u ( 0)
e ( 0)
MCMC Algorithm (3)
We next sample from
p(u | y, (1) ,  u2(0) ,  e2(0) )
to get u (1), then
p( u2 | y, (1) , u(1) ,  e2(0) )
to get  u (1) , then finally
p( e2 | y, (1) , u(1) ,  u2(1) )
2
To get  e2(1) . We have then updated all of the unknowns
in the model. The process is then simply repeated
many times, each time using the previously
generated parameter values to generate the next set
Burn-in and estimates
Burn-in: It is general practice to throw away the
first n values to allow the Markov chain to
approach its equilibrium distribution namely the
joint posterior distribution of interest. These
iterations are known as the burn-in.
Finding Estimates: We continue generating
values at the end of the burn-in for another m
iterations. These m values are then averaged to
give point estimates of the parameter of interest.
Posterior standard deviations and other
summary measures can also be obtained from
the chains.
So why use MCMC?
• Often gives better (in terms of bias) estimates for nonnormal responses (see Browne and Draper, 2006).
• Gives full posterior distribution so interval estimates for
derived quantities are easy to produce.
• Can easily be extended to more complex problems as
we will see next.
• Potential downside 1: Prior distributions required for all
unknown parameters.
• Potential downside 2: MCMC estimation is much slower
than the IGLS algorithm.
• For more information see my book: MCMC Estimation in
MLwiN – Browne (2003).
Extension 1: Cross-classified models
For example, schools by neighbourhoods. Schools will draw pupils from
many different neighbourhoods and the pupils of a neighbourhood will go
to several schools. No pure hierarchy can be found and pupils are said to
be contained within a cross-classification of schools by neighbourhoods:
nbhd 1
nbhd 2
School 1
xx
x
School 2
x
x
Nbhd 3
School 3
xx
x
School 4
x
xxx
School
Pupil
Nbhd
S1
S2
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
N1
N2
N3
Notation
With hierarchical models we use a subscript notation that has one
subscript per level and nesting is implied reading from the left. For
example, subscript pattern ijk denotes the i’th level 1 unit within the
j’th level 2 unit within the k’th level 3 unit.
If models become cross-classified we use the term classification
instead of level. With notation that has one subscript per
classification, that also captures the relationship between
classifications, notation can become very cumbersome. We propose
an alternative notation introduced in Browne et al. (2001) that only
has a single subscript no matter how many classifications are in the
model.
Single subscript notation
School
Pupil
S1
S3
S4
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12
Nbhd
i
1
2
3
4
5
6
7
8
9
10
11
12
S2
N1
nbhd(i)
1
2
1
2
1
2
2
3
3
2
3
3
sch(i)
1
1
1
2
2
2
3
3
4
4
4
4
N2
N3
We write the model as
( 2)
(3)
yi  0  unbhd
( i )  usch(i )  ei
(1)
where classification 2 is neighbourhood and classification 3
is school. Classification 1 always corresponds to the
classification at which the response measurements are
made, in this case pupils. For pupils 1 and 11 equation (1)
becomes:
y1   0  u1( 2)  u1(3)  e1
y11   0  u3( 2)  u 4(3)  e1
Classification diagrams
In the single subscript notation we lose information about the relationship
(crossed or nested) between classifications. A useful way of conveying this
information is with the classification diagram. Which has one node per
classification and nodes linked by arrows have a nested relationship and
unlinked nodes have a crossed relationship.
Neighbourhood
School
Neighbourhood
School
Pupil
Nested structure where
schools are contained within
neighbourhoods
Pupil
Cross-classified structure where
pupils from a school come from
many neighbourhoods and pupils
from a neighbourhood attend several
schools.
Example : Artificial insemination by donor
1901 women
279 donors
1328 donations
12100 ovulatory cycles
response is whether conception occurs in a given cycle
In terms of a unit diagram:
Women
Cycles
w1
c1 c2 c3 c4…
Or a classification diagram:
w2
w3
c1 c2 c3 c4…
Donor
c1 c2 c3 c4…
Donation
Donations
Donors
d1
m1
d2
d1 d2 d3
d1 d2
m2
m3
Woman
Cycle
Model for artificial insemination data
We can write the model as
yi ~ Binomial(1, i )
Results:
Parameter
Description
Estimate(se)
( 2)
( 3)
( 4)
logit(  i )  ( X ) i  u woman
(i )  u donation(i )  u donor(i )
0
intercept
( 2)
2
u woman
(i ) ~ N (0, u ( 2 ) )
1
azoospermia
0.22(0.11)
( 3)
2
u donation
(i ) ~ N (0, u (3) )
2
semen quality
0.19(0.03)
( 4)
2
u donor
(i ) ~ N (0, u ( 4 ) )
3
womens age>35
4
sperm count
0.20(0.07)
5
sperm motility
0.02(0.06)
6
insemination to early
-0.72(0.19)
7
insemination to late
-0.27(0.10)
Note cross-classified models can
be fitted in IGLS but are far easier
to fit using MCMC estimation.
-4.04(2.30)
-0.30(0.14)
 u2( 2)
women variance
1.02(0.21)
 u2(3)
donation variance
0.644(0.21)
 u2( 4)
donor variance
0.338(0.07)
Extension 2: Multiple membership models
When level 1 units are members of more than one higher
level unit we describe a model for such data as a multiple
membership model.
For example,
• Pupils change schools/classes and each school/class has
an effect on pupil outcomes.
• Patients are seen by more than one nurse during the
course of their treatment.
Notation
yi  ( XB) i 
w
( 2)
i, j
jnurse( i )
u
( 2)
j
 ei
u (j2 ) ~ N (0,  u2( 2 ) )
(2)
ei ~ N (0,  e2 )
n1(j=1)
n2(j=2)
n3(j=3)
p1(i=1)
0.5
0
0.5
p2(i=2)
1
0
0
p3(i=3)
0
0.5
0.5
p4(i=4)
0.5
0.5
0
y1  ( XB)1  0.5u1( 2 )  0.5u3( 2 )  e1
y2  ( XB) 2  1u1( 2 )  e2
y3  ( XB) 3  0.5u2( 2 )  0.5u3( 2 )  e3
y4  ( XB) 4  0.5u1( 2 )  0.5u2( 2 )  e4
Note that nurse(i) now indexes the set of
nurses that treat patient i and w(2)i,j is a
weighting factor relating patient i to nurse j.
For example, with four patients and three
nurses, we may have the following weights:
Here patient 1 was seen by nurse 1
and 3 but not nurse 2 and so on. If
we substitute the values of w(2)i,j , i
and j. from the table into (2) we get
the series of equations :
Classification diagrams for multiple
membership relationships
Double arrows indicate a multiple membership relationship between
classifications.
nurse
We can mix multiple membership, crossed and
hierarchical structures in a single model.
hospital
patient
nurse
GP practice
patient
Here patients are multiple members of nurses,
nurses are nested within hospitals and GP
practice is crossed with both nurse and hospital.
Example involving nesting, crossing and
multiple membership – Danish chickens
Production hierarchy
10,127 child flocks
725 houses
304 farms
Breeding hierarchy
10,127 child flocks
200 parent flocks
As a unit diagram:
farm
As a classification diagram:
f1
f2…
Farm
Houses
Child flocks
Parent flock
h1
c1 c2 c3…
p1
h2
h1
c1 c2 c3….
p2
h2
c1 c2 c3…. c1 c2 c3….
p3
p4
p5….
House
Parent flock
Child flock
Model and results
yi ~ Binomial(1,  i )
logit( i )  ( XB )i 
( 3)
( 4)
 wi(,2j)u (j2)  uhouse
( i )  u farm ( i )  ei
j p . flock ( i )
( 3)
2
u (j2 ) ~ N (0,  u2( 2 ) ) uhouse
( i ) ~ N (0,  u ( 3) )
4)
2
u (farm
( i ) ~ N (0,  u ( 4 ) )
Response is cases of
salmonella
Note multiple membership
models can be fitted in IGLS
and this model/dataset
represents roughly the most
complex model that the
method can handle.
Such models are far easier to
fit using MCMC estimation.
Results:
Parameter
Description
Estimate(se)
0
intercept
-2.322(0.213)
1
1996
-1.239(0.162)
2
1997
-1.165(0.187)
3
hatchery 2
-1.733(0.255)
4
hatchery 3
-0.211(0.252)
5
hatchery 4
-1.062(0.388)
 u2( 2)
parent flock variance
0.895(0.179)
 u2( 3)
house variance
0.208(0.108)
 u2( 4)
farm variance
0.927(0.197)
Random effect modelling of
great tit nesting behaviour
• An extension of cross-classified models to
multivariate responses.
• Collaborative research with Richard Pettifor
(Institute of Zoology, London), and Robin
McCleery and Ben Sheldon (University of
Oxford).
Wytham woods great tit dataset
• A longitudinal study of great tits
nesting in Wytham Woods,
Oxfordshire.
• 6 responses : 3 continuous & 3 binary.
• Clutch size, lay date and mean
nestling mass.
• Nest success, male and female
survival.
• Data: 4165 nesting attempts over a
period of 34 years.
• There are 4 higher-level classifications
of the data: female parent, male
parent, nestbox and year.
Data background
The data structure can be summarised as follows:
Source
Number
of IDs
Median
#obs
Mean
#obs
Year
34
104
122.5
Nestbox
968
4
4.30
Male parent
2986
1
1.39
Female parent
2944
1
1.41
Note there is very little information on each
individual male and female bird but we can get
some estimates of variability via a random effects
model.
Diagrammatic representation of the dataset.
Year
Nest-box
1
3
♀16♂16
70
69
2
♀14♂14
5
6
♀17♂17
♀15♂18
♀18♂11
♀13♂15
♀11♂10
68
4
♀15♂11
♀13♂13
♀11♂10
67
66
♀8♂8
65
♀5♂1
64
♀1♂1
♀12♂11
♀12♂11
♀9♂6
♀6♂5
♀7♂6
♀2♂2
♀10♂12
♀10♂9
♀3♂7
♀3♂3
♀4♂4
Univariate cross-classified
random effect modelling
• For each of the 6 responses we will firstly fit a univariate
model, normal responses for the continuous variables
and probit regression for the binary variables. For
example using notation of Browne et al. (2001) and
letting response yi be clutch size:
(2)
(3)
(4)
(5)
y   u
u
u
u
e
i
male(i )
female(i ) nestbox(i )
year (i ) i
(2)
(3)
u
~ N (0,  2 ), u
~ N (0,  2 ),
male(i )
u (2)
female(i )
u (3)
(4)
(5)
u
~ N (0,  2 ), u
~ N (0,  2 ),
nestbox(i )
u (4)
year (i )
u (5)
e ~ N (0,  2 )
i
e
Estimation
• We use MCMC estimation in MLwiN and choose
‘diffuse’ priors for all parameters.
• We run 3 MCMC chains from different starting
points for 250k iterations each (500k for binary
responses) and use the Gelman-Rubin
diagnostic to decide burn-in length.
• We compared results with the equivalent
classical model using the Genstat software
package and got broadly similar results.
• We fit all four higher classifications and do not
consider model comparison.
Clutch Size
Parameter
Estimate (S.E.)
β
8.808 (0.109)
0.365 (0.101)
0.107 (0.026)
0.046 (0.043)
0.974 (0.062)
1.064 (0.055)
 u2(5)
(Year)
 u2( 4) (Nest box)
 u2(3) (Male)
 u2( 2) (Female)
 e2 (Observation)
Percentage
variance
14.3%
4.2%
1.8%
38.1%
41.6%
Here we see that the average clutch size is just
below 9 eggs with large variability between female
birds and some variability between years. Male birds
and nest boxes have less impact.
Lay Date (days after April 1st)
Parameter
β
 u2(5)
(Year)
 u2( 4) (Nest box)
 u2(3) (Male)
 u2( 2) (Female)
 e2 (Observation)
Estimate (S.E.)
29.38 (1.07)
37.74 (10.08)
3.38 (0.56)
0.22 (0.39)
8.55 (1.03)
25.10 (1.04)
Percentage
variance
50.3%
4.5%
0.3%
11.4%
33.5%
Here we see that the mean lay date is around the
end of April/beginning of May. The biggest driver of
lay date is the year which is probably indicating
weather differences. There is some variability due to
female birds but little impact of nest box and male
bird.
Nestling Mass
Parameter
Estimate (S.E.)
β
18.829 (0.060)
0.105 (0.032)
0.026 (0.013)
0.153 (0.030)
0.163 (0.031)
0.720 (0.035)
 u2(5)
(Year)
 u2( 4) (Nest box)
 u2(3) (Male)
 u2( 2) (Female)
 e2 (Observation)
Percentage
variance
9.0%
2.2%
13.1%
14.0%
61.7%
Here the response is the average mass of the chicks
in a brood at 10 days old. Note here lots of the
variability is unexplained and both parents are
equally important.
Human example
Sarah Victoria Browne
Helena Jayne Browne
Born 20th July 2004
Born 22nd May 2006
Birth Weight 6lb 6oz
Birth Weight 8lb 0oz
Father’s birth weight 9lb 13oz, Mother’s birth weight 6lb 8oz
Nest Success
Parameter
Estimate (S.E.)
β
0.010 (0.080)
0.191 (0.058)
0.025 (0.020)
0.065 (0.054)
0.060 (0.052)
 u2(5)
 u2( 4)
 u2(3)
 u2( 2)
(Year)
(Nest box)
(Male)
(Female)
Percentage
overdispersion
56.0%
7.3%
19.1%
17.6%
Here we define nest success as one of the ringed
nestlings captured in later years. The value 0.01 for β
corresponds to around a 50% success rate. Most of
the variability is explained by the Binomial
assumption with the bulk of the over-dispersion
mainly due to yearly differences.
Male Survival
Parameter
Estimate (S.E.)
β
-0.428 (0.041)
0.032 (0.013)
0.006 (0.006)
0.025 (0.023)
0.014 (0.017)
 u2(5)
 u2( 4)
 u2(3)
 u2( 2)
(Year)
(Nest box)
(Male)
(Female)
Percentage
overdispersion
41.6%
7.8%
32.5%
18.2%
Here male survival is defined as being observed
breeding in later years. The average probability is
0.334 and there is very little over-dispersion with
differences between years being the main factor.
Note the actual response is being observed breeding
in later years and so the real probability is higher!
Female survival
Parameter
Estimate (S.E.)
β
-0.302 (0.048)
0.053 (0.018)
0.065 (0.024)
0.014 (0.017)
0.013 (0.014)
 u2(5)
 u2( 4)
 u2(3)
 u2( 2)
(Year)
(Nest box)
(Male)
(Female)
Percentage
overdispersion
36.6%
44.8%
9.7%
9.0%
Here female survival is defined as being observed
breeding in later years. The average probability is
0.381 and again there isn’t much over-dispersion with
differences between nestboxes and years being the
main factors.
Multivariate modelling of the
great tit dataset
• We now wish to combine the six univariate models into
one big model that will also account for the correlations
between the responses.
• We choose a MV Normal model and use latent variables
(Chib and Greenburg, 1998) for the 3 binary responses
that take positive values if the response is 1 and
negative values if the response is 0.
• We are then left with a 6-vector for each observation
consisting of the 3 continuous responses and 3 latent
variables. The latent variables are estimated as an
additional step in the MCMC algorithm and for
identifiability the elements of the level 1 variance matrix
that correspond to their variances are constrained to
equal 1.
Multivariate Model
(2)
(3)
(4)
(5)
y   u
u
u
u
e
i
male(i )
female(i ) nestbox(i )
year (i ) i
(2)
(3)
u
~ MVN (0, 
), u
~ MVN (0, 
),
male(i )
u (2)
female(i )
u (3)
(4)
(5)
u
~ MVN (0, 
), u
~ MVN (0, 
),
nestbox(i )
u (4)
year (i )
u (5)
Here the vector valued
response is decomposed into
a mean vector plus random
effects for each classification.
e ~ MVN (0,  )
i
e
Inverse Wishart priors are used
for each of the classification
variance matrices. The values
are based on considering overall
variability in each response and
assuming an equal split for the 5
classifications.
p(u(i) ) ~ invWishart [6,6 * Su(i) ],
6
0
0
0
0.5 0



(
i
)
Su  




0
0
0
15 0
0
0 0.2
0
0 0 0.05
0
0
0
0
0
0
0
0
0
0
0
0 

0 

0 
0 
0.05
0 

0
0.05
Use of the multivariate model
• The multivariate model was fitted using an
MCMC algorithm programmed into the MLwiN
package which consists of Gibbs sampling steps
for all parameters apart from the level 1 variance
matrix which requires Metropolis sampling (see
Browne 2006).
• The multivariate model will give variance
estimates in line with the 6 univariate models.
• In addition the covariances/correlations at each
level can be assessed to look at how
correlations are partitioned.
e
z
t
c
h
s
i
Partitioning of covariances
C
Survival
l
Raw phenoty pic correlation
L
a
y
i
n
(1) Caus al environmental
effec t
g
Raw phenoty pic correlation
b
u
a
d
a
t
e
(2) Female condition
effect
F
Clutch Size
Survival
(i) Fem ale
Lay ing date
Clutch Size
Clutch Size
Lay ing date
u
Lay ing date
Lay ing date
(iii) Obs er vation
c
partitioning of
c ov ariance at
different lev els
Clutch Size
(ii) Fe m ale
Clutch Size
Lay ing date
e
Fec undity
(ii) Obse rvation
Lay ing date
Survival
(i) Ter r itor y
Clutch Size
[partitioning of
c ov ariance
at different lev els]
Fec undity
n
d
i
t
y
Correlations from a 1-level model
•
If we ignore the structure of the data and consider it as 4165
independent observations we get the following correlations:
CS
LD
NM
NS
MS
LD
-0.30
X
X
X
X
NM
-0.09
-0.06
X
X
X
NS
0.20
-0.22
0.16
X
X
MS
0.02
-0.02
0.04
0.07
X
FS
-0.02
-0.02
0.06
0.11
0.21
Note correlations in bold are statistically significant i.e. 95% credible
interval doesn’t contain 0.
Correlations in full model
CS
LD
NM
NS
MS
LD
N, F, O
-0.30
X
X
X
X
NM
F, O
-0.09
F, O
-0.06
X
X
X
NS
Y, F
0.20
N, F, O
-0.22
O
0.16
X
X
MS
0.02
-0.02
0.04
Y
0.07
X
FS
F, O
-0.02
F, O
-0.02
0.06
Y, F
0.11
Y, O
0.21
Key: Blue +ve, Red –ve: Y – year, N – nestbox, F – female, O - observation
Pairs of antagonistic covariances at
different classifications
There are 3 pairs of antagonistic correlations i.e.
correlations with different signs at different
classifications:
LD & NM : Female 0.20 Observation -0.19
Interpretation: Females who generally lay late, lay heavier
chicks but the later a particular bird lays the lighter its
chicks.
CS & FS : Female 0.48 Observation -0.20
Interpretation: Birds that lay larger clutches are more likely
to survive but a particular bird has less chance of
surviving if it lays more eggs.
LD & FS : Female -0.67 Observation 0.11
Interpretation: Birds that lay early are more likely to survive
but for a particular bird the later they lay the better!
Prior Sensitivity
Our choice of variance prior assumes a priori
• No correlation between the 6 traits.
• Variance for each trait is split equally between the 5
classifications.
We compared this approach with a more Bayesian
approach by splitting the data into 2 halves:
In the first 17 years (1964-1980) there were 1,116
observations whilst in the second 17 years (1981-1997)
there were 3,049 observations
We therefore used estimates from the first 17 years of the
data to give a prior for the second 17 years and
compared this prior with our earlier prior.
Correlations for 2 priors
CS
LD
NM
NS
MS
LD
1. N, F, O
2. N, F, O
(N, F, O)
X
X
X
X
NM
1. F, O
2. F, O
(F, O)
1. O
2. O
(F, O)
X
X
X
NS
1. Y, F
2. Y, F
(Y, F)
1. Y, F, O
2. N, F, O
(N, F, O)
1. O
2. O
(O)
X
X
MS
-
1. M
2. M, O
-
-
1. Y
2. Y
(Y)
X
FS
1. F, O
2. F, O
(F, O)
1. F, O
2. F, O
(F, O)
-
1. Y, F
2. Y, F
(Y, F)
1. Y, O
2. Y, O
(Y, O)
Key: Blue +ve, Red –ve: 1,2 prior numbers with full data results in brackets
Y – year, N – nestbox, M – male, F – female, O - observation
MCMC efficiency for clutch size
response
• The MCMC algorithm used in the univariate analysis
of clutch size was a simple 10-step Gibbs sampling
algorithm.
• The same Gibbs sampling algorithm can be used in
both the MLwiN and WinBUGS software packages
and we ran both for 50,000 iterations.
• To compare methods for each parameter we can look
at the effective sample sizes (ESS) which give an
estimate of how many ‘independent samples we have’
for each parameter as opposed to 50,000 dependent
samples.

• ESS = # of iterations/,   1  2  (k )
k 1
Effective Sample sizes
The effective sample sizes are similar for both packages.
Note that MLwiN is 5 times quicker!!
Parameter
Fixed Effect
Year
Nestbox
Male
Female
Observation
Time
MLwiN
WinBUGS
671
602
30632
29604
833
788
36
33
3098
3685
110
135
519s
2601s
We will now consider methods that will improve the
ESS values for particular parameters. We will firstly
consider the fixed effect parameter.
Trace and autocorrelation plots for fixed effect
using standard Gibbs sampling algorithm
Hierarchical Centering
This method was devised by Gelfand et al. (1995) for
use in nested models. Basically (where feasible)
parameters are moved up the hierarchy in a model
reformulation. For example:
yij   0  u j  eij , u j ~ N (0,  u2 ), eij ~ N (0,  e2 )
is equivalent to
yij   j  eij ,  j ~ N ( 0 ,  u2 ), eij ~ N (0,  e2 )
The motivation here is we remove the strong negative
correlation between the fixed and random effects by
reformulation.
Hierarchical Centering
In our cross-classified model we have 4 possible hierarchies
up which we can move parameters. We have chosen to move
the fixed effect up the year hierarchy as it’s variance had
biggest ESS although this choice is rather arbitrary.
2)
( 3)
( 4)
(5)
y i  u (female
( i )  u male( i )  u nestbox( i )   year ( i )  ei ,
2)
2
( 3)
2
u (female
~
N
(
0
,

),
u
~
N
(
0
,

(i )
u ( 2)
male( i )
u ( 3) ),
( 4)
2
(5)
2
2
u nestbox
~
N
(
0
,

),

~
N
(

,

),
e
~
N
(
0
,

(i )
u ( 4)
year ( i )
0
u (5)
i
e ),
 0  1,  u2( 2 ) ~  1 ( ,  ),  u2(3) ~  1 ( ,  ),
 u2( 4) ~  1 ( ,  ),  u2(5) ~  1 ( ,  ),  e2 ~  1 ( ,  ).
The ESS for the fixed effect increases 50-fold from 602 to
35,063 while for the year level variance we have a smaller
improvement from 29,604 to 34,626. Note this formulation
also runs faster 1864s vs 2601s (in WinBUGS).
Trace and autocorrelation plots for fixed effect
using hierarchical centering formulation
Parameter Expansion
• We next consider the variances and in particular
the between-male bird variance.
• When the posterior distribution of a variance
parameter has some mass near zero this can hamper
the mixing of the chains for both the variance
parameter and the associated random effects.
• The pictures over the page illustrate such poor
mixing.
• One solution is parameter expansion (Liu et al.
1998).
• In this method we add an extra parameter to the
model to improve mixing.
Trace plots for between males variance and a
sample male effect using standard Gibbs sampling
algorithm
Autocorrelation plot for male variance and a
sample male effect using standard Gibbs sampling
algorithm
Parameter Expansion
In our example we use parameter expansion for all 4
hierarchies. Note the  parameters have an impact on
both the random effects and their variance.
2)
( 3)
( 4)
( 5)
yi   0   2 v (female
( i )   3vmale( i )   4 vnestbox( i )   5 v year ( i )  ei ,
2)
2
( 3)
2
v (female
( i ) ~ N (0,  v ( 2 ) ), vmale( i ) ~ N (0,  v ( 3) ),
( 4)
2
( 5)
2
2
vnestbox
( i ) ~ N (0,  v ( 4 ) ), v year ( i ) ~ N (0,  v ( 5 ) ), ei ~ N (0,  e ),
 0  1, k  1,  v2( k ) ~  1 ( ,  ), k  2,..5,  e2 ~  1 ( ,  )
The original parameters can be found by:
ui( k )   k vi( k ) and  u2( k )   k2 v2( k )
Note the models are not identical as we now have
different prior distributions for the variances.
Parameter Expansion
• For the between males variance we have a 20-fold
increase in ESS from 33 to 600.
• The parameter expanded model has different prior
distributions for the variances although these priors
are still ‘diffuse’.
• It should be noted that the point and interval estimate
of the level 2 variance has changed from
0.034 (0.002,0.126) to 0.064 (0.000,0.172).
• Parameter expansion is computationally slower
3662s vs 2601s for our example.
Trace plots for between males variance and a
sample male effect using parameter expansion.
Autocorrelation plot for male variance and a
sample male effect using parameter expansion.
Combining the two methods
Hierarchical centering and parameter expansion can
easily be combined in the same model. Here we
perform centering on the year classification and
parameter expansion on the other 3 hierarchies.
2)
( 3)
( 4)
( 5)
yi   0   2 v (female
( i )   3vmale( i )   4 vnestbox( i )   year ( i )  ei ,
2)
2
( 3)
2
v (female
( i ) ~ N (0,  v ( 2 ) ), vmale( i ) ~ N (0,  v ( 3) ),
( 4)
2
( 5)
2
2
vnestbox
( i ) ~ N (0,  u ( 4 ) ),  year ( i ) ~ N (  0 ,  v ( 5 ) ), ei ~ N (0,  e ),
 0  1, k  1,  v2( k ) ~  1 ( ,  ), k  2,..5,  e2 ~  1 ( ,  )
Effective Sample sizes
As we can see below the effective sample sizes for
all parameters are improved for this formulation
while running time remains approximately the same.
Parameter
Fixed Effect
Year
Nestbox
Male
Female
Observation
Time
WinBUGS
originally
602
WinBUGS
combined
34296
29604
34817
788
5170
33
557
3685
8580
135
1431
2601s
2526s
Conclusions
• In this talk we have considered using complex random
effects models in three application areas.
• For the bird ecology example we have seen how these
models can be used to partition both variability and
correlation between various classifications to identify
interesting relationships.
• We then investigated hierarchical centering and
parameter expansion for a model for one of our
responses. These are both useful methods for improving
mixing when using MCMC.
• Both methods are simple to implement in the WinBUGS
package and can be easily combined to produce an
efficient MCMC algorithm.
Further Work
• Incorporating hierarchical centering and
parameter expansion in MLwiN.
• Investigating their use in conjunction with
the Metropolis-Hastings algorithm.
• Investigate block-updating methods e.g.
the structured MCMC algorithm.
• Extending the methods to our original
multivariate response problem.
References
•
•
•
•
•
•
•
•
•
•
•
Browne, W.J. (2002). MCMC Estimation in MLwiN. London: Institute of Education, University of
London
Browne, W.J. (2004). An illustration of the use of reparameterisation methods for improving
MCMC efficiency in crossed random effect models. Multilevel Modelling Newsletter 16 (1): 13-25
Browne, W.J. (2006). MCMC Estimation of ‘constrained’ variance matrices with applications in
multilevel modelling. Computational Statistics and Data Analysis. 50: 1655-1677.
Browne, W.J. and Draper D. (2006). A Comparison of Bayesian and likelihood methods for fitting
multilevel models (with discussion). Bayesian Analysis. 1: 473-550.
Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple membership multiple classification
(MMMC) models. Statistical Modelling 1: 103-124.
Browne, W.J., McCleery, R.H., Sheldon, B.C., and Pettifor, R.A. (2006). Using cross-classified
multivariate mixed response models with application to the reproductive success of great tits
(Parus Major). Statistical Modelling (to appear)
Chib, S. and Greenburg, E. (1998). Analysis of multivariate probit models. Biometrika 85, 347361.
Gelfand A.E., Sahu S.K., and Carlin B.P. (1995). Efficient Parametrizations For Normal Linear
Mixed Models. Biometrika 82 (3): 479-488.
Kass, R.E., Carlin, B.P., Gelman, A. and Neal, R. (1998). Markov chain Monte Carlo in practice: a
roundtable discussion. American Statistician, 52, 93-100.
Liu, C., Rubin, D.B., and Wu, Y.N. (1998) Parameter expansion to accelerate EM: The PX-EM
algorithm. Biometrika 85 (4): 755-770.
Rasbash, J., Browne, W.J., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G.,Draper,
D., Langford, I., Lewis, T. (2000). A User’s Guide to MLwiN, Version 2.1, London: Institute of
Education, University of London.
Download