Manuscript second revision - IFM

Title:
Estimation of distance related probability of animal movements between holdings and implications for disease spread modeling.

Authors: Tom Lindström^a, Scott A. Sisson^b, Maria Nöremark^c, Annie Jonsson^d and Uno Wennergren^a

^a IFM Theory and Modelling, Linköping University, 581 83 Linköping, Sweden
^b School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
^c Department of Disease Control and Epidemiology, SVA, National Veterinary Institute, 751 89 Uppsala, Sweden
^d Research Centre of Systems Biology, Ecological modelling, University of Skövde, 541 28 Skövde, Sweden

Corresponding Author:
Uno Wennergren
Tel: +46 13 28 16 66
Fax: +46 13 28 13 99
Email: unwen@ifm.liu.se
Correspondence address: See above.

Abstract
Between-holding contacts are more common over short distances and this may have implications for the dynamics of disease spread through these contacts. A good estimation of how contacts depend on distance is therefore important when modeling livestock diseases. In this study, we have developed a method for analyzing distance-dependent contacts and applied it to animal movement data from Sweden. The data were analyzed with two competing models. The first model assumes that contacts arise from a purely distance-dependent process. The second is a mixture model and assumes that, in addition, some contacts arise independently of distance. Parameters were estimated with a Bayesian Markov Chain Monte Carlo (MCMC) approach and the model probabilities were compared. We also investigated possible between-model differences in predicted contact structures, using a collection of network measures.
We found that the mixture model was a much better model for the data analyzed. Also, the network measures showed that the models differed considerably in predictions of contact structures, which is expected to be important for disease spread dynamics. We conclude that a model with contacts being both dependent on, and independent of, distance was preferred for modeling the example animal movement contact data.

Key words
Markov Chain Monte Carlo; Mixture Models; Model Selection; Animal Movements; Disease Transmission; Network Analysis

1. Introduction
Pathogens can spread between animal holdings both through direct animal contact and through indirect contacts such as persons, vehicles, equipment and products of animal origin. Although some diseases, such as Foot and Mouth Disease (FMD), spread easily through several different routes, direct contact through relocation of infected animals is almost always one of the most important routes of spread. Analyzing contacts between holdings offers the possibility to make predictions about disease transmission (Kao et al. 2007) and allows testing of the effect of changed contact patterns (Velthuis and Mourits 2007). In this study we focus on contacts through live animal movements and present a method to analyze the spatial aspect of animal movement. The method presented can however be applied to most types of between-holding contacts, either directly or with minor modifications, if such data are available.
A possible approach for modeling disease spread is to assume all between-holding contacts to be equally probable, hence fulfilling the criteria of mass-action mixing (MAM). From a disease transmission perspective this means that one infected holding can infect all other holdings with equal probability (assuming equal probability that contact will result in infection). While such an assumption may provide a theoretical epidemiological insight, in most instances contacts do not occur according to MAM. Consequently the dynamics of epidemics will likely deviate from the prediction made by such a supposition (Keeling 2005). Mollison et al. (1993) identified two main deviations from MAM assumptions: those related to population differences (in this case differences in holding characteristics) and those caused by mixing heterogeneities.
This paper focuses on the latter and in particular heterogeneity caused by distances between holdings. Between-holding contacts, including the live animal movements considered here, are more likely to occur over short distances (Boender et al. 2007, Robinson and Christley 2007) and consequently epidemics often show spatial aggregation patterns. A good description of how contact probabilities vary with distance is essential for policy making and modeling of livestock diseases. SI and SIR models (S, I and R denoting susceptible, infected and recovered holdings, respectively) assume that the infective unit, i.e. the holding, can only be infected once. Following that assumption, transmission through a limited number of contacts will generally result in an epidemic with a more rapid decline in the reproductive ratio (the number of infections caused by one unit during its infectious period). Comparing two theoretical epidemics with the same initial reproductive ratio but transmission through different numbers of contacts, depletion of susceptible holdings (as they become infected) will be more crucial when contacts are inherently rare. When transmission occurs through contacts that are more common at short distances the effect is augmented (Keeling 1999). There is a higher probability than expected from MAM that, if holding A has contacts with holdings B and C, there is also a possible contact between B and C. Hence, if holding A infects holding B there is not only a decrease of local susceptibles for holding A but also for holding C.
Consequently, a more realistic alternative to assuming MAM is to express distance-dependent probabilities of contacts using a kernel function. The pattern of decreasing probability with distance can then be described in a generalized way by variance and kurtosis. Variance is a measurement of how rapidly the probability of contact decreases with distance. A high variance means that more contacts/movements occur over long distances. Kurtosis can be viewed as a measurement of the "shape" of the kernel. A kernel with high kurtosis means there are many contacts at short distances but at the same time a fat tail describing a large proportion of long-distance contacts. A low kurtosis means that contact probabilities are more uniform over some distance and long-distance contacts are rare. Distributions with different variance and kurtosis are shown in Figure 1.
The aim of the study presented here is twofold. First, we present a method for analyzing the probability of contacts through live animal movements given the distance between holdings. It uses Bayesian inference to analyze the data and we obtain the posterior distribution of parameters with Markov Chain Monte Carlo (MCMC) techniques. Two models were implemented: one assuming all contacts to be dependent on distance and one assuming that, in addition, some contacts arise independently of distance. The second aim of this study is to investigate possible between-model differences in predictions of disease spread via the estimated contact probabilities. This was done using a number of network measurements.

2. Materials and Methods
2.1 Data
Data on cattle and pig movements were provided by the Swedish Board of Agriculture. The data consisted of all reported and registered movements during the time period from 1st of July 2005 until 30th of June 2006. Furthermore, data on the location of individual cattle at the start and the end of this period were provided.
Pig movements are reported by the farmer receiving the animals and registered at the group level. Cattle movements are registered at the individual level, reported both by the farmer at the holding of origin and the farmer at the holding of destination. Since this study focuses on movements and not relocation of individual animals, cattle moved between two holdings with the same departure date as well as arrival date were assumed to constitute one movement. Cattle movements which did not have a fully corresponding report were checked for inconsistencies and were excluded unless they could be connected with another reported movement in the dataset (e.g. the same individual and the same holdings but the reported date of movement differed). Also, we included single reported movements in the analysis if the location of individual cattle at the start and end of the reporting period corresponded to that report.
Data on geographic location of the holdings were also acquired from the Swedish Board of Agriculture, using a set of data provided in spring 2006. For pig farmers there is a legal requirement to report the geographic location of the holding. For cattle holdings the dataset included approximate coordinates generated indirectly from the geographical coordinates of farming land for which the farmers had applied for subsidies. Thus, for cattle holdings the coordinates were not exact and not available for all holdings.
Because the geographic location is necessary for the analysis, only holdings with known geographical coordinates were used. We used Euclidean distances. While this is commonly used in studies of disease spread between holdings (e.g. Ferguson et al. 2001, Keeling et al. 2001 and Boender et al. 2007), few studies exist that validate this use. However, Savill et al. (2006) and Bessel et al. (2008) conclude that at least for the UK 2001 FMD epidemic, Euclidean distance is usually sufficient to model transmission between holdings.
In this article, animal movement refers to between-holding live animal movements. Movements of animals destined for slaughter were not included in the analysis. The cattle movement data explicitly state when a movement was carried out for slaughter while the pig movement data only contain information on the start and end holding for each movement. We therefore used information on the holdings and only included movements between non-abattoir holdings. Finally, we excluded holdings (and movements to/from them) located on the large island of Gotland because those are only accessible by boat and hence expected to show a different contact pattern. A total of 50,517 movements of cattle and 19,103 movements of pigs between 28,657 and 7,078 holdings respectively were used in the analysis.

2.2 Models
For notation in the analysis of movements it is sometimes suitable to refer to the starting point and end point and sometimes to the movement itself. Here, while every movement, t, has a start point, i, and an end point, j, we sometimes utilize a reduced notation which avoids indexing on start and end points.
We consider two competing models (M1 and M2) to explain the data, and evaluate the relative likelihood of each model via their posterior probabilities. Model M1 is strictly distance dependent (DD). A generalized normal distribution (Nadarajah 2005) was used to model probabilities of a movement distance (d). This has the form
g d a, b  
17
e
 a
d
b
S
18
where a and b are parameters determining the shape and width of the distribution and S is the
19
normalizing constant given as
20
S
b
2a1 b 
This distribution was chosen due to its flexible shape. It includes familiar distributions such as the Gaussian (b=2) and Laplace (b=1) as special cases and it approaches uniform as b goes to infinity.
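
The special cases mentioned above can be checked numerically. The following is a minimal sketch, assuming the one-dimensional form of the generalized normal density with normalizing constant S = 2aΓ(1/b)/b; the function names are hypothetical and are not taken from the original analysis code.

```python
import math


def gen_normal_pdf(d, a, b):
    """Density g(d | a, b) = exp(-(|d|/a)**b) / S with S = 2*a*Gamma(1/b)/b."""
    s = 2.0 * a * math.gamma(1.0 / b) / b  # normalizing constant (1-D form, assumed)
    return math.exp(-((abs(d) / a) ** b)) / s


def gaussian_pdf(d, sigma):
    """Zero-mean Gaussian density, used only as a reference for b = 2."""
    return math.exp(-d * d / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(d, scale):
    """Zero-mean Laplace density, used only as a reference for b = 1."""
    return math.exp(-abs(d) / scale) / (2.0 * scale)
```

With b = 2 and a = σ√2 the first function coincides with a Gaussian of standard deviation σ, and with b = 1 it coincides with a Laplace density of scale a.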
This study focuses on two-dimensional space and for this distribution variance (ν) and kurtosis (κ) are given in Lindström et al. (2008) as
$$ \nu = a^{2}\,\frac{\Gamma(4/b)}{\Gamma(2/b)} $$
and
$$ \kappa = \frac{\Gamma(6/b)\,\Gamma(2/b)}{\Gamma(4/b)^{2}} $$
where Γ is the gamma function. Analysis was performed on a and b and the results were converted into ν and κ. Nadarajah (2005) shows methods of estimating a and b assuming that data are random samples from the distribution. Animal movements are however carried out between holdings with fixed locations and these locations are themselves not randomly distributed. We therefore normalize the distribution by the sum over all possible destinations. Hence, the probability of a contact over the movement distance d_t from holding i of movement t, where t=1,…,T and T is the number of movements, was modeled via the density
$$ f_1(d_t \mid a, b) = \frac{e^{-(d_t/a)^{b}}}{\sum_{k=1}^{N-1} e^{-(D_{ik}/a)^{b}}}, \qquad (1) $$
where D_ik is the distance from holding i to any possible destination holding k≠i, and N is the number of holdings. This approach avoids confusing distance-dependent probabilities of contacts with observing movements at some distance solely because many holdings are found at that distance.
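
The normalization over destinations and the conversion from (a, b) to (ν, κ) can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the moment formulas are the two-dimensional expressions ν = a²Γ(4/b)/Γ(2/b) and κ = Γ(6/b)Γ(2/b)/Γ(4/b)², and the list `dists_from_origin` is assumed to hold the distances from the origin holding to each of the N−1 other holdings.

```python
import math


def kernel_weight(d, a, b):
    """Unnormalized generalized normal kernel exp(-(d/a)**b)."""
    return math.exp(-((d / a) ** b))


def f1(d_t, dists_from_origin, a, b):
    """Kernel weight of the observed distance divided by the summed weights
    to all possible destinations of the origin holding (Eq. 1)."""
    denom = sum(kernel_weight(D, a, b) for D in dists_from_origin)
    return kernel_weight(d_t, a, b) / denom


def variance_2d(a, b):
    """nu = a**2 * Gamma(4/b) / Gamma(2/b), via log-gamma for stability."""
    return a ** 2 * math.exp(math.lgamma(4.0 / b) - math.lgamma(2.0 / b))


def kurtosis_2d(b):
    """kappa = Gamma(6/b) * Gamma(2/b) / Gamma(4/b)**2."""
    return math.exp(math.lgamma(6.0 / b) + math.lgamma(2.0 / b)
                    - 2.0 * math.lgamma(4.0 / b))
```

Because the denominator sums over the same destinations that define the possible movements, the values of `f1` over all destinations of one origin sum to one, regardless of how the holdings are distributed in space.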
Even though the generalized normal distribution is flexible in its shape, it may still not be satisfactory and it may be difficult to obtain reliable estimates on all relevant scales. We therefore compare this to a mixture model (M2) with both DD and MAM components given by the density
$$ w\, f_1(d_t \mid a_2, b_2) + (1 - w)\, f_2(d_t). $$
The mixing weight, w, denotes the proportion of all movements represented by the DD component (f1). For the MAM component (f2), the probability of a movement distance, d_t, is not dependent on the distance itself. It is instead uniform over all possible movements and the probability of observing movement t from i to j (i≠j) in a system of N holdings is given by
$$ f_2(d_t) = \frac{1}{N - 1}. $$
Accordingly, the likelihood function of each model is given by
$$ M_1:\quad L(\mathbf{d} \mid a_1, b_1) = \prod_{t=1}^{T} f_1(d_t \mid a_1, b_1) $$
$$ M_2:\quad L(\mathbf{d} \mid a_2, b_2, w) = \prod_{t=1}^{T} \left[ w\, f_1(d_t \mid a_2, b_2) + (1 - w)\, f_2(d_t) \right], $$
where $\mathbf{d} = (d_1, \ldots, d_T)$, although in practice we evaluate the likelihood through the use of auxiliary indicator variables to simplify the form of the conditional distribution in model M2 (see section 2.4).

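
As an illustration of the two likelihoods, the sketch below evaluates them on the log scale, which avoids numerical underflow when T is large. It assumes the per-movement values f1(d_t | a, b) have already been computed; the function and argument names are hypothetical.

```python
import math


def log_lik_m1(f1_values):
    """Log-likelihood of M1: sum over movements of log f1(d_t | a1, b1)."""
    return sum(math.log(p) for p in f1_values)


def log_lik_m2(f1_values, w, n_holdings):
    """Log-likelihood of M2: each movement is a mixture of the DD component
    (weight w) and the MAM component f2 = 1/(N-1) (weight 1-w)."""
    f2 = 1.0 / (n_holdings - 1)
    return sum(math.log(w * p + (1.0 - w) * f2) for p in f1_values)
```

Setting w = 1 recovers the M1 likelihood, which is why M1 can be regarded as the boundary case of M2.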
2.3 Priors and posterior probabilities
The posterior distribution of the models and model parameters is given by
$$ M_1:\quad f_1(a_1, b_1, M_1 \mid \mathbf{d}) \propto L(\mathbf{d} \mid a_1, b_1, M_1)\, p(a_1, b_1 \mid M_1)\, p(M_1) $$
$$ M_2:\quad f_2(a_2, b_2, w, M_2 \mid \mathbf{d}) \propto L(\mathbf{d} \mid a_2, b_2, w, M_2)\, p(a_2, b_2, w \mid M_2)\, p(M_2). $$
For example, $p(a_1, b_1 \mid M_1)$ denotes the prior probability of model M1 parameters (under model M1) and p(M1) is the prior probability of model M1. The posterior model probabilities are given by
$$ M_1:\quad f_1(M_1 \mid \mathbf{d}) = \int f_1(a_1, b_1, M_1 \mid \mathbf{d})\, da_1\, db_1 $$
$$ M_2:\quad f_2(M_2 \mid \mathbf{d}) = \int f_2(a_2, b_2, w, M_2 \mid \mathbf{d})\, da_2\, db_2\, dw. $$
Given the large amount of data available for this study, we choose to work with both non-informative parameter priors $p(a_1, b_1 \mid M_1) = p(a_2, b_2 \mid w, M_2) \propto 1$ (noting the factorization $p(a_2, b_2, w \mid M_2) = p(a_2, b_2 \mid w, M_2)\, p(w \mid M_2)$) and model priors p(M1)=p(M2)=1/2. The prior distribution for the mixing parameter, $p(w \mid M_2)$, is naturally modeled as w~Beta(α,β), under which we may adopt the Uniform distribution (α=β=1).

2.4 Markov Chain Monte Carlo estimation
Markov chain Monte Carlo (MCMC) methods were used to estimate the posterior parameter and model distributions. Programs were written in MATLAB R2007b. Monte Carlo based approaches utilize random draws from the posterior distribution, from which all inference on the models can be made. The basic idea of MCMC is to construct a Markov chain whose limiting distribution is the posterior distribution of interest. Here, writing $\theta_1 = (a_1, b_1)$ and $\theta_2 = (a_2, b_2, w)$, we are interested in the joint distribution of model and parameter vector $f_m(\theta_m, M_m \mid \mathbf{d})$, m=1,2. One such scheme is the Gibbs sampler (Casella and George 1992). In essence, a Gibbs sampler constructs a Markov chain based on simulation from model full conditional distributions. Under broad conditions, it can be shown that this chain converges to the target posterior distribution. If model conditional distributions are not available in closed form, Metropolis-Hastings steps may be substituted (Chib and Greenberg 1995). When the resulting Markov chain has converged, the simulated sample path represents (correlated) draws from this joint posterior. Further information on MCMC methods can be found in, for example, Gamerman and Lopes (2006).
To simplify the model M2 conditional distributions, we introduce auxiliary indicator variables $\mathbf{z} = (z_1, \ldots, z_T)$ (Gelman et al. 2004) such that $z_t = 1$ or $z_t = 0$ if movement t arises from the DD or MAM mixture components respectively. Under this parameterization we may rewrite the model M2 posterior distribution as
$$ f_2(a_2, b_2, w, \mathbf{z} \mid \mathbf{d}) \propto p(a_2, b_2, w \mid M_2)\, p(M_2) \prod_{t=1}^{T} \left[ w\, f_1(d_t \mid a_2, b_2) \right]^{z_t} \left[ (1 - w)\, f_2(d_t) \right]^{1 - z_t}. $$
Intuitively, w still represents the proportion of movements arising from the DD mixture component. Additionally, under this representation, the posterior probability of movement t arising from the DD mixture component may also be estimated (if required) as
$$ \Pr(z_t = 1 \mid \mathbf{d}) = \int z_t\, f_2(a_2, b_2, w, \mathbf{z} \mid \mathbf{d})\, da_2\, db_2\, dw\, d\mathbf{z}. $$
Within each iteration of the MCMC sampler we first performed parameter updates within the current model, before attempting to switch between models through a Metropolis-Hastings update. We now present the MCMC updates separately for within-model updates (parameters am, bm for m=1, 2, and w and z for model M2) and for between-model updates (i.e. between (a1, b1, M1) and (a2, b2, w, z, M2)).

2.4.1 Within Model Updates
Under model M2, full conditional distributions of w and z are available, and Gibbs updates may be used. The full conditional distribution of w is given by
$$ f(w \mid a_2, b_2, \mathbf{z}, \mathbf{d}, M_2) = f(w \mid \mathbf{z}, M_2) \propto w^{\sum_t z_t} (1 - w)^{T - \sum_t z_t}\, p(w \mid M_2), $$
which is a Beta($\sum_t z_t + \alpha$, $T - \sum_t z_t + \beta$) distribution. The full conditional distribution of each $z_t$ is Bernoulli($p_t$) with
$$ p_t = \frac{w\, f_1(d_t \mid a_2, b_2)}{w\, f_1(d_t \mid a_2, b_2) + (1 - w)\, f_2(d_t)}, \qquad t = 1, \ldots, T, $$
such that $\Pr(z_t = 1 \mid w, a_2, b_2, \mathbf{d}, M_2) = p_t$ and $\Pr(z_t = 0 \mid w, a_2, b_2, \mathbf{d}, M_2) = 1 - p_t$.
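
The two Gibbs updates can be sketched as below. This is not the original MATLAB implementation; it is a minimal illustration that assumes the per-movement values f1(d_t | a2, b2) are precomputed and that the Beta prior on w has parameters α and β (α=β=1 for the uniform prior). All names are hypothetical.

```python
import random


def gibbs_update_z(f1_values, w, f2, rng):
    """Draw each indicator z_t from Bernoulli(p_t), where p_t is the
    conditional probability that movement t belongs to the DD component."""
    z = []
    for p1 in f1_values:
        dd = w * p1
        p_t = dd / (dd + (1.0 - w) * f2)
        z.append(1 if rng.random() < p_t else 0)
    return z


def gibbs_update_w(z, alpha, beta, rng):
    """Draw w from its Beta(sum(z) + alpha, T - sum(z) + beta) full conditional."""
    n_dd = sum(z)
    return rng.betavariate(n_dd + alpha, len(z) - n_dd + beta)
```

One sweep would call `gibbs_update_z` with the current w and then `gibbs_update_w` with the new indicators, for example with `rng = random.Random(1)`.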
Under both models, the full conditionals for a and b are of non-standard form, and we used Metropolis-Hastings updates. For model Mm (m=1,2) we drew candidate parameters from a proposal distribution $(\tilde{a}_m, \tilde{b}_m) \sim q(a, b \mid M_m)$ and accepted these parameters as the next state of the Markov chain with probability
$$ \min\left\{ 1, \frac{f_1(\tilde{a}_1, \tilde{b}_1 \mid \mathbf{d})\, q(a_1, b_1 \mid M_1)}{f_1(a_1, b_1 \mid \mathbf{d})\, q(\tilde{a}_1, \tilde{b}_1 \mid M_1)} \right\} \quad \text{and} \quad \min\left\{ 1, \frac{f_2(\tilde{a}_2, \tilde{b}_2, w, \mathbf{z} \mid \mathbf{d})\, q(a_2, b_2 \mid M_2)}{f_2(a_2, b_2, w, \mathbf{z} \mid \mathbf{d})\, q(\tilde{a}_2, \tilde{b}_2 \mid M_2)} \right\} $$
for models M1 and M2 respectively. We discuss the choice of proposal distributions, q, below.

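
A Metropolis-Hastings step with a proposal that does not depend on the current state, as in the acceptance probabilities above, can be sketched generically as follows. The callables `log_post`, `propose` and `log_q` are hypothetical placeholders for the log posterior density, the proposal sampler and the log proposal density; working on the log scale is an implementation choice of this sketch, not something stated in the text.

```python
import math
import random


def mh_independence_step(current, log_post, propose, log_q, rng):
    """One Metropolis-Hastings update with a state-independent proposal.

    The candidate is accepted with probability
    min(1, post(cand) * q(current) / (post(current) * q(cand))).
    Returns the next state and a flag telling whether the candidate was accepted.
    """
    candidate = propose(rng)
    log_ratio = (log_post(candidate) + log_q(current)) \
        - (log_post(current) + log_q(candidate))
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) <= log_ratio:
        return candidate, True
    return current, False
```

For the models in this study, `current` would be the pair (a, b) and `log_post` the log of the relevant unnormalized posterior with w and z held fixed.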
2.4.2 Between model updates
MCMC also permits identification of the model that best describes the data (and quantifies how much better) through evaluation of posterior model probabilities. To move from the state (a1, b1) in model M1 to model M2 we drew new parameter values $(\tilde{a}_2, \tilde{b}_2, \tilde{w}, \tilde{\mathbf{z}})$ from the proposal $q(\tilde{a}_2, \tilde{b}_2, \tilde{w}, \tilde{\mathbf{z}}) = q(\tilde{\mathbf{z}} \mid \tilde{a}_2, \tilde{b}_2, \tilde{w})\, q(\tilde{w})\, q(\tilde{a}_2, \tilde{b}_2 \mid M_2)$ and accepted these as the new state in the Markov chain with probability
$$ \min\left\{ 1, \frac{f_2(\tilde{a}_2, \tilde{b}_2, \tilde{w}, \tilde{\mathbf{z}} \mid \mathbf{d})\, p(M_2)\, q(a_1, b_1 \mid M_1)}{f_1(a_1, b_1 \mid \mathbf{d})\, p(M_1)\, q(\tilde{a}_2, \tilde{b}_2, \tilde{w}, \tilde{\mathbf{z}})} \right\}. $$
Conversely, to move from the state $(a_2, b_2, w, \mathbf{z})$ in model M2 to model M1 we drew new parameter values $(\tilde{a}_1, \tilde{b}_1)$ from the proposal $q(\tilde{a}_1, \tilde{b}_1 \mid M_1)$, and accepted these as the new state in the Markov chain with probability
$$ \min\left\{ 1, \frac{f_1(\tilde{a}_1, \tilde{b}_1 \mid \mathbf{d})\, p(M_1)\, q(a_2, b_2, w, \mathbf{z})}{f_2(a_2, b_2, w, \mathbf{z} \mid \mathbf{d})\, p(M_2)\, q(\tilde{a}_1, \tilde{b}_1 \mid M_1)} \right\}. $$

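
The two between-model acceptance probabilities mirror each other, which the following sketch makes explicit on the log scale. The arguments are assumed to be log densities evaluated elsewhere (log posterior terms f1, f2 and log proposal densities q); the function names and the log-scale formulation are assumptions of this illustration.

```python
import math
import random

LOG_HALF = math.log(0.5)  # equal model priors p(M1) = p(M2) = 1/2


def log_accept_m1_to_m2(log_f2_cand, log_q_cand_m2, log_f1_curr, log_q_curr_m1,
                        log_p_m1=LOG_HALF, log_p_m2=LOG_HALF):
    """Log acceptance probability for a proposed move from M1 to M2."""
    log_ratio = (log_f2_cand + log_p_m2 + log_q_curr_m1) \
        - (log_f1_curr + log_p_m1 + log_q_cand_m2)
    return min(0.0, log_ratio)


def log_accept_m2_to_m1(log_f1_cand, log_q_cand_m1, log_f2_curr, log_q_curr_m2,
                        log_p_m1=LOG_HALF, log_p_m2=LOG_HALF):
    """Log acceptance probability for a proposed move from M2 to M1."""
    log_ratio = (log_f1_cand + log_p_m1 + log_q_curr_m2) \
        - (log_f2_curr + log_p_m2 + log_q_cand_m1)
    return min(0.0, log_ratio)


def accept(log_alpha, rng):
    """Accept with probability exp(log_alpha); 1 - random() lies in (0, 1]."""
    return math.log(1.0 - rng.random()) <= log_alpha
```

When the log posterior of the M2 state exceeds that of the M1 state by a wide margin, the second function returns a large negative value, so the chain effectively never moves to M1.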
2.4.3 Metropolis-Hastings proposal functions
Both between- and within-model Metropolis-Hastings parameter updates require the specification of proposal functions. The presence of MAM in M2 was expected to change the joint probability distribution of a and b. Accordingly, separate proposal functions q(am,bm|Mm) were used for each model. This becomes especially important when moving between models. Under each model, we performed a preliminary analysis using independent Gaussian random-walk Metropolis-Hastings updates for both a and b. The posterior means, μa and μb, and covariance matrix, Σ, were calculated from the sampler output. In subsequent simulations we utilized Multivariate Gaussian proposals for (am,bm)~q(am,bm|Mm) with the estimated means and covariance matrix as proposal parameters, but doubling the scale of Σ to ensure a proposal distribution that fully covers the posterior. This proposal function promotes faster mixing of the Markov chain both for within-model and (crucially) between-model updates than simpler alternatives. For the remaining components of the between-model move proposal $q(\tilde{a}_2, \tilde{b}_2, \tilde{w}, \tilde{\mathbf{z}}) = q(\tilde{\mathbf{z}} \mid \tilde{a}_2, \tilde{b}_2, \tilde{w})\, q(\tilde{w})\, q(\tilde{a}_2, \tilde{b}_2 \mid M_2)$ we used the following. For q(w) we implement a Beta(10,1) proposal which places most density on large values of w. For each element of z we used the Bernoulli full conditional distributions described in the within-model updates section above, conditional upon the proposed values for a, b, and w.

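
Constructing the Gaussian proposal from pilot-run output can be sketched as follows. The sketch interprets "doubling the scale of Σ" as multiplying the covariance matrix by two, which is an assumption; the function names and the representation of samples as (a, b) tuples are likewise hypothetical.

```python
import math
import random


def fit_proposal(samples, scale=2.0):
    """Estimate mean and covariance of (a, b) from pilot samples and inflate
    the covariance by `scale` (assumed meaning of 'doubling the scale')."""
    n = len(samples)
    mu_a = sum(s[0] for s in samples) / n
    mu_b = sum(s[1] for s in samples) / n
    var_a = sum((s[0] - mu_a) ** 2 for s in samples) / (n - 1)
    var_b = sum((s[1] - mu_b) ** 2 for s in samples) / (n - 1)
    cov_ab = sum((s[0] - mu_a) * (s[1] - mu_b) for s in samples) / (n - 1)
    cov = ((scale * var_a, scale * cov_ab), (scale * cov_ab, scale * var_b))
    return (mu_a, mu_b), cov


def draw_proposal(mean, cov, rng):
    """Draw (a, b) from the bivariate Gaussian via a 2x2 Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return mean[0] + l11 * e1, mean[1] + l21 * e1 + l22 * e2


def log_q(x, mean, cov):
    """Log density of the bivariate Gaussian proposal at x = (a, b)."""
    det = cov[0][0] * cov[1][1] - cov[0][1] ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    quad = (cov[1][1] * dx * dx - 2.0 * cov[0][1] * dx * dy
            + cov[0][0] * dy * dy) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad
```

Because the proposal does not depend on the current state, the same `draw_proposal` and `log_q` can serve both the within-model updates and the (a, b) part of the between-model proposals.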
2.5 Network generation
Four sets of movement networks were generated (the combinations of two species and two models), each with 100 replicates. Only holdings carrying the focal species were used. For each replicate, movements were created by first picking one arbitrary holding and subsequently picking the second one randomly with contact probability given by the likelihood of the respective model (M1 or M2). Each network was generated with as many movements as the number of observed movements included for each species. Parameters w, a and b were picked randomly from the result of the MCMC. A network represents holdings as nodes, which are connected via animal movements, represented by network links. We use the terms nodes and links when we refer to networks generally, but use the terms holdings and movements when discussing implications specific to the contacts considered in the study.

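
The generation procedure can be sketched as below. This is an illustrative reading of the description, not the original code: the origin holding is drawn uniformly, the destination is drawn with the model probabilities (w = 1 giving M1), and links are stored without direction. In the study, a, b and w would be drawn from the MCMC output for each replicate; here they are plain arguments, and all names are hypothetical.

```python
import math
import random


def destination_probs(i, coords, a, b, w=1.0):
    """Probabilities of each holding k != i being the destination of a movement
    from holding i: w * (normalized kernel) + (1 - w) / (N - 1)."""
    n = len(coords)
    xi, yi = coords[i]
    others = [k for k in range(n) if k != i]
    kern = [math.exp(-((math.hypot(coords[k][0] - xi, coords[k][1] - yi) / a) ** b))
            for k in others]
    total = sum(kern)
    probs = [w * kv / total + (1.0 - w) / (n - 1) for kv in kern]
    return others, probs


def generate_network(coords, n_moves, a, b, w, rng):
    """Simulate n_moves movements and return the set of undirected links."""
    links = set()
    for _ in range(n_moves):
        i = rng.randrange(len(coords))            # arbitrary origin holding
        others, probs = destination_probs(i, coords, a, b, w)
        j = rng.choices(others, weights=probs, k=1)[0]
        links.add((min(i, j), max(i, j)))         # repeated movements count once
    return links
```

Storing links in a set means that several movements between the same pair of holdings produce a single link.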
2.6 Network Analysis
We used network measurements to explore the differences between the models. Our concern was possible differences in predictions of the spatial aspect of disease spread dynamics via the animal movements analyzed. The models may predict different distance-related probabilities of contacts which lead to different contact structures, represented by networks and analyzed with network measurements. Since we only use the network measurements for between-model comparison, we analyzed the networks without including the direction of the links. While such information can be incorporated in some measurements, we argue that in this study it would not have provided any extra insight for contrasting patterns between models. Four network measurements (Density, Fragmentation Index, Group Betweenness Centralization and Clustering Coefficient) were calculated for each network replicate using NetworkX (version 0.36). These measurements provide different information on the structure of the considered contacts.
Density indicates how connected the nodes are. A value of one means that all theoretical connections are realized. The Fragmentation Index measures how disconnected a network is by taking into account the size of components (Bell et al. 1999). A high value indicates that there are many isolated components between which no contacts occur through the considered contacts. Group Betweenness Centralization measures the heterogeneity in how central nodes are. A node is central if it is located such that the shortest path between many other nodes passes through it. The shortest path is the minimum number of links between two nodes. A low value means that all nodes have the same betweenness and the maximum value of one is reached for a star graph where there is only one central node to which all other nodes are connected (Wasserman and Faust 1994). The Clustering Coefficient (Watts and Strogatz 1998) measures aggregation by looking at the probability that two neighbors B and C of a node A are also connected to each other.

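
Three of the measures can be illustrated without any external library. The sketch below is not the NetworkX code used in the study. It assumes the average local clustering coefficient of Watts and Strogatz, and for the Fragmentation Index it assumes the component-size form 1 − Σ s(s−1) / (N(N−1)), where s runs over component sizes; the exact definition used in the study may differ. Group Betweenness Centralization is omitted for brevity, and all names are hypothetical.

```python
def build_adjacency(n_nodes, links):
    """Undirected adjacency sets from a collection of (i, j) links."""
    adj = {v: set() for v in range(n_nodes)}
    for i, j in links:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def density(adj):
    """Realized links divided by the number of possible links."""
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) / 2.0
    return 2.0 * m / (n * (n - 1))


def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbor pairs that are linked;
    nodes with fewer than two neighbors contribute zero."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        nbl = list(nb)
        linked = sum(1 for x in range(k) for y in range(x + 1, k)
                     if nbl[y] in adj[nbl[x]])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adj)


def component_sizes(adj):
    """Sizes of connected components, found by depth-first search."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    stack.append(u)
        sizes.append(size)
    return sizes


def fragmentation_index(adj):
    """Share of node pairs that cannot reach each other (assumed definition)."""
    n = len(adj)
    return 1.0 - sum(s * (s - 1) for s in component_sizes(adj)) / (n * (n - 1))
```

For a triangle plus one isolated node, half of the possible links are realized, three of the four nodes have fully linked neighbors, and half of the node pairs are unreachable from each other.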
3. Results
3.1 Model probabilities, variance, kurtosis and mixing parameters
The MCMC simulations never switched to M1 for either species. Hence, for the observed data the posterior probability of M1 was estimated to be inconsequential compared to that for M2. Consequently the posterior distribution of M1 was estimated via a separate within-model simulation without implementing between-model updates. When using smaller datasets switching occurred frequently, and we are reassured that the overwhelming support for M2 was not an effect of problems in the code or model setup.
Posterior distributions of variance and kurtosis of M1 and the DD component of M2 (here denoted M2DD) are shown in Figure 2 and Figure 3 respectively. Means and 95% credibility intervals (shown in brackets) of variance estimates were for pigs 4.12×10^11 [3.39×10^11, 4.99×10^11] m^2 under M1 and 3.21×10^10 [3.06×10^10, 4.03×10^10] m^2 for M2DD. Corresponding values for cattle were lower for both models, 4.17×10^10 [3.98×10^10, 4.36×10^10] m^2 for M1 and 1.10×10^10 [1.04×10^10, 1.16×10^10] m^2 for M2DD. Hence, a larger proportion of long-distance contacts was predicted for pigs.
The reversed relation was found for kurtosis with pigs estimated as 32.6 [29.2, 36.4] under M1 and 10.6 [9.6, 11.8] under M2DD compared to 42.9 [41.7, 44.1] and 27.8 [27.0, 28.7] respectively for cattle.
The mixing parameter w for M2 was estimated to be 0.879 [0.869, 0.889] for pigs and 0.940 [0.937, 0.943] for cattle. Hence, a significantly larger proportion of MAM was estimated for pigs. Posterior distributions of w are shown in Figure 4. As the posterior density of w was clearly bounded away from w=1 for both pigs and cattle, this supports the overwhelming rejection of M1 in favor of M2.
Since both the size of the MAM component of M2 and the variance of M2DD and M1 were larger for pigs, the results were clear in showing more long-distance contacts for pigs. Hence there was less distance-related deviation from the MAM assumption for pigs than for cattle.

3.2 Differences between models
Figure 5 shows observed movement distances and how predictions differed between models M1 and M2. Comparing the number of estimated contacts shorter than 50 km, it can be concluded that M2 predicted more short-distance contacts, in particular for pigs. Model M2 also predicted more long-distance contacts (>400 km), in particular for cattle. Comparing predicted and observed long-distance movements it can be seen that, at least for cattle, these were underestimated by M1.
Figure 6 shows differences between the effects of the models in a network context. M2 generated networks with slightly lower Density and higher Clustering Coefficient, hence predicting fewer contacts between holdings. Group Betweenness Centralization did not differ between M1 and M2 generated pig networks but was lower for model M2 than M1 within cattle networks, indicating that there was a larger diversity in the role of individual holdings in the contact pattern. No trend could be seen in the Fragmentation Index, thus models M1 and M2 predicted the same number of isolated holdings or small network fragments.

4. Discussion

4.1 Model differences
Our results showed, not surprisingly, that the estimation of M2DD attained a lower value of both kurtosis and variance compared to M1. These quantities increase with the proportion of long-distance movements, which to a large extent were assigned to M2MAM (the MAM part of M2) in estimation of the mixture model M2. Comparing expected movements at long distances, these were more common under M2 (Figure 5). Long-distance contacts can have a dramatic effect on disease transmission since they have the potential to carry the disease far. For instance, long-distance animal movements contributed to rapid and extensive spread during the UK 2001 FMD outbreak (Ferguson et al. 2001, Griffin and O'Reilly 2003, Wilesmith et al. 2003). Reliable estimation of these contacts is therefore of particular importance.
However, at least for cattle, long-distance movements were hard to estimate with M1. Even though high values of kurtosis (i.e. a distribution with a fat tail and concurrently high probability at short distances) were estimated with M1, the analyzed data were better described using some fraction of MAM. Long-distance contacts were rarer for cattle and therefore had less effect on the parameter estimations under M2. For pigs, this effect was less clear. However, M1 underestimated contacts at short distances (Figure 5). Hence, our study demonstrates the difficulties of using a single distribution to give reliable estimations of both long- and short-distance contacts.
In this study the results were based on a large amount of data which allowed for reliable estimation of both parameters and model probabilities. Our analysis clearly showed that the mixture model, M2, was a better model for the available data. The MCMC never switched to M1 and the posterior distribution of w (under M2) was clearly separated from 1, where M1 is defined. The preference for M2 can be interpreted as either being technical or conceptual. In the former case, the preference is assumed to have arisen due to the difficulties mentioned above of using a single distribution for good description on all scales. If conceptually accurate, we may conclude that there are actually two different processes, one dependent on distance and one independent. Such a statement can however not be made with certainty, since our analysis only tested which model best described the data. In addition, even though the generalized normal distribution has a flexible shape, we cannot be assured that it correctly described the assumed DD part. It is however likely that different processes are of different importance for the probability of contacts over short and long distances. This may add to the difficulties of using a single distribution to describe observed data.

4.2 Species differences
Since the aim of this study was not to analyze the real animal movement networks of pigs and cattle in Sweden we avoid drawing conclusions from the network measurements for comparison of the two species. We can however make statements about the contrast in movement distances. Pigs were estimated to be moved longer distances, with higher variance of M1 and M2DD as well as higher M2MAM compared to cattle. Consequently, our results indicate (independent of model) that diseases affecting pigs generally can be expected to show a larger proportion of long-distance transmissions through these contacts and less deviation from the MAM assumption. It should however be pointed out that this may not be true for other countries where farming practices and contact patterns may differ.

4.3 Network structure and spread of disease
Studies indicate that network structure is an important factor for the spread of diseases and there is an increasing interest in network models for epidemiological research (Ortiz-Pelaez 2006, Eubank et al. 2004, Meyers et al. 2003). The structure of a network can be described by a variety of network measurements brought from social network analysis and based on graph theory (Wasserman and Faust 1994). Theory regarding network measures states that they provide information about the dynamics of disease spread through the contacts analyzed (Bell et al. 1999). Therefore, rather than exploring the expected dynamics of a specific disease, we focused on what could be implied by differences in the predicted contact patterns per se. The intent was to show what could be expected in a network context from distance-dependent contact probabilities and not to describe the complete network of live animal movement.
Networks generated with models M1 and M2 differed in three of the measures (Fragmentation Index showed no differences for either species). Hence we may assume between model differences in predictions of disease transmission dynamics. In interpreting these differences, we use the assumptions of SI and SIR models that a holding can only be infected once and that an infected holding never returns to a susceptible state. Further, we assume that all contacts between infected and susceptible holdings have the same probability of transmitting a disease and that no new holdings are added to the system. Also note that we used network measures as a tool to compare model differences, and our results should not be interpreted as statements about real contact networks. This distinction is particularly important when disease transmission may occur through other contacts.
Model M2 generated networks with higher Clustering Coefficients. This was due to the higher probability of contacts at short distances, which results in nearby holdings being more likely to be connected. If holding A is connected to nearby holdings B and C, these are in turn also more likely to be connected to each other. Disease transmission through networks with higher Clustering Coefficients will show a more rapid depletion of susceptibles (Keeling 2005). If A transmits a disease to connected holdings B and C, the number of susceptible contacts decreases not only for A but also for B and C.
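The A, B, C argument above can be illustrated with a small sketch. It assumes an undirected network and the mean local clustering coefficient (share of a holding's neighbor pairs that are themselves linked); the exact definition used for the Clustering Coefficient in the analysis may differ, and the function name is hypothetical.

```python
from itertools import combinations

def clustering_coefficient(links):
    """Mean local clustering coefficient of an undirected network.

    Assumption: links are unordered pairs of holding labels; this is an
    illustrative definition, not necessarily the one used in the paper.
    """
    neighbours = {}
    for a, b in links:
        if a == b:
            continue  # ignore self-links
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    local = []
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            local.append(0.0)  # no neighbor pairs to close
            continue
        closed = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
        local.append(closed / (k * (k - 1) / 2))
    return sum(local) / len(local)

# A is linked to B and C; B and C are not linked to each other.
open_triplet = [("A", "B"), ("A", "C")]
# Same network with the short distance link B-C added.
closed_triplet = open_triplet + [("B", "C")]

print(clustering_coefficient(open_triplet))    # 0.0
print(clustering_coefficient(closed_triplet))  # 1.0
```

In the closed case every contact of an infected holding is also a contact of another infected holding, which is the mechanism behind the faster local depletion of susceptibles.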
The higher probability of contacts at shorter distances also resulted in the observed lower Density for networks generated with M2. Links are counted only once in network measures, so Density was reduced when more than one movement occurred between the same pair of holdings. If holdings can only be infected once (as is assumed here), a holding infecting its neighbor (in terms of network links) will deplete the number of susceptible contacts, and with lower Density this effect is more pronounced.
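A minimal sketch of why repeated movements lower Density, assuming an undirected network in which each pair of holdings contributes at most one link. The function name and the toy movement lists are hypothetical and not taken from the data.

```python
def density(movements, n_holdings):
    """Share of possible holding pairs that are linked.

    Assumption: links are undirected and counted once, however many
    movements occurred between the same two holdings.
    """
    links = {frozenset((a, b)) for a, b in movements if a != b}
    possible = n_holdings * (n_holdings - 1) / 2
    return len(links) / possible

# Three movements among four holdings, each between a different pair.
spread_out = [(0, 1), (1, 2), (2, 3)]
# Three movements, two of them between the same nearby pair.
repeated = [(0, 1), (1, 0), (2, 3)]

print(density(spread_out, 4))  # 0.5
print(density(repeated, 4))
```

With the same number of movements, the second list yields fewer distinct links and therefore a lower Density.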
For cattle, Group Betweenness Centralization was lower for networks generated with M2. This was a result of the large difference between models in long distance contacts. The shortest paths between geographically distant holdings were highly dependent on long distance contacts. These were more common under M2 (especially for cattle). Hence, contacts between distant holdings were not dependent on only a few central holdings. Long distance contacts have the ability to spark new infections in distant areas where local depletion of susceptibles has not yet occurred. Such dynamics appeared in the UK 2001 FMD outbreak (Keeling et al. 2001).
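The dependence of shortest paths on long distance contacts can be sketched with a toy example. The code below does not compute Group Betweenness Centralization itself; it only compares the mean shortest path length in a hypothetical chain of ten holdings with and without a single long distance link, assuming undirected links.

```python
from collections import deque

def mean_shortest_path(n, links):
    """Mean shortest path length over all connected pairs (breadth-first search)."""
    adjacent = {i: set() for i in range(n)}
    for a, b in links:
        adjacent[a].add(b)
        adjacent[b].add(a)
    total = 0
    pairs = 0
    for source in range(n):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacent[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target > source:  # count each pair once
                total += d
                pairs += 1
    return total / pairs

# Ten holdings linked only to their nearest neighbors along a line.
local_links = [(i, i + 1) for i in range(9)]
# The same network plus one long distance contact between the two ends.
with_long_link = local_links + [(0, 9)]

print(mean_shortest_path(10, local_links))
print(mean_shortest_path(10, with_long_link))
```

A single long distance link shortens paths between distant holdings, so fewer paths have to pass through the holdings in the middle of the chain.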

4.4 Data
While it was not the aim of this paper to give a detailed description of the data used, it needs to be pointed out that the data was not perfect. Some movements were carried out between holdings where at least one had unknown coordinates and hence could not be used. The movement data was based on farmers reporting movements, and some reports may have been erroneous. In addition, many holdings did not send or receive any movements and could possibly be inactive, but this could not be concluded from the available data.
Yet, a large amount of data was available for the analysis, and we do not expect any systematic pattern in the missing values that may have influenced the result. Neither do we expect the inclusion of possibly inactive holdings to have influenced the results. If there was no geographical pattern in these holdings, they would only affect the absolute value of the likelihood, thereby leaving our conclusion unchanged. Also, the network measures analyzed the between model differences, and including or excluding holdings is likely to have had the same effect on networks generated by either model.

4.5 Implications and possible extensions of the analysis and model
Figure 5 shows observed and predicted (for M1 and M2) movement distances. Note that the decrease in predicted movements with distance is due both to the decrease in probability of contacts and to the smaller number of holdings located at long distances (>400 km). The DD part was normalized over all possible destination holdings (Equation 1), and thereby the spatial distribution of holdings was included in the model. Hence, holdings located in areas with a higher density of holdings would be estimated to have more short distance movements, and the opposite would be found for holdings in low density areas. We argue that this is necessary for the credibility of the models. It is, however, possible to use some other distribution to describe the DD part. We do believe that the generalized normal distribution used in this study is a good choice, given its flexible shape (see Figure 1). Yet, as mentioned above, our study shows the difficulties of using a single distribution. Good estimates for both short and long distances are important for models of livestock disease, and our study shows that using mixture models with both MAM and DD can be a good approach. The pattern may of course differ between countries, but we argue that any model where all contacts are assumed to be dependent on distance needs to be used with caution unless that assumption is tested. This applies both when models are used to create general preventive guidelines and when modeling specific diseases. In the latter case, other possible routes of transmission should be included for most diseases (e.g. professional farm visits, other types of transports, shared equipment). In this study we have focused on live animal movement, but the method presented can be applied to other contact types as well if distance is an important factor.
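The following sketch illustrates how a mixture of a normalized DD part and a MAM part assigns destination probabilities. It is written under stated assumptions and is not the fitted model: the kernel form exp(-(d/scale)^shape), the parameter names scale and shape, and the toy coordinates are hypothetical, whereas the model in this study is parameterized by variance and kurtosis (Equation 1). Only the weight w follows the text, with 1-w being the MAM proportion.

```python
import math

def destination_probabilities(origin, coords, w, scale, shape):
    """Probability of each holding being the destination of a movement from origin.

    Assumptions: the DD part uses a kernel exp(-(d / scale) ** shape),
    normalized over all possible destinations; the MAM part is uniform
    over all other holdings; w is the weight of the DD part.
    """
    others = [j for j in range(len(coords)) if j != origin]
    x0, y0 = coords[origin]
    kernel = {}
    for j in others:
        d = math.hypot(coords[j][0] - x0, coords[j][1] - y0)
        kernel[j] = math.exp(-((d / scale) ** shape))
    norm = sum(kernel.values())   # normalization over destination holdings
    uniform = 1.0 / len(others)   # distance independent part
    return {j: w * kernel[j] / norm + (1.0 - w) * uniform for j in others}

# Hypothetical holdings (coordinates in km): three nearby and one 400 km away.
coords = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (400.0, 0.0)]

dd_only = destination_probabilities(0, coords, w=1.0, scale=20.0, shape=1.0)
mixture = destination_probabilities(0, coords, w=0.9, scale=20.0, shape=1.0)

print(dd_only[3], mixture[3])
```

Because the kernel is normalized over the actual destinations, the same kernel gives different probabilities in dense and sparse areas, and even a small MAM weight raises the probability of the distant holding by orders of magnitude.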
Bayesian analysis with MCMC is well suited for epidemiological studies and is being used with increasing frequency in this area (O'Neill et al. 2000, Streftaris and Gibson 2004). It is a flexible tool, which allows for extension of the model. The probability of contact between two holdings does of course not depend solely on distance. Holdings with different production types will likely have different contact patterns, and a larger holding might have more contacts than a smaller one. The model can be expanded to incorporate holding specific data when available. This can give more exact information on which holdings are most likely to have contacts, and thereby give information about both the importance of individual holdings and the dynamics of disease transmission through the contacts. Including both holding specific data and geographical information could also provide more precise tools for quickly estimating possible contacts and could be used as a complement to collecting observed contacts once a holding is found to be infected.
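As a minimal illustration of the MCMC approach, the sketch below samples the mixing parameter w with a random-walk Metropolis algorithm. It is a simplification under stated assumptions and not the sampler used in this study: the kernel parameters are held fixed, the prior on w is uniform on (0, 1), and the per-movement probabilities under the DD and MAM parts are synthetic values chosen for illustration.

```python
import math
import random

def log_likelihood(w, p_dd, p_mam):
    """Log-likelihood of the movements under a mixture with DD weight w."""
    return sum(math.log(w * d + (1.0 - w) * m) for d, m in zip(p_dd, p_mam))

def sample_w(p_dd, p_mam, n_iter=5000, burn_in=1000, step=0.05, seed=1):
    """Random-walk Metropolis sampler for w (uniform prior on (0, 1) assumed)."""
    rng = random.Random(seed)
    w = 0.5
    current = log_likelihood(w, p_dd, p_mam)
    samples = []
    for i in range(n_iter):
        proposal = w + rng.gauss(0.0, step)
        if 0.0 < proposal < 1.0:  # proposals outside the prior support are rejected
            candidate = log_likelihood(proposal, p_dd, p_mam)
            # Symmetric proposal: accept with probability min(1, likelihood ratio).
            if rng.random() < math.exp(min(0.0, candidate - current)):
                w, current = proposal, candidate
        if i >= burn_in:
            samples.append(w)
    return samples

# Synthetic example: 80 short movements that the DD part explains well and
# 20 long movements that are essentially impossible under the DD part.
p_dd = [0.5] * 80 + [1e-9] * 20
p_mam = [0.01] * 100

samples = sample_w(p_dd, p_mam)
posterior_mean = sum(samples) / len(samples)
print(posterior_mean)
```

Extending such a sampler to holding specific covariates would mainly change log_likelihood, which is one reason the approach is flexible.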

5. Conclusion
Modeling of livestock disease often requires good estimates of how contacts between holdings vary with distance. When using a single distribution it may be hard to estimate contacts at all scales. In this study we analyzed animal movements and found that a mixture model with both distance dependent and distance independent contact probabilities described the data better than a purely distance dependent model. Also, since the models differed in predictions of contact structures, the extra information provided by the mixture model is expected to be important if used for predictions of disease transmission through the contacts. We conclude that the mixture model is a good approach and advise that models fitted without special attention to contacts on different scales should be used with caution.

Acknowledgement
We would like to thank the Swedish Emergency Management Agency for funding and the Swedish Board of Agriculture for supplying the data used. We would also like to thank Nina Håkansson (Research Centre of Systems Biology, University of Skövde) for contributing to the network analysis used in the paper. Additionally, we thank two anonymous reviewers for valuable comments.

Conflict of interest
None.

References
Bell, D.C., Atkinson, J.S., Carlson, J.W., 1999. Centrality measures for disease transmission networks. Soc. Networks. 21, 1-21.
Bessell, P.R., Shaw, D.J., Savill, N.J., Woolhouse, M.E.J., 2008. Geographic and topographic determinants of local FMD transmission applied to the 2001 UK FMD epidemic. BMC Vet. Res. 4:40.
Boender, G.J., Meester, R., Gies, E., De Jong, M.C.M., 2007. The local threshold for geographical spread of infectious diseases between farms. Prev. Vet. Med. 82, 90-101.
Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. Amer. Statistician. 46, 167-174.
Chib, S., Greenberg, E., 1995. Understanding the Metropolis-Hastings algorithm. Amer. Statistician. 49, 327-335.
Eubank, S., Guclu, H., Anil Kumar, V.S., Marathe, M.V., Srinivasan, A., Toroczkai, Z., Wang, N., 2004. Modelling disease outbreaks in realistic urban social networks. Nature. 429, 180-184.
Ferguson, N.M., Donnelly, C.A., Anderson, R.M., 2001. The foot-and-mouth epidemic in Great Britain: Pattern of spread and impact of interventions. Science. 292, 1155-1160.
Gamerman, D., Lopes, H.F., 2006. Markov chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Second Edition, Chapman and Hall/CRC Press.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis (2nd Edition). Chapman & Hall/CRC. Chapter 18.
Griffin, J.M., O'Reilly, P.J., 2003. Epidemiology and control of an outbreak of foot-and-mouth disease in the Republic of Ireland in 2001. Vet. Rec. 152, 705-712.
Kao, R.R., Green, D.M., Johnson, J., Kiss, I.Z., 2007. Disease dynamics over very different time-scales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R. Soc. Interface. 4, 907-916.
Keeling, M.J., 1999. The effects of local spatial structure on epidemiological invasions. Proc. Roy. Soc. London B. 266, 859-869.
Keeling, M.J., Woolhouse, M.E., Shaw, D.J., Matthews, L., 2001. Dynamics of the 2001 UK foot and mouth epidemic: Stochastic dispersal in a dynamic landscape. Science. 294, 813-817.
Keeling, M., 2005. The implications of network structure for epidemic dynamics. Theo. Pop. Bio. 67, 1-8.
Lindström, T., Håkansson, N., Westerberg, L., Wennergren, U., 2008. Splitting the tail of the displacement kernel shows the unimportance of kurtosis. Ecology. 89, 1784-1790.
Meyers, L.A., Newman, M.E.J., Martin, M., Schrag, S., 2003. Applying network theory to epidemics: control measures for Mycoplasma pneumoniae outbreaks. Emerg. Infect. Dis. 9:2.
Mollison, D., Isham, V., Grenfell, B., 1993. Epidemics: Models and Data. J. R. Statist. Soc. B. 157, 115-149.
Nadarajah, S., 2005. A generalized normal distribution. Appl. Statist. 32, 685-694.
O'Neill, P.D., Balding, D.J., Becker, N.G., Eerola, M., Mollison, D., 2000. Analyses of infectious disease data from household outbreaks by Markov Chain Monte Carlo methods. Appl. Stat. 49, 517-542.
Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J., Guitian, F.J., 2006. Use of social network analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 75, 40-55.
Robinson, S.E., Christley, R.M., 2007. Exploring the role of auction markets in cattle movements within Great Britain. Prev. Vet. Med. 81, 21-37.
Savill, N.J., Shaw, D.J., Deardon, R., Tildesley, M.J., Keeling, M.J., Woolhouse, M.E.J., Brooks, S.P., Grenfell, B.T., 2006. Topographic determinants of foot and mouth disease transmission in the UK 2001 epidemic. BMC Vet. Res. 2:3.
Streftaris, G., Gibson, G.J., 2004. Bayesian analysis of experimental epidemics of foot-and-mouth disease. Proc. R. Soc. Lond. B. 271, 1111-1117.
Velthuis, A.G., Mourits, M.C., 2007. Effectiveness of movement-prevention regulations to reduce the spread of foot-and-mouth disease in The Netherlands. Prev. Vet. Med. 82, 262-281.
Wasserman, S., Faust, K., 1994. Social network analysis: methods and applications. Cambridge University Press, Cambridge.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of “small-world” networks. Nature. 393, 440-442.
Wilesmith, J.W., Stevenson, M.A., King, C.B., Morris, R.S., 2003. Spatio-temporal epidemiology of foot-and-mouth disease in two counties of Great Britain in 2001. Prev. Vet. Med. 61, 157-170.

Figure captions
Figure 1. Generalized normal distribution with 2-dimensional variance 10⁹ m² (left) and 2·10⁹ m² (right) and kurtosis 1.5 (solid), 2 (dotted), and 4 (dashed).
Figure 2. Posterior probability distribution of kernel variance estimated for movement distances between holdings of pigs (solid line) and cattle (dotted) for (a) the distance dependent model and (b) the model with a mixture of distance dependence and independence. Estimation is based on movements in Sweden from 1st of July 2005 until 30th of June 2006. Note that the x-axis scales are different.
Figure 3. Posterior probability distribution of kernel kurtosis estimated for movement distances between holdings of pigs (solid line) and cattle (dotted) for (a) the distance dependent model and (b) the model with a mixture of distance dependence and independence. Estimation is based on movements in Sweden from 1st of July 2005 until 30th of June 2006.
Figure 4. Posterior distribution of the mixing parameter w for the model assuming that movements arise as a mixture of both distance dependent and distance independent processes. Estimated for pigs (solid line) and cattle (dotted) based on movements between holdings in Sweden from 1st of July 2005 until 30th of June 2006. The estimated proportion of the mass action mixing (distance independent) part is 1-w.
Figure 5. Observed movement distances (histograms) of cattle (a) and pigs (b) with model predictions of the distance dependent model (dotted) and the model with a mixture of distance dependence and independence (solid). Large axes show distances between 0 and 400 km; axes in the smaller (embedded) graphs show distances >400 km. Model predictions are given as the mean of 1000 replicates. Data contains animal movements in Sweden from 1st of July 2005 until 30th of June 2006.
Figure 6. Histograms of four network measurements for networks (100 replicates) generated with movements predicted by the distance dependent model (black bars) and the model with a mixture of distance dependence and independence (white bars). Clear differences are found for both species in Clustering Coefficient and Density as well as in Group Betweenness Centralization of cattle networks. Models were fitted to animal movements in Sweden from 1st of July 2005 until 30th of June 2006.