Title: Estimation of distance related probability of animal movements between holdings and implications for disease spread modeling.

Authors: Tom Lindström (a), Scott A. Sisson (b), Maria Nöremark (c), Annie Jonsson (d) and Uno Wennergren (a)

(a) IFM Theory and Modelling, Linköping University, 581 83 Linköping, Sweden
(b) School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
(c) Department of Disease Control and Epidemiology, SVA, National Veterinary Institute, 751 89 Uppsala, Sweden
(d) Research Centre of Systems Biology, Ecological modelling, University of Skövde, 541 28 Skövde, Sweden

Corresponding Author:
Uno Wennergren
Tel: +46 13 28 16 66
Fax: +46 13 28 13 99
Email: unwen@ifm.liu.se
Correspondence address: See above.

Abstract
Between-holding contacts are more common over short distances, and this may have implications for the dynamics of disease spread through these contacts. A good estimate of how contacts depend on distance is therefore important when modeling livestock diseases. In this study, we have developed a method for analyzing distance-dependent contacts and applied it to animal movement data from Sweden. The data were analyzed with two competing models. The first model assumes that contacts arise from a purely distance-dependent process. The second is a mixture model and assumes that, in addition, some contacts arise independently of distance. Parameters were estimated with a Bayesian Markov Chain Monte Carlo (MCMC) approach, and the model probabilities were compared. We also investigated possible between-model differences in predicted contact structures, using a collection of network measures.
We found that the mixture model was a much better model for the data analyzed. The network measures also showed that the models differed considerably in their predictions of contact structures, which is expected to be important for disease spread dynamics.
We conclude that a model with contacts being both dependent on, and independent of, distance was preferred for modeling the example animal movement contact data.

Key words
Markov Chain Monte Carlo; Mixture Models; Model Selection; Animal Movements; Disease Transmission; Network Analysis

1. Introduction
Pathogens can spread between animal holdings both through direct animal contact and through indirect contacts such as persons, vehicles, equipment and products of animal origin. Although some diseases, such as Foot and Mouth Disease (FMD), spread easily through several different routes, direct contact through relocation of infected animals is almost always one of the most important routes of spread. Analyzing contacts between holdings makes it possible to predict disease transmission (Kao et al. 2007) and to test the effect of changed contact patterns (Velthuis and Mourits 2007). In this study we focus on contacts through live animal movements and present a method to analyze the spatial aspect of animal movements. The method can, however, be applied to most types of between-holding contacts, either directly or with minor modifications, if such data are available.
A possible approach for modeling disease spread is to assume that all between-holding contacts are equally probable, hence fulfilling the criteria of mass-action mixing (MAM). From a disease transmission perspective this means that one infected holding can infect all other holdings with equal probability (assuming equal probability that a contact will result in infection). While such an assumption may provide theoretical epidemiological insight, in most instances contacts do not occur according to MAM. Consequently, the dynamics of epidemics will likely deviate from the predictions made under such an assumption (Keeling 2005). Mollison et al.
(1993) identified two main deviations from MAM assumptions: those related to population differences (in this case, differences in holding characteristics) and those caused by mixing heterogeneities.
This paper focuses on the latter, and in particular on heterogeneity caused by distances between holdings. Between-holding contacts, including the live animal movements considered here, are more likely over short distances (Boender et al. 2007, Robinson and Christley 2007), and consequently epidemics often show spatially aggregated patterns. A good description of how contact probabilities vary with distance is essential for policy making and for modeling livestock diseases. SI and SIR models (S, I and R denoting susceptible, infected and recovered holdings, respectively) assume that the infective unit, i.e. the holding, can only be infected once. Under that assumption, transmission through a limited number of contacts will generally result in an epidemic with a more rapid decline in the reproductive ratio (the number of infections caused by one unit during its infectious period). Comparing two theoretical epidemics with the same initial reproductive ratio but transmission through different numbers of contacts, depletion of susceptible holdings (as they become infected) will matter more when contacts are inherently rare. When transmission occurs through contacts that are more common at short distances, the effect is augmented (Keeling 1999), because if holding A has contacts with holdings B and C, a contact between B and C is more likely than expected under MAM. Hence, if holding A infects holding B, there is a decrease of local susceptibles not only for holding A but also for holding C.
One may therefore consider that a more realistic alternative to assuming MAM would be to express distance-dependent probabilities of contacts using a kernel function.
The pattern of decreasing probability with distance can then be described in a generalized way by variance and kurtosis. Variance measures how rapidly the probability of contact decreases with distance; a high variance means that more contacts/movements occur over long distances. Kurtosis can be viewed as a measure of the "shape" of the kernel. A kernel with high kurtosis has many contacts at short distances but at the same time a fat tail, describing a large proportion of long-distance contacts. A low kurtosis means that contact probabilities are more uniform up to some distance and that long-distance contacts are rare. Distributions with different variance and kurtosis are shown in Figure 1.
The aim of the study presented here is twofold. First, we present a method for analyzing the probability of contacts through live animal movements given the distance between holdings. It uses Bayesian inference to analyze the data, and we obtain the posterior distribution of parameters with Markov Chain Monte Carlo (MCMC) techniques. Two models were implemented: one assuming that all contacts depend on distance, and one assuming that, in addition, some contacts arise independently of distance. The second aim of this study is to investigate possible between-model differences in predictions of disease spread via the estimated contact probabilities. This was done using a number of network measures.

2. Materials and Methods
2.1 Data
Data on cattle and pig movements were provided by the Swedish Board of Agriculture. The data consisted of all reported and registered movements during the period from 1 July 2005 until 30 June 2006. In addition, data on the location of individual cattle at the start and the end of this period were provided.
Pig movements are reported by the farmer receiving the animals and registered at the group level.
Cattle movements are registered at the individual level and are reported both by the farmer at the holding of origin and by the farmer at the holding of destination. Since this study focuses on movements and not on relocation of individual animals, cattle moved between the same two holdings with the same departure date and the same arrival date were assumed to constitute one movement. Cattle movements that did not have a fully corresponding report were checked for inconsistencies and were excluded unless they could be connected with another reported movement in the dataset (e.g. the same individual and the same holdings but a different reported date of movement). We also included single reported movements in the analysis if the location of individual cattle at the start and end of the reporting period corresponded to that report.
Data on the geographic location of the holdings were also acquired from the Swedish Board of Agriculture, using a dataset provided in spring 2006. For pig farmers there is a legal requirement to report the geographic location of the holding. For cattle holdings, the dataset included approximate coordinates generated indirectly from the geographical coordinates of farming land for which the farmers had applied for subsidies. Thus, for cattle holdings the coordinates were not exact and were not available for all holdings.
Because the geographic location is necessary for the analysis, only holdings with known geographical coordinates were used. We used Euclidean distances. While these are commonly used in studies of disease spread between holdings (e.g. Ferguson et al. 2001, Keeling et al. 2001 and Boender et al. 2007), few studies exist that validate this use. However, Savill et al. (2006) and Bessel et al. (2008) conclude that, at least for the UK 2001 FMD epidemic, Euclidean distance is usually sufficient to model transmission between holdings.
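The aggregation rule for cattle records and the restriction to holdings with known coordinates can be illustrated with a minimal sketch. It assumes a hypothetical record layout (`Record` with animal, origin, destination, departure and arrival fields) and planar projected coordinates in metres; these names and formats are illustrative assumptions, not the format of the Swedish Board of Agriculture data, and the consistency checks against start and end-of-period locations are omitted.

```python
import math
from collections import namedtuple

# Hypothetical layout of one individual-animal movement report.
Record = namedtuple("Record", "animal_id origin destination departure arrival")


def aggregate_movements(records):
    """Collapse individual-animal reports into movements.

    Animals moved between the same two holdings with the same departure
    and arrival dates count as a single movement; the stored value is the
    number of animals in that movement.
    """
    movements = {}
    for r in records:
        key = (r.origin, r.destination, r.departure, r.arrival)
        movements[key] = movements.get(key, 0) + 1
    return movements


def movement_distances(movements, coords):
    """Euclidean distances (same unit as coords) for movements whose
    origin and destination both have known coordinates."""
    distances = []
    for origin, destination, _dep, _arr in movements:
        if origin in coords and destination in coords:
            (xi, yi), (xj, yj) = coords[origin], coords[destination]
            distances.append(math.hypot(xj - xi, yj - yi))
    return distances
```

Movements involving a holding without coordinates are silently dropped, mirroring the restriction to holdings with known locations.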
In this article, animal movement refers to live animal movements between holdings. Movements of animals destined for slaughter were not included in the analysis. The cattle movement data explicitly state when a movement was carried out for slaughter, while the pig movement data only contain information on the start and end holding of each movement. We therefore used information on the holdings and only included movements between holdings that were not abattoirs. Finally, we excluded holdings (and movements to/from them) located on the large island of Gotland, because these are only accessible by boat and hence expected to show a different contact pattern. In total, 50,517 movements of cattle and 19,103 movements of pigs, between 28,657 and 7,078 holdings respectively, were used in the analysis.

2.2 Models
For notation in the analysis of movements it is sometimes convenient to refer to the starting point and end point, and sometimes to the movement itself. While every movement, t, has a start point, i, and an end point, j, we sometimes use a reduced notation that avoids indexing on start and end points.
We consider two competing models (M1 and M2) to explain the data, and evaluate the relative likelihood of each model via their posterior probabilities. Model M1 is strictly distance dependent (DD). A generalized normal distribution (Nadarajah 2005) was used to model probabilities of a movement distance (d). This has the form
$$g(d \mid a,b) = S\, e^{-(d/a)^{b}},$$
where a and b are parameters determining the shape and width of the distribution and S is the normalizing constant given as
$$S = \frac{b}{2a\,\Gamma(1/b)}.$$
This distribution was chosen because of its flexible shape. It includes familiar distributions such as the Gaussian (b=2) and Laplace (b=1) as special cases, and it approaches the uniform distribution as b goes to infinity.
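As a quick check of the normalizing constant S, the following sketch evaluates g(d | a, b) and integrates it numerically over the real line for a few shape values, and shows the near-uniform behaviour for large b. The function names and the simple trapezoidal integrator are illustrative assumptions; the original analysis was programmed in MatLab.

```python
import math


def norm_const(a, b):
    """S = b / (2 a Gamma(1/b)), the 1-D generalized normal constant."""
    return b / (2.0 * a * math.gamma(1.0 / b))


def g(d, a, b):
    """Generalized normal density g(d | a, b) = S exp(-(|d|/a)^b)."""
    return norm_const(a, b) * math.exp(-((abs(d) / a) ** b))


def trapezoid(f, lo, hi, n=100_000):
    """Composite trapezoidal rule with n equal sub-intervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for k in range(1, n):
        total += f(lo + k * h)
    return total * h


# b = 1 is Laplace, b = 2 is Gaussian (standard deviation a / sqrt(2)).
for b in (1.0, 2.0, 5.0):
    print(b, round(trapezoid(lambda x: g(x, 10.0, b), -400.0, 400.0), 6))

# For large b the density is close to 1/(2a) on [-a, a].
print(g(5.0, 10.0, 200.0) * 2 * 10.0)
```

Each integral should equal 1 up to discretization error, and the last value should be close to 1, consistent with the uniform limit.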
This study focuses on two-dimensional space, and for this distribution the variance (ν) and kurtosis (κ) are given in Lindström et al. (2008) as
$$\nu = a^{2}\,\frac{\Gamma(4/b)}{\Gamma(2/b)}$$
and
$$\kappa = \frac{\Gamma(6/b)\,\Gamma(2/b)}{\Gamma(4/b)^{2}},$$
where Γ is the gamma function. Analysis was performed on a and b, and the results were converted into ν and κ. Nadarajah (2005) shows methods of estimating a and b assuming that the data are random samples from the distribution. Animal movements are, however, carried out between holdings with fixed locations, and these locations are themselves not randomly distributed. We therefore normalize the distribution by the sum over all possible destinations. Hence, the probability of a contact over the movement distance d_t from holding i of movement t (t = 1,…,T, where T is the number of movements) was modeled via the density
$$f_1(d_t \mid a,b) = \frac{e^{-(d_t/a)^{b}}}{\sum_{k=1,\,k\neq i}^{N} e^{-(D_{ik}/a)^{b}}}, \tag{1}$$
where D_ik is the distance from holding i to any possible destination holding k≠i, and N is the number of holdings. This approach avoids confusing distance-dependent probabilities of contacts with observing movements at some distance solely because many holdings are found at that distance.
Even though the generalized normal distribution is flexible in its shape, it may still not be satisfactory, and it may be difficult to obtain reliable estimates on all relevant scales. We therefore compare it to a mixture model (M2) with both DD and MAM components, given by the density
$$w\,f_1(d_t \mid a_2,b_2) + (1-w)\,f_2(d_t).$$
The mixing weight, w, denotes the proportion of all movements represented by the DD component (f1). For the MAM component (f2), the probability of a movement distance, d_t, does not depend on the distance itself. It is instead uniform over all possible movements, and the probability of observing movement t from i to j (i≠j) in a system of N holdings is given by
$$f_2(d_t) = \frac{1}{N-1}.$$
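The conversion to ν and κ, the normalization in Equation (1), the uniform MAM density and the resulting mixture can be sketched as follows, working on the log scale so that e^{-(d/a)^b} does not underflow at large distances. Holdings are assumed to be indexed 0,…,N−1 with planar coordinates in a list, and movements to be (origin, destination) index pairs; all helper names are hypothetical. Normalizers are cached per origin, since they depend only on i, a and b.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def kernel_variance(a, b):
    """nu = a^2 Gamma(4/b) / Gamma(2/b), computed via lgamma."""
    return a * a * math.exp(math.lgamma(4.0 / b) - math.lgamma(2.0 / b))


def kernel_kurtosis(b):
    """kappa = Gamma(6/b) Gamma(2/b) / Gamma(4/b)^2, computed via lgamma."""
    return math.exp(math.lgamma(6.0 / b) + math.lgamma(2.0 / b)
                    - 2.0 * math.lgamma(4.0 / b))


def log_normalizer(i, coords, a, b):
    """log of sum_{k != i} exp(-(D_ik / a)^b), the denominator of Eq. (1)."""
    xi, yi = coords[i]
    return logsumexp([-((math.hypot(x - xi, y - yi) / a) ** b)
                      for k, (x, y) in enumerate(coords) if k != i])


def log_f1(i, j, coords, a, b, log_norm=None):
    """log f1(d_t | a, b) for a movement from holding i to holding j."""
    if log_norm is None:
        log_norm = log_normalizer(i, coords, a, b)
    d = math.hypot(coords[j][0] - coords[i][0], coords[j][1] - coords[i][1])
    return -((d / a) ** b) - log_norm


def log_f2(n_holdings):
    """log f2(d_t) = -log(N - 1): uniform over all other holdings."""
    return -math.log(n_holdings - 1)


def log_lik(movements, coords, a, b, w=1.0):
    """Log-likelihood of M2 with mixing weight w; w = 1 gives M1."""
    cache = {}
    lf2 = log_f2(len(coords))
    total = 0.0
    for i, j in movements:
        if i not in cache:
            cache[i] = log_normalizer(i, coords, a, b)
        terms = []
        if w > 0.0:
            terms.append(math.log(w) + log_f1(i, j, coords, a, b, cache[i]))
        if w < 1.0:
            terms.append(math.log1p(-w) + lf2)
        total += logsumexp(terms)
    return total
```

For the Gaussian case (b = 2), `kernel_kurtosis` returns 2, the two-dimensional kurtosis of a bivariate normal under this definition.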
Accordingly, the likelihood function of each model is given by
$$M_1:\; L(\mathbf{d} \mid a_1,b_1) = \prod_{t=1}^{T} f_1(d_t \mid a_1,b_1),$$
$$M_2:\; L(\mathbf{d} \mid a_2,b_2,w) = \prod_{t=1}^{T} \left[ w\,f_1(d_t \mid a_2,b_2) + (1-w)\,f_2(d_t) \right],$$
where d = (d_1,…,d_T), although in practice we evaluate the likelihood through the use of auxiliary indicator variables to simplify the form of the conditional distribution in model M2 (see section 2.4).

2.3 Priors and posterior probabilities
The posterior distribution of the models and model parameters is given by
$$M_1:\; f_1(a_1,b_1,M_1 \mid \mathbf{d}) \propto L(\mathbf{d} \mid a_1,b_1,M_1)\, p(a_1,b_1 \mid M_1)\, p(M_1),$$
$$M_2:\; f_2(a_2,b_2,w,M_2 \mid \mathbf{d}) \propto L(\mathbf{d} \mid a_2,b_2,w,M_2)\, p(a_2,b_2,w \mid M_2)\, p(M_2).$$
For example, p(a_1,b_1 | M_1) denotes the prior probability of model M1 parameters (under model M1) and p(M_1) is the prior probability of model M1. The posterior model probabilities are given by
$$M_1:\; f_1(M_1 \mid \mathbf{d}) = \int\!\!\int f_1(a_1,b_1,M_1 \mid \mathbf{d})\, da_1\, db_1,$$
$$M_2:\; f_2(M_2 \mid \mathbf{d}) = \int\!\!\int\!\!\int f_2(a_2,b_2,w,M_2 \mid \mathbf{d})\, da_2\, db_2\, dw.$$
Given the large amount of data available for this study, we chose to work with both non-informative parameter priors, p(a_1,b_1 | M_1) ∝ 1 and p(a_2,b_2 | w,M_2) ∝ 1 (noting the factorization p(a_2,b_2,w | M_2) = p(a_2,b_2 | w,M_2) p(w | M_2)), and model priors p(M_1) = p(M_2) = 1/2. The prior for the mixing parameter, p(w | M_2), is naturally modeled as w ~ Beta(α,β), under which distribution we may adopt the Uniform distribution (α = β = 1).

2.4 Markov Chain Monte Carlo estimation
Markov chain Monte Carlo (MCMC) methods were used to estimate the posterior parameter and model distributions. Programs were written in MatLab R2007b. Monte Carlo based approaches utilize random draws from the posterior distribution, from which all inference on the models can be made. The basic idea of MCMC is to construct a Markov chain whose limiting distribution is the posterior distribution of interest. Here, writing θ_1 = (a_1,b_1) and θ_2 = (a_2,b_2,w), we are interested in the joint distribution of model and parameter vector, f_m(θ_m, M_m | d), m = 1,2. One such scheme is the Gibbs sampler (Casella and George 1992).
In essence, a Gibbs sampler constructs a Markov chain based on simulation from model full conditional distributions. Under broad conditions, it can be shown that this chain converges to the target posterior distribution. If model conditional distributions are not available in closed form, Metropolis-Hastings steps may be substituted (Chib and Greenberg 1995). When the resulting Markov chain has converged, the simulated sample path represents (correlated) draws from this joint posterior. Further information on MCMC methods can be found in, for example, Gamerman and Lopes (2006).
To simplify the model M2 conditional distributions, we introduce auxiliary indicator variables z = (z_1,…,z_T) (Gelman et al. 2004) such that z_t = 1 or z_t = 0 if movement t arises from the DD or MAM mixture component, respectively. Under this parameterization we may rewrite the model M2 posterior distribution as
$$f_2(a_2,b_2,w,\mathbf{z} \mid \mathbf{d}) \propto p(a_2,b_2,w \mid M_2)\, p(M_2) \prod_{t=1}^{T} \left[ w\,f_1(d_t \mid a_2,b_2) \right]^{z_t} \left[ (1-w)\,f_2(d_t) \right]^{1-z_t}.$$
Intuitively, w still represents the proportion of movements arising from the DD mixture component. Additionally, under this representation, the posterior probability of movement t arising from the DD mixture component may also be estimated (if required) as
$$\Pr(z_t = 1 \mid \mathbf{d}) = \int z_t\, f_2(a_2,b_2,w,\mathbf{z} \mid \mathbf{d})\, da_2\, db_2\, dw\, d\mathbf{z}.$$
Within each iteration of the MCMC sampler we first performed parameter updates within the current model, before attempting to switch between models through a Metropolis-Hastings update. We now present the MCMC updates separately for within-model updates (parameters a_m, b_m for m = 1,2, and w for model M2) and for between-model updates (i.e. between (a_1,b_1,M_1) and (a_2,b_2,w,z,M_2)).

2.4.1 Within Model Updates
Under model M2, full conditional distributions of w and z are available, and Gibbs updates may be used.
The full conditional distribution of w is given by
$$f(w \mid a_2,b_2,\mathbf{z},\mathbf{d},M_2) = f(w \mid \mathbf{z},M_2) \propto w^{\sum_t z_t}\,(1-w)^{T-\sum_t z_t}\, p(w \mid M_2),$$
which, under the Beta(α,β) prior, is a Beta(α + Σ_t z_t, β + T − Σ_t z_t) distribution. The full conditional distribution of each z_t is Bernoulli(p_t) with
$$p_t = \frac{w\,f_1(d_t \mid a_2,b_2)}{w\,f_1(d_t \mid a_2,b_2) + (1-w)\,f_2(d_t)}, \qquad t = 1,\dots,T,$$
such that Pr(z_t = 1 | w,a_2,b_2,d,M_2) = p_t and Pr(z_t = 0 | w,a_2,b_2,d,M_2) = 1 − p_t.
Under both models, the full conditionals for a and b are of non-standard form, and we used Metropolis-Hastings updates. For model M_m (m = 1,2) we drew candidate parameters from a proposal distribution (a'_m, b'_m) ~ q(a,b | M_m) and accepted these parameters as the next state of the Markov chain with probability
$$\min\left\{1, \frac{f_1(a'_1,b'_1 \mid \mathbf{d})\, q(a_1,b_1 \mid M_1)}{f_1(a_1,b_1 \mid \mathbf{d})\, q(a'_1,b'_1 \mid M_1)}\right\} \quad\text{and}\quad \min\left\{1, \frac{f_2(a'_2,b'_2,w,\mathbf{z} \mid \mathbf{d})\, q(a_2,b_2 \mid M_2)}{f_2(a_2,b_2,w,\mathbf{z} \mid \mathbf{d})\, q(a'_2,b'_2 \mid M_2)}\right\}$$
for models M1 and M2, respectively. We discuss the choice of proposal distributions, q, below.

2.4.2 Between model updates
MCMC also permits identification of the model that best describes the data (and quantifies how much better) through evaluation of posterior model probabilities. To move from the state (a_1,b_1) in model M1 to model M2, we drew new parameter values (a'_2,b'_2,w',z') from the proposal
$$q(a'_2,b'_2,w',\mathbf{z}') = q(\mathbf{z}' \mid a'_2,b'_2,w')\, q(w')\, q(a'_2,b'_2 \mid M_2)$$
and accepted these as the new state in the Markov chain with probability
$$\min\left\{1, \frac{f_2(a'_2,b'_2,w',\mathbf{z}' \mid \mathbf{d})\, p(M_2)\, q(a_1,b_1 \mid M_1)}{f_1(a_1,b_1 \mid \mathbf{d})\, p(M_1)\, q(a'_2,b'_2,w',\mathbf{z}')}\right\}.$$
Conversely, to move from the state (a_2,b_2,w,z) in model M2 to model M1, we drew new parameter values (a'_1,b'_1) from the proposal q(a'_1,b'_1 | M_1), and accepted these as the new state in the Markov chain with probability
$$\min\left\{1, \frac{f_1(a'_1,b'_1 \mid \mathbf{d})\, p(M_1)\, q(a_2,b_2,w,\mathbf{z})}{f_2(a_2,b_2,w,\mathbf{z} \mid \mathbf{d})\, p(M_2)\, q(a'_1,b'_1 \mid M_1)}\right\}.$$

2.4.3 Metropolis-Hastings proposal functions
Both between- and within-model Metropolis-Hastings parameter updates require the specification of proposal functions.
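Before the specific proposals are described, the within-model updates of Section 2.4.1 can be summarized in a minimal sketch: conjugate Gibbs draws for w and z, and a Metropolis-Hastings step with an independence proposal evaluated on the log scale. The bivariate Gaussian proposal with inflated covariance follows the construction for (a_m, b_m) described in the text; reading "doubling the scale of Σ" as multiplying Σ by 2 is our assumption, exposed as the `scale` argument. All function names are hypothetical, the log f1 and log f2 values are assumed precomputed per movement, and 0 < w < 1 is assumed when drawing z. The between-model moves of Section 2.4.2 have the same structure (target ratio times reverse-over-forward proposal ratio), with the model priors included.

```python
import math
import random


def gibbs_w(z, rng, alpha=1.0, beta=1.0):
    """Draw w | z ~ Beta(alpha + sum z, beta + T - sum z)."""
    s = sum(z)
    return rng.betavariate(alpha + s, beta + len(z) - s)


def gibbs_z(log_f1_vals, log_f2_vals, w, rng):
    """Draw each z_t ~ Bernoulli(p_t), p_t = w f1 / (w f1 + (1 - w) f2)."""
    z = []
    for lf1, lf2 in zip(log_f1_vals, log_f2_vals):
        u = math.log(w) + lf1          # log of the DD term
        v = math.log1p(-w) + lf2       # log of the MAM term
        m = max(u, v)
        p = math.exp(u - m) / (math.exp(u - m) + math.exp(v - m))
        z.append(1 if rng.random() < p else 0)
    return z


def gaussian_proposal(pilot, scale=2.0):
    """Bivariate Gaussian proposal fitted to pilot draws of (a, b).

    Returns (propose, log_q); the sample covariance is multiplied by `scale`.
    """
    n = len(pilot)
    ma = sum(p[0] for p in pilot) / n
    mb = sum(p[1] for p in pilot) / n
    saa = scale * sum((p[0] - ma) ** 2 for p in pilot) / (n - 1)
    sbb = scale * sum((p[1] - mb) ** 2 for p in pilot) / (n - 1)
    sab = scale * sum((p[0] - ma) * (p[1] - mb) for p in pilot) / (n - 1)
    # Cholesky factor L of the 2x2 covariance matrix.
    l11 = math.sqrt(saa)
    l21 = sab / l11
    l22 = math.sqrt(sbb - l21 ** 2)

    def propose(rng):
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        return (ma + l11 * e1, mb + l21 * e1 + l22 * e2)

    def log_q(theta):
        # Solve L y = theta - mean, then evaluate the Gaussian log density.
        y1 = (theta[0] - ma) / l11
        y2 = (theta[1] - mb - l21 * y1) / l22
        return -0.5 * (y1 * y1 + y2 * y2) - math.log(l11 * l22) - math.log(2 * math.pi)

    return propose, log_q


def mh_independence_step(theta, log_target, propose, log_q, rng):
    """One Metropolis-Hastings step with an independence proposal.

    log_target should return -inf outside the support (e.g. a <= 0).
    """
    cand = propose(rng)
    log_ratio = (log_target(cand) + log_q(theta)) - (log_target(theta) + log_q(cand))
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return cand, True
    return theta, False
```

Because the proposal does not depend on the current state, the same fitted q can be reused both within a model and as the proposal when jumping into that model.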
The presence of MAM in M2 was expected to change the joint probability distribution of a and b. Accordingly, separate proposal functions q(a_m,b_m | M_m) were used for each model. This becomes especially important when moving between models. Under each model, we performed a preliminary analysis using independent Gaussian random-walk Metropolis-Hastings updates for both a and b. The posterior means, μ_a and μ_b, and covariance matrix, Σ, were calculated from the sampler output. In subsequent simulations we used multivariate Gaussian proposals (a_m,b_m) ~ q(a_m,b_m | M_m) with the estimated means and covariance matrix as proposal parameters, but doubling the scale of Σ to ensure a proposal distribution that fully covers the posterior. This proposal function promotes faster mixing of the Markov chain than simpler alternatives, both for within-model and (crucially) between-model updates. For the remaining components of the between-model move proposal q(a'_2,b'_2,w',z') = q(z' | a'_2,b'_2,w') q(w') q(a'_2,b'_2 | M_2), we used the following. For q(w) we implemented a Beta(10,1) proposal, which places most density on large values of w. For each element of z we used the Bernoulli full conditional distributions described in the within-model updates section above, conditional on the proposed values of a, b and w.

2.5 Network generation
Four sets of movement networks were generated (the combinations of two species and two models), each with 100 replicates. Holdings carrying the focal species were used. For each replicate, movements were created by first picking one arbitrary holding and subsequently picking the second one randomly, with contact probability given by the likelihood of the respective model (M1 or M2). Each network was generated with as many movements as the number of observed movements included for each species. Parameters w, a and b were drawn at random from the MCMC output.
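The generation procedure can be sketched as below, under stated assumptions: the first holding is drawn uniformly at random, the second is drawn from the M2 mixture probabilities w f1 + (1 − w)/(N − 1) (w = 1 recovers M1), and each unordered pair is stored once because the network analysis ignores link direction. The sketch also computes three of the measures used in the network analysis without external dependencies: Density, the average local Clustering Coefficient (nodes with fewer than two neighbours contribute zero, as in NetworkX's `average_clustering`), and a Fragmentation Index written as 1 − Σ_k s_k(s_k − 1)/(N(N − 1)) over component sizes s_k. That formulation is one common definition and is our assumption; it may differ in detail from Bell et al. (1999). Group Betweenness Centralization is omitted, and all function names are hypothetical; the study itself used NetworkX 0.36.

```python
import math
import random
from collections import deque


def generate_network(coords, n_movements, w, a, b, rng):
    """Undirected link set from simulated movements (w = 1 gives M1)."""
    n = len(coords)
    links = set()
    for _ in range(n_movements):
        i = rng.randrange(n)                      # first holding: uniform
        others = [k for k in range(n) if k != i]
        xi, yi = coords[i]
        logk = [-((math.hypot(coords[k][0] - xi, coords[k][1] - yi) / a) ** b)
                for k in others]
        m = max(logk)
        kern = [math.exp(x - m) for x in logk]    # stabilised; sum >= 1
        total = sum(kern)
        probs = [w * kv / total + (1.0 - w) / (n - 1) for kv in kern]
        j = rng.choices(others, weights=probs)[0]
        links.add((min(i, j), max(i, j)))         # direction ignored, counted once
    return links


def adjacency(n, links):
    adj = [set() for _ in range(n)]
    for i, j in links:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def density(n, links):
    """Realized links divided by the n(n-1)/2 possible undirected links."""
    return 2.0 * len(links) / (n * (n - 1))


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for nbrs in adj:
        k = len(nbrs)
        if k >= 2:
            closed = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
            total += 2.0 * closed / (k * (k - 1))
    return total / len(adj)


def fragmentation(adj):
    """1 - (ordered reachable pairs) / (n(n-1)), from component sizes."""
    n = len(adj)
    seen = [False] * n
    reachable = 0
    for s in range(n):
        if seen[s]:
            continue
        seen[s], queue, size = True, deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        reachable += size * (size - 1)
    return 1.0 - reachable / (n * (n - 1))
```

Repeated movements between the same pair collapse into one link, which is why a model favouring short, repeated contacts can yield a lower Density for the same number of movements.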
A network represents holdings as nodes, which are connected via animal movements, represented by network links. We use the terms nodes and links when we refer to networks generally, but use the terms holdings and movements when discussing implications specific to the contacts considered in the study.

2.6 Network Analysis
We used network measures to explore the differences between the models. Our concern was possible differences in predictions of the spatial aspect of disease spread dynamics via the animal movements analyzed. The models may predict different distance-related probabilities of contacts, which lead to different contact structures, represented by networks and analyzed with network measures. Since we only use the network measures for between-model comparison, we analyzed the networks without including the direction of the links. While such information can be incorporated in some measures, we argue that in this study it would not have provided any extra insight for contrasting patterns between models. Four network measures (Density, Fragmentation Index, Group Betweenness Centralization and Clustering Coefficient) were calculated for each network replicate using NetworkX (version 0.36). These measures provide different information on the structure of the considered contacts.
Density indicates how connected the nodes are; a value of one means that all theoretically possible connections are realized. The Fragmentation Index measures how disconnected a network is by taking into account the size of components (Bell et al. 1999). A high value indicates that there are many isolated components between which no contacts of the considered type occur. Group Betweenness Centralization measures the heterogeneity in how central nodes are. A node is central if it is located such that the shortest paths between many other nodes pass through it.
The shortest path is the minimum number of links between two nodes. A low value means that all nodes have the same betweenness, and the maximum value of one is reached for a star graph, where there is only one central node to which all other nodes are connected (Wasserman and Faust 1994). The Clustering Coefficient (Watts and Strogatz 1998) measures aggregation by looking at the probability that two neighbors B and C of a node A are also connected to each other.

3. Results
3.1 Model probabilities, variance, kurtosis and mixing parameters
The MCMC simulations never switched to M1 for either species. Hence, for the observed data the posterior probability of M1 was estimated to be negligible compared to that of M2. Consequently, the posterior distribution of the M1 parameters was estimated via a separate within-model simulation without between-model updates. When using smaller datasets, switching occurred frequently, which reassures us that the overwhelming support for M2 was not an effect of problems in the code or model setup.
Posterior distributions of the variance and kurtosis of M1 and of the DD component of M2 (here denoted M2DD) are shown in Figure 2 and Figure 3, respectively. Means and 95% credibility intervals (in brackets) of the variance estimates for pigs were 4.12×10¹¹ [3.39×10¹¹, 4.99×10¹¹] m² under M1 and 3.21×10¹⁰ [3.06×10¹⁰, 4.03×10¹⁰] m² for M2DD. Corresponding values for cattle were lower for both models: 4.17×10¹⁰ [3.98×10¹⁰, 4.36×10¹⁰] m² for M1 and 1.10×10¹⁰ [1.04×10¹⁰, 1.16×10¹⁰] m² for M2DD. Hence, a larger proportion of long-distance contacts was predicted for pigs.
The reverse relation was found for kurtosis, with pigs estimated at 32.6 [29.2, 36.4] under M1 and 10.6 [9.6, 11.8] under M2DD, compared to 42.9 [41.7, 44.1] and 27.8 [27.0, 28.7], respectively, for cattle. The mixing parameter w for M2 was estimated to be 0.879 [0.869, 0.889] for pigs and 0.940 [0.937, 0.943] for cattle.
Hence, a clearly larger proportion of MAM (1 − w ≈ 0.12 versus 0.06) was estimated for pigs. Posterior distributions of w are shown in Figure 4. As the posterior density of w was clearly bounded away from w = 1 for both pigs and cattle, this supports the overwhelming rejection of M1 in favor of M2.
Since both the size of the MAM component of M2 and the variance of M2DD and M1 were larger for pigs, the results clearly showed more long-distance contacts for pigs, and hence less distance-related deviation from the MAM assumption.

3.2 Differences between models
Figure 5 shows observed movement distances and how predictions differed between models M1 and M2. Comparing the number of estimated contacts shorter than 50 km, M2 predicted more short-distance contacts, in particular for pigs. Model M2 also predicted more long-distance contacts (>400 km), in particular for cattle. Comparing predicted and observed long-distance movements, it can be seen that, at least for cattle, these were underestimated by M1.
Figure 6 shows differences between the effects of the models in a network context. M2 generated networks with slightly lower Density and higher Clustering Coefficient, hence predicting fewer distinct contacts between holdings. Group Betweenness Centralization did not differ between M1- and M2-generated pig networks but was lower for model M2 than for M1 within cattle networks, indicating that the central role in the contact pattern was shared among more holdings rather than concentrated in a few. No trend could be seen in the Fragmentation Index; thus models M1 and M2 predicted the same number of isolated holdings or small network fragments.

4. Discussion

4.1 Model differences
Our results showed, not surprisingly, that the estimation of M2DD attained lower values of both kurtosis and variance compared to M1.
These quantities increase with the proportion of long-distance movements, which to a large extent were assigned to M2MAM (the MAM part of M2) in the estimation of the mixture model M2. Comparing expected movements at long distances, these were more common under M2 (Figure 5). Long-distance contacts can have a dramatic effect on disease transmission, since they have the potential to carry the disease far. For instance, long-distance animal movements contributed to the rapid and extensive spread during the UK 2001 FMD outbreak (Ferguson et al. 2001, Griffin and O'Reilly 2003, Wilesmith et al. 2003). Reliable estimation of these contacts is therefore of particular importance.
However, at least for cattle, long-distance movements were hard to estimate with M1. Even though high values of kurtosis (i.e. a distribution with a fat tail and concurrently high probability at short distances) were estimated with M1, the analyzed data were better described using some fraction of MAM. Long-distance contacts were rarer for cattle and therefore had less effect on the parameter estimates under M2. For pigs, this effect was less clear. However, M1 underestimated contacts at short distances (Figure 5). Hence, our study demonstrates the difficulty of using a single distribution to give reliable estimates of both long- and short-distance contacts.
In this study the results were based on a large amount of data, which allowed reliable estimation of both parameters and model probabilities. Our analysis clearly showed that the mixture model, M2, was a better model for the available data. The MCMC never switched to M1, and the posterior distribution of w (under M2) was clearly separated from 1, the value at which M2 reduces to M1. The preference for M2 can be interpreted as either technical or conceptual.
In the former case, the preference is assumed to have arisen from the difficulties mentioned above of using a single distribution to give a good description on all scales. If the preference is conceptually accurate, we may conclude that there are actually two different processes, one dependent on distance and one independent of it. Such a statement can, however, not be made with certainty, since our analysis only tested which model best described the data. In addition, even though the generalized normal distribution has a flexible shape, we cannot be assured that it correctly described the assumed DD part. It is, however, likely that different processes are of different importance for the probability of contacts over short and long distances. This may add to the difficulty of using a single distribution to describe observed data.

4.2 Species differences
Since the aim of this study was not to analyze the real animal movement networks of pigs and cattle in Sweden, we avoid drawing conclusions from the network measures for comparison of the two species. We can, however, make statements about the contrast in movement distances. Pigs were estimated to be moved over longer distances, with higher variance of M1 and M2DD as well as a larger M2MAM component compared to cattle. Consequently, our results indicate (independent of model) that diseases affecting pigs can generally be expected to show a larger proportion of long-distance transmissions through these contacts and less deviation from the MAM assumption. It should, however, be pointed out that this may not hold for other countries, where farming practices and contact patterns may differ.

4.3 Network structure and spread of disease
Studies indicate that network structure is an important factor for the spread of diseases, and there is an increasing interest in network models for epidemiological research (Ortiz-Pelaez 2006, Eubank et al. 2004, Meyers et al. 2003).
The structure of a network can be described by a variety of network measures drawn from social network analysis and based on graph theory (Wasserman and Faust 1994). According to theory on network measures, these measures provide information about the dynamics of disease spread through the contacts analyzed (Bell et al. 1999). Therefore, rather than exploring the expected dynamics of a specific disease, we focused on what could be implied by differences in the predicted contact patterns per se. The intent was to show what could be expected in a network context from distance-dependent contact probabilities, not to describe the complete network of live animal movements.
Networks generated with models M1 and M2 differed in three of the measures (the Fragmentation Index showed no differences for either species). Hence we may assume between-model differences in predictions of disease transmission dynamics. In interpreting these differences, we use the assumptions of SI and SIR models that a holding can only be infected once and, once infected, never returns to a susceptible state. Further, we assume that all contacts between infected and susceptible holdings have the same probability of transmitting a disease and that no new holdings are added to the system. Also note that we used network measures as a tool to compare model differences, and our results should not be interpreted as statements about real contact networks. This distinction is particularly important when disease transmission may occur through other contacts. Model M2 generated networks with higher Clustering Coefficients. This was due to the higher probability of contacts at short distances, which results in nearby holdings being more likely to be connected. If holding A is connected to nearby holdings B and C, these are in turn also more likely to be connected.
Disease transmission through networks with higher Clustering Coefficients will show a more rapid depletion of susceptibles (Keeling 2005): if A transmits a disease to connected holdings B and C, the number of susceptible contacts decreases not only for A but also for B and C.

The higher probability of contacts at shorter distances also resulted in the lower Density observed for networks generated with M2. Links are only counted once in network measures, so Density was reduced when more than one movement occurred between the same pair of holdings. If holdings can only be infected once (as assumed here), a holding infecting its neighbor (in terms of network links) depletes the number of susceptible contacts, and with lower Density this effect is more pronounced.

For cattle, Group Betweenness Centralization was lower for networks generated with M2. This was a result of the large difference between the models in long distance contacts. The shortest paths between geographically distant holdings were highly dependent on long distance contacts, which were more common under M2 (especially for cattle). Hence, connections between distant holdings did not depend on only a few central holdings. Long distance contacts can spark new infections in distant areas where local depletion of susceptibles has not yet occurred. Such dynamics appeared in the UK 2001 FMD outbreak (Keeling et al. 2001).

4.4 Data
While it was not the aim of this paper to give a detailed description of the data used, it should be pointed out that the data were not perfect. Some movements were carried out between holdings where at least one holding had unknown coordinates, and these movements could therefore not be used. The movement data were based on farmers' reports, which may contain errors. In addition, many holdings did not send or receive any movements and may have been inactive, but this could not be determined from the available data.
Yet, a large amount of data was available for the analysis, and we do not expect any systematic pattern in the missing values that may have influenced the results. Nor do we expect the inclusion of possibly inactive holdings to have influenced the results. If there was no geographical pattern in these holdings, they would only affect the absolute value of the likelihood, leaving our conclusions unchanged. Also, the network measures were used to analyze differences between the models, and including or excluding such holdings would likely have had a similar effect on networks generated by either model.

4.5 Implications and possible extensions of the analysis and model
Figure 5 shows observed and predicted (for M1 and M2) movement distances. Note that the decline in predicted movements with distance is due both to the decreasing probability of contact and to the smaller number of holdings located at long distances (>400 km). The DD part was normalized over all possible destination holdings (Equation 1), and thereby the spatial distribution of holdings was included in the model. Hence, holdings located in areas with a higher density of holdings would be estimated to have more short distance movements, and the opposite would be found for holdings in low density areas. We argue that this is necessary for the credibility of the models. It is, however, possible to use some other distribution to describe the DD part. We nevertheless believe that the generalized normal distribution used in this study is a good choice, given its flexible shape (see Figure 1). Yet, as mentioned above, our study shows the difficulties of using a single distribution. Good estimates at both short and long distances are important for models of livestock disease, and our study shows that mixture models with both MAM and DD parts can be a good approach.
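The normalization of the DD part over all possible destination holdings, and its mixture with a MAM part, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that Equation 1 normalizes the kernel over all holdings other than the origin, that the MAM part gives every other holding equal probability, and it uses a hypothetical scale/shape form exp(-(d/scale)^shape) of a generalized normal kernel rather than the variance and kurtosis parametrization used in the paper. Setting w = 1 gives the purely distance dependent model (M1); w < 1 gives the mixture (M2), with 1 - w the MAM proportion as in Figure 4. All function names are hypothetical.

```python
import math


def log_kernel(d, scale, shape):
    """Log of an unnormalized generalized-normal-type kernel: -(d / scale) ** shape.

    Hypothetical scale/shape parametrization (the paper uses variance and
    kurtosis). Constant factors cancel when normalizing over destinations.
    """
    return -((d / scale) ** shape)


def contact_probs(origin, coords, w, scale, shape):
    """Probability that a movement from `origin` goes to each other holding.

    w * (distance dependent part normalized over all possible destinations)
    + (1 - w) * (uniform, mass action mixing part).
    """
    x0, y0 = coords[origin]
    others = [j for j in coords if j != origin]
    logk = {
        j: log_kernel(math.hypot(coords[j][0] - x0, coords[j][1] - y0), scale, shape)
        for j in others
    }
    top = max(logk.values())  # subtract the maximum to avoid underflow
    k = {j: math.exp(v - top) for j, v in logk.items()}
    total = sum(k.values())
    n = len(others)
    return {j: w * k[j] / total + (1.0 - w) / n for j in others}


def log_likelihood(movements, coords, w, scale, shape):
    """Sum of log destination probabilities over observed (origin, destination) pairs."""
    cache = {}  # destination probabilities depend only on the origin
    ll = 0.0
    for origin, dest in movements:
        if origin not in cache:
            cache[origin] = contact_probs(origin, coords, w, scale, shape)
        ll += math.log(cache[origin][dest])
    return ll
```

With three holdings placed at 0, 1 km and 50 km along a line (coordinates in metres), scale = 10 km, shape = 2 and w = 0.8, the nearby holding receives almost all of the DD mass (probability about 0.9), while the distant holding keeps roughly the MAM floor (1 - w)/2 = 0.1. Because the DD part is divided by the sum over all destinations, an origin in a dense area spreads its DD mass over many nearby holdings, which is how the spatial distribution of holdings enters the model. A log-likelihood of this kind, combined with priors on w and the kernel parameters, is what an MCMC sampler would evaluate.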
The pattern may of course differ between countries, but we argue that any model in which all contacts are assumed to depend on distance needs to be used with caution unless that assumption has been tested. This applies both when models are used to create general preventive guidelines and when specific diseases are modeled. In the latter case, other possible routes of transmission should be included for most diseases (e.g. professional farm visits, other types of transports, shared equipment). In this study we have focused on live animal movements, but the method presented can be applied to other contact types as well if distance is an important factor.

Bayesian analysis and MCMC are well suited for epidemiological studies and are being used with increasing frequency in this area (O'Neill et al. 2000, Streftaris and Gibson 2004). They form a flexible framework that allows the model to be extended. The probability of contact between two holdings does, of course, not depend solely on distance. Holdings with different production types will likely have different contact patterns, and a larger holding might have more contacts than a smaller one. The model can be expanded to incorporate holding specific data when available. This can give more precise information on which holdings are most likely to have contacts, and thereby information about both the importance of individual holdings and the dynamics of disease transmission through the contacts. Including both holding specific data and geographical information could also provide more precise tools for quickly estimating likely contacts, which could complement the collection of observed contacts when a holding is found to be infected.

5. Conclusion
Modeling of livestock disease often requires good estimates of how contacts between holdings vary with distance.
When using a single distribution it may be hard to estimate contacts at all scales. In this study we analyzed animal movements and found that a mixture model with both distance dependent and distance independent contact probabilities better described the data. Also, since the models differed in their predictions of contact structures, the extra information provided by the mixture model is expected to be important if used for predictions of disease transmission through the contacts. We conclude that the mixture model is a good approach, and advise that models fitted without special attention to contacts on different scales be used with caution.

Acknowledgement
We would like to thank the Swedish Emergency Management Agency for funding and the Swedish Board of Agriculture for supplying the data used. We would also like to thank Nina Håkansson (Research Centre of Systems Biology, University of Skövde) for contributing to the network analysis used in the paper. Additionally, we thank two anonymous reviewers for valuable comments.

Conflict of interest
None.

References
Bell, D.C., Atkinson, J.S., Carlson, J.W., 1999. Centrality measures for disease transmission networks. Soc. Networks 21, 1-21.
Bessell, P.R., Shaw, D.J., Savill, N.J., Woolhouse, M.E.J., 2008. Geographic and topographic determinants of local FMD transmission applied to the 2001 UK FMD epidemic. BMC Vet. Res. 4, 40.
Boender, G.J., Meester, R., Gies, E., De Jong, M.C.M., 2007. The local threshold for geographical spread of infectious diseases between farms. Prev. Vet. Med. 82, 90-101.
Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. Amer. Statistician 46, 167-174.
Chib, S., Greenberg, E., 1995. Understanding the Metropolis-Hastings algorithm. Amer. Statistician 49, 327-335.
Eubank, S., Guclu, H., Anil Kumar, V.S., Marathe, M.V., Srinivasan, A., Toroczkai, Z., Wang, N., 2004.
Modelling disease outbreaks in realistic urban social networks. Nature 429, 180-184.
Ferguson, N.M., Donnelly, C.A., Anderson, R.M., 2001. The foot-and-mouth epidemic in Great Britain: pattern of spread and impact of interventions. Science 292, 1155-1160.
Gamerman, D., Lopes, H.F., 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, second ed. Chapman and Hall/CRC Press.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis, second ed. Chapman & Hall/CRC, Chapter 18.
Griffin, J.M., O'Reilly, P.J., 2003. Epidemiology and control of an outbreak of foot-and-mouth disease in the Republic of Ireland in 2001. Vet. Rec. 152, 705-712.
Kao, R.R., Green, D.M., Johnson, J., Kiss, I.Z., 2007. Disease dynamics over very different time-scales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R. Soc. Interface 4, 907-916.
Keeling, M.J., 1999. The effects of local spatial structure on epidemiological invasions. Proc. R. Soc. Lond. B 266, 859-869.
Keeling, M.J., Woolhouse, M.E.J., Shaw, D.J., Matthews, L., 2001. Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a heterogeneous landscape. Science 294, 813-817.
Keeling, M.J., 2005. The implications of network structure for epidemic dynamics. Theor. Popul. Biol. 67, 1-8.
Lindström, T., Håkansson, N., Westerberg, L., Wennergren, U., 2008. Splitting the tail of the displacement kernel shows the unimportance of kurtosis. Ecology 89, 1784-1790.
Meyers, L.A., Newman, M.E.J., Martin, M., Schrag, S., 2003. Applying network theory to epidemics: control measures for Mycoplasma pneumoniae outbreaks. Emerg. Infect. Dis. 9(2).
Mollison, D., Isham, V., Grenfell, B., 1993. Epidemics: models and data. J. R. Statist. Soc. A 157, 115-149.
Nadarajah, S., 2005. A generalized normal distribution. J. Appl. Stat. 32, 685-694.
O'Neill, P.D., Balding, D.J., Becker, N.
G., Eerola, M., Mollison, D., 2000. Analyses of infectious disease data from household outbreaks by Markov chain Monte Carlo methods. Appl. Stat. 49, 517-542.
Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J., Guitian, F.J., 2006. Use of social network analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 75, 40-55.
Robinson, S.E., Christley, R.M., 2007. Exploring the role of auction markets in cattle movements within Great Britain. Prev. Vet. Med. 81, 21-37.
Savill, N.J., Shaw, D.J., Deardon, R., Tildesley, M.J., Keeling, M.J., Woolhouse, M.E.J., Brooks, S.P., Grenfell, B.T., 2006. Topographic determinants of foot and mouth disease transmission in the UK 2001 epidemic. BMC Vet. Res. 2, 3.
Streftaris, G., Gibson, G.J., 2004. Bayesian analysis of experimental epidemics of foot-and-mouth disease. Proc. R. Soc. Lond. B 271, 1111-1117.
Velthuis, A.G., Mourits, M.C., 2007. Effectiveness of movement-prevention regulations to reduce the spread of foot-and-mouth disease in The Netherlands. Prev. Vet. Med. 82, 262-281.
Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of "small-world" networks. Nature 393, 440-442.
Wilesmith, J.W., Stevenson, M.A., King, C.B., Morris, R.S., 2003. Spatio-temporal epidemiology of foot-and-mouth disease in two counties of Great Britain in 2001. Prev. Vet. Med. 61, 157-170.

Figure captions
Figure 1. Generalized normal distribution with 2-dimensional variance 10⁹ m² (left) and 2·10⁹ m² (right), and kurtosis 1.5 (solid), 2 (dotted), and 4 (dashed).
Figure 2.
Posterior probability distribution of kernel variance estimated for movement distances between holdings of pigs (solid line) and cattle (dotted) for (a) the distance dependent model and (b) the model with a mixture of distance dependence and independence. Estimation is based on movements in Sweden from 1st of July 2005 until 30th of June 2006. Note that the x-axis scales differ.
Figure 3. Posterior probability distribution of kernel kurtosis estimated for movement distances between holdings of pigs (solid line) and cattle (dotted) for (a) the distance dependent model and (b) the model with a mixture of distance dependence and independence. Estimation is based on movements in Sweden from 1st of July 2005 until 30th of June 2006.
Figure 4. Posterior distribution of the mixing parameter w for the model assuming that movements arise from a mixture of distance dependent and distance independent processes. Estimated for pigs (solid line) and cattle (dotted) based on movements between holdings in Sweden from 1st of July 2005 until 30th of June 2006. The estimated proportion of the mass action mixing (distance independent) part is 1-w.
Figure 5. Observed movement distances (histograms) of cattle (a) and pigs (b) with predictions of the distance dependent model (dotted) and the model with a mixture of distance dependence and independence (solid). Large axes show distances between 0 and 400 km; axes in the smaller (embedded) graph show distances >400 km. Model predictions are given as the mean of 1000 replicates. Data contain animal movements in Sweden from 1st of July 2005 until 30th of June 2006.
Figure 6. Histograms of four network measurements for networks (100 replicates) generated with movements predicted by the distance dependent model (black bars) and the model with a mixture of distance dependence and independence (white bars). Clear differences are found for both species in Clustering Coefficient and Density, as well as in Group Betweenness Centralization of cattle networks.
Models were fitted to animal movements in Sweden from 1st of July 2005 until 30th of June 2006.