Title:
Bayesian analysis of animal movements related to factors at herd and between herd levels: Implications for disease spread modeling.

Authors: Tom Lindström^a, Scott A. Sisson^b, Susanna Stenberg Lewerin^c and Uno Wennergren^a
^a IFM Theory and Modelling, Linköping University, 581 83 Linköping, Sweden
^b School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
^c Department of Disease Control and Epidemiology, SVA, National Veterinary Institute, 751 89 Uppsala, Sweden

Corresponding Author: Uno Wennergren
Tel: +46 13 28 16 66
Fax: +46 13 28 13 99
Email: unwen@ifm.liu.se
Correspondence address: See above.

Key words
Markov Chain Monte Carlo; Hierarchical Bayesian; Mixture models; Indicator variable; Animal databases; Animal movements; Contact structure

Abstract
A method to assess the influence of between herd distances, production types and herd sizes on patterns of between herd contacts is presented. It was applied to pig movement data from a central database of the Swedish Board of Agriculture. To determine the influence of these factors on the contact between holdings we used a Bayesian model and Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. The analysis showed that the contact pattern via animal movements is highly heterogeneous and influenced by all three factors: production type, herd size, and distance between farms. Most production types showed a positive relationship between maximum capacity and the probability of both incoming and outgoing movements. In agreement with previous studies, holdings also differed both in the number of contacts and in the holding types with which contacts occurred. Also, the scale and shape of distance dependence in contact probability was shown to differ depending on the production types of holdings.

To demonstrate how the methodology may be used for risk assessment, disease transmissions via animal movements were simulated with the model used for analysis of contacts, parameterized by the estimated posterior distribution. A Generalized Linear Model showed that herds with production types Sow pool center, Multiplying herd and Nucleus herd have a higher risk of generating a large number of new infections. Multiplying herds are also expected to generate many long distance transmissions, while transmissions generated by Sow pool centers are confined to more local areas. We argue that the methodology presented may be a useful tool for improvement of risk assessment based on data found in central databases.

1. Introduction
In order to understand the process of disease spread between animal holdings, researchers are increasingly studying the contact patterns that contribute to the transmission (e.g. Velthuis and Mourits 2007, Dubé et al. 2009, Nöremark et al. 2009a, Vernon and Keeling 2009, Lindström et al. 2010). Analysis of the contact pattern allows for making predictions about disease spread (Kao et al. 2007) as well as testing the effects of changes in the contact structure (Velthuis and Mourits 2007). Different diseases may be transmitted via different paths, but generally between-holding movements of animals may be regarded as the main risk factor for transmission of livestock diseases (Févre et al. 2006, Ortiz-Pelaez et al. 2006, Rweyemamu et al. 2008).

In this paper we present a methodology to transform data on between-holding contacts into probability distributions useful for predictions and direct analysis. We analyze pig movements in Sweden and show how it is possible to assess the influence of several factors on the contact pattern, specifically production type, number of animals and distances between holdings.
For the latter, it has been shown that contacts between holdings are more common at short distances (Boender et al. 2007, Robinson and Christley 2007, Lindström et al. 2009, Ribbens et al. 2009). Epidemiological studies often describe the probability of transmission dependent on distance with a spatial kernel (Keeling et al. 2001, Tildesley et al. 2008), and Lindström et al. (2009) showed that a good description of contact probabilities at both short and long distances is important. As in Lindström et al. (2009) we characterize the spatial kernel by its two dimensional variance (quantifying the scale) and kurtosis (quantifying the shape). A high value of the kernel variance means that contacts occur more frequently at longer distances. A high value of kurtosis indicates that contacts are frequent at short distances, but concurrently long distance contacts, represented by a fat tail of the kernel, are also common. From ecological research it is known that leptokurtic kernels (i.e. kernels with higher kurtosis than a normal distribution) may be the result of heterogeneity in the dispersal processes (see Hawkes 2009 and references therein). Hence, kurtosis may be interpreted as a measure of the heterogeneity of the distance related contact probability. Figure 1 shows spatial kernels with different combinations of 2D-variance and kurtosis. More frequent contacts at shorter distances result in spatially clustered contact patterns (Keeling 1999) which lead to depletion of local susceptibles and a rapid decline in the reproductive ratio (Keeling 2005). Hence, such dynamics are expected from disease transmission where contacts are estimated to be described by a kernel with a small variance. A large value of kurtosis also results in such patterns, but since long distance contacts are also common, the number of contacts needed for connecting otherwise distant holdings is reduced.
From the theory of small-world networks (Watts and Strogatz 1998) it is known that such contacts may have substantial impact on the dynamics of an outbreak.

Contact heterogeneity due to production types has been reported and shown to be important for the dynamics of outbreaks (Dickey et al. 2008, Ribbens et al. 2009, Lindström et al. 2010). The production type of a holding may be expected to influence both the number of animal movements and which other holdings animals are moved from or to. For the latter case, clustering patterns may be expected, similar to contact heterogeneities found due to spatial clustering. However, unlike the spatial factor, directional differences may be expected with animal movements being more common from production type A to type B than from B to A. Holdings of some production types also have many contacts (i.e. trade many animals) which means that if infected, these holdings may cause a large number of secondary infections and thereby function as super spreaders (Matthews and Woolhouse 2005). As concluded by Duerr et al. (2007), the presence of super spreaders may cause stochastic transmission dynamics.

The number of animals on a holding is also expected to influence the contact pattern. Typically, larger holdings are expected to trade more animals and hence have more contacts (Ribbens et al. 2009), making such holdings candidates as super spreaders. However, differences may be expected between production types. For example, the frequency of live animal movements to and from holdings with farrow-to-finish production might not be strongly dependent on the number of animals kept on the premises, since this production type includes both piglet producing units and fattening units on the same holding.

Commonly, between-holding contacts are studied with network analysis (Bigras-Poulin et al. 2008, Brennan et al. 2008, Nöremark et al. submitted).
Such studies provide good quantitative measures of the observed structure and may provide important understanding of the contact pattern and the dynamics of disease transmission through the contacts (Keeling 2005, Kao et al. 2007). It may however be difficult to parameterize a model from network measures, and simulation models are usually confined to resampling observed contacts (e.g. Vernon and Keeling 2009). While the method presented in this paper addresses many of the same questions as network studies, it instead utilizes (hierarchical) Bayesian models and builds on methodology presented in Lindström et al. (2009) and Lindström et al. (2010). In Lindström et al. (2009) a method was presented for estimation of distance-related probability of contacts, but it was there assumed that all holdings are identical. Lindström et al. (2010) introduced a method that analyzed the contact patterns based on production types, but other factors were excluded. In this paper we present a model that describes contact via live animal movements between holdings where probability of contacts depends on production types, the number of animals at each holding and distance between holdings. We estimate the posterior distribution of model parameters with Markov chain Monte Carlo (MCMC) methods and utilize data found in central databases of animal movements. EU members and some other states (e.g. Australia and New Zealand) are required to keep databases on all livestock holdings and register all movements of pigs and cattle, which means that such data may be available for analysis. The level of detail included in the databases does however vary between countries. While data quality is a problem (Nöremark et al. 2009a, Lindström et al. 2010), analysis of such data allows for investigation of large scale trends in the contact patterns. One should however be aware of its limitations when drawing conclusions from the analysis.

The aim of this paper, and the method presented, is to investigate how the contact pattern is influenced by distance between holdings, herd sizes and production types. Our aim is also to investigate how the dependence on distance and herd size differs between holdings of different production types. We also aim to show how the analyzed contact pattern can be used in risk assessment.

2. Material and method
2.1 Data
The data used was supplied by the Swedish Board of Agriculture. Due to legal requirements, the analysis was performed on encoded data such that the ID number of specific holdings or names of farmers could not be retrieved. This prevented the tracing of potentially unexpected contacts. Holdings that were considered to be inactive were removed, as well as holdings that did not have spatial coordinates (see Nöremark et al. 2009a for more details on this). A total of 3084 holdings and 20231 movements (carried out from July 2005 until June 2006) were included in the analysis. Movements to slaughterhouses were not included in the analysis.

Data included the maximum capacity (i.e. the reported maximum number of animals that could be kept on the premises) of each holding, recorded separately for sows and fattening pigs. If maximum capacity of a demographic group was missing in the database, it was assumed that the maximum capacity was zero. Such entries were mostly found for holdings with production types that are expected not to have animals of that demographic group and we found that maximum capacity equal to zero was rarely entered in the database. In addition, previous studies (Nöremark et al. 2009b) have shown a better consistency between larger holdings and the entries in the database, indicating that while 0 may not be accurate in some instances, a low rather than a high number is to be expected.
Seven production types were included in the study: Sow pool centers, Sow pool satellite, Farrow-to-finish, Nucleus herd, Piglet producer, Multiplying herd and Fattening herd. The form on which the farmer reports production type also has an option for free text. Holdings that only had this information entered were placed in a group denoted “Missing information”. For more details on the included production types, the pig farming structure of Sweden and how the data is entered in the database, see Nöremark et al. (2009a) and Lindström et al. (2010).

2.2 Model and parameter estimation
Data include production types of holdings, and this information is included in the model by a matrix $\boldsymbol{R}$ of size $N \times K$, where $N$ is the number of holdings and $K$ is the number of production types (including the artificial type Missing information). We denote $R_{ik} = 1$ if holding $i$ has production type $k$ and $R_{ik} = 0$ otherwise, for $k = 1, \ldots, K$ and $i = 1, \ldots, N$. Data also includes spatial coordinates of holdings and these are translated into a distance matrix, $\boldsymbol{D}$ of dimensions $N \times N$, where $D_{ij}$ is the Euclidean distance between holdings $i$ and $j$. Herd sizes of pig holdings are measured by the maximum capacity of fattening pigs and sows, and these are denoted $\boldsymbol{S}_f$ and $\boldsymbol{S}_s$ (both vectors with $N$ elements), respectively. When we refer to either of these demographic classes we write $\boldsymbol{S}$, and $S_{ui}$ refers to size $u$ ($u = 1, 2$) of holding $i$. We use notation such that each movement, $t$ ($t = 1, 2, \ldots, T$, where $T$ is the number of movements), has a start holding $s_t$ and destination holding $d_t$, and vectors $\boldsymbol{s}$ and $\boldsymbol{d}$ (both with $T$ elements) refer to all start and destination holdings.

We want to estimate how contact probabilities depend on the factors herd size, production type and distance between holdings. As in Lindström et al.
(2010), we assume that holdings with more than one type will behave as some mixture of each type and, rather than assuming that a holding will behave as an equally weighted mixture of each type, we estimate how much different production types will determine the behavior of a holding. This is estimated with a parameter vector $\boldsymbol{v}$ with $K - 1$ elements (see below for explanation of this) where $\sum_k v_k = 1$. A high value of $v_k$ indicates that production type $k$ has a large influence on the contact pattern of a holding that has reported this type concurrently with other types. Also, as in Lindström et al. (2010), dependence on production type was modeled with a parameter matrix $\boldsymbol{h}$, of dimensions $K \times K$ with $\sum_{\alpha\beta} h_{\alpha\beta} = 1$, where the value of $h_{\alpha\beta}$ is a measurement of how common movements are from holdings of production type $\alpha$ to holdings with type $\beta$. Accordingly $\boldsymbol{h}$ takes into account that some production types are more common than others and the elements are referred to as commonness indices. We expand the analysis of Lindström et al. (2010) regarding production types and also give estimates of the absolute number of movements between production types and refer to this as $\boldsymbol{Q}$, of dimensions $K \times K$, where $Q_{\alpha\beta}$ is the estimated number of movements from type $\alpha$ to $\beta$. Some production types are reported much more frequently than others and hence the estimates of $\boldsymbol{Q}$ and $\boldsymbol{h}$ provide different and complementary insight into the contact pattern and the role of different holdings in a potential disease outbreak.

Dependence on sizes $\boldsymbol{S}_f$ and $\boldsymbol{S}_s$ was modeled as a power function with parameters $\hat{\boldsymbol{c}}$ and $\tilde{\boldsymbol{c}}$, both of dimension $2 \times K$, where $\hat{c}_{uk}$ and $\tilde{c}_{uk}$ ($u = 1, 2$, corresponding to sizes $\boldsymbol{S}_f$ and $\boldsymbol{S}_s$, respectively, and $k = 1, \ldots, K$) are the size dependence of type $k$ for sending and receiving contacts, respectively. In the following, we use notation $\boldsymbol{c}$ where we wish to refer to either $\hat{\boldsymbol{c}}$ or $\tilde{\boldsymbol{c}}$.
If $c_{u\alpha} = 0$, there is no size dependence for size $u$, and $c_{u\alpha} < 0$ ($c_{u\alpha} > 0$) indicates a negative (positive) relationship between size and contact probability. For $c_{u\alpha} = 1$, there is approximately a linear relationship such that e.g. twice as many animals results in a twice as high probability of contacts.

Distance dependence is modeled with a spatial kernel, which may be characterized by its variance ($V$) and kurtosis ($\kappa$), measuring the scale and shape, respectively (Lindström et al. 2008, 2009). We assume a rotationally symmetric distribution and define the 2-D variance (in the below, this is what we refer to by variance) as the second moment around zero (i.e. raw moment) of the radial distance. Kurtosis is analogously defined by the fourth raw moment divided by the square of the second raw moment, following the suggestion of Clark et al. (1999). Contact probabilities dependent on distance between holdings may differ depending on production types and we therefore estimate $V_{\alpha\beta}$ and $\kappa_{\alpha\beta}$ (indicating elements of matrices $\boldsymbol{V}$ and $\boldsymbol{\kappa}$, respectively, of dimensions $K \times K$) for every combination of $\alpha, \beta$. However, the underlying processes (e.g. economical and social) are not completely different. For such systems it is suitable to use a hierarchical model (Lee 2004) where the parameters, in this case the elements of $\boldsymbol{V}$ and $\boldsymbol{\kappa}$, have a hierarchical prior with a set of unknown hyper-parameters. This approach has the benefit that it improves the estimation of parameters where the data is weak, a concept known as “borrowing strength” (Gelman et al. 2004). If it may be argued a priori that the parameters are not completely unrelated, then the estimation of one parameter may be informed by the estimation of other parameters.
If there is in fact little similarity between the parameters, the hierarchical prior will have little influence on the estimates of $\boldsymbol{V}$ and $\boldsymbol{\kappa}$.

In Lindström et al. (2009) it was shown that the data were better described when movements were modeled as arising from a mixture of distance dependent and mass action mixing (MAM) processes. In that study all other factors were excluded and a single kernel was used to describe all contacts. To simplify the model and reduce the number of parameters in this study we exclude the MAM part. As we here include other factors and let $V_{\alpha\beta}$ and $\kappa_{\alpha\beta}$ be different for different production types $\alpha, \beta$, we assume that the model can account for factors that could not be captured with a single spatial kernel. To test these assumptions we visually compare the predicted and observed movement distances (see below).

2.2.1 Model specifics
A holding $i$ is assumed to consist of a proportion $\tilde{v}_{ik}$ of each production type $k$, and is determined by $\boldsymbol{v}$ and $\boldsymbol{R}$ through

$$
\tilde{v}_{ik} =
\begin{cases}
\dfrac{R_{ik} v_k}{\sum_k R_{ik} v_k} & \text{for } k \neq m \text{ if } R_{im} = 0 \\
0 & \text{for } k = m \text{ if } R_{im} = 0 \\
0 & \text{for } k \neq m \text{ if } R_{im} = 1 \\
1 & \text{for } k = m \text{ if } R_{im} = 1
\end{cases}
\tag{1}
$$

for $k = 1, \ldots, K - 1$ where $R_{ik} = 1$ if holding $i$ has type $k$ and $R_{ik} = 0$ otherwise. Production type $m$ is an artificial type introduced for holdings with missing information. This is never shared with any other type (as it then would not be missing) and is therefore excluded from $\boldsymbol{v}$. Hence, $\boldsymbol{v}$ has $K - 1$ rather than $K$ elements. It is possible to formulate a model where the missing production types of holdings are estimated by writing a joint distribution of parameter estimates and unobserved production types. However, as the farmers of these holdings have chosen not to report any of the seven included production types, we may not assume that these holdings in fact are of any of these types.
We rather interpret that this group mainly contains holdings that for different reasons do not fit into any of the listed types. We therefore include them in the analysis as an artificial type and expect the holdings included in the group to be heterogeneous.

We used a model formulation

$$
P(d_t, s_t \mid \boldsymbol{\theta}) = \sum_{\alpha} \sum_{\beta} P(d_t \mid s_t, \alpha, \beta, \boldsymbol{\theta}_d)\, P(s_t \mid \alpha, \boldsymbol{\theta}_s)\, P(\alpha, \beta \mid \boldsymbol{\theta}_p), \tag{2}
$$

where $\boldsymbol{\theta}_d$, $\boldsymbol{\theta}_s$ and $\boldsymbol{\theta}_p$ are subsets of $\boldsymbol{\theta}$ and refer to particular sets of parameters, yet to be defined. To clarify, we use the notation with $\boldsymbol{\theta}$ here to give a more transparent description of the general outline of the model. Equation 2 should be interpreted such that for movement $t$, the probability of destination holding $d_t$ is conditional on the start holding $s_t$ and the production types of the start and destination holdings, denoted $\alpha$ and $\beta$ respectively. Start holding $s_t$ is conditional on production type $\alpha$ of the start holding. The joint distribution $P(d_t, s_t \mid \ldots)$ is a mixture distribution and (since holdings may have more than one type) the probability is summed over all types with $\sum_{\alpha} \sum_{\beta} P(\alpha, \beta \mid \boldsymbol{\theta}_p) = 1$. We use a probability function $P(\alpha, \beta \mid \boldsymbol{\theta}_p) = P(\alpha, \beta \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R})$ as introduced in Lindström et al. (2010)

$$
P(\alpha, \beta \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}) = \frac{h_{\alpha\beta}\, \bar{n}_{\alpha}\, \bar{\bar{n}}_{\beta\alpha}}{\sum_{\alpha} \sum_{\beta} h_{\alpha\beta}\, \bar{n}_{\alpha}\, \bar{\bar{n}}_{\beta\alpha}}, \tag{3}
$$

where conditionally on $\boldsymbol{v}$ and $\boldsymbol{R}$ we define

$$
\bar{n}_{\alpha} = \sum_i \tilde{v}_{i\alpha}, \qquad \bar{\bar{n}}_{\beta\alpha} = \sum_i \tilde{v}_{i\beta} \left(1 - \frac{\tilde{v}_{i\alpha}}{\bar{n}_{\alpha}}\right), \tag{4}
$$

where $\tilde{v}_{ik}$ is given by equation 1. The quantities $\bar{n}_{\alpha}$ and $\bar{\bar{n}}_{\beta\alpha}$ may be interpreted as measurements of the amount of each production type at a holding, taking into account that holdings may not be of only one type (if more than one type is reported). The quantity $\bar{\bar{n}}_{\beta\alpha}$ is adjusted to account for exclusion of movements ending up at the same destination as the start holding.

The distribution of $s_t$ conditional on type $\alpha$ (i.e.
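$P(s_t \mid \alpha, \boldsymbol{\theta}_s)$ of equation 2) is modeled as

$$
P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, \alpha, \boldsymbol{S}) = \frac{\tilde{v}_{s_t\alpha}\, G(S_{1s_t}, S_{2s_t}, \hat{c}_{1\alpha}, \hat{c}_{2\alpha})}{\sum_i \tilde{v}_{i\alpha}\, G(S_{1i}, S_{2i}, \hat{c}_{1\alpha}, \hat{c}_{2\alpha})}, \tag{5}
$$

where $G(S_{1i}, S_{2i}, \hat{c}_{1\alpha}, \hat{c}_{2\alpha})$ is a function describing dependence on sizes $S_{1i}, S_{2i}$ of holding $i$, and $\hat{c}_{1\alpha}, \hat{c}_{2\alpha}$ are the parameters determining the size dependence of sizes $S_{1i}, S_{2i}$, respectively, for production type $\alpha$. As in the FMD model presented in Tildesley et al. (2008), we assume that contact probability dependence on herd size may be modeled as a power function and we write

$$
G(S_{1i}, S_{2i}, \hat{c}_{1\alpha}, \hat{c}_{2\alpha}) = (S_{1i} + 1)^{\hat{c}_{1\alpha}} (S_{2i} + 1)^{\hat{c}_{2\alpha}}. \tag{6}
$$

We use $S_{1i} + 1$ rather than just $S_{1i}$ to avoid $P(s \mid \boldsymbol{v}, \boldsymbol{R}, \alpha, \boldsymbol{S}) = 0$ if $S_{ui} = 0$ for any $u = 1, 2$ (e.g. Fattening herds are expected to have no sows).

The probability distribution of $d_t$ conditional on types $\alpha, \beta$ and start holding $s_t$ (i.e. $P(d_t \mid s_t, \alpha, \beta, \boldsymbol{\theta}_d)$ of equation 2) is dependent on both sizes, $\boldsymbol{S}$, and distances, $\boldsymbol{D}$, and given by

$$
P(d_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, \alpha, \beta, \boldsymbol{S}, \boldsymbol{D}, \tilde{\boldsymbol{c}}, \kappa_{\alpha\beta}, V_{\alpha\beta}) = \frac{\tilde{v}_{d_t\beta}\, G(S_{1d_t}, S_{2d_t}, \tilde{c}_{1\beta}, \tilde{c}_{2\beta})\, F(D_{s_t d_t}, \kappa_{\alpha\beta}, V_{\alpha\beta})}{\sum_j \tilde{v}_{j\beta}\, G(S_{1j}, S_{2j}, \tilde{c}_{1\beta}, \tilde{c}_{2\beta})\, F(D_{s_t j}, \kappa_{\alpha\beta}, V_{\alpha\beta})}, \tag{7}
$$

for $d_t \neq s_t$. The destination of a movement may not be the same holding as the start, $P(d_t = s_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, \alpha, \beta, \boldsymbol{S}, \boldsymbol{D}, \tilde{\boldsymbol{c}}, \kappa_{\alpha\beta}, V_{\alpha\beta}) = 0$. Recall that $\tilde{\boldsymbol{c}}$ is the equivalent of $\hat{\boldsymbol{c}}$ but used for modeling of contact probability of incoming movements. The function $F$ is used for modeling of dependence on between herd distance. As in Lindström et al. (2009), a generalized normal distribution is used. We write

$$
F(D_{ij}, \kappa_{\alpha\beta}, V_{\alpha\beta}) = e^{-\left(\frac{D_{ij}}{a}\right)^{b}}, \tag{8}
$$

where the relationships between $a, b$ and $V, \kappa$ are given for two dimensional kernels in Lindström et al. (2008) as

$$
V = a^2\, \frac{\Gamma\!\left(\frac{4}{b}\right)}{\Gamma\!\left(\frac{2}{b}\right)}, \qquad \kappa = \frac{\Gamma\!\left(\frac{6}{b}\right) \Gamma\!\left(\frac{2}{b}\right)}{\left(\Gamma\!\left(\frac{4}{b}\right)\right)^2}. \tag{9}
$$

For continuous functions, equation 8 is normalized by $2\pi a^2\, \Gamma(2/b)/b$.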
This cancels out in equation 7 and normalization is instead performed by summation of the functions over all possible destination holdings (see equation 7).

Writing the full formulation of the joint probability distribution of $d_t$ and $s_t$ (equation 2) we get

$$
P(d_t, s_t \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{c}, \boldsymbol{\kappa}, \boldsymbol{V}) = \sum_{\alpha} \sum_{\beta} P(d_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, \alpha, \beta, \boldsymbol{S}, \boldsymbol{D}, \tilde{\boldsymbol{c}}, \kappa_{\alpha\beta}, V_{\alpha\beta})\, P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, \alpha, \boldsymbol{S}, \hat{\boldsymbol{c}})\, P(\alpha, \beta \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}). \tag{10}
$$

To improve estimation of parameters $\boldsymbol{\kappa}$ and $\boldsymbol{V}$ we implement hierarchical priors as described in Appendix A. In particular, this allows for improved estimation of parameters where the data is weak, i.e. where few movements are recorded between the production types. Estimation of parameters is performed with MCMC, and an indicator variable is introduced to aid computation (see Appendix B). This indicator variable is also used to calculate the posterior distribution of the number of transports between production types, denoted $\boldsymbol{Q}$, of dimensions $K \times K$, where $Q_{\alpha\beta}$ refers to the estimated number of movements from production type $\alpha$ to $\beta$. Note that $\boldsymbol{Q}$ is introduced because many holdings have several production types, and therefore the exact numbers of movements between holdings of different production types are unobserved.

2.2.2 Comparing observed and predicted movement distances
As we altered the model of Lindström et al. (2009) and removed the MAM part of the spatial kernel, we compared the observed movement distances with the predictions under the model presented above. Note that while this is a simplification of the kernel function, we are adding complexity by using a different kernel for every combination of production types of the sending and receiving holding. The predicted distances were obtained by generating $T$ animal movements with the model described in section 2.2.1.
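As an illustration of how a destination holding could be drawn when generating such movements, the following minimal Python sketch evaluates the destination probabilities of equation 7 for one start holding and one pair of production types. It is not the code used in this study. The function and argument names are hypothetical, the toy values are invented, and it is assumed that the normalizing sum runs over all holdings except the start holding.

```python
import math

def kernel(distance, a, b):
    """Unnormalized generalized normal kernel (equation 8), scale a and shape b."""
    return math.exp(-((distance / a) ** b))

def size_weight(fattening, sows, c1, c2):
    """Power-function size dependence (equation 6); +1 avoids zero weights."""
    return (fattening + 1) ** c1 * (sows + 1) ** c2

def destination_probabilities(start, type_weights, fattening, sows,
                              distances, c1, c2, a, b):
    """Probability of each holding being the destination (equation 7).

    type_weights[j] is the proportion of the receiving production type at
    holding j, and distances[j] is the distance from the start holding to j.
    The start holding itself gets probability zero (assumption: it is also
    left out of the normalizing sum).
    """
    weights = []
    for j in range(len(type_weights)):
        if j == start:
            weights.append(0.0)
            continue
        weights.append(type_weights[j]
                       * size_weight(fattening[j], sows[j], c1, c2)
                       * kernel(distances[j], a, b))
    total = sum(weights)
    return [w / total for w in weights]

# Toy example with three equally sized holdings at 0, 5 and 50 km from the start.
probs = destination_probabilities(
    start=0,
    type_weights=[1.0, 1.0, 1.0],
    fattening=[100, 100, 100],
    sows=[0, 0, 0],
    distances=[0.0, 5.0, 50.0],
    c1=0.5, c2=0.5, a=10.0, b=1.0,
)
```

With equal sizes and type proportions, the nearer holding receives most of the probability mass. A destination could then be sampled with, for example, `random.choices(range(3), weights=probs)`.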
Two thousand replicates were generated, each parameterized by a random draw from the posterior distribution (based on the MCMC output), and the mean cumulative distribution was calculated and compared to the cumulative distribution of observed distances.

2.3 Simulation
To demonstrate how the analyzed contact pattern may be used for risk assessment we performed a simplistic simulation with holdings in the analyzed database as infective units. The aim was to study the effect of the observed contact pattern and not to provide estimates for any specific disease. All other contacts between holdings were excluded, as well as intra-herd dynamics and recovery. Hence, an infected holding remained infectious for the entire time of the simulation. Simulations were initiated with one randomly infected holding and we simulated posterior predictive movements with probabilities given by equation 10 and parameterized the model with random draws from the posterior distribution. If a movement was simulated from an infected holding A to a susceptible holding B, the latter was assumed to become instantaneously infected and any subsequent movements from B to a susceptible holding C were assumed to result in transmission. Movements between two already infected holdings were not assumed to generate a new infection.

We ran the simulation 1000 times and simulated $\hat{T} = 40462$ movements for every replicate. For each replicate and infected holding we recorded the number of first, second and third degree infections. These are defined such that if holding A infects B, this is a first degree infection of A. If B subsequently infects C, this is a second degree infection of A and if C later infects D, this is a third degree infection of A. The number of infections caused by every infected holding was recorded at 4 weeks after becoming infected, corresponding to a total of 1556 movements (as 20231 movements were analyzed for the one year period).
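The bookkeeping of first, second and third degree infections described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the simulation code of the study: the function name and the input format (an ordered list of start and destination pairs) are hypothetical, and the 4 week period is represented as a window counted in movements.

```python
def count_degree_infections(movements, seed, window):
    """Count first, second and third degree infections per infected holding.

    movements: ordered list of (start, destination) pairs (hypothetical input).
    seed: the initially infected holding.
    window: number of movements after a holding's own infection during which
            new infections are attributed to it.
    Returns a dict {holding: [first, second, third]}.
    """
    infected_at = {seed: 0}   # movement index at which each holding was infected
    infectors = {seed: []}    # chain of infectors, nearest first (at most three)
    counts = {seed: [0, 0, 0]}
    for t, (start, destination) in enumerate(movements, start=1):
        # Only movements from an infected to a susceptible holding transmit.
        if start not in infected_at or destination in infected_at:
            continue
        infected_at[destination] = t
        infectors[destination] = [start] + infectors[start][:2]
        counts[destination] = [0, 0, 0]
        for degree, ancestor in enumerate(infectors[destination]):
            if t - infected_at[ancestor] <= window:
                counts[ancestor][degree] += 1
    return counts

# Toy example: A infects B and E directly, C via B, and D via B and C.
toy_movements = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
toy_counts = count_degree_infections(toy_movements, seed="A", window=1556)
```

In the toy example holding A is credited with two first degree, one second degree and one third degree infection. Shortening the window drops infections that occur too long after the holding itself became infected.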

Long distance transmissions are often of particular interest and therefore we also recorded the number of first degree infections caused by movements longer than 10, 100 and 500 km.

We analyzed the results using Generalized Linear Models (GLM). Since the response variable was measured as the number of infections, which are natural numbers (0, 1, 2, …), we used a Poisson error distribution with a log link function. The response variable $\boldsymbol{Y}$ was a vector of $W$ elements where $W = \sum_r w_r$ and $w_r$ is the total number of holdings for replicate $r$ ($r = 1, \ldots, 1000$) infected earlier than movement $\hat{T} - 1556$. Holdings infected later were excluded as the number of infections after 1556 consecutive movements could not be recorded. For most analyses we focus on production types only and for this we used a dummy variable $\boldsymbol{X}$, of dimension $W \times K$, as predictor. The elements $X_{\tilde{i}k} = 1$ if holding $\tilde{i}$ ($\tilde{i} = 1, \ldots, W$) was reported to have production type $k$.

To demonstrate the effect of the maximum capacity we divided the holdings into classes, denoted small, medium and large, respectively. We used the definitions given in Nöremark et al. (2009b) such that a holding is small if the maximum capacity of sows < 15 and slaughter pigs < 300, large if sows > 299 or slaughter pigs > 4999 and is otherwise medium. Hence, $3K$ combinations of production types and size classes were obtained and similarly to $\boldsymbol{X}$, we used a dummy variable $\boldsymbol{Z}$, of size $W \times 3K$, as predictor where $Z_{\tilde{i}z} = 1$ if holding $\tilde{i}$ is reported to belong to combination $z$ ($z = 1, \ldots, 3K$). Note that while a holding may have more than one production type, it will always belong to exactly one size class in this analysis of the simulation. No holding with “Missing information” was classified as large and so this combination was excluded from the analysis.

Since we simulate a large number of replicates and each replicate may include many infected holdings, significance is less relevant.
We instead look at the coefficients of the parameters given by the GLM. A large positive value should be interpreted such that if an infected holding is reported to belong to a production type (or type and size class in the analysis where the latter is included), it is expected to generate a large number of newly infected holdings (of the analyzed degree). A large negative value means that holdings with the type are expected to generate few infections.

3. Results
3.1 Parameter estimates of contact probabilities
3.1.1 Weight of production types, $\boldsymbol{v}$
Table 1 shows estimates of $\boldsymbol{v}$, modeling dominance of production types in determining the contact pattern of a holding. The highest value was estimated for Sow pool centers and the lowest for Fattening herds.

3.1.2 Size dependent parameters, $\boldsymbol{c}$
Table 2 shows estimates for parameters $\boldsymbol{c}$, determining how the maximum capacity of the holdings influences the contact pattern. Most estimates (23 of a total 32) showed a clear positive relationship between size and contact probabilities (i.e. $c = 0$ is not included in the 95% central credibility interval) while 7 estimates showed a negative relationship. Of the negative relationships, 3 were found for estimates of the artificial type “Missing information”.

3.1.3 Movements between production types, $\boldsymbol{h}$ and $\boldsymbol{Q}$
Table 3 lists estimated values of the most common movements, defined by either $\boldsymbol{h}$ or $\boldsymbol{Q}$. The estimated values of commonness indices, $\boldsymbol{h}$, closely resembled the estimates given in Lindström et al. (2010). The five highest mean estimates were found for (in decreasing order) movements from Multiplying herds to Sow pool centers, Nucleus herds to other Nucleus herds, Nucleus herds to Multiplying herds, Sow pool centers to Sow pool satellites and Sow pool satellites to Sow pool centers.
Estimates of $\boldsymbol{Q}$ showed that animals were most frequently moved from (in decreasing order) Piglet producers to Fattening herds, Sow pool satellites to Fattening herds, Multiplying herds to Piglet producers, Farrow-to-finish to Fattening herds, Sow pool center to Sow pool satellite, Sow pool satellite to Sow pool center and Multiplying herds to Farrow-to-finish. We list the top seven rather than five to avoid the false impression that movements from Sow pool center to Sow pool satellite are more frequent than from Sow pool satellite to Sow pool center. These were ranked as number 5 and 6 and this ranking only differed by one movement. We also include estimates for the seventh highest, Multiplying herds to Farrow-to-finish, as this showed the highest estimates for kernel variance (see below).

3.1.4 Distance related parameters, $\boldsymbol{V}$ and $\boldsymbol{\kappa}$
We only include the estimates of $\boldsymbol{V}$ and $\boldsymbol{\kappa}$ for the most common movements (given above) and list these in table 3. Variance of the spatial kernel, $\boldsymbol{V}$, is a measure of the scale at which contacts occur and a high value of $V_{\alpha\beta}$ indicates that long distance movements are common from holdings of production type $\alpha$ to $\beta$. The highest estimate was found for movements from Multiplying herds to Farrow-to-finish holdings and the lowest was found for movements between Sow pool centers.

Kernel kurtosis, $\boldsymbol{\kappa}$, is a measure of the difference in movement distances. A high value indicates that there are many short distance movements but concurrently many at long distance, and a low value indicates that movement distances are more uniform. In Lindström et al. (2009), where differences in production types were excluded, the kernel kurtosis was estimated at 32.6 with 95% central credibility interval (29.2, 36.4). Of the 64 estimates of $\kappa_{\alpha\beta}$ in this study, 49 showed strong evidence for lower values (non-overlapping central credibility intervals) and 2 showed strong evidence for higher.
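As a reading aid for the variance and kurtosis estimates, the relationship in equation 9 between the scale and shape parameters of the generalized normal kernel and its two dimensional variance and kurtosis can be evaluated directly. The following Python sketch is only an illustration; the function name is hypothetical and not part of the analysis code.

```python
import math

def kernel_variance_and_kurtosis(a, b):
    """Two dimensional variance and kurtosis of the kernel exp(-(d/a)**b).

    Implements equation 9: variance is the second raw moment of the radial
    distance, kurtosis is the fourth raw moment over the squared second.
    """
    g2 = math.gamma(2.0 / b)
    g4 = math.gamma(4.0 / b)
    g6 = math.gamma(6.0 / b)
    variance = a ** 2 * g4 / g2
    kurtosis = g6 * g2 / g4 ** 2
    return variance, kurtosis

# Shape b = 2 gives a normal kernel, b = 1 an exponential kernel.
normal_variance, normal_kurtosis = kernel_variance_and_kurtosis(a=1.0, b=2.0)
exp_variance, exp_kurtosis = kernel_variance_and_kurtosis(a=1.0, b=1.0)
```

Under this definition a two dimensional normal kernel has kurtosis 2 and an exponential kernel has kurtosis 10/3, so a value such as 32.6 corresponds to a shape parameter well below 1 and hence a strongly fat-tailed kernel.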
Figure 2 illustrates the cumulative distribution of observed and predicted movement distances.

3.2 Simulation of disease transmission

Figure 3 shows the coefficients estimated by the GLMs as described in section 2.3. Large positive values indicate that herds with this characteristic are expected to generate a large number of new infections, and negative values indicate that a herd is expected to generate few transmissions. Error bars were small and are excluded from the figure for clarity. Sow pool centers, Nucleus herds, Sow pool satellites and Fattening herds were estimated to have an increased risk (relative to other production types) of generating a large number of new infections when higher degree infections are accounted for (Figure 3a). Piglet producers were estimated to generate fewer infections when accounting for higher degree infections. Sow pool centers and satellites, as well as holdings of the artificial type Missing information, were estimated to have a lower probability of generating new infections when studying long distance transmissions (Figure 3b). Multiplying herds and Farrow-to-finish holdings were estimated to have a higher risk of long distance transmission.

For most production types, larger holdings were estimated to have a higher risk of causing new infections (Figure 3c). No holdings of type Missing information were classified as Large, but the risk of infecting other holdings was estimated to be lower for Medium than for Small holdings. Also, Large Sow pool centers had a lower coefficient than Medium holdings reported with the same production type. When inspecting the data we found that no Large holding reported as a Sow pool center was reported with only this type.

4. Discussion

In this paper we have presented a hierarchical Bayesian model for analysis of contacts between holdings, applied to movements of pigs in Sweden.
The analysis revealed a highly heterogeneous contact structure. Holdings of different production types were estimated to differ in the number of contacts and in the production types with which the contacts occurred. This was demonstrated both by the estimated commonness indices and by the estimated absolute number of movements. The absolute number of movements, which was not included in Lindström et al. (2010), provides additional information about which contacts are important. The posterior distributions of the commonness indices were similar to the estimates given in Lindström et al. (2010). Estimates of $\boldsymbol{v}$ also differed little from the results of that study, but Sow pool centers were less dominant in determining the contact pattern of a holding when the model was extended. It was, however, still the dominant production type when reported concurrently with other types: its posterior mean of $\boldsymbol{v}$ was estimated to be more than four times larger than that of the second most dominant type, Sow pool satellite, and 81 times larger than that of the least dominant type, Fattening herd.

Differences were also shown in how the probability of contacts depends on the maximum capacity of holdings (estimated with the size dependent parameters), although most production types showed a positive relationship between maximum capacity and the probability of both incoming and outgoing movements. This was found for both demographic groups (sows and fattening pigs). The few negative relationships are generally unexpected results. Most negative estimates were small, but for Nucleus herds a large negative value was found for incoming movements depending on the maximum capacity of sows. This, as well as other inconsistencies, may be a result of the reporting system. As the production type is reported by the farmer, and no proper definition of the production types is provided, it leaves room for interpretation. Previous studies (Nöremark et al.
2009b) have reported that farmers with small herds had a different interpretation of their type and, in particular, that they sometimes regarded themselves as breeders even though only a few sows were kept on the premises. Moreover, there is no requirement to update the information in the database if the production type changes. Thus, the information on production type in the database may be incorrect. While Nucleus herds (by common definition) generally receive few animals but send many, this may not be the case for small herds (here mainly herds with few sows). Nucleus herds are also expected to have a low maximum capacity of slaughter pigs, and herds with large numbers in this demographic group might also behave differently, which may explain the large positive size dependence estimates for Nucleus herds (Table 2). The holdings in the group Missing information showed a negative relationship between the number of contacts and the maximum capacity in 3 out of 4 estimates. We believe this was a result of the heterogeneous nature of this group, which contains all farms not reported to belong to any of the other seven types. Farrow-to-finish holdings showed a slightly lower probability of both incoming and outgoing movements with larger maximum capacity of fattening pigs. This is not entirely surprising, as large herds in this category may be expected to produce piglets that are kept in the herd until fattening, and thus the main movement would be animals sent to slaughter (which was not included in this study).

The analysis of $\boldsymbol{V}$ and $\boldsymbol{\kappa}$ also showed considerable differences in how contact probability is influenced by the distance between holdings. Comparing the estimates of $\boldsymbol{\kappa}$ to the estimates of Lindström et al. (2009), where production type differences were ignored, we found that the kernel kurtosis for movements between production types was generally lower.
This supports the interpretation of kurtosis as a measure of the heterogeneity of the distance related processes resulting in movements between holdings. Also, by including production type differences we found a good fit between observed and predicted movement distances (Figure 2). Of the more common movement types (i.e. those with high commonness indices or numbers of movements, listed in Table 3), the largest posterior mean of $\boldsymbol{\kappa}$ was found for movements from Multiplying herds to Farrow-to-finish holdings. High kurtosis was also found for movements from Farrow-to-finish to Fattening herds. This may be interpreted to mean that the contacts between these types are the result of heterogeneous processes. It may partly be explained by the fact that in Farrow-to-finish herds with a limited capacity for fattening pigs, some piglets are sold to Fattening herds, while Farrow-to-finish herds with larger fattening units keep all their piglets. Thus, it is not only the herd size but the relative size of the piglet producing and fattening units in the herd that affects the contact pattern of this herd category. Moreover, there is a trend towards specialized production units in pig farming, and some herds registered as Farrow-to-finish may have changed their production to either piglet production or fattening pigs without this being recorded in the database. The lowest value in Table 3 was found for movements from Sow pool centers to Sow pool satellites, indicating that such movements are the result of homogeneous processes. The sow pool system is a very specific production type, with pregnant sows moved from the center to the satellite for farrowing and subsequently moved back to the center after weaning for insemination. Given this system, the movement pattern from centers to satellites should be very similar to the pattern from satellites to centers.
This is found for $\boldsymbol{V}$, the commonness indices and the number of movements (we believe this gives credibility to the model), but while $\boldsymbol{\kappa}$ is also low for movements from satellites to centers, it is higher than for centers to satellites. Hence, satellites are more homogeneously distributed around the centers than the converse, which is expected if the center is situated geographically in the center of its satellites.

Movements from Multiplying herds were estimated to have high variance ($\boldsymbol{V}$), with the highest estimate of the study found for Multiplying herds to Farrow-to-finish. This indicates that long distance movements are common compared to other production types and, from a disease transmission perspective, that Multiplying herds may cause long distance transmission. Movements between Sow pool centers and satellites were found to have low estimates of $\boldsymbol{V}$, indicating that while many movements occur between these types (both in absolute number and relative to their abundance), these movements are of relatively short distance. Extrapolating the kernel variance estimates to implications for disease transmission, we may expect that an infected Sow pool center, for example, will cause few long distance transmissions, while a Multiplying herd (if infected) may rapidly increase the range of an emerging disease. This was also found in the simulation study. The coefficients of Sow pool centers (Figure 3b) decreased with distance but increased for Multiplying herds. Coefficients also increased for Farrow-to-finish holdings, which had a high estimated value of $\boldsymbol{V}$ for the main contact type, Fattening herds. Generally we expect contacts between types estimated with a high $\boldsymbol{V}$ to be particularly important in later stages of outbreaks. When local susceptibles are depleted (due to becoming infected through more local transmission), long distance contacts may spark new infection where depletion has not yet occurred. This dynamic was found e.g.
in the UK 2001 FMD outbreak (Keeling et al. 2001).

Matthews and Woolhouse (2005) argued that animal markets acted as super spreaders in this outbreak. Such markets are rare in Sweden, and identification of possible super spreaders in the system may instead focus on holdings of different production types. However, by international standards animal farming in Sweden is less intensive and movements are less frequent. Our analysis of holdings as potential super spreaders should therefore be interpreted as relative to other holdings in the system. Figure 3a shows how the potential for generating new infections changes with the infection degree for the different production types. This is mainly a result of the estimated commonness indices and numbers of movements, and the largest increase was found for Nucleus herds. These mainly move animals to other Nucleus herds and to Multiplying herds, both of which in turn have many contacts. Piglet producers showed a lower potential (compared to other production types) for generating new infections when looking at higher degree infections. The general pattern is however quite similar for different infection degrees, and we may conclude that production types with high estimates for outgoing movements may act as super spreaders. For many diseases where the time between infection and first symptoms is short, second (and consequently third) degree transmission may be prevented by movement restrictions. While early detection is always crucial, our results suggest that its importance should be emphasized even more for some production types, such as Nucleus herds and Sow pool centers. Fattening herds usually only send animals to slaughterhouses, which are excluded from this analysis, and this results in the low estimated coefficient for this type.
The increased coefficients for higher degree infections shown for Fattening herds in Figure 3a may be explained by the fact that, while movements from such herds are rare, when they do occur the animals are sent to holdings with many contacts. However, it may also be a result of deficiencies in the recorded data, as pure fattening herds should by definition only send animals to slaughter.

Further, the analysis of the simulation study showed that for most production types, infected holdings in the larger size classes were likely to generate more transmissions. Hence, we may conclude that larger holdings generally have a higher potential to act as super spreaders. Exceptions were found for holdings with Missing information (which is expected due to the negative relationship between maximum capacity and contact probability, see above) and Sow pool satellites. The latter showed a decrease in the coefficient given by the GLM for Large holdings. We believe this is a result of the fact that this size class contained no holdings reported with only this production type, and the number of transmissions was largely explained by the coexisting production types. Inclusion of all interaction effects would require 147 additional coefficients to be estimated (3 size classes and 7x7 combinations of production type, excluding the Missing information group, which is never shared with other types), and for transparency we fitted a GLM with only presence or absence of production types. It should be stressed that the Missing information group is not a true type and needs to be treated with caution. However, the results of the simulation study show that the holdings in this group generally have a low risk of disease transmission via animal movements.

In the analysis of the simulation model we grouped holdings in size classes to give an overview of the system.
The aim was to demonstrate how the model and estimated parameters may aid the estimation of risks for disease transmission. If the interest is instead the risk of an individual holding, it may be relevant to include as much information as possible, which is why we chose to include the reported maximum capacity of sows and slaughter pigs separately in the contact model. By simulating contacts (and possible transmission) from individual holdings, the model may be used to estimate the level of risk for a holding based on its characteristics, such as herd size and production type, before any disease outbreak. If an outbreak occurs, the estimated risk may be used for prioritizing control efforts and as a complement to contact tracing. While there is no substitute for the latter, there is a lag time in the entry of animal movements into databases (Nöremark et al. 2009b), and interviews with farmers and hauliers may take a long time to complete. Hence, a simulated contact pattern may give a quick assessment of the holdings between which contacts are likely to have occurred, given the type of holding that is affected, and of the geographic range of possible transmissions (including higher degree infections).

Using databases of holdings and animal movements to estimate parameters allows for assessment of large scale patterns. Also, unlike qualitative studies, where inference is made from a few handpicked holdings checked for consistency, the parameters are estimated from the same type of data that may be used for outbreak simulations. If, for example, parameters are estimated from 100 holdings with every trait checked and edited in great detail, they may not be suitable for modeling contacts of holdings that have not been checked in the same way. However, erroneous and dubious reports pose a problem. In order to provide better estimates, the data quality needs to be improved.
Better guidance to farmers, as well as a requirement for regular updates of recorded information, may provide more reliable data. In particular, the interpretation of production type is of great importance for work such as this study. Production type has a strong influence on the contact patterns and could be of great use in risk estimation if the data are reliable. Working with data from central databases always carries a risk of erroneous reports affecting the results. Using the same data as in this study, Lindström et al. (2010) reported unexpectedly many movements between Sow pool centers, a highly unlikely event that is probably due to deficiencies in the recorded data on production type. As these are rare production types but with many movements, they are particularly sensitive to erroneous entries in the database, and this may have affected the results.

A more realistic simulation model of a disease outbreak should include other relevant contact types as well as disease specific parameters, such as incubation time, recovery rate and intra-herd dynamics. In addition, while the model presented here is cumbersome in its number of parameters, there are further aspects of the contact structure that are not included. Recurring contacts between holdings may be expected, in particular for the holdings in the Sow pool system. The number of contacts with susceptible holdings will decrease more rapidly if a large number of contacts occur between the same holdings. A low variance of the spatial kernel, as reported for contacts between Sow pool centers and satellites, results in a high probability of contacts with nearby holdings, and a high rate of recurring contacts is expected to occur in the simulation study. However, since recurrence is not incorporated explicitly, we believe this may have caused an overestimation of the number of infections caused by Sow pool centers and satellites.
While it requires estimation of additional parameters, this may be a salient extension of the model.

While the contact model presented may be improved, we believe that most of the relevant features are included. If data on other contacts are available, the method may also be applied to these. Using generalized measures of contact probabilities (such as the parameters of the presented contact model) allows for comparison both between different holdings and between different types of contacts. Also, if the same model is applied to data from other countries, comparisons of parameters may inform us of differences in the contact structure. We also believe that the model may be used for risk assessment as a complement to classic methods, but there is a need for better data quality for reliable inference. Yet, an analysis such as the one presented may also guide towards increased data quality in the future.

5. Conclusion

In this study we have analyzed live animal movements between pig holdings and found that the contact pattern is highly heterogeneous. We found that generally, but not always, a positive relationship exists between the maximum capacity of a holding and the number of contacts.

Describing distance dependence with a spatial kernel and analyzing its characteristics provides valuable information about the contact pattern between holdings and is a main feature in predicting disease spread. Hence the more detailed knowledge gained by the methodology presented may improve both understanding and predictive power. We found that the dependence of contact probability on distance was influenced by the production types of the start and end holdings.
Heterogeneous contact patterns, with some holdings likely to act as super spreaders, and differences in the probability of long distance contacts are expected to cause stochastic dynamics in disease outbreaks where animal movements are important for transmission.

Conflict of interest

We have no conflict of interest.

Acknowledgement

We thank the Swedish Board of Agriculture for supplying the data and the Swedish Civil Contingencies Agency for funding.

References

Bigras-Poulin, M., Thompson, R.A., Chriel, M., Mortensen, S., Greiner, M., 2006. Network analysis of Danish cattle industry trade patterns as an evaluation of risk potential for disease spread. Prev. Vet. Med. 76, 11–39.
Boender, G.J., Meester, R., Gies, E., De Jong, M.C.M., 2007. The local threshold for geographical spread of infectious diseases between farms. Prev. Vet. Med. 82, 90–101.
Brennan, M.L., Kemp, R., Christley, R.M., 2008. Direct and indirect contacts between cattle farms in north-west England. Prev. Vet. Med. 84, 242–260.
Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. Am. Stat. 46, 167–174.
Clark, J.S., Silman, M., Kern, R., Macklin, E., HilleRisLambers, E., 1999. Seed dispersal near and far: generalized patterns across temperate and tropical forests. Ecology 80, 1475–1494.
Chib, S., Greenberg, E., 1995. Understanding the Metropolis-Hastings algorithm. Am. Stat. 49, 327–335.
Dickey, B.F., Carpenter, T.E., Bartell, S.M., 2008. Use of heterogeneous operation-specific contact parameters changes predictions for foot-and-mouth disease outbreaks in complex simulation models. Prev. Vet. Med. 87, 272–287.
Dubé, C., Ribble, C., Kelton, D., McNab, B., 2009. A review of network analysis terminology and its application to foot-and-mouth disease modelling and policy development. Transbound. Emerg. Dis. 56, 73–85.
Duerr, H.-P., Schwehm, M., Leary, C.C., de Vlas, S.J., Eichner, M., 2007.
The impact of contact structure on infectious disease control: influenza and antiviral agents. Epidemiol. Infect. 135, 1124–1132.
Fan, Y., Dortet-Bernadet, J.-L., Sisson, S.A., 2010. A note on Bayesian curve fitting via auxiliary variables. Journal of Computational and Graphical Statistics. DOI: 10.1198/jcgs.2010.08178
Févre, E.M., Bronsvoort, B.M.de C., Hamilton, K.A., Cleaveland, S., 2006. Animal movements and the spread of infectious diseases. Trends Microbiol. 14, 125–131.
Gamerman, D., Lopes, H.F., 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, second ed. CRC Press, Chapman & Hall.
Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis, second ed. Chapman & Hall/CRC (Chapter 18).
Hawkes, C., 2009. Linking movement behaviour, dispersal and population processes: is individual variation a key? J. Anim. Ecol. 78, 894–906.
Kao, R.R., Green, D.M., Johnson, J., Kiss, I.Z., 2007. Disease dynamics over very different time-scales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R. Soc. Interface 4, 907–916.
Keeling, M.J., 1999. The effects of local spatial structure on epidemiological invasions. Proc. R. Soc. London B 266, 859–869.
Keeling, M.J., Woolhouse, M.E., Shaw, D.J., Matthews, L., 2001. Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a dynamic landscape. Science 294, 813–817.
Keeling, M., 2005. The implications of network structure for epidemic dynamics. Theo. Pop. Bio. 67, 1–8.
Lee, P.M., 2004. Bayesian Statistics: An Introduction, third ed. Hodder/Oxford University Press, New York.
Lindström, T., Håkansson, N., Westerberg, L., Wennergren, U., 2008. Splitting the tail of the displacement kernel shows the unimportance of kurtosis. Ecology 89, 1784–1790.
Lindström, T., Sisson, S.A., Nöremark, M., Jonsson, A., Wennergren, U., 2009.
Estimation of distance related probability of animal movements between holdings and implications for disease spread modeling. Prev. Vet. Med. 91, 85–94.
Lindström, T., Sisson, S.A., Sternberg Lewerin, S., Wennergren, U., 2010. Estimating animal movement contacts between holdings of different production types. Prev. Vet. Med. In press.
Matthews, L., Woolhouse, M., 2005. New approaches to quantifying the spread of infection. Nat. Rev. Microbiol. 3, 529–536.
Nöremark, M., Håkansson, N., Lindström, T., Wennergren, U., Sternberg Lewerin, S., 2009a. Spatial and temporal investigations of reported movements, births and deaths of cattle and pigs in Sweden. Acta Vet. Scand. 51:37.
Nöremark, M., Lindberg, A., Vågsholm, I., Sternberg Lewerin, S., 2009b. Disease awareness, information retrieval and change in biosecurity routines among pig farmers in association with the first PRRS outbreak in Sweden. Prev. Vet. Med. 90, 1–9.
Nöremark, M., Håkansson, N., Sternberg Lewerin, S., Lindberg, A., Jonsson, A. Network analysis of cattle and pig movements in Sweden; measures relevant for disease control and risk based surveillance. Submitted.
Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J., Guitian, F.J., 2006. Use of social network analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 75, 40–55.
Rweyemamu, M., Roeder, P., Mackay, D., Sumption, K., Brownlie, J., Leforban, Y., Valarcher, J.F., Knowles, N.J., Saraiva, V., 2008. Epidemiological patterns of foot-and-mouth disease worldwide. Transbound. Emerg. Dis. 55, 57–72.
Ribbens, S., Dewulf, J., Koenen, F., Mintiens, K., de Kruif, A., Maes, D., 2009. Type and frequency of contacts between Belgian pig herds. Prev. Vet. Med. 88, 57–66.
Robinson, S.E., Christley, R.M., 2007. Exploring the role of auction markets in cattle movements within Great Britain. Prev. Vet. Med.
81, 21–37.
Tildesley, M.J., Deardon, R., Savill, N.J., Bessell, P.R., Brooks, S.P., Woolhouse, M.E.J., Grenfell, B.T., Keeling, M.J., 2008. Accuracy of models for the 2001 foot-and-mouth epidemic. Proc. Roy. Soc. London B 275, 1459–1468.
Velthuis, A.G., Mourits, M.C., 2007. Effectiveness of movement-prevention regulations to reduce the spread of foot-and-mouth disease in The Netherlands. Prev. Vet. Med. 82, 262–281.
Vernon, M.C., Keeling, M.J., 2009. Representing the UK’s cattle herd as static and dynamic networks. Proc. Roy. Soc. London B 276, 469–476.
Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of ‘small-world’ networks. Nature 393, 440–442.

Appendix A. Hierarchical priors

We implement hierarchical priors, denoted $F$, for $\boldsymbol{V}$ and $\boldsymbol{\kappa}$, each conditional on a vector of unknown hyper-parameters. These vectors model the degree of similarity between parameters and (if similarities are prominent) improve estimation of $\boldsymbol{V}$ and $\boldsymbol{\kappa}$ when the data are weak (e.g. few movements between types $I$, $J$). For $\boldsymbol{V}$ we use an inverse gamma distribution with hyper-parameters $\alpha_V$ and $\beta_V$,

$$F(V_{IJ} \mid \alpha_V, \beta_V) = \frac{\beta_V^{\alpha_V}}{\Gamma(\alpha_V)} \left(\frac{1}{V_{IJ}}\right)^{\alpha_V + 1} e^{-\beta_V / V_{IJ}}. \tag{A.1}$$

The probability density function of the inverse gamma distribution is defined for values larger than zero. Since the lower limiting value of $\boldsymbol{\kappa}$ is 4/3 (corresponding to a uniform distribution in the limiting case; Lindström et al. 2008), we use a shifted inverse gamma distribution as hierarchical prior for $\boldsymbol{\kappa}$ and write

$$F(\kappa_{IJ} \mid \alpha_\kappa, \beta_\kappa) = \frac{\beta_\kappa^{\alpha_\kappa}}{\Gamma(\alpha_\kappa)} \left(\frac{1}{\kappa_{IJ} - 4/3}\right)^{\alpha_\kappa + 1} e^{-\beta_\kappa / (\kappa_{IJ} - 4/3)}. \tag{A.2}$$

Appendix B. Indicator variables

To facilitate computations when sampling from posteriors associated with mixture distributions, a common strategy is to introduce indicator variables (for example Gelman et al. 2004).
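With this approach, equation 10 may be rewritten as

$$P(\boldsymbol{s}, \boldsymbol{d} \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{\phi}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{U}) = \prod_{t} \prod_{I} \prod_{J} \left[ P(d_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, I, J, \boldsymbol{S}, \boldsymbol{D}, \hat{\boldsymbol{\phi}}_{\boldsymbol{J}}, \kappa_{IJ}, V_{IJ})\, P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, I, \boldsymbol{S}, \hat{\boldsymbol{\phi}}_{\boldsymbol{I}})\, P(I, J \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R}) \right]^{U_{IJt}}, \tag{B.1}$$

where $\boldsymbol{U}$ is a tensor of dimension $K \times K \times T$ and $U_{IJt} = 1$ for exactly one combination of $I$, $J$ for each movement $t$. The full posterior distribution of unobserved parameters $\boldsymbol{h}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{v}, \boldsymbol{\phi}, \alpha_\kappa, \beta_\kappa, \alpha_V, \beta_V, \boldsymbol{U}$ is

$$P(\boldsymbol{h}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{v}, \boldsymbol{\phi}, \alpha_\kappa, \beta_\kappa, \alpha_V, \beta_V, \boldsymbol{U} \mid \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{s}, \boldsymbol{d}) = P(\boldsymbol{s}, \boldsymbol{d} \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{\phi}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{U})\, F(\boldsymbol{\kappa} \mid \alpha_\kappa, \beta_\kappa)\, F(\boldsymbol{V} \mid \alpha_V, \beta_V)\, \pi(\boldsymbol{h})\, \pi(\boldsymbol{v})\, \pi(\boldsymbol{\phi})\, \pi(\alpha_\kappa)\, \pi(\beta_\kappa)\, \pi(\alpha_V)\, \pi(\beta_V), \tag{B.2}$$

where $\pi(\boldsymbol{h})$, $\pi(\boldsymbol{v})$, $\pi(\boldsymbol{\phi})$, $\pi(\alpha_\kappa)$, $\pi(\beta_\kappa)$, $\pi(\alpha_V)$ and $\pi(\beta_V)$ are prior distributions of parameters and hyper-parameters. For $\boldsymbol{h}$ and $\boldsymbol{v}$ (recall that the elements of these sum to one) we use uninformative $\mathrm{Dirichlet}(1, 1, \ldots, 1)$ priors. Priors $\pi(\boldsymbol{\phi})$, $\pi(\alpha_V)$ and $\pi(\alpha_\kappa)$ are set to be proportional to one on the support of the parameters, while $\pi(\beta_V)$ and $\pi(\beta_\kappa)$ are defined as uniform for $\beta > 1$. The inverse gamma distribution does not have a finite mean for $\beta < 1$ and we assume that both $\boldsymbol{V}$ and $\boldsymbol{\kappa}$ are finite quantities.

Incorporation of the unobserved indicator variable $\boldsymbol{U}$ in the model also allows for posterior estimation of the number of movements between production types. The posterior distribution of $\boldsymbol{Q}$ is calculated from the posterior distribution of $\boldsymbol{U}$ by

$$Q_{IJ} = \sum_{t} U_{IJt}. \tag{B.3}$$

Appendix C. MCMC estimation

We use Markov chain Monte Carlo (MCMC) techniques to estimate the posterior distribution of the model parameters. This involves the construction of a stochastic Markov chain with stationary distribution given by the posterior distribution of interest. Given a current state of the chain, MCMC methods sequentially update the parameters either individually or in blocks, based on the full posterior conditional distributions of each parameter under the model.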
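Repeating this procedure, and after the chain has converged, the states of the chain represent (correlated) draws from the posterior distribution of the model parameters. Two basic updates are involved. If the conditional distribution of a parameter is of a standard form, Gibbs sampling (see e.g. Casella and George 1992 for further details) may be used. If, however, the distribution is of non-standard form, Metropolis-Hastings updates may be used (see e.g. Chib and Greenberg 1995 for further details). In this case, parameter values $\boldsymbol{\theta}^*$ are proposed from a density function $q(\boldsymbol{\theta}^* \mid \boldsymbol{\theta})$ and subsequently accepted with probability

$$\min\left(1, \frac{P(\boldsymbol{\theta}^* \mid \ldots)\, \pi(\boldsymbol{\theta}^*)\, q(\boldsymbol{\theta} \mid \boldsymbol{\theta}^*)}{P(\boldsymbol{\theta} \mid \ldots)\, \pi(\boldsymbol{\theta})\, q(\boldsymbol{\theta}^* \mid \boldsymbol{\theta})}\right), \tag{C.1}$$

where $\boldsymbol{\theta}$ and $\boldsymbol{\theta}^*$ are current and proposed parameter values, $\pi(\boldsymbol{\theta})$ is the prior and $P(\boldsymbol{\theta} \mid \ldots)$ is the likelihood evaluated at $\boldsymbol{\theta}$. Further information on MCMC methods can be found in Gamerman and Lopes (2006).

All parameters except $\boldsymbol{U}$ may be updated with Metropolis-Hastings steps. Parameter matrix $\boldsymbol{h}$ is only involved in $P(I, J \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R})$ of equation B.1, and so the resulting conditional posterior distribution of $\boldsymbol{h}$ is

$$P(\boldsymbol{h} \mid \boldsymbol{v}, \boldsymbol{U}, \boldsymbol{R}) \propto \prod_{t=1}^{T} \prod_{I=1}^{K} \prod_{J=1}^{K} \left[ P(I_t, J_t \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R}) \right]^{U_{IJt}} \pi(\boldsymbol{h}). \tag{C.2}$$

The prior, $\pi(\boldsymbol{h})$, is proportional to 1 and the distribution $\prod_{t=1}^{T} \prod_{I=1}^{K} \prod_{J=1}^{K} \left[ P(I_t, J_t \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R}) \right]^{U_{IJt}}$ is given in Lindström et al. (2010) as

$$L_1 = \mathrm{Multinomial}(Q_{1,1}, Q_{1,2}, \ldots, Q_{2,1}, Q_{2,2}, \ldots, Q_{K,K-1}, Q_{K,K} \mid p_{1,1}, p_{1,2}, \ldots, p_{2,1}, p_{2,2}, \ldots, p_{K,K-1}, p_{K,K}), \tag{C.3}$$

where $p_{I,J} = P(I, J \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R})$ is given in equation 3 and $Q_{I,J} = \sum_t U_{IJt}$ for all $I$, $J$. Dirichlet distributions were used for proposals. More details may be found in Lindström et al. (2010).

Dirichlet proposals were also used for updates of $\boldsymbol{v}$, which is included in each of the distributions $P(d_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, I, J, \boldsymbol{S}, \boldsymbol{D}, \hat{\boldsymbol{\phi}}_{\boldsymbol{J}}, \kappa_{IJ}, V_{IJ})$, $P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, I, \boldsymbol{S}, \hat{\boldsymbol{\phi}}_{\boldsymbol{I}})$ and $P(I, J \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R})$.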
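For simplicity we write

$$L_2 = \prod_{t=1}^{T} \prod_{I=1}^{K} \prod_{J=1}^{K} \left[ \frac{\hat{v}_{sI}\, G(S_{1s}, S_{2s}, \hat{\phi}_{1I}, \hat{\phi}_{2I})}{\sum_{i} \hat{v}_{iI}\, G(S_{1i}, S_{2i}, \hat{\phi}_{1I}, \hat{\phi}_{2I})} \right]^{U_{IJt}} \tag{C.4}$$

and

$$L_3 = \prod_{t=1}^{T} \prod_{I=1}^{K} \prod_{J=1}^{K} \left[ \frac{\hat{v}_{dI}\, G(S_{1d}, S_{2d}, \hat{\phi}_{1J}, \hat{\phi}_{2J})\, F(D_{sd}, \kappa_{IJ}, V_{IJ})}{\sum_{i} \hat{v}_{iI}\, G(S_{1i}, S_{2i}, \hat{\phi}_{1J}, \hat{\phi}_{2J})\, F(D_{si}, \kappa_{IJ}, V_{IJ})} \right]^{U_{IJt}}, \quad i \neq s, \tag{C.5}$$

and here the conditional posterior distribution for $\boldsymbol{v}$ is

$$P(\boldsymbol{v} \mid \boldsymbol{h}, \boldsymbol{\phi}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{U}, \boldsymbol{J}, \boldsymbol{I}) = L_1 L_2 L_3\, \pi(\boldsymbol{v}). \tag{C.6}$$

The elements of the size dependent parameters for outgoing and incoming movements were updated separately using Gaussian random walk proposal distributions. The conditional posterior distribution of $\hat{\phi}_{uI}$ ($u = 1, 2$) is

$$P(\hat{\phi}_{uI} \mid \hat{\phi}_{\bar{u}I}, \boldsymbol{v}, \boldsymbol{U}, \boldsymbol{R}, \boldsymbol{I}, \boldsymbol{S}, \boldsymbol{s}) = \left( \prod_{t=1}^{T} \prod_{J=1}^{K} \left[ \frac{\hat{v}_{sI}\, G(S_{us}, S_{\bar{u}s}, \hat{\phi}_{uI}, \hat{\phi}_{\bar{u}I})}{\sum_{i} \hat{v}_{iI}\, G(S_{ui}, S_{\bar{u}i}, \hat{\phi}_{uI}, \hat{\phi}_{\bar{u}I})} \right]^{U_{IJt}} \right) \pi(\hat{\phi}_{uI}), \tag{C.7}$$

where $\bar{u} = 2$ for $u = 1$ and $\bar{u} = 1$ for $u = 2$. Similarly, the conditional posterior distribution of $\hat{\phi}_{uJ}$ is

$$P(\hat{\phi}_{uJ} \mid \hat{\phi}_{\bar{u}J}, \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{U}, \boldsymbol{J}) = \prod_{t=1}^{T} \prod_{I=1}^{K} \left[ \frac{\hat{v}_{dI}\, G(S_{ud}, S_{\bar{u}d}, \hat{\phi}_{uJ}, \hat{\phi}_{\bar{u}J})\, F(D_{sd}, \kappa_{IJ}, V_{IJ})}{\sum_{i} \hat{v}_{iI}\, G(S_{ui}, S_{\bar{u}i}, \hat{\phi}_{uJ}, \hat{\phi}_{\bar{u}J})\, F(D_{si}, \kappa_{IJ}, V_{IJ})} \right]^{U_{IJt}} \pi(\hat{\phi}_{uJ}), \quad i \neq s. \tag{C.8}$$

Joint updates of $\boldsymbol{\kappa}$ and $\boldsymbol{V}$ were performed separately for each combination of $I$, $J$ using a Gaussian random walk on the logarithms of $\kappa_{IJ}$ and $V_{IJ}$, with proposals drawn from a multivariate normal distribution. The joint conditional distribution of $\kappa_{IJ}$, $V_{IJ}$ is

$$P(\kappa_{IJ}, V_{IJ} \mid \hat{\boldsymbol{\phi}}_{\boldsymbol{J}}, \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{U}, \boldsymbol{J}) = \prod_{t=1}^{T} \left[ \frac{\hat{v}_{dI}\, G(S_{1d}, S_{2d}, \hat{\phi}_{1J}, \hat{\phi}_{2J})\, F(D_{sd}, \kappa_{IJ}, V_{IJ})}{\sum_{i} \hat{v}_{iI}\, G(S_{1i}, S_{2i}, \hat{\phi}_{1J}, \hat{\phi}_{2J})\, F(D_{si}, \kappa_{IJ}, V_{IJ})} \right]^{U_{IJt}} F(\kappa_{IJ} \mid \alpha_\kappa, \beta_\kappa)\, F(V_{IJ} \mid \alpha_V, \beta_V), \quad i \neq s. \tag{C.9}$$

As the gamma distribution is the conjugate prior for the inverse gamma distribution we may use a Gibbs update of the $\beta$ parameter of the hyper-prior $F$.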
A gamma distribution can, however, not be completely uninformative, and we therefore chose to use Metropolis-Hastings updates for both $\alpha$ and $\beta$. In order to improve the mixing we updated the parameters of the hierarchical priors five times for every update of the other parameters (e.g. Fan et al. 2010). The posterior distribution of $\alpha_x, \beta_x$ ($x = V, \kappa$) is

$$\begin{aligned} P(\alpha_x \mid \boldsymbol{\theta}) &= \Psi(\boldsymbol{\theta} \mid \alpha_x, \beta_x)\, P(\alpha_x) \\ P(\beta_x \mid \boldsymbol{\theta}) &= \Psi(\boldsymbol{\theta} \mid \alpha_x, \beta_x)\, P(\beta_x) \end{aligned} \tag{C.10}$$

with $\Psi$ given by equations A.1 (for $\boldsymbol{\theta} = \boldsymbol{V}$) and A.2 (for $\boldsymbol{\theta} = \boldsymbol{\kappa}$).

As the indicator variable $U_{IJt} = 1$ for exactly one combination of $I, J$, $\boldsymbol{U}$ may be updated with Gibbs sampling by drawing one random number for each $t$ from a multinomial distribution with probabilities given by

$$\Pr(U_{IJt} = 1) = \frac{P(d_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, I, J, G, \boldsymbol{D}, \hat{\boldsymbol{\phi}}_{\boldsymbol{J}}, \kappa_{IJ}, V_{IJ})\, P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, I, G, \hat{\boldsymbol{\omega}}_{\boldsymbol{I}})\, P(I, J \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R})}{\sum_{a} \sum_{b} P(d_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, a, b, G, \boldsymbol{D}, \hat{\boldsymbol{\phi}}_{b}, \kappa_{ab}, V_{ab})\, P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, a, G, \hat{\boldsymbol{\omega}}_{a})\, P(a, b \mid \boldsymbol{h}, \boldsymbol{v}, \boldsymbol{R})}. \tag{C.11}$$

Figure Legends

Figure 1. Examples of the spatial kernel with (a) kurtosis = 3.33 and variance = 1000 (dashed), 10000 (solid) and 100000 (dotted) km² and (b) variance = 1000000 km² and kurtosis = 2 (dashed), 4 (solid) and 8 (dotted). The inset axes show the same kernels as the main axes, but with a logarithmic y-axis and larger distances included.

Figure 2. Cumulative distribution of observed (solid line) and predicted (dotted line) distances of live pig movements between holdings. Note that the x-axis is on the log scale.

Figure 3. Coefficients of (panels a and b) the explanatory variables for production type (0 if the holding had not reported the production type, 1 if it had) and (panel c) the combination of production type and size class (S = small, M = medium, L = large), analyzed by GLM.
The response variable was (a) the number of first, second and third degree infections caused by a holding if infected, (b) the number of first degree infections at distances longer than 10, 100 and 500 km and (c) the number of first degree infections (but with size classes included in the explanatory variables). Note that (a) and (b) are each the result of three separate analyses (each with 8 explanatory variables), while (c) is the result of one analysis. Legend abbreviations: SPC = Sow pool centers, MH = Multiplying herds, NH = Nucleus herds, PP = Piglet producers, SPS = Sow pool satellites, FF = Farrow-to-finish, FH = Fattening herds, MI = Missing information.