Title:
Bayesian analysis of animal movements related to factors at herd and between herd levels: Implications for disease spread modeling.

Authors: Tom Lindström^a, Scott A. Sisson^b, Susanna Stenberg Lewerin^c and Uno Wennergren^a

a IFM Theory and Modelling, Linköping University, 581 83 Linköping, Sweden
b School of Mathematics and Statistics, University of New South Wales, Sydney 2052, Australia
c Department of Disease Control and Epidemiology, SVA, National Veterinary Institute, 751 89 Uppsala, Sweden

Corresponding Author: Uno Wennergren
Tel: +46 13 28 16 66
Fax: +46 13 28 13 99
Email: unwen@ifm.liu.se
Correspondence address: See above.

Key words
Markov Chain Monte Carlo; Hierarchical Bayesian; Mixture models; Indicator variable; Animal databases; Animal movements; Contact structure

Abstract
A method to assess the influence of between herd distances, production types and herd sizes on patterns of between herd contacts is presented. It was applied to pig movement data from a central database of the Swedish Board of Agriculture. To determine the influence of these factors on the contact between holdings we used a Bayesian model and Markov chain Monte Carlo (MCMC) methods to estimate the posterior distribution of model parameters. The analysis showed that the contact pattern via animal movements is highly heterogeneous and influenced by all three factors: production type, herd size, and distance between holdings. Most production types showed a positive relationship between maximum capacity and the probability of both incoming and outgoing movements. In agreement with previous studies, holdings also differed both in the number of contacts and in the holding types with which contact occurred. Also, the scale and shape of distance dependence in contact probability were shown to differ depending on the production types of holdings.

To demonstrate how the methodology may be used for risk assessment, disease transmission via animal movements was simulated with the model used for analysis of contacts, parameterized by the estimated posterior distribution. A Generalized Linear Model showed that herds with production types Sow pool center, Multiplying herd and Nucleus herd have a higher risk of generating a large number of new infections. Multiplying herds are also expected to generate many long distance transmissions, while transmissions generated by Sow pool centers are confined to more local areas. We argue that the methodology presented may be a useful tool for improvement of risk assessment based on data found in central databases.

1. Introduction

In order to understand the process of disease spread between animal holdings, researchers are increasingly studying the contact patterns that contribute to the transmission (e.g. Velthuis and Mourits 2007, Dubé et al. 2009, Nöremark et al. 2009a, Vernon and Keeling 2009, Lindström et al. 2010). Analysis of the contact pattern allows for making predictions about disease spread (Kao et al. 2007) as well as testing the effects of changes in the contact structure (Velthuis and Mourits 2007). Different diseases may be transmitted via different paths, but generally between-holding movements of animals may be regarded as the main risk factor for transmission of livestock diseases (Fèvre et al. 2006, Ortiz-Pelaez et al. 2006, Rweyemamu et al. 2008).

In this paper we present a methodology to transform data of between holding contacts into probability distributions useful for predictions and direct analysis. We analyze pig movements in Sweden and show how it is possible to assess the influence of several factors on the contact pattern, specifically production type, number of animals and distances between holdings.
For the latter, it has been shown that contacts between holdings are more common at short distances (Boender et al. 2007, Robinson and Christley 2007, Lindström et al. 2009, Ribbens et al. 2009). Epidemiological studies often describe the probability of transmission dependent on distance with a spatial kernel (Keeling et al. 2001, Tildesley et al. 2008), and Lindström et al. (2009) showed that a good description of contact probabilities at both short and long distances is important. As in Lindström et al. (2009) we characterize the spatial kernel by its two dimensional variance (quantifying the scale) and kurtosis (quantifying the shape). A high value of the kernel variance means that contacts occur more frequently at longer distances. A high value of kurtosis indicates that contacts are frequent at short distances, but concurrently long distance contacts, represented by a fat tail of the kernel, are also common. From ecological research it is known that leptokurtic kernels (i.e. kernels with higher kurtosis than a normal distribution) may be the result of heterogeneity in the dispersal processes (see Hawkes 2009 and references therein). Hence, kurtosis may be interpreted as a measure of the heterogeneity of the distance related contact probability. Figure 1 shows spatial kernels with different combinations of 2D-variance and kurtosis. More frequent contacts at shorter distances result in spatially clustered contact patterns (Keeling 1999), which leads to depletion of local susceptibles and a rapid decline in the reproductive ratio (Keeling 2005). Hence, such dynamics are expected from disease transmission where contacts are estimated to be described by a kernel with a small variance. A large value of kurtosis also results in such patterns, but since long distance contacts are also common, the number of contacts needed for connecting otherwise distant holdings is reduced.
From the theory of small-world networks (Watts and Strogatz 1998) it is known that such contacts may have substantial impact on the dynamics of an outbreak.

Contact heterogeneity due to production types has been reported and shown to be important for the dynamics of outbreaks (Dickey et al. 2008, Ribbens et al. 2009, Lindström et al. 2010). The production type of a holding may be expected to influence both the number of animal movements and which other holdings animals are moved from or to. For the latter case, clustering patterns may be expected, similar to contact heterogeneities found due to spatial clustering. However, unlike the spatial factor, directional differences may be expected, with animal movements being more common from production type A to type B than from B to A. Holdings of some production types also have many contacts (i.e. trade many animals), which means that if infected, these holdings may cause a large number of secondary infections and thereby function as super spreaders (Matthews and Woolhouse 2005).

The number of animals on a holding is also expected to influence the contact pattern. Typically, larger holdings are expected to trade more animals and hence have more contacts (Ribbens et al. 2009), making such holdings candidates as super spreaders. However, differences may be expected between production types. For example, the frequency of live animal movements to and from holdings with farrow-to-finish production might not be strongly dependent on the number of animals kept on the premises, since this production type includes both piglet producing units and fattening units on the same holding.

Commonly, between-holding contacts are studied with network analysis (Bigras-Poulin et al. 2008, Brennan et al. 2008, Nöremark et al. In press).
Such studies provide good quantitative measures of the observed structure and may also provide an understanding of the contact patterns and the dynamics of disease transmission through these contacts (Keeling 2005, Kao et al. 2007). It may however be difficult to parameterize a model from network measures. Simulation models in a network context are usually confined to resampling observed contacts (e.g. Vernon and Keeling 2009). While the method presented in this paper addresses many of the same questions as network studies, it utilizes (hierarchical) Bayesian models and builds on methodology presented in Lindström et al. (2009) and Lindström et al. (2010). In Lindström et al. (2009) a method was presented for estimation of distance-related probability of contacts, but it was there assumed that all holdings are identical. Lindström et al. (2010) introduced a method that analyzed the contact patterns based on production types, but other factors were excluded. In this paper we present a model that describes contact via live animal movements between holdings where the probability of contacts depends on production types, the number of animals at each holding and distance between holdings. We estimate the posterior distribution of model parameters with Markov chain Monte Carlo (MCMC) methods and utilize data found in central databases of animal movements. EU members and some other states (e.g. Australia and New Zealand) are required to keep databases on all livestock holdings and register all movements of pigs and cattle, which means that such data may be available for analysis. The level of detail included in the databases does however vary between countries. While data quality is a problem (Nöremark et al. 2009a, Lindström et al. 2010), analysis of such data allows for investigation of large scale trends in the contact patterns. One should however be aware of the limitations of the data when drawing conclusions from the analyses.

The aim of this paper, and the method presented, is to use a probabilistic model to investigate how the contact pattern is influenced by distance between holdings, herd sizes and production types. Our aim is also to investigate how the dependence on distance and herd size differs between holdings of different production types. We also aim to show how the analyzed contact pattern can be used in risk assessment.

2. Material and method

2.1 Data

The data used was supplied by the Swedish Board of Agriculture. Due to legal requirements, the analysis was performed on encoded data such that the ID number of specific holdings or names of farmers could not be retrieved. This prohibited the tracing of potentially unexpected contacts. Holdings that were considered to be inactive were removed, as well as holdings that did not have spatial coordinates (see Nöremark et al. 2009a for more details on this). A total of 3084 holdings and 20231 movements (carried out from July 2005 until June 2006) were included in the analysis. Movements to slaughterhouses were not included in the analysis.

Data included the maximum capacity (i.e. the reported maximum number of animals that could be kept on the premises) of each holding, recorded separately for sows and fattening pigs. If the maximum capacity of a demographic group was missing in the database, it was assumed that the maximum capacity was zero. Such entries were mostly found for holdings with production types that are expected not to have animals of that demographic group, and we found that a maximum capacity equal to zero was rarely entered in the database. In addition, a previous study (Nöremark et al. 2009b) has shown better consistency between the holdings and their database entries for larger holdings, indicating that while 0 may not be accurate in some instances, a low rather than a high number is to be expected.
Seven production types were included in the study: Sow pool center, Sow pool satellite, Farrow-to-finish, Nucleus herd, Piglet producer, Multiplying herd and Fattening herd. The form on which farmers report production type also has an option for free text. Holdings that only had this information entered were placed in a group denoted "Missing information". Note that when we use this term, we only refer to missing information about the production type, and that farmers may still have reported location and herd sizes. For more details on the included production types, the pig farming structure of Sweden and how the data is entered in the database, see Nöremark et al. (2009a) and Lindström et al. (2010).

2.2 Model and parameter estimation

Data include holding production types, and this information is included in the model by a matrix $\mathbf{R}$ of size $N \times K$, where $N$ is the number of holdings and $K$ is the number of production types (including the artificial type Missing information). We denote $r_{ik} = 1$ if holding $i$ has production type $k$ and $r_{ik} = 0$ otherwise, for $k = 1, \dots, K$ and $i = 1, \dots, N$. Data also include spatial coordinates of holdings and these are translated into a distance matrix $\mathbf{D}$ of dimensions $N \times N$, where $D_{ij}$ is the Euclidean distance between holdings $i$ and $j$. Herd sizes of pig holdings are measured by the maximum capacity of fattening pigs and sows, and these are denoted $\mathbf{S}_1$ and $\mathbf{S}_2$ (both vectors with $N$ elements), respectively. When we refer to either of these demographic classes we write $\mathbf{S}$, and $S_{ui}$ refers to size $u$ ($u = 1, 2$) of holding $i$. We use notation such that each movement $t$ ($t = 1, 2, \dots, M$, where $M$ is the number of movements) has a start holding $s_t$ and a destination holding $d_t$, and the vectors $\mathbf{s}$ and $\mathbf{d}$ (both with $M$ elements) refer to all start and destination holdings.
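
As an illustration of the data structures described above, the following minimal Python sketch builds a Euclidean distance matrix and a 0/1 production type matrix. The coordinates, the reported type sets and the function names are hypothetical and chosen only for the example; this is not the implementation used for the analysis.

```python
import math

def distance_matrix(coords):
    """Return the N x N matrix of Euclidean distances between holdings."""
    n = len(coords)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.hypot(coords[i][0] - coords[j][0],
                           coords[i][1] - coords[j][1])
            dist[i][j] = dist[j][i] = d  # distances are symmetric
    return dist

def type_indicator_matrix(reported_types, n_types):
    """Return the N x K matrix with entry 1 where holding i reported type k."""
    return [[1 if k in types else 0 for k in range(n_types)]
            for types in reported_types]

# Hypothetical example: three holdings, three production types.
coords = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]   # assumed planar coordinates (km)
reported = [{0}, {1, 2}, {2}]                    # holding 1 reports two types

D = distance_matrix(coords)
R = type_indicator_matrix(reported, n_types=3)
print(D[0][1], D[0][2])
print(R)
```

Storing the full matrix is convenient for a few thousand holdings; for much larger systems distances could instead be computed on demand.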

2.2.1 Weight on production types

We want to estimate how contact probabilities depend on the factors herd size, production type and distance between holdings. As in Lindström et al. (2010), we assume that holdings with more than one type will behave as some mixture of each type, and rather than assuming that a holding will behave as an equally weighted mixture of each type, we estimate how much different production types will determine the behavior of a holding. This is estimated with a parameter vector $\mathbf{v}$ with $K - 1$ elements (see below for explanation of this) where $\sum_k v_k = 1$. A high value of $v_k$ indicates that production type $k$ has a large influence on the contact pattern of a holding that has reported this type concurrently with other types.

A holding $i$ is assumed to consist of a proportion $\tilde{v}_{ik}$ of each production type $k$, which is determined by $\mathbf{v}$ and $\mathbf{R}$ through

\[
\tilde{v}_{ik} =
\begin{cases}
\dfrac{r_{ik} v_k}{\sum_{l \neq m} r_{il} v_l} & \text{for } k \neq m, \text{ if } r_{im} = 0, \\
0 & \text{for } k = m, \text{ if } r_{im} = 0, \\
0 & \text{for } k \neq m, \text{ if } r_{im} = 1, \\
1 & \text{for } k = m, \text{ if } r_{im} = 1,
\end{cases}
\tag{1}
\]

where $r_{im} = 1$ if holding $i$ has type $m$ and $r_{im} = 0$ otherwise. Production type $m$ is an artificial type introduced for holdings with missing information. This is never shared with any other type (as it then would not be missing) and is therefore excluded from $\mathbf{v}$. Hence, $\mathbf{v}$ has $K - 1$ rather than $K$ elements. It is possible to formulate a model where the missing production types of holdings are estimated by writing a joint distribution of parameter estimates and unobserved production types. However, as the farmers of these holdings have chosen not to report any of the seven included production types, we cannot assume that these holdings in fact are of any of these types. We rather interpret this group as mainly containing holdings that for different reasons do not fit into any of the listed types.
We therefore include them in the analysis as an artificial type and expect the holdings included in the group to be heterogeneous.

2.2.2 Dependence on production types

Also as in Lindström et al. (2010), dependence on production type was modeled with a parameter matrix $\mathbf{H}$ of dimensions $K \times K$ with $\sum_{\alpha\beta} h_{\alpha\beta} = 1$, where a high (low) value of $h_{\alpha\beta}$ indicates that movements from production type $\alpha$ to $\beta$ are common (rare). The estimation of $\mathbf{H}$ takes into account that some production types are more common than others, and the elements are referred to as commonness indices. We expand the analysis of Lindström et al. (2010) regarding production types and also give estimates of the absolute number of movements between production types. We refer to this as $\mathbf{Q}$, of dimensions $K \times K$, where $q_{\alpha\beta}$ is the estimated number of movements from type $\alpha$ to $\beta$, taking into account that herds often are reported to have more than one type. Some production types are reported much more frequently than others and hence the estimates of $\mathbf{Q}$ and $\mathbf{H}$ provide different and complementary insight into the contact pattern and the role of different holdings in a potential disease outbreak.

2.2.3 Herd size dependence

Dependence on sizes $\mathbf{S}_1$ and $\mathbf{S}_2$ was modeled as a power function with parameters $\boldsymbol{\rho}^{\mathrm{out}}$ and $\boldsymbol{\rho}^{\mathrm{in}}$, both of dimension $2 \times K$, where $\rho^{\mathrm{out}}_{uk}$ and $\rho^{\mathrm{in}}_{uk}$ ($u = 1, 2$, corresponding to sizes $\mathbf{S}_1$ and $\mathbf{S}_2$, respectively, and $k = 1, \dots, K$) are the size dependence of type $k$ for sending and receiving contacts, respectively. In the following, we use the notation $\rho$ where we wish to refer to either $\rho^{\mathrm{out}}$ or $\rho^{\mathrm{in}}$. If $\rho_{u\alpha} = 0$, there is no size dependence for size $u$, and $\rho_{u\alpha} < 0$ ($\rho_{u\alpha} > 0$) indicates a negative (positive) relationship between size and contact probability. For $\rho_{u\alpha} = 1$, there is approximately a linear relationship such that e.g. twice as many animals results in a twice as high probability of contacts.
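
The interpretation of the size exponents can be checked with a small numerical sketch. It assumes the "size plus one" form that equation 6 adopts further on; the function name `size_weight` and the example capacities are hypothetical.

```python
def size_weight(sizes, exponents):
    """Unnormalized size dependence: product of (size + 1) ** exponent.

    sizes     -- (fattening pig capacity, sow capacity) of one holding
    exponents -- the two size parameters of one production type
    """
    weight = 1.0
    for size, exponent in zip(sizes, exponents):
        weight *= (size + 1.0) ** exponent
    return weight

# Exponent 0: no size dependence, the weight is 1 whatever the size.
no_dependence = size_weight((500, 40), (0.0, 0.0))

# Exponent 1: approximately linear. Capacities 99 and 199 give weights
# 100 and 200, so doubling the (size + 1) term doubles the weight.
doubling_ratio = size_weight((199, 0), (1.0, 0.0)) / size_weight((99, 0), (1.0, 0.0))

# Negative exponent: larger holdings receive a smaller weight.
small = size_weight((10, 0), (-0.5, 0.0))
large = size_weight((1000, 0), (-0.5, 0.0))

print(no_dependence, doubling_ratio, small > large)
```

These weights only become probabilities after normalization over all candidate holdings, so only ratios between holdings matter.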

2.2.4 Distance dependence

Distance dependence is modeled with a spatial kernel, which may be characterized by its variance ($V$) and kurtosis ($\kappa$), measuring the scale and shape, respectively (Lindström et al. 2008, 2009). We assume a rotationally symmetric distribution and define the 2-D variance (in the following, this is what we mean by variance) as the second moment around zero (i.e. raw moment) of the radial distance. Kurtosis is analogously defined as the fourth raw moment divided by the square of the second raw moment, following the suggestion of Clark et al. (1999). Contact probabilities dependent on distance between holdings may differ depending on production types and we therefore estimate $V_{\alpha\beta}$ and $\kappa_{\alpha\beta}$ (indicating elements of matrices $\mathbf{V}$ and $\boldsymbol{\kappa}$, respectively, of dimensions $K \times K$) for every combination of $\alpha, \beta$. However, the underlying processes (e.g. economic and social) are not completely different. For such systems it is suitable to use a hierarchical model (Gelman et al. 2004) where the parameters, in this case the elements of $\mathbf{V}$ and $\boldsymbol{\kappa}$, have a hierarchical prior with a set of unknown hyper-parameters. This approach has the benefit that it improves the estimation of parameters where the data is weak, a concept known as "borrowing strength". If it may be argued a priori that the parameters are not completely unrelated, then the estimation of one parameter may be informed by the estimation of other parameters. If there is in fact little similarity between the parameters, the hierarchical prior will have little influence on the estimates of $\mathbf{V}$ and $\boldsymbol{\kappa}$.

In Lindström et al. (2009) it was shown that the data were better described when movements were modeled as arising from a mixture of distance dependent and mass action mixing (MAM) processes.
In that study all other factors were excluded and a single kernel was used to describe all contacts. To simplify the model and reduce the number of parameters in this study we exclude the MAM part. As we here include other factors and let $V_{\alpha\beta}$ and $\kappa_{\alpha\beta}$ differ between production types $\alpha, \beta$, we assume that the model can account for factors that could not be captured with a single spatial kernel. To test this assumption we visually compare the predicted and observed movement distances (see below).

2.2.5 Contact probability model

We used a model formulation

\[
P(d_t, s_t \mid \boldsymbol{\theta}) = \sum_{\alpha} \sum_{\beta} P(d_t \mid s_t, \alpha, \beta, \boldsymbol{\theta}_d) \, P(s_t \mid \alpha, \boldsymbol{\theta}_s) \, P(\alpha, \beta \mid \boldsymbol{\theta}_{\alpha\beta}),
\tag{2}
\]

where $\boldsymbol{\theta}_d$, $\boldsymbol{\theta}_s$ and $\boldsymbol{\theta}_{\alpha\beta}$ are subsets of $\boldsymbol{\theta}$ and refer to particular sets of parameters, yet to be defined. To clarify, we use the notation with $\boldsymbol{\theta}$ here to give a more transparent description of the general outline of the model. Equation 2 should be interpreted such that for movement $t$, the probability of destination holding $d_t$ is conditional on the start holding $s_t$ and the production types of the start and destination holdings, denoted $\alpha$ and $\beta$ respectively. Start holding $s_t$ is conditional on production type $\alpha$ of the start holding. The joint distribution $P(d_t, s_t \mid \dots)$ is a mixture distribution and (since holdings may have more than one type) the probability is summed over all types with $\sum_{\alpha} \sum_{\beta} P(\alpha, \beta \mid \boldsymbol{\theta}_{\alpha\beta}) = 1$. We use a probability function $P(\alpha, \beta \mid \boldsymbol{\theta}_{\alpha\beta}) = P(\alpha, \beta \mid \mathbf{v}, \mathbf{H}, \mathbf{R})$ as introduced in Lindström et al. (2010),

\[
P(\alpha, \beta \mid \mathbf{v}, \mathbf{H}, \mathbf{R}) = \frac{h_{\alpha\beta} \, \bar{n}_{\alpha} \, \bar{\bar{n}}_{\beta\alpha}}{\sum_{\alpha} \sum_{\beta} h_{\alpha\beta} \, \bar{n}_{\alpha} \, \bar{\bar{n}}_{\beta\alpha}},
\tag{3}
\]

where conditionally on $\mathbf{v}$ and $\mathbf{R}$ we define

\[
\bar{n}_{\alpha} = \sum_{i} \tilde{v}_{i\alpha}, \qquad
\bar{\bar{n}}_{\beta\alpha} = \sum_{j} \tilde{v}_{j\beta} \left( 1 - \frac{\tilde{v}_{j\alpha}}{\bar{n}_{\alpha}} \right),
\tag{4}
\]

where $\tilde{v}_{ik}$ is given by equation 1. The quantities $\bar{n}_{\alpha}$ and $\bar{\bar{n}}_{\beta\alpha}$ may be interpreted as measurements of the amount of each production type at a holding, taking into account that holdings may not be of only one type (if more than one type is reported). The quantity $\bar{\bar{n}}_{\beta\alpha}$ is adjusted to account for exclusion of movements ending up at the same destination as the start holding.

The distribution of $s_t$ conditional on type $\alpha$ (i.e. $P(s_t \mid \alpha, \boldsymbol{\theta}_s)$ of equation 2) is modeled as

\[
P(s_t \mid \mathbf{v}, \mathbf{R}, \alpha, \mathbf{S}, \boldsymbol{\rho}^{\mathrm{out}}) = \frac{\tilde{v}_{s_t\alpha} \, G(S_{1s_t}, S_{2s_t}, \rho^{\mathrm{out}}_{1\alpha}, \rho^{\mathrm{out}}_{2\alpha})}{\sum_{i} \tilde{v}_{i\alpha} \, G(S_{1i}, S_{2i}, \rho^{\mathrm{out}}_{1\alpha}, \rho^{\mathrm{out}}_{2\alpha})},
\tag{5}
\]

where $G(S_{1i}, S_{2i}, \rho^{\mathrm{out}}_{1\alpha}, \rho^{\mathrm{out}}_{2\alpha})$ is a function describing dependence on sizes $S_{1i}, S_{2i}$ of holding $i$, and $\rho^{\mathrm{out}}_{1\alpha}, \rho^{\mathrm{out}}_{2\alpha}$ are the parameters determining the size dependence of sizes $S_{1i}, S_{2i}$, respectively, for production type $\alpha$. As in the FMD model presented in Tildesley et al. (2008), we assume that contact probability dependence on herd size may be modeled as a power function and we write

\[
G(S_{1i}, S_{2i}, \rho^{\mathrm{out}}_{1\alpha}, \rho^{\mathrm{out}}_{2\alpha}) = (S_{1i} + 1)^{\rho^{\mathrm{out}}_{1\alpha}} (S_{2i} + 1)^{\rho^{\mathrm{out}}_{2\alpha}}.
\tag{6}
\]

We use $S_{1i} + 1$ rather than just $S_{1i}$ to avoid $P(s_t \mid \mathbf{v}, \mathbf{R}, \alpha, \mathbf{S}, \boldsymbol{\rho}^{\mathrm{out}}) = 0$ if $S_{ui} = 0$ for any $u = 1, 2$ (e.g. Fattening herds are expected to have no sows).

The probability distribution of $d_t$ conditional on types $\alpha, \beta$ and start holding $s_t$ (i.e. $P(d_t \mid s_t, \alpha, \beta, \boldsymbol{\theta}_d)$ of equation 2) is dependent on both sizes, $\mathbf{S}$, and distances, $\mathbf{D}$, and is given by

\[
P(d_t \mid \mathbf{v}, \mathbf{R}, s_t, \alpha, \beta, \mathbf{S}, \mathbf{D}, \boldsymbol{\rho}^{\mathrm{in}}, \kappa_{\alpha\beta}, V_{\alpha\beta}) = \frac{\tilde{v}_{d_t\beta} \, G(S_{1d_t}, S_{2d_t}, \rho^{\mathrm{in}}_{1\beta}, \rho^{\mathrm{in}}_{2\beta}) \, F(D_{s_t d_t}, \kappa_{\alpha\beta}, V_{\alpha\beta})}{\sum_{j} \tilde{v}_{j\beta} \, G(S_{1j}, S_{2j}, \rho^{\mathrm{in}}_{1\beta}, \rho^{\mathrm{in}}_{2\beta}) \, F(D_{s_t j}, \kappa_{\alpha\beta}, V_{\alpha\beta})},
\tag{7}
\]

for $d_t \neq s_t$. The destination of a movement may not be the same holding as the start, i.e. $P(d_t = s_t \mid \mathbf{v}, \mathbf{R}, s_t, \alpha, \beta, \mathbf{S}, \mathbf{D}, \boldsymbol{\rho}^{\mathrm{in}}, \kappa_{\alpha\beta}, V_{\alpha\beta}) = 0$. Recall that $\boldsymbol{\rho}^{\mathrm{in}}$ is the equivalent of $\boldsymbol{\rho}^{\mathrm{out}}$ but used for modeling of contact probability of incoming movements. The function $F$ is used for modeling of dependence on between-herd distance. As in Lindström et al. (2009), a generalized normal distribution is used.
We write

\[
F(D_{ij}, \kappa_{\alpha\beta}, V_{\alpha\beta}) = e^{-\left( D_{ij} / a \right)^{b}},
\tag{8}
\]

where the relationships between $a, b$ and $V, \kappa$ are given for two dimensional kernels in Lindström et al. (2008) as

\[
V = a^{2} \, \frac{\Gamma(4/b)}{\Gamma(2/b)}, \qquad
\kappa = \frac{\Gamma(6/b) \, \Gamma(2/b)}{\left( \Gamma(4/b) \right)^{2}}.
\tag{9}
\]

For continuous functions, equation 8 is normalized by $2\pi a^{2} \Gamma(2/b) / b$. This cancels out in equation 7 and normalization is instead performed by summation of the functions over all possible destination holdings (see equation 7). When incorporated in this manner, $V$ is the parameter that will have the highest influence on the disease spread dynamics (Lindström et al. In press) but $\kappa$ also provides important information. Also, it is difficult to determine $\kappa$ a priori and erroneous assumptions may result in erroneous estimations of $V$.

Writing the full formulation of the joint probability distribution of $d_t$ and $s_t$ (equation 2) we get

\[
P(d_t, s_t \mid \mathbf{v}, \mathbf{H}, \mathbf{R}, \mathbf{S}, \mathbf{D}, \boldsymbol{\rho}, \boldsymbol{\kappa}, \mathbf{V}) = \sum_{\alpha} \sum_{\beta} P(d_t \mid \mathbf{v}, \mathbf{R}, s_t, \alpha, \beta, \mathbf{S}, \mathbf{D}, \boldsymbol{\rho}^{\mathrm{in}}, \kappa_{\alpha\beta}, V_{\alpha\beta}) \, P(s_t \mid \mathbf{v}, \mathbf{R}, \alpha, \mathbf{S}, \boldsymbol{\rho}^{\mathrm{out}}) \, P(\alpha, \beta \mid \mathbf{v}, \mathbf{H}, \mathbf{R}).
\tag{10}
\]

To improve estimation of the parameters $\boldsymbol{\kappa}$ and $\mathbf{V}$ we implement hierarchical priors as described in Appendix A. In particular, this allows for improved estimation of parameters where the data is weak, i.e. where few movements are recorded between the production types. Estimation of parameters is performed with MCMC, and an indicator variable is introduced to aid computation (see Appendix B). This indicator variable is also used to calculate the posterior distribution of the number of transports between production types, denoted $\mathbf{Q}$, of dimensions $K \times K$, where $q_{\alpha\beta}$ refers to the estimated number of movements from production type $\alpha$ to $\beta$. Note that $\mathbf{Q}$ is introduced because many holdings have several production types, and therefore the exact numbers of movements between holdings of different production types are unobserved.
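
The kernel of equations 8 and 9 and the normalization by summation used in equation 7 can be sketched in a few lines of Python. This is a minimal illustration for a single pair of production types, not the code used for the analysis. The function names, the example distances and the combined weights (standing in for the product of the type proportion and the size function) are hypothetical, and excluding the start holding from the sum is an assumption made here so that the probabilities sum to one.

```python
import math

def kernel(distance, a, b):
    """Unnormalized generalized normal kernel, exp(-(distance / a) ** b)."""
    return math.exp(-((distance / a) ** b))

def variance_and_kurtosis(a, b):
    """2-D variance and kurtosis of the kernel from its parameters a and b."""
    g2, g4, g6 = math.gamma(2.0 / b), math.gamma(4.0 / b), math.gamma(6.0 / b)
    return a * a * g4 / g2, g6 * g2 / (g4 * g4)

def destination_probabilities(start, distances_from_start, weights, a, b):
    """Normalize weight * kernel over all candidate destinations.

    weights[j] stands in for the type proportion times the size function of
    holding j (hypothetical values here). The start holding gets probability 0.
    """
    raw = [0.0 if j == start else weights[j] * kernel(distances_from_start[j], a, b)
           for j in range(len(weights))]
    total = sum(raw)
    return [value / total for value in raw]

# b = 2 gives a Gaussian-shaped kernel, with 2-D kurtosis 2 and variance a**2.
V_gauss, kappa_gauss = variance_and_kurtosis(a=1.0, b=2.0)

# b = 1 gives an exponential kernel with a fatter tail (kurtosis 10/3).
V_exp, kappa_exp = variance_and_kurtosis(a=1.0, b=1.0)

probs = destination_probabilities(
    start=0,
    distances_from_start=[0.0, 1.0, 2.0],
    weights=[1.0, 1.0, 1.0],
    a=1.0, b=2.0)
print(V_gauss, kappa_gauss, V_exp, kappa_exp, probs)
```

Because the continuous normalizing constant cancels in the ratio, only the unnormalized kernel is evaluated, which is the reason the sketch never computes that constant.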

2.2.6 Comparing observed and predicted movement distances

As we altered the model of Lindström et al. (2009) and removed the MAM part of the spatial kernel, we compared the observed movement distances with the predictions under the model presented above. Note that while this is a simplification of the kernel function, we are adding complexity by using a different kernel for every combination of production types of the sending and receiving holding. The predicted distances were obtained by generating $M$ animal movements with the model described in section 2.2.5. Two thousand replicates were generated, each parameterized by a random draw from the posterior distribution (based on the MCMC output), and the mean cumulative distribution was calculated and compared to the cumulative distribution of observed distances.

2.3 Simulation

To demonstrate how the analyzed contact pattern may be used for risk assessment we performed a simplistic simulation with holdings in the analyzed database as infective units. The aim was to study the effect of the observed contact pattern and not to provide estimates for any specific disease. All other contacts between holdings were excluded, as well as intra-herd dynamics and recovery. Hence, an infected holding remained infectious for the entire time of the simulation. Simulations were initiated with one randomly chosen infected holding and we simulated posterior predictive movements with probabilities given by equation 10, parameterizing the model with random draws from the posterior distribution. If a movement was simulated from an infected holding A to a susceptible holding B, the latter was assumed to become instantaneously infected and any subsequent movements from B to a susceptible holding C were assumed to result in transmission. Movements between two already infected holdings were not assumed to generate a new infection.
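
The transmission rule described above can be sketched as follows. This is a minimal Python illustration, not the code used for the study: the movement list is a hypothetical, hand-written sequence of (start, destination) pairs, whereas in the analysis movements are drawn from the posterior predictive distribution.

```python
def spread(movements, index_case):
    """Propagate infection along an ordered list of (start, destination) moves.

    Returns the set of infected holdings and, for each infecting holding,
    its number of first degree infections.
    """
    infected = {index_case}
    first_degree = {}
    for start, destination in movements:
        # Transmission only occurs from an infected to a susceptible holding;
        # there is no recovery, so holdings stay infectious once infected.
        if start in infected and destination not in infected:
            infected.add(destination)
            first_degree[start] = first_degree.get(start, 0) + 1
    return infected, first_degree

# Hypothetical movement sequence between four holdings (0..3).
movements = [(0, 1), (2, 3), (1, 2), (1, 0), (2, 3), (0, 3)]
infected, first_degree = spread(movements, index_case=0)
print(sorted(infected), first_degree)
```

In this example the first movement from holding 2 has no effect because holding 2 is still susceptible at that point, while the later identical movement transmits the infection; the final movement from 0 to 3 connects two infected holdings and is not counted.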

We ran the simulation 1000 times and simulated $\tilde{M} = 40462$ movements for every replicate. For each replicate and infected holding we recorded the number of first, second and third degree infections. These are defined such that if holding A infects B, this is a first degree infection of A. If B subsequently infects C, this is a second degree infection of A, and if C later infects D, this is a third degree infection of A. The number of infections caused by every infected holding was recorded after 1556 movements, which, assuming a constant rate of movements, corresponds to a period of four weeks (as 20231 movements were analyzed for the one year period, and 20231 × 4/52 ≈ 1556). The analysis presented in Nöremark et al. (2009a) suggests that there is in fact little seasonal variation in the number of pig movements. Long distance transmissions are often of particular interest and therefore we also recorded the number of first degree infections caused by movements longer than 10, 100 and 500 km.

We analyzed the results using Generalized Linear Models (GLM). Since the response variable was measured as the number of infections, which are natural numbers (0, 1, 2, …), we used a Poisson error distribution with a log link function. The response variable $\mathbf{y}$ was a vector of $W$ elements where $W = \sum_{n} w_n$ and $w_n$ is the total number of holdings for replicate $n$ ($n = 1, \dots, 1000$) infected earlier than movement $\tilde{M} - 1556$. Holdings infected later were excluded as the number of infections after 1556 consecutive movements could not be recorded. For most analyses we focus on production types only and for this we used a dummy variable $\mathbf{X}$, of dimension $W \times K$, as predictor. The elements $x_{\tilde{i}k} = 1$ if holding $\tilde{i}$ ($\tilde{i} = 1, \dots, W$) was reported to have production type $k$.

To demonstrate the effect of the maximum capacity we divided the holdings into classes, denoted small, medium and large, respectively. We used the definitions given in Nöremark et al.
(2009b) such that a holding is small if the maximum capacity of sows < 15 and slaughter pigs < 300, large if sows > 299 or slaughter pigs > 4999, and otherwise medium. Hence, $3K$ combinations of production types and size classes were obtained and, similarly to $\mathbf{X}$, we used a dummy variable $\mathbf{Z}$, of size $W \times 3K$, as predictor, where $Z_{\tilde{i}z} = 1$ if holding $\tilde{i}$ is reported to belong to combination $z$ ($z = 1, \dots, 3K$). Note that while a holding may have more than one production type, it will always belong to exactly one size class in this analysis of the simulation. No holding with "Missing information" was classified as large and so this combination was excluded from the analysis.

Since we simulate a large number of replicates and each replicate may include many infected holdings, significance is less relevant. We instead look at the coefficients of the parameters given by the GLM. A large positive value should be interpreted such that if an infected holding is reported to belong to a production type (or type and size class in the analysis where the latter is included), it is expected to generate a large number of newly infected holdings (of the analyzed degree). A large negative value means that holdings with the type are expected to generate few infections.

All programs for analysis and simulation were written and implemented in MATLAB 7.8.

3. Results

3.1 Parameter estimates of contact probabilities

3.1.1 Weight of production types, $\mathbf{v}$

Table 1 shows estimates of $\mathbf{v}$, modeling dominance of production types in determining the contact pattern of a holding. The highest value was estimated for Sow pool centers and the lowest for Fattening herds.

3.1.2 Movements between production types, $\mathbf{H}$ and $\mathbf{Q}$

Table 2 lists estimated values for the most common movements, defined by either $\mathbf{H}$ or $\mathbf{Q}$. The estimated values of the commonness indices, $\mathbf{H}$, closely resembled the estimates given in Lindström et al. (2010).
The five highest mean estimates were found for (in decreasing order) movements from Multiplying herds to Sow pool centers, Nucleus herds to other Nucleus herds, Nucleus herds to Multiplying herds, Sow pool centers to Sow pool satellites and Sow pool satellites to Sow pool centers. Estimates of $\mathbf{Q}$ showed that animals were most frequently moved (in decreasing order) from Piglet producers to Fattening herds, Sow pool satellites to Fattening herds, Multiplying herds to Piglet producers, Farrow-to-finish to Fattening herds, Sow pool centers to Sow pool satellites, Sow pool satellites to Sow pool centers and Multiplying herds to Farrow-to-finish. We list the top seven rather than five to avoid the false impression that movements from Sow pool centers to Sow pool satellites are more frequent than from Sow pool satellites to Sow pool centers. These were ranked as number 5 and 6 and this ranking only differed by one movement. We also include estimates for the seventh highest, Multiplying herds to Farrow-to-finish, as this showed the highest estimate of kernel variance (see below).

3.1.3 Size dependent parameters, $\boldsymbol{\rho}$

Table 3 shows estimates for the parameters $\boldsymbol{\rho}$, determining how the maximum capacity of the holdings influences the contact pattern. Most estimates (23 of a total of 32) showed a clear positive relationship between size and contact probabilities (i.e. $\rho = 0$ is not included in the 95% central credibility interval) while 7 estimates showed a negative relationship. Of the negative relationships, 3 were found for estimates of the artificial type "Missing information".

3.1.4 Distance related parameters, $\mathbf{V}$ and $\boldsymbol{\kappa}$

We only include the estimates of $\mathbf{V}$ and $\boldsymbol{\kappa}$ for the most common movements (given above) and list these in Table 2. Variance of the spatial kernel, $V$, is a measure of the scale at which contacts occur, and a high value of $V_{\alpha\beta}$ indicates that long distance movements are common from holdings of production type $\alpha$ to $\beta$.
The highest estimate was found for movements from Multiplying herds to Farrow-to-finish holdings and the lowest was found for movements between Sow pool centers.

Kernel kurtosis, $\kappa$, is a measure of the difference in movement distances. A high value indicates that there are many short distance movements but concurrently many at long distance, and a low value indicates that movement distances are more uniform. In Lindström et al. (2009), where differences in production types were excluded, the kernel kurtosis was estimated at 32.6 with 95% central credibility interval (29.2, 36.4). Of the 64 estimates of $\kappa_{IJ}$ in this study, 49 showed strong evidence for lower values (non-overlapping central credibility intervals) and 2 showed strong evidence for higher values.

Figure 2 illustrates the cumulative distribution of observed and predicted movement distances.

3.2 Simulation of disease transmission

Figure 3 shows the coefficients estimated by the GLMs as described in section 2.3. Large positive values indicate that herds with this characteristic are expected to generate a large number of new infections, and negative values indicate that a herd is expected to generate few transmissions. Error bars were small and are excluded from the figure for clarity. Sow pool centers, Nucleus herds, Sow pool satellites and Fattening herds were estimated to have an increased risk (relative to other production types) of generating a large number of new infections when higher degree infections are accounted for (Figure 3a). However, apart from the herds in the group Missing information, the Fattening herds had the lowest coefficient also for higher degrees and are still considered low risk herds. Piglet producers were estimated to generate fewer infections when accounting for higher degree infections.
Sow pool centers and satellites, as well as holdings of the artificial type Missing information, were estimated to generate fewer new infections when studying long distance transmissions (Figure 3b). Holdings with Multiplying herds and Farrow-to-finish were estimated to have a higher risk of long distance transmission.

For most production types, larger holdings were estimated to have a higher risk of causing new infections (Figure 3c). No holdings of type Missing information were classified as large, but the risk of infecting other holdings was estimated to be lower for Medium than for Small holdings. Also, Large Sow pool centers had a lower coefficient than Medium holdings reported with the same production type. On inspecting the data we found that no large holding reported with production type Sow pool center was reported as only this type.

4. Discussion

In this paper we have presented a hierarchical Bayesian model for analysis of contacts between holdings, applied to movements of pigs in Sweden. The analysis revealed a highly heterogeneous contact structure. Holdings of different production types were estimated to differ in the number of contacts and in which other production types the contacts occurred with. This was demonstrated both in the estimates of commonness indices, $\boldsymbol{h}$, and in the estimated absolute number of movements, $\boldsymbol{Q}$. These two measures provide somewhat different information about the contact pattern. Whereas $\boldsymbol{h}$ takes into account how frequent the production types are, $\boldsymbol{Q}$ does not. For example, Farrow-to-finish herds and Fattening herds are both common production types, and while there are overall many transports from herds of the former type to herds of the latter (high value of $\boldsymbol{Q}$), transports between individual holdings of these types are rare (low value of $\boldsymbol{h}$).

Posterior distributions of $\boldsymbol{v}$ and $\boldsymbol{h}$ were similar to the estimates given in Lindström et al. (2010).
The Sow pool centers were less dominant in determining the contact pattern of a holding when the model was extended. However, it was still the dominant production type when reported concurrently with other types. The posterior mean of $v$ for Sow pool centers was estimated to be more than four times larger than that of the second most dominant type, Sow pool satellite, and 81 times larger than that of the least dominant type, Fattening herd.

Differences were also shown in how the probability of contacts depends on the maximum capacity of holdings (estimated with the size dependent parameters), although most production types showed a positive relationship (negative values were generally small) between maximum capacity and the probability of both incoming and outgoing movements. This was found for both demographic groups (sows and fattening pigs). Without a proper understanding of the production types and the structure of the industry, the negative values of the size dependent parameters may seem unexpected. Negative estimates indicate that larger holdings have fewer contacts, whereas one might expect larger holdings to be more active and hence trade more animals. For instance, Farrow-to-finish holdings showed a slightly lower probability of both incoming and outgoing movements with larger maximum capacity of fattening pigs. This is however not surprising, as large herds in this category may be expected to produce piglets that are kept in the herd until fattening, and thus the main movement would be animals sent to slaughter (which was not included in this study). For Nucleus herds a large negative value was found for incoming movements depending on the maximum capacity of sows. This might be because large breeding herds mainly introduce new genetic material in the form of semen and rarely buy live animals, as part of their biosecurity policy. Other inconsistencies may be a result of the reporting system.
As the production type is reported by the farmer, and no proper definition of the production types is provided, it leaves room for interpretation. Previous studies (Nöremark et al. 2009b) have reported that farmers with small herds had a different interpretation of their type, and, in particular, it was found that they sometimes regarded themselves as breeders even though only a few sows were kept on the premises. Moreover, there is no requirement for updating the information in the database if the production type is changed. Thus, the information on production type may be incorrect in the database. While Nucleus herds (by common definition) generally receive few animals but send many, this may not be the case for small herds (here mainly those with few sows). Nucleus herds are also expected to have a low maximum capacity of slaughter pigs, and herds with large numbers for this demographic group might also behave differently, which may explain the large positive estimates of the corresponding size dependent parameter for Nucleus herds (Table 3). The holdings in the group Missing information showed a negative relationship between the number of contacts and the maximum capacity in 3 out of 4 estimates. We believe this was a result of the heterogeneous nature of this group, which contains all holdings not reported to belong to any of the other seven types.

The analysis of $\boldsymbol{V}$ and $\boldsymbol{\kappa}$ also showed considerable differences in how contact probability is influenced by the distance between holdings. Comparing the estimates of $\boldsymbol{\kappa}$ to the estimates of Lindström et al. (2009), where production type differences were ignored, we found that the kernel kurtosis for movements between production types was generally lower. This supports the interpretation of kurtosis as a measure of the heterogeneity of the distance related processes resulting in movements between holdings.
Also, by including production type differences we found a good fit between observed and predicted movement distances (Figure 2). Of the more common types (i.e. high values of $\boldsymbol{h}$ or $\boldsymbol{Q}$ listed in Table 2), the largest posterior mean of $\kappa$ was found for movements from Multiplying herds to Farrow-to-finish holdings. High kurtosis was also found for movements from Farrow-to-finish to Fattening herds. This may be interpreted as the contacts between these types being the result of heterogeneous processes. It may partly be explained by the fact that in Farrow-to-finish herds with a limited capacity for fattening pigs, some piglets are sold to fattening herds, while Farrow-to-finish herds with larger fattening units keep all their piglets. Thus, it is not only the herd size but the relative size of the piglet producing and fattening units in the herd that affects the contact pattern of this herd category. Moreover, there is a trend towards specialized production units in pig farming, and some herds registered as Farrow-to-finish may have changed their production into either piglet production or fattening pigs without this being recorded in the database.

Movements from Multiplying herds were estimated to have high variance ($V$), with the highest estimate of the study found for Multiplying herds to Farrow-to-finish. This indicates that long distance movements are common compared to other production types and that, from a disease transmission perspective, Multiplying herds may cause long distance transmission. Movements between Sow pool centers and satellites were found to have low estimates of $V$, indicating that while many movements occur between these types (both in absolute number and relative to their abundance), these movements are of relatively short distance.

Of the two parameters used to model distance dependence, $\boldsymbol{V}$ and $\boldsymbol{\kappa}$, the former is the main determinant of disease spread dynamics (Lindström et al. In press).
Extrapolating the kernel variance estimates to implications for disease transmission, we may expect that an infected Sow pool center, for example, will cause few long distance transmissions, while a Multiplying herd (if infected) may rapidly increase the range of an emerging disease. This was also found in the simulation study. The coefficients of Sow pool centers (Figure 3b) decreased with distance but increased for Multiplying herds. Coefficients also increased for Farrow-to-finish holdings, which had a high estimated value of $V$ for the main contact type, Fattening herds. Generally we expect contacts between types estimated with a high $V$ to be particularly important in later stages of outbreaks. When local susceptibles are depleted (due to becoming infected through more local transmission), long distance contacts may spark new infections where depletion has not yet occurred. This dynamic was found, for example, in the UK 2001 FMD outbreak (Keeling et al. 2001).

Matthews and Woolhouse (2005) argued that animal markets acted as super spreaders in this outbreak. Such markets are rare in Sweden, and identification of possible super spreaders in the system may instead focus on holdings of different production types. However, by international standards animal farming in Sweden is less intensive, and movements are less frequent. Our analysis of holdings as potential super spreaders should therefore be interpreted as relative to other holdings in the system. Figure 3a shows how the potential for generating new infections changes with the infection degree for the different production types. This is mainly a result of the estimates of $\boldsymbol{h}$ and $\boldsymbol{Q}$, and the largest increase was found for Nucleus herds. These mainly move animals to other Nucleus herds and Multiplying herds, which both in turn have many outgoing contacts.
Piglet producers showed a lower potential (compared to other production types) for generating new infections when looking at higher degree infections. The general pattern is however quite similar for different infection degrees, and we may conclude that production types with high estimates of $h$ for outgoing movements may act as super spreaders. For many diseases where the time between infection and first symptoms is short, second (and consequently third) degree transmission may be prevented by movement restrictions. While early detection is always crucial, our results suggest that its importance should be emphasized even more for some production types, such as Nucleus herds and Sow pool centers.

Further, the analysis of the simulation study showed that for most production types, infected holdings in the larger size classes were likely to generate more transmissions. Hence, we may conclude that larger holdings generally have a higher potential to act as super spreaders. Exceptions were found for holdings with Missing information (which is expected due to the negative relationship between maximum capacity and contact probability, see above) and Sow pool satellites. The latter showed a decrease in the coefficient given by the GLM for Large holdings. We believe this is a result of the fact that this size class contained no holdings reported with only this production type, and the number of transmissions was largely explained by the coexisting production types.

Using databases of holdings and animal movements to estimate parameters allows for assessment of large scale patterns. Also, unlike qualitative studies, where inference is made from a few handpicked holdings checked for consistency, the parameters are estimated from the same type of data that may be used for outbreak simulations.
If, for example, parameters are estimated from 100 holdings with every trait checked and edited in great detail, they may not be reliable for modeling contact data for holdings that have not been checked in the same way. However, erroneous and dubious reports pose a problem. In order to provide better estimates, the data quality needs to be improved. Better guidance to farmers, as well as a requirement for regular updates of recorded information, may provide more reliable data. In particular, the interpretation of production type is of great importance for work such as this study. Production type has a strong influence on the contact patterns and could be of great use in risk estimation if the data are reliable. Working with data from central databases always carries a risk of erroneous reports affecting the results. Using the same data as in this study, Lindström et al. (2010) reported unexpectedly many movements between Sow pool centers, a highly unlikely event that is probably due to deficiencies in the recorded data on production type. As these are rare production types but with many movements, they are particularly sensitive to erroneous entries in the database, and this may have affected the results.

Network analysis is another common approach to studying animal movement contacts based on database entries. This commonly involves measurements that capture the overall structures, and conclusions about disease spread are then drawn from these measures (see Dubé et al. 2009 and references therein). The method we have presented here can be seen as a lower level analysis and uses a set of parameters to capture the underlying process of between herd contacts that ultimately results in the higher level structure captured by network measurements.
For instance, our results show that holdings with larger herd sizes generally both send and receive more animals and hence would, in a network context, have both higher in- and outdegree (i.e. number of ingoing and outgoing links). Further, if $h_{IJ}$ is generally large for all production types $J$ ($I$), then holdings of type $I$ ($J$) are expected to have a high outdegree (indegree). An advantage of our analysis is that we may also address secondary (and higher order) infections, and as shown in Figure 3a this can change the picture of which holdings are to be considered potential super spreaders.

The distance dependence parameters, in particular $V$ (Håkansson et al. 2010), also relate to some commonly used network measurements. Various types of centrality indices are frequently used in network analyses. These capture how important nodes are for the overall connectivity of the network (Wasserman and Faust 1994). Holdings with production types with a high probability of long distance movements (large values of $V$) have the ability to connect otherwise distant (in terms of number of links) holdings and would have a high centrality. A strong spatial component, i.e. low values of $V$, will instead result in highly clustered networks (Keeling 1999).

With Bayesian inference it is straightforward to incorporate uncertainties in the parameter estimates. The more movements we include, the smaller the credibility intervals. A disadvantage of network analysis is that a single measure is presented, and this measure depends largely on the number of movements analyzed, hence on the time scale chosen. Whereas network analysis considers an observed animal movement between two holdings as a fixed link, our approach instead considers it as a random event, with different probabilities for different pairs of holdings. Another advantage of analysis at this lower level is that the parameters may be directly incorporated in explicit simulations of outbreaks.
Here we have presented a simplistic simulation model to demonstrate how the parameters translate to predictions of disease spread. A more realistic but far more complex simulation model of a disease outbreak should include other relevant contact types as well as disease specific parameters, such as incubation time, recovery rate and intra-herd dynamics.

An advantage of network analysis is however its accessibility. At present, there are numerous software packages available which allow for straightforward analyses of data. The method we have presented here is in comparison computationally heavy and requires some basic knowledge about MCMC techniques.

It should also be stressed that, while the model presented here is cumbersome in the number of parameters, there are further aspects of the contact structure that are not included. Reoccurring contacts between holdings may be expected, and in particular this is true for the holdings in the Sow pool system. A low variance of the spatial kernel, as reported for contacts between Sow pool centers and satellites, results in a high probability of contacts with nearby holdings, and a high rate of reoccurring contacts is expected to occur in the simulation study. However, since reoccurrence is not incorporated explicitly, we believe this may have caused an overestimation of the number of infections caused by Sow pool centers and satellites in the simulation study. Recurrent contacts are expected to influence disease spread dynamics, and while it requires estimation of additional parameters, including them may be a relevant extension of the model.

While the contact model presented may be improved, we believe that many of the relevant features are included. If data on other contacts are available, the method may also be applied to these.
Using generalized measures of contact probabilities (such as the parameters of the presented contact model) allows for comparison both between different holdings and between different types of contacts. Also, if similar analyses are applied to data from other countries, comparisons of parameters may inform us of differences in the contact structure. We also believe that the model may be used for risk assessment as a complement to classic methods, but there is a need for better data quality for reliable inference. Yet, an analysis such as the one presented may also guide efforts towards increased data quality in the future.

5. Conclusion

In this study we have analyzed live animal movements between pig holdings and found that the contact pattern is highly heterogeneous. We found that generally, but not always, a positive relationship exists between the maximum capacity of a holding and the number of contacts.

Describing distance dependence with a spatial kernel and analyzing its characteristics provides valuable information about the contact pattern between holdings and is a main feature in predicting disease spread. Hence the more detailed knowledge gained by the methodology presented may improve both understanding and predictive power. We found that the distance dependence of the probability of contacts between holdings was influenced by the production types of the start and end holdings.

Heterogeneous contact patterns, with some holdings likely to act as super spreaders, and differences in the probability of long distance contacts are expected to cause stochastic dynamics of a disease outbreak where animal movements are important for the transmission.

Conflict of interest

We have no conflict of interest.
Acknowledgement

We thank the Swedish Board of Agriculture for supplying the data and the Swedish Civil Contingencies Agency for funding.

References

Bigras-Poulin, M., Thompson, R.A., Chriel, M., Mortensen, S., Greiner, M., 2006. Network analysis of Danish cattle industry trade patterns as an evaluation of risk potential for disease spread. Prev. Vet. Med. 76, 11–39.

Boender, G.J., Meester, R., Gies, E., De Jong, M.C.M., 2007. The local threshold for geographical spread of infectious diseases between farms. Prev. Vet. Med. 82, 90–101.

Brennan, M.L., Kemp, R., Christley, R.M., 2008. Direct and indirect contacts between cattle farms in north-west England. Prev. Vet. Med. 84, 242–260.

Casella, G., George, E.I., 1992. Explaining the Gibbs sampler. Am. Stat. 46, 167–174.

Chib, S., Greenberg, E., 1995. Understanding the Metropolis-Hastings algorithm. Am. Stat. 49, 327–335.

Clark, J.S., Silman, M., Kern, R., Macklin, E., HilleRisLambers, J., 1999. Seed dispersal near and far: generalized patterns across temperate and tropical forests. Ecology 80, 1475–1494.

Dickey, B.F., Carpenter, T.E., Bartell, S.M., 2008. Use of heterogeneous operation-specific contact parameters changes predictions for foot-and-mouth disease outbreaks in complex simulation models. Prev. Vet. Med. 87, 272–287.

Dubé, C., Ribble, C., Kelton, D., McNab, B., 2009. A review of network analysis terminology and its application to foot-and-mouth disease modelling and policy development. Transbound. Emerg. Dis. 56, 73–85.

Fan, Y., Dortet-Bernadet, J.-L., Sisson, S.A., 2010. A note on Bayesian curve fitting via auxiliary variables. J. Comput. Graph. Stat. 19, 626–644.

Févre, E.M., Bronsvoort, B.M.de C., Hamilton, K.A., Cleaveland, S., 2006. Animal movements and the spread of infectious diseases. Trends Microbiol. 14, 125–131.

Gamerman, D., Lopes, H.F., 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, second ed.
CRC Press, Chapman & Hall.

Gelman, A., Carlin, J.B., Stern, H.S., Rubin, D.B., 2004. Bayesian Data Analysis, second ed. Chapman & Hall/CRC (Chapter 18).

Hawkes, C., 2009. Linking movement behaviour, dispersal and population processes: is individual variation a key? J. Anim. Ecol. 78, 894–906.

Håkansson, N., Jonsson, A., Lennartsson, J., Lindström, T., Wennergren, U., 2010. Generating structure specific networks. Adv. Complex Syst. 13, 239–250.

Kao, R.R., Green, D.M., Johnson, J., Kiss, I.Z., 2007. Disease dynamics over very different time-scales: foot-and-mouth disease and scrapie on the network of livestock movements in the UK. J. R. Soc. Interface 4, 907–916.

Keeling, M.J., 1999. The effects of local spatial structure on epidemiological invasions. Proc. R. Soc. London B 266, 859–869.

Keeling, M.J., Woolhouse, M.E., Shaw, D.J., Matthews, L., 2001. Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a dynamic landscape. Science 294, 813–817.

Keeling, M., 2005. The implications of network structure for epidemic dynamics. Theor. Popul. Biol. 67, 1–8.

Lindström, T., Håkansson, N., Westerberg, L., Wennergren, U., 2008. Splitting the tail of the displacement kernel shows the unimportance of kurtosis. Ecology 89, 1784–1790.

Lindström, T., Sisson, S.A., Nöremark, M., Jonsson, A., Wennergren, U., 2009. Estimation of distance related probability of animal movements between holdings and implications for disease spread modeling. Prev. Vet. Med. 91, 85–94.

Lindström, T., Sisson, S.A., Sternberg Lewerin, S., Wennergren, U., 2010. Estimating animal movement contacts between holdings of different production types. Prev. Vet. Med. 95, 23–31.

Lindström, T., Håkansson, N., Wennergren, U. The shape of the spatial kernel and its implications for biological invasions in patchy environments. Proc. R. Soc. London B. In press.

Matthews, L., Woolhouse, M., 2005.
New approaches to quantifying the spread of infection. Nat. Rev. Microbiol. 3, 529–536.

Nöremark, M., Håkansson, N., Lindström, T., Wennergren, U., Sternberg Lewerin, S., 2009a. Spatial and temporal investigations of reported movements, births and deaths of cattle and pigs in Sweden. Acta Vet. Scand. 51:37.

Nöremark, M., Lindberg, A., Vågsholm, I., Sternberg Lewerin, S., 2009b. Disease awareness, information retrieval and change in biosecurity routines among pig farmers in association with the first PRRS outbreak in Sweden. Prev. Vet. Med. 90, 1–9.

Nöremark, M., Håkansson, N., Sternberg Lewerin, S., Lindberg, A., Jonsson, A. Network analysis of cattle and pig movements in Sweden; measures relevant for disease control and risk based surveillance. In press.

Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J., Guitian, F.J., 2006. Use of social network analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Prev. Vet. Med. 75, 40–55.

Ribbens, S., Dewulf, J., Koenen, F., Mintiens, K., de Kruif, A., Maes, D., 2009. Type and frequency of contacts between Belgian pig herds. Prev. Vet. Med. 88, 57–66.

Robinson, S.E., Christley, R.M., 2007. Exploring the role of auction markets in cattle movements within Great Britain. Prev. Vet. Med. 81, 21–37.

Rweyemamu, M., Roeder, P., Mackay, D., Sumption, K., Brownlie, J., Leforban, Y., Valarcher, J.F., Knowles, N.J., Saraiva, V., 2008. Epidemiological patterns of foot-and-mouth disease worldwide. Transbound. Emerg. Dis. 55, 57–72.

Tildesley, M.J., Deardon, R., Savill, N.J., Bessell, P.R., Brooks, S.P., Woolhouse, M.E.J., Grenfell, B.T., Keeling, M.J., 2008. Accuracy of models for the 2001 foot-and-mouth epidemic. Proc. R. Soc. London B 275, 1459–1468.

Velthuis, A.G., Mourits, M.C., 2007.
Effectiveness of movement-prevention regulations to reduce the spread of foot-and-mouth disease in The Netherlands. Prev. Vet. Med. 82, 262–281.

Vernon, M.C., Keeling, M.J., 2009. Representing the UK's cattle herd as static and dynamic networks. Proc. R. Soc. London B 276, 469–476.

Wasserman, S., Faust, K., 1994. Social Network Analysis: Methods and Applications. Cambridge University Press, Cambridge.

Watts, D.J., Strogatz, S.H., 1998. Collective dynamics of 'small-world' networks. Nature 393, 440–442.

Appendix A. Hierarchical priors

We implement hierarchical priors for $\boldsymbol{V}$ and $\boldsymbol{\kappa}$, and denote these $F(V_{IJ} \mid \alpha_V, \beta_V)$ and $F(\kappa_{IJ} \mid \alpha_\kappa, \beta_\kappa)$, respectively, where $(\alpha_V, \beta_V)$ and $(\alpha_\kappa, \beta_\kappa)$ are vectors of unknown hyper-parameters. These vectors model the degree of similarity between parameters and (if similarities are prominent) improve estimation of $\boldsymbol{V}$ and $\boldsymbol{\kappa}$ when the data are weak (e.g. few movements between types $I$, $J$). For $\boldsymbol{V}$ we use an inverse gamma distribution with hyper-parameters $\alpha_V$ and $\beta_V$,

$$F(V_{IJ} \mid \alpha_V, \beta_V) = \frac{\beta_V^{\alpha_V}}{\Gamma(\alpha_V)} \left(\frac{1}{V_{IJ}}\right)^{\alpha_V + 1} e^{-\beta_V / V_{IJ}}. \tag{A.1}$$

The probability density function of the inverse gamma distribution is defined for values larger than zero. Since the lower limiting value of $\kappa$ is 4/3 (a uniform distribution, obtained as the kernel shape parameter tends to infinity; Lindström et al. 2008), we use a shifted inverse gamma distribution as hierarchical prior for $\boldsymbol{\kappa}$ and write

$$F(\kappa_{IJ} \mid \alpha_\kappa, \beta_\kappa) = \frac{\beta_\kappa^{\alpha_\kappa}}{\Gamma(\alpha_\kappa)} \left(\frac{1}{\kappa_{IJ} - 4/3}\right)^{\alpha_\kappa + 1} e^{-\beta_\kappa / (\kappa_{IJ} - 4/3)}. \tag{A.2}$$

Appendix B. Indicator variables

To facilitate computations when sampling from posteriors associated with mixture distributions, a common strategy is to introduce indicator variables (for example Gelman et al. 2004).
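As a minimal illustration of the indicator-variable idea, the sketch below draws, for a single observation, which mixture component generated it, with probability proportional to a set of component weights, and then tallies the draws per component. The function names, the three weights and the number of draws are hypothetical and chosen only for illustration; they are not taken from the analysis in this paper.

```python
import random


def sample_indicator(weights, rng):
    """Draw one component index with probability proportional to `weights`."""
    total = sum(weights)
    u = rng.random() * total
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if u < cumulative:
            return index
    return len(weights) - 1  # guard against floating point round-off


def tally(indicators, n_components):
    """Count how many draws were assigned to each component."""
    counts = [0] * n_components
    for z in indicators:
        counts[z] += 1
    return counts


rng = random.Random(0)
# Hypothetical unnormalised component weights for one observation
weights = [0.2, 1.0, 0.8]
draws = [sample_indicator(weights, rng) for _ in range(10000)]
counts = tally(draws, len(weights))
```

Conditional on the sampled indicators, the mixture likelihood factorises into a product over components, which is what makes the remaining parameter updates simpler.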
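With this approach, equation 10 may be rewritten as

$$P(\boldsymbol{s}, \boldsymbol{e} \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{\phi}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{U}) = \prod_{t} \prod_{I} \prod_{J} \left[ P(e_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, I, J, \boldsymbol{S}, \boldsymbol{D}, \tilde{\boldsymbol{\phi}}_{J}, \kappa_{IJ}, V_{IJ}) \, P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, I, \boldsymbol{S}, \hat{\boldsymbol{\phi}}_{I}) \, P(I, J \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}) \right]^{U_{IJt}}, \tag{B.1}$$

where $\boldsymbol{U}$ is a tensor of dimension $K \times K \times T$ and $U_{IJt} = 1$ for exactly one combination of $I$, $J$ for each movement $t$. The full posterior distribution of unobserved parameters $\boldsymbol{v}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{h}, \boldsymbol{\phi}, \alpha_V, \beta_V, \alpha_\kappa, \beta_\kappa, \boldsymbol{U}$ is

$$P(\boldsymbol{v}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{h}, \boldsymbol{\phi}, \alpha_V, \beta_V, \alpha_\kappa, \beta_\kappa, \boldsymbol{U} \mid \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{s}, \boldsymbol{e}) \propto P(\boldsymbol{s}, \boldsymbol{e} \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}, \boldsymbol{S}, \boldsymbol{D}, \boldsymbol{\phi}, \boldsymbol{\kappa}, \boldsymbol{V}, \boldsymbol{U}) \, F(\boldsymbol{\kappa} \mid \alpha_\kappa, \beta_\kappa) \, F(\boldsymbol{V} \mid \alpha_V, \beta_V) \, P(\boldsymbol{v}) P(\boldsymbol{h}) P(\boldsymbol{\phi}) P(\alpha_V) P(\beta_V) P(\alpha_\kappa) P(\beta_\kappa), \tag{B.2}$$

where $P(\boldsymbol{v})$, $P(\boldsymbol{h})$, $P(\boldsymbol{\phi})$, $P(\alpha_V)$, $P(\beta_V)$, $P(\alpha_\kappa)$ and $P(\beta_\kappa)$ are prior distributions of parameters and hyper-parameters. For $\boldsymbol{v}$ and $\boldsymbol{h}$ (recall that the elements of these sum to one) we use uninformative $\mathrm{Dirichlet}(1, 1, \ldots, 1)$ priors. Priors $P(\boldsymbol{\phi})$, $P(\alpha_V)$ and $P(\alpha_\kappa)$ are set to be proportional to one on the support of the parameters, while $P(\beta_V)$ and $P(\beta_\kappa)$ are defined as uniform for $\beta > 1$. The inverse gamma distribution does not have a finite mean for $\beta < 1$ and we assume that both $V$ and $\kappa$ are finite quantities.

Incorporation of the unobserved indicator variable $\boldsymbol{U}$ in the model also allows for posterior estimation of the number of movements between production types. The posterior distribution of $\boldsymbol{Q}$ is calculated from the posterior distribution of $\boldsymbol{U}$ by

$$Q_{IJ} = \sum_{t} U_{IJt}. \tag{B.3}$$

Appendix C. MCMC estimation

We use Markov chain Monte Carlo (MCMC) techniques to estimate the posterior distribution of the model parameters. This involves the construction of a Markov chain with stationary distribution given by the posterior distribution of interest. Given a current state of the chain, MCMC methods sequentially update the parameters either individually or in blocks, based on the full posterior conditional distributions of each parameter under the model.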
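Repeating this procedure, and after the chain has converged, the states of the chain represent (correlated) draws from the posterior distribution of model parameters. Two basic updates are involved. If the conditional distribution of a parameter is of a standard form, Gibbs sampling (see e.g. Casella and George 1992 for further details) may be used. If however the distribution is of non-standard form, Metropolis-Hastings updates may be used (see e.g. Chib and Greenberg 1995 for further details). In this case, parameter values $\boldsymbol{\theta}^*$ are proposed from a density function $q(\boldsymbol{\theta}^* \mid \boldsymbol{\theta})$ and subsequently accepted with probability

$$\min\left(1, \frac{P(\boldsymbol{\theta}^* \mid \ldots) \, P(\boldsymbol{\theta}^*) \, q(\boldsymbol{\theta} \mid \boldsymbol{\theta}^*)}{P(\boldsymbol{\theta} \mid \ldots) \, P(\boldsymbol{\theta}) \, q(\boldsymbol{\theta}^* \mid \boldsymbol{\theta})}\right), \tag{C.1}$$

where $\boldsymbol{\theta}$ and $\boldsymbol{\theta}^*$ are current and proposed parameter values, $P(\boldsymbol{\theta})$ is the prior and $P(\boldsymbol{\theta} \mid \ldots)$ is the likelihood evaluated at $\boldsymbol{\theta}$. Further information on MCMC methods can be found in Gamerman and Lopes (2006).

All parameters except $\boldsymbol{U}$ may be updated with Metropolis-Hastings steps. Parameter matrix $\boldsymbol{h}$ is only involved in $P(I, J \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R})$ of equation B.1, and so the resulting conditional posterior distribution of $\boldsymbol{h}$ is

$$P(\boldsymbol{h} \mid \boldsymbol{v}, \boldsymbol{U}, \boldsymbol{R}) \propto \prod_{t=1}^{T} \prod_{I=1}^{K} \prod_{J=1}^{K} \left[ P(I, J \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}) \right]^{U_{IJt}} P(\boldsymbol{h}). \tag{C.2}$$

The prior, $P(\boldsymbol{h})$, is proportional to 1 and the distribution $\prod_{t=1}^{T} \prod_{I=1}^{K} \prod_{J=1}^{K} \left[ P(I, J \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R}) \right]^{U_{IJt}}$ is given in Lindström et al. (2010) as

$$L_1 = \mathrm{Multinomial}(n_{1,1}, n_{1,2}, \ldots, n_{2,1}, n_{2,2}, \ldots, n_{K,K-1}, n_{K,K} \mid p_{1,1}, p_{1,2}, \ldots, p_{2,1}, p_{2,2}, \ldots, p_{K,K-1}, p_{K,K}), \tag{C.3}$$

where $p_{I,J} = P(I, J \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R})$ is given in equation 3 and $n_{I,J} = \sum_t U_{IJt}$ for all $I$, $J$. Dirichlet distributions were used for proposals. More details may be found in Lindström et al. (2010).

Dirichlet proposals were also used for updates of $\boldsymbol{v}$, which is included in each distribution $P(e_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, I, J, \boldsymbol{S}, \boldsymbol{D}, \tilde{\boldsymbol{\phi}}_{J}, \kappa_{IJ}, V_{IJ})$, $P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, I, \boldsymbol{S}, \hat{\boldsymbol{\phi}}_{I})$ and $P(I, J \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R})$.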
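A Dirichlet proposal for a parameter vector whose elements sum to one can be sketched as below. The sketch assumes, purely for illustration, that the proposal is centred on the current value by setting the Dirichlet parameters to a tuning constant times the current vector; this tuning scheme, the function names and the numerical values are hypothetical and are not taken from the paper. Because such a proposal is not symmetric, the Hastings correction term of the Metropolis-Hastings acceptance probability is computed explicitly.

```python
import math
import random


def dirichlet_sample(alpha, rng):
    """Draw from Dirichlet(alpha) by normalising independent gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_logpdf(x, alpha):
    """Log density of Dirichlet(alpha) evaluated at x (elements of x sum to one)."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


rng = random.Random(1)
concentration = 200.0            # hypothetical tuning constant
current = [0.5, 0.3, 0.2]        # hypothetical current parameter vector
proposal = dirichlet_sample([concentration * c for c in current], rng)

# Hastings correction: log q(current | proposal) - log q(proposal | current)
log_q_ratio = (
    dirichlet_logpdf(current, [concentration * p for p in proposal])
    - dirichlet_logpdf(proposal, [concentration * c for c in current])
)
```

A larger tuning constant gives proposals closer to the current value and hence a higher acceptance rate at the cost of smaller moves.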
For simplicity we write π πΎ πΎ πΏ2 = ∏ ∏ ∏ [ π‘=1 πΌ=1 π½=1 7 π£Μπ πΌ πΊ(π1π , π2π , πΜ1πΌ , πΜ2πΌ ) ππΌπ½π‘ ] ∑π π£ΜππΌ πΊ(π1π , π2π , πΜ1πΌ , πΜ2πΌ ) (C.4) and π πΎ πΎ π£ΜππΌ πΊ(π1π , π2π , πΜ1π½ , πΜ2π½ )πΉ(π·π π , π πΌπ½ , ππΌπ½ ) ππΌπ½π‘ πΏ3 = ∏ ∏ ∏ [ ] ∑π π£ΜππΌ πΊ(π1π , π2π , πΜ1π½ , πΜ2π½ )πΉ(π·π π , π πΌπ½ , ππΌπ½ ) , π ≠ π , (C.5) π‘=1 πΌ=1 π½=1 8 and here the conditional posterior distribution for π is π(π |π, π, πΉ, πΊ, π«, πΏ, π½, πΌ, π±, π°) = πΏ1 πΏ2 πΏ3 π(π). 9 10 (C.6) The elements of πΜ and πΜ were updated separately using Gaussian random walk proposal distributions. The conditional posterior distribution of πΜπ’πΌ (π’ = 1,2) is π(πΜπ’πΌ |πΜπ’ΜπΌ , π, πΌ, πΉ, πΌ, πΊ, π) π πΎ π£Μπ πΌ πΊ(ππ’π , ππ’Μπ , πΜπ’πΌ , πΜπ’ΜπΌ ) ππΌπ½π‘ = (∏ ∏ [ ] ∑π π£ΜππΌ πΊ(ππ’π , ππ’Μπ , πΜπ’πΌ , πΜπ’ΜπΌ ) ) π(πΜπ’πΌ ) (C.7) π‘=1 π½=1 11 where π’Μ = 2 for π’ = 1 and π’Μ = 1 for π’ = 2. Similarly, the conditional posterior distribution of πΜπ’π½ is 32 π(πΜπ’π½ |πΜπ’Μπ½ , π, π, πΉ, πΊ, π«, πΏ, π½, πΌ, π±) π πΎ = ∏∏[ π‘=1 πΌ=1 π£ΜππΌ πΊ(ππ’π , ππ’Μπ , πΜπ’π½ , πΜπ’Μπ½ )πΉ(π·π π , π πΌπ½ , ππΌπ½ ) ππΌπ½π‘ ] ∑π π£ΜππΌ πΊ(ππ’π , ππ’Μπ , πΜπ’π½ , πΜπ’Μπ½ )πΉ(π·π π , π πΌπ½ , ππΌπ½ ) π(πΜπ’π½ ), (C.8) π ≠ π . 1 Joint updates of πΏ and π½ were performed separately for each combination of πΌ, π½ using multivariate 2 Gaussian random walk on the logarithm of π πΌπ½ and ππΌπ½ with proposals from a multivariate normal 3 distribution. The joint conditional distribution of π πΌπ½ , ππΌπ½ is π(π πΌπ½ , ππΌπ½ |πΜ π± , π, π, πΉ, πΊ, π«, πΌ, π±) π π£ΜππΌ πΊ(π1π , π2π , πΜ1π½ , πΜ2π½ )πΉ(π·π π , π πΌπ½ , ππΌπ½ ) ππΌπ½π‘ = ∏[ ] ∑π π£ΜππΌ πΊ(π1π , π2π , πΜ1π½ , πΜ2π½ )πΉ(π·π π , π πΌπ½ , ππΌπ½ ) π‘=1 π ≠ π . πΉ(π πΌπ½ |πΌπ , π½π ) πΉ(ππΌπ½ |πΌπ , π½π ), (C.9) 4 We utilized Metropolis-Hastings updates for both πΌ and π½. In order to improve the mixing we updated the 5 parameters of the hierarchical priors five times for every update of the other parameters (e.g. Fan et al. 6 2010). 
The posterior distribution of $\alpha_\theta, \beta_\theta$ ($\theta = \kappa, V$) is

\[
P(\alpha_\theta \mid \boldsymbol{\theta}) = F(\boldsymbol{\theta} \mid \alpha_\theta, \beta_\theta) \, P(\alpha_\theta), \qquad P(\beta_\theta \mid \boldsymbol{\theta}) = F(\boldsymbol{\theta} \mid \alpha_\theta, \beta_\theta) \, P(\beta_\theta), \tag{C.10}
\]

with $F$ given by equations A.1 (for $\boldsymbol{\theta} = \boldsymbol{V}$) and A.2 (for $\boldsymbol{\theta} = \boldsymbol{\kappa}$).

As the indicator variable $\eta_{IJt} = 1$ for exactly one combination of $I, J$, $\boldsymbol{\eta}$ may be updated with Gibbs sampling by drawing one random number for each $t$ from a multinomial distribution with probabilities given by

\[
\Pr(\eta_{IJt} = 1) = \frac{P(e_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, I, J, \boldsymbol{S}, \boldsymbol{D}, \tilde{\boldsymbol{c}}_{\boldsymbol{J}}, \kappa_{IJ}, V_{IJ}) \, P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, I, \boldsymbol{S}, \tilde{\boldsymbol{c}}_{\boldsymbol{I}}) \, P(I, J \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R})}{\sum_{i} \sum_{j} P(e_t \mid \boldsymbol{v}, \boldsymbol{R}, s_t, i, j, \boldsymbol{S}, \boldsymbol{D}, \tilde{\boldsymbol{c}}_{j}, \kappa_{ij}, V_{ij}) \, P(s_t \mid \boldsymbol{v}, \boldsymbol{R}, i, \boldsymbol{S}, \tilde{\boldsymbol{c}}_{i}) \, P(i, j \mid \boldsymbol{v}, \boldsymbol{h}, \boldsymbol{R})}. \tag{C.11}
\]

Figure Legends

Figure 1. Examples of the spatial kernel with (a) kurtosis = 3.33 and variance = 1000 (dashed), 10000 (solid), 100000 (dotted) (km²) and (b) variance = 1000000 (km²) and kurtosis = 2 (dashed), 4 (solid), 8 (dotted). The embedded axes show the same kernels as the major axes but with a logarithmic y-axis and larger distances included.

Figure 2. Cumulative distribution of observed (solid line) and predicted (dotted line) distances of live pig movements between holdings. Note that the x-axis is on the log scale.

Figure 3. Coefficients of (panels a and b) explanatory variables production type (0 if holding had not reported the production type, 1 if reported) and (panel c) the combination of production type and size class (S=small, M=medium, L=large) analyzed by GLM. Response variable was (a) the number of first, second and third degree infections caused by a holding if infected, (b) number of first degree infections at distances longer than 10, 100 and 500 km and (c) the number of first degree infections (but with size classes included in the explanatory variables). Note that (a) and (b) are the result of three separate analyses (each with 8 explanatory variables) while (c) is the result of one analysis.
Legend abbreviations: SPC=Sow pool centers, MH=Multiplying herds, NH=Nucleus herds, PP=Piglet producers, SPS=Sow pool satellites, FF=Farrow-to-finish, FH=Fattening herds, MI=Missing information.