Abstract
This thesis focuses on the spread of organisms in both ecological and epidemiological contexts. In most of the studies presented, displacement is modeled with a spatial kernel function, which is characterized by scale and shape. These are measured by the net squared displacement (or kernel variance) and kurtosis, respectively. If organisms disperse according to the assumptions of a random walk or correlated random walk, a Gaussian-shaped kernel is expected. Empirical studies often report deviations from this, and leptokurtic distributions are commonly found, often as a result of heterogeneity in the dispersal process. In two of the included papers, the importance of the kernel shape is tested by using a family of kernels where the shape and scale can be separated effectively. Both studies utilize spectral density approaches for modeling the spatial environment. It is concluded that the shape is not important when studying the population distribution in a habitat/matrix context. The shape is, however, important when looking at the invasion of organisms in a patchy environment when the arrangement of patches deviates from a random distribution. The introduced method for generating patch distributions is also compared to empirical distributions of patches (farms and old trees), and it is concluded that the assumptions used for modeling the spatial environment are consistent with the observed patterns. These assumptions include fractal properties such that the same aggregational patterns are found at different scales. In a series of papers, movements of animals are considered as vectors for between-herd disease spread. The studies are based on data from databases kept by the Swedish Board of Agriculture (SJV), consisting of reported movements as well as farm locations and characteristics. The first study focuses on the distance-related probability of contacts between herds. In the following papers, the analysis is expanded to include production type and herd size.
Movement data of pigs (and cattle in Paper I) are analyzed with Bayesian analysis, implemented with Markov Chain Monte Carlo (MCMC). This is a flexible approach that allows for parameter estimation in complex models while also accounting for parameter uncertainty. In Paper IV, the effects of the included factors are investigated. It is shown that all three factors (herd size, production type structure and distance-related probability of contacts) are expected to influence disease spread dynamics, and the production type structure is found to be the most important factor. This emphasizes the value of keeping such information in central databases. The models presented can be used as support for risk analysis and disease tracing. However, obtaining reliable data is always a problem, and implementation may be improved with better quality data. The thesis also shows that utilizing spatial kernels for description of the spatial spread of organisms is an appropriate approach. However, these kernels must be flexible, and flawed assumptions about the shape may lead to erroneous conclusions. Hence, the joint distribution of kernel shape and scale should be estimated. Bayesian analysis, implemented with MCMC techniques, is flexible enough to be a good approach for this, and it further allows for implementation of more complex models where other factors may be included.

Popular science summary
All organisms must be able to disperse. Being able to estimate this dispersal is one of the cornerstones of understanding both ecological and epidemiological processes. This thesis consists of two parts: general questions about the importance of dispersal characteristics in ecological and epidemiological systems, and applied studies that aim to model the spread of infectious animal diseases between farms, with a focus on animal transports. Dispersal is commonly represented by a dispersal function, i.e. a function that describes dispersal as a function of distance. The dispersal function can be characterized by two measures: a scale factor, which tells whether dispersal generally occurs over long distances or is limited to shorter ones, and the shape of the dispersal function, which determines whether dispersal distances differ greatly. Being able to separate these aspects mathematically makes it possible to study when and how these characteristics need to be taken into account when studying the dispersal of organisms. In my thesis I show, among other things, that the shape is important to consider when dispersal occurs between units that are aggregated in space. It is thus important to evaluate the characteristics of the landscape in which dispersal takes place. This result has consequences for studies of invasions of alien species, of the ability of species to use the resources available in the landscape, and of disease spread. There are clear parallels between ecological and epidemiological studies. Where the aim in, for example, conservation biology is to optimize the survival of a species, the aim in epidemiology is to make the infection die out. The thesis also contains a series of studies that analyze how farms in Sweden are in contact with each other via animal transports. This is analyzed from a disease control perspective, with the aim of increasing the understanding of disease spread. All transports of pigs and cattle must be reported to the Swedish Board of Agriculture (Jordbruksverket), and based on these reports I have tried to describe (mathematically) the probability that one farm sends animals to another. The pig industry in Sweden is highly structured. Different herds focus on, for example, breeding, fattening pig production or rearing of gilts (sows are called gilts until they have farrowed once). A more detailed analysis has therefore been performed on pig transports. The probability of contact between farms has been estimated as depending on the distance between the farms, herd size and production type. All of these factors are important to take into account, but the production type structure is the factor that most affects the dynamics of disease spread. This is part of a long-term project whose goal is to be able to simulate the spread of infectious diseases in Sweden, such as foot and mouth disease.

Table of Contents
Abstract
Popular science summary
Publications included in thesis
Publications not included in thesis
Abbreviations
1 Introduction
  1.1 Aims
2 The spatial kernel
  2.1 Movement, dispersal and the spatial kernel
  2.2 Kernel kurtosis and disease spread
    2.2.1 Kurtosis as measurement of heterogeneity in the spatial process
    2.2.2 The spatial kernel in Papers I and III
    2.2.3 Kurtosis and stochastic disease transmission
  2.3 Relative and absolute distance dependence
3 The spatial context
  3.1 Segmented landscapes
  3.2 Point pattern landscapes
4 Modeling of between-herd contacts
  4.1 Data
  4.2 Overview of Papers I–IV
  4.3 Use of the model in risk assessment
5 Bayesian analysis and Markov Chain Monte Carlo
  5.1 A small bibliographic study
  5.2 Bayesian and frequentistic statistics
6 Conclusions and recommendations for further research
  6.1 The spatial kernel
  6.2 Neutral point pattern landscapes
  6.3 Disease spread modeling
    6.3.1 Further development of the contact model
    6.3.2 Application to specific diseases
Acknowledgements
References

Publications included in thesis
This thesis is based on the following seven papers, which will be referred to in the text by their Roman numerals:
I. Lindström, T., Sisson, S.A., Nöremark, M., Jonsson, A., Wennergren, U. 2009. Estimation of distance related probability of animal movements between holdings and implications for disease spread modeling. Preventive Veterinary Medicine. 91, 85–94.
II. Lindström, T., Sisson, S.A., Stenberg Lewerin, S., Wennergren, U. 2010. Estimating animal movement contacts between holdings of different production types. Preventive Veterinary Medicine. In press. doi:10.1016/j.prevetmed.2010.03.002
III. Lindström, T., Sisson, S.A., Stenberg Lewerin, S., Wennergren, U. Bayesian analysis of animal movements related to factors at herd and between herd levels: Implications for disease spread modeling. Submitted manuscript.
IV. Lindström, T., Stenberg Lewerin, S., Wennergren, U. Expected effect of herd size, production type and between herd distances on the dynamic of between herd disease spread. Manuscript.
V. Lindström, T., Håkansson, N., Westerberg, L., Wennergren, U. 2008. Splitting the tail of the displacement kernel shows the unimportance of kurtosis. Ecology. 89, 1784–1790.
VI. Lindström, T., Håkansson, N., Wennergren, U. The shape of the spatial kernel and its implications for biological invasions in patchy environments. Manuscript.
VII. Westerberg, L., Lindström, T., Nilsson, E., Wennergren, U. 2008. The effect on dispersal from complex correlations in small-scale movement. Ecological Modelling. 213, 263–272.
Papers I, II, V and VI are reprinted with permission from the publishers.

Publications not included in thesis
1. Håkansson, N., Jonsson, A., Lennartsson, J., Lindström, T., Wennergren, U. Generating structure specific networks. Advances in Complex Systems. In press.
2. Nöremark, M., Håkansson, N., Lindström, T., Wennergren, U., Sternberg Lewerin, S. 2009. Spatial and temporal investigations of reported movements, births and deaths of cattle and pigs in Sweden. Acta Veterinaria Scandinavica. 51:37.

Abbreviations
CRW   Correlated random walk
FMD   Foot and mouth disease
LDD   Long distance dispersal
MAM   Mass action mixing
MCMC  Markov Chain Monte Carlo
NPPL  Neutral point pattern landscape
RW    Random walk
SJV   Swedish Board of Agriculture

1 Introduction
This thesis focuses on the spread of organisms in the context of both ecological and epidemiological studies. At first glance, these may seem like diverse topics, but they are largely based on very similar concepts. Epidemiology is often regarded as having been founded by Hippocrates of Cos (460 BC to 377 BC), who introduced the word epidemic and was the first to consider the distribution and spread of disease in a rational manner (Merrill and Timmreck 2006). A modern definition of epidemiology can be found in Smith (2006, p.1), where it is described as the study of “distribution and determinants of disease in populations”. The word epidemiology is reserved for human diseases by some authors (Kleinbaum et al. 2007), but Dohoo et al. (1994) point out that there are no linguistic or scientific reasons to make such a distinction. The definitions of ecology also differ somewhat between authors, but most definitions are based on that given by Ernst Haeckel in 1866, here with the translation given in Worster (1994, p.192): “the science of the relations of living organisms to the external world, their habitat, customs, energies, parasites, etc.”. This definition is in many ways similar to the more recent (as well as more poetic) definition given by Ehrlich and Roughgarden (1987, p.vii): “Ecology is about the living beings of earth and how they interact with one another and with their nonliving environments”.
By the definitions above, it is clear that the two fields are concerned with similar topics, and an alternative (and perhaps somewhat more provoking) definition of epidemiology given by Smith (2006, p.1) is that “Epidemiology is nothing more than ecology with a medical and mathematical flavor”. That aside, many concepts developed in ecology have found their way into epidemiological research of infectious diseases. One example is given by Heesterbeek (2002), who outlines the history of the basic reproductive number, $R_0$, defined as the number of offspring produced by one individual (or sometimes female). Other examples are the relationship between the SIR (Susceptible-Infected-Recovered) models and Levins metapopulations (Grenfell and Keeling 2007), as well as cyclic behavior of host-pathogen fluctuations (Bjørnstad et al. 2002) and traveling waves in biological invasions (Mollison 1977). Given the definitions of both research fields, it should also be clear that the spatial factor is a major issue within both ecological and epidemiological research. Any organism (human, animal, pathogen or other) interacts with both its biotic and abiotic environment in a spatial context. As this environment is typically both spatially and temporally variable, all organisms must rely on the ability to disperse and colonize new habitat or hosts, and features of these dispersal abilities are important factors of the fitness of the organism (McPeek and Holt 1992, Hovestadt et al. 2001, Achter and Webb 2006, Phillips et al. 2008). In the field of landscape ecology, this spatial context is often referred to as a landscape (Keitt 2000) and, since the landscape can have diverse characteristics, we may expect different aspects of the dispersal ability to be important in different landscapes. While epidemiology and ecology share many features, studies of the former must also consider additional pathways for dispersal of pathogens.
Four papers included in this thesis focus on veterinary epidemiology and more specifically on transmission of diseases between livestock herds. Pathways such as wind dispersal (e.g. Crauwels et al. 2003, Garner et al. 2006) and insect vectors (e.g. Erasmus 1975, Murray 1995) clearly resemble those considered in ecological studies, but transmission between herds is (of course) also influenced by human activities. Human induced contacts between herds pose a great risk for transmission of many livestock diseases, and while these contacts cannot be understood on the premises of ecological research, they commonly still have a spatial element (Robinson and Christley 2007, Velthuis and Mourits 2007, Ersbøll and Nielsen 2008, Dubé et al. 2009, Nöremark et al. 2009a). One type of contact that is of particular interest is the movement of animals between herds, and this is the focus of Papers I–IV. While different diseases can be transmitted by different pathways, the introduction of an infected animal to a new herd is always a major risk factor for infectious diseases (Févre et al. 2006, Ortiz-Pelaez et al. 2006, Rweyemamu et al. 2008). The importance of animal movements for disease transmission has long been recognized, and Fischer (1980) states that a changed attitude towards animal movements contributed to a decrease in transmission of cattle disease at the end of the 19th century. Animal movements between herds have received particular attention since the UK 2001 outbreak of foot and mouth disease (FMD), where animal movements contributed both to a large number of transmissions between herds and to many long distance transmissions (Ferguson et al. 2001, Keeling et al. 2001, Kiss et al. 2006, Dubé et al. 2009).

1.1 Aims
The unifying factor in this thesis is the analysis of (spatial) spread of organisms in both ecological and epidemiological contexts.
The general aim is to develop models and methods to increase our knowledge regarding spread of organisms in spatially explicit settings. While the papers included deal with similar concepts, the specific aims are somewhat diverse and vary between the different studies. I therefore list the aims here and state in brackets which papers are concerned with each aim.
- To develop models that describe between-herd contact patterns via live animal movements. This forms part of a larger research project with the long term aim of creating predictive models for disease spread between Swedish livestock herds, which can be used for risk assessment (Papers I, II, III, IV).
- To contribute to theory regarding the importance of the shape of the spatial kernel (as described by kurtosis) used for modeling spread of organisms in spatially explicit systems. More specifically, the aim is to investigate when kurtosis is expected to be important to include when modeling organisms in spatially explicit systems (Papers V, VI).
- To develop analytical tools for translating observed data into relevant parameters that can be used in the modeling of organism spread (Papers I, II, III, VI, VII).

2 The spatial kernel
Papers I, III, IV, V and VI utilize a spatial kernel in the modeling of spread of organisms. The term “spatial kernel”, which is more commonly used in epidemiological studies (e.g. Keeling et al. 2001, Boender et al. 2007), is used in this thesis because it may apply to both ecological and epidemiological studies. It is however known by many names, and the terminology in ecology includes dispersal kernel (Mollison 1991, Clark 1998, Skarpaas and Shea 2007), redistribution kernels (Kot et al. 1996, Lockwood et al. 2002, Westerberg and Wennergren 2003), dispersal curves (Nathan et al. 2003) and displacement kernels (Santamaría et al. 2007, Preece and Mao 2009). In epidemiological studies it is sometimes also referred to as the contact kernel (Mollison 1977, Ferguson et al.
2001). The spatial kernel used in this thesis describes the density at some distance from the source by the function

$$K(d, a, b) = S e^{-(d/a)^b} \qquad (1)$$

where $d$ is the distance, $a$ and $b$ are parameters determining the scale and shape of the kernel, and $S$ is used to normalize the distribution. For the one-dimensional case $S = b/(2a\Gamma(1/b))$ and in two dimensions $S = b/(2\pi a^2 \Gamma(2/b))$. The function has been labeled a generalized normal distribution (Nadarajah 2005) and power exponential (Klein et al. 2006). The reason why this function is used is that it contains some well known distributions as special cases. For parameter $b = 1$ it is an exponentially decreasing function and for $b = 2$ the distribution may be rewritten as a Gaussian distribution. These are distributions often used in both ecological and epidemiological studies. Rather than just describing the spatial kernel by the parameters $a$ and $b$, the kernel can be characterized by two more relevant measures: the scale and the shape. The scale is in this thesis measured as a two-dimensional variance and calculated as the second moment around zero (see Paper V). This measure is chosen because it can, in some instances, be estimated as the expected net squared displacement by analysis of small-scale movement patterns of individual organisms. The shape is measured by kurtosis, which is calculated as the fourth moment divided by the square of the second moment (as provided in Paper V).

2.1 Movement, dispersal and the spatial kernel
In the beginning of the 20th century, Albert Einstein studied diffusion of molecules by random walk movements and later, ecologists found that similar approaches could be used to study the dispersal of living organisms (Vogel 2005). By representing the path as a series of consecutive steps, displacement distances at larger scales may be inferred from analysis of the small-scale patterns of the steps.
Ideally, these steps should have an ecological or behavioral relevance (Turchin 1998), such as a rhizome branching for clonal growth of plants (Cain 1990) or the flight of a butterfly between host plants (Kareiva and Shigesada 1983). The movement path of continuously moving organisms may also be represented by discrete steps, usually defined by some time interval. Turchin (1998) labels these “artificial moves” and points out that choosing an appropriate duration could be problematic, since both over- and under-sampling pose problems for the estimation of displacement distances. Yet, analysis of small-scale movement patterns remains an important tool for analysis of movement and dispersal.

Figure 1. Consecutive steps of a path represented by discrete moves, each with a step length, $l_i$, and turning angle, $\theta_i$. To satisfy the assumptions of a correlated random walk, no correlations are allowed between parameters $\theta$ and $l$ and autocorrelations are also prohibited. The distribution of $\theta$ may however deviate from uniform between $-\pi$ and $\pi$.

To satisfy the assumptions of a random walk (RW), no correlations or autocorrelations of turning angles and step lengths should be present, and a move is equally probable to occur in any direction. Deviations from RW can be implemented in specialized RW models. A correlated random walk (CRW) allows for non-uniform turning angles, making some turning angles more probable. Turning angles are measured relative to the previous direction (see Figure 1) and are commonly centered around zero, making moves that continue in a similar direction more probable. A CRW thus allows for relative correlations, but its assumptions are violated if an absolute bias exists, for example if turns toward a particular geographical direction are more probable.
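To make the distinction between the two walk types concrete, the following is a minimal simulation sketch and not the method used in any of the included papers. It assumes exponentially distributed step lengths and, for the CRW, normally distributed turning angles centered on zero. The function names `simulate_walk` and `mean_squared_displacement`, their parameters and the chosen distributions are all illustrative assumptions.

```python
import math
import random


def simulate_walk(n_steps, angle_sd=None, mean_step=1.0, rng=None):
    """Return the end point (x, y) of one simulated path.

    angle_sd=None gives a random walk: every move direction is equally
    probable. Otherwise turning angles are drawn relative to the previous
    heading from a normal distribution centered on zero (a CRW).
    Step lengths are exponential (an assumption made for illustration)
    and are drawn independently of the turning angles.
    """
    if rng is None:
        rng = random.Random()
    x = y = 0.0
    heading = rng.uniform(-math.pi, math.pi)  # arbitrary initial direction
    for _ in range(n_steps):
        if angle_sd is None:
            turn = rng.uniform(-math.pi, math.pi)  # uniform turning angle
        else:
            turn = rng.gauss(0.0, angle_sd)  # turn relative to previous move
        heading += turn
        step = rng.expovariate(1.0 / mean_step)
        x += step * math.cos(heading)
        y += step * math.sin(heading)
    return x, y


def mean_squared_displacement(n_paths, n_steps, angle_sd=None, seed=1):
    """Average net squared displacement over n_paths simulated paths."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, y = simulate_walk(n_steps, angle_sd, rng=rng)
        total += x * x + y * y
    return total / n_paths
```

With concentrated turning angles (a small `angle_sd`), paths persist in direction, so the mean squared displacement after a given number of steps is expected to be considerably larger than for the RW with the same step lengths.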
If a path is described by a CRW, the expected net squared displacement of the organism after $n$ steps, $E(R_n^2)$, can be approximated by (Kareiva and Shigesada 1983)

$$E(R_n^2) \approx n\left(E(l^2) + 2E(l)^2 \frac{c}{1-c}\right) \qquad (2)$$

where $E(l)$ is the mean move length, $E(l^2)$ is the mean squared move length and $c$ is the average cosine of the turning angle. The approximation holds for $n \gg 1$ and a symmetric distribution of turning angles, meaning that left turns and right turns are equally probable. It is noteworthy that the expected net squared displacement increases linearly with $n$. For both RW and CRW, the spatial probability density of an organism converges to a Gaussian distribution as $n \to \infty$ (Okubo 1980). The resulting probability distribution of an organism’s spatial location relative to its start point (assuming movement on a two-dimensional surface) is then

$$p(x, y \mid \sigma^2) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)} \qquad (3)$$

where $x, y$ are the spatial coordinates. Following for example Cain (1991), $\sigma^2 = 2Dn$, where $D$ is the diffusion constant used in diffusion models, which may be given from $E(R_n^2)$ via $D = E(R_n^2)/(4n)$. Living organisms are however not molecules, and therefore they may not necessarily be modeled accurately under random walk assumptions. The distribution of observed dispersal distances is commonly leptokurtic, and this is found for a wide range of taxa among plants (Johnson and Webb 1989, Clark 1998, Clark et al. 1999, Nathan et al. 2003, Yamamura 2004, Skarpaas and Shea 2007), lichens (Marshall 1996, Walser 2004) and animals (Inoue 1978, Fraser et al. 2001, Schweiger et al. 2004, Walters et al. 2006, Coombs and Rodríguez 2007). Leptokurtic means that the distribution has a higher kurtosis than a Gaussian distribution. Making this ecologically relevant, Clark et al. (2001, p.537) state that “…the leptokurtosis in the distribution also produces greater distinction between short- and long-range dispersal events”.
The opposite, platykurtic distributions, are rarely found, but Holmes (1993) showed that such distributions may be expected by the mere fact that animals have a finite speed when moving. Various explanations have been proposed for the observed leptokurtic distributions. It is most commonly considered a result of a heterogeneous dispersal process (Hawkes et al. 2009), which may be due to behavioral differences (Dobzhansky and Wright 1943, Shigesada et al. 1995, Fraser et al. 2001) or a result of different external factors influencing the dispersal (Johnson and Webb 1989, Shigesada et al. 1995, Clark 1998, Clark et al. 2001). Other reasons include loss of individuals during dispersal (Schneider 1999) and temporal variation in the diffusion process (Yamamura et al. 2007).

Figure 2. Kurtosis, $\kappa$, vs. movement step using the results from resampling method 4 in Paper VII. Here, $\kappa$ was calculated as $\kappa = E(R_n^4)/(E(R_n^2))^2$, where $E(R_n^4)$ is the mean fourth powered displacement and $E(R_n^2)$ is the mean squared displacement. The kurtosis tends to $\kappa = 2$, which is the kurtosis of a two-dimensional Gaussian distribution following the definition by raw moments in Paper V.

In Paper VII we analyzed the movement pattern of the springtail Protaphorura armata and how the pattern changed in response to food and conspecifics. We used expected net squared displacement as a measurement of the animal's tendency to stay in the area. Note that this is the same measure as the variance of the spatial kernel. The shape of the spatial kernel was however not within the scope of this research and therefore not included. Figure 2 shows estimations of kurtosis for resampling method 4 and how it changes with time (one step has a fixed duration of one second). As $n$ becomes large, kurtosis tends to $\kappa = 2$, which is the kurtosis expected of a two-dimensional Gaussian distribution (Clark et al. 1998, Paper V).
While no formal proof is provided, this indicates that inclusion of higher order correlations in movement parameters in the resampling method changes the prediction of displacement scale (measured as $E(R_n^2)$) but not the shape (measured as $\kappa$). Kurtosis is generally considered to be a measure of “shape”. More specifically, it is defined as the fourth moment divided by the square of the second moment, the latter being the variance. Symmetrical kernels (that is, assuming equal probability of dispersal in every direction) are centered at the origin of the dispersal event for both one- and two-dimensional cases. Hence it makes sense to evaluate this property by “raw moments”, meaning that the moments are evaluated around the center rather than the mean. This distinction makes no difference in the one-dimensional case, but it aids the interpretation for the two-dimensional case. Clark et al. (1999) point out that there is no convention for calculating kurtosis for two-dimensional kernels, but state that if the kernel is symmetrical, bivariate measures of kurtosis are undesirably complex. They further argue that the moments should be defined by the distance from zero (rather than Cartesian x-y coordinates), which is also consistent with the dispersal distance data often collected in field studies, and that kurtosis calculated from such moments captures the desired measure of shape. It should be stressed that kurtosis is dimensionless. Any unit cancels out in the division of the fourth moment, with unit $\text{length}^4$, by the square of the second moment, the latter with unit $\text{length}^2$. Thereby, the shape is not confused with the scale. In my opinion this makes the analysis of the spatial kernel by its moments superior to more subjective studies of long distance dispersal (LDD), where LDD events are defined by some fixed distance or some (arbitrary) percentage of the observed dispersal distances (Nathan 2006).
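The separation of scale and shape can be illustrated with a short sketch. It assumes the two-dimensional kernel of this thesis is proportional to exp(-(d/a)^b), with the scale and shape parameters written as `a` and `b` (symbol names assumed here), and evaluates raw moments of distance in closed form with the gamma function. The function names are hypothetical and the code is an illustration, not the implementation used in Paper V.

```python
import math


def kernel_2d(d, a, b):
    """Density per unit area at distance d, normalized over the plane."""
    norm = b / (2.0 * math.pi * a**2 * math.gamma(2.0 / b))
    return norm * math.exp(-((d / a) ** b))


def raw_moment_2d(n, a, b):
    """n-th raw moment of distance, E(d^n), for the two-dimensional kernel.

    Obtained by integrating d^n * kernel_2d(d) * 2*pi*d over d from 0 to
    infinity, which reduces to a ratio of gamma functions.
    """
    return a**n * math.gamma((n + 2.0) / b) / math.gamma(2.0 / b)


def kurtosis_2d(a, b):
    """Fourth raw moment divided by the square of the second raw moment."""
    m2 = raw_moment_2d(2, a, b)  # the two-dimensional variance (scale)
    m4 = raw_moment_2d(4, a, b)
    return m4 / m2**2
```

Because the factor `a**4` cancels in the ratio, `kurtosis_2d` depends on `b` only, which is the dimensionless property stressed above. For `b = 2` it evaluates to 2, the value quoted for a two-dimensional Gaussian distribution, whatever the value of `a`.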
Many studies have focused on the kernel characteristics, in particular in relation to invasion speed (e.g. Mollison 1977, van den Bosch et al. 1990, Shigesada et al. 1995, Kot et al. 1996). Kot et al. (1996) summarize that there are three types of spatial kernels that need to be considered for studies of biological invasion:
1) Kernels with exponentially bounded tails. Invasions with such dispersal show a constant invasion rate where the range of the population expands linearly (or, equivalently, the square root of the area increases linearly). The kernels used in Papers I, III, IV, V, VI and VII fall under this category for parameter $b \geq 1$ (i.e. $\kappa \leq 3\tfrac{1}{3}$).
2) Kernels without exponentially bounded tails but with finite moments. Such kernels give rise to accelerating invasions. The kernels used in Papers I, III, IV, V and VI fall under this category for parameter $b < 1$ (i.e. $\kappa > 3\tfrac{1}{3}$).
3) Kernels without finite moments. These are extremely fat tailed kernels, and analytic approximations of invasion speed are only possible for special cases. The kernel used in the thesis cannot produce such behavior.
For the last category, it may seem unrealistic that the dispersal distances are characterized by non-finite moments, but Lévy flights, which fall under this category, have shown good fit with observed data (Viswanathan et al. 1996, Schick et al. 2008). However, Skarpaas and Shea (2007, p.421) point out that “all realized dispersal distributions consist of a finite number of dispersal events and therefore have exponentially bounded tails”. Also, accelerating invasions may be the result of factors other than the lack of exponentially bounded dispersal distances. For example, recent studies of cane toad invasion in Australia conclude that the accelerating invasion speed is a result of heterogeneous environments and different evolutionary pressures at the front of the invasion (Urban et al. 2008).
Spatial kernels (though most often not referred to as such) have also been implemented in metapopulation studies. In an effort to introduce spatial realism, Hanski (1994) introduced the incidence function. In Hanski’s formulation, colonization between patches depending on distance is assumed to follow a negative exponential distribution (i.e. $b = 1$ in the spatial kernel used in this thesis). Hawkes (2009) points out that while some authors, such as Heinz et al. (2006), have incorporated different functions, metapopulation studies have generally continued to utilize the negative exponential function. Hawkes also points out that the empirical evidence clearly shows that deviations from this should be expected, and one should not use the negative exponential function by default.

2.2 Kernel kurtosis and disease spread
Many epidemiological studies of infectious diseases have implemented spatial kernels (Mollison 1977, van den Bosch et al. 1990, Mollison et al. 1994, Keeling et al. 2001, Ferguson et al. 2001, Boender et al. 2007, Boender et al. 2008). Mollison et al. (1994, p.120) state that “Key questions for spatial disease dynamics concern long distance contacts…” and they conclude that analysis of transport patterns is a promising approach. It is from this insight that the analysis in Papers I and III estimates the spatial kernel of between-herd contact probabilities via animal movements. In ecological studies, a Gaussian shape may (theoretically) be assumed, but mechanistically based assumptions regarding kurtosis of the kernels used for human induced contacts are difficult to obtain. There is no underlying random walk process, and therefore no reason to assume a Gaussian distribution.

2.2.1 Kurtosis as measurement of heterogeneity in the spatial process
There are also no mechanistic reasons for assuming a (discrete or continuous) mixture of Gaussian distributions, as is often done in ecological studies (Hawkes 2009) when explaining leptokurtosis.
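In Paper III we state, however, that kurtosis can be interpreted as a measure of heterogeneity of the spatial process (such as dispersal, colonization or infection), even though we do not assume that there is a mixture of Gaussian processes. Intuitively this makes sense, and here I present a further analysis of the subject. Assume a mixture of processes, where process $i$ is described by a spatial kernel $K_i(d)$ (where $d$ is the distance) with kurtosis $\hat{\kappa}$, meaning that all processes have the same kurtosis. Kurtosis is defined as the fourth moment, $\mu_4$, divided by the square of the second moment, $\mu_2$. Therefore, for each process $i$, with second moment $V_i$ and fourth moment $\mu_{4,i}$,

$$\frac{\mu_{4,i}}{V_i^2} = \frac{\int_0^\infty d^4 K_i(d)\,\mathrm{d}d}{\left(\int_0^\infty d^2 K_i(d)\,\mathrm{d}d\right)^2} = \hat{\kappa} \tag{4}$$

Hence, $\mu_{4,i} = \hat{\kappa} V_i^2$. The mixture process can then be summarized by a resulting mixture distribution $M$, consisting of a fraction $w_i$ (where $\sum_i w_i = 1$) of every mixture component $i$. Then the fourth moment of $M$, $\mu_{4,M}$, is

$$\mu_{4,M} = \int_0^\infty d^4 \sum_i \left(w_i K_i(d)\right)\mathrm{d}d = \sum_i w_i \mu_{4,i} \tag{5}$$

and the variance, $\bar{V}$, is given by

$$\bar{V} = \int_0^\infty d^2 \sum_i \left(w_i K_i(d)\right)\mathrm{d}d = \sum_i w_i V_i. \tag{6}$$

The kurtosis of $M$ is then given as

$$\kappa_M = \frac{\mu_{4,M}}{\bar{V}^2} = \frac{\sum_i w_i \mu_{4,i}}{\left(\sum_i w_i V_i\right)^2} = \frac{\sum_i w_i \hat{\kappa} V_i^2}{\left(\sum_i w_i V_i\right)^2} = \hat{\kappa}\,\frac{\sum_i w_i V_i^2}{\left(\sum_i w_i V_i\right)^2} \tag{7}$$

Hence, we may conclude that $\kappa_M > \hat{\kappa}$ if

$$\sum_i w_i V_i^2 > \left(\sum_i w_i V_i\right)^2 \tag{8}$$

(all quantities involved are positive). From equation 6 we know that $\bar{V}$ is real (both $w_i$ and $V_i$ are real and positive for all $i$) and therefore

$$\sum_i w_i \left(V_i - \bar{V}\right)^2 \ge 0 \tag{9}$$

Expanding equation 9 we obtain

$$\sum_i w_i \left(V_i^2 - 2 V_i \bar{V} + \bar{V}^2\right) \ge 0 \tag{10}$$

and if this is rewritten as three separate summations, then

$$\sum_i w_i V_i^2 - \sum_i 2 w_i V_i \bar{V} + \sum_i w_i \bar{V}^2 \ge 0, \tag{11}$$

where $\sum_i 2 w_i V_i \bar{V} = 2\bar{V} \sum_i w_i V_i = 2\bar{V}^2$ and $\sum_i w_i \bar{V}^2 = \bar{V}^2 \sum_i w_i = \bar{V}^2$ (since $\sum_i w_i = 1$). Equation 11 can then be rewritten as

$$\sum_i w_i V_i^2 - 2\bar{V}^2 + \bar{V}^2 = \sum_i w_i V_i^2 - \bar{V}^2 \ge 0. \tag{12}$$

By the definition of $\bar{V}$ in equation 6, this is equal to

$$\sum_i w_i V_i^2 - \left(\sum_i w_i V_i\right)^2 \ge 0 \iff \sum_i w_i V_i^2 \ge \left(\sum_i w_i V_i\right)^2. \tag{13}$$

This is the essential relationship from equation 8 and we may conclude that any mixture distribution of components with the same arbitrary kurtosis $\hat{\kappa}$ will have kurtosis larger than $\hat{\kappa}$ unless $w_i = 1$ for exactly one component (meaning that there is only one component) or $V_i = \bar{V}$ for all $i$ (meaning that all mixture components are identical).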
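The conclusion of this derivation can be checked numerically. The sketch below is a minimal illustration, assuming only that each component contributes a weight, a second moment (variance) and a shared kurtosis. The function and argument names are hypothetical.

```python
def mixture_kurtosis(weights, variances, component_kurtosis):
    """Kurtosis of a mixture whose components all share one kurtosis.

    Uses: fourth moment of component i = component_kurtosis * variance_i**2,
    and mixture moments as weighted sums of the component moments.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    second = sum(w * v for w, v in zip(weights, variances))
    fourth = sum(w * component_kurtosis * v ** 2
                 for w, v in zip(weights, variances))
    return fourth / second ** 2


if __name__ == "__main__":
    # Identical components: the mixture keeps the component kurtosis.
    print(mixture_kurtosis([0.5, 0.5], [1.0, 1.0], 3.0))
    # Different variances: the mixture kurtosis exceeds the component kurtosis.
    print(mixture_kurtosis([0.5, 0.5], [1.0, 4.0], 3.0))
```

With equal variances the mixture kurtosis equals the component kurtosis, and any spread in the variances raises it, which is the sense in which kurtosis reflects heterogeneity.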
This supports the interpretation of kernel kurtosis in Paper IV, that kurtosis may be seen as a measure of the heterogeneity of the spatial processes.

2.2.2 The spatial kernel in Papers I and III

Papers I and III analyze the spatial kernel of between-herd contacts, with the aim of making inference about disease spread modeling. In Paper I all farms are assumed to have the same characteristics and only one kernel is fitted to describe distance dependence. However, it is concluded that the data is better described as a mixture of distance dependent contacts (described by the spatial kernel) and non distance dependent contacts, the latter following mass action mixing (MAM) assumptions. This gives a better fit for both long and short distance contacts. In Paper III the contacts are described using a separate kernel for every combination of production types of the start and end herd. In this study, the MAM component is not included, mainly for simplicity. While exclusion of the MAM component in some sense is a simplification, many more parameters are used to describe the distance dependence. There are eight production types included in the study, giving 64 combinations of start and end herd types, and each kernel is defined by two parameters, adding up to 128 parameters (though not completely independent since hierarchical priors are used in the estimation). The spatial kernels are found to differ between production types, both in terms of the scale and shape aspects. Further, a good fit is found (by visual comparison with the data) when each kernel is fitted to describe the contacts between herds of two production types, rather than treating all herds as equal (as is done in Paper I). When the kurtosis of the kernels fitted in Paper III is compared to the kurtosis of a single kernel fitted in Paper I (from analysis excluding the MAM component), it is shown that most kernels fitted in Paper III have a lower kurtosis.
Hence, a lower kernel kurtosis is found for contacts between herds of two specific production types, which can be explained by this being a more homogeneous process. This is well in line with the interpretation of kurtosis as a measure of heterogeneity of the spatial process.

2.2.3 Kurtosis and stochastic disease transmission

High kurtosis may also cause stochastic disease transmission dynamics (Mollison 1977). If there is a strong spatial component such that infections are more likely to occur at shorter distances, local susceptibles will rapidly become depleted. Long-distance transmissions, described by the fat tail of the leptokurtic kernel, may then transmit the disease to distant, virgin areas. When this happens, there will be a rapid increase of infected units in the new area, and Keeling et al. (2001) state that such dynamics were observed in the UK 2001 outbreak of FMD. The kernel shape is therefore an important feature for disease spread modeling, but it may be difficult to make assumptions about this characteristic. Hence, rather than assuming a fixed shape, the joint distribution of both shape and scale should be estimated, as is done in Papers I and III.

2.3 Relative and absolute distance dependence

It is of course possible to study the spatial aspect of, for instance, between-herd contacts without implementation of a spatial kernel. For instance, in Nöremark et al. (2009a) we concluded that the large number of long-distance animal movements between herds supports an initial total standstill (meaning no animal movements allowed) in all of Sweden, rather than a regionalization, in the case of an outbreak of, for instance, FMD. However, it is not straightforward to implement observed distances directly for modeling purposes. This is particularly the case if the herds are not randomly distributed, which is found for both cattle and pig herds in Sweden. These are mainly located in the south and along the coast.
Further, in Paper VI it is concluded that the distributions of herds of both species show a non random distribution over multiple scales. Figure 3 shows the distribution of between-herd distances of Swedish pig herds (data from Paper III). This would be the expected distribution of movement distances if there was in fact no distance dependence. Thus, merely analyzing the observed distances may give the false impression that there is a strong spatial component, when in fact the movements only reflect the distribution of the herds.

Figure 3. The distribution of distances between Swedish pig herds.

The actual distance dependence in the probability of contacts between herds clearly needs to be separated from the spatial distribution of herds. This reasoning also holds for ecological studies, and the observed non random distribution of trees presented in Paper VI stresses the importance of separating this distribution from the dispersal distances when modeling dispersal in this system. In Paper VI spatial kernels are implemented in two different ways, corresponding to different assumptions on the spatial processes. These are referred to as relative and absolute distance dependence, respectively. In the latter, the distribution of other patches does not influence the probability of infective contacts or dispersal events between two patches. (I here use the term patch as in Paper VI to refer to both habitats in ecological settings and infective units in epidemiological settings, the term landscape to refer to patch distribution, and the term colonization to refer to either infective contacts or dispersal events.) The probability of colonization from one patch to another depends only on the distance between the two patches. In relative distance dependence, the colonization probability is also dependent on other patches and the total colonization probability from a patch is distributed over all other patches.
This approach is used in the modeling of animal movement contacts (Papers I, III and IV). If absolute distance dependence is used instead, it would correspond to the assumption that more animal movements are shipped from herds in areas with many other herds. In my opinion, the assumption of relative distance dependence is better suited to epidemiological studies focusing on disease transmission via animal movements. The assumptions of absolute distance dependence agree better with passive dispersal in an ecological setting (and also with wind dispersal of pathogens). For instance, seed dispersed by wind will land in some location independent of the distribution of suitable habitat patches. The probability of dispersal decreases with distance, but the parent plant cannot control whether the seed lands in a suitable habitat or in a hostile matrix where it is unable to germinate. Hence, the distribution of other patches does not influence the probability of colonization from one patch to another and a higher colonization rate should be expected in areas with high density of patches. Animals with active dispersal can be argued to follow both assumptions. As they may continue to move until they encounter a suitable habitat patch they could be assumed to follow the suppositions of relative distance dependence. However, unlike the trucks moving animals between herds, there is a probability that the animal does not find a suitable patch, in particular if there are great distances involved. Hence mortality during dispersal needs to be taken into account. While general models, such as those used in Papers V and VI, may be used to provide valuable information about population processes, any model that is applied to a specific system needs to be modified to incorporate relevant factors for that particular system.
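The difference between the two assumptions can be sketched in a few lines. This is an illustration under stated assumptions and not the implementation used in Paper VI: a negative exponential kernel stands in for the kernel family of the thesis, and the names `kernel`, `absolute_probs`, `relative_probs`, `rate` and `total` are hypothetical.

```python
import math


def kernel(distance, scale=0.1):
    """Illustrative negative exponential kernel (an assumed stand-in)."""
    return math.exp(-distance / scale)


def absolute_probs(targets, source, rate):
    """Absolute distance dependence: each probability depends only on
    the distance between the source and that particular target patch."""
    return [rate * kernel(math.dist(source, t)) for t in targets]


def relative_probs(targets, source, total):
    """Relative distance dependence: a fixed total probability is shared
    among all target patches in proportion to their kernel weights."""
    weights = [kernel(math.dist(source, t)) for t in targets]
    norm = sum(weights)
    return [total * w / norm for w in weights]


if __name__ == "__main__":
    source = (0.0, 0.0)
    sparse = [(0.2, 0.0)]
    dense = [(0.2, 0.0), (0.1, 0.0), (0.0, 0.1)]
    for landscape in (sparse, dense):
        print(sum(absolute_probs(landscape, source, rate=0.5)),
              sum(relative_probs(landscape, source, total=0.1)))
```

In this sketch, adding neighbors raises the total colonization probability under absolute distance dependence, while under relative distance dependence the total stays fixed and each target receives a smaller share.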
It is also noteworthy that the distinction between absolute and relative distance dependence may have small effect if patches are randomly distributed, but as mentioned above this is not the case in any of the distributions analyzed in Paper VI. Figure 4 shows how absolute and relative distance dependence may differ in their predictions of colonization depending on the patch distribution. This can be related to the results in Paper VI, where it is shown that in random landscapes the kernel kurtosis is not important for modeling colonization processes, and this result also holds for non random landscapes when using the assumptions of relative distance dependence. However, if absolute distance dependence is assumed, the kernel kurtosis becomes highly important in non random landscapes. Further, if kurtosis is interpreted as a measure of heterogeneity in dispersal as shown above, this means that individual differences within a population may be highly important in heterogeneous landscapes, when dispersal follows the assumptions of absolute distance dependence.

Figure 4. Demonstrating the differences between colonization modeled with absolute and relative distance dependence. Colonization was simulated from the encircled patch (left column) for 10000 steps, using kernels with = . and = . (relative to the unit square). Kernels were normalized such that the expected probability of colonization in a random patch distribution was 10% (in the relative distance dependence it was exactly 10%) when all other patches were unoccupied and 2000 patches were used. The right column shows the frequency of colonization distances simulated with absolute (black bars) and relative (white bars) distance dependence in the corresponding patch distribution in the left column. In the random distribution (top row) there is little difference between the relative and absolute distance dependence.
In the non random distributions (second and third row) the simulated distances differ such that the relative distance dependence leads to a higher number of colonizations and more long-distance colonization when simulated from an isolated patch (second row), and the reverse is found if colonization is simulated from a patch in a dense area (third row).

Figure 5. The marginal joint posterior probability density of variance and kurtosis of the kernel estimated in Paper III for movements from Piglet producers to Fattening herds.

One might interpret this such that the kurtosis did not need to be included in the analysis of Paper III. There are however three important reasons why it needs to be included. First, the stochastic aspect is not included in Paper VI. As mentioned in Paper IV, we should expect more stochastic dynamics when the kernel is leptokurtic. Secondly, the information can be useful when the specific contact probabilities of individual herds are of interest (see section 5.3). Thirdly, the estimation of the variance and kurtosis is not independent. As demonstrated in figure 5, there can be a clear correlation in the posterior probability of these measures. If the kernel shape is assumed beforehand, one will likely obtain an erroneous estimate of the variance. A joint probability of the scale and shape should therefore be estimated.

3 The spatial context

The studies in Papers I, III, IV, V and VI concern modeling of organisms in a spatially explicit context, meaning that every object of interest (in these studies habitats and/or farms) has a specified location in a heterogeneous landscape (Dunning et al. 1995). Spatially implicit models may be appropriate for studying some population processes, mainly when the key interest is the effect of subdivided populations or communities. These models have been used to study for instance source/sink dynamics (Pulliam and Danielson 1995), population synchrony (Ruokolainen et al.
2009), food web stability (Holt 2002) and drug resistance at hospitals (Smith et al. 2004). However, spatially implicit models suffer from lack of realism and are unable to incorporate important factors such as distance dependence in the process studied (Dunning et al. 1995, Débarre et al. 2009). Furthermore, one of the key features of spatially explicit models is the ability to include the spatial arrangement of a landscape. This is a central issue in Papers V and VI, where the interplay between landscape features and kernel properties is considered. Spatial processes are modeled with neutral landscapes, defined by Keitt (2000) as landscapes where the value at any point in the landscape can be considered random. Keitt also pointed out that this does not exclude models with spatial autocorrelation. In the studies presented in both Papers V and VI, spatially explicit landscapes are generated with a spectral density approach. This is given as a two dimensional extension of spectral time series analysis. Any time series can be represented by a set of cosine terms. A random time series is made up of terms where there is no relationship between the amplitude and the frequency, and such a time series is referred to as white noise. If instead there is a relationship such that low frequencies have higher amplitude, a smoother time series is obtained and this is referred to as red noise. Figure 6 shows schematically the relationship between the cosine terms and the resulting time series obtained by summation of the terms. A measurement of the noise color can be obtained by plotting log(frequency) vs. log(amplitude) and calculating the spectral exponent as the negative slope of a line fitted to the plot. Hence, a large exponent means that the time series is autocorrelated (red) and if the exponent is 0 the time series is random (white). A negative exponent (referred to as blue noise) means that the high frequencies are dominant and a negative autocorrelation is obtained.
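The link between the amplitude/frequency relationship and the smoothness of the series can be illustrated by summing cosine terms directly. This is a minimal sketch, not the procedure used in Papers V and VI: amplitudes are assumed proportional to frequency raised to minus an exponent, phases are drawn at random, and the names `spectral_series`, `lag1_autocorrelation` and `exponent` are hypothetical.

```python
import math
import random


def spectral_series(n, exponent, seed=0):
    """Sum of cosines with amplitude k**(-exponent) and random phases.

    exponent = 0 gives equal amplitudes (white noise); larger values let
    low frequencies dominate (red noise).
    """
    rng = random.Random(seed)
    freqs = list(range(1, n // 2))  # whole cycles per series, below Nyquist
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    series = []
    for t in range(n):
        value = sum(k ** (-exponent) * math.cos(2.0 * math.pi * k * t / n + p)
                    for k, p in zip(freqs, phases))
        series.append(value)
    return series


def lag1_autocorrelation(x):
    """Circular lag-1 autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[(t + 1) % n] - mean) for t in range(n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    for exponent in (0.0, 1.0):
        series = spectral_series(128, exponent)
        print(exponent, lag1_autocorrelation(series))
```

With equal amplitudes the lag-1 autocorrelation is close to zero, while with an exponent of 1 neighboring values are strongly correlated, matching the white and red cases described above.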
Analysis of the spectral density by its log-log slope assumes a linear relationship between log(frequency) and log(amplitude) and is usually referred to as 1/f noise.

Figure 6. Describing how different relationships between amplitude and frequency produce different time series characteristics. The left columns show sine waves where there is no relationship between frequency and amplitude of the waves (top row) and waves where high frequency waves have a lower amplitude (bottom row). The summation of the waves is shown in the right column and in the lower row the low frequency waves dominate the pattern and yield a smoother time series.

3.1 Segmented landscapes

The spatially explicit studies presented in this thesis all focus on two dimensional space, and Keitt (2000) shows how the same approach may be used to analyze and generate two dimensional grid data. In Paper V, the continuous values of the generated grid are digitalized to obtain a segmented representation of the landscape, such that it consists of cells that are either habitat or matrix. The value of the slope then determines the arrangement of the habitat cells, such that they are randomly distributed or show a clustered pattern. A different way of generating such a pattern is the algorithm developed by Hiebler (2000), which rearranges habitat patches such that a grid landscape is obtained, where every cell has a specified probability of neighboring cells having the same property (that is habitat or matrix). Conceptually (as well as methodologically) these methods differ in that Hiebler's nearest neighbor method considers the spatial autocorrelation at a specific scale, while the spectral density approach considers the overall structure of habitat arrangement.

3.2 Point pattern landscapes

For some studies, it may be more convenient to represent the landscape by the coordinates of the patches as a point pattern.
In Paper VI, a spectral density approach is used to generate neutral point pattern landscapes (NPPL) and also to analyze empirical distributions. Figure 7 shows the procedure of generating the NPPL. First, a grid surface consisting of randomly distributed random numbers is generated. Secondly, by considering this grid in the spectral domain, the spectral density function of the surface is filtered by introducing the required spectral slope (a relationship between log(frequency) and log(amplitude)), and the spectral domain is then transformed into a new grid. This step introduces the spatial autocorrelation measured as continuity. In the third step, the contrast of the NPPL is introduced. This is to be interpreted as a measure of how different dense and sparse areas are, while the continuity measures how areas with similar density are located relative to each other. Landscapes with high continuity are characterized by large areas of similar patch density. The values of the grid are transformed by a method based on spectral mimicry, as introduced for time series by Cohen et al. (1999). This involves replacing the values in the grid by values from a wanted distribution, thereby controlling the mean and variance, but at the same time keeping the autocorrelation structure introduced by the filtering. Cohen et al. substitute the values of the grid, which are normally distributed, with random values from another normal distribution. In Paper VI, the method is instead used to substitute the values of the grid with random values from a gamma distribution. A normal distribution can take negative values, and this is not desirable if the surface is to be used as a probability density for patch distribution. By transforming the grid values to obtain gamma distributed values, it is possible to get a grid with the wanted coefficient of variation, and at the same time avoid negative values.
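The value substitution in the third step can be sketched as a rank-preserving replacement. The code below is an assumed, simplified reading of the spectral mimicry idea and not the implementation of Paper VI: grid values are given as a flat list, the target values are drawn from a gamma distribution with a chosen mean and coefficient of variation, and the names `gamma_targets` and `rank_substitute` are hypothetical.

```python
import random


def gamma_targets(n, mean, cv, seed=0):
    """Draw n gamma values with the given mean and coefficient of variation.

    For a gamma distribution, cv = 1 / sqrt(shape) and mean = shape * scale.
    """
    rng = random.Random(seed)
    shape = 1.0 / cv ** 2
    scale = mean / shape
    return [rng.gammavariate(shape, scale) for _ in range(n)]


def rank_substitute(values, targets):
    """Replace each value by the target value of the same rank.

    The ordering of the cells, and hence the spatial arrangement of high
    and low values, is kept; only the value distribution changes.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    sorted_targets = sorted(targets)
    result = [0.0] * len(values)
    for rank, index in enumerate(order):
        result[index] = sorted_targets[rank]
    return result


if __name__ == "__main__":
    grid_values = [0.3, -1.2, 2.0, 0.0]  # e.g. filtered, normally distributed
    transformed = rank_substitute(grid_values, gamma_targets(4, 1.0, 2.0))
    print(transformed)
```

Because only ranks are used, the smallest cell stays the smallest and the largest stays the largest, while negative values disappear and the spread is set by the chosen coefficient of variation.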
The reason for introducing the third step is that it was discovered that the use of a normal distribution could not produce NPPL with sufficiently high contrast values as found in the observed landscapes (particularly in the analyzed distributions of trees). In the final step, the patches are distributed according to the surface obtained by step three. Based on a method introduced by Mugglestone and Renshaw (1996), the contrast and continuity values may be calculated on the point pattern distribution. The values differ from the input values of the spectral slope and coefficient of variation, and the relationship between input and output values is found iteratively. Thereby it is possible to obtain NPPL with comparable characteristics to empirically observed point patterns. The assumed linear relationship between the amplitude and frequencies implies a self similarity over different scales, and therefore the pattern may be defined as a fractal. This means that the same pattern found at smaller scales is also found at larger scales. The analyzed data of both farms and tree distributions show a good fit with the linear assumptions (figure VI-6). Halley et al. (2004) list several processes that may lead to fractal spatial patterns. Two of these are relevant to consider for the analyzed distributions. First, power law dispersal (such as Lévy flights) may cause such patterns. If the considered tree species have such highly leptokurtic dispersal, it could explain the fractal distribution. Secondly, Halley et al. state that observed fractal distributions may just reflect some other underlying fractal distribution. If suitable farmland has a fractal distribution, it may explain a fractal distribution of farms. Likewise, fractal distributions of the considered tree species may be explained by a fractal distribution of suitable habitats.
However, both the observed tree species and farm distributions are the results of several different processes, including geological, ecological (mainly for the tree species) and multiple anthropogenic factors. It may not be expected that these factors together influence the distribution in similar fashion on several scales. As further pointed out by Halley et al., a log-log plot may give a faulty impression of linearity, and one should be careful when drawing conclusions about inherent fractal properties of observed patterns. We do however in Paper VI conclude that the linear assumption produces a good fit. In instances where this is not found, one may use a different function to estimate the relationship between the amplitude and frequency. This does however require additional parameters, and may make the interpretation of these parameters more difficult.

[Figure 7 panels: steps (1) to (4), with higher and lower Continuity filtering followed by low and high Contrast transformation; the four resulting landscapes are labeled Contrast 2.2, Continuity 1.7; Contrast 4.9, Continuity 2.1; Contrast 1.5, Continuity 1.1; Contrast 4.2, Continuity 1.0.]

Figure 7. Illustrating the method used to create landscapes with different patch arrangement. 1. Generating a surface of random values (normally distributed). 2. Filtering to obtain surfaces of different Continuity. This measures the tendency of nearby areas to have similar patch density. 3. Transforming into gamma distributed values to incorporate Contrast and avoid negative values. Contrast is calculated as the coefficient of variation of patch density and is a measurement of difference between dense and sparse areas. 4. Distributing habitats in proportion to the surface generated in step 3.

Ripley (2004) gives two other basic methods for generating non random point pattern landscapes.
Firstly, the cluster process involves initially distributing a given number of clusters at random locations, and subsequently distributing patches around these points. This method may be difficult to fit to observed patterns, in particular when the observed pattern does not show a distribution of clearly distinct clusters, as is the case with the analyzed patch distributions in Paper VI. Secondly, a point pattern may be generated as a Poisson process with varying intensity. Usually this is done to generate landscapes with a trend in a specific direction, and this approach may be well suited for situations where such a gradient is an important part of the study. One limitation of the analysis method presented in Paper VI is that it does not provide any information about the underlying process causing the observed pattern. This is however also a benefit. As the method for generating NPPL relies on few assumptions (except for the assumption of linearity in the frequency domain, which may be relaxed if required), the method is well suited for testing the effect of observed landscape characteristics by replicating observed distributions. While some iteration is required to get the relationship between the input parameters and the analyzed parameters of the generated NPPL, there is also a clear relation between these parameters. Further, continuity and contrast may be understood as empirically relevant features of a landscape, both relating to habitat fragmentation. This is a major issue in conservation biology, and usually refers to the process of habitat loss resulting in remaining habitats that show a fragmented pattern (Fahrig 1997). By looking at the generated NPPL in figure 7 it can easily be seen that a single measure of the fragmented state of a point pattern landscape is not sufficient. Both continuity and contrast provide important information about the distribution of habitat patches.
Another benefit of the spectral density approach is that it may be extended to include anisotropy, meaning that spatial relations of patches (or continuous habitats for situations where this is a more relevant description) are not independent of the direction. One might expect to find such a pattern if the observed distribution is caused by some underlying process with different forces working in for example a north-south direction than an east-west direction.

4 Modeling of between-herd contacts

Four of the papers included in this thesis focus on disease transmission between livestock herds via live animal movements. Different pathogens may be transmitted between herds via different routes, such as other farming related contacts, wild animal vectors and wind. Relocation of infected animals is however often considered a major risk factor (Fèvre et al. 2006, Ortiz-Pelaez et al. 2006, Rweyemamu et al. 2008) and a good description of such contacts is essential for modeling of most livestock diseases.

4.1 Data

The data utilized in the studies presented in Papers I – IV are based on reports to SJV. EU legislation states that member states must keep databases on all livestock holdings and register movements of cattle and pigs. Many countries also include more information than the minimum requirements. In Sweden movement data is reported by the farmer. For cattle movements the data is reported by the farmers at both the receiving and sending holdings, while pig movements are only required to be reported by the farmer receiving the animal shipment. Cattle movements are reported at the individual level, using a specific identity number for each animal, whereas pigs are reported at the movement level. When pig holdings are registered in the database, the farmer is required to submit a map showing the location of the holding, as well as the production type and the maximum capacity of fattening pigs and sows, respectively (Anonymous 2010).
No such requirements exist for registering of cattle holdings. The coordinates found in the database are instead approximated from a register of land use (Nöremark et al. 2009a). The entries in the databases are derived from reports by the farmers and some degree of erroneous and missing reports is expected. The data was therefore edited, as described in Nöremark et al. (2009a), and cattle movements with inconsistent reports were removed (for instance when an animal movement reported by the farmer sending the animals could not be matched with a movement reported by the receiving farmer). Such data editing could not be performed on the pig movement data as this is based on single reports. Also, inactive holdings are supposed to be removed from the database, but since this is not always done, holdings that had not reported any movement of animals (including movements to slaughter houses) or births or deaths for a one year period were removed. The production type and maximum capacities are reported for a pig holding when it is first registered, and the information is not necessarily updated if changed. Hence, the information registered for a holding may no longer be accurate. While databases may provide important information, data quality will always be a major issue of concern. This needs to be accounted for when conclusions are drawn from the analysis and used for risk assessment. The database could be used more efficiently in risk assessment if data quality was improved. In particular, updates of data entries would be useful, as well as more detailed instructions to the farmers on how to report information on production types. For instance, Nöremark et al. (2009b) showed that different farmers had somewhat different interpretations of the production types.

4.2 Overview of Papers I – IV

Papers I – IV present the development of the contact model. More details are added throughout this series of papers, and I here give an overview of how the model progresses.
Paper I focuses exclusively on the distance dependent aspect of between-herd animal movements. Previous studies have reported that contacts between holdings occur more frequently between nearby holdings (Boender et al. 2007, Robinson and Christley 2007) and this is expected to influence the disease transmission dynamic (Tildesley and Keeling 2009). In Paper I, two models are used and compared in their ability to describe the probability of movements of cattle and pigs between herds. The first model is based on the spatial kernel described in section 2 and highly leptokurtic distributions are estimated for both species. The second model is a mixture model, where contact probability is modeled as arising from a mixture of the spatial kernel and a MAM component. By comparing the posterior probabilities of the models, it is shown that the mixture model is superior in describing the observed contacts. Also, by simulating contacts with each model and comparing the obtained structure with a set of network measures, it is shown that the models are expected to differ in predictions of disease transmission. Paper II is the only spatially implicit study included in this thesis. Herds are considered as spatially separate, but their explicit locations are not included. The paper focuses on the production type structure of pig holdings (production types of cattle holdings are not included in the database), and the probability of animal movements between holdings is modeled as dependent on this structure. As many holdings have more than one production type, contact probabilities are modeled by the assumption that holdings behave as a mixture of the types. Rather than assuming equal proportions of each type, the dominance of different production types in determining the contacts of a holding is estimated.
Pig farming in Sweden mainly follows a breeding pyramid structure (Anonymous 2009), as illustrated in figure 8, and animals are expected to mainly be moved from holdings at the top of the pyramid and downwards. The pyramid includes four of the seven production types from Paper II. At the top of the pyramid are the Nucleus herds, where breeding is performed. These herds are then expected to send animals to Multiplying herds, which produce gilts. These are mainly expected to be moved to Piglet producers, and the piglets are then sent to Fattening herds.

Figure 8. The breeding pyramid of Swedish pig farming (levels from top to bottom: Nucleus herds, Multiplying herds, Piglet producers, Fattening herds). Animals are expected to mainly be moved downwards in the pyramid.

Farrow-to-finish herds are not included in the pyramid as the whole chain of production is integrated within the herd. Often, however, holdings with this type are concurrently operating as Piglet producers (Anonymous 2009). The Sow pool system is also not included in the pyramid. A Sow pool consists of a center, where sows are covered or inseminated, and a number of satellites. The sow is moved to the satellite for farrowing and subsequently moved back to the center. See Paper II for more details.

In general, the results presented in Paper II are in agreement with the expectations from the structure of the pig farming industry. The predicted contact pattern generally follows the expectations of the breeding pyramid and the Sow pool system, and the analysis also showed that Farrow-to-finish herds are estimated to have few contacts. However, there are unexpectedly many movements predicted between Sow pool centers, and it is concluded that this may be due to erroneous reports in the database. Paper III combines and builds on the analysis of Papers I and II, and the analysis is further expanded to include herd sizes.
The production type structure is modeled as in Paper II, and the probability of contacts depending on between-herd distances and herd sizes is modeled as being dependent on the production types. The MAM component of the distance dependence is however not included, for two reasons. The first is simplicity: given the large number of parameters used in the model presented in Paper III, it is indeed valuable to reduce these. Secondly, describing the distance dependence strictly by the spatial kernel promotes a more transparent interpretation of the results. One of the aims of Paper III is to compare the parameters estimated for different production types. The variance and kurtosis describe the scale and shape, respectively, of the distance dependence of the contact probabilities. Adding a third quantity (the amount of MAM) makes comparison of these features more difficult. The importance of having quantitative measures for the scale and shape of distance-dependent contact probabilities is well illustrated in this paper. These quantities allow for comparison of the spatial component of the disease transmission via the analyzed contacts. Further, the included simulation study demonstrates that the scale (measured as 2D-variance) estimated for different production type combinations is also expected to result in differences in disease transmission at different distances.

The data is inherently weak for spatial kernels describing the distance-related probability of movements between holdings of production types that are not commonly in contact. To improve the estimates of these contacts, hierarchical priors are implemented. This involves a prior with a set of hyperparameters that are estimated from the parameters, rather than specified beforehand. The parameters are concurrently influenced by the hierarchical prior, and thereby estimates of parameters are indirectly influenced by each other, a concept known as borrowing strength (Gelman et al.
2004). Figure 9 illustrates this concept. This approach is particularly appropriate when one expects that the parameters are not entirely independent, which makes sense in the case of the spatial kernels used for modeling of between-herd contacts. Parameters based on little (or even no) data will be highly influenced by the hierarchical prior, while parameters estimated from strong data are only moderately influenced by it. Without the implementation of hierarchical priors, rare contacts can be estimated to have a resemblance to MAM, and therefore an overestimated influence on long distance disease transmission.

Figure 9. Schematic description of hierarchical Bayesian modeling. Posterior distributions of the parameters θ1, θ2, …, θn are estimated from the data and the hierarchical prior ψ. The latter is estimated from the parameters θ1, θ2, …, θn (and a hyperprior, not shown in the picture). Thereby, the estimation of any parameter θk is influenced by the estimates of other parameters. This concept is referred to as borrowing strength.

In Paper IV, the importance of the parameters included in the model presented in Paper III is investigated and related to general theory about disease transmission dynamics. Simulations are performed where the influence of different parameters is removed. It is found that the production type structure is the most important factor, and this reduces the mean epidemic size compared to a model where disease transmission is simulated by MAM assumptions. Further, the production type structure is shown to induce a largely stochastic behavior of an outbreak. While the number of early infections is estimated at lower numbers in the full model compared to MAM assumptions, it should be stressed that this is under the assumption that the index herd is of arbitrary production type. If the index herd is of a type with many contacts, for instance high up in the breeding pyramid, there will be a rapid increase in the number of infected herds.
This is illustrated by figure 10.

Figure 10. Examples of individual simulations from Paper IV with the Full model (black line) and the MAM model (grey line), showing the number of infected herds plotted against time. In this example, the index herd had many contacts, and there is a rapid increase in the number of infected herds compared to the MAM model.

4.3 Use of the model in risk assessment

The long-term aim of the model developed in Papers I – III is to develop a method for simulation of disease transmission. Therefore, the general contact pattern is estimated, based on how herds with different characteristics and locations are observed to be in contact via animal movements. Papers III and IV demonstrate how this model can be used to make general predictions of disease transmission via animal movements. It is shown that differences depending on the index herd are expected, both in the number of contacts and in the distance at which these contacts occur.

Another possible implementation of the model is to provide more detailed information for risk assessment of individual herds. Suppose an infectious disease is discovered at a herd, hereafter referred to as the index herd. In order to control the outbreak, it is of great importance to map out the contacts of the index herd. The database kept by SJV can be used to trace any reported movements from a herd, but delayed reports are common (Nöremark et al. 2009a). The model presented in Paper III is parameterized such that it describes the general contact pattern of a herd, given its location and characteristics. By simulating animal movements from the index herd, it is possible to obtain a picture of which herds are most likely to have had contacts with the index herd, including secondary and higher degree infections. Figure 11 shows the result of such simulations. Note the different patterns arising depending on the production type of the index herd.
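The idea of estimating infection probabilities by repeated simulation from an index herd can be sketched as follows. This is a minimal illustration only, not the model of Paper III or IV: the matrix `contact_prob`, the four herds, the daily time step and the assumption that every movement from an infected herd transmits the disease are all hypothetical choices made for the example.

```python
import random


def simulate_outbreak(contact_prob, index_herd, n_days, rng):
    """Run one stochastic outbreak and return the set of infected herds."""
    infected = {index_herd}
    for _ in range(n_days):
        newly_infected = set()
        for source in infected:
            for target, p in enumerate(contact_prob[source]):
                # Assumption: a movement from an infected herd always transmits.
                if target not in infected and rng.random() < p:
                    newly_infected.add(target)
        infected |= newly_infected
    return infected


def infection_probabilities(contact_prob, index_herd, n_days, n_sims, seed=0):
    """Estimate, for each herd, the proportion of simulations in which it is infected."""
    rng = random.Random(seed)
    counts = [0] * len(contact_prob)
    for _ in range(n_sims):
        for herd in simulate_outbreak(contact_prob, index_herd, n_days, rng):
            counts[herd] += 1
    return [c / n_sims for c in counts]


# Hypothetical daily movement probabilities from herd i (row) to herd j (column),
# loosely mimicking a downward flow in a breeding pyramid.
contact_prob = [
    [0.0, 0.05, 0.01, 0.0],
    [0.0, 0.0, 0.03, 0.01],
    [0.0, 0.0, 0.0, 0.02],
    [0.0, 0.0, 0.0, 0.0],
]

risk = infection_probabilities(contact_prob, index_herd=0, n_days=28, n_sims=2000)
risk_from_bottom = infection_probabilities(contact_prob, index_herd=3, n_days=28, n_sims=200)
```

In this toy setting, an index herd at the top of the hypothetical pyramid puts the herds below it at risk, whereas an index herd at the bottom (herd 3, which sends no animals) infects no other herd.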
It should be stressed that such simulation is not intended to be a substitute for classic disease tracing. It may however be used as a complement, for example as prior information on the risk of herds becoming infected via animal movement. It should also be pointed out that further developments of the model may improve the ability to predict contacts between holdings.

Figure 11. Results of simulation of disease transmission with the model, parameterization and simulation method presented in Paper IV. Each dot shows the location of a herd and colors indicate the probability of infection. Each map shows the result of 9000 simulations for a time period of 28 days, starting at the encircled, black herd, which is reported with production type (from left to right) Nucleus herd, Multiplying herd and Fattening herd, respectively. Arrows point out the locations of the three herds with the highest probability of becoming infected and annotate the estimated probability.

5 Bayesian analysis and Markov Chain Monte Carlo

Papers I, II and III focus largely on parameter estimation. Bayesian inference is utilized and implemented with Markov Chain Monte Carlo (MCMC). MCMC is a flexible analytic tool that allows for probabilistic parameter estimation of complex models in the presence of parameter uncertainty. A Markov Chain is a discrete random process where the current state of the chain depends only on the previous state. Monte Carlo is a term used for algorithms that repeatedly draw random numbers from a probability distribution. It may be used for numerical integration based on random numbers, and this integration is required in order to perform Bayesian inference. The idea of the MCMC is to simulate (correlated) draws from the posterior distribution of the parameters.
The chain is initialized with initial parameter values (in some part of parameter space where the posterior density is larger than 0) and updated repeatedly within the Markov Chain, with the parameter values of the next state depending only on the current parameter values. For multi-parameter models it is generally convenient to update one parameter at a time within each step of the chain, yet the mixing may be improved by joint updates if there are strong correlations between parameters. Updates are typically performed by one of the following two methods. If the distribution of some parameter $\theta_k$ conditional on the other parameters, $\theta_{-k}$, is of a standard distributional form, call it $f(\theta_k \mid \theta_{-k})$, then Gibbs sampling may be implemented. This means that the parameter is updated by a random draw from the distribution $f(\theta_k \mid \theta_{-k})$. If the conditional distribution of $\theta_k$ is not of a known form, updates are commonly performed with Metropolis-Hastings steps. This involves proposing a new value $\dot{\theta}_k$ given the current value $\theta_k$ from some distribution, $q(\dot{\theta}_k \mid \theta_k)$, and subsequently accepting it with probability

$$\min\left(1, \frac{\pi(\dot{\theta}_k \mid \ldots)\, q(\theta_k \mid \dot{\theta}_k)}{\pi(\theta_k \mid \ldots)\, q(\dot{\theta}_k \mid \theta_k)}\right), \qquad (14)$$

and otherwise rejecting it, where $\pi(\theta_k \mid \ldots)$ is the posterior probability of $\theta_k$. Commonly, random walk proposals are utilized, and $q(\dot{\theta}_k \mid \theta_k) = q(\theta_k \mid \dot{\theta}_k)$ in the case of a symmetrical random walk. If the proposed value is accepted, then $\theta_k^{(t+1)} = \dot{\theta}_k$. If $\dot{\theta}_k$ however is rejected, then $\theta_k^{(t+1)} = \theta_k^{(t)}$.

As the MCMC is typically initialized with some arbitrary value, the initial updates, known as the burn-in, need to be removed. Only after converging does the MCMC properly sample from the posterior density. While there are analytic tools to aid the assessment of convergence (e.g. Venna et al. 2003, Gamerman and Lopes 2006), visual inspection is commonly used. It is however important to try different initial parameter values to ensure that the MCMC converges to the same region of high posterior probability, independent of the initial values.

Figure 12.
The first 5000 updates of the MCMC estimation of the marginal posterior densities of ̇ in Paper III. SPC – Sow Pool Centers, SPS – Sow Pool Satellites, FH – Fattening herds, FF – Farrow to finish, NH – Nucleus herds, MH – Multiplying herds, PP – Piglet producers, MI – Missing information.

Figure 12 shows the initial updates of ̇2 in one Markov Chain used in Paper III. In this example, the simulation was initialized by setting all parameters ̇2, = 0. The chain appears to have stabilized after about 300 steps. To avoid any confusion, it needs to be pointed out that a much longer chain than shown in figure 12 is used in Paper III. For the full analysis we simulated 7 parallel chains, each with 500000 iterations, and discarded the first 100000 updates as burn-in. Other parameters showed poorer mixing and required a longer burn-in.

In both Gibbs sampling and Metropolis-Hastings updates, the current parameters are given from the previous step, making the random draws from the posterior distribution inherently autocorrelated. Hence, it is important to simulate a chain long enough to ensure that it explores the whole posterior density of the parameters. High autocorrelation means that there is poor mixing and that a longer chain has to be simulated. Implementation of MCMC techniques may therefore require substantial effort to improve the mixing. While symmetric random walk proposals, as used in the original algorithm (Metropolis et al. 1953), are often sufficient, mixing may be improved by using a fixed proposal distribution, an approach known as the independence MCMC sampler. The proposal distribution should ideally be similar to the posterior distribution of the parameters, but it needs to span the whole posterior distribution (or the chain will not sample the whole posterior). Naturally, this may be difficult to know beforehand.
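The random walk Metropolis-Hastings update and the removal of the burn-in can be sketched for a single parameter as follows. This is a minimal illustration under stated assumptions, not the sampler used in Papers I – III: the target `log_posterior` is a hypothetical standard normal density, and the step size, chain length and burn-in are arbitrary choices. Because the proposal is symmetric, the proposal densities cancel in the acceptance ratio.

```python
import math
import random


def log_posterior(theta):
    """Hypothetical unnormalized log posterior (standard normal), for illustration."""
    return -0.5 * theta * theta


def random_walk_metropolis(log_post, theta0, n_iter, step_sd, rng):
    """Symmetric random walk Metropolis sampler for one parameter."""
    theta = theta0
    current_lp = log_post(theta)
    chain = [theta]
    n_accepted = 0
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step_sd)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); q cancels by symmetry.
        log_ratio = proposal_lp - current_lp
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            theta, current_lp = proposal, proposal_lp
            n_accepted += 1
        # On rejection the chain stays at the current value.
        chain.append(theta)
    return chain, n_accepted / n_iter


rng = random.Random(1)
# Deliberately poor starting value, far from the region of high posterior density.
chain, acceptance_rate = random_walk_metropolis(
    log_posterior, theta0=10.0, n_iter=20000, step_sd=1.0, rng=rng
)

burn_in = 2000
samples = chain[burn_in:]
posterior_mean = sum(samples) / len(samples)
posterior_var = sum((s - posterior_mean) ** 2 for s in samples) / len(samples)
```

Running the sampler from several different values of `theta0` and checking that the retained samples agree is a simple version of the convergence check described above.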
In Paper I, a proposal distribution for the independence sampler is created by simulating an initial Markov Chain with random walk Metropolis-Hastings proposals and subsequently fitting a multivariate Gaussian distribution to the sampled parameter values (see Paper I for details on the parameters). To ensure that the distribution is wide enough, the scale is doubled.

5.1 A small bibliographic study

This section provides a small bibliographic study in order to show the trend of Bayesian methodology and MCMC techniques in the research fields that are the focus of this thesis. While the estimations in Papers I – III concern description of animal movements in the context of disease spread modeling, this thesis also contains ecological modeling with partly similar features. An analysis for this field is therefore also included.

The ISI Web of Knowledge (http://apps.isiknowledge.com) allows for tracking changes in research topics over time by analyzing relevant publications. The aim in this section is to show the trend of MCMC and Bayesian implementation in the fields of ecology and disease spread research. For ecology, publications published between 1975 and 2009 were selected by searching for Topic=(bayes*) and Topic=(MCMC) OR Topic=(Markov Chain Monte Carlo), respectively. A Topic search covers Title, Abstract and Keywords. To obtain ecological publications only, the results were filtered by Subject Areas=(ECOLOGY) and restricted to selected relevant journals. The following journals were selected: TRENDS IN ECOLOGY AND EVOLUTION, ECOLOGICAL MONOGRAPHS, ECOLOGY LETTERS, ECOLOGY, AMERICAN NATURALIST, ECOLOGICAL APPLICATIONS, JOURNAL OF ECOLOGY, JOURNAL OF APPLIED ECOLOGY, OECOLOGIA, OIKOS, ECOGRAPHY, JOURNAL OF ANIMAL ECOLOGY, ECOLOGICAL MODELLING, THEORETICAL POPULATION BIOLOGY, and PROCEEDINGS OF THE ROYAL SOCIETY OF LONDON SERIES B. For disease spread studies it is less apparent which journals to select, as relevant studies may be found in journals focusing on quite different topics.
The publications were therefore selected based only on topic search as Topic=(bayes* disease spread) OR Topic=(bayes* infect* disease) OR Topic=(bayes* epidemi*) and Topic=(MCMC disease spread) OR Topic=(MCMC infect* disease) OR Topic=(MCMC epidemi*), respectively. While using different search criteria may change the specific results, it is unlikely to change the general trend (which is the focus of this bibliographical study).

As shown in figure 13, few studies using Bayesian or MCMC methods were published in either the ecological or the disease spread literature prior to the mid-1990s. Much of this may be explained by the development of high performance computers. Access to fast computers has certainly made the use of Bayesian inference, and in particular MCMC methods, possible, leading some authors (Spiegelhalter 2004) to describe the recent trend in epidemiology as an MCMC revolution. It is theoretically possible to estimate the parameters of any mathematical model as long as a likelihood function and prior can be formulated, yet computation time and convergence may be a problem (O’Neill and Roberts 1999).

While Bayesian inference is used in only a minority of the published papers within both studies of disease spread and ecology, the trend shows a clear increase. Since Bayesian statistics has some useful properties (see below), it is likely that this trend will continue. Implementation with MCMC relies on computational speed, and with faster computers, a continued increase in the use of MCMC should be expected.

Figure 13. Publications per year with (top panel) MCMC or Markov Chain Monte Carlo and (bottom panel) Bayes* included in title, abstract or keywords. Publications were selected through ISI Web of Knowledge.

5.2 Bayesian and frequentistic statistics

It is not within the scope of this thesis to penetrate the deep philosophical differences between Bayesian and frequentistic methods of inference.
A profound discussion of these two frameworks and their relation to the concepts of probability as well as theory and models may be found in for instance Dawid (2004). Computational advances are however not the only reason why researchers are increasingly turning to Bayesian methods. A growing skepticism against the use and relevance of, for example, p-values has led to a still ongoing debate within both ecology and epidemiology over which method is preferable (Dixon and Ellison 1996, Dennis 1996, Burton et al. 1998, Dunson 2001, Clark 2005, Cripps 2008). In this debate, Bayesian inference has received much criticism regarding the subjectiveness of the prior. It is indeed true that a prior needs to be included in order to obtain a posterior distribution (of parameters or models) as given by Bayes' theorem. However, as pointed out by Dixon and Ellison (1996), not all Bayesians use subjective probabilities. Rather, by using uninformative priors, the researcher may let the data inform the belief about the parameter(s) or model(s). While the prior may not be uniform between −∞ and ∞, it may be uniform on a limited parameter space, or as stated in Paper III, the prior is “proportional to one on the support of the parameters”.

One clear advantage of the Bayesian methods, and indeed the main reason why this methodology is used in Papers I – III, is of a pragmatic nature. It is a flexible framework that allows for parameterization of more complex mathematical models than is possible with classic frequentistic methods (Dunson 2001, Clark 2005). I am not aware of any frequentistic method that would allow for parameterization of, for example, the model presented in Paper III. Moreover, Bayesian statistics may be favorable also for simpler models as it, if required, allows for extension of the model to include further complexity. Another pragmatic issue is however the possibility to communicate results to other researchers.
While Lee (2004) argued that the interpretation of Bayesian statistics is more intuitive, most scientists are more familiar with frequentistic concepts such as confidence intervals. Clark (2005) pointed out that their Bayesian counterparts, the credibility intervals, are most often very similar given vague priors and similar assumptions, and that for simple analyses little insight might be gained by switching to Bayesian inference. In addition, while Bayesian methods allow for fitting of more complex models, implementing these methods may require extensive computational work (Clark 2005, Lawson and Zhou 2005). It is therefore, I believe, justified to use a frequentistic method if it answers the question at hand.

Dennis (1996, p.1101) argued that “Bayesian and frequentistic statistics cannot logically coexist”. I would however argue that such a radical statement misses the question of why and how a study is performed. The choice of epistemological framework should be based on what type of data is available, and what the purpose of the study is. If predictions and testing of different scenarios are the central issue, as is often the purpose of applied studies of both epidemiology and ecology, then I believe that Bayesian inference is the method of choice. This is agreed upon by for instance Berger (2003), who concluded that for epidemiological studies, frequentistic statistics may be preferred in controlled experiments, while Bayesian methods are superior for creating public policies. Also, Ellison (2004) argued that the Bayesian iterative process of gaining more information through several experiments, and including the results of previous studies as prior information, has proven successful for adaptive management in ecology.

A final issue supporting the use of Bayesian statistics for the models in Papers I – III is of an epistemological kind.
Frequentistic statistics is based on the probability of observing an outcome given the model (and model parameters) and if the outcome is too improbable, the model is falsified and discarded. Bayesian statistics is instead concerned with the probability of the model (and model parameters) given the observed outcome. Models may be compared by for instance the ratio of the posterior probabilities of the models, as is done in Paper I. The frequentistic approach may be well suited if the purpose is to find the mechanism of the underlying process causing the observed outcome. However, for such complex processes as the system of animal trading in the livestock sector, it is not realistic to expect to capture something even close to a “true” model. Where and when an animal shipment is sent from one holding to another depends on a wide range of factors. These include social and economic aspects, legal issues (such as animal welfare legislations), population growth, logistics and indeed highly unpredictable human behavior. Hence, we are mainly confined to phenomenological modeling and should aspire to capture the important features of the observed pattern. Parameter uncertainty is expected in such modeling and this uncertainty would be disregarded if parameters are estimated with a maximum likelihood approach. If the full posterior is estimated, the uncertainty can be incorporated in simulations or when inference is made directly from the parameters.

6 Conclusions and recommendations for further research

This thesis has contributed to both ecological and epidemiological studies by improving theoretical understanding of spatially explicit spread of organisms. Further, models and analytical tools were developed that can be used specifically as a basis for preventative disease spread models in the livestock sector. The following sections outline specific conclusions and recommendations for further research.
6.1 The spatial kernel

The spatial spread of organisms is often described by a spatial kernel. By efficiently separating the shape and the scale of the kernel, it is possible to compare the influence of these features. The scale, in this thesis measured as 2D-variance or expected net squared displacement, is always an important factor when modeling the spread of organisms. The shape, measured as the kernel kurtosis, can also be interpreted as a measure of heterogeneity in the spatial process, which in ecological studies may be caused by individual differences. The shape may be important, but this depends on what process is investigated, as well as on the spatial context.

Using quantitative measures for characterization of the spatial kernel by its shape and scale also allows for comparison of different processes. These characteristics provide insight into the spatial aspect of probabilities of between-herd contacts via animal movements. Herds of production types with kernels generally estimated to have a large variance are expected to have many long distance contacts and thereby the possibility to transmit diseases to distant areas. Herds of production types with estimated kernels of generally high kurtosis are concluded to be heterogeneous in their contact structure. The spatial kernels estimated for the animal movements are generally leptokurtic, and in Paper III it is demonstrated that the kernel kurtosis differs between production types.

For both ecological and epidemiological studies it is often hard to know the shape of the kernel beforehand. The kurtosis may not always be directly important, but erroneous assumptions about kurtosis may also influence estimation of the kernel variance. Therefore, one should be careful about making such assumptions, and rather aim for estimation of the joint probability of scale and shape. Bayesian inference is a promising approach for this.
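How scale and shape can be separated may be illustrated with a small numerical sketch. It assumes, purely for illustration, a two-dimensional exponential power kernel proportional to exp(−(d/a)^b), where d is distance and a and b are hypothetical scale and shape parameters, and it takes the 2D-variance as E[d²] and the kurtosis as E[d⁴]/E[d²]². The kernel family and the exact definitions used in the thesis are given in section 2 and may differ from these assumptions. Under them, the radial moments are E[dᵏ] = aᵏ Γ((k+2)/b) / Γ(2/b).

```python
import math


def kernel_variance_2d(a, b):
    """E[d^2] for the assumed 2D kernel proportional to exp(-(d/a)**b)."""
    return a ** 2 * math.gamma(4.0 / b) / math.gamma(2.0 / b)


def kernel_kurtosis_2d(b):
    """E[d^4] / E[d^2]^2 for the assumed kernel; depends on the shape b only."""
    return math.gamma(6.0 / b) * math.gamma(2.0 / b) / math.gamma(4.0 / b) ** 2


def scale_for_variance(variance, b):
    """Scale parameter a that yields the requested 2D variance at shape b."""
    return math.sqrt(variance * math.gamma(2.0 / b) / math.gamma(4.0 / b))


# Three kernels with the same scale (2D variance of 1) but different shapes.
for b in (2.0, 1.0, 0.5):
    a = scale_for_variance(1.0, b)
    print(f"b={b}: a={a:.4f}, variance={kernel_variance_2d(a, b):.4f}, "
          f"kurtosis={kernel_kurtosis_2d(b):.4f}")
```

With these definitions the Gaussian case (b = 2) has kurtosis 2 and the negative exponential case (b = 1) has kurtosis 10/3; smaller values of b give more leptokurtic kernels while the variance is held fixed by adjusting a.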
Hawkes (2009) argues that the unquestioned use of the negative exponential distribution should be avoided, and that scientists should instead use distributions based on mechanistic reasons. While I do agree with the first part, I believe that mechanistic deductions of distance dependence may be too complex to obtain, in both ecological and epidemiological studies. Instead we may often be confined to phenomenological estimations. In order to implement this, one requires flexible distributions. Further, it is important to compare the predictions of the estimated distribution with observed data, and thereby make sure that the predictions are sufficient for both short and long distances.

The resampling method presented in Paper VII can also be used to estimate dispersal parameters of moving animals. It does however not account for individual differences. Therefore, it should not be used when this is expected to be an important aspect in the modeling scenario. If so, a further development of the method using a hierarchical modeling approach is recommended, as introduced by Morales et al. (2004), where individual differences can be incorporated.

6.2 Neutral point pattern landscapes

The method presented in Paper VI is a promising approach for modeling point pattern landscapes. The landscapes are described by two parameters: the contrast, which describes how different dense and sparse areas are, and the continuity, which describes whether areas of similar density are located near each other or scattered in the landscape. These measures can also be estimated from observed data, and thereby modeling results can be related to real situations.

Further development of the NPPL methodology should focus on introducing classes of the patches. In an ecological setting these may represent habitat qualities or age classes (which is relevant for modeling of organisms associated with trees).
In epidemiological modeling of transmission of livestock disease, such classes may represent production types or herd sizes.

6.3 Disease spread modeling

The model developed can be used to aid risk assessment as a complement to classical methods. Contact tracing based on the database of reported movements may be difficult in the initial stages of an outbreak due to delayed reports. Modeling contact scenarios can therefore aid the early phase of disease control.

The model can also be used to provide insight into the dynamics of disease transmission via the analyzed contacts, as well as to sort out which factors are most important to consider in modeling the system. Thereby, the importance of reliable data can be pinpointed. It is concluded that for modeling of disease spread via pig movements, the production type structure is the most important factor to consider. Improving the quality of these data can improve the possibility to use the database in risk assessment.

6.3.1 Further development of the contact model

To develop a model, it may sometimes be necessary to “squint” a bit and focus on the big picture. It is however important to retrospectively open the eyes and see what details may have been missed. Animal movements between herds are influenced by numerous factors, and the model presented should not be expected to have captured all relevant features.

The main excluded aspect, perhaps, is the recurrence of contacts between the same herds. In particular, this is expected to be an important aspect of the contacts between the actors of the Sow pool system. If recurrent contacts are included in the model, this is expected to reduce the number of infections, and this effect may be particularly important at later stages of an outbreak. It may also provide better estimates when the model is used for risk assessment. If recurrent movements of animals between holdings are common, prior reported movements may be used to improve the estimation of contacts of individual holdings.
It should be expected that recurrent contacts are more common between holdings of some production types than others (such as Sow pool satellites and centers), and therefore this should be included as a parameter for each production type combination of the sending and receiving holding.

While cattle are only considered in Paper I, cattle farming also involves different production types. Beef and dairy herds are expected to have different contact structures, and it should be expected that dairy herds send male calves to beef producers for fattening. The report forms of herd registrations in Sweden do not differentiate between dairy and beef herds, but information on dairy herds may be obtained from the dairy farmers' organization “Svensk mjölk”. From the results of Nöremark et al. (2009a), it should be expected that seasonal patterns are important to include in modeling of contacts between cattle herds.

6.3.2 Application to specific diseases

The long-term aim of the developed model is to use it in modeling of scenarios of specific diseases. For that purpose, disease specific features will have to be incorporated. These include incubation time and recovery rates, and it may also be necessary to consider the intra-herd structure. Holdings of different production types and sizes are further expected to differ in efforts to increase biosecurity (Nöremark 2009b, Anonymous 2009, Nöremark et al. submitted), which is an important factor to include in modeling scenarios.

Further, other contacts are expected to be important pathways for between-herd transmission for most diseases. Farming related contacts (for example milk trucks, shared equipment and veterinarians) and other visitors can contribute to the spread of disease. The modeling framework developed here can also be applied to such contacts. Obtaining reliable data for these contacts will however be a major challenge, and parameters will probably have to be based largely on expert opinions.
This may also be combined with observed data, and Bayesian modeling with expert opinions as informative priors is a suitable approach.

Some diseases may also be transmitted via wildlife, and in these instances the disease dynamic may not be understood (nor modeled) by the discourses of one single discipline. For example, bluetongue disease, a viral disease infecting ruminants (mainly sheep), is transmitted via biting midges belonging to the genus Culicoides (Erasmus 1975). The spread of bluetongue disease needs to be described as both a local process, resulting from the movement pattern of the Culicoides, and a large-scale process with long distance transmission (Gerbier et al. 2008). For example, long distance animal movements may quickly carry the disease further than is expected from the flight of the Culicoides. A flexible model that can be fitted to both ecological and human processes can be very useful for modeling of such diseases, and I argue that the general approach with a spatial kernel is well suited for this. This shows that it is not only possible to use similar models for ecological and epidemiological studies, but also advantageous.

Acknowledgements

First of all I would like to thank my supervisor, Uno Wennergren. I have in particular appreciated three things about working with Uno. First, he has always been very positive and encouraging. Secondly, he has allowed me to be very free in my decisions and allowed me to take any path I have found suitable (both geographically and scientifically). Thirdly, he has always shown a great understanding of the importance of a balance between the work and non-work related dimensions of life. Also, (as a fourth “runner-up”) Uno has never refrained from investing in gadgets, which has made work both easier and more motivating.

I want to thank Scott Sisson for letting me visit the School of Mathematics and Statistics at UNSW in Sydney (twice!).
Special thanks for the patience he has shown in teaching a biologist about Bayesian inference and Markov Chain Monte Carlo methods. I am also grateful for Scott's insightful and detailed comments on my work, as well as for his ability to always find time for this, even with tight deadlines.

I am also grateful to my colleagues at the Theoretical Biology group, in particular Lars Westerberg for all stimulating conversations (both scientific and not). I also want to thank Lars for his valuable comments on my thesis. I further wish to recognize my fellow PhD students Frida Lögdberg, Sara Gudmundsson and Nina Håkansson for creating a positive working atmosphere. A special thank you to Nina, whose mathematical and computing skills have improved the quality of much of the research in this thesis.

Further thanks to all the people at the Biology department. I have enjoyed being around people with deep knowledge in all the different fields of biology.

I further want to thank my colleagues at Skövde University: Annie Jonsson (thank you for all the comments on my thesis) and Jenny Lennartsson.

Thank you Maria Nöremark and Susanna Stenberg Lewerin at the National Veterinary Institute (SVA) for all the fruitful discussions over the years. I have learnt much.

Also, thanks to all the nice people in the Physics building. I have had many good laughs at lunch time.

I am grateful to the Swedish Emergency Management Agency (KBM) for funding and the Swedish Board of Agriculture for supplying the data of animal movements to my cooperative partners at SVA. Also, I want to thank the Administration Board of Östergötland for supplying the tree distribution data used in Paper VI.

I want to thank my partner Dana Cordell for the support during my PhD, for cheering me up when frustration has struck, and also for raising the standard of what it means to be a PhD student (“why?” can be a very important question).
Also, a big thank you for helping me make the cover of the thesis.

Thank you to Mum and Dad for always letting me make my own choices, and for all the support you have given me in seeing them through. And thank you for the best sister I have ever had, who always has time to listen.

Having an outlet in playing music has been very valuable during my time as a PhD student, and therefore a special thanks to Kristian Rödin and Björn Lorenzoni of the band(s) Mudfly/Bubbla and Niclas Peterson of Kokong.

Finally, thank you to all my friends (no one mentioned, no one forgotten) who have made life more enjoyable over the past years.

References

Achter, J.D. and Webb, C.T. 2006. Mixed dispersal strategies and response to disturbance. Evolutionary Ecology Research. 8, 1377–1392.
Anonymous. 2010. Anmälan – djurhållning. SJV JSB3:1
Anonymous. 2009. Hur mycket får PRRS-bekämpning kosta? – en veterinärmedicinsk och samhällsekonomisk analys. Jordbruksverket, Rapport 2009:4.
Berger, Z.D. 2003. Bayesian and frequentist models: Legitimate choices for different purposes. AEP. 13, 559–596.
Bjørnstad, O.N., Finkenstädt, B.F. and Grenfell, B.T. 2002. Dynamics of measles epidemics: estimating scaling of transmission rates using a time series SIR model. Ecological Monographs. 72, 169–184.
Boender, G.J., Meester, R., Gies, E. and De Jong, M.C.M. 2007. The local threshold for geographical spread of infectious diseases between farms. Preventive Veterinary Medicine. 82, 90–101.
Boender, G.J., Nodelijk, G., Hagenaars, T.J., Elbers, A.R.W. and de Jong, M.C.M. 2008. Local spread of classical swine fever upon virus introduction into The Netherlands: Mapping of areas at high risk. BMC Veterinary Research. 4:9.
Burton, P.R., Gurrin, L.C. and Campbell, M.J. 1998. Clinical significance not statistical significance: a simple Bayesian alternative to p values. Journal of Epidemiology and Community Health. 52, 318–323.
Cain, M.L. 1990. Models of Clonal Growth in Solidago Altissima. Journal of Ecology. 78, 27–46.
Cain, M.L. 1991. When Do Treatment Differences in Movement Behaviors Produce Observable Differences in Long-Term Displacements? Ecology. 72, 2137–2142.
Clark, J.S., Macklin, E. and Wood, L. 1998. Stages and Spatial Scales of Recruitment Limitation in Southern Appalachian Forests. Ecological Monographs. 68, 213–235.
Clark, J.S. 1998. Why trees migrate so fast: confronting theory with dispersal biology and the paleorecord. American Naturalist. 152, 204–224.
Clark, J.S., Silman, M., Kern, R., Macklin, E. and HilleRisLambers, E. 1999. Seed dispersal near and far: generalized patterns across temperate and tropical forests. Ecology. 80, 1475–1494.
Clark, J.S., Lewis, M. and Horvath, L. 2001. Invasion by Extremes: Population Spread with Variation in Dispersal and Reproduction. American Naturalist. 157, 537–554.
Clark, J.S. 2005. Ideas and perspectives: why environmental scientists are becoming Bayesians. Ecology Letters. 8, 2–14.
Cohen, J.E., Newman, C.M., Cohen, A.E., Petchey, O.L. and Gonzalez, A. 1999. Spectral mimicry: A method of synthesizing matching time series with different Fourier spectra. Circuits, Systems, and Signal Processing. 18, 431–442.
Coombs, M.F. and Rodríguez, M.A. 2007. A field test of simple dispersal models as predictors of movement in a cohort of lake-dwelling brook charr. Journal of Animal Ecology. 76, 45–57.
Crauwels, A.P.P., Nielen, M., Elbers, A.R.W., Stegeman, J.A. and Tielen, M.J.M. 2003. Neighbourhood infections of classical swine fever during the 1997-1998 epidemic in the Netherlands. Preventive Veterinary Medicine. 61, 263–277.
Cripps, P.J. 2008. Statistical and epidemiological methodology for sheep research: The needs, the problems, the solutions. Small Ruminant Research. 76, 26–30.
Dawid, A.P. 2004. Probability, causality and the empirical world: a Bayes-de Finetti-Popper-Borel synthesis. Statistical Science. 19, 44–57.
Débarre, F., Lenormand, T. and Gandon, S. 2009. Evolutionary epidemiology of drug-resistance in space. PLoS Computational Biology. 5, 1–8.
Dennis, B. 1996. Discussion: should ecologists become Bayesians? Ecological Applications. 6, 1095–1103.
Dixon, P. and Ellison, A.M. 1996. Introduction: ecological applications of Bayesian inference. Ecological Applications. 6, 1034–1035.
Dobzhansky, T. and Wright, S. 1943. Genetics of natural populations. X. Dispersion rates in Drosophila pseudoobscura. Genetics. 28, 304–340.
Dohoo, I.R., Morris, R.S., Martin, S.W., Perry, B.D., Bernardo, T., Erb, H., Thrusfield, M., Smith, R. and Welte, V.R. 1994. Epidemiology. Nature. 368, 284.
Dubé, C., Ribble, C., Kelton, D. and McNab, B. 2009. A review of network analysis terminology and its application to foot-and-mouth disease modelling and policy development. Transboundary and Emerging Diseases. 56, 73–85.
Dunning, J.B., Stewart, D.J., Danielson, B.J., Noon, B.R., Root, T.L., Lamberson, R.H. and Stevens, E.E. 1995. Explicit population models: Current forms and future uses. Ecological Applications. 5, 3–11.
Dunson, D.B. 2001. Commentary: practical advantages of Bayesian analysis of epidemiologic data. American Journal of Epidemiology. 153, 1222–1226.
Ehrlich, P.R. and Roughgarden, J. 1987. The science of ecology. Macmillan publishing company. New York, USA.
Ellison, A.M. 2004. Bayesian inference in ecology. Ecology Letters. 7, 509–520.
Erasmus, B.J. 1975. Bluetongue in sheep and goats. Australian Veterinary Journal. 51, 165–170.
Ersbøll, A.K. and Nielsen, L.R. 2008. The range of influence between cattle herds is of importance for the local spread of Salmonella Dublin in Denmark. Preventive Veterinary Medicine. 84, 277–290.
Fahrig, L. 1997. Relative effects of habitat loss and fragmentation on population extinction. The Journal of Wildlife Management. 61, 603–610.
Ferguson, N.M., Donnelly, C.A. and Anderson, R.M. 2001. The foot-and-mouth epidemic in Great Britain: pattern of spread and impact of interventions. Science. 292, 1155–1160.
Fèvre, E.M., Bronsvoort, B.M.de C., Hamilton, K.A. and Cleaveland, S. 2006. Animal movements and the spread of infectious diseases. Trends in Microbiology. 14, 125–131.
Fischer, J.R. 1980. The Economic Effects of Cattle Disease in Britain and Its Containment, 1850–1900. Agricultural History. 54, 278–294.
Fraser, D.F., Gilliam, J.F., Daley, M.J., Le, A.N. and Skalski, G.T. 2001. Explaining leptokurtic movement distributions: Intrapopulation variation in boldness and exploration. The American Naturalist. 158, 124–135.
Gamerman, D. and Lopes, H.F. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, second edition. CRC Press, Chapman and Hall. Boca Raton.
Garner, M.G., Hess, G.D. and Yang, X. 2006. An integrated modelling approach to assess the risk of wind-borne spread of foot-and-mouth disease virus from infected premises. Environmental Modeling and Assessment. 11, 195–207.
Gelman, A., Carlin, J.B., Stern, H.S. and Rubin, D.B. 2004. Bayesian Data Analysis, second edition. Chapman and Hall, CRC.
Gerbier, G., Baldet, T., Tran, A., Hendrickx, G., Guis, H., Mintiens, K., Elbers, A.R.W. and Staubach, C. 2008. Modelling local dispersal of bluetongue virus serotype 8 using random walk. Preventive Veterinary Medicine. 87, 119–130.
Grenfell, G. and Keeling, M. 2007. Dynamics of infectious disease. In May, R. and McLean, A. (eds) Theoretical ecology, third edition. Oxford University Press, Oxford, UK.
Halley, J.M., Hartley, S., Kallimanis, A.S., Kunin, W.E., Lennon, J.J. and Sgardelis, S.P. 2004. Uses and abuses of fractal methodology in ecology. Ecology Letters. 7, 254–271.
Hanski, I. 1994. A practical model of metapopulation dynamics. Journal of Animal Ecology. 63, 151–162.
Hawkes, C. 2009. Linking movement behaviour, dispersal and population processes: is individual variation a key? Journal of Animal Ecology. 78, 894–906.
Heesterbeek, J. 2002. A brief history of R0 and a recipe for its calculation. Acta Biotheoretica. 50, 189–204.
Heinz, S.K., Wissel, C. and Frank, K. 2006. The viability of metapopulations: individual dispersal behaviour matters. Landscape Ecology. 21, 77–89.
Hiebeler, D. 2000. Populations on fragmented landscapes with spatially structured heterogeneities: landscape generation and local dispersal. Ecology. 81, 1629–1641.
Holmes, E.E. 1993. Are diffusion models too simple? A comparison with telegraph models of invasion. The American Naturalist. 142, 779–795.
Holt, D.H. 2002. Food webs in space: On the interplay of dynamic instability and spatial processes. Ecological Research. 17, 261–273.
Hovestadt, T., Messner, S. and Poethke, H.J. 2001. Evolution of reduced dispersal mortality and ‘fat-tailed’ dispersal kernels in autocorrelated landscapes. Proceedings of the Royal Society of London, B. 268, 385–391.
Inoue, T. 1978. A new regression method for analyzing animal movement patterns. Population Ecology. 20, 141–163.
Johnson, W.C. and Webb, T. 1989. The Role of Blue Jays (Cyanocitta cristata L.) in the Postglacial Dispersal of Fagaceous Trees in Eastern North America. Journal of Biogeography. 16, 561–571.
Kareiva, P.M. and Shigesada, N. 1983. Analyzing insect movement as a correlated random walk. Oecologia. 56, 234–238.
Keeling, M.J., Woodhouse, M.E., Shaw, D.J. and Matthews, L. 2001. Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a dynamic landscape. Science. 294, 813–817.
Keitt, T.H. 2000. Spectral representation of neutral landscapes. Landscape Ecology. 15, 479–493.
Kiss, I.Z., Green, D.M. and Kao, R.R. 2006. The effect of contact heterogeneity and multiple routes of transmission on final epidemic size. Mathematical Biosciences. 203, 124–136.
Klein, E.K., Lavigne, C. and Gouyon, P.-H. 2006. Mixing of propagules from discrete sources at long distance: comparing a dispersal tail to an exponential. BMC Ecology. 6:3.
Kleinbaum, D.G., Sullivan, K.M. and Barker, N.D. 2007. A pocket guide to epidemiology. Springer, New York.
Kot, M., Lewis, M.A. and van den Driessche, P. 1996. Dispersal data and the spread of invading organisms. Ecology. 77, 2027–2042.
Lawson, A.B. and Zhou, H. 2005. Spatial statistical modeling of disease outbreaks with particular reference to the UK foot and mouth disease (FMD) epidemic of 2001. Preventive Veterinary Medicine. 71, 141–156.
Lee, P.M. 2004. Bayesian statistics: An introduction, third edition. Hodder/Oxford University Press. New York.
Lockwood, D.R., Hastings, A. and Botsford, L.W. 2002. The effects of dispersal patterns on marine reserves: Does the tail wag the dog? Theoretical Population Biology. 61, 297–309.
Marshall, W.A. 1996. Aerial dispersal of lichen soredia in the maritime Antarctic. New Phytologist. 134, 523–530.
McPeek, M.A. and Holt, R.D. 1992. The evolution of dispersal in spatially and temporally varying environments. The American Naturalist. 140, 1010–1027.
Merrill, R.M. and Timmreck, T.C. 2006. Introduction to epidemiology, fourth edition. Jones and Bartlett Publishers Inc, Sudbury, MA.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H. and Teller, E. 1953. Equation of state calculations by fast computing machines. The Journal of Chemical Physics. 21, 1087–1092.
Mollison, D. 1977. Spatial contact models for ecological and epidemic spread. Journal of the Royal Statistical Society, B. 39, 283–326.
Mollison, D. 1991. Dependence of epidemic and population velocities on basic parameters. Mathematical Biosciences. 107, 255–287.
Mollison, D., Isham, V. and Grenfell, B. 1994. Epidemics: models and data. Journal of the Royal Statistical Society, A. 157, 115–149.
Morales, J.M., Haydon, D.T., Frair, J., Holsinger, K.E. and Fryxell, J.M. 2004. Extracting more out of relocation data: building movement models as mixtures of random walks. Ecology. 85, 2436–2445.
Mugglestone, M.A. and Renshaw, E. 1996. A practical guide to the spectral analysis of spatial point processes. Computational Statistics and Data Analysis. 21, 43–65.
Murray, M.D. 1995. Influences of vector biology on transmission of arboviruses and outbreaks of disease: the Culicoides brevitarsis model. Veterinary Microbiology. 46, 91–99.
Nadarajah, S. 2005. A generalized normal distribution. Journal of Applied Statistics. 32, 685–694.
Nathan, R., Perry, G., Cronin, J.T., Strand, A.E. and Cain, M.L. 2003. Methods for estimating long-distance dispersal. Oikos. 103, 261–273.
Nathan, R. 2006. Long-distance dispersal of plants. Science. 313, 786–788.
Nöremark, M., Håkansson, N., Lindström, T., Wennergren, U. and Sternberg Lewerin, S. 2009a. Spatial and temporal investigations of reported movements, births and deaths of cattle and pigs in Sweden. Acta Veterinaria Scandinavica. 51:37.
Nöremark, M., Lindberg, A., Vågsholm, I. and Sternberg Lewerin, S. 2009b. Disease awareness, information retrieval and change in biosecurity routines among pig farmers in association with the first PRRS outbreak in Sweden. Preventive Veterinary Medicine. 90, 1–9.
Nöremark, M., Frössling, J. and Sternberg Lewerin, S. Application of routines that contribute to on-farm biosecurity as reported by Swedish livestock farmers. Submitted.
Okubo, A. 1980. Diffusion and ecological problems: mathematical models. Springer-Verlag. New York.
O'Neill, P.D. and Roberts, G.O. 1999. Bayesian inference for partially observed stochastic epidemics. Journal of the Royal Statistical Society, A. 162, 121–129.
Ortiz-Pelaez, A., Pfeiffer, D.U., Soares-Magalhães, R.J. and Guitian, F.J. 2006. Use of social network analysis to characterize the pattern of animal movements in the initial phases of the 2001 foot and mouth disease (FMD) epidemic in the UK. Preventive Veterinary Medicine. 75, 40–55.
Phillips, B.L., Brown, G.P., Travis, J.M.J. and Shine, R. 2008. Reid’s paradox revisited: the evolution of dispersal kernels during range expansion. The American Naturalist. 172, S34–S48.
Preece, T. and Mao, Y. 2009. Sustainability of dioecious and hermaphrodite populations on a lattice. Journal of Theoretical Biology. 261, 336–340.
Pulliam, R.H. and Danielson, B.J. 1991. Sources, Sinks, and Habitat Selection: A Landscape Perspective on Population Dynamics. The American Naturalist. 137, 50–66.
Ripley, B.D. 2004. Spatial Statistics. John Wiley and Sons. New Jersey.
Robinson, S.E. and Christley, R.M. 2007. Exploring the role of auction markets in cattle movements within Great Britain. Preventive Veterinary Medicine. 81, 21–37.
Ruokolainen, L., Lindén, A., Kaitala, V. and Fowler, M.S. 2009. Ecological and evolutionary dynamics under coloured environmental variation. Trends in Ecology and Evolution. 24, 555–563.
Rweyemamu, M., Roeder, P., Mackay, D., Sumption, K., Brownlie, J., Leforban, Y., Valarcher, J.F., Knowles, N.J. and Saraiva, V. 2008. Epidemiological patterns of foot-and-mouth disease worldwide. Transboundary and Emerging Diseases. 55, 57–72.
Santamaría, L., Rodríguez-Pérez, J., Larrinaga, A.R. and Pias, B. 2007. Predicting Spatial Patterns of Plant Recruitment Using Animal-Displacement Kernels. PLoS One. 2:10
Schick, R.S., Loarie, S.R., Colchero, F., Best, B.D., Boustany, A., Conde, D.A., Halpin, P.N., Joppa, L.N., McClella, C.M. and Clark, J. 2008. Understanding movement data and movement processes: current and emerging directions. Ecology Letters. 11, 1338–1350.
Schneider, J.C. 1999. Dispersal of a highly vagile insect in a heterogeneous environment. Ecology. 80, 2740–2749.
Schweiger, O., Frenzel, M. and Durka, W. 2004. Spatial genetic structure in a metapopulation of the land snail Cepaea nemoralis (Gastropoda: Helicidae). Molecular Ecology. 13, 3645–3655.
Shigesada, N., Kawasaki, K. and Takeda, Y. 1995. Modeling Stratified Diffusion in Biological Invasions. The American Naturalist. 146, 229–251.
Skarpaas, O. and Shea, K. 2007. Dispersal patterns, dispersal mechanisms, and invasion wave speeds for invasive thistles. The American Naturalist. 170, 421–430.
Smith, D.L., Dushoff, J., Perencevich, E.N., Harris, A.D. and Levin, S.A. 2004. Persistent colonization and the spread of antibiotic resistance in nosocomial pathogens: Resistance is a regional problem. Proceedings of the National Academy of Sciences. 101, 3709–3714.
Smith, R.D. 2006. Veterinary clinical epidemiology. Taylor and Francis Group, Boca Raton.
Spiegelhalter, D.J. 2004. Incorporating Bayesian ideas into health-care evaluation. Statistical Science. 19, 156–174.
Tildesley, M.J. and Keeling, M.J. 2009. Is R0 a good predictor of final epidemic size: Foot-and-mouth disease in the UK. Theoretical Population Biology. 258, 623–629.
Turchin, P. 1998. Quantitative Analysis of Movement. Sinauer Associates, Sunderland, MA.
Urban, M.C., Phillips, B.L., Skelly, D.K. and Shine, R. 2008. A toad more traveled: the heterogeneous invasion dynamics of cane toads in Australia. The American Naturalist. 171, E134–E148.
van den Bosch, F., Metz, J.A.J. and Diekmann, O. 1990. The velocity of spatial population expansion. Journal of Mathematical Biology. 28, 529–565.
Velthuis, A.G. and Mourits, M.C. 2007. Effectiveness of movement-prevention regulations to reduce the spread of foot-and-mouth disease in The Netherlands. Preventive Veterinary Medicine. 82, 262–281.
Venna, J., Kaski, S. and Peltonen, P. 2003. Visualizations for assessing convergence and mixing of MCMC. In Lavrac et al. (eds). ECML. Springer-Verlag, Berlin.
Viswanathan, G.M., Afanasyev, V., Buldyrev, S.V., Murphy, E.J., Prince, P.A. and Stanley, H.E. 1996. Lévy flight search patterns of wandering albatrosses. Nature. 381, 413–415.
Vogl, G. 2005. Diffusion and Brownian motion analogies in the migration of atoms, animals, men and ideas. Diffusion Fundamentals. 2, 1–15.
Walser, J.-C. 2004. Evidence for Limited Dispersal of Vegetative Propagules in the Epiphytic Lichen Lobaria pulmonaria. American Journal of Botany. 91, 1273–1276.
Walters, R.J., Hassall, M., Telfer, M.G., Hewitt, G.M. and Palutikof, J.P. 2006. Modelling dispersal of a temperate insect in a changing climate. Proceedings of the Royal Society of London, B. 273, 2017–2023.
Westerberg, L. and Wennergren, U. 2003. Predicting the spatial distribution of a population in a heterogeneous landscape. Ecological Modelling. 166, 53–65.
Worster, D. 1994. Nature's economy – A history of ecological ideas. Second edition. Cambridge University Press. Cambridge, UK.
Yamamura, K. 2004. Dispersal distance of corn pollen under fluctuating diffusion coefficient. Population Ecology. 46, 87–101.
Yamamura, K., Moriya, S., Tanaka, K. and Shimizu, T. 2007. Estimation of the potential speed of range expansion of an introduced species: characteristics and applicability of the gamma model. Population Ecology. 49, 51–62.

# The spatial kernels estimated for the animal movements are