Modelling disease spread on partially observed networks

Peter Dawson,1 Marleen Werkman,1 Ellen Brooks-Pollock,2 and Mike Tildesley1
1 Centre for Complexity Science, University of Warwick
2 Cambridge Infectious Disease Consortium, University of Cambridge

Network models are increasingly used to simulate the spread of diseases that are transmitted by a contact process. These models are most effective when detailed data are available; in most cases, however, only partial information is available. We represent the full set of UK inter-farm cattle movements as a directed weighted-static network with farms as nodes and movements forming edges. Foot-and-mouth style outbreaks are simulated on the network. Movements are then deleted at random from the data set, and the network is rebuilt using various methods from different percentages of the original data. When epidemics are simulated on the rebuilt networks we find that, in certain regions, knowledge of 30% of the true network is sufficient to accurately predict both the structure of the network and the size and spatial spread of an epidemic. However, this result depends on the network structure of the regions investigated.

I. INTRODUCTION

Mathematical models of infectious diseases are increasingly used to inform policy decisions. During the 2001 outbreak of foot-and-mouth disease (FMD) in the UK, scientists were engaged at an early stage and provided advice to help contain the outbreak [5, 6]. In the UK, all cattle movements are recorded via the Cattle Tracing System (CTS). These data allow the development of detailed mathematical models to simulate the spread of disease through the livestock system. In other countries, however, such data are not available and only partial knowledge of the network exists at any point in time: whilst we have 100% of movement data for the UK, the figure for other countries is significantly lower.
Early epidemiological models assumed population-wide mixing [1, 9], but in practice any given member of a population has only a finite number of contacts. As a result, network models are increasingly used to simulate the spread of diseases that are transmitted by contact processes [7, 11]. The best approach to building these models remains an open question [8, 12]. The task of modelling disease spread on a network is further complicated because a full data set is not generally available to construct the network.

We examine epidemic behaviour on a partially observed network. Partially observed in our case means that we know how many nodes are in the network but have only limited knowledge of the edges. In this project we use data from the 2010 CTS database (provided by the Department for Environment, Food and Rural Affairs (DEFRA)) in the UK as a basis for our model.

The methodology is in three parts. We begin by analysing the network properties of the cattle farm network and simulating FMD susceptible-infected-removed (SIR) style outbreaks seeded in different counties in the UK. Next we look at the effect of deleting movements from the cattle network: the network properties are analysed and an SIR model is simulated as more of the movement data is deleted. Finally we attempt to rebuild the networks in a way that re-establishes the initial network properties and epidemic behaviour.

II. METHODS

A. The Cattle Network

The first step in any network analysis is to construct an appropriate adjacency matrix, A. We follow the procedure of Vernon and Keeling in creating a directed weighted-static network [12]. Nodes represent farms and edges represent cattle movements.
We note here that movements to slaughter are ignored and that markets are not treated as nodes: when a movement goes through a market, the source and destination farms are listed as the two nodes with an edge between them (so there is no increased risk of transmission when a movement goes through a market). The adjacency matrix is constructed from the cattle movement data such that an element $a_{ij}$ is non-zero if cattle were moved from farm i to farm j in the considered time frame. The weight of the edge is the frequency of movements from farm i to farm j. For instance, if there were four movements from farm 3 to farm 5 then $a_{35} = 4/365$. Neither the number of cattle moved nor the size of the farms involved was considered.

B. The SIR model

SIR is a simple but effective way of modelling an epidemic. The three states are: susceptible, where a farm can become infected based on the states of its neighbours; infectious, where a farm passes on the infection to others and is removed at a given rate; and removed, where a farm is considered either to have recovered completely and become immune to FMD or to have had all its livestock die or be culled. In either case a removed farm cannot become reinfected or pass on the infection to susceptible farms. The original model is governed by the set of differential equations

\[
\frac{dS}{dt} = bN - \lambda S - dS, \qquad
\frac{dI}{dt} = \lambda S - \gamma I - dI, \qquad
\frac{dR}{dt} = \gamma I - dR, \tag{1}
\]

with S, I and R being the number of individuals in the susceptible, infectious and removed classes respectively in a population of size N. The dynamics are governed by the birth rate b, the natural death rate d and the rate of recovery from infection γ. The force of infection λ is the rate at which susceptibles become infected; its choice should encode information about the interactions that lead to infection. Our model is run over a sufficiently short time period that we can ignore birth and death processes. We also take the infectious period to be of fixed length rather than removing farms at a rate.
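The construction of the adjacency matrix in Section II.A can be illustrated with a minimal sketch. It assumes the movement data have already been reduced to (source farm, destination farm) pairs with slaughter movements removed and market movements collapsed; the input format, the function name `build_adjacency` and the sparse dictionary representation are illustrative assumptions, not the CTS format or the authors' code.

```python
from collections import defaultdict

DAYS_IN_PERIOD = 365  # movement counts are expressed as a frequency per day


def build_adjacency(movements):
    """Return a sparse weighted adjacency dict {(source, dest): weight}.

    `movements` is an iterable of (source_farm, destination_farm) pairs,
    one pair per recorded movement (a hypothetical input format).
    """
    counts = defaultdict(int)
    for source, dest in movements:
        if source == dest:
            continue  # assumption: a movement within one farm is ignored
        counts[(source, dest)] += 1
    # Edge weight = number of movements on the edge divided by 365.
    return {edge: n / DAYS_IN_PERIOD for edge, n in counts.items()}


# Four movements from farm 3 to farm 5 and one from farm 5 to farm 2.
movements = [(3, 5), (3, 5), (3, 5), (3, 5), (5, 2)]
adjacency = build_adjacency(movements)
```

Storing only the non-zero elements keeps memory use proportional to the number of edges, which is convenient when the number of farms is large.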
We shall use the approach outlined above to simulate a stochastic process on a network. Recasting equation (1) in matrix form, we give the probability of farm j being infected as

\[
\lambda_j = 1 - \exp\Big( -\beta \sum_i a_{ij} I_i \Big), \tag{2}
\]

where $I_i = 1$ if farm i is infected and zero otherwise. The infection strength is characterised by the parameter β. An infected farm becomes removed T days after being infected. The parameter values β = 1 (the probability of transmission) and T = 21 (the recovery period) were set to emulate a foot-and-mouth disease epidemic in the absence of control. Setting the probability of transmission to one means that if a farm is infected, all livestock on the farm are assumed to be infected.

1. Epidemic Statistics

We characterise epidemics using a few natural statistics. In this project we are interested in the early stages of an outbreak, so our models run for ninety time-steps (days). Whilst FMD would probably be detected earlier than this in the event of an outbreak (and movement bans introduced), we ran the model for ninety days to allow a more detailed comparison of epidemics simulated on the recorded and the reconstructed data. We say an outbreak has 'taken off' if ten or more farms are infected or removed at the end of the simulation. The final size of an outbreak is the total number of farms in either the removed or infected class after ninety days. Statistics are averaged only over simulations that reached the take-off threshold, with 200 simulations run for every realisation of the network. The locations of all the farms in the network are known, allowing us to keep track of where the infection has reached. The number of counties infected after thirty, sixty and ninety days was recorded, and we refer to this number as the 'spread' of the epidemic. The initial growth rate of the epidemic is also of interest.
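The simulation procedure described in this section can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the sparse dictionary input and the convention that a farm infected on day d can transmit on days d+1 to d+T are all assumptions made here.

```python
import math
import random


def simulate_sir(adjacency, seed_farm, beta=1.0, recovery_days=21,
                 n_days=90, rng=None):
    """Run one daily-step stochastic SIR outbreak.

    `adjacency` maps (source, dest) -> movement frequency per day.
    Returns the set of farms that are infected or removed after `n_days`.
    """
    rng = rng or random.Random()
    infected_on = {seed_farm: 0}  # farm -> day on which it was infected
    infectious = {seed_farm}
    for day in range(1, n_days + 1):
        # Assumed timing: a farm is removed once T days have passed.
        infectious = {f for f in infectious
                      if day - infected_on[f] <= recovery_days}
        # Accumulate sum_i a_ij * I_i for every susceptible farm j.
        pressure = {}
        for (i, j), weight in adjacency.items():
            if i in infectious and j not in infected_on:
                pressure[j] = pressure.get(j, 0.0) + weight
        # Equation (2): infection probability 1 - exp(-beta * sum).
        for j, total in pressure.items():
            if rng.random() < 1.0 - math.exp(-beta * total):
                infected_on[j] = day
                infectious.add(j)
    return set(infected_on)


def outbreak_statistics(adjacency, seed_farm, n_runs=200, takeoff=10, seed=0):
    """Mean final size over runs that took off, and the take-off fraction."""
    rng = random.Random(seed)
    sizes = [len(simulate_sir(adjacency, seed_farm, rng=rng))
             for _ in range(n_runs)]
    taken_off = [s for s in sizes if s >= takeoff]
    mean_size = sum(taken_off) / len(taken_off) if taken_off else 0.0
    return mean_size, len(taken_off) / n_runs
```

The sketch loops over every edge on every day, which is simple but not efficient; for a network the size of the cattle network one would more likely index edges by source farm.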
The initial growth rate was taken as the gradient of the total number of infected farms as a function of time over the first twenty-one days of the outbreak, i.e. until recovery takes effect.

We consider the effect of seeding the infection in different counties. In each case the infection begins in the most highly connected farm based on weighted out-degree (i.e. the farm with the most movements away from it). The counties investigated were Cumbria, Devon, Aberdeenshire, Surrey and Clwyd. Over 60% of movements originating at a farm in Cumbria are to other counties, and we therefore expect infections seeded in Cumbria to spread to several other counties. FIG 1 shows a histogram of the counties that receive movements from Cumbria; the majority are to its neighbouring counties in the North of England. Similar histograms for the other counties can be seen in the appendix. FIG 1 also compares movements from the full data set with what remains after the network has been depleted to 30% of its movements, showing some of the effects of deletion on the network. Aberdeenshire has a large farming community but appears to be quite isolated: 70% of movements are within-county, and we therefore expect infections starting in Aberdeenshire to stay in Scotland. Devon is similar to Aberdeenshire in that 60% of movements are within-county; however, it does have strong connections with the surrounding counties of Cornwall and Somerset. The movement data also suggested that outbreaks in Surrey and Clwyd would be relatively short-lived and would not infect many other counties. In the main body of the paper we focus on epidemics seeded in Cumbria; the results for the remaining counties are presented in the appendices and discussed in section IV.

C. Network Properties

After constructing the adjacency matrix we can calculate some network properties [10]. We expect these properties to reveal information that would influence epidemic spread.
If the network is highly connected, epidemics will spread more quickly and further from the source farm.

[Figure 1 panel: number of movements to county (×10^4) for counties receiving more than 20 movements; series: full network, depleted to 30%; labelled counties: Cumbria, North Yorkshire, Durham, Lancashire, Northumberland, Aberdeen.]
FIG. 1: Histogram showing the number of movements beginning in Cumbria and their distribution amongst UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than 20 movements are shown. 60% of Cumbria's cattle movements are external and these are mainly distributed amongst other counties in the North of England as well as Aberdeenshire.

1. Degree Distribution

The degree of a node is defined as the number of edges connected to that node. However, our networks are directed, so A is not necessarily symmetric. We therefore define the in-degree $k_{in}(i)$ as the number of edges pointing to node i and the out-degree $k_{out}(i)$ as the number of edges pointing away from node i. As the cattle network is not only directed but also weighted, we can define weighted in- and out-degrees, known as degree strengths. These are quickly calculated using the adjacency matrix. The sum of the elements of row i is the out-strength of node i, $s_{out}(i) = \sum_j A_{ij}$, and the sum of the elements of column i is the in-strength of node i, $s_{in}(i) = \sum_j A_{ji}$. We can also analyse the distribution of the weights. These distributions are plotted in FIG 2 and FIG 3. All the distributions shown here appear to exhibit power-law behaviour, and we therefore suggest that the networks are scale-free and could show similarities with preferential attachment networks [2]. The study of degree distributions can give good indications as to how the network forms and therefore how it may be reconstructed.
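The degree and strength definitions above amount to counting and summing the non-zero entries of each row and column of A. A minimal sketch, assuming the adjacency matrix is held as a sparse dictionary keyed by (source, destination); the function name and data layout are illustrative choices made here.

```python
from collections import defaultdict


def degrees_and_strengths(adjacency):
    """Return (k_in, k_out, s_in, s_out) for a sparse adjacency dict.

    `adjacency` maps (i, j) -> weight of the directed edge i -> j.
    Nodes that never appear simply read as 0 in the returned dicts.
    """
    k_in, k_out = defaultdict(int), defaultdict(int)
    s_in, s_out = defaultdict(float), defaultdict(float)
    for (i, j), weight in adjacency.items():
        if weight == 0:
            continue  # a zero entry is not an edge
        k_out[i] += 1       # one more edge pointing away from i
        k_in[j] += 1        # one more edge pointing to j
        s_out[i] += weight  # row sum of A
        s_in[j] += weight   # column sum of A
    return k_in, k_out, s_in, s_out


# Weights here are raw movement counts for readability.
example = {(1, 2): 2.0, (1, 3): 1.0, (3, 2): 4.0}
k_in, k_out, s_in, s_out = degrees_and_strengths(example)
```

In this example farm 1 has out-degree 2 and out-strength 3, while farm 2 has in-degree 2 and in-strength 6.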
Various studies [3, 4] have also shown that graphs formed under preferential attachment are more robust than those formed randomly; that is, a preferential attachment graph will not fragment as quickly as a random graph under random deletion of edges or nodes.

[Figure 2 panels: number of nodes against in degree, out degree, in-strength and out-strength.]
FIG. 2: Vertex degree and strength distributions for incoming and outgoing movements for the full network. These have all been plotted on log-log scales. The form of the distributions suggests a possible scale-free network.

[Figure 3 panels: node in-strength against in degree; node out-strength against out degree; number of edges (log-scale) against edge weight (log-scale).]
FIG. 3: The first two plots show vertex strength against degree. These plots are linear, suggesting that farms which are connected to many farms also use these connections multiple times. The third plot shows the edge weight distribution, which again looks power-law like.

Clustering

The global clustering coefficient measures the average probability that two neighbours of a vertex are themselves neighbours. It can be viewed as the ratio of the number of triangles to the number of connected triples in the network. We first set all non-zero elements in A to one, ignoring weights. Let $a^{(p)}_{ij}$ be an element of $A^p$; the elements of $A^p$ count paths of length p:

\[
a^{(p)}_{ij} =
\begin{cases}
n, & \text{if } i \text{ can reach } j \text{ in } n \text{ ways by paths of length } p, \\
0, & \text{if there are no paths of length } p \text{ between } i \text{ and } j.
\end{cases}
\]
For global clustering we want the number of triangles (closed paths of length three) divided by the number of connected triples,

\[
C_G = \frac{\mathrm{Tr}(A^3)}{\|A^2\| - \mathrm{Tr}(A^2)},
\]

where $\|\cdot\| = \sum_{ij} (\cdot)$.

[Figure 4 panels: local clustering coefficient (left), giant component size (middle) and number of connected components (right), each against movement percentage.]
FIG. 4: Network statistics for the cattle network shown as a function of the movement percentage. Local clustering (left) and giant component size (middle) decay as movements are removed, while the number of connected components (right) increases.

We can also define the local clustering coefficient for vertex i, $C_i$, as the ratio of the number of pairs of neighbours of i that are connected to the total number of pairs of neighbours of i. A clustering coefficient for the entire network can then be calculated as the mean of the clustering coefficients for each vertex,

\[
C_L = \frac{1}{n} \sum_{i=1}^{n} C_i .
\]

These alternative definitions of clustering yield different results. In this project we generally use the second definition, which is easier to compute for networks of our size. We also note that when we compute clustering we ignore the direction of links, and therefore symmetrise the adjacency matrix and set all weights to one.

2. Connected Components

A connected component of a graph is a sub-graph in which all nodes are connected to each other by some continuous path along the edges of the sub-graph. In directed networks there exist strongly and weakly connected components. A weakly connected component does not take edge direction into account, whereas a strongly connected component does. For two nodes i and j to be in the same strongly connected component, a path must exist from node i to node j and a path must exist from node j to node i, following the directed edges.
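The two clustering definitions given earlier in this section can be compared on a small example. The sketch below works with neighbour sets rather than explicit matrix powers; it uses the identities Tr(A^3) = number of closed paths of length three and ||A^2|| - Tr(A^2) = sum over vertices of k(k-1), which hold for a symmetric 0/1 matrix without self-loops. Assigning C_i = 0 to vertices with fewer than two neighbours is an assumed convention, and all names are illustrative.

```python
from itertools import combinations


def neighbour_sets(adjacency, nodes):
    """Symmetrise the network and drop weights: node -> set of neighbours."""
    nbrs = {v: set() for v in nodes}
    for (i, j), weight in adjacency.items():
        if weight != 0 and i != j:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def local_clustering(nbrs):
    """C_L: mean over all vertices of C_i (C_i = 0 when degree < 2)."""
    total = 0.0
    for ns in nbrs.values():
        k = len(ns)
        if k < 2:
            continue
        # Count pairs of neighbours that are themselves connected.
        links = sum(1 for a, b in combinations(ns, 2) if b in nbrs[a])
        total += links / (k * (k - 1) / 2)
    return total / len(nbrs)


def global_clustering(nbrs):
    """C_G = Tr(A^3) / (||A^2|| - Tr(A^2)) for the symmetrised 0/1 matrix."""
    # Closed paths a -> b -> c -> a: c must neighbour both a and b.
    closed = sum(len(nbrs[a] & nbrs[b]) for a in nbrs for b in nbrs[a])
    # Paths of length two with distinct end points: k(k-1) per centre vertex.
    triples = sum(len(ns) * (len(ns) - 1) for ns in nbrs.values())
    return closed / triples if triples else 0.0


# A directed triangle 0 -> 1 -> 2 -> 0 with one pendant edge 2 -> 3.
example = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0, (2, 3): 1.0}
nbrs = neighbour_sets(example, range(4))
c_global = global_clustering(nbrs)
c_local = local_clustering(nbrs)
```

For this graph the two measures differ (3/5 for the global coefficient against 7/12 for the mean local coefficient), which illustrates why the choice of definition matters.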
We focus on strongly connected components only.

The giant connected component is the largest connected component in a graph and gives an upper bound on the number of farms that could become infected. The number of components is of little interest for the initial network, whose giant component contains 95% of farms. As edges are deleted, however, the speed at which the network fragments indicates how well connected the network is and how many movements would need to be cancelled in order to prevent a disease outbreak. How badly the network fragments also serves as an indicator of whether or not the network can be rebuilt.

D. Effect of Deletion

Movements were deleted at random from the adjacency matrix. Deleting a movement means reducing a non-zero element of the adjacency matrix by 1/365. The adjacency matrices formed by deletion were saved after each further 1% of the original movements had been lost, until only 10% of the original movements remained. This process was carried out ten times, with the results averaged over these different realisations of the network (FIG 4). The results for epidemics seeded in Cumbria and Devon are shown in FIG 5. The size and spread of epidemics fall quickly as movements are deleted. Though these results are averaged, there was practically no variance between the individual deletions of the network and the mean result. Epidemic statistics for the other counties followed similar trends and so are not shown in this paper. We refer the reader once again to the histogram plot in FIG 1 to see the effect of deletion on cattle movements in Cumbria. We also note that, unless otherwise stated, we ran SIR simulations on the networks at 5% increments of the data from 10% up to the full network.
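The deletion procedure and the component statistics tracked under deletion can be sketched as below. Movements are held as integer counts per edge (one count corresponding to a weight of 1/365), deletion keeps a uniformly random subset of individual movements, and strongly connected components are found with Kosaraju's algorithm. The algorithm choice, function names and data layout are assumptions for illustration; the paper does not state how components were computed.

```python
import random


def deplete(counts, keep_fraction, rng):
    """Keep a uniformly random `keep_fraction` of the individual movements."""
    movements = [edge for edge, n in counts.items() for _ in range(n)]
    kept = rng.sample(movements, round(keep_fraction * len(movements)))
    depleted = {}
    for edge in kept:
        depleted[edge] = depleted.get(edge, 0) + 1
    return depleted


def strongly_connected_components(edges, nodes):
    """Kosaraju's algorithm, written iteratively; returns a list of node sets."""
    forward = {v: [] for v in nodes}
    backward = {v: [] for v in nodes}
    for i, j in edges:
        forward[i].append(j)
        backward[j].append(i)

    # Pass 1: record nodes in order of depth-first completion.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(forward[root]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append((child, iter(forward[child])))
                    break
            else:  # all children explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: flood-fill the reversed graph in reverse completion order.
    components, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        component, frontier = {root}, [root]
        while frontier:
            node = frontier.pop()
            for parent in backward[node]:
                if parent not in assigned:
                    assigned.add(parent)
                    component.add(parent)
                    frontier.append(parent)
        components.append(component)
    return components


def giant_component_size(counts, nodes):
    """Size of the largest strongly connected component."""
    return max(len(c) for c in strongly_connected_components(counts, nodes))
```

Calling `deplete` with decreasing values of `keep_fraction` and recording `giant_component_size` and the number of components at each step gives curves of the kind shown in FIG. 4.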
[Figure 5 panels: final epidemic size, counties infected, early growth rate and take-off percentage, each against movement percentage, for Cumbria and Devon.]
FIG. 5: The effect of deleting movements for epidemics seeded in Cumbria and Devon. There is a sharp drop-off in epidemic size and spread as soon as any movements are deleted. By the 10% mark almost no simulations reach the take-off threshold.

III. RESULTS

A. Rebuilding the Network

When rebuilding, only one realisation of the depleted data was chosen at random to be rebuilt. This is because in reality one is given only one instance of the data; the choice is also justified by the very small variance between the network and epidemic statistics for individual deletions. Various methods of differing complexity can be used to rebuild the network, and the effect of rebuilding from different percentages was examined. Adding a movement from farm i to farm j means increasing the corresponding element of the adjacency matrix by 1/365.

The first method was to rescale the weights of all the edges in the network we are rebuilding from, $A^{\mathrm{depleted}}$, to create a rebuilt network $A^{\mathrm{rebuilt}}$ such that $\sum_{ij} a^{\mathrm{rebuilt}}_{ij} = \sum_{ij} a_{ij}$. For example, if we are rebuilding from 10% then $A^{\mathrm{rebuilt}} = 10 \times A^{\mathrm{depleted}}$. We expected this model to overestimate the size of the epidemic at the higher rebuilding percentages: although fewer farms are exposed to infection (there are fewer edges), those that have an infected contact are more likely to become infected because the strength of that edge is greater than it would have been in the original network.

Next we considered adding one new movement every time step. The new movement is added to an existing edge chosen according to the current edge weights. The probability of a movement being added from farm i to farm j at time t was

\[
p(i \to j, t) = \frac{a^{\mathrm{rebuilt}}_{ij}(t)}{\sum_{ij} a^{\mathrm{rebuilt}}_{ij}(t)}. \tag{3}
\]

The second method does not allow new edges to form in the network. The third method employed was to give a probability of forming a new edge.
A new movement would be added to the data set at random with probability π, and would be added to an existing edge with probability 1 − π in the same manner as the second method. This method has some obvious flaws: the new edges are created purely at random, with no spatial data or network structure used to suggest what might be an appropriate new edge. However, with the right choice of π the number of edges (as opposed to the number of movements) present in the original network could be reached, which should capture more long-range movements. As methods 1 and 2 do not add any new connections, the network statistics mentioned previously will be the same as those calculated when deleting movements from the network. The weighted degree distributions do change somewhat, but the changes are not informative.

B. Comparisons Between Rebuilding Methods

The first and second methods yielded similar results. This is not surprising: a relation derived in appendix A shows the two methods to be equivalent. The remaining discrepancy between the results is expected, as networks were rebuilt using the second method only ten times. Despite the simplicity of the method, some interesting results were obtained. For epidemics seeded in Cumbria, with only 30% of the network the values for final epidemic size and number of counties infected approach the results for epidemics on the full network (FIG 6). A histogram plot of the number of farms infected per county shows that the results at 30% and for the full network are also close at a local level. This is further illustrated in FIG 9.
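The three rebuilding methods described above can be sketched in a few lines. As before, movements are held as integer counts per edge (a count of one corresponds to a weight of 1/365). The function names are illustrative, and for the third method the sketch assumes that a "random" movement means a uniformly chosen ordered pair of distinct farms, which the text does not specify.

```python
import random


def rebuild_rescale(depleted, fraction):
    """Method 1: multiply every remaining edge weight by 1 / fraction."""
    return {edge: n / fraction for edge, n in depleted.items()}


def rebuild_reinforce(depleted, n_new, rng, pi=0.0, nodes=None):
    """Methods 2 and 3: add `n_new` movements one at a time.

    With probability `pi` the movement joins a uniformly random ordered pair
    of distinct farms from the list `nodes` (method 3, an assumed reading).
    Otherwise it reinforces an existing edge chosen with probability
    proportional to its current count, as in equation (3). Setting pi = 0
    gives method 2.
    """
    # One list entry per movement, so a uniform draw from the list is a
    # draw proportional to the current edge weights.
    movements = [edge for edge, n in depleted.items() for _ in range(n)]
    for _ in range(n_new):
        if pi > 0 and rng.random() < pi:
            i, j = rng.sample(nodes, 2)
            movements.append((i, j))
        else:
            movements.append(rng.choice(movements))
    rebuilt = {}
    for edge in movements:
        rebuilt[edge] = rebuilt.get(edge, 0) + 1
    return rebuilt


# Rebuild a network depleted to 4 movements back up to 40 movements.
depleted = {(0, 1): 3, (1, 2): 1}
method_1 = rebuild_rescale(depleted, fraction=0.1)
method_2 = rebuild_reinforce(depleted, n_new=36, rng=random.Random(42))
```

A single run of method 2 is random, but averaging edge counts over many runs should approach the rescaled values of method 1, in line with the relation derived in appendix A.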
[Figure 6 panels: final epidemic size, early growth rate, and counties infected after 30, 60 and 90 days, each against movement percentage, for methods 1 and 2.]
FIG. 6: The effect of rebuilding the network using methods 1 and 2 as described in the text for epidemics seeded in Cumbria. A reference line is included for comparison of final epidemic sizes with the original network. Non-parametric percentile bootstrapping was used to give 95% confidence intervals. These are only shown for the first method.

The results for Devon are less satisfying: as can be seen in the appended figures, good estimates for the final epidemic size cannot be obtained with less than 60% of the full data. The number of counties infected does become comparable with the original data for rebuilds from 30%. Aberdeenshire gives very favourable results and can be rebuilt from as little as 20% of the original data. Take-off rates are not shown here for the rebuilding methods, as almost all simulations reached the designated threshold.

The predictive ability of the third rebuilding method was significantly worse than that of the other methods (FIG 7). Generally, adding random links did not help to capture the long-range movements and prevented the original edges from gaining strength. Results shown are for rebuilding Cumbria and Devon from 10%, 30%, 50% and 80%, with values of π of 0.1, 0.2, 0.4, 0.5, 0.7 and 0.9. The value π = 0 represents the results from the second rebuilding method.
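The percentile bootstrap mentioned in the caption of FIG. 6 can be sketched as follows. The paper does not state which statistic was resampled; the sketch assumes it is the mean final epidemic size over the simulations that took off, and the function name, default number of resamples and index convention for the percentiles are choices made here.

```python
import random


def percentile_bootstrap_ci(samples, n_resamples=2000, level=0.95, rng=None):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = rng or random.Random()
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        sum(rng.choices(samples, k=n)) / n for _ in range(n_resamples)
    )
    # Read the interval off the empirical percentiles of the resampled means.
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper


# Hypothetical final sizes from simulations that reached take-off.
final_sizes = list(range(1, 101))
low, high = percentile_bootstrap_ci(final_sizes, rng=random.Random(0))
```

No distributional assumption is needed for this interval, which suits final-size data that are typically skewed.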
[Figure 7 panels: final epidemic size against probability of random movement, for rebuilds from 10%, 30%, 50% and 80%, for Cumbria and Devon.]
FIG. 7: The effect of rebuilding the network from different movement percentages using the third method described in the text, shown for epidemics seeded in Cumbria and Devon. The final epidemic size generally decreases as the chance of creating random movements, π, is increased. The only exception appears when rebuilding Devon from 50% with π = 0.1; even then the epidemic size is well below 300, which is the size for the full network in Devon.

[Figure 8 panel: number of farms infected per county; series: rebuilt at 10%, rebuilt at 30%, full network; labelled counties: North Yorkshire, Cumbria, Aberdeen, Durham, Dumfriesshire, Northumberland, Lancashire.]
FIG. 8: Histogram plot showing the number of farms infected for a given county, for epidemics beginning in Cumbria. Shown are results when rebuilding the network from 10% (blue), 30% (yellow) and the full network (black). Rebuilding from 30% captures the overall size and spread of the disease quite well, but for the most part overestimates the size of the epidemic in counties close to and including the source county while underestimating the size for counties further away.

IV. DISCUSSION

The most striking result of this project is that for some counties only partial knowledge of the contact network is necessary to make epidemiological predictions on the size and spread of an outbreak. The obvious question is why this is not the case for all counties. The cause must surely lie with the local network structure in these counties. The percentage of a county's movements that leave the county does not explain the difference: Aberdeenshire and Devon, two very insular counties, had very different results upon rebuilding.

FIG. 9: Heat maps showing the number of farms infected in different counties in the UK for epidemics seeded in Cumbria for a) the full network and b) the network rebuilt from 30%. The size and spread of the disease is almost perfectly captured when the network is rebuilt from 30%.

One hypothesis is that Aberdeenshire is not only very insular but also highly connected, so that its edge structure is robust against deletion, whereas Devon may not be as highly connected. To check this we can calculate the global and local clustering coefficients for the sub-networks made up only of Aberdeenshire farms and only of Devon farms. Plotting both types of clustering as well as component statistics as we delete edges, we see that there are quantitative differences between the two counties (FIG 10). Looking at the components of the two sub-networks (FIG 10), the giant component in Devon is initially much larger than in Aberdeenshire, giving rise to larger epidemics for the full data set. However, the Devon sub-network fragments into many more connected components than the Aberdeenshire sub-network, with the giant component size decaying at a much higher rate. This may imply that clusters in Devon are connected by several low-weighted links. Once these links are broken the network fragments, and the rebuilding methods described here would be unable to recreate that network structure.

Other avenues explored included the degree distribution and weight distribution within the counties, but no obvious differences were observed in these. A more thorough analysis of network properties may well yield further differences between the two counties, giving a greater indication of the circumstances under which one could rebuild networks in such a way as to make accurate epidemic predictions.
More complicated rebuilding strategies are easily envisaged, in which one would form new edges based on the locations of existing connections.

The counties of Surrey and Clwyd were also investigated. Surrey had quite small epidemic sizes; the figures in the appendix do suggest that it can be rebuilt from partial knowledge, but the numbers are too low to draw definite conclusions. Clwyd has a larger farming community than Surrey, and it would seem reasonable to suggest, based upon the appended figures, that rebuilding from 35% effectively reproduced the size and spread of epidemics seeded in Clwyd.

[Figure 10 panels: local clustering, global clustering, giant component size and number of connected components, each against movement percentage, for Aberdeen and Devon.]
FIG. 10: Network statistics for Aberdeenshire (blue line) and Devon (green dashed line). The plots on the left show local (upper-left) and global (lower) clustering coefficients at a given percentage of movement data. The values for both measures are consistently twice as high for Aberdeenshire as they are for Devon. The plots on the right show the size of the giant component (upper-right) and the number of connected components (lower-right) at a given percentage of movement data.

V. CONCLUSIONS AND FUTURE WORK

This work shows that partial knowledge of a contact network can be sufficient to make good epidemiological predictions, indicating that models such as these could be used in regions where precise demographic data are not available. Our best results came for Cumbria, which can be rebuilt from 30%. This is significant, as Cumbria was one of the counties most affected by the 2001 FMD outbreak in the UK. We have also shown how to identify regions that need more complex rebuilding methods, by checking whether the region fragments rapidly under edge deletion.
FIG 7 shows that, while adding random links did not work in general, exploring this idea more closely may help to rebuild regions such as Devon. Taking the project further, we would like to test our hypothesis as to why some counties are not so easily rebuilt, through more analysis of the network structure. We would also like to devise more complex methods for rebuilding the network by edge creation, taking into account existing movement trends and spatial data.

VI. ACKNOWLEDGEMENTS

The authors are grateful to EPSRC for providing funding, DEFRA for providing the CTS data and UKBORDERS for providing the shape files used in FIG 9.

[1] N. T. J. Bailey. The mathematical theory of epidemics. Griffin, 1957.
[2] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[3] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Network robustness and fragility: percolation on random graphs. Phys. Rev. Lett., 85:5468–5471, 2000.
[4] R. Cohen, K. Erez, D. Ben-Avraham, and S. Havlin. Resilience of the Internet to random breakdowns. Phys. Rev. Lett., 85:4626–4628, 2000.
[5] M. J. Keeling et al. Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a heterogeneous landscape. Science, 294:813–817, 2001.
[6] N. M. Ferguson, C. A. Donnelly, and R. M. Anderson. Transmission intensity and impact of control policies on the foot and mouth epidemic in Great Britain. Nature, 413:542–548, 2001.
[7] M. J. Keeling and K. T. D. Eames. Monogamous networks and the spread of sexually transmitted diseases. Math. Biosci., 189:115–130, 2004.
[8] M. J. Keeling and K. T. D. Eames. Networks and epidemic models. J. R. Soc. Interface, 2:295–307, 2005.
[9] W. O. Kermack and A. G. McKendrick. A contribution to the mathematical theory of epidemics. Proc. R. Soc. A, 115:700–721, 1927.
[10] M. E. J. Newman. Networks. Oxford, 2010.
[11] J. M. Read and R. M. Christley. Disease evolution on networks: the role of contact structure. Proc. R. Soc. B, 270:699–708, 2003.
[12] M. C. Vernon and M. J. Keeling. Representing the UK's cattle herd as static and dynamic networks. Proc. R. Soc. B, 276:469–476, 2009.

APPENDIX A: DERIVATION OF THE RELATIONSHIP BETWEEN REBUILD METHODS ONE AND TWO

The probability of a movement being added to an existing edge at time $t$ was defined as

$$p(i \to j, t) = \frac{a^{\mathrm{rebuilt}}_{ij}(t)}{\sum_{i,j} a^{\mathrm{rebuilt}}_{ij}(t)}. \tag{A1}$$

A difference equation can then be derived for the number of movements from node $i$ to $j$ at time $t + 1$, where the network begins with $\alpha \tilde{N}$ movements and a single movement is added at each time-step, such that

$$N(i \to j, t+1) = N(i \to j, t) + p(i \to j, t). \tag{A2}$$

We start with some fraction $\alpha$ of the original movements $\tilde{N}$ and add $n$ movements such that $\tilde{N} = \alpha \tilde{N} + n$. Since the weight of edge $i \to j$ at time $t$ is $N(i \to j, t)$ and the network then holds $\alpha \tilde{N} + t$ movements in total,

$$N(i \to j, t+1) = N(i \to j, t) + \frac{N(i \to j, t)}{\alpha \tilde{N} + t}. \tag{A3}$$

Using the initial condition $N(i \to j, 0) = N_0(i \to j)$, this can be solved explicitly, since the product telescopes:

$$N(i \to j, n) = N_0(i \to j) \prod_{t=1}^{n-1} \left(1 + \frac{1}{\alpha \tilde{N} + t}\right) \tag{A4}$$

$$= N_0(i \to j)\, \frac{n + \alpha \tilde{N}}{1 + \alpha \tilde{N}} \tag{A5}$$

$$= N_0(i \to j)\, \frac{\tilde{N}}{1 + \alpha \tilde{N}}. \tag{A6}$$

Assuming $\alpha \tilde{N} \gg 1$, which holds here, we obtain the result

$$N_n(i \to j) = \frac{1}{\alpha} N_0(i \to j), \tag{A7}$$

which is exactly equivalent to the first method of rebuilding.

APPENDIX B: DEVON

[Figure 11: bar chart. y-axis: number of movements to county (×10^4); x-axis: counties receiving more than 20 movements; series: full network, depleted to 30%; labelled bars: Devon, Somerset, Cornwall.]

FIG. 11: Histogram showing the number of movements beginning in Devon and their distribution amongst UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than 20 movements are shown. 60% of Devon’s cattle movements are internal, whilst the others are mainly distributed amongst the neighbouring counties of Somerset and Cornwall.

[Figure 12: bar chart. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms; series: rebuilt at 10%, rebuilt at 30%, full network; labelled bars: Devon, Somerset, North Yorkshire, Cornwall, Shropshire, Cumbria.]

FIG. 12: Histogram showing the number of farms infected for a given county, for epidemics beginning in Devon. Shown are results for networks rebuilt from 10% (blue) and 30% (yellow) of the data, and for the full network (black). Rebuilds from 30% do capture the counties that get infected but underestimate the epidemic size in all counties, even in Devon.

[Figure 13: four panels plotted against movement percentage for methods 1 and 2: final epidemic size; counties infected after 30 days; early growth rate; counties infected after 60 and 90 days.]

FIG. 13: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run in Devon. A reference line is included for comparison of final epidemic sizes with the original network. Rebuilds do not begin to approach the original values for epidemic size until around 60%, although they begin to capture the spread of disease earlier, at around 30%.

APPENDIX C: ABERDEENSHIRE

[Figure 14: bar chart. y-axis: number of movements to county (×10^4); x-axis: counties receiving more than 20 movements; series: full network, depleted to 30%; labelled bar: Aberdeen.]

FIG. 14: Histogram showing the number of movements beginning in Aberdeenshire and their distribution amongst UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than 20 movements are shown. 70% of Aberdeenshire’s cattle movements are internal; the others are distributed around Scotland and the North of England.

[Figure 15: bar chart. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms; series: rebuilt at 10%, rebuilt at 30%, full network; labelled bars: Aberdeen, Dumfriesshire, North Yorkshire.]

FIG. 15: Histogram showing the number of farms infected for a given county, for epidemics beginning in Aberdeenshire.
Shown are results for networks rebuilt from 10% (blue) and 30% (yellow) of the data, and for the full network (black). Rebuilding from 30% recaptures the epidemic size and spread well, giving slight overestimates in some cases and a slight underestimate in Aberdeenshire itself.

[Figure 16: four panels plotted against movement percentage for methods 1 and 2: final epidemic size; counties infected after 30 days; early growth rate; counties infected after 60 and 90 days.]

FIG. 16: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run in Aberdeenshire. A reference line is included for comparison of final epidemic sizes with the original network. For rebuilds from 20%, the epidemic size and spread compare well with the full network.

APPENDIX D: CLWYD

[Figure 17: bar chart. y-axis: number of movements to county; x-axis: counties receiving more than 20 movements; series: full network, depleted to 30%; labelled bars: Clwyd, Cheshire, Shropshire, Staffordshire, Gwynedd.]

FIG. 17: Histogram showing the number of movements beginning in Clwyd and their distribution amongst UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than 20 movements are shown.

[Figure 18: bar chart. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms; series: rebuilt at 10%, rebuilt at 30%, full network; labelled bars: Cheshire, Shropshire, North Yorkshire, Aberdeen, Staffordshire.]

FIG. 18: Histogram showing the number of farms infected for a given county, for epidemics beginning in Clwyd. Shown are results for networks rebuilt from 10% (blue) and 30% (yellow) of the data, and for the full network (black).
Rebuilding from 30% underestimates the overall size and spread of the epidemic, but even at 100% the epidemic sizes are quite small compared with the previous counties, so these results are hard to quantify.

[Figure 19: four panels plotted against movement percentage for methods 1 and 2: final epidemic size; counties infected after 30 days; early growth rate; counties infected after 60 and 90 days.]

FIG. 19: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run in Clwyd. A reference line is included for comparison of final epidemic sizes with the original network.

APPENDIX E: SURREY

[Figure 20: bar chart. y-axis: number of movements to county; x-axis: counties receiving more than 20 movements; series: full network, depleted to 30%; labelled bars: Surrey, Devon, Kent, West Sussex, East Sussex, Norfolk.]

FIG. 20: Histogram showing the number of movements beginning in Surrey and their distribution amongst UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than 20 movements are shown.

[Figure 21: bar chart. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms; series: rebuilt at 10%, rebuilt at 30%, full network; labelled bars: Norfolk, North Yorkshire, Kent, Northamptonshire.]

FIG. 21: Histogram showing the number of farms infected for a given county, for epidemics beginning in Surrey. Shown are results for networks rebuilt from 10% (blue) and 30% (yellow) of the data, and for the full network (black). Rebuilding from 30% does seem to capture the epidemic size and spread but, as with Clwyd, the figures are too low to draw concrete conclusions.
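The movement histograms in these appendices compare, for one origin county, the number of movements to each destination county before and after random deletion. The following is a minimal sketch of that counting step, not the authors' code. The movement records are synthetic and invented for illustration, the helper names `deplete` and `destination_counts` are hypothetical, and applying the "more than 20 movements" threshold to each data set separately is an assumption.

```python
import random
from collections import Counter


def deplete(movements, keep_fraction, rng):
    """Keep a uniformly random subset holding keep_fraction of the movements."""
    keep = round(keep_fraction * len(movements))
    return rng.sample(movements, keep)


def destination_counts(movements, origin, threshold=20):
    """Count movements leaving `origin` per destination county.

    Only counties receiving more than `threshold` movements are returned.
    """
    counts = Counter(dest for src, dest in movements if src == origin)
    return {county: n for county, n in counts.items() if n > threshold}


# Synthetic (origin county, destination county) records; NOT CTS data.
movements = (
    [("Surrey", "Surrey")] * 600
    + [("Surrey", "Kent")] * 200
    + [("Surrey", "Norfolk")] * 100
    + [("Surrey", "Cumbria")] * 10
    + [("Kent", "Surrey")] * 90
)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
depleted = deplete(movements, 0.3, rng)

full_hist = destination_counts(movements, "Surrey")
depleted_hist = destination_counts(depleted, "Surrey")

print("full network:   ", full_hist)
print("depleted to 30%:", depleted_hist)
```

Because deletion is uniform over movements, each destination count in the depleted data is expected to be roughly 30% of its full-network value, so counties with few movements can drop below the display threshold.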
[Figure 22: four panels plotted against movement percentage for methods 1 and 2: final epidemic size; counties infected after 30 days; early growth rate; counties infected after 60 and 90 days.]

FIG. 22: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run in Surrey. A reference line is included for comparison of final epidemic sizes with the original network.
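The equivalence derived in Appendix A can be checked numerically. The sketch below is illustrative only: it iterates the expected-count recursion (A3) with exact rational arithmetic and compares the result with the 1/α rescaling of method one and with the closed form (A6). It assumes the recursion is started at t = 0, in which case the product telescopes to exactly 1/α, whereas (A6) as written differs from this by a factor αÑ/(1 + αÑ), which is negligible when αÑ ≫ 1. The function name and all numbers are hypothetical and are not taken from the CTS data.

```python
from fractions import Fraction


def expected_rebuilt_count(n0, alpha_total, n_added):
    """Iterate N(t+1) = N(t) + N(t) / (alpha_total + t) for t = 0 .. n_added - 1.

    n0          -- movements surviving on the edge i -> j (N_0)
    alpha_total -- movements remaining after deletion (alpha * N-tilde)
    n_added     -- movements added back during the rebuild (n)
    """
    count = Fraction(n0)
    for t in range(n_added):
        count += count / (alpha_total + t)
    return count


# Hypothetical values chosen only to illustrate the derivation.
total_movements = 10_000                      # N-tilde, size of the full network
alpha = Fraction(3, 10)                       # fraction of movements retained
alpha_total = int(alpha * total_movements)    # alpha * N-tilde
n_added = total_movements - alpha_total       # n = N-tilde - alpha * N-tilde
n0 = 12                                       # N_0(i -> j)

method_two = expected_rebuilt_count(n0, alpha_total, n_added)
method_one = n0 / alpha                       # direct rescaling by 1 / alpha
closed_form_a6 = Fraction(n0 * total_movements, 1 + alpha_total)

print("method two (recursion):", float(method_two))
print("method one (N_0 / alpha):", float(method_one))
print("closed form (A6):", float(closed_form_a6))
```

Exact fractions are used so that the agreement between the recursion and the 1/α rescaling is not obscured by floating-point rounding.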