Modelling disease spread on partially observed networks
Peter Dawson,1 Marleen Werkman,1 Ellen Brooks-Pollock,2 and Mike Tildesley1
1 Centre for Complexity Science, University of Warwick
2 Cambridge Infectious Disease Consortium, University of Cambridge
Network models are increasingly used to simulate the spread of diseases that are
transmitted by a contact process. These models are most effective when detailed data are
available. However, in most cases only partial information is available. We represent the
full set of UK inter-farm cattle movements as a directed weighted-static network with farms
as nodes and movements forming edges. Foot-and-mouth style outbreaks are simulated on
the network. Movements are then deleted at random from the data set. The network is rebuilt
using various methods from different percentages of the original data. When epidemics are
simulated on the rebuilt networks we find that, in certain regions, knowledge of 30% of the
true network is sufficient to accurately predict both the structure of the network and the size
and spatial spread of an epidemic. However, this result depends on the network structure
of the regions investigated.
I. INTRODUCTION
Mathematical models of infectious diseases are being increasingly used to inform policy decisions. For the 2001 outbreak of foot-and-mouth disease (FMD) in the UK, scientists were engaged
at an early stage during the outbreak and provided advice to help contain the outbreak [5, 6]. In
the UK, all cattle movements are recorded via the Cattle Tracing System (CTS). These data allow
for the development of detailed mathematical models to simulate the spread of disease through
the livestock system. However, in other countries such data are not available and only partial
knowledge of the network exists at any point in time: whilst we have 100% of movement
data for the UK, the figure for other countries is significantly lower.
Early epidemiological models assumed population-wide mixing [1, 9], but in practice any given
member of a population has only a finite number of contacts. As a result, network models are being
used increasingly to simulate the spread of diseases which are transmitted by contact processes
[7, 11]. The best approach to building these models remains an open question [8, 12]. What is
certain is that modelling disease spread on a network is further complicated because a
full data set is not generally available to construct the network. We examine epidemic behaviour
on a partially observed network. Partially observed in our case means that we know how many
nodes are in the network but we have only a limited knowledge of the edges. In this project we
use data from the 2010 CTS database (provided by the Department for Environment, Food
and Rural Affairs (DEFRA)) in the UK as a basis for our model. The methodology is in three
parts. We begin by analysing the network properties of the cattle farm network and simulating
FMD susceptible-infected-removed (SIR) style outbreaks for different counties in the
UK. Next we look at the effect of deleting movements from the cattle
network: the network properties are analysed and an SIR model is simulated as more
of the movement data is deleted. Finally we attempt to rebuild the networks in a way that
re-establishes the initial network properties and epidemic behaviour.
II. METHODS
A. The Cattle Network
The first step in any network analysis is to construct an appropriate adjacency matrix, A. We
follow the procedure of Vernon and Keeling in creating a directed weighted-static network [12].
Nodes will represent farms and edges cattle movements. We note here that movements to slaughter
are ignored and that markets are not treated as nodes – when a movement goes through a market,
the source and destination farms are listed as the two nodes with an edge between them (so there is
no increased risk of transmission when a movement goes through a market). The adjacency matrix
is constructed from the cattle movement data such that an element $a_{ij}$ is non-zero if cattle were moved
from farm i to farm j in the considered time frame. The weight of the edge is the frequency of
movements from farm i to farm j. For instance, if there were four movements from farm 3 to farm
5 then $a_{35} = 4/365$. Neither the number of cattle moved nor the size of the farms involved was
considered.
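The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function name `build_adjacency`, the sparse dictionary representation and the toy movement list are all assumptions made for the example.

```python
from collections import Counter

DAYS_IN_YEAR = 365  # movements are counted over one year of data

def build_adjacency(movements):
    """Return {(source, destination): weight} from a list of (source, destination) pairs.

    Each recorded movement adds 1/365 to its edge, so the weight is the
    daily frequency of movements along that edge.
    """
    counts = Counter(movements)
    return {edge: n / DAYS_IN_YEAR for edge, n in counts.items()}

# Hypothetical toy data: four movements from farm 3 to farm 5, one from farm 5 to farm 2.
movements = [(3, 5), (3, 5), (3, 5), (3, 5), (5, 2)]
adjacency = build_adjacency(movements)
```

Storing only the non-zero entries keeps the representation small, which matters for a network with many farms and comparatively few edges.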
B. The SIR model
SIR is a simple but effective way of modelling an epidemic. The three states are: susceptible,
where a farm can become infected based on the states of its neighbours; infectious, where a farm passes on
the infection to others and is removed at a given rate; and removed, where a farm is considered to
have completely recovered and become immune to FMD, or all its livestock have died or been culled. In
either case the farm cannot become reinfected or pass on the infection to susceptible farms. The
original model is governed by the set of differential equations
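\[
\begin{aligned}
\frac{dS}{dt} &= bN - \lambda S - dS, \\
\frac{dI}{dt} &= \lambda S - \gamma I - dI, \\
\frac{dR}{dt} &= \gamma I - dR,
\end{aligned}
\tag{1}
\]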
with S, I and R being the number of individuals in the susceptible, infectious and removed classes
respectively in a population of size N. The dynamics are governed by the birth rate b, the natural death rate d and the rate of recovery from infection γ. The force of infection λ is the rate
at which susceptibles become infected; its form should contain information about the
interactions that lead to infection. Our model is run over a sufficiently short time period that we
can ignore birth and death processes. We also take the recovery period to be fixed rather than letting farms recover at a rate.
We use the approach outlined above to simulate a stochastic process on a network. Recasting equation (1) in matrix form, we give the probability of farm j being infected as
\[
\lambda_j = 1 - \exp\Big( -\beta \sum_i a_{ij} I_i \Big), \tag{2}
\]
where $I_i = 1$ if farm i is infected and zero otherwise. The infection strength is characterised
by the parameter β. An infected farm becomes removed T days after being infected. The
parameter values β = 1 (the probability of transmission) and T = 21 (the recovery period)
were set to emulate a foot-and-mouth disease epidemic in the absence of control. Setting the probability
of transmission to one means that if a farm is infected it is assumed that all livestock on
the farm are infected.
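One way to read the model above as a daily update is sketched below. It is an illustrative sketch, not the authors' implementation: the function name `simulate_sir`, the dictionary form of the adjacency matrix and the convention that the seed farm is infected on day 0 are assumptions. The infection probability follows equation (2) and removal happens after a fixed period T.

```python
import math
import random

def simulate_sir(adjacency, seed_farm, beta=1.0, recovery_days=21, n_days=90, rng=None):
    """Simulate one stochastic SIR outbreak on a weighted directed network.

    adjacency maps (i, j) -> weight a_ij. Returns a dict mapping every farm
    that was ever infected to the day on which it became infected.
    """
    rng = rng or random.Random()
    infected_on = {seed_farm: 0}      # day of infection for every farm ever infected
    infectious = {seed_farm}          # farms currently in the I class
    for day in range(1, n_days + 1):
        # Sum a_ij over infectious sources i for each susceptible target j.
        pressure = {}
        for (i, j), weight in adjacency.items():
            if i in infectious and j not in infected_on:
                pressure[j] = pressure.get(j, 0.0) + weight
        # Equation (2): probability that susceptible farm j is infected today.
        for j, total in pressure.items():
            if rng.random() < 1.0 - math.exp(-beta * total):
                infected_on[j] = day
        # Farms leave the I class a fixed number of days after infection.
        infectious = {f for f, d in infected_on.items() if day - d < recovery_days}
    return infected_on
```

Returning infection days rather than a final state makes it possible to derive the final size, the early growth rate and the spatial spread from a single run.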
1. Epidemic Statistics
We will characterise epidemics using a few natural statistics. In this project we are interested
in analysing the early stages of an outbreak, so our models run for ninety time-steps (days).
Whilst it is likely that FMD would be detected earlier than this (and movement bans would be
introduced) in the event of an outbreak, it was decided to run the model for ninety days to allow
for a more detailed comparison of epidemics simulated on the recorded and the reconstructed data.
We say an outbreak has ‘taken-off’ if there are ten or more farms infected or removed at the
end of the simulation. The final size of an outbreak will be the total number of farms in either the
removed or infected class after ninety days. Statistics are averaged over only simulations which
have reached the take-off threshold, with 200 simulations being run for every realisation of the
network.
The locations of all the farms in the network are known allowing us to keep track of where
the infection has reached. The number of counties infected after thirty, sixty and ninety days was
recorded with the number of counties infected being described as the ‘spread’ of the epidemic.
The initial growth rate of the epidemic is also of interest. This was taken as the gradient of the
total number of infected farms as a function of time over the first twenty-one days of the outbreak,
i.e. until recovery takes effect. We consider the effect of seeding the infection in different counties.
In each case the infection begins in the most highly connected farm based on weighted-out-degree
(i.e. the farm with the most movements away from it).
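The statistics defined in this subsection could be computed from a single simulation as in the sketch below. The input format (a dictionary from farm to day of infection) and the use of a least-squares slope for the "gradient" are assumptions; the text does not say how the gradient was fitted.

```python
def epidemic_statistics(infected_on, n_days=90, takeoff_threshold=10, growth_window=21):
    """Summarise one simulated outbreak.

    infected_on maps each farm ever infected to its day of infection
    (day 0 for the seed farm).
    """
    # Final size: farms in the infected or removed class after n_days.
    final_size = sum(1 for day in infected_on.values() if day <= n_days)
    took_off = final_size >= takeoff_threshold

    # Cumulative number of infected farms on days 0..growth_window
    # (no farm has been removed yet within the first recovery period).
    days = list(range(growth_window + 1))
    cumulative = [sum(1 for d in infected_on.values() if d <= day) for day in days]

    # Least-squares slope of cumulative infections against time
    # (an assumed reading of "gradient").
    mean_t = sum(days) / len(days)
    mean_c = sum(cumulative) / len(cumulative)
    slope = (sum((t - mean_t) * (c - mean_c) for t, c in zip(days, cumulative))
             / sum((t - mean_t) ** 2 for t in days))
    return {"final_size": final_size, "took_off": took_off, "growth_rate": slope}
```

Averages over the 200 simulations per network would then be taken only over runs with `took_off` set to true.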
The counties investigated were: Cumbria, Devon, Aberdeenshire, Surrey and Clwyd. Over
60% of movements originating at a farm in Cumbria are to other counties and therefore we expect
that infections seeded in Cumbria would spread to several other counties. FIG 1 shows a histogram
plot of the counties which receive movements from Cumbria – the majority are to its neighbouring
counties in the North of England. Similar histogram plots for the other counties can be seen in
the appendix. FIG 1 also compares movements from the full data set with what remains after the
network has been depleted to 30% of its movements, showing some of the effects of deletion on
the network.
Aberdeenshire has a large farming community but appears to be quite isolated: 70% of movements are within-county, and therefore we expect that infections starting in Aberdeenshire should
stay in Scotland. Devon has similarities with Aberdeenshire in that 60% of movements are within-county; however, it does have strong connections with the surrounding counties of Cornwall and
Somerset. The movement data also suggested that outbreaks in Surrey and Clwyd would be relatively short-lived and would not infect many other counties.
In the main body of the paper we focus on epidemics seeded in Cumbria; however the results
for the remaining counties are presented in the appendices and discussed in section IV.
C. Network Properties
After constructing the adjacency matrix we can calculate some network properties [10]. We
expect these properties to reveal information that would influence epidemic spread. If the network
is highly connected, epidemics will spread more quickly and further from the source farm.
[FIG. 1 plot: number of movements to county, for counties receiving more than 20 movements; series: full network and depleted to 30%; labelled counties: Cumbria, North Yorkshire, Durham, Lancashire, Northumberland, Aberdeen.]
FIG. 1: Histogram showing the number of movements beginning in Cumbria and their distribution amongst
UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than
20 movements are shown. 60% of Cumbria’s cattle movements are external and these are mainly distributed
amongst other counties in the North of England as well as Aberdeenshire.
1. Degree Distribution
The degree of a node is defined as the number of edges connected to that node. However, given
that our networks are directed, A is not necessarily symmetric, so we define the in-degree $k_{\mathrm{in}}(i)$
as the number of edges pointing to node i, whilst the out-degree $k_{\mathrm{out}}(i)$ is the number of edges
pointing away from node i. As the cattle network is not only directed but also weighted we can
define weighted in- and out-degrees, known as degree strengths. These are quickly calculated using
the adjacency matrix. The sum of the elements of row i is the out-strength of node i, $s_{\mathrm{out}}(i) = \sum_j A_{ij}$,
and the sum of the elements of column i is the in-strength of node i, $s_{\mathrm{in}}(i) = \sum_j A_{ji}$. We
can also analyse the distribution of the weights. These distributions are plotted in FIG 2 and FIG 3.
All the distributions shown here appear to exhibit power-law behaviour and therefore we suggest
that the networks are scale-free and could show similarities with preferential attachment networks
[2].
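The degree and strength definitions above translate directly into row and column sums. The sketch below assumes the adjacency matrix is stored sparsely as a dictionary `{(i, j): weight}`; the function name is hypothetical.

```python
from collections import defaultdict

def degrees_and_strengths(adjacency):
    """Return (k_in, k_out, s_in, s_out) dictionaries for a weighted directed network."""
    k_in, k_out = defaultdict(int), defaultdict(int)
    s_in, s_out = defaultdict(float), defaultdict(float)
    for (i, j), weight in adjacency.items():
        if weight == 0:
            continue              # a zero entry is not an edge
        k_out[i] += 1             # one more edge pointing away from i
        k_in[j] += 1              # one more edge pointing to j
        s_out[i] += weight        # row sum of A
        s_in[j] += weight         # column sum of A
    return k_in, k_out, s_in, s_out
```

The farm with the largest `s_out` value is the one used to seed infections in the epidemic simulations.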
The study of degree distributions can give good indications as to how the network forms and
therefore how it may be reconstructed. Various studies [3, 4] have also shown
that graphs formed under preferential attachment are more robust than those formed randomly; that
is, a preferential attachment graph will not fragment as quickly as a random graph under random
deletion of edges or nodes.
Clustering
The global clustering coefficient measures the average probability that two neighbours of a
vertex are themselves neighbours. It can be viewed as the ratio of the number of triangles to the number of
connected triples in the network. We first set all non-zero elements in A to one, ignoring weights. $a^{(p)}_{ij}$
is an element of $A^p$, where the elements of $A^p$ count paths of length p:
\[
a^{(p)}_{ij} =
\begin{cases}
n, & \text{$i$ can reach $j$ in $n$ ways by paths of length $p$,} \\
0, & \text{there are no paths of length $p$ between $i$ and $j$.}
\end{cases}
\]
For global clustering we want the number of triangles (closed paths of length three) divided by the
number of connected triples,
\[
C_G = \frac{\mathrm{Tr}(A^3)}{\|A^2\| - \mathrm{Tr}(A^2)},
\]
where $\| \cdot \| = \sum_{ij} (\cdot)$.
[FIG. 2 plots: four log-log panels of number of nodes against in degree, out degree, in-strength and out-strength.]
FIG. 2: Vertex degree and strength distributions for incoming and outgoing movements for the full network.
These have all been plotted on log-log scales. The form of the distributions suggests a possible scale-free
network.
[FIG. 3 plots: node in-strength against in degree, node out-strength against out degree, and number of edges against edge weight (log-log scale).]
FIG. 3: The first two plots show vertex strength against degree. These plots are linear, suggesting that farms
which are connected to a lot of farms also use these connections multiple times. The third plot shows the
edge weight distribution, once again looking very power-law like.
[FIG. 4 plots: local clustering coefficient, giant component size and number of connected components, each against movement percentage.]
FIG. 4: Network statistics for the cattle network shown as a function of the movement percentage. Local
clustering (left) and giant component size (middle) decay as movements are removed, while the number
of connected components (right) increases.
We can also define the local clustering coefficient for vertex i, $C_i$, which measures the ratio of
the number of pairs of neighbours of i that are connected to the number of pairs of neighbours of i.
A clustering coefficient for the entire network can then be calculated as the mean of the clustering
coefficients for each vertex,
\[
C_L = \frac{1}{n} \sum_{i=1}^{n} C_i.
\]
These alternate definitions of clustering yield different results – in this project we will generally
use the second definition (for networks of our size it is easier to compute). We should also note
here that when we compute clustering we ignore direction of links and therefore symmetrise the
adjacency matrix and set all weights to one.
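Both clustering definitions can be checked on a small example with dense matrix products, as in the NumPy sketch below. This is only practical for small networks; for a network of the size considered here a sparse or neighbour-set approach would be needed. Assigning $C_i = 0$ to vertices with fewer than two neighbours is an assumed convention that the text does not state.

```python
import numpy as np

def clustering_coefficients(adjacency_matrix):
    """Return (C_G, C_L) for a square adjacency matrix given as an array."""
    A = np.asarray(adjacency_matrix)
    # Ignore direction and weight: symmetrise and set non-zero entries to one.
    U = ((A != 0) | (A.T != 0)).astype(np.int64)
    np.fill_diagonal(U, 0)                      # self-loops play no part in clustering

    U2 = U @ U
    U3 = U2 @ U
    triples = U2.sum() - np.trace(U2)           # ||A^2|| - Tr(A^2)
    c_global = np.trace(U3) / triples if triples > 0 else 0.0

    degree = U.sum(axis=1)
    pairs = degree * (degree - 1)               # ordered pairs of neighbours of each vertex
    closed = np.diag(U3)                        # ordered pairs of neighbours that are connected
    # Vertices with fewer than two neighbours get C_i = 0 (assumed convention).
    c_i = np.divide(closed, pairs, out=np.zeros(len(U)), where=pairs > 0)
    return float(c_global), float(c_i.mean())
```

For a triangle with one pendant vertex attached, the two definitions give 0.6 and 7/12 respectively, which illustrates that they generally differ.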
2. Connected Components
A connected component of a graph is a sub-graph in which all nodes are connected to each other by some continuous path along the edges of the sub-graph. In the context of
directed networks there exist strongly and weakly connected components. A weakly connected
component does not take into account edge direction, whereas a strongly connected component
does. For two nodes to be in the same strongly connected component, a path must exist from node
i to node j and a path must exist from node j to node i by following the directed edges. We focus
on strongly connected components only.
The giant connected component is the largest connected component in a graph and gives an
upper bound on the number of farms that could become infected.
The number of components is of little interest for the initial network, since the giant component of
the cattle network contains 95% of farms. As edges are deleted, however, the speed at which the network
fragments gives an indication of how well connected the network is and of how many movements would need to be cancelled in order to prevent a disease outbreak. How badly the network
fragments will also serve as an indicator as to whether or not the network can be rebuilt.
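Strongly connected components can be found with any standard algorithm. The sketch below uses an iterative form of Kosaraju's algorithm in plain Python; the choice of algorithm and the function name are assumptions, since the text does not say how components were computed.

```python
from collections import defaultdict

def strongly_connected_components(nodes, edges):
    """Return a list of strongly connected components (sets of nodes).

    Iterative Kosaraju algorithm; edges is an iterable of (i, j) pairs.
    """
    forward, backward = defaultdict(list), defaultdict(list)
    for i, j in edges:
        forward[i].append(j)
        backward[j].append(i)

    # Pass 1: record nodes in order of depth-first finishing time.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                order.append(node)   # all neighbours explored
                stack.pop()

    # Pass 2: explore the reversed graph in decreasing finishing time.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        component, stack = {start}, [start]
        assigned.add(start)
        while stack:
            node = stack.pop()
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components
```

The giant component size is then `max(len(c) for c in components)` and the number of components is `len(components)`; tracking both as movements are deleted gives the fragmentation curves discussed above.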
D. Effect of Deletion
Movements were deleted at random from the adjacency matrix. Deleting a movement
means reducing a non-zero element of the adjacency matrix by 1/365. The new adjacency matrices
formed by deletion were saved each time they had lost a further one percent of the original number
of movements, until only 10% of the original movements remained. This process was carried out ten times,
with the results below averaged over these different realisations of the network (FIG 4). The results
for epidemics seeded in Cumbria and Devon are seen in FIG 5. The size and spread of epidemics
fall quickly as movements are deleted. Though these results are averaged, there was practically no
variance between the individual deletions of the network and the mean result. Epidemic statistics
for the other counties followed similar trends and so are not shown in this paper. We refer the
reader once again to the histogram in FIG 1 to see the effect of deletion on cattle movements in
Cumbria. We also note here that, unless otherwise stated, we have run SIR simulations on the
networks at 5% increments of the data from 10% up to the full network.
[FIG. 5 plots: final epidemic size, counties infected, early growth rate and take-off percentage, each against movement percentage, for Cumbria and Devon.]
FIG. 5: The effect of deleting movements for epidemics seeded in Cumbria and Devon. There is a sharp
drop-off in epidemic size and spread as soon as any movements are deleted. By the 10% mark almost no
simulations are reaching the take-off threshold.
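A minimal sketch of the deletion step is given below. It assumes that every recorded movement is equally likely to be deleted, which is one reading of "deleted at random"; the function name and the list-of-pairs input are hypothetical.

```python
import random
from collections import Counter

def deplete_movements(movements, keep_fraction, rng=None):
    """Delete movements uniformly at random until keep_fraction of them remain.

    movements is a list of (source, destination) pairs, one per recorded movement.
    Returns the depleted adjacency as {(source, destination): weight}.
    """
    rng = rng or random.Random()
    n_keep = round(keep_fraction * len(movements))
    kept = rng.sample(movements, n_keep)      # sampling without replacement
    # Each surviving movement contributes 1/365 to its edge; edges whose
    # movements were all deleted simply disappear from the dictionary.
    return {edge: n / 365 for edge, n in Counter(kept).items()}
```

Because low-weight edges are represented by few movements, they are the first to vanish entirely, which is why long-range links are lost quickly as the data are depleted.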
III. RESULTS
A. Rebuilding the Network
When rebuilding, only one realisation of the depleted data was chosen at random to be rebuilt.
This is because in reality only one instance of the data is available. As there is very little
variance between the network and epidemic statistics for individual deletions, this choice is justified. Various methods of differing complexity can be used to rebuild the network. The effect of
rebuilding from different percentages was examined. Adding a movement from farm i to farm j
means increasing the corresponding element of the adjacency matrix by 1/365.
The first method was to rescale the weights of all the edges in the network we are rebuilding
from, $A^{\mathrm{depleted}}$, to create a rebuilt network, $A^{\mathrm{rebuilt}}$, such that $\sum_{ij} a^{\mathrm{rebuilt}}_{ij} = \sum_{ij} a^{\mathrm{original}}_{ij}$. For example,
if we are rebuilding from 10% then $A^{\mathrm{rebuilt}} = 10 \times A^{\mathrm{depleted}}$. It was expected that when epidemics
were run through this model we would overestimate the size of the epidemic for the higher rebuilding
percentages: although fewer farms are exposed to infection (there are fewer edges), those
that have an infected contact are more likely to become infected, as the strength of that edge is greater
than it would have been in the original network.
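The first method amounts to multiplying every weight by a single factor. A short sketch, with a hypothetical function name and the dictionary representation assumed throughout these examples:

```python
def rebuild_by_rescaling(depleted, original_total):
    """Rescale every edge weight so that the weights sum to original_total."""
    depleted_total = sum(depleted.values())
    if depleted_total == 0:
        raise ValueError("cannot rescale a network with no movements")
    factor = original_total / depleted_total
    return {edge: weight * factor for edge, weight in depleted.items()}
```

Relative edge weights are unchanged, and no edge that was lost under deletion can reappear.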
Next we considered adding one new movement every time step. The new movement would add
to an existing edge based on the current weight of the edges. The probability of a movement being
added from farm i to farm j at time t was
\[
p(i \to j, t) = \frac{a^{\mathrm{rebuilt}}_{ij}(t)}{\sum_{ij} a^{\mathrm{rebuilt}}_{ij}(t)}. \tag{3}
\]
The second method does not allow new edges to form in the network. The third method
employed was to give a probability of forming a new edge. A new movement would be added to
the data set randomly with probability π, and would be added to an existing edge with probability
1 − π, in the same manner as the second method. This method has some obvious flaws: the new
edges are created purely at random, and no spatial data or network structure is used to suggest what
might be an appropriate new edge. However, with the right choice of π the number of edges (as
opposed to the number of movements) present in the original network could be reached, which should
capture more long-range movements.
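The second and third methods can be expressed as one routine with π as a parameter, as in the sketch below. Working with integer movement counts rather than weights, drawing random edges uniformly over ordered pairs of distinct farms, and the function name are all assumptions made for illustration.

```python
import random

def rebuild_by_addition(counts, n_add, nodes=None, pi=0.0, rng=None):
    """Add n_add movements one at a time to a depleted network.

    counts maps (source, destination) -> number of movements. With probability
    pi a movement is placed on a uniformly random ordered pair of distinct
    nodes (method 3); otherwise it reinforces an existing edge chosen with
    probability proportional to its current count (method 2, equation (3)).
    """
    rng = rng or random.Random()
    counts = dict(counts)                    # leave the caller's data untouched
    nodes = list(nodes) if nodes is not None else []
    for _ in range(n_add):
        if pi > 0 and rng.random() < pi:
            i, j = rng.sample(nodes, 2)      # assumed: no self-loops
            edge = (i, j)
        else:
            edges = list(counts)
            weights = [counts[e] for e in edges]
            edge = rng.choices(edges, weights=weights, k=1)[0]
        counts[edge] = counts.get(edge, 0) + 1
    return counts
```

With `pi=0.0` this reduces to the second method. Rebuilding the edge list on every step keeps the sketch short but is slow for large networks; a cumulative-weight structure would be a natural replacement.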
As methods 1 and 2 do not add any new connections the network statistics mentioned previously
will be the same as those calculated when deleting movements from the network. The weighted
degree distributions do change somewhat, but the results are not interesting.
B. Comparisons Between Rebuilding Methods
The first and second methods yielded similar results. On further reflection this is not surprising:
a relation is derived in appendix A showing the two methods to be equivalent. The small
discrepancy between the results is expected, as networks were only rebuilt using the second
method ten times. Despite the simplicity of the methods, some interesting results have
been obtained. With only 30% of the network, the values for final epidemic size and number of
counties infected approach the results for epidemics carried out on the full network, for epidemics
seeded in Cumbria (FIG 6).
A histogram plot of the number of farms infected per county (FIG 8) shows that the results at
30% and for the full network are close at a local level. This is further illustrated in FIG 9.
[FIG. 6 plots: final epidemic size, early growth rate, and counties infected after 30, 60 and 90 days, each against movement percentage, for rebuilding methods 1 and 2.]
FIG. 6: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run
in Cumbria. A reference line is included for comparison of final epidemic sizes with the original network.
Non-parametric percentile bootstrapping was used to give 95% confidence intervals. These are only shown
for the first method.
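A percentile bootstrap of the kind mentioned in the caption can be sketched as follows. The statistic being bootstrapped (here the mean of a list of simulation outputs, such as final epidemic sizes) and the number of resamples are assumptions; the text states only that non-parametric percentile bootstrapping was used for 95% confidence intervals.

```python
import random

def percentile_bootstrap_ci(values, n_resamples=2000, level=0.95, rng=None):
    """Percentile bootstrap confidence interval for the mean of values."""
    rng = rng or random.Random()
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n   # resample with replacement
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper
```

The interval is read directly from the empirical distribution of resampled means, so no distributional assumption about epidemic sizes is needed.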
The results for Devon are less satisfying. As can be seen in the appended figures, good estimates
for the final epidemic size cannot be obtained with less than 60% of the full data. The number
of counties infected does become comparable with the original data for rebuilds from 30%.
Aberdeenshire gives very favourable results and can be rebuilt from as little as 20% of the
original data.
Take-off rates are not shown here for the rebuilding methods as almost all simulations reached
the designated threshold for consideration.
The predictive ability of the third rebuilding method was significantly worse than that of the other
methods (FIG 7). Generally adding random links did not help to capture the long-range movements
and prevented the original edges from gaining strength. Results shown are for rebuilding Cumbria
and Devon from 10%, 30%, 50% and 80%, with values of π being 0.1, 0.2, 0.4, 0.5, 0.7 and 0.9.
The value π = 0 represents the results from the second rebuilding method.
[FIG. 7 plots: final epidemic size against probability of random movement for Cumbria and Devon, in four panels for rebuilds from 10%, 30%, 50% and 80%.]
FIG. 7: The effect of rebuilding the network from different movement percentages using the third method
described in the text, shown for epidemics seeded in Cumbria and Devon. The final epidemic size generally
decreases as the chance of creating random movements π is increased. The only exception appears
when rebuilding Devon from 50% with π = 0.1; however, even then the epidemic size is well below 300,
which is the size for the full network in Devon.
[FIG. 8 plot: number of farms infected by county for networks rebuilt at 10%, rebuilt at 30% and the full network; labelled counties: North Yorkshire, Cumbria, Aberdeen, Durham, Dumfriesshire, Northumberland, Lancashire.]
FIG. 8: Histogram plot showing the number of farms infected for a given county, for epidemics beginning
in Cumbria. Shown are results when rebuilding the network from 10% (blue), 30% (yellow) and the full
network (black). Rebuilding from 30% captures quite well the overall size and spread of the disease but for
the most part overestimates the size of the epidemic in counties close to and including the source county
while underestimating the size for counties further away.
IV. DISCUSSION
The most striking result of this project is that for some counties only partial knowledge of the
contact network is necessary to make epidemiological predictions on the size and spread of an
outbreak. The obvious question is why this is not the case for all counties. The cause must
surely lie with the local network structure in these counties. Looking first at the percentage of
total movements from a county that were not within-county did not draw good comparisons, as
Aberdeenshire and Devon, two very insular counties, had very different results upon rebuilding.
One hypothesis would be that not only is Aberdeenshire very insular, it is also highly connected,
and as such its edge structure is robust against deletion. In contrast, Devon may not be as highly
connected.
FIG. 9: Heat maps showing the number of farms infected in different counties in the UK for epidemics
seeded in Cumbria for a) the full network and b) the network rebuilt from 30%. The size and spread of the
disease is almost perfectly captured when the network is rebuilt from 30%.
To check this we can calculate the global and local clustering coefficients for the sub-networks
made up only of Aberdeenshire farms and Devon farms. Plotting both types of clustering as well
as component statistics as we delete edges, we see there are quantitative differences between the
two counties (FIG 10). Looking at the components of the two sub-networks, we
see that the giant component size in Devon is initially much larger than in Aberdeenshire, giving
rise to larger epidemics for the full data set, but the Devon sub-network fragments into many
more connected components than the Aberdeenshire sub-network, with the giant component size
decaying at a much higher rate. This may imply that clusters in Devon are connected by several
low-weighted links. Once these links are broken, the network fragments and the methods of rewiring
that we have described here would be unable to recreate that network structure.
Other avenues that were explored here included degree distribution and weight distribution
within the counties. However no obvious differences were observed within these. No doubt a more
thorough analysis of network properties would yield further differences between the two counties,
giving a greater indication as to the circumstances under which one could rebuild networks in such
a way as to make accurate epidemic predictions. More complicated rebuilding strategies are easily
envisaged where one would form new edges based on the locations of existing connections.
The counties of Surrey and Clwyd were also investigated. Surrey had quite small epidemic
sizes; the figures in the appendix do suggest it can be rebuilt from partial knowledge, but the numbers
are too low to draw definite conclusions. Clwyd has a larger farming community than Surrey and
it would seem reasonable to suggest, based upon the appended figures, that rebuilding from 35%
effectively reproduced the size and spread of epidemics seeded in Clwyd.
[FIG. 10 plots: local clustering, global clustering, giant component size and number of connected components, each against movement percentage, for Aberdeen and Devon.]
FIG. 10: Network statistics for Aberdeenshire (blue line) and Devon (green dashed line). The plots on the
left show local (upper-left) and global (lower) clustering coefficients at a given percentage of movement
data. The values for both measures are consistently twice as high for Aberdeenshire as they are for Devon.
The plots on the right show the size of the giant component (upper-right) and the number of connected
components (lower-right) at a given percentage of movement data.
V. CONCLUSIONS AND FUTURE WORK
This work shows that partial knowledge of a contact network can be sufficient to make good
epidemiological predictions, indicating that models such as these could be used in regions where
precise demographic data are not available. Our best results came for Cumbria, which can
be rebuilt from 30%. This is significant as Cumbria was one of the counties most affected by the
2001 FMD outbreak in the UK.
We have also shown how to identify regions that need more complex rebuilding methods, by
looking to see if the region fragments rapidly under edge deletion. FIG 7 shows that adding
random links did not work in general, but exploring this idea more closely may help to rebuild regions
such as Devon.
Taking the project further, we would like to test our hypothesis as to why some counties are not
so easily rebuilt, with more analysis of the network structure. We would also like to develop
more complex methods for rebuilding the network by edge creation, taking into account
existing movement trends and spatial data.
VI. ACKNOWLEDGEMENTS
The authors are grateful to EPSRC for providing funding, DEFRA for providing the CTS data
and UKBORDERS for providing the shape files used in FIG 9.
[1] N. T. J. Bailey. The mathematical theory of epidemics. Griffin, 1957.
[2] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.
[3] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Network robustness and fragility:
percolation on random graphs. Phys. Rev. Lett., 85:5468–5471, 2000.
[4] R. Cohen, K. Erez, D. Ben-Avraham, and S. Havlin. Resilience of the Internet to random breakdowns.
Phys. Rev. Lett., 85:4626–4628, 2000.
[5] M. J. Keeling et al. Dynamics of the 2001 UK foot and mouth epidemic: stochastic dispersal in a
heterogeneous landscape. Science, 294:813–817, 2001.
[6] N. M. Ferguson, C. A. Donnelly, and R. M. Anderson. Transmission intensity and impact of control
policies on the foot and mouth epidemic in Great Britain. Nature, 413:542–548, 2001.
[7] M. J. Keeling and K. T. D. Eames. Monogamous networks and the spread of sexually transmitted
diseases. Math Biosci, 189:115–130, 2004.
[8] M. J. Keeling and K. T. D. Eames. Networks and epidemic models. J. R. Soc. Interface, 2:295–307,
2005.
[9] W. O. Kermack and A. G. McKendrick. A contribution to the mathematical theory of epidemics. Proc.
R. Soc. A, 115:700–721, 1927.
[10] M. E. J. Newman. Networks. Oxford, 2010.
[11] J. M. Read and R. M. Christley. Disease evolution on networks: the role of contact structure. Proc. R.
Soc. B, 270:699–708, 2003.
[12] M. C. Vernon and M. J. Keeling. Representing the UK's cattle herd as static and dynamic
networks. Proc. R. Soc. B, 276:469–476, 2009.
APPENDIX A: DERIVATION OF THE RELATIONSHIP BETWEEN REBUILD METHODS ONE
AND TWO
The probability of a movement being added to an existing edge at time t was defined as
\[
p(i \to j, t) = \frac{a^{\mathrm{rebuilt}}_{ij}(t)}{\sum_{ij} a^{\mathrm{rebuilt}}_{ij}(t)}. \tag{A1}
\]
A difference equation can then be derived for the expected number of movements from node i to j at
time t + 1, where the network begins with $\alpha \tilde{N}$ movements and a single movement is added at each
time-step, such that
\[
N(i \to j, t+1) = N(i \to j, t) + p(i \to j, t). \tag{A2}
\]
We start with some fraction α of the original movements $\tilde{N}$ and we add n movements such that
$\tilde{N} = \alpha \tilde{N} + n$. After t additions the network holds $\alpha \tilde{N} + t$ movements, so
\[
N(i \to j, t+1) = N(i \to j, t) + \frac{N(i \to j, t)}{\alpha \tilde{N} + t}. \tag{A3}
\]
Using the initial condition $N(i \to j, 0) = N_0(i \to j)$, this can be solved explicitly:
\begin{align}
N(i \to j, n) &= N_0(i \to j) \prod_{t=1}^{n-1} \left( 1 + \frac{1}{\alpha \tilde{N} + t} \right) \tag{A4} \\
&= N_0(i \to j) \, \frac{n + \alpha \tilde{N}}{1 + \alpha \tilde{N}} \tag{A5} \\
&= N_0(i \to j) \, \frac{\tilde{N}}{1 + \alpha \tilde{N}}, \tag{A6}
\end{align}
and assuming $\alpha \tilde{N} \gg 1$, which it is, we obtain the result
\[
N_n(i \to j) = \frac{1}{\alpha} N_0(i \to j), \tag{A7}
\]
which is exactly equivalent to the first method of rebuilding.
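The limiting result can be checked numerically by iterating the difference equation directly. The sketch below is an illustration under stated assumptions: the function name is hypothetical, and the iteration starts at t = 0, whereas the product in (A4) starts at t = 1; the two differ by a factor that is negligible when the retained number of movements is large.

```python
def expected_count_after_rebuild(n0, alpha, n_total):
    """Iterate the difference equation (A3) for one edge.

    n0 is the edge's movement count in the depleted network, alpha the
    fraction of movements retained and n_total the original number of movements.
    """
    start = round(alpha * n_total)        # movements present before rebuilding
    count = float(n0)
    for t in range(n_total - start):      # add movements until the total is restored
        count += count / (start + t)
    return count
```

For an edge with 3 movements in a network depleted to 30% of 100000 movements, the iteration returns a value very close to 3/0.3 = 10, matching the rescaling used by the first method.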
APPENDIX B: DEVON
[FIG. 11 plot: number of movements to county, for counties receiving more than 20 movements; series: full network and depleted to 30%; labelled counties: Devon, Somerset, Cornwall.]
FIG. 11: Histogram showing the number of movements beginning in Devon and their distribution amongst
UK counties using the full movement data and the data depleted to 30%. Only counties receiving more
than 20 movements are shown. 60% of Devon’s cattle movements are internal, whilst the others are mainly
distributed amongst the neighbouring counties of Somerset and Cornwall.
[Plot omitted. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms. Series: rebuilt at 10%, rebuilt at 30%, full network. Labelled bars: Devon, Somerset, North Yorkshire, Cornwall, Shropshire, Cumbria.]
FIG. 12: Histogram plot showing the number of farms infected for a given county, for epidemics beginning
in Devon. Shown are results when rebuilding the network from 10% (blue), 30% (yellow) and the full
network (black). Rebuilds from 30% do capture the counties that get infected but underestimate the epidemic
size in all counties, even in Devon.
[Four-panel plot omitted. Each panel is plotted against movement percentage for methods 1 and 2: final epidemic size; early growth rate; counties infected after 30 days; counties infected after 60 and 90 days.]
FIG. 13: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run
in Devon. A reference line is included for comparison of final epidemic sizes with the original network.
Rebuilds do not begin to approach the original values for epidemic size until around 60%, but they do begin to
capture the spread of disease earlier, at around 30%.
APPENDIX C: ABERDEENSHIRE
[Plot omitted. y-axis: number of movements to county (×10^4); x-axis: counties receiving more than 20 movements. Series: full network, depleted to 30%. Labelled bar: Aberdeen.]
FIG. 14: Histogram showing the number of movements beginning in Aberdeenshire and their distribution
amongst UK counties using the full movement data and the data depleted to 30%. Only counties receiving
more than 20 movements are shown. 70% of Aberdeenshire’s cattle movements are internal, whilst the others are
distributed around Scotland and the North of England.
[Plot omitted. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms. Series: rebuilt at 10%, rebuilt at 30%, full network. Labelled bars: Aberdeen, Dumfriesshire, North Yorkshire.]
FIG. 15: Histogram plot showing the number of farms infected for a given county, for epidemics beginning
in Aberdeenshire. Shown are results when rebuilding the network from 10% (blue), 30% (yellow) and the
full network (black). Rebuilding from 30% recaptures the epidemic size and spread well, giving slight
overestimations in some cases and a slight underestimation in Aberdeenshire itself.
[Four-panel plot omitted. Each panel is plotted against movement percentage for methods 1 and 2: final epidemic size; early growth rate; counties infected after 30 days; counties infected after 60 and 90 days.]
FIG. 16: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics
run in Aberdeenshire. A reference line is included for comparison of final epidemic sizes with the original
network. For rebuilds from 20% onwards, the epidemic size and spread compare well with the
full network.
APPENDIX D: CLWYD
[Plot omitted. y-axis: number of movements to county; x-axis: counties receiving more than 20 movements. Series: full network, depleted to 30%. Labelled bars: Clwyd, Cheshire, Shropshire, Staffordshire, Gwynedd.]
FIG. 17: Histogram showing the number of movements beginning in Clwyd and their distribution amongst
UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than
20 movements are shown.
[Plot omitted. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms. Series: rebuilt at 10%, rebuilt at 30%, full network. Labelled bars: Cheshire, Shropshire, North Yorkshire, Aberdeen, Staffordshire.]
FIG. 18: Histogram plot showing the number of farms infected for a given county, for epidemics beginning
in Clwyd. Shown are results when rebuilding the network from 10% (blue), 30% (yellow) and the full
network (black). Rebuilding from 30% gives an overall underestimation of the size and spread of the
epidemic, but it should be noted that even at 100% the epidemic sizes are quite small compared to the
previous counties, so it is hard to quantify these results.
[Four-panel plot omitted. Each panel is plotted against movement percentage for methods 1 and 2: final epidemic size; early growth rate; counties infected after 30 days; counties infected after 60 and 90 days.]
FIG. 19: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run
in Clwyd. A reference line is included for comparison of final epidemic sizes with the original network.
APPENDIX E: SURREY
[Plot omitted. y-axis: number of movements to county; x-axis: counties receiving more than 20 movements. Series: full network, depleted to 30%. Labelled bars: Surrey, Devon, Kent, West Sussex, East Sussex, Norfolk.]
FIG. 20: Histogram showing the number of movements beginning in Surrey and their distribution amongst
UK counties using the full movement data and the data depleted to 30%. Only counties receiving more than
20 movements are shown.
[Plot omitted. y-axis: number of farms infected; x-axis: counties with more than 2 infected farms. Series: rebuilt at 10%, rebuilt at 30%, full network. Labelled bars: Norfolk, North Yorkshire, Kent, Northamptonshire.]
FIG. 21: Histogram plot showing the number of farms infected for a given county, for epidemics beginning
in Surrey. Shown are results when rebuilding the network from 10% (blue), 30% (yellow) and the full
network (black). Rebuilding from 30% does seem to capture the epidemic size and spread but, as with
Clwyd, the figures are too low to draw concrete conclusions.
[Four-panel plot omitted. Each panel is plotted against movement percentage for methods 1 and 2: final epidemic size; early growth rate; counties infected after 30 days; counties infected after 60 and 90 days.]
FIG. 22: The effect of rebuilding the network using methods 1 and 2 as described in text for epidemics run
in Surrey. A reference line is included for comparison of final epidemic sizes with the original network.