Detecting Disease Outbreaks using Markov Random Fields

CS 15-780 Final Project
Jon Derrberry (jonderry)
Matt Streeter (matts)
Summary: We address the problem of detecting disease outbreaks in real-time based on
available data. Our approach involves a type of intelligent smoothing of noisy
observations based on Markov random fields. The advantage of Markov random fields
over simpler methods such as kernel smoothing is confirmed experimentally.
1. Introduction
Our project addresses the problem of real-time detection of disease outbreaks
based on available data, for example the records of emergency departments. At
any time it is almost certain that we will see symptoms of a given disease
somewhere, and if we look at enough data we will probably find a geographical
region in which those symptoms appear, by chance, at an above-normal rate.
What we would like to do is identify, based on a time series of data running up
to the present moment, any geographical region where the data is such that the
symptoms could not reasonably have occurred by chance. Our approach to this
problem is based on Markov random fields.
Our problem, goals, and motivations are broadly the same as those of Neill
and Moore (2003). One difference is that we focus exclusively on classification
accuracy, without rigorously assessing the statistical significance of each
prediction. This is a significant drawback, because statistical significance is very
important to an epidemiologist. On the other hand, our problem formulation is a
bit more general than that of Neill and Moore, and the answers produced by our
algorithms are less constrained.
The following section provides a formal statement of the problem we address.
Section 3 describes Markov random fields, which are central to our approach.
Section 4 presents the algorithms we use to generate random instances of the
problem for testing. Section 5 describes our MRF algorithm, as well as five
simpler algorithms we compare against. Section 6 describes a Java application we
created to perform experiments and visualize the results. Section 7 is the
experimental comparison, and section 8 is the conclusion.
2. Problem statement
The problem we consider involves labeling the vertices of a graph. Each vertex
in the graph represents a locale (e.g., a town), and an edge between vertices u and
v represents the ability of an infectious disease to spread (rapidly) from u to v.
The number of people living at each locale (the population) is known, as is the
number of people showing symptoms of a certain disease (the disease count) for
time t=0 up until the present. Our task is to determine, based on the time series
of disease counts and the populations, whether something out of the ordinary is
happening. Specifically, we will be looking for a contiguous set of vertices
whose disease counts are unlikely to have occurred by chance. The output of our
algorithm will be a labeling of each vertex as either “infected” or “not infected”.
This problem formulation is a generalization of the one studied by Neill and
Moore (2003) in that we consider arbitrary graphs (not just grids), in that any
contiguous subset of the vertices may be infected (rather than only rectangular
regions), and in that we present our algorithms with a time series of disease
counts, rather than just the disease counts at the current time. In regard to this
last point, however, it should be noted that the algorithms we will study make
use of disease counts prior to the current time in only a very limited way.
Note that in practice the “disease counts” do not need to be actual counts of
people who have a certain disease. Instead they can be indirect indicators of
disease, such as sales of cough syrup, and the techniques developed here will
still apply. Nonetheless we will refer to “disease counts” as if they were actual
counts of people having some disease.
Table 1 shows the various elements that comprise an instance of this problem.
Table 1. Elements that comprise an instance of the outbreak-detection problem.

Element                Meaning
G = <V, E>             Graph of locales. V is a set of locales and E is a set of edges
                       connecting pairs of locales between which disease can be
                       transmitted rapidly.
population(v)          The population of locale v ∈ V.
disease-count(v, t)    The disease count of locale v ∈ V at time t ∈ {0, 1, ..., t_current}.
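To make this data layout concrete, here is a minimal sketch of how one problem instance might be stored in code; the class and field names are ours, not part of the problem statement:

```java
import java.util.List;
import java.util.Set;

/** Minimal sketch of one problem instance (names are ours, not the report's). */
public class OutbreakInstance {
    List<Set<Integer>> adjacency;  // adjacency.get(v) = neighbors of locale v in G
    double[] population;           // population(v) for each locale v
    int[][] diseaseCount;          // diseaseCount[t][v] for t = 0, 1, ..., t_current
}
```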
3. Markov random fields
In a (first-order) Markov chain, the future of the system depends on the past
only by way of the most recent state. Letting Q = {Q_0, Q_1, ..., Q_N} be a set of
random variables that take on values in a state space S, and letting q = {q_0, q_1, ..., q_N}
be their realizations, this is expressed by the equation:
Pr[q_{n+1} | q_0, q_1, ..., q_n] = Pr[q_{n+1} | q_n].
Markov random fields are a generalization of Markov chains which allows q_{n+1}
to be conditionally dependent on an arbitrary (but fixed) subset of q, rather than
just q_n. When talking about Markov random fields, it is useful to think of the
elements of q as arranged in a (directed) graph in space, rather than as a sequence
in time. An edge in this graph represents conditional dependence. Letting
neighbors(q_i) denote the elements of q that have edges pointing into q_i, this is
expressed by the equation:
Pr[q_i | q − {q_i}] = Pr[q_i | neighbors(q_i)].
Given a graph and an observation for each of its vertices, what we would like
to do is determine the maximum-likelihood set q of underlying states for each of
the vertices. If the graph is a Markov chain, this can be done efficiently using the
Viterbi algorithm; in the general case, however, the problem is NP-complete.
Fortunately, the special case of the problem that we need to solve can be solved
in polynomial time by reduction to a minimum-cut problem handled by standard
algorithms. The algorithm we use is due to Boykov et al. (1998), and is
described further in section 5.6. For approximation algorithms that have been
found to work well in the general case, see Boykov et al. (1998) and also
Hochbaum (2001).
The mapping from the problem described in the previous section to the
problem of Markov random field inference is straightforward. The relevant
observations are of course the disease counts. The graph that defines the Markov
random field has two directed edges (one in each direction) for every undirected
edge in the original graph G that defines the problem. Further details are given
in section 5.6.
A good online introduction to Markov random fields is provided by Li (1995).
4. Instance generators
An algorithm for generating a random instance of the problem described in
section 2 consists of two components: an algorithm for creating the graph of
locales with their associated populations, and an algorithm for filling in the time
series of disease counts. Our graph generation algorithms are described in the
following section, and the algorithm for generating the time series of disease
counts is described in section 4.2.
4.1. Graph-generation algorithms
We use three different graph-generation algorithms: Grid, Random, and Planar.
Each of these algorithms takes as a parameter a value N_locales, which specifies the
number of vertices in the graph.
Grid. The Grid algorithm arranges the vertices in a square grid with sides of
length √N_locales. Each vertex is connected to its north, south, east, and west
neighbors.
Random. The Random algorithm assigns random (x, y) coordinates from the
interval [0, 1] to each of the N_locales vertices. Each vertex is connected to its six
nearest neighbors. The way this is done is that the vertices are visited in a
round-robin fashion for six rounds, and each time a vertex is visited it is
connected to its nearest neighbor among vertices that are not already its
neighbors. Each vertex will be of degree at least six, but the maximum vertex
degree will generally be higher.
Planar. The Planar algorithm is just like the Random algorithm except that when
adding edges we first check whether the edge would cross any existing edge; if
so, the edge is not added.
In all cases, the populations of locales are drawn from a Gaussian distribution
with mean 10,000 and standard deviation 1,000. Negative populations are
rounded up to 0.
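As an illustration, the following rough Java sketch implements the Random generator as described above; the variable names and use of java.util.Random are our own choices, and the Planar variant would additionally reject any edge that crosses an existing one:

```java
import java.util.*;

/** Sketch of the Random graph generator (our own variable names). */
public class RandomGraphSketch {
    public static List<Set<Integer>> generate(int nLocales, long seed) {
        Random rng = new Random(seed);
        double[] x = new double[nLocales], y = new double[nLocales];
        for (int v = 0; v < nLocales; v++) { x[v] = rng.nextDouble(); y[v] = rng.nextDouble(); }

        List<Set<Integer>> adj = new ArrayList<>();
        for (int v = 0; v < nLocales; v++) adj.add(new HashSet<>());

        // Six rounds of round-robin visits; each visit links the vertex to its
        // nearest locale that is not already a neighbor (edges are undirected).
        for (int round = 0; round < 6; round++) {
            for (int v = 0; v < nLocales; v++) {
                int best = -1;
                double bestDist = Double.MAX_VALUE;
                for (int w = 0; w < nLocales; w++) {
                    if (w == v || adj.get(v).contains(w)) continue;
                    double d = Math.hypot(x[v] - x[w], y[v] - y[w]);
                    if (d < bestDist) { bestDist = d; best = w; }
                }
                if (best >= 0) { adj.get(v).add(best); adj.get(best).add(v); }
            }
        }
        return adj;
    }
}
```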
4.2. Filling in time-series data
At each time step, disease counts for each uninfected locale u are drawn from a
Poisson distribution with parameter r_baseline · population(u). Disease counts for each
infected locale v are drawn from a Poisson distribution with parameter
r_infected · population(v).
It remains to describe how we determine which locales are infected at each
time step. Initially (at time 0) all locales are uninfected. At time t_onset, a random
locale becomes infected. Once a locale becomes infected, it stays infected for all
remaining time steps. Additionally, for each time step after t_onset, each infected
locale has a probability p_transmission of infecting each of its neighbors. Neighbors
become infected independently.
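A minimal sketch of this simulation follows. It draws Poisson samples with Knuth's multiplication method (adequate for the small rates used here); parameter names mirror those in Table 2 below, but the structure is our own reading of the description:

```java
import java.util.*;

/** Sketch of the disease-count simulation (parameter names follow Table 2). */
public class OutbreakSimulatorSketch {
    // Knuth's multiplication method; fine for the small rates used here (lambda around 10-20)
    static int samplePoisson(double lambda, Random rng) {
        double threshold = Math.exp(-lambda);
        double p = 1.0;
        int k = 0;
        do { k++; p *= rng.nextDouble(); } while (p > threshold);
        return k - 1;
    }

    /** Returns counts[t][v]; the infection starts at a random locale at time tOnset. */
    static int[][] simulate(List<Set<Integer>> adj, double[] population, int T, int tOnset,
                            double rBaseline, double rInfected, double pTransmission, Random rng) {
        int n = population.length;
        int[][] counts = new int[T][n];
        boolean[] infected = new boolean[n];
        for (int t = 0; t < T; t++) {
            if (t == tOnset) infected[rng.nextInt(n)] = true;        // outbreak begins
            else if (t > tOnset) {                                    // spread to neighbors
                boolean[] next = infected.clone();
                for (int v = 0; v < n; v++)
                    if (infected[v])
                        for (int w : adj.get(v))
                            if (rng.nextDouble() < pTransmission) next[w] = true;
                infected = next;
            }
            for (int v = 0; v < n; v++) {
                double rate = infected[v] ? rInfected : rBaseline;
                counts[t][v] = samplePoisson(rate * population[v], rng);
            }
        }
        return counts;
    }
}
```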
Table 2 summarizes the various parameters of our instance-generation
algorithm.
Table 2. Parameters that are used in generating an instance of the outbreak
detection problem.
Parameter         Meaning
Graph-type        Grid, Random, or Planar
N_locales         number of vertices in the graph
T                 total number of time steps
t_onset           time at which the infection begins
r_baseline        disease rate for non-infected locales
r_infected        disease rate for infected locales
p_transmission    probability that an infected locale will infect a particular one of its
                  neighbors on the next time step
5. Algorithms
This section describes the algorithms we have developed to solve the problem of
detecting disease outbreaks.
5.1. Naive algorithm
At each time t, the naive algorithm examines, for each vertex v, the likelihood
that disease-count(v, t) was generated from an infected locale and compares it to
the likelihood that disease-count(v, t) was generated from an uninfected locale.
The former is the probability mass at disease-count(v, t) under a Poisson
distribution with parameter r_infected · population(v), while the latter is the
probability mass at disease-count(v, t) under a Poisson distribution with parameter
r_baseline · population(v). If the likelihood that disease-count(v, t) was generated from
an infected locale is greater than the likelihood that it was generated from an
uninfected locale, v is predicted to be infected at time t; otherwise v is predicted
to be uninfected at time t.
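A Java sketch of this test follows; the log-space comparison and the logPoisson helper are our own formulation of the rule just described:

```java
/** Sketch of the naive per-locale test (Section 5.1); helper names are ours. */
public class NaivePredictorSketch {
    // log of the Poisson probability mass function (assumes lambda > 0)
    static double logPoisson(int k, double lambda) {
        double logFactorial = 0;
        for (int i = 2; i <= k; i++) logFactorial += Math.log(i);
        return k * Math.log(lambda) - lambda - logFactorial;
    }

    /** Predicts each locale independently by comparing the two Poisson likelihoods. */
    static boolean[] predict(int[] counts, double[] population, double rInfected, double rBaseline) {
        boolean[] prediction = new boolean[counts.length];
        for (int v = 0; v < counts.length; v++) {
            double logLInfected = logPoisson(counts[v], rInfected * population[v]);
            double logLBaseline = logPoisson(counts[v], rBaseline * population[v]);
            prediction[v] = logLInfected > logLBaseline;  // "infected" iff more likely under r_infected
        }
        return prediction;
    }
}
```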
5.2. Nearest neighbor algorithm
The nearest neighbor algorithm is identical to the naive algorithm, except that
before applying the naive algorithm it performs smoothing based on the nearest
neighbors of each vertex. Specifically, the population and disease count of each
vertex are replaced by the summed populations and disease counts, respectively,
of the vertex and all its immediate neighbors. As will be shown, this algorithm
has a significantly lower false positive rate than the naive algorithm.
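For concreteness, a sketch of this pooling step is given below (the naive test is then run on the pooled values); the method and variable names are ours:

```java
import java.util.List;
import java.util.Set;

/** Sketch of the neighborhood pooling applied before the naive test; names are ours. */
public class NeighborPoolingSketch {
    /** Each locale's count and population become sums over its closed neighborhood. */
    static void pool(List<Set<Integer>> adj, int[] counts, double[] population,
                     int[] pooledCounts, double[] pooledPopulation) {
        for (int v = 0; v < counts.length; v++) {
            pooledCounts[v] = counts[v];
            pooledPopulation[v] = population[v];
            for (int w : adj.get(v)) {
                pooledCounts[v] += counts[w];
                pooledPopulation[v] += population[w];
            }
        }
    }
}
```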
5.3. Kernel smoothing
Like the nearest neighbor algorithm, this algorithm employs the naive algorithm
after using smoothing as a preprocessing step. In this case we use kernel
smoothing: the population and disease count of each vertex v are replaced by a
distance-weighted average of the population and disease counts, respectively, of
all vertices in the graph. Each distance is measured as the shortest-path distance
to v, and the weights are given by a Gaussian kernel function applied to the
distance. We use a Gaussian kernel with mean 0 and variance 1.
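A sketch of one way to implement this smoothing is shown below, using breadth-first search for the shortest-path (hop) distances; the exact normalization is not spelled out above, so the weighted average here is our reading of "distance-weighted average":

```java
import java.util.*;

/** Sketch of graph kernel smoothing (Section 5.3); weights are exp(-d^2/2), a Gaussian kernel with variance 1. */
public class KernelSmoothingSketch {
    /** Unweighted shortest-path (hop) distances from source via BFS; -1 marks unreachable vertices. */
    static int[] hopDistances(List<Set<Integer>> adj, int source) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);
        Deque<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        while (!queue.isEmpty()) {
            int u = queue.poll();
            for (int w : adj.get(u))
                if (dist[w] < 0) { dist[w] = dist[u] + 1; queue.add(w); }
        }
        return dist;
    }

    /** Replaces each value by a distance-weighted average over all reachable vertices.
        Applied once to the disease counts and once to the populations. */
    static double[] smooth(List<Set<Integer>> adj, double[] values) {
        double[] smoothed = new double[values.length];
        for (int v = 0; v < values.length; v++) {
            int[] dist = hopDistances(adj, v);
            double weightedSum = 0, weightTotal = 0;
            for (int w = 0; w < values.length; w++) {
                if (dist[w] < 0) continue;                            // skip unreachable vertices
                double weight = Math.exp(-0.5 * dist[w] * dist[w]);   // Gaussian kernel, variance 1
                weightedSum += weight * values[w];
                weightTotal += weight;
            }
            smoothed[v] = weightedSum / weightTotal;
        }
        return smoothed;
    }
}
```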
5.4. Greedy growth
In this approach, we conduct a greedy search to find a contiguous subset of
vertices that maximizes the ratio of the likelihood that the region is infected to
the likelihood that it is not infected. Specifically, for each vertex v ∈ V we do the
following:
1. Let R_v = {v}.
2. Let R' = R_v.
3. For each vertex w ∉ R_v that is a neighbor of a vertex in R_v:
   3.1. If likelihood-ratio(R' ∪ {w}) exceeds likelihood-ratio(R'), let R' = R' ∪ {w}.
4. If R' = R_v, return R_v. Otherwise let R_v = R' and go to step 2.
The function likelihood-ratio(R) is defined as:

$$\text{likelihood-ratio}(R) \;=\; \frac{\displaystyle \prod_{v \in R} \text{Poisson}\!\left(\text{disease-count}(v,t),\; r_{\text{infected}} \cdot \text{population}(v)\right) \prod_{v \in V \setminus R} \text{Poisson}\!\left(\text{disease-count}(v,t),\; r_{\text{baseline}} \cdot \text{population}(v)\right)}{\displaystyle \prod_{v \in V} \text{Poisson}\!\left(\text{disease-count}(v,t),\; r_{\text{baseline}} \cdot \text{population}(v)\right)}$$

where t is the time of the current prediction and Poisson(x, λ) denotes the
probability mass of a Poisson distribution with parameter λ at x. Note that the
baseline factors for vertices outside R appear in both the numerator and the
denominator, so the ratio depends only on the vertices in R.

The approach just described is similar to Kulldorff’s spatial scan statistic
(Kulldorff 1997). The main difference is that here we use a greedy algorithm to
find a region which maximizes the likelihood ratio, whereas Kulldorff uses an
exhaustive search among all regions from a predefined set.
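Because the baseline factors for vertices outside R cancel between the numerator and denominator of likelihood-ratio(R), the log of the ratio is a sum of per-vertex terms over R, and the growth step for a single seed can be sketched as follows (helper names are ours; how the per-seed regions are combined into a final labeling is not shown):

```java
import java.util.*;

/** Sketch of greedy region growth for one seed vertex (Section 5.4). */
public class GreedyGrowthSketch {
    // log of the Poisson probability mass function (assumes lambda > 0)
    static double logPoisson(int k, double lambda) {
        double logFactorial = 0;
        for (int i = 2; i <= k; i++) logFactorial += Math.log(i);
        return k * Math.log(lambda) - lambda - logFactorial;
    }

    /** Per-vertex contribution to the log likelihood ratio. */
    static double gain(int count, double pop, double rInf, double rBase) {
        return logPoisson(count, rInf * pop) - logPoisson(count, rBase * pop);
    }

    /** Grows a region from the seed, adding neighbors while each addition raises the ratio. */
    static Set<Integer> growRegion(int seed, List<Set<Integer>> adj, int[] counts,
                                   double[] pop, double rInf, double rBase) {
        Set<Integer> region = new HashSet<>();
        region.add(seed);
        boolean grew = true;
        while (grew) {
            grew = false;
            Set<Integer> frontier = new HashSet<>();
            for (int v : region) frontier.addAll(adj.get(v));
            frontier.removeAll(region);
            for (int w : frontier) {
                if (gain(counts[w], pop[w], rInf, rBase) > 0) {  // adding w increases the ratio
                    region.add(w);
                    grew = true;
                }
            }
        }
        return region;
    }
}
```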
5.5. Temporal smoothing
The temporal smoothing algorithm can act as a wrapper for any of the previous
four algorithms. Specifically, this algorithm takes the predictions made by an
underlying algorithm and modifies them according to the following rule: if a
vertex v is predicted as infected at time t but not at time t-1, predict v as
uninfected instead; otherwise keep the original prediction. This simple form
of temporal smoothing eliminates many spurious false positives and considerably
increases the accuracy of each of the previous four algorithms.
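The rule itself is tiny; a sketch (array-based, names ours):

```java
/** Sketch of the temporal-smoothing wrapper of Section 5.5. */
public class TemporalSmoothingSketch {
    /** A locale flagged at time t but not at t-1 is reset to "uninfected". */
    static boolean[] smooth(boolean[] predictionNow, boolean[] predictionPrev) {
        boolean[] smoothed = predictionNow.clone();
        for (int v = 0; v < smoothed.length; v++)
            if (predictionNow[v] && !predictionPrev[v]) smoothed[v] = false;
        return smoothed;
    }
}
```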
5.6. Markov random field approach
The main focus of our efforts has been detecting disease outbreaks using
Markov random fields. The mapping between the problem of detecting infected
locales and the problem of Markov random field inference was briefly discussed
at the end of section 3; here we give further details. The graph G directly
defines the neighborhood function of the corresponding MRF (i.e., u is
conditionally dependent on v in the MRF iff. u is connected to v in G). The
analog of a “sensor model” (i.e., the probability of each possible observation
given the underlying state) is simply given by the Poisson distribution. It
remains only to describe how we solve the problem of MRF inference.
Boykov et al. (1998) show how to transform the problem of Markov random
field inference (finding a maximum-likelihood set of underlying states for the
vertices) into a multiway minimum-cut problem on a graph. The basic idea is to turn
the original graph G (i.e., the graph that defines the Markov random field) into a
weighted graph G’ with special “label” vertices for each state in the state space S.
Each of the edges in the original graph G is assigned a uniform weight in G’ (we
use a weight of 0.5). Each of the label vertices in G’ (one per state) is connected to
each of the non-label vertices with an edge of large weight (to be described). We
then find a multiway minimum-cut that divides G’ into |S| components, with
each label vertex residing in a separate component. The state assigned to each
vertex is then determined by the label vertex that lies in the same connected
component. The “large weights” just mentioned are chosen so that (a) we are
guaranteed that no two label vertices will end up in the same connected
component and (b) the multiway minimum-cut will respect the sensor model
(i.e., a vertex is more likely to end up in the same connected component as a label
vertex that has high probability given the observation at that vertex). See Boykov
et al. for further details. Boykov et al. provide a local search algorithm for
solving this multiway cut problem that is guaranteed to find a cut with locally
minimal cost. However, it turns out that our problem is a simpler special case
that does not require the use of local search.
In our problem, the state space contains just two states, “infected” and
“uninfected”. The multiway cut problem with two terminals is thus just the
standard s-t minimum cut problem, which can be solved efficiently using the
Ford-Fulkerson algorithm. The time complexity of the Ford-Fulkerson algorithm
is O(|E| · w_cut), where E is the set of edges and w_cut is the sum of the weights of
the edges that are cut. We know w_cut is O(|E|), so our algorithm runs in worst-case
time O(|E|²). In practice, the actual running time of the Ford-Fulkerson
algorithm for this application is closer to linear (Boykov et al. 1998).
Nevertheless, if better worst-case time complexity is desired, a randomized
algorithm is available for computing minimum cuts in arbitrary graphs in time
O(|E| · log³|V|) (Karger 2000). The only reason we do not use this is because
Karger’s algorithm is quite complex and difficult to code, whereas coding the
Ford-Fulkerson algorithm is straightforward. Thus, although our algorithm as
coded is quadratic time in the worst case, it could be made linear time with
additional programming effort.
In short, the approach we are advocating is near-linear time, even if our
implementation of it is not.
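The sketch below illustrates the two-label cut construction in Java. It uses the standard energy-based formulation (t-link capacities equal to negative log-likelihoods, source = "infected" terminal, sink = "uninfected") and an Edmonds-Karp max-flow routine, i.e., a breadth-first-search variant of Ford-Fulkerson; this differs in detail from the weighting scheme described above (uniform n-links plus specially chosen "large" t-link weights), so treat it as an illustration of the reduction, not a reproduction of the exact implementation:

```java
import java.util.*;

/** Sketch: two-label MRF inference as an s-t minimum cut (names and weighting are ours). */
public class MrfCutSketch {
    // log of the Poisson probability mass function (assumes lambda > 0)
    static double logPoisson(int k, double lambda) {
        double logFactorial = 0;
        for (int i = 2; i <= k; i++) logFactorial += Math.log(i);
        return k * Math.log(lambda) - lambda - logFactorial;
    }

    /** Capacity matrix: node 0 = source ("infected"), node n+1 = sink ("uninfected"), nodes 1..n = locales. */
    static double[][] buildCapacities(List<Set<Integer>> adj, int[] counts, double[] pop,
                                      double rInf, double rBase, double pairWeight) {
        int n = counts.length, s = 0, t = n + 1;
        double[][] cap = new double[n + 2][n + 2];
        for (int v = 0; v < n; v++) {
            // t-links: staying on the source side cuts v->t and pays the "infected" cost,
            // staying on the sink side cuts s->v and pays the "uninfected" cost.
            cap[s][v + 1] = -logPoisson(counts[v], rBase * pop[v]);
            cap[v + 1][t] = -logPoisson(counts[v], rInf * pop[v]);
            for (int w : adj.get(v)) cap[v + 1][w + 1] = pairWeight;  // n-links (adjacency is symmetric)
        }
        return cap;
    }

    /** Edmonds-Karp max flow; returns which nodes stay on the source ("infected") side of the cut. */
    static boolean[] minCutSourceSide(double[][] cap, int s, int t) {
        int n = cap.length;
        double[][] res = new double[n][n];
        for (int i = 0; i < n; i++) res[i] = cap[i].clone();
        while (true) {
            int[] parent = new int[n];
            Arrays.fill(parent, -1);
            parent[s] = s;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            while (!queue.isEmpty() && parent[t] == -1) {             // BFS for an augmenting path
                int u = queue.poll();
                for (int v = 0; v < n; v++)
                    if (parent[v] == -1 && res[u][v] > 1e-12) { parent[v] = u; queue.add(v); }
            }
            if (parent[t] == -1) break;                               // no augmenting path: flow is maximal
            double bottleneck = Double.MAX_VALUE;
            for (int v = t; v != s; v = parent[v]) bottleneck = Math.min(bottleneck, res[parent[v]][v]);
            for (int v = t; v != s; v = parent[v]) { res[parent[v]][v] -= bottleneck; res[v][parent[v]] += bottleneck; }
        }
        boolean[] sourceSide = new boolean[n];                        // residual reachability from s = min cut
        Deque<Integer> frontier = new ArrayDeque<>();
        frontier.add(s);
        sourceSide[s] = true;
        while (!frontier.isEmpty()) {
            int u = frontier.poll();
            for (int v = 0; v < n; v++)
                if (!sourceSide[v] && res[u][v] > 1e-12) { sourceSide[v] = true; frontier.add(v); }
        }
        return sourceSide;  // sourceSide[v + 1] == true  =>  locale v predicted infected
    }
}
```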
5.7. Intertemporal MRF
We also consider an “intertemporal” variant of our MRF algorithm. This
algorithm is just like the algorithm described in section 5.6, except that we alter
the MRF so that locales have neighbors in time as well as in space. Specifically,
when making a prediction at time t the graph that defines the MRF will be the
union of the graphs created by the algorithm of section 5.6 for times t, t-1, ...,
t-WS+1 (by the union of a set of graphs, we mean a graph <V_U, E_U> where V_U is the
union of the vertex sets and E_U is the union of the edge sets). In addition to the
edges in E_U, the MRF graph will have edges of weight 0.1 between copies of the
same vertex at adjacent times. For example, there will be an edge of weight 0.1
from the copy of vertex v from time frame t-1 to the copy of vertex v from time
frame t-2. Here WS is a “window size” parameter that indicates how much of the
past is considered when making a prediction about the present. The high-level
idea is that these extra temporal edges encourage the algorithm to make the same
predictions about a given vertex across time, where possible. This is intended to
be a more principled alternative to the temporal smoothing procedure described
in section 5.5, applicable only to the MRF algorithm.
Note that because we have only increased the number of edges and vertices
in the graph by a constant factor, the time complexity results described in the
previous section still hold.
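A sketch of how the time-layered capacity matrix might be laid out follows; the indexing scheme and names are ours, and the t-links would be added per time slice exactly as in the previous sketch, using that slice's disease counts:

```java
import java.util.List;
import java.util.Set;

/** Sketch of the time-layered ("intertemporal") graph layout; indexing and names are ours. */
public class IntertemporalGraphSketch {
    /** Node ids: 0 = source, 1 + slice*n + v = copy of locale v in time slice, n*WS + 1 = sink. */
    static double[][] buildCapacities(int n, List<Set<Integer>> adj, int WS,
                                      double spatialWeight, double temporalWeight) {
        double[][] cap = new double[n * WS + 2][n * WS + 2];
        for (int slice = 0; slice < WS; slice++) {
            for (int v = 0; v < n; v++) {
                int id = 1 + slice * n + v;
                for (int w : adj.get(v))                       // spatial n-links within the slice
                    cap[id][1 + slice * n + w] = spatialWeight;
                if (slice + 1 < WS) {                          // weight-0.1 link to the same locale
                    int nextId = 1 + (slice + 1) * n + v;      // in the adjacent time slice
                    cap[id][nextId] = temporalWeight;
                    cap[nextId][id] = temporalWeight;
                }
            }
        }
        // t-links (source/sink capacities) are added per slice as in the previous sketch,
        // using the disease counts observed at the corresponding time step.
        return cap;
    }
}
```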
6. Visualization
We have created a Java application for interactive instance generation,
application of inference algorithms, and visualization of results.
Figure 1 shows a screen shot of our tool. The left panel shows the world at
the time step indicated by the position of the slider. Each circle indicates a locale,
and the area of each circle is proportional to the population of the corresponding
locale. The color of the circle indicates the disease count of the locale: the red
component of the color (on a scale from 0 to 1) is proportional to the fraction of
diseased people in the locale, and the green component is equal to the fraction of
non-diseased people. If a locale is actually infected (independent of its current
disease count), there is a white rim around its circle. In the right panel, the
predictions made by the currently-selected inference algorithm are shown. Here,
the color green indicates a prediction of “uninfected” while red indicates a
prediction of “infected”. In this screen shot, all but two locales are predicted
correctly.
The user interface is intuitive. In the left panel the user selects various
parameters that determine how the world is generated, then clicks “Generate
World” to generate an instance of the problem. In the right panel, the user can
select an inference algorithm and click the “Estimate” button to display its
predictions. VCR-like controls are available in the bottom panel to control
animation of the time series of disease counts and predictions.
Figure 1. Visualization tool.
7. Experimental evaluation
To evaluate the performance of the algorithms described in section 5, we ran
each algorithm for two sets of 100 trials on problem instances generated with
parameter settings (see table 2) Graph-type=Random, T=100, N_locales=100,
t_onset=10, p_transmission=0.01, and r_baseline=0.001. For the first set of 100 trials, we set
r_infected=0.002. These values of r_baseline and r_infected are the same as those used by
Neill and Moore (2003). For the second set of 100 trials, we make the problem
considerably more difficult by setting r_infected=0.0012. Each of the first four
algorithms discussed in section 5 (Naive, Nearest neighbor, Kernel smoothing,
and Greedy) were run both with and without temporal smoothing, and we ran
the MRF algorithm as well as the “intertemporal” variant of the MRF algorithm,
for a total of 10 algorithms. Because execution of the “intertemporal” version of
the MRF algorithm was quite computationally intensive, we performed only 5 trials
with this algorithm; the other 9 algorithms went through the full 100 trials. The
results of these experiments are presented in tables 3 and 4. Further discussion of
the results follows.
Table 3. Performance of various algorithms with r_infected = 0.002.

Algorithm                        Accuracy   Avg. false positives   Avg. false negatives
Naive                            91.1%      8.27                   0.59
Nearest neighbor                 97.7%      1.79                   0.55
Kernel smoothing                 97.1%      1.41                   1.45
Greedy                           97.9%      1.09                   1.01
MRF                              98.4%      0.79                   0.85
Naive (temp. sm.)                98.0%      0.75                   1.26
Nearest neighbor (temp. sm.)     98.6%      0.21                   1.14
Kernel smoothing (temp. sm.)     97.3%      0.69                   1.96
Greedy (temp. sm.)               98.2%      0.08                   1.75
MRF (intertemporal)              98.3%      0.67                   1.00
Table 4. Performance of various algorithms with r_infected = 0.0012.

Algorithm                        Accuracy   Avg. false positives   Avg. false negatives
Naive                            63.5%      34.3                   2.16
Nearest neighbor                 76.2%      21.3                   2.53
Kernel smoothing                 80.1%      17.3                   2.58
Greedy                           90.2%      5.18                   4.58
MRF                              93.6%      0.98                   5.40
Naive (temp. sm.)                83.6%      12.3                   4.10
Nearest neighbor (temp. sm.)     91.2%      5.05                   3.71
Kernel smoothing (temp. sm.)     93.0%      3.35                   3.68
Greedy (temp. sm.)               93.7%      0.42                   5.88
MRF (intertemporal)              94.8%      0.04                   5.20
Considering only the first four algorithms in table 3, the greedy algorithm
outperforms the remaining three algorithms, followed by nearest neighbor,
kernel smoothing, and the naive algorithm. In table 4, the rankings of kernel
smoothing and nearest neighbor are reversed. In both tables we see that the
large number of false positives produced by the naive algorithm is reduced by
using spatial smoothing – either nearest-neighbor smoothing or kernel
smoothing – and reduced even further by temporal smoothing, though at the
cost of an increase in false negatives. The effect of temporal smoothing was
qualitatively the same on all four algorithms: it decreased the false positive rate
and increased the false negative rate. Both of these results were expected. The
effect of temporal smoothing on overall accuracy was positive in all four cases.
In both sets of experiments the algorithms based on Markov random fields
did very well. In the first set of experiments, the two MRF algorithms were
outperformed only by the nearest neighbor algorithm with temporal smoothing,
which outperformed the MRF algorithm only by a small margin (0.2%). On the
second set of experiments, the two MRF algorithms were the clear winners. We
believe that if the problems were made still more difficult the gap between the
MRF algorithms and the others would increase even more.
Figure 2 illustrates one frame of a time series generated on a grid world,
along with the predictions made by five of the algorithms given in table 3. The
high false positive rate of the naive algorithm, and the elimination of these false
positives by various forms of smoothing, are apparent in the figure.
[Figure 2 panels: (a) the world; (b) predictions of the naive algorithm; (c) predictions of the nearest-neighbor algorithm; (d) predictions of kernel smoothing; (e) predictions of the greedy algorithm; (f) predictions of the greedy algorithm with temporal smoothing.]
Figure 2. (a) shows an example world and (b-f) show predictions made by 5
algorithms.
8. Conclusions
We have developed a total of 10 algorithms for detection of disease outbreaks
and have compared their prediction accuracy under two parameter settings. The
algorithms based on Markov random fields consistently outperform the other
algorithms, including algorithms based on simpler forms of smoothing.
Furthermore, we are able to obtain high classification accuracy even when the
disease rate of infected locales is only 1.2 times the disease rate of uninfected
locales, a level of performance that appears to be quite beyond that of the human
visual system. On this harder of the two parameter settings, the advantage of
Markov random fields is especially apparent. Although we have not derived
rigorous measures of statistical significance for the predictions made by our
algorithms, and though statistical significance is clearly important to
epidemiologists, we are pleased with the high classification accuracy we have
obtained. Further work might consider how our approach could be combined
with the techniques of Neill and Moore (2003), perhaps as a pre- or post-processing step.
References
1. Boykov, Y., Veksler, O., and Zabih, R. 1998. Markov random fields with
efficient approximations. IEEE Computer Vision and Pattern Recognition
Conference, Santa Barbara, CA.
2. Karger, D. R. 2000. Minimum cuts in near-linear time. Journal of the ACM 47(1):
46-76.
3. Kulldorff, M. 1997. A spatial scan statistic. Communications in Statistics: Theory
and Methods 26(6): 1481-1496.
4. Hochbaum, D. S. 2001. An efficient algorithm for image segmentation, Markov
random fields and related problems. Journal of the ACM 48(4): 686-701.
5. Li, S. Z. 1995. Markov Random Field Modeling in Computer Vision.
http://research.microsoft.com/~szli/MRF_Book/Chapter_1/node11.html
6. Neill, D. B. and Moore, A. W. 2003. A fast multi-resolution method for
detection of significant spatial disease clusters. Carnegie Mellon University
Dept. of Computer Science, Technical Report CMU-CS-03-154.