Disease Signatures

Disease signatures – a simple
combinatorial-type exploitation of
them for our own evil purposes
Prof. Nina H. Fefferman
Visiting DIMACS from :
Tufts Univ. School of Medicine,
Dept. Public Health and Family
Medicine
Plan for today:
1) Looking very quickly at traditional SIR models
2) Communication problems
3) Tweaking parameter definitions
4) Using these definitions to clear up communication
5) Building disease signatures
6) Decomposing reported disease into component signature curves
7) Checking this method against reality
8) Where this method can take us from here…
A quick look at SIR models
I(t) = number of infected
S(t) = number of susceptibles
R(t) = number of recovered
in the population at time t
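The equation pictures from this slide are not reproduced here, so the following is only a minimal sketch of a standard SIR system. It assumes a is the transmission coefficient and b is the recovery rate (S' = -a*S*I, I' = a*S*I - b*I, R' = b*I); the function name, step size and starting values are made up for illustration.

```python
def simulate_sir(s0, i0, rec0, a, b, days, dt=0.1):
    """Forward-Euler integration of an assumed SIR system.

    Assumed form (not taken from the slides):
        S' = -a*S*I,  I' = a*S*I - b*I,  R' = b*I
    """
    s, i, r = float(s0), float(i0), float(rec0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = a * s * i * dt   # susceptibles who become infected
        new_recoveries = b * i * dt       # infected who recover
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Hypothetical town of 1000 people with 10 initial infections.
trajectory = simulate_sir(s0=990, i0=10, rec0=0, a=0.0005, b=0.1, days=60)
```

Everything hinges on a and b, which is exactly the problem: somebody has to supply them.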
And if we want spatial spread :
Keep R, but I(t, x, y) and S(t, x, y) become functions of
position (x, y), and a is replaced by an expression
involving two other constants related to the rate at
which the infection diffuses through space
Go ask HHS or NIH or CDC for a and b for the next flu season so our models can predict it.
Good luck.
Pictures of equations stolen from : http://maven.smith.edu/~callahan/ili/pde.html
Leads us to : Communication Problems
Parameters/Variables used by epidemiologists are warm
and fuzzy and not rigorously defined
So modelers made up their own (you just saw them), but these aren’t
things doctors/public health people can really
measure, so we can’t get accurate parameter values
Example: MANY people are worried about outbreaks
There is no good definition of what constitutes an
outbreak
BIG problem (mostly just ignored)
Modelers use the concept of R0 – the reproductive
number of disease (in the differential equation model, it’s the ratio of S to a/b)
To a modeler, an outbreak is when the average number of new infections
caused by contact with a current infection is
greater than 1
Communication Problems cont.
R0 gives us a rigorous definition of something
good, but not of what we really need ‘outbreak’
to mean
Really, if we think about it, public health
people want ‘outbreak’ to refer to
“times when we need to pay attention
to disease spread for some reason”
How can we say this mathematically?
Communication Problems cont.
What can public health people/
doctors measure (at least sometimes)?

Infectivity : Probability of becoming infectious
after becoming exposed

Attack rate : Probability of developing disease
after becoming exposed

Pathogenicity : Probability of developing disease
after becoming infected

Virulence : Probability of dying after becoming ill

Immunogenicity : Attack rate for re-exposure
Tweaking Parameter Definitions
So :
• E(X,T) = Probability of exposure in population X at time T
• I = Probability of infection from exposure
• S_T = Probability that infection at time 0 leads to manifestation of symptoms at time T (a distribution function which does not need to sum to one if not all of the infected develop symptoms)
• C_T = Probability that infection takes T days to become contagious
• M_T = Probability that the time from the onset of symptoms to death from the disease is T days
• N_T = Size of the population possibly exposed to infection on day T (this will be our disease signature curve)
• I_T = Probability of infection from current exposure, given previous infection T days ago
(Really, these are all functions of time, but my journal referees got upset with functions, so most are now subscripts)
Clearing up communication
With those we can build :
Pathogenicity : The probability of developing
disease after becoming infected
= $\sum_{T=0}^{n} S_T$, for n the maximum recovery time
Virulence : The probability of dying after becoming ill
= $\sum_{T=0}^{n} M_T$, for n the maximum recovery time
Infectivity : The probability of becoming infectious
after becoming exposed
= $I \cdot \sum_{T=0}^{n} C_T$, for n the end of the window for the disease
Clearing up communication cont.
And :
Attack rate : The probability of developing disease
after becoming exposed
= $I \cdot \sum_{T=0}^{n} S_T$, for n the end of the window for disease expression
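As a quick check on these four definitions, here is a minimal sketch that computes them from daily probability lists. The function names and all numbers are made up for illustration; only the formulas come from the slides.

```python
def pathogenicity(symptom_by_day):
    """Sum of S_T over T = 0..n: P(disease | infected)."""
    return sum(symptom_by_day)


def virulence(death_by_day):
    """Sum of M_T over T = 0..n: P(death | ill)."""
    return sum(death_by_day)


def infectivity(p_infection, contagious_by_day):
    """I times the sum of C_T: P(infectious | exposed)."""
    return p_infection * sum(contagious_by_day)


def attack_rate(p_infection, symptom_by_day):
    """I times the sum of S_T: P(disease | exposed)."""
    return p_infection * sum(symptom_by_day)


# Hypothetical parameter values, indexed by day T = 0, 1, 2, ...
I = 0.5                      # probability of infection from exposure
S = [0.0, 0.1, 0.3, 0.2]     # need not sum to one
C = [0.0, 0.5, 0.5]
M = [0.0, 0.01, 0.01]

summary = {
    "pathogenicity": pathogenicity(S),
    "virulence": virulence(M),
    "infectivity": infectivity(I, C),
    "attack rate": attack_rate(I, S),
}
```

Note that attack rate is just I times pathogenicity, which is what the verbal definitions say it should be.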
But now we notice that, from our original list, Immunogenicity is not a
truly meaningful idea, so we define instead:
Pseudo-immunogenicity : Probability of infection
from current exposure, given previous infection T
days ago = I_T
We won’t be using all of these today, but they’re still useful
to have if you ever need to talk to health people
Clearing up communication cont.
Now both the math and health people have the same picture!
But this is only one town
The SIR models could handle spatial spread with PDEs…
Uses a slightly different notation
Clearing up communication cont.
With multiple locations and central reporting :
Clearing up communication cont.
Notice : different occurrences don’t have to be
separated only spatially or temporally
Can be different demographic populations, or
anything that allows narrower, more accurate
estimations of exposure or susceptibility
Let’s call these narrower things
subpopulations
Building Disease Signatures
So, using our definitions and our flow chart:
For a given subpopulation, we can compute a
‘disease signature curve’ representing the
number of cases predicted over time from
a single instance of exposure
Notice : these signature curves depend on
subpopulation-specific etiology, including
the shape of the distribution for some
parameters – not just averages
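The formula for the signature curve is not reproduced on this slide, so the following is only one plausible reading, stated as an assumption: expected new symptomatic cases on day T after a single exposure event equal (number exposed) × I × S_T. The function name and the numbers are hypothetical.

```python
def signature_curve(exposed, p_infection, symptom_by_day):
    """Expected new symptomatic cases per day after one exposure event.

    Assumption (not from the slides): cases on day T = exposed * I * S_T.
    """
    return [exposed * p_infection * s_t for s_t in symptom_by_day]


# Two hypothetical subpopulations with the same mean time to symptoms
# (2 days) but different distribution shapes.
sharp = signature_curve(1000, 0.5, [0.0, 0.0, 0.6, 0.0, 0.0])
broad = signature_curve(1000, 0.5, [0.1, 0.1, 0.2, 0.1, 0.1])
```

Both subpopulations produce the same total number of cases, but the curves look nothing alike, which is why averages alone are not enough.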
Decomposing curves into
signatures
So, if we have a total reported disease curve,
we can iteratively define
(Notice populations exposed on different days are disjoint sets due to
the definitions)
Now we can think of a single reported curve
C_T as the composition of these curves
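The iterative definition itself is not reproduced here. As a hedged sketch of the idea of composition only, the code below assumes the reported curve is a sum of time-shifted copies of one signature curve, one copy per exposure event. The function name, the signature values and the exposure days are all hypothetical.

```python
def compose_reported_curve(signature, exposures_by_day, horizon):
    """Sum time-shifted copies of a signature curve.

    signature        : expected cases on days 0, 1, 2, ... after one exposure
    exposures_by_day : {day of exposure: number of exposure events that day}
    horizon          : number of days to report
    """
    total = [0.0] * horizon
    for day, n_events in exposures_by_day.items():
        for offset, cases in enumerate(signature):
            t = day + offset
            if t < horizon:
                total[t] += n_events * cases
    return total


# One hypothetical signature, with exposure events on day 0 and day 2.
signature = [0, 50, 150, 100]
reported = compose_reported_curve(signature, {0: 1, 2: 1}, horizon=6)
```

Decomposition is this process run backwards: given the reported curve and the signature, recover the exposure days.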
Decomposing curves into
signatures cont.
Since we are interested in exploiting the
heterogeneity of etiological response
within a diverse population, we can specify
these curves by subpopulation Y:
Yielding the total disease incidence curve:
Decomposing curves into
signatures cont.
And we can even exploit immune memory by
further dividing subpopulations into classes
of those with similar immune protection
from previous infection
With I_T = Probability of infection given previous infection T
days ago
And T* = the last day of most recent prior infection
Giving us
Decomposing curves into
signatures cont.
Now we can use high school math to find
combinations of signature curves that make up the
total reported cases curve!
(Important because public health people may trust it)
How many different combinations of coins can make $1.50…
(Figure: coins of 5¢, 10¢ and 25¢, standing in for sub-populations)
Similarly, we can ask how many combinations of ‘signature curves’
can go into a ‘Total Reported Cases’ curve:
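The coin analogy can be sketched directly. The first function counts coin combinations, assuming only the 5¢, 10¢ and 25¢ coins shown are allowed. The second is a brute-force analogue for curves: it tries whole-number multiples of each signature curve and keeps those that add up exactly to the reported curve. Time shifts and noisy data are left out, and the subpopulation names and numbers are hypothetical.

```python
from itertools import product


def count_coin_combinations(target_cents, denominations):
    """Count unordered ways to reach target_cents with the given coins."""
    ways = [1] + [0] * target_cents
    for coin in denominations:
        for amount in range(coin, target_cents + 1):
            ways[amount] += ways[amount - coin]
    return ways[target_cents]


def curve_combinations(reported, signatures, max_copies):
    """List whole-number multiples of signatures that sum to reported.

    signatures maps a subpopulation name to a curve of the same
    length as reported.
    """
    names = list(signatures)
    matches = []
    for counts in product(range(max_copies + 1), repeat=len(names)):
        total = [
            sum(c * signatures[name][t] for c, name in zip(counts, names))
            for t in range(len(reported))
        ]
        if total == reported:
            matches.append(dict(zip(names, counts)))
    return matches


coin_ways = count_coin_combinations(150, [5, 10, 25])

signatures = {"children": [1, 2, 1, 0], "adults": [0, 1, 2, 1]}
matches = curve_combinations([2, 5, 4, 1], signatures, max_copies=5)
```

In this toy case only one combination fits the reported curve, and that is the interesting situation: the data pin down which subpopulations were exposed.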
Decomposing curves into
signatures cont.
Now let’s come back to the idea of an outbreak:
Remember, we wanted ‘outbreak’ to mean “times
when we need to pay attention to disease spread
for some reason”
Suppose that the only combination of disease signature curves was to
have EVERY subpopulation just beginning to show symptoms from a
disease – that means that soon many many more people will be sick –
we should probably pay attention to that
OR
Maybe the only combination of signature curves indicates that only one
location has been exposed – we might want to use that to find out what
the source of exposure was, or quarantine the area
No matter how we choose to define it (will be arbitrary), this
method can tell us WHY we should care now
Decomposing curves into
signatures cont.
Let’s take a look at an example of how this
can work
To begin with, let’s look at something very simple :
Giardiasis – a waterborne infection
causing diarrheal disease in humans
with extremely low levels of secondary
transmission (makes life simpler)
There was an actual “outbreak” in MA in
1995
Decomposing curves into
signatures cont.
Reported incidence for MA
(all of it)
HIPAA requires aggregation of data released to the public and to most researchers without
special access
Decomposing curves into
signatures cont.
To use this method, we need some measured parameter values
I’m cheating a little because I’m assuming values for I, but
we could in theory measure this
Decomposing curves into
signatures cont.
We know that most of the reporting came from 3 urban centers:
Decomposing curves into
signatures cont.
Then we can decompose by demographic subgroup for each town:
Decomposing curves into
signatures cont.
That was a really simple disease without any
secondary transmission
So what happens if there is secondary spread?
It gets MUCH more complicated…
First of all, the probability of exposure in each
subpopulation can start to depend on the levels of
infection in each other subpopulation
Now we start getting into the social network stuff
An aside
Social Networks : Oy vey
Since this is a talk and not a course, I can’t leave
this as an exercise to the reader, but I can use the
‘we only have a little over an hour’ excuse to
hand-wave some of the modeling details on this –
I’m going to talk about the concepts
If you are interested in the details, well, that’s why
I’m going to be around for the year
Again, rather than using mass averages, let’s still
keep the idea of a disease signature
So exposure isn’t a simple underlying rate - it’s
based on contacting an infected individual
We can think of individuals in each subgroup as
having certain probabilities of interacting with
others, possibly in other subgroups
(People in the room who think of social interactions as edges in a graph, this is
almost the same - it’s like weighted edges in a complete graph)
Also, membership in particular subgroups can
change over time (e.g. children becoming adults)
(In this case, both vertex states and edge weights can be thought of as vertex-state-dependent progressions)
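Since the modeling details are hand-waved here, the following is only an illustrative sketch and not the model used in this work. It assumes each subgroup contacts each other subgroup with a fixed probability (the weighted edge), and that a contact exposes you with probability equal to the fraction infected in the contacted group. The function name, group names, weights and prevalences are all hypothetical.

```python
def exposure_probabilities(contact_weights, prevalence):
    """Probability of exposure per subgroup from weighted contacts.

    Assumption (not from the talk):
        P(exposed in g) = 1 - product over h of (1 - w[g][h] * prevalence[h])
    """
    groups = list(prevalence)
    result = {}
    for g in groups:
        p_escape = 1.0  # probability of avoiding exposure from every group
        for h in groups:
            p_escape *= 1.0 - contact_weights[g][h] * prevalence[h]
        result[g] = 1.0 - p_escape
    return result


# Hypothetical weighted complete graph on two subgroups.
weights = {
    "children": {"children": 0.5, "adults": 0.2},
    "adults": {"children": 0.2, "adults": 0.1},
}
# Only children are infected so far.
exposure = exposure_probabilities(weights, {"children": 0.1, "adults": 0.0})
```

Adults have no infections of their own here, yet their exposure probability is not zero, which is the coupling between subpopulations described above.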
Checking Reality
This all gets complicated enough that it’s
nerve-wracking not to check model outcomes against
some form of reality
Need to :
1. measure all model parameters
2. create disease outbreaks
3. check predicted spread against what
actually happens
(I tried to get Thus Spake Zarathustra to play now, but I couldn’t make it work)
My beautiful
termites
Checking Reality cont.
On Thursday, at the DIMACS Mixer, I’ll be
talking to you about ‘Why Termites’
For now, just go with it
The particular details:
Host: Zootermopsis angusticollis (termite)
Pathogen: Metarhizium anisopliae (not a termite)
Diagram labels: spores land on termite; allogroomed off; temporary immunity; burrow through cuticle; death
Checking Reality cont.
So we built some CA simulation models
Including age-based differences in :
1. direction of wandering through nest
2. interaction rates
3. exposure rates
4. susceptibility to infection from exposure
5. mortality from infection
6. efficacy/duration of induced immunity
(via social vaccination)
As the model ran, individuals aged and behaved
accordingly
Checking Reality cont.
And…
Thank god, all the work so far has
shown that the models predict
spread accurately
Whew!
We’re even getting some interesting
new directions
Regardless of why specific outputs happen
Now that we know the model can work,
we can work backwards
Fit model outcome to observed data and look at
which sets of parameter values and behavioral
mixing rates produce them
This might provide an odd way of
understanding human social networks –
especially since they can so dramatically
affect model output
Maybe this last part is a pipe-dream.
Who knows, but it’s so crazy it just might work…
Thanks for asking me to speak to you
I hope you’ve had fun
Some of what I’ve talked about has been accomplished in
collaboration with Elena Naumova, James Traniello and
Rebeca Rosengaus
My thanks to the NIH for funding support for this research