Disease signatures – a simple combinatorial-type exploitation of them for our own evil purposes
Prof. Nina H. Fefferman
Visiting DIMACS from: Tufts Univ. School of Medicine, Dept. Public Health and Family Medicine

Plan for today:
1) Looking very quickly at traditional SIR models
2) Communication problems
3) Tweaking parameter definitions
4) Using these definitions to clear up communication
5) Building disease signatures
6) Decomposing reported disease into component signature curves
7) Checking this method against reality
8) Where this method can take us from here…

A quick look at SIR models
I(t) = number of infected, S(t) = number of susceptibles and R(t) = number of recovered in the population at time t.
And if we want spatial spread: keep R, but I(t, x, y) and S(t, x, y) become functions of position (x, y), and a is replaced by an expression involving two other constants related to the rate at which the infection diffuses through space.
Go ask HHS or NIH or CDC for a and b for the next flu season so our models can predict it. Good luck.
Pictures of equations stolen from: http://maven.smith.edu/~callahan/ili/pde.html

Leads us to: Communication Problems
Parameters/variables used by epidemiologists are warm and fuzzy and not rigorously defined. So modelers made up their own (you just saw them), but these aren’t things doctors/public health people can really measure, so we can’t get accurate parameter values.
Example: MANY people are worried about outbreaks, but there is no good definition of what constitutes an outbreak. BIG problem (mostly just ignored).
Modelers use the concept of R0, the reproductive number of a disease (in the differential equation model, it’s the ratio of S to b/a, i.e. aS/b): the average number of new infections caused by contact with a single current infection. When R0 is greater than 1, the disease spreads.
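The SIR equations appeared on the slides only as pictures, so the following is a hedged sketch that assumes the standard form $S' = -aSI$, $I' = aSI - bI$, $R' = bI$, with a the transmission coefficient and b the recovery rate. It integrates the model with a simple forward Euler step and computes R0 = aS(0)/b. The function names and the parameter values are hypothetical, chosen only so that R0 > 1.

```python
"""Minimal SIR sketch (assumed standard form, forward-Euler integration):
S' = -a*S*I,  I' = a*S*I - b*I,  R' = b*I
a = transmission coefficient, b = recovery rate (assumptions, see lead-in)."""

from dataclasses import dataclass


@dataclass
class SIRState:
    s: float  # susceptibles
    i: float  # infected
    r: float  # recovered


def basic_reproductive_number(a: float, b: float, s0: float) -> float:
    """R0 = a*S(0)/b, i.e. the ratio of S(0) to the threshold b/a."""
    return a * s0 / b


def simulate_sir(a: float, b: float, state: SIRState, days: int, dt: float = 0.1) -> list[SIRState]:
    """Return one state per whole day (including day 0), stepping with Euler dt."""
    steps_per_day = round(1 / dt)
    history = [state]
    s, i, r = state.s, state.i, state.r
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = a * s * i * dt
            recoveries = b * i * dt
            s -= new_infections
            i += new_infections - recoveries
            r += recoveries
        history.append(SIRState(s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical values: a population of 1000 with one initial infection.
    a, b = 0.0005, 0.1
    start = SIRState(s=999.0, i=1.0, r=0.0)
    print(f"R0 = {basic_reproductive_number(a, b, start.s):.2f}")
    trajectory = simulate_sir(a, b, start, days=100)
    peak_day = max(range(len(trajectory)), key=lambda d: trajectory[d].i)
    print(f"Infections peak on day {peak_day} at {trajectory[peak_day].i:.0f} people")
```

Note that a and b never appear on their own in the threshold, only as b/a, which is one reason they are hard to ask a health department for directly.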
R0 gives us a rigorous definition of something good, but not of what we really need ‘outbreak’ to mean. Really, if we think about it, public health people want ‘outbreak’ to refer to “times when we need to pay attention to disease spread for some reason”. How can we say this mathematically?

What can public health people/doctors measure (at least sometimes)?
Infectivity: Probability of becoming infectious after becoming exposed
Attack rate: Probability of developing disease after becoming exposed
Pathogenicity: Probability of developing disease after becoming infected
Virulence: Probability of dying after becoming ill
Immunogenicity: Attack rate for re-exposure

Tweaking Parameter Definitions
So:
• $E(X,T)$ = Probability of exposure in population X at time T
• $I$ = Probability of infection from exposure
• $S_T$ = Probability that infection at time 0 leads to manifestation of symptoms at time T (a distribution function which does not need to sum to one if not all of the infected develop symptoms)
• $C_T$ = Probability that infection takes T days to become contagious
• $M_T$ = Probability that the time from the onset of symptoms to death from the disease is T days
• $N_T$ = Size of the population possibly exposed to infection on day T (this will be our disease signature curve)
• $I_T$ = Probability of infection from current exposure, given previous infection T days ago
(Really, these are all functions of time, but my journal referees got upset with functions, so most are now subscripts.)

Clearing up communication
With those we can build:
Pathogenicity: The probability of developing disease after becoming infected
$\sum_{T=0}^{n} S_T$, for n the maximum recovery time
Virulence: The probability of dying after becoming ill
$\sum_{T=0}^{n} M_T$, for n the maximum recovery time
Infectivity: The probability of becoming infectious after becoming exposed
$I \cdot \sum_{T=0}^{n} C_T$, for n the end of the window for the disease
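Since $S_T$, $M_T$ and $C_T$ are distributions over days, each of these measurable quantities is just a partial sum (scaled by $I$ for infectivity). A minimal sketch, assuming each distribution is stored as a Python list indexed by day T; the function names and the example numbers are hypothetical, not measured values.

```python
"""Measurable quantities as partial sums of day-indexed distributions.
All example numbers are hypothetical, for illustration only."""


def partial_sum(dist: list[float], n: int) -> float:
    """Sum of dist[T] for T = 0..n inclusive (clipped to the list length)."""
    return sum(dist[: n + 1])


def pathogenicity(s: list[float], max_recovery_time: int) -> float:
    """P(disease | infected) = sum_{T=0}^{n} S_T."""
    return partial_sum(s, max_recovery_time)


def virulence(m: list[float], max_recovery_time: int) -> float:
    """P(death | ill) = sum_{T=0}^{n} M_T."""
    return partial_sum(m, max_recovery_time)


def infectivity(i: float, c: list[float], window_end: int) -> float:
    """P(infectious | exposed) = I * sum_{T=0}^{n} C_T."""
    return i * partial_sum(c, window_end)


if __name__ == "__main__":
    # S_T need not sum to 1: here only 60% of infections ever show symptoms.
    s_dist = [0.0, 0.1, 0.3, 0.15, 0.05]
    m_dist = [0.0, 0.01, 0.02, 0.01]
    c_dist = [0.2, 0.5, 0.3]
    print(pathogenicity(s_dist, max_recovery_time=4))
    print(virulence(m_dist, max_recovery_time=3))
    print(infectivity(0.8, c_dist, window_end=2))
```

Keeping the whole distribution, rather than only its sum, matters later: the shape of $S_T$ is what gives each subpopulation its own signature curve.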
And:
Attack rate: The probability of developing disease after becoming exposed
$I \cdot \sum_{T=0}^{n} S_T$, for n the end of the window for disease expression
But now we notice that, from our original list, Immunogenicity is not a truly meaningful idea, so we define instead:
PseudoImmunogenicity: Probability of infection from current exposure, given previous infection T days ago = $I_T$
We won’t be using all of these today, but they’re still useful to have if you ever need to talk to health people.

Now both the math and health people have the same picture! But this is only one town. The SIR models could handle spatial spread with PDEs… (This uses a slightly different notation.)

With multiple locations and central reporting:
[Figure: multiple locations with central reporting]

Notice: different occurrences don’t have to be separated only spatially or temporally. They can be different demographic populations, or anything that allows narrower, more accurate estimations of exposure or susceptibility. Let’s call these narrower things subpopulations.

Building Disease Signatures
So, using our definitions and our flow chart: for a given subpopulation, we can compute a ‘disease signature curve’ representing the number of cases predicted over time from a single instance of exposure.
Notice: these signature curves depend on subpopulation-specific etiology, including the shape of the distribution for some parameters, not just averages.

Decomposing curves into signatures
So, if we have a total reported disease curve, we can iteratively define
(Notice populations exposed on different days are disjoint sets due to the definitions.)
Now we can think of a single reported curve $C_T$ as the composition of these curves.
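The equations for this step were on the slides and are not recoverable here. As a hedged illustration only, assume that a group of size $N_d$ exposed on day d produces $N_d \cdot E \cdot I \cdot S_{T-d}$ expected symptomatic cases on day T, with E and I treated as constants for the subpopulation. Under that assumption the total reported curve is a sum of shifted signature curves (a discrete convolution), and because groups exposed on different days are disjoint, the exposure sizes can be recovered one day at a time. The function names and numbers are hypothetical.

```python
"""Hypothetical sketch: a reported curve as a sum of disease signature curves.

Assumption (illustrative, not the talk's exact equations): a group of size N
exposed on day d yields N * E * I * S[t - d] expected symptomatic cases on
day t, with E = P(exposure), I = P(infection | exposure), S = onset delays.
"""


def signature_curve(n_exposed: float, e: float, i: float, s: list[float]) -> list[float]:
    """Expected cases per day after a single exposure event on day 0."""
    return [n_exposed * e * i * s_t for s_t in s]


def compose(exposures: list[float], e: float, i: float, s: list[float]) -> list[float]:
    """Total reported curve: signature curves shifted to their exposure day and summed."""
    total = [0.0] * (len(exposures) + len(s) - 1)
    for day, n in enumerate(exposures):
        for lag, cases in enumerate(signature_curve(n, e, i, s)):
            total[day + lag] += cases
    return total


def decompose(reported: list[float], e: float, i: float, s: list[float]) -> list[float]:
    """Recover exposure sizes N_d day by day (iterative deconvolution).

    With k the first lag where S is nonzero, reported[d + k] minus the
    contribution of groups exposed before day d must come from day d's group.
    """
    k = next(lag for lag, s_t in enumerate(s) if s_t > 0)
    n_days = len(reported) - len(s) + 1
    exposures: list[float] = []
    for d in range(n_days):
        earlier = sum(
            exposures[j] * e * i * s[d + k - j]
            for j in range(d)
            if d + k - j < len(s)
        )
        exposures.append((reported[d + k] - earlier) / (e * i * s[k]))
    return exposures


if __name__ == "__main__":
    s = [0.0, 0.2, 0.5, 0.3]  # hypothetical onset-delay distribution S_T
    true_exposures = [100.0, 0.0, 40.0, 10.0]
    reported = compose(true_exposures, e=0.5, i=0.4, s=s)
    print([round(c, 2) for c in reported])
    print([round(n, 2) for n in decompose(reported, e=0.5, i=0.4, s=s)])
```

On these noise-free hypothetical numbers, `decompose` returns the original exposure sizes up to floating-point rounding. With noisy real reports this naive recursion can amplify errors and even produce negative sizes, so a constrained fit would be a more robust choice.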
Since we are interested in exploiting the heterogeneity of etiological response within a diverse population, we can specify these curves by subpopulation Y:
Yielding the total disease incidence curve:

And we can even exploit immune memory by further dividing subpopulations into classes of those with similar immune protection from previous infection, with
$I_T$ = Probability of infection given previous infection T days ago, and
$T^*$ = the last day of most recent prior infection,
giving us:

This is important because public health people may trust it.
Now we can use high school math to find combinations of signature curves that make up the total reported cases curve! How many different combinations of coins can make $1.50…
[Figure: coins (5¢, 10¢, 25¢) as an analogy for sub-populations]
Similarly, we can ask how many combinations of ‘signature curves’ can go into a ‘Total Reported Cases’ curve.

Now let’s come back to the idea of an outbreak. Remember, we wanted ‘outbreak’ to mean “times when we need to pay attention to disease spread for some reason”.
Suppose that the only combination of disease signature curves was to have EVERY subpopulation just beginning to show symptoms from a disease. That means that soon many, many more people will be sick, so we should probably pay attention to that.
OR
Maybe the only combination of signature curves indicates that only one location has been exposed. We might want to use that to find out what the source of exposure was, or to quarantine the area.
No matter how we choose to define it (the definition will be arbitrary), this method can tell us WHY we should care now.
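The coin question is a classic counting problem, and the signature-curve question has the same shape if, as a simplifying assumption, each candidate signature curve (already shifted to its exposure day) may contribute only whole-number multiples. A brute-force sketch under that assumption; the subpopulation names and case counts are hypothetical.

```python
"""Counting combinations: the coin analogy, then the same question for
hypothetical signature curves (whole-number multiples only, an assumption)."""

from itertools import product


def count_coin_combinations(amount: int, coins: list[int]) -> int:
    """Number of multisets of coins summing to `amount` (classic DP)."""
    ways = [1] + [0] * amount
    for coin in coins:
        for total in range(coin, amount + 1):
            ways[total] += ways[total - coin]
    return ways[amount]


def signature_combinations(
    reported: list[int],
    signatures: dict[str, list[int]],
    max_multiplier: int,
) -> list[dict[str, int]]:
    """Every assignment of multiples 0..max_multiplier to the signature curves
    whose pointwise sum equals the reported curve exactly (brute force).
    Each curve must already be on the reporting timeline, same length."""
    names = list(signatures)
    if any(len(signatures[name]) != len(reported) for name in names):
        raise ValueError("signature curves must match the reported curve's length")
    matches = []
    for counts in product(range(max_multiplier + 1), repeat=len(names)):
        total = [
            sum(count * signatures[name][day] for name, count in zip(names, counts))
            for day in range(len(reported))
        ]
        if total == reported:
            matches.append(dict(zip(names, counts)))
    return matches


if __name__ == "__main__":
    print(count_coin_combinations(150, [5, 10, 25]))  # → 58 ways to make $1.50
    # Hypothetical signature curves, already shifted to their exposure day.
    signatures = {
        "children, exposed day 0": [0, 2, 1, 0, 0],
        "adults, exposed day 0": [0, 1, 2, 1, 0],
        "adults, exposed day 2": [0, 0, 0, 1, 2],
    }
    for combo in signature_combinations([0, 3, 3, 2, 2], signatures, max_multiplier=3):
        print(combo)
```

In this made-up example exactly one combination fits (one unit of each curve), which is the kind of unique answer that tells us why to pay attention. The brute force grows as (max_multiplier + 1) raised to the number of curves, so more than a handful of subpopulations would call for an integer-programming or constrained least-squares formulation instead.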
Let’s take a look at an example of how this can work.
To begin with, let’s look at something very simple: Giardiasis, a waterborne infection causing diarrheal disease in humans with extremely low levels of secondary transmission (makes life simpler). There was an actual “outbreak” in MA in 1995.

[Figure: reported incidence for MA (all of it)]
HIPAA requires aggregation of data released to the public and to most researchers without special access.

To use this method, we need some measured parameter values. I’m cheating a little because I’m assuming values for I, but we could in theory measure this.
We know that most of the reporting came from 3 urban centers:
Then we can decompose by demographic subgroup for each town:

That was a really simple disease without any secondary transmission. So what happens if there is secondary spread?
It gets MUCH more complicated…
First of all, the probability of exposure in each subpopulation can start to depend on the levels of infection in every other subpopulation. Now we start getting into the social network stuff.

An aside
Social Networks: Oy vey
Since this is a talk and not a course, I can’t leave this as an exercise to the reader, but I can use the ‘we only have a little over an hour’ excuse to hand-wave some of the modeling details on this. I’m going to talk about the concepts. If you are interested in the details, well, that’s why I’m going to be around for the year.
Again, rather than using mass averages, let’s still keep the idea of a disease signature. So exposure isn’t a simple underlying rate; it’s based on contacting an infected individual.
We can think of individuals in each subgroup as having certain probabilities of interacting with others, possibly in other subgroups. (For people in the room who think of social interactions as edges in a graph, this is almost the same: it’s like weighted edges in a complete graph.)
Also, membership in particular subgroups can change over time (e.g. children becoming adults). (In this case, both vertex states and edge weights can be thought of as vertex-state-dependent progressions.)

Checking Reality
This all gets complicated enough that it’s nerve-wracking not to check model outcomes against some form of reality. We need to:
1. measure all model parameters
2. create disease outbreaks
3. check predicted spread against what actually happens
(I tried to get Thus Spake Zarathustra to play now, but I couldn’t make it work.)
My beautiful termites

On Thursday, at the DIMACS Mixer, I’ll be talking to you about ‘Why Termites’. For now, just go with it.
The particular details:
[Figure labels: Temporary Immunity; Zootermopsis angusticollis; Spores land on termite; Metarhizium anisopliae; Allogroomed off; Burrow through cuticle; Death; Not a termite]
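Returning to the social-network aside, here is a minimal sketch of “weighted edges in a complete graph” at the subgroup level. It assumes (an illustrative choice, not the talk’s model) that `weights[g][h]` is the daily probability that one individual in subgroup g contacts one given individual in subgroup h, that contacts are independent, and that any contact with an infectious individual counts as an exposure. The subgroup names and numbers are hypothetical.

```python
"""Hypothetical sketch: exposure driven by weighted contacts between subgroups.

Assumptions (illustrative only): weights[g][h] is the daily probability that
one individual in g contacts one given individual in h; contacts are
independent; any contact with an infectious individual is an exposure.
"""

import math


def exposure_probability(
    group: str,
    infectious: dict[str, int],
    weights: dict[str, dict[str, float]],
) -> float:
    """P(an individual in `group` contacts at least one infectious individual today)."""
    escape = math.prod(
        (1.0 - weights[group].get(other, 0.0)) ** n_infectious
        for other, n_infectious in infectious.items()
    )
    return 1.0 - escape


def expected_new_exposures(
    susceptible: dict[str, int],
    infectious: dict[str, int],
    weights: dict[str, dict[str, float]],
) -> dict[str, float]:
    """Expected number of newly exposed susceptibles in each subgroup today."""
    return {
        group: susceptible[group] * exposure_probability(group, infectious, weights)
        for group in susceptible
    }


if __name__ == "__main__":
    weights = {
        "children": {"children": 0.03, "adults": 0.01},
        "adults": {"children": 0.01, "adults": 0.02},
    }
    infectious = {"children": 5, "adults": 2}
    susceptible = {"children": 100, "adults": 200}
    print(expected_new_exposures(susceptible, infectious, weights))
```

Because each subgroup’s exposure depends on infection levels in every other subgroup, this daily step would be iterated, and subgroup membership changes (children becoming adults) would amount to updating `susceptible`, `infectious` and the rows of `weights` between days.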
So we built some CA (cellular automaton) simulation models, including age-based differences in:
1. direction of wandering through the nest
2. interaction rates
3. exposure rates
4. susceptibility to infection from exposure
5. mortality from infection
6. efficacy/duration of induced immunity (via social vaccination)
As the model ran, individuals aged and behaved accordingly.

And… Thank god, all the work so far has shown that the models predict spread accurately. Whew!

We’re even getting some interesting new directions. Regardless of why specific outputs happen, now that we know the model can work, we can work backwards: fit model outcomes to observed data and look at which sets of parameter values and behavioral mixing rates produce them.
This might provide an odd way of understanding human social networks, especially since they can so dramatically affect model output.
Maybe this last part is a pipe dream. Who knows, but it’s so crazy it just might work…

Thanks for asking me to speak to you. I hope you’ve had fun.
Some of what I’ve talked about has been accomplished in collaboration with Elena Naumova, James Traniello and Rebeca Rosengaus.
My thanks to the NIH for funding support for this research.