Strategies for Prospective
Biosurveillance Using Multivariate
Time Series
Howard Burkom1, Yevgeniy Elbert2, Sean Murphy1
1 Johns Hopkins Applied Physics Laboratory, National Security Technology Department
2 Walter Reed Army Institute of Research
Tenth Biennial CDC and ATSDR Symposium on Statistical Methods
Panelist: Statistical Issues in Public Health Surveillance for Bioterrorism
Using Multiple Data Streams
Bethesda, MD
March 2, 2005
Defining the Multivariate Temporal
Surveillance Problem
Varying Nature of the Data:
• Trend, day-of-week, seasonal behavior depending on data type & grouping
Multivariate Nature of Problem:
• Many locations
• Multiple syndromes
• Stratification by age, gender, other covariates
Surveillance Challenges:
• Defining anomalous behavior(s)
– Hypothesis tests--both appropriate
and timely
• Avoiding excessive alerting due to
multiple testing
– Correlation among data streams
– Varying noise backgrounds
• Communication with/among users at
different levels
• Data reduction and visualization
Problem: to combine multiple evidence sources
for increased sensitivity at manageable alert rates
[Figure: Recent Respiratory Syndrome Data, monthly ticks from 1/1/2003 through 8/1/2004; vertical axis 0 to 6; series: Office Visits, MILITARY, ED-UI, ED ILI, OTC; annotations: “early cases”, “height of outbreak”]
Multivariate Hypothesis Testing
• Parallel monitoring:
– Null hypothesis: “no outbreak of unspecified infection in any
of hospitals 1…N” (or counties, zipcodes, …)
– FDR-based methods (modified Bonferroni)
• Consensus monitoring:
– Null hypothesis: “no respiratory outbreak infection based on
hosp. syndrome counts, clinic visits, OTC sales, absentees”
– Multiple univariate methods: “combining p-values”
– Fully multivariate: MSPC charts
• General solution: system-engineered blend of these
– Scan statistics paradigm useful when data permit
Univariate Alerting Methods
Data modeling: regression controls for weekly, holiday, seasonal
effects
• Outlier removal procedure avoids training on exceptional counts
• Baseline chosen to capture recent seasonal behavior
• Standardized residuals used as detection statistics
Process control method adapted for daily surveillance
• Combines EWMA, Shewhart methods for sensitivity to gradual
or sudden signals
• Parameters modified adaptively for changing data behavior
• Adaptively scaled to compute 1-sided probabilities for detection
statistics
• Small-count corrections for scale-independent alert rates
Outputs expressed as p-values for comparison, visualization
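The process-control step on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' algorithm: it assumes the standardized residuals are approximately standard normal, uses a fixed smoothing weight `lam` (a hypothetical value), and combines the two charts by taking the smaller of their one-sided p-values, which is only one of several possible combination choices. The adaptive parameter updates and small-count corrections listed above are omitted.

```python
import math


def normal_sf(x):
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def ewma_shewhart_pvalues(z, lam=0.4):
    """One-sided p-value per day from standardized residuals z.

    Assumptions (not from the slides): residuals ~ N(0, 1), fixed
    smoothing weight lam, and min() as the rule combining both charts.
    """
    # Asymptotic standard deviation of an EWMA of unit-variance inputs.
    sigma_ewma = math.sqrt(lam / (2.0 - lam))
    ewma = 0.0
    pvalues = []
    for zt in z:
        ewma = lam * zt + (1.0 - lam) * ewma
        p_shewhart = normal_sf(zt)             # sensitive to sudden spikes
        p_ewma = normal_sf(ewma / sigma_ewma)  # sensitive to gradual rises
        pvalues.append(min(p_shewhart, p_ewma))
    return pvalues
```

With a single large residual the Shewhart term drives the p-value down, while a run of moderately elevated residuals is picked up by the EWMA term a few days in.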
Parallel Hypotheses & Multiple Testing
Adapting Standard Methods
• P-values p1,…,pN for multiple null hypotheses
“no outbreak at any hospital j”, j=1,…,N,
with desired type I error rate α
• Bonferroni bound: error rate α is achieved with the test pj
< α/N, all j (conservative)
• Simes’ 1986 enhancement (after Seeger, Eklund):
– Put p-values in ascending order: P(1),…,P(N)
– Reject intersection of null hypotheses if any P(j*) < j* α / N
– Reject null for j <= j* (or use more complex criteria)
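The ordered-p-value criterion above can be sketched as follows. This is an illustrative sketch under one stated assumption: j* is taken as the largest rank whose p-value falls below its threshold, which is the step-up reading associated with Benjamini and Hochberg. The function name and the default alpha are hypothetical.

```python
def simes_reject(pvalues, alpha=0.05):
    """Apply the ordered-p-value criterion to N parallel tests.

    Returns (reject_intersection, rejected_indices), where the indices
    point into the input list. Assumption: j* is the LARGEST rank j
    with P(j) < j * alpha / N.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # ascending p-values
    j_star = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] < rank * alpha / n:
            j_star = rank
    return j_star > 0, sorted(order[:j_star])


# Example: four hospitals, alpha = 0.05.
# Thresholds by rank are 0.0125, 0.025, 0.0375, 0.05.
reject_any, flagged = simes_reject([0.30, 0.004, 0.02, 0.60])
```

In this example the plain Bonferroni test (p < 0.0125) would flag only the hospital with p = 0.004, while the ordered criterion also flags the one with p = 0.02.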
Parallel Hypotheses:
Criteria to Control False Alert Rate
Simes-Seeger-Eklund criterion:
• Gives expected alert rate near
desired a for independent signals
• Applied to control the false
discovery rate (FDR) for many
common multivariate distributions
(Benjamini & Hochberg, 1995)
– FDR = Exp( # false alerts / all
alerts )
– Increased power over methods
controlling Pr( single false alert )
• Numerous FDR applications, incl.
UK health surveillance
(Marshall et al., 2004)
Criterion: reject combined
null hypothesis if any p-value
falls below line
Stratification and Multiple Testing
[Flow diagram]
• Counts unstratified by age → EWMA-Shewhart → aggregate p-value
• Counts for ages 0-4, ages 5-11, …, ages 71+ → EWMA-Shewhart on each stratum → p-values for ages 0-4, ages 5-11, …, ages 71+
• Stratum p-values → Modified Bonferroni (FDR) → composite p-value
• MIN of aggregate p-value and composite p-value → resultant p-value
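The flow on this slide merges the age-stratified p-values with a modified Bonferroni step and then takes the minimum against the unstratified p-value. A rough sketch follows. The slide does not say how the composite p-value is computed, so the Simes-adjusted minimum used here is an assumption, and both function names are hypothetical.

```python
def simes_composite(pvalues):
    """Composite p-value for a set of stratum p-values.

    Assumption: the 'modified Bonferroni' step is the Simes combined
    p-value, min over ranks j of N * P(j) / j, capped at 1.
    """
    n = len(pvalues)
    ranked = sorted(pvalues)
    return min(1.0, min(n * p / rank for rank, p in enumerate(ranked, start=1)))


def resultant_pvalue(aggregate_p, stratum_pvalues):
    """MIN of the unstratified p-value and the stratified composite."""
    return min(aggregate_p, simes_composite(stratum_pvalues))
```

For example, `resultant_pvalue(0.2, [0.01, 0.5, 0.8])` is driven by the single extreme stratum: the composite is 3 × 0.01 / 1 = 0.03, which is smaller than the aggregate value of 0.2.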
Consensus Monitoring:
Multiple Univariate Methods
• Fisher’s combination rule (multiplicative)
– Given p-values p1, p2,…,pn:
$F = -2 \sum_{j=1}^{n} \ln p_j$
– F is $\chi^2$ with 2n degrees of freedom, for pj independent
– Recommended as “stand-alone” method
• Edgington’s rule (additive)
– Let S = sum of p-values p1, p2,…,pn
– Resultant p-value:
$\frac{S^n}{n!} - \binom{n}{1}\frac{(S-1)^n}{n!} + \binom{n}{2}\frac{(S-2)^n}{n!} - \binom{n}{3}\frac{(S-3)^n}{n!} + \cdots$
( stop when (S-j) <= 0 )
– Normal curve approximation formula for large n
– “Consensus” method: sensitive to multiple near-critical values
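Both combination rules are short enough to sketch directly. The sketch below assumes independent p-values that are strictly positive. For Fisher's rule it uses the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom, so no statistics library is needed. The function names are illustrative.

```python
import math


def fisher_pvalue(pvalues):
    """Fisher's multiplicative rule: F = -2 * sum(ln p_j) ~ chi-square(2n)."""
    n = len(pvalues)
    half_f = -sum(math.log(p) for p in pvalues)  # F / 2
    # Upper tail for 2n degrees of freedom:
    # exp(-F/2) * sum_{k=0}^{n-1} (F/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half_f / k
        total += term
    return math.exp(-half_f) * total


def edgington_pvalue(pvalues):
    """Edgington's additive rule based on S = sum of the p-values."""
    n = len(pvalues)
    s = sum(pvalues)
    total = 0.0
    for j in range(n + 1):
        if s - j <= 0:  # stop when (S - j) <= 0
            break
        total += (-1) ** j * math.comb(n, j) * (s - j) ** n / math.factorial(n)
    return min(1.0, max(0.0, total))
```

Two small cases show the contrast drawn on the slide. For the pair (0.06, 0.06), Edgington gives 0.0072 and Fisher about 0.024, so the additive rule rewards agreement among near-critical values. For (0.001, 0.9), Fisher gives about 0.007 and Edgington about 0.41, so one extreme value is enough for the multiplicative rule.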
Multiple Univariate Criteria:
2D Visualization
[Figure: 2D visualization; labeled regions: nominal univariate criteria, Edgington, Fisher]
934 days of EMS Data
12 time series: separate syndrome groups of ambulance calls
• Poisson-like counts: negligible day-of-week, seasonal effects
• EWMA-Shewhart algorithm applied to derive p-values
• Each row is mean over ALL combinations
[Table of alert results not recovered; column annotations: Stand-Alone Method, Multiple Testing Problem!, Add’l Consensus Alerts]
Multivariate Control Charts
• T2 statistic: $T^2 = (X-\mu)^{\top} S^{-1} (X-\mu)$
– X = multivariate time series: syndromic claims, OTC sales,
etc.
– S = estimate of covariance matrix from baseline interval
– Alert threshold set from the empirical distribution to achieve the desired alert rate
– MCUSUM, MEWMA methods “filter” X seeking shorter
average run length
• Hawkins (1993): “T2 particularly bad at distinguishing
location shifts from scale shifts”
– T2 nondirectional
– Directional statistic: $(\mu_A-\mu)^{\top} S^{-1} (X-\mu)$, where $\mu_A - \mu$ is direction
of change
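The two statistics on this slide can be computed without a linear-algebra library, as sketched below. The helper names are hypothetical. The baseline mean and covariance are plain sample estimates, which is an assumption, since the slides do not specify the estimator. A small Gauss-Jordan solver stands in for the matrix inverse.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def baseline_stats(rows):
    """Mean vector and sample covariance matrix of baseline observations."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[k] for r in rows) / n for k in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def t2_statistic(x, mu, cov):
    """Nondirectional T2 = (x - mu)' S^-1 (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    w = solve(cov, diff)  # S^-1 (x - mu)
    return sum(d * wi for d, wi in zip(diff, w))


def directional_statistic(x, mu, cov, mu_alt):
    """Directional statistic (mu_alt - mu)' S^-1 (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    w = solve(cov, diff)
    return sum((a - m) * wi for a, m, wi in zip(mu_alt, mu, w))
```

With an identity covariance and zero mean, the observations (3, 4) and (-3, -4) both give T2 = 25, while the directional statistic with `mu_alt = [1, 0]` gives 3 and -3 respectively. This is the sense in which T2 cannot tell an increase from a decrease.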
MSPC Example: 2 Data Streams
Evaluation: Injection in Authentic and
Simulated Backgrounds
• Background:
– Authentic: 2-8 correlated streams of daily resp syndrome data (23 mo.)
– Simulated: negative binomial data with authentic $\mu$,
modeled overdispersion with $\sigma^2 = k\mu$
• Injections (additional attributable cases):
– Each case stochastic draw from point-source
epicurve dist. (Sartwell lognormal model)
– 100 Monte Carlo trials; single outbreak effect per trial
– With and without time delays between effects across streams
[Figure: Observed vs Modeled Incubation Period Distribution, Sverdlovsk 1979 Outbreak; horizontal axis: Days after Exposure (0 to 50); vertical axis: Number of Cases (0 to 12); series: observed, modeled]
ROC: Both as a function of threshold
$\Pr(\text{False Alarm}) = 1 - \text{specificity} = \frac{\#\ \text{alerts in noise (no attributable cases)}}{\#\ \text{days examined}}$
$\Pr(\text{detection}) = \text{sensitivity} = \frac{\#\ \text{signals alerted}}{\#\ \text{signals injected}}$
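The simulated-background and injection steps on this slide can be sketched with the standard library alone. Everything numeric here is a placeholder: the background mean, the overdispersion factor k, and the lognormal median and dispersion are hypothetical values, not the fitted values behind the slides. The negative binomial draw is implemented as a gamma-Poisson mixture, which gives variance k times the mean for k > 1.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for modest means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def negbin_background(n_days, mean, k, rng):
    """Daily counts with the given mean and variance k * mean (k > 1)."""
    shape = mean / (k - 1.0)  # gamma shape; gamma scale is (k - 1)
    return [poisson(rng.gammavariate(shape, k - 1.0), rng) for _ in range(n_days)]


def inject_outbreak(counts, exposure_day, n_cases, median_days, sigma, rng):
    """Add n_cases, each delayed by a lognormal incubation period."""
    out = list(counts)
    for _ in range(n_cases):
        delay = rng.lognormvariate(math.log(median_days), sigma)
        day = exposure_day + int(delay)
        if day < len(out):  # cases falling past the series end are dropped
            out[day] += 1
    return out


def roc_point(noise_alerts, days_examined, signals_alerted, signals_injected):
    """(Pr(false alarm), Pr(detection)) at one alerting threshold."""
    return noise_alerts / days_examined, signals_alerted / signals_injected
```

A trial would generate a background with `negbin_background`, add one outbreak with `inject_outbreak`, run the detector at a fixed threshold, and tally alerts. Repeating over many trials and thresholds traces out the ROC curve one `roc_point` at a time.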
Multivariate Comparison
Example: faint, 1-σ peak signal in 4 independent
data streams, with differential effect delays
Data correlation tends
to degrade alert rate of
multiple, univariate
methods
Cross correlation can greatly
improve multivariate method
performance (if consistent), or
can degrade it!
PD=PFA (random)
ROC Effects of Data Correlation
Example: faint, 2-σ peak signal in 2 of 6 highly
correlated data streams, with differential effect delays
[Figure: ROC curves; vertical axis: Detection Probability; horizontal axis: Daily False Alarm Probability; annotations:]
• Degradation of multiple, univariate methods
• Effect of strong, consistent correlation on multivariate methods
Conclusions
• Comprehensive biosurveillance requires an interweaving
of parallel and consensus monitoring
• Adapted hypothesis tests can help maintain sensitivity at
practical false alarm rates
– But background data and cross-correlation must be understood
• Parallel monitoring: FDR-like methods required
according to scope, jurisdiction of surveillance
• Multiple univariate
– Fisher rule useful as stand-alone combination method
– Edgington rule gives sensitivity to consensus of tests
• Multivariate
– MSPC T2-based charts offer promise when correlation is
consistent & significant, but their niche in routine, robust,
prospective monitoring must be clarified
Backups
References 1
Testing Multiple Null Hypotheses
• Simes, R. J. (1986). “An improved Bonferroni procedure for multiple tests of significance”, Biometrika 73, 751-754.
• Benjamini, Y., Hochberg, Y. (1995). “Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing”, Journal of the Royal Statistical Society B, 57, 289-300.
• Hommel, G. (1988). “A stagewise rejective multiple test procedure based on a modified Bonferroni test”, Biometrika 75, 383-386.
• Miller C.J., Genovese C., Nichol R.C., Wasserman L., Connolly A., Reichart D., Hopkins A., Schneider J., and Moore A. (2001). “Controlling the False Discovery Rate in Astrophysical Data Analysis”, Astronomical Journal 122, 3492.
• Marshall C., Best N., Bottle A., and Aylin P. (2004). “Statistical Issues in Prospective Monitoring of Health Outcomes Across Multiple Units”, J. Royal Statist. Soc. A, 167 Pt. 3, pp. 541-559.
Testing Single Null Hypotheses with multiple evidence
• Edgington, E.S. (1972). “An Additive Method for Combining Probability Values from Independent Experiments”, Journal of Psychology, Vol. 80, pp. 351-363.
• Edgington, E.S. (1972). “A normal curve method for combining probability values from independent experiments”, Journal of Psychology, Vol. 82, pp. 85-89.
• Bauer P. and Kohne K. (1994). “Evaluation of Experiments with Adaptive Interim Analyses”, Biometrics 50, 1029-1041.
References 2
Statistical Process Control
• Hawkins, D. (1991). “Multivariate Quality Control Based on Regression-Adjusted Variables”, Technometrics 33, 1:61-75.
• Mandel, B.J. (1969). “The Regression Control Chart”, J. Quality Technology (1) 1:1-9.
• Williamson G.D. and VanBrackle, G. (1999). “A study of the average run length characteristics of the National Notifiable Diseases Surveillance System”, Stat Med. 1999 Dec 15;18(23):3309-19.
• Lowry, C.A., Woodall, W.H. (1992). “A Multivariate Exponentially Weighted Moving Average Control Chart”, Technometrics, Vol. 34, No. 1, 46-53.
Point-Source Epidemic Curves & Simulation
• Sartwell, P.E., “The Distribution of Incubation Periods of Infectious Disease”, Am. J. Hyg. 1950, Vol. 51, pp. 310-318; reprinted in Am. J. Epidemiol., Vol. 141, No. 5, 1995.
• Philippe, P., “Sartwell’s Incubation Period Model Revisited in the Light of Dynamic Modeling”, J. Clin. Epidemiol., Vol. 47, No. 4, 419-433.
• Burkom H and Rodriguez R, “Using Point-Source Epidemic Curves to Evaluate Alerting Algorithms for Biosurveillance”, 2004 Proceedings of the American Statistical Association, Statistics in Government Section [CD-ROM], Toronto: American Statistical Association (to appear).
MSPC 2-Stream Example:
Detail of Aug. Peak
Effect of Combining Evidence
[Figure: Algorithm P-values (vertical axis, 0.00 to 0.10) from 8/17/03 through 4/23/04; series: “Edgington: ED, OV, OTC” and “Office Visits Only”; annotations: early cases, height of outbreak, secondary event]
Bayes Belief Net (BBN) Umbrella
• To include evidence from disparate evidence types
– Continuous/discrete data
– Derived algorithm output or probabilities
– Expert/heuristic knowledge
• Graphical representation of conditional dependencies
• Can weight statistical hypothesis test evidence using
heuristics – not restricted to fixed p-value thresholds
• Can exploit advances in data modeling, multivariate
anomaly detection
• Can model
– Heuristic weighting of evidence
– Lags in data availability or reporting
– Missing data
Bayes Network Elements
Network nodes: Flu Season, Flu, Anthrax, GI Anomaly, Resp Anomaly, Sensor Alarm
Posterior probabilities for four evidence scenarios over Flu Season, GI Anomaly, Resp Anomaly, Sensor Alarm (the evidence states highlighted in the original graphic were not recovered):
Scenario   P(Flu | Evidence)         P(Anthrax | Evidence)
1          0.70                >>    0.0023
2          0.67                >>    0.09
3          0.08                >     0.005
4          0.07                <     0.17
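How a belief net turns evidence into competing posterior probabilities can be illustrated with a toy network evaluated by brute-force enumeration. Everything in this sketch is hypothetical: the edge structure (Flu Season → Flu; Flu and Anthrax → Resp Anomaly; Anthrax → Sensor Alarm), the noisy-OR form, and every probability are made up for illustration. They are not the model behind the slide and are not meant to reproduce its numbers.

```python
from itertools import product


def posterior(flu_season, resp_anomaly, sensor_alarm):
    """Return (P(Flu | evidence), P(Anthrax | evidence)) for a toy network.

    All structure and numbers below are illustrative assumptions.
    """
    p_flu = 0.10 if flu_season else 0.01  # P(Flu | Flu Season)
    p_anthrax = 0.001                     # prior P(Anthrax)
    weights = {}
    for flu, anthrax in product((False, True), repeat=2):
        prior = ((p_flu if flu else 1 - p_flu)
                 * (p_anthrax if anthrax else 1 - p_anthrax))
        # Noisy-OR: leak 0.05, Flu causes the anomaly w.p. 0.6, Anthrax w.p. 0.8.
        p_resp = 1 - (1 - 0.05) * (1 - 0.6 * flu) * (1 - 0.8 * anthrax)
        p_sensor = 0.9 if anthrax else 0.01
        likelihood = ((p_resp if resp_anomaly else 1 - p_resp)
                      * (p_sensor if sensor_alarm else 1 - p_sensor))
        weights[(flu, anthrax)] = prior * likelihood
    total = sum(weights.values())
    p_flu_post = sum(w for (f, _), w in weights.items() if f) / total
    p_anthrax_post = sum(w for (_, a), w in weights.items() if a) / total
    return p_flu_post, p_anthrax_post
```

With these made-up numbers, a respiratory anomaly during flu season without a sensor alarm leaves Flu far more probable than Anthrax, while the same anomaly outside flu season with a sensor alarm reverses the ordering. The slide's table makes the same qualitative point.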
Structure of BBN Model for
Asthma Flare-ups
[Figure: BBN graph; node labels recovered: Syndromic, Asthma, Asthma Military RX, Cold/Flu Season and Irritant, Resp Anomaly, Resp Military OV, SubFreezing Temp, Cold/Flu Season, Cold/Flu Season Start, Resp Military RX, Resp Civilian OV, Ozone, Resp Civilian OTC, PM 2.5, AQI, and Level and Season nodes for Mold Spores, Grass Pollen, Tree Pollen, Weed Pollen]
BBN Application to
Asthma Flare-ups
• Availability of practical, verifiable data:
– For “truth data”: daily clinical diagnosis counts
– For “evidence”: daily environmental, syndromic data
• Known asthma triggers with complex interaction
– Air quality (EPA data)
• Concentration of particulate matter, allergens
• Ozone levels
– Temperature (NOAA data)
– Viral infections (Syndromic data)
• Evidence from combination of expert knowledge,
historical data