Experimental Analysis of Sequential
Decision Making Algorithms for Port of
Entry Inspection Procedures
Saket Anand
David Madigan
Richard Mammone
Saumitr Pathak
Fred Roberts
RUTGERS UNIVERSITY
Port of Entry Inspection Procedures
• Goal: Find ways to intercept illicit nuclear materials and
weapons destined for the U.S. via the maritime
transportation system subject to minimization of
delays, manpower and equipment utilization
• Find inspection schemes
that minimize total “cost”
including “cost” of false
positives and false negatives
Port of Entry Inspection Procedures
• Formulation of the problem as a complex sequential decision
making problem
• Containers have attributes
• Sample attributes:
– Does ship’s manifest set off an “alarm”?
– What is the neutron or Gamma emission count? Is it above
threshold?
• The Decision Maker’s Problem:
• Which attributes to inspect?
• Which inspections next based
on previous results?
• Approach:
• Builds on ideas of Stroud and Saeger at Los Alamos National Laboratory
[Figure: Mobile VACIS, a truck-mounted gamma ray imaging system]
Sequential Decision Making Problem
• Simplest Case: Attributes are in state 0 or 1
– Sensors measure presence/absence of attributes
• Then: Container is a binary string like 0110
• So: Classification is a Boolean decision function F
that assigns each binary string to a category (0 or 1).
0110
F(0110) = 0/1
Category 0 = “ok” and 1 = “suspicious”
• Example: 00001111 denotes the function with F(000) = F(001) = F(010) = F(011) = 0 and
F(100) = F(101) = F(110) = F(111) = 1.
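A minimal sketch of this encoding, assuming the truth-table string lists the outputs of F in increasing binary order of the input (000, 001, ..., 111), as the example above does. The function name `classify` is illustrative and not from the original work.

```python
def classify(truth_table: str, container: str) -> int:
    """Return F(container) for a Boolean decision function given as a truth-table string."""
    if len(truth_table) != 2 ** len(container):
        raise ValueError("truth table must have 2**n entries for n attributes")
    # The container's bit string, read as a binary number, indexes the table.
    return int(truth_table[int(container, 2)])


print(classify("00001111", "011"))  # 0, category "ok"
print(classify("00001111", "100"))  # 1, category "suspicious"
```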
Binary Decision Tree Approach
• Binary Decision Tree:
– Nodes are sensors (A,B,C, etc.) or categories (0 or 1)
• Stroud and Saeger enumerate all “complete” and monotone
boolean functions and calculate the least expensive
corresponding binary decision trees.
No. of       Distinct BDTs for all   Complete and monotone   Distinct BDTs for complete and
attributes   boolean functions       boolean functions       monotone boolean functions
2            74                      2                       4
3            16,430                  9                       60
4            1,079,779,602           114                     11,808
5            5x10^18                 6,894                   263,515,920
Sensitivity Analysis of Stroud Saeger
Experiments
• Aim: Experimental analysis of the robustness of the
optimal binary decision tree (BDT) implementing the
inspection scheme found by the Stroud-Saeger [1] approach
• How sensitive are the optimal Binary Decision Trees to
variations in the cost and sensor parameters?
[1] Stroud, P. D. and Saeger, K. J., “Enumeration of Increasing Boolean Expressions and Alternative
Digraph Implementations for Diagnostic Applications”, Proceedings Volume IV, Computer,
Communication and Control Technologies, (2003), 328-333
Cost Function used for Evaluating
the Decision Trees.
CTot = CFalsePositive * PFalsePositive + CFalseNegative * PFalseNegative + Cutil
CFalsePositive is the cost of a false positive (Type I error)
CFalseNegative is the cost of a false negative (Type II error)
PFalsePositive is the probability of a false positive occurring
PFalseNegative is the probability of a false negative occurring
Cutil is the expected cost of utilization of the tree.
Note: In this cost model, other costs such as fixed costs
and costs due to delay are not considered.
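The cost function above can be written as a one-line helper. This is only a sketch: the function name, argument names and the input values are hypothetical and were chosen to make the arithmetic easy to follow; they are not values from the experiments.

```python
def total_cost(c_false_pos, c_false_neg, p_false_pos, p_false_neg, c_util):
    """CTot = CFalsePositive*PFalsePositive + CFalseNegative*PFalseNegative + Cutil."""
    return c_false_pos * p_false_pos + c_false_neg * p_false_neg + c_util


# Hypothetical inputs: 200 * 0.01 + 1e9 * 1e-8 + 5 = 2 + 10 + 5
cost = total_cost(c_false_pos=200.0, c_false_neg=1e9,
                  p_false_pos=0.01, p_false_neg=1e-8, c_util=5.0)
print(round(cost, 6))
```

Because the false-negative cost is many orders of magnitude larger than the other terms, even a very small false-negative probability can dominate the total.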
Probability of Error for Individual
Sensors
• For the ith sensor, the Type I (P(Yi=1|X=0)) and Type II
(P(Yi=0|X=1)) errors are modeled using Gaussian
distributions.
– State of nature X=0 represents absence of a bomb.
– State of nature X=1 represents presence of a bomb.
– Yi represents the outcome (count) of sensor i.
– Σi is the variance of the distributions.
– PD = prob. of detection, PF = prob. of false positive.
[Figure: Characteristics of a typical sensor, showing the distributions P(Yi|X=0) and P(Yi|X=1) with Ki, Ti and Σi marked]
Probability of Error for The Entire Tree
[Figure: the same binary decision tree on sensors A, B and C, drawn once for state of nature X = 0 (absence of a bomb) and once for X = 1 (presence of a bomb)]
Probability of false positive (Pfalsepositive) for this tree:
P(Y=1|X=0) = P(YA=1|X=0) * P(YB=1|X=0)
           + P(YA=1|X=0) * P(YB=0|X=0) * P(YC=1|X=0)
Probability of false negative (Pfalsenegative) for this tree:
P(Y=0|X=1) = P(YA=0|X=1)
           + P(YA=1|X=1) * P(YB=0|X=1) * P(YC=0|X=1)
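The two tree-level error formulas can be checked with a short recursive sketch. The tree shape used here is read off the formulas (A is inspected first; if A fires, B is inspected; if B does not fire, C is inspected) and is an assumption about the figure. The product form also assumes the sensor readings are independent given X. The tuple encoding and the name `prob_output_one` are illustrative.

```python
def prob_output_one(tree, p_one):
    """P(tree outputs 1), where p_one[s] = P(Y_s = 1 | X) for each sensor s.

    A leaf is the category 0 or 1; an inner node is
    (sensor, subtree_if_reading_is_0, subtree_if_reading_is_1).
    """
    if isinstance(tree, int):
        return float(tree)
    sensor, if_zero, if_one = tree
    p = p_one[sensor]
    return (1.0 - p) * prob_output_one(if_zero, p_one) + p * prob_output_one(if_one, p_one)


# Assumed shape: A=0 -> 0; A=1 -> B; B=1 -> 1; B=0 -> C; C decides.
TREE = ("A", 0, ("B", ("C", 0, 1), 1))

# Sensor probabilities for A, B and C as quoted in the Stroud-Saeger experiments.
p_given_x0 = {"A": 0.0144, "B": 0.2221, "C": 0.0735}   # P(Y_s=1 | X=0)
p_given_x1 = {"A": 0.9856, "B": 0.7779, "C": 0.9265}   # P(Y_s=1 | X=1)

p_false_positive = prob_output_one(TREE, p_given_x0)        # P(Y=1 | X=0)
p_false_negative = 1.0 - prob_output_one(TREE, p_given_x1)  # P(Y=0 | X=1)
print(p_false_positive, p_false_negative)
```

The same function works for any tree in this encoding, so different tree shapes can be compared on the same sensor data.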
Stroud Saeger Experiments
• Stroud-Saeger ranked all trees formed
from 3 or 4 sensors A, B, C and D
according to increasing tree costs.
• Used cost function defined above.
• Values used in their experiments:
– CA = .25; P(YA=1|X=1) = .9856; P(YA=1|X=0) = .0144; KA = 4.37; ΣA = 1;
– CB = 1; P(YB=1|X=1) = .7779; P(YB=1|X=0) = .2221; KB = 1.53; ΣB = 1;
– CC = 10; P(YC=1|X=1) = .9265; P(YC=1|X=0) = .0735; KC = 2.9; ΣC = 1;
– CD = 30; P(YD=1|X=1) = .9893; P(YD=1|X=0) = .0107; KD = 4.6; ΣD = 1;
– Here, Ci = unit cost of utilization of sensor i, Ki is the sensor
discrimination power and Σi is the relative spread factor for
sensor i.
• Also fixed were: CFalseNegative, CFalsePositive, P(X=1)
Stroud Saeger Experiments: Our
Sensitivity Analysis
– CFalseNegative was varied between 25 million and 500
billion dollars
• Low and high estimates of direct and indirect costs
incurred due to a false negative.
– CFalsePositive was varied between $180 and $720
• Cost incurred due to a false positive:
4 men * (3 to 6 hrs) * (15 to 30 $/hr)
– P(X=1) was varied between 3x10^-9 and 1x10^-5
Stroud Saeger Experiments: Our
Sensitivity Analysis
• First set of computer experiments: n = 3
(use sensors A, C and D)
• Experiment 1: Fix values of two of CFalseNegative,
CFalsePositive, P(X=1) and vary the third through
its interval of possible values.
• Experiment 2: Fix a value of one of CFalseNegative,
CFalsePositive, P(X=1) and vary the other two.
• Do 10,000 experiments each time.
• Look for the variation in the highest ranked tree.
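Experiment 1 can be sketched roughly as follows. Several things here are assumptions rather than the authors' setup: the three candidate trees are hypothetical stand-ins (the real experiments rank all 60 trees), values are drawn uniformly from the stated ranges, sensor readings are treated as independent given X, and Cutil is taken as the prior-weighted expected cost of the sensors actually used.

```python
import random

# Sensor figures quoted in this deck: unit cost, P(Y=1|X=1), P(Y=1|X=0).
SENSORS = {
    "A": (0.25, 0.9856, 0.0144),
    "C": (10.0, 0.9265, 0.0735),
    "D": (30.0, 0.9893, 0.0107),
}

# Hypothetical trees: a leaf is 0 or 1, an inner node is
# (sensor, subtree_if_reading_is_0, subtree_if_reading_is_1).
TREES = {
    "a or (c and d)": ("A", ("C", 0, ("D", 0, 1)), 1),
    "d or (a and c)": ("A", ("D", 0, 1), ("C", ("D", 0, 1), 1)),
    "a or c or d": ("A", ("C", ("D", 0, 1), 1), 1),
}


def evaluate(tree, x):
    """Return (P(output = 1 | X = x), expected sensor cost given X = x)."""
    if isinstance(tree, int):
        return float(tree), 0.0
    sensor, if_zero, if_one = tree
    cost, p_detect, p_false = SENSORS[sensor]
    p = p_detect if x == 1 else p_false
    out0, cost0 = evaluate(if_zero, x)
    out1, cost1 = evaluate(if_one, x)
    return (1 - p) * out0 + p * out1, cost + (1 - p) * cost0 + p * cost1


def total_cost(tree, c_fp, c_fn, p_bomb):
    """CTot with the prior P(X=1) = p_bomb made explicit."""
    alarm0, util0 = evaluate(tree, 0)
    alarm1, util1 = evaluate(tree, 1)
    c_util = (1 - p_bomb) * util0 + p_bomb * util1  # assumed form of Cutil
    return c_fp * (1 - p_bomb) * alarm0 + c_fn * p_bomb * (1 - alarm1) + c_util


def first_rank_counts(n_trials=10_000, seed=0):
    """Fix CFalsePositive and P(X=1), vary CFalseNegative, count the cheapest tree."""
    rng = random.Random(seed)
    c_fp = rng.uniform(180, 720)
    p_bomb = rng.uniform(3e-9, 1e-5)
    counts = {name: 0 for name in TREES}
    for _ in range(n_trials):
        c_fn = rng.uniform(25e6, 500e9)
        best = min(TREES, key=lambda name: total_cost(TREES[name], c_fp, c_fn, p_bomb))
        counts[best] += 1
    return counts


print(first_rank_counts())
```

Experiment 2 follows the same pattern with two of the three parameters drawn inside the loop instead of one.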
Variation of CTot vs. CFalseNegative
[Figure: plot of CTot against CFalseNegative]
P(X=1) and CFalsePositive were kept constant at the specified value and CTot was
computed for 10,000 randomly selected values of CFalseNegative in the specified range.
Randomly selected fixed parameter values shown on the plot: 493
Variation of CTot vs. CFalsePositive
[Figure: plot of CTot against CFalsePositive]
P(X=1) and CFalseNegative were kept constant at the specified value and CTot was
computed for 10,000 randomly selected values of CFalsePositive in the specified range.
Randomly selected fixed parameter values shown on the plot: 426,807,420,776
Variation of CTot vs. P(X=1)
[Figure: plot of CTot against P(X=1)]
CFalsePositive and CFalseNegative were kept constant at the specified value and CTot was
computed for 10,000 randomly selected values of P(X=1) in the specified range.
Randomly selected fixed parameter values shown on the plot: 447,470,143,842 and 352
Variation of CTot wrt CFalseNegative and CFalsePositive
[Figure: plot of CTot against CFalseNegative and CFalsePositive]
Randomly selected fixed parameter values
Variation of CTot wrt CFalseNegative and P(X=1)
[Figure: plot of CTot against CFalseNegative and P(X=1)]
Randomly selected fixed parameter values shown on the plot: 669
Variation of CTot wrt CFalsePositive and P(X=1)
[Figure: plot of CTot against CFalsePositive and P(X=1)]
Randomly selected fixed parameter values shown on the plot: 82,737,009,757
In each of these three plots:
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cutil
Structure of trees that ranked first
with 3 sensors (A, C and D)
[Figure: diagrams of the three trees, with sensors a, c and d as internal nodes and categories 0 and 1 as leaves]
Tree number 37, Boolean Fn: 00011111
Tree number 49, Boolean Fn: 01010111
Tree number 55, Boolean Fn: 01111111
In the 10,000 experiments, only 3 out of the 60 Binary Decision Trees
ever attained first rank
Stroud Saeger Experiments: Our
Sensitivity Analysis
• Second set of computer experiments: n = 4
(use sensors A, B, C and D).
• Experiment 1: Fix values of two of CFalseNegative,
CFalsePositive, P(X=1) and vary the third through
its interval of possible values.
• Experiment 2: Fix a value of one of CFalseNegative,
CFalsePositive, P(X=1) and vary the other two.
• Do 10,000 experiments each time.
• Look for the variation in the highest ranked tree.
Variation of CTot vs. CFalseNegative
[Figure: plot of CTot against CFalseNegative]
P(X=1) and CFalsePositive were kept constant at the specified value and CTot was
computed for 10,000 randomly selected values of CFalseNegative in the specified range.
Randomly selected fixed parameter values shown on the plot: 189
Variation of CTot vs. CFalsePositive
[Figure: plot of CTot against CFalsePositive]
P(X=1) and CFalseNegative were kept constant at the specified value and CTot was
computed for 10,000 randomly selected values of CFalsePositive in the specified range.
Randomly selected fixed parameter values shown on the plot: 240,407,400,315
Variation of CTot vs. P(X=1)
[Figure: plot of CTot against P(X=1)]
CFalsePositive and CFalseNegative were kept constant at the specified value and CTot was
computed for 10,000 randomly selected values of P(X=1) in the specified range.
Randomly selected fixed parameter values shown on the plot: 406,238,290,733 and 298
Variation of CTot wrt CFalseNegative and CFalsePositive
[Figure: plot of CTot against CFalseNegative and CFalsePositive]
Randomly selected fixed parameter values
Variation of CTot wrt CFalseNegative and P(X=1)
[Figure: plot of CTot against CFalseNegative and P(X=1)]
Randomly selected fixed parameter values shown on the plot: 454
Variation of CTot wrt CFalsePositive and P(X=1)
[Figure: plot of CTot against CFalsePositive and P(X=1)]
Randomly selected fixed parameter values shown on the plot: 47,484,728,943
In each of these three plots:
CTot = CFalsePositive * P(X=0) * P(Y=1|X=0) + CFalseNegative * P(X=1) * P(Y=0|X=1) + Cutil
Structure of trees and corresponding
Boolean Expressions for n = 4
[Figure: diagrams of four trees, with sensors a, b, c and d as internal nodes and categories 0 and 1 as leaves]
Tree number 11785, Boolean Fn: 0111111111111111
Tree number 11605, Boolean Fn: 0101011111111111
Tree number 9133, Boolean Fn: 0001010111111111
Tree number 8965, Boolean Fn: 0001010101111111
In the above experiments, ≤ 10 out of the 11,808 Binary Decision
Trees ever attained first rank
Receiver Operating Characteristic
(ROC) Curve
• The ROC curve is the plot of the
probability of correct detection
(PD) vs. the probability of false
positive (PF)
• Sensor threshold is varied to select
an operating point; trade off
between the PD and PF
• Each sensor has an ROC curve and
the combination of sensors into a
decision tree has a composite ROC
curve
• Equal Error Rate (EER) is the
operating point on the ROC curve
where,
PF = 1 - PD
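[Figure: ROC curve plotting PD against PF, both on axes from 0 to 1, with an operating point and the EER point marked; a second panel shows the distributions P(Yi|X=0) and P(Yi|X=1) with Ki and Ti marked]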
Sensitivity to Sensor Performance
The following experiments were done using sensors A, B, C and D
as described below, varying the individual sensor thresholds Ti
from -4.0 to +4.0 in steps of 0.4. These values were chosen since they
gave us an ROC curve for the individual sensors over the complete range
of P(Yi=1|X=0) and P(Yi=1|X=1).
PF for the ith sensor is computed as:
P(Yi=1|X=0) = 0.5 erfc[Ti/√2]
PD for the ith sensor is computed as:
P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Σi√2)]
CA = .25; KA = 4.37; ΣA = 1; CB = 1; KB = 1.53; ΣB = 1
CC = 10; KC = 2.9; ΣC = 1; CD = 30; KD = 4.6; ΣD = 1
where Ci is the individual cost of utilization of sensor i, Ki is the
discrimination power of the sensor and Σi is the spread factor for the sensor.
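The two erfc formulas translate directly into code. The sketch below sweeps the stated threshold grid for one sensor and then evaluates each sensor at Ti = Ki/2, which with Σi = 1 is the equal-error point (PF = 1 - PD). At that threshold the formulas should reproduce, to four decimal places, the sensor probabilities quoted for the Stroud-Saeger experiments. Function and variable names are illustrative.

```python
import math


def p_false_positive(t):
    """PF: P(Yi=1|X=0) = 0.5 * erfc(Ti / sqrt(2))."""
    return 0.5 * math.erfc(t / math.sqrt(2))


def p_detection(t, k, sigma=1.0):
    """PD: P(Yi=1|X=1) = 0.5 * erfc((Ti - Ki) / (Sigma_i * sqrt(2)))."""
    return 0.5 * math.erfc((t - k) / (sigma * math.sqrt(2)))


K = {"A": 4.37, "B": 1.53, "C": 2.9, "D": 4.6}  # discrimination power per sensor

# Thresholds from -4.0 to +4.0 in steps of 0.4 (21 values).
thresholds = [round(-4.0 + 0.4 * i, 1) for i in range(21)]

# ROC points (PF, PD) for sensor A over the threshold grid.
roc_a = [(p_false_positive(t), p_detection(t, K["A"])) for t in thresholds]

# Operating point at Ti = Ki / 2 for each sensor.
for name, k in K.items():
    t = k / 2
    print(name, round(p_false_positive(t), 4), round(p_detection(t, k), 4))
```

Lowering the threshold moves a sensor up and to the right along its ROC curve: both PD and PF increase.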
Performance (ROC) of Binary Decision Tree
number 37 (3 sensors)
• The lines represent performance characteristics (ROC curve) of sensors A, C and
D.
• The green dots represent the performance characteristics (P(Y=1|X=0),
P(Y=1|X=1)) of the tree over all combinations of sensor thresholds (Ti).
• 15 out of 60 trees attained first rank
Performance (ROC) of Binary Decision Tree
number 37 (3 sensors)
• Assuming performance probabilities (P(Y=1|X=1) and P(Y=1|X=0)) to be monotonically
related (in the sense that P(Y=1|X=1) can be called a monotonic function of P(Y=1|X=0)),
we can find an ROC curve for the tree consisting of the set containing maximum P(Y=1|X=1)
value corresponding to given P(Y=1|X=0) value.
• The blue dots represent such an ROC curve, the “best” ROC curve for tree 37.
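One way to extract such a "best" curve from a cloud of (P(Y=1|X=0), P(Y=1|X=1)) points is to keep only the points whose detection probability is higher than that of every point with an equal or lower false-positive probability. This running-maximum filter is an interpretation of the description above, not necessarily the authors' procedure, and the sample points are made up.

```python
def best_roc(points):
    """Keep the (PF, PD) points not beaten by any point with PF at most as large."""
    frontier = []
    best_pd = -1.0
    # Sort by PF ascending; among equal PF values, visit the highest PD first.
    for pf, pd in sorted(points, key=lambda p: (p[0], -p[1])):
        if pd > best_pd:
            frontier.append((pf, pd))
            best_pd = pd
    return frontier


# Hypothetical operating points of a tree over several threshold combinations.
points = [(0.10, 0.60), (0.10, 0.80), (0.20, 0.70), (0.30, 0.95), (0.05, 0.50)]
print(best_roc(points))
# [(0.05, 0.5), (0.1, 0.8), (0.3, 0.95)]
```

The result is monotone by construction: PD never decreases as PF increases along the returned list.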
Performance (ROC) of Binary Decision Tree
number 445 (4 sensors)
• The lines represent performance characteristics (ROC curve) of sensors A, B, C
and D.
• The green dots represent the performance characteristics (P(Y=1|X=0),
P(Y=1|X=1)) of the tree over all combinations of sensor thresholds (Ti).
• Only 244 of 11,808 trees attained first rank
Performance (ROC) of Binary Decision Tree
number 445 (4 sensors)
• Assuming performance probabilities (P(Y=1|X=1) and P(Y=1|X=0)) to be monotonically
related (in the sense that P(Y=1|X=1) can be called a monotonic function of P(Y=1|X=0)),
we can find an ROC curve for the tree consisting of the set containing maximum P(Y=1|X=1)
value corresponding to given P(Y=1|X=0) value.
• The blue dots represent such an ROC curve, the “best” ROC curve for tree 445.
Conclusions from Sensitivity
Analysis
• Considerable lack of sensitivity to
modification in parameters for trees using 3 or
4 sensors.
• Very few optimal trees.
• Very few boolean functions arise among
optimal trees.
• Binary Decision Trees perform better than
individual sensors
Some Research Challenges
• Explain why conclusions are so insensitive to
variation in parameter values.
• Explore the structure of the optimal trees and
compare the different optimal trees.
• Develop less brute force methods for finding
optimal trees that might work if there are more
than 4 attributes.
• Develop methods for approximating the optimal tree.
[Figure: Pallet VACIS]
Acknowledgement
• Supported by Naval Research and National
Science Foundation
• The authors thank
Phil Stroud
Kevin Saeger
Rick Picard
for providing data, code and ideas
Thank you
http://dimacs.rutgers.edu/Workshops/PortofEntry/
Saket Anand – anands@caip.rutgers.edu
Fred S. Roberts – froberts@dimacs.rutgers.edu
Monotone and Complete Boolean
Functions
Monotone Boolean Functions:
• Given two strings x1x2…xn, y1y2…yn
• Suppose that xi ≤ yi for all i implies that
F(x1x2…xn) ≤ F(y1y2…yn).
• Then we say that F is monotone.
• Then 11…1 has highest probability of being in category 1.
Complete Boolean Functions:
• Boolean function F is complete if and only if calculating F requires
finding all n attributes and knowing the value of the input string
on those attributes.
• Example: F(111) = F(110) = F(101) = F(100) = 1, F(000) =
F(001) = F(010) = F(011) = 0.
• F(abc) is determined without knowing b (or c).
• F is incomplete.
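These two definitions are enough to count the complete and monotone Boolean functions by brute force for small n. The sketch below treats "complete" as "the output depends on every one of the n attributes", which is how the example above reads. It is an illustrative check and not the enumeration method of Stroud and Saeger. A function is stored as an integer whose bit x holds F(x).

```python
def count_complete_monotone(n: int) -> int:
    """Count Boolean functions of n attributes that are both monotone and complete."""
    size = 1 << n                 # number of input strings
    all_vars = (1 << n) - 1       # bitmask with one bit per attribute
    count = 0
    for table in range(1 << size):          # every truth table
        depends = 0                         # attributes that F depends on
        monotone = True
        for x in range(size):
            fx = (table >> x) & 1
            for i in range(n):
                if x & (1 << i):
                    continue                # only raise bits that are 0
                fy = (table >> (x | (1 << i))) & 1
                if fx > fy:                 # raising a bit lowered F
                    monotone = False
                    break
                if fx != fy:                # flipping attribute i changes F
                    depends |= 1 << i
            if not monotone:
                break
        if monotone and depends == all_vars:
            count += 1
    return count


print([count_complete_monotone(n) for n in (2, 3, 4)])  # [2, 9, 114]
```

The counts for n = 2, 3 and 4 match the table of complete and monotone Boolean functions earlier in the deck. The loop visits all 2^(2^n) truth tables, so it is not practical beyond n = 4.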