Algorithms for Port of Entry Inspection for WMDs

Fred S. Roberts
DIMACS Center,
Rutgers University
Port of Entry Inspection Algorithms
•Goal: Find ways to intercept illicit
nuclear materials and weapons
destined for the U.S. via the
maritime transportation system
•Currently inspecting only small
% of containers arriving at ports
•Even inspecting 8% of containers in Port of
NY/NJ might bring international trade to a halt
(Larrabbee 2002)
Port of Entry Inspection Algorithms
•Aim: Develop decision support algorithms that
will help us to “optimally” intercept illicit
materials and weapons subject to limits on delays,
manpower, and equipment
•Find inspection schemes that minimize total
“cost” including “cost” of false positives and
false negatives
Mobile VACIS: truck-mounted gamma ray imaging system
Sequential Decision Making Problem
•Stream of containers arrives at a port
•The Decision Maker’s Problem:
•Which to inspect?
•Which inspections next based on previous results?
•Approach:
–“decision logics”
–combinatorial optimization methods
–Builds on ideas of Stroud
and Saeger at Los Alamos
National Laboratory
–Need for new models and methods
Sequential Diagnosis Problem
•Such sequential diagnosis problems arise in many
areas:
–Communication networks (testing connectivity, paging
cellular customers, sequencing tasks, …)
–Manufacturing (testing machines, fault diagnosis,
routing customer service calls, …)
–Artificial intelligence/CS (optimal derivation strategies
in knowledge bases, best-value satisficing search, coding
decision trees, …)
–Medicine (diagnosing patients, sequencing treatments,
…)
Sequential Decision Making Problem
•Containers arriving to be classified into categories.
•Simple case: 0 = “ok”, 1 = “suspicious”
•Inspection scheme: specifies which inspections are
to be made based on previous observations
Sequential Decision Making Problem
•Containers have attributes, each
in a number of states
•Sample attributes:
–Levels of certain kinds of chemicals or
biological materials
–Whether or not there are items of a certain
kind in the cargo list
–Whether cargo was picked up in a certain port
Sequential Decision Making Problem
•Currently used attributes:
–Does ship’s manifest set off an “alarm”?
–What is the neutron or Gamma emission
count? Is it above threshold?
–Does a radiograph image come up positive?
–Does an induced fission test come up positive?
Gamma ray detector
Sequential Decision Making Problem
•We can imagine many other attributes
•This project is concerned with general algorithmic
approaches.
•We seek a methodology not tied to today’s
technology.
•Detectors are evolving quickly.
Sequential Decision Making Problem
•Simplest Case: Attributes are in state 0 or 1
•Then: Container is a binary string like 011001
•So: Classification is a decision function F that
assigns each binary string to a category.
011001
F(011001)
If attributes 2, 3, and 6 are present, assign container to
category F(011001).
Sequential Decision Making Problem
•If there are two categories, 0 and 1, decision
function F is a boolean function.
Example:
F(000) = F(111) = 1, F(abc) = 0 otherwise
This classifies a container as positive iff it has
none of the attributes or all of them.
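As a minimal sketch of the example above, the decision function can be written out as a truth table. The names `F` and `truth_table` are illustrative only.

```python
from itertools import product

def F(bits: str) -> int:
    """Example decision function: category 1 iff none or all of the attributes are present."""
    return 1 if bits in ("000", "111") else 0

# Truth table over all 2^3 attribute strings.
truth_table = {"".join(b): F("".join(b)) for b in product("01", repeat=3)}

for bits, category in truth_table.items():
    print(bits, "->", category)
```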
Sequential Decision Making Problem
•Given a container, test its attributes until we know
enough to calculate the value of F.
•An inspection scheme tells us in which order to
test the attributes to minimize cost.
•Even this simplified problem is hard
computationally.
Sequential Decision Making Problem
•This assumes F is known.
•Simplifying assumption: Attributes are
independent.
•At any point we stop inspecting and output the
value of F based on outcomes of inspections so
far.
•Complications: There may be precedence relations among
the components (e.g., can’t test attribute a4 before
testing a6).
•Or: cost may depend on attributes tested before.
•F may depend on variables that cannot be
directly tested or for which tests are too costly.
Sequential Decision Making Problem
•Such problems are hard computationally.
•There are many possible boolean functions F.
•Even if F is fixed, problem of finding a good
classification scheme (to be defined precisely
below) is NP-complete.
•Several classes of functions F allow for efficient
inspection schemes:
–k-out-of-n systems
–Certain series-parallel systems
–Read-once systems
–“regular” systems
–Horn systems
Sensors and Inspection Lanes
•n types of sensors measure presence or absence of the n
attributes.
•Many copies of each sensor.
•Complication: different characteristics of sensors.
•Entities come for inspection.
•Which sensor of a given type to
use?
•Think of inspection lanes and
queues.
•Besides efficient inspection
schemes, could decrease costs by:
–Buying more sensors
–Changing the allocation of containers to sensor lanes.
Binary Decision Tree Approach
•Sensors measure presence/absence of attributes.
•Binary Decision Tree:
–Nodes are sensors or categories (0 or 1)
–Two arcs exit from each sensor node, labeled
left and right.
–Take the right arc when sensor says the
attribute is present, left arc otherwise
Binary Decision Tree Approach
•Reach category 1 from the
root only through the path
a0 to a1 to 1.
•Container is classified in
category 1 iff it has both
attributes a0 and a1 .
•Corresponding boolean
function F(11) = 1, F(10) =
F(01) = F(00) = 0.
Figure 1
Binary Decision Tree Approach
•Reach category 1 from
the root by:
a0 L a1 R a2 R 1, or
a0 R a2 R 1
•Container classified in
category 1 iff it has
a1 and a2 and not a0 or
a0 and a2 and possibly a1.
•Corresponding boolean
function F(111) = F(101)
= F(011) = 1, F(abc) = 0
otherwise.
Figure 2
Binary Decision Tree Approach
•This binary decision
tree corresponds to the
same boolean function
F(111) = F(101) =
F(011) = 1, F(abc) = 0
otherwise.
However, it has one fewer
observation node ai. So,
it is more efficient if all
observations are equally
costly and equally likely.
Figure 3
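The comparison above can be checked mechanically. The sketch below encodes the Figure 2 tree as described in the text and a smaller tree that tests a2 first. That smaller tree is an assumption made for illustration (the actual Figure 3 is not reproduced here). Both are evaluated on all eight attribute strings, and their observation nodes are counted.

```python
from itertools import product

# A tree is a category (0 or 1) or a tuple (attribute_index, left, right).
# As on the slides, take the right subtree when the attribute is present.
FIGURE_2 = (0, (1, 0, (2, 0, 1)), (2, 0, 1))
SMALLER = (2, 0, (0, (1, 0, 1), 1))  # hypothetical tree with three observation nodes

def classify(tree, bits):
    """Walk the tree for an attribute string such as '011' and return its category."""
    while not isinstance(tree, int):
        attr, left, right = tree
        tree = right if bits[attr] == "1" else left
    return tree

def observation_nodes(tree):
    """Count the sensor (observation) nodes in the tree."""
    if isinstance(tree, int):
        return 0
    _, left, right = tree
    return 1 + observation_nodes(left) + observation_nodes(right)

def positives(tree, n=3):
    """Return the attribute strings that the tree assigns to category 1."""
    strings = ("".join(b) for b in product("01", repeat=n))
    return sorted(s for s in strings if classify(tree, s) == 1)

print(positives(FIGURE_2), observation_nodes(FIGURE_2))
print(positives(SMALLER), observation_nodes(SMALLER))
```

Both trees compute the same boolean function, so under equal costs and likelihoods the node count is the only difference between them.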
Binary Decision Tree Approach
•Even if the boolean function F is fixed, the
problem of finding the “optimal” binary decision
tree for it is very hard (NP-complete).
•For small n = number of attributes, can try to
solve it by brute force enumeration.
•Even for n = 5, not practical. (n = 4 at Port of
Long Beach-Los Angeles)
Port of Long Beach
Binary Decision Tree Approach
Promising Approaches:
•Heuristic algorithms, approximations to optimal.
•Special assumptions about the boolean function F.
•Example: For “monotone” boolean functions,
integer programming formulations give promising
heuristics.
•Stroud and Saeger enumerate
all “complete,” monotone
boolean functions and calculate
the least expensive corresponding
binary decision trees.
Binary Decision Tree Approach
Monotone Boolean Functions:
•Given two strings x1x2…xn, y1y2…yn
•Suppose that xi ≤ yi for all i implies that
F(x1x2…xn) ≤ F(y1y2…yn).
•Then we say that F is monotone.
•Then 11…1 has highest probability of being in
category 1.
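A brute-force check of this definition is straightforward for small n. The sketch below is illustrative; the helper name `is_monotone` and the two sample functions are assumptions, chosen to match examples used earlier in the talk.

```python
from itertools import product

def is_monotone(F, n):
    """Return True if x <= y componentwise always implies F(x) <= F(y)."""
    points = list(product((0, 1), repeat=n))
    return all(
        F(x) <= F(y)
        for x in points
        for y in points
        if all(a <= b for a, b in zip(x, y))
    )

def both(x):
    """Category 1 iff both attributes are present."""
    return int(x[0] and x[1])

def none_or_all(x):
    """Earlier example: F(000) = F(111) = 1, 0 otherwise."""
    return int(sum(x) in (0, 3))

print(is_monotone(both, 2))         # monotone
print(is_monotone(none_or_all, 3))  # not monotone: F(000) = 1 but F(001) = 0
```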
Binary Decision Tree Approach
Incomplete Boolean Functions:
•Boolean function F is incomplete if F can be
calculated by finding at most n-1 attributes and
knowing the value of the input string on those
attributes
•Example: F(111) = F(110) = F(101) = F(100) =
1, F(000) = F(001) = F(010) = F(011) = 0.
•F(abc) is determined without knowing b (or c).
•F is incomplete.
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•Stroud and Saeger: algorithm for enumerating
binary decision trees implementing complete,
monotone boolean functions.
•Feasible to implement up to n = 4.
•n = 2:
–There are 6 monotone boolean functions.
–Only 2 of them are complete, monotone
–There are 4 binary decision trees for
calculating these 2 complete, monotone boolean
functions.
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•n = 3:
–9 complete, monotone boolean functions.
–60 distinct binary trees for calculating them
•n = 4:
–114 complete, monotone boolean functions.
–11,808 distinct binary decision trees for
calculating them.
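The function counts quoted for small n can be reproduced by brute force. The sketch below is not the Stroud-Saeger enumeration algorithm. It simply loops over all truth tables and takes "complete" to mean that F depends on every attribute, which is how the definition of an incomplete function above reads. It counts functions only, not decision trees.

```python
from itertools import product

def count_functions(n):
    """Return (monotone functions, complete monotone functions) on n attributes."""
    points = list(product((0, 1), repeat=n))
    index = {p: k for k, p in enumerate(points)}
    # Triples (x, y, i): y is x with attribute i switched from 0 to 1.
    covers = [
        (index[p], index[p[:i] + (1,) + p[i + 1:]], i)
        for p in points
        for i in range(n)
        if p[i] == 0
    ]
    monotone = complete = 0
    for table in product((0, 1), repeat=len(points)):
        # Monotonicity only needs checking on single-attribute switches.
        if any(table[x] > table[y] for x, y, _ in covers):
            continue
        monotone += 1
        # Complete: every attribute changes the value of F for some input.
        if all(any(table[x] != table[y] for x, y, j in covers if j == i)
               for i in range(n)):
            complete += 1
    return monotone, complete

for n in (2, 3, 4):
    print(n, count_functions(n))
```

The number of truth tables is 2^(2^n), so this loop is already impractical at n = 5, which is one face of the combinatorial explosion discussed on these slides.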
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•n = 5:
–6894 complete, monotone boolean functions
–263,515,920 corresponding binary decision
trees.
•Combinatorial explosion!
•Need alternative approaches; enumeration not
feasible!
Cost Functions
•Above analysis: Only uses number of sensors
•Using a sensor has a cost:
–Unit cost of inspecting one item with it
–Fixed cost of purchasing and deploying it
–Delay cost from queuing up at the sensor
station
•Preliminary problem: disregard fixed and delay
costs. Minimize unit costs.
Cost Functions
•Simplification so far: Disregard characteristics
of population of entities being inspected.
•Only count number of observation (attribute)
nodes in the tree.
•Unit Cost Complication: How many nodes of
the decision tree are actually visited during
an average container’s inspection? This depends on the
“distribution” of containers. In our early models,
it will depend on the probability of sensor errors and
the probability of a bomb in a container.
Cost Functions: Delay Costs
•Tradeoff between fixed costs and delay costs:
Adding more sensors cuts down on delays.
•Stochastic process of containers arriving
•Distribution of delay times for inspections
•Use queuing theory to find average delay
times under different models
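As one possible illustration of the queuing calculation, the sketch below computes the mean waiting time for an M/M/c queue (Poisson arrivals, exponential inspection times, c identical lanes) using the Erlang C formula. The choice of M/M/c is an assumption made here for illustration; the talk does not say which queuing models are used.

```python
from math import factorial

def mean_queue_delay(arrival_rate, service_rate, lanes):
    """Mean time a container waits in queue in an M/M/c system (Erlang C)."""
    offered = arrival_rate / service_rate   # offered load, lambda / mu
    rho = offered / lanes                   # utilization per lane
    if rho >= 1:
        return float("inf")                 # arrivals outpace inspections
    tail = offered**lanes / (factorial(lanes) * (1 - rho))
    head = sum(offered**k / factorial(k) for k in range(lanes))
    p_wait = tail / (head + tail)           # probability an arrival must wait
    return p_wait / (lanes * service_rate - arrival_rate)

# Hypothetical rates: 1 container per minute arriving, 2 per minute inspected per lane.
print(mean_queue_delay(1.0, 2.0, lanes=1))
print(mean_queue_delay(1.0, 2.0, lanes=2))
```

Adding a lane lowers the delay at the price of one more sensor, which is the fixed-cost versus delay-cost tradeoff described above.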
Cost Functions
•Cost of false positive: Cost of additional
tests.
–If it means opening the container, it’s
very expensive.
•Cost of false negative:
–Complex issue.
–What is cost of a bomb going off in
Manhattan?
The Brute Force Approach
•The cost of each binary
decision tree corresponding to a
complete, monotone boolean
function is calculated.
•The optimum tree is selected.
•Optimum depends on
assumptions about sensor
errors, costs of false positive
and false negative outcomes,
and unit, fixed, and delay costs
for each sensor.
Cost Functions: Sensor Errors
•One Approach to False Positives/Negatives:
Assume there can be Sensor Errors
•Simplest model: assume that all sensors checking
for attribute ai have same fixed probability of
saying ai is 0 if in fact it is 1, and similarly
saying it is 1 if in fact it is 0.
•More sophisticated analysis later describes a
model for determining probabilities of sensor
errors.
•Notation: X = state of nature (bomb or no bomb)
Y = outcome (of sensor or entire inspection
process).
Probability of Error for The Entire Tree
[Figure: a decision tree with sensor nodes A, B and C, drawn once for state of nature X = 0 (absence of a bomb) and once for X = 1 (presence of a bomb)]
Probability of false positive (P(Y=1|X=0)) for this tree is given by
P(Y=1|X=0) = P(YA=1|X=0) * P(YB=1|X=0)
+ P(YA=1|X=0) * P(YB=0|X=0) * P(YC=1|X=0)
Probability of false negative (P(Y=0|X=1)) for this tree is given by
P(Y=0|X=1) = P(YA=0|X=1)
+ P(YA=1|X=1) * P(YB=0|X=1) * P(YC=0|X=1)
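To make the two formulas concrete, the sketch below evaluates them with illustrative sensor characteristics for A, B and C (detection probabilities .90, .99, .999 and false alarm probabilities .10, .01, .001, in line with the values quoted in the Stroud-Saeger experiments). The dictionary `P_ALARM` and the helper `p` are assumed names.

```python
# P(Yi = 1 | X = x) for each sensor i, keyed by the state of nature x.
P_ALARM = {
    "A": {0: 0.10, 1: 0.90},
    "B": {0: 0.01, 1: 0.99},
    "C": {0: 0.001, 1: 0.999},
}

def p(sensor, y, x):
    """P(Y_sensor = y | X = x)."""
    alarm = P_ALARM[sensor][x]
    return alarm if y == 1 else 1.0 - alarm

# False positive: A alarms and then either B alarms, or B is clear and C alarms.
p_false_positive = (p("A", 1, 0) * p("B", 1, 0)
                    + p("A", 1, 0) * p("B", 0, 0) * p("C", 1, 0))

# False negative: A is clear, or A alarms but both B and C are clear.
p_false_negative = (p("A", 0, 1)
                    + p("A", 1, 1) * p("B", 0, 1) * p("C", 0, 1))

print(p_false_positive, p_false_negative)
```

With these inputs the false negative probability is dominated by the first term, the chance that sensor A alone misses the bomb.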
Cost Function used for Evaluating
the Decision Trees.
CTot = CFalsePositive * PFalsePositive + CFalseNegative * PFalseNegative + Cutil
CFalsePositive is the cost of false positive (Type I error)
CFalseNegative is the cost of false negative (Type II error)
PFalsePositive is the probability of a false positive occurring
PFalseNegative is the probability of a false negative occurring
Cutil is the cost of utilization of the tree.
The error probability of the entire tree is computed from
the error probabilities of the individual sensors.
Cost Function used for Evaluating
the Decision Trees.
Cutil is the cost of utilization of the tree.
Simplest assumption: Cutil is the expected sum of unit
costs associated with the tree. Count unit cost of each
sensor each time it is used. Use P(X = 1) and probability
of errors at each type of sensor to calculate expected value.
Later: models for distribution of attributes of containers
and more sophisticated analysis of expected cost of
utilizing the tree, bringing in delay costs.
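A minimal sketch of this cost calculation follows, under the simplest assumption above (Cutil is the expected sum of unit costs) and assuming sensor readings are independent given X. The tree encoding, the function names, and the example tree are illustrative. The unit costs and error probabilities follow the values quoted for sensors A, B and C in the Stroud-Saeger experiments.

```python
P_ALARM = {"A": {0: 0.10, 1: 0.90}, "B": {0: 0.01, 1: 0.99}, "C": {0: 0.001, 1: 0.999}}
UNIT_COST = {"A": 0.25, "B": 10.0, "C": 30.0}

# A tree is a category (0 or 1) or (sensor, subtree_if_no_alarm, subtree_if_alarm).
TREE = ("A", 0, ("B", ("C", 0, 1), 1))  # example tree, chosen for illustration

def evaluate(tree, x):
    """Return (P(Y=1 | X=x), expected unit cost given X=x) for the tree."""
    if isinstance(tree, int):
        return float(tree), 0.0
    sensor, if_clear, if_alarm = tree
    q = P_ALARM[sensor][x]
    p_clear, c_clear = evaluate(if_clear, x)
    p_alarm, c_alarm = evaluate(if_alarm, x)
    # The sensor is always paid for; subtrees are paid for only when reached.
    return ((1 - q) * p_clear + q * p_alarm,
            UNIT_COST[sensor] + (1 - q) * c_clear + q * c_alarm)

def total_cost(tree, c_fp, c_fn, p_bomb):
    """CTot = CFalsePositive*PFalsePositive + CFalseNegative*PFalseNegative + Cutil."""
    alarm0, util0 = evaluate(tree, 0)
    alarm1, util1 = evaluate(tree, 1)
    p_false_positive = (1 - p_bomb) * alarm0
    p_false_negative = p_bomb * (1 - alarm1)
    c_util = (1 - p_bomb) * util0 + p_bomb * util1
    return c_fp * p_false_positive + c_fn * p_false_negative + c_util

# Hypothetical parameter values, for illustration only.
print(total_cost(TREE, c_fp=450.0, c_fn=1e9, p_bomb=1e-6))
```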
Stroud Saeger Experiments
• Stroud-Saeger ranked all trees formed
from 3 or 4 sensors A, B, C and D
according to increasing tree costs.
• Used cost function defined above.
• Values used in their experiments:
– CA = .25; P(YA=1|X=1) = .90; P(YA=1|X=0) = .10;
– CB = 10; P(YB=1|X=1) = .99; P(YB=1|X=0) = .01;
– CC = 30; P(YC=1|X=1) = .999; P(YC=1|X=0) = .001;
– CD = 1; P(YD=1|X=1) = .95; P(YD=1|X=0) = .05;
– Here, Ci = cost of utilization of sensor i.
• Also fixed were: CFalseNegative, CFalsePositive, P(X=1)
Stroud Saeger Experiments: Our
Sensitivity Analysis
• We have explored sensitivity of the Stroud-Saeger
conclusions to variations in values of these three
parameters.
• We estimated high and low values for these parameters.
• We fixed one of the parameters at a value from its interval
and then explored the highest ranked tree as the other
two were chosen at random from their intervals.
10,000 experiments each time.
• We looked for the variation in the top-ranked tree and
how the top-rank related to choice of parameter values.
• Very surprising results.
Stroud Saeger Experiments: Our
Sensitivity Analysis
– CFalseNegative was varied between 25 million and 10
billion dollars
• Low and high estimates of direct and indirect costs
incurred due to a false negative.
– CFalsePositive was varied between $180 and $720
• Cost incurred due to false positive
(4 men * (3–6 hrs) * ($15–30/hr))
– P(X=1) was varied between 1/10,000,000 and
1/100,000
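The randomized experiments can be sketched as follows. This is an illustration only. The three candidate trees are hypothetical stand-ins (not the numbered trees from the experiments). Sampling is assumed to be uniform over the stated ranges, and P(X=1) is held at an assumed fixed value. Sensor readings are assumed independent given X.

```python
import random
from collections import Counter

P_ALARM = {"A": {0: 0.10, 1: 0.90}, "B": {0: 0.01, 1: 0.99}, "C": {0: 0.001, 1: 0.999}}
UNIT_COST = {"A": 0.25, "B": 10.0, "C": 30.0}

# Hypothetical candidates: (sensor, subtree_if_no_alarm, subtree_if_alarm).
TREES = {
    "all_three_alarm": ("A", 0, ("B", 0, ("C", 0, 1))),
    "any_alarm": ("A", ("B", ("C", 0, 1), 1), 1),
    "A_and_B_or_C": ("A", 0, ("B", ("C", 0, 1), 1)),
}

def evaluate(tree, x):
    """Return (P(Y=1 | X=x), expected unit cost given X=x)."""
    if isinstance(tree, int):
        return float(tree), 0.0
    sensor, if_clear, if_alarm = tree
    q = P_ALARM[sensor][x]
    p0, c0 = evaluate(if_clear, x)
    p1, c1 = evaluate(if_alarm, x)
    return (1 - q) * p0 + q * p1, UNIT_COST[sensor] + (1 - q) * c0 + q * c1

def total_cost(tree, c_fp, c_fn, p_bomb):
    """Total expected cost: error costs plus expected utilization cost."""
    alarm0, util0 = evaluate(tree, 0)
    alarm1, util1 = evaluate(tree, 1)
    return (c_fp * (1 - p_bomb) * alarm0
            + c_fn * p_bomb * (1 - alarm1)
            + (1 - p_bomb) * util0 + p_bomb * util1)

def first_rank_counts(trials=10_000, p_bomb=1e-6, seed=0):
    """Count how often each candidate tree has the lowest total cost."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(trials):
        c_fn = rng.uniform(25e6, 10e9)  # CFalseNegative range from the slide
        c_fp = rng.uniform(180, 720)    # CFalsePositive range from the slide
        best = min(TREES, key=lambda name: total_cost(TREES[name], c_fp, c_fn, p_bomb))
        wins[best] += 1
    return wins

print(first_rank_counts())
```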
Stroud Saeger Experiments:
Sensitivity Analysis
• First set of
experiments: 3
attributes or types of
sensors, A, B, C.
• Extensive computer
experimentation.
Frequency of Top-ranked Trees when
CFalseNegative and CFalsePositive are Varied
[Figure: bar chart of frequency (0 to 7000) against tree number (0 to 60), with separate series for trees ranked 1st through 5th]
• 10,000 randomized experiments (randomly selected values of CFalseNegative and
CFalsePositive from the specified range of values) for the median value of P(X=1).
• The above graph has frequency counts of the number of experiments in which a
particular tree was ranked first, second, third and so on.
• Only three trees (7, 55 and 1) ever came first. 6 trees came second, 10 came third,
13 came fourth.
Frequency of Top-ranked Trees when
CFalseNegative and P(X=1) are Varied
[Figure: bar chart of frequency (0 to 8000) against tree number (0 to 60), with separate series for trees ranked 1st through 5th]
• 10,000 randomized experiments for the median value of CFalsePositive.
• Only 2 trees (7 and 55) ever came first. 4 trees came second. 7 trees came
third. 10 and 13 trees came 4th and 5th respectively.
Frequency of Top-ranked Trees when
P(X=1) and CFalsePositive are Varied
[Figure: bar chart of frequency (0 to 7000) against tree number (0 to 60), with separate series for trees ranked 1st through 5th]
• 10,000 randomized experiments for the median value of CFalseNegative.
• Only 3 trees (7, 55 and 1) ever came first. 6 trees came second. 10 trees
came third. 13 and 16 trees came 4th and 5th respectively.
Most Frequent Tree Groups Attaining
the Top Three Ranks.
• Trees 7, 9 and 10
[Figure: decision trees 7, 9 and 10, built from sensors A, B and C]
All three decision trees have been generated from the same
boolean expression 00000111 representing F(000)F(001)…F(111).
Both Tree 9 and Tree 10 are ranked second and third more than
99% of the times when Tree 7 is ranked first.
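The boolean-expression notation can be unpacked with a few lines of code. The sketch below reads an expression as the values F(00…0) through F(11…1) in lexicographic order, as stated above. The helper names are illustrative.

```python
from itertools import product

def decode(expression):
    """Map each attribute string to its category, reading F(00...0) ... F(11...1)."""
    n = len(expression).bit_length() - 1  # 8 entries -> 3 attributes, 16 -> 4
    inputs = ["".join(bits) for bits in product("01", repeat=n)]
    return dict(zip(inputs, map(int, expression)))

def positives(expression):
    """Attribute strings assigned to category 1."""
    return [bits for bits, value in decode(expression).items() if value == 1]

print(positives("00000111"))
```

Read this way, 00000111 puts a container in category 1 when the first attribute is present together with at least one of the other two.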
Most Frequent Tree Groups Attaining
the Top Three Ranks
• Trees 55, 57 and 58
[Figure: decision trees 55, 57 and 58, built from sensors A, B and C]
The boolean expression for these three decision trees is 01111111.
Tree 57 is ranked second 96% of the time and tree 58 is ranked third
79% of the time when tree 55 is ranked first.
Most Frequent Tree Groups Attaining
the Top Three Ranks
• Trees 1, 3, and 2
[Figure: decision trees 1, 3 and 2, built from sensors A, B and C]
The boolean expression for these three decision trees is 00000001
Tree 3 is ranked second 98% of times and tree 2 is ranked third
80 % of the times when tree 1 is ranked first.
Values of CFalseNegative and CFalsePositive when
Tree 7 was Ranked First
• This is a graph of CFalsePositive against CFalseNegative values
obtained from the randomized experiments. The black dots
represent points at which tree 7 scored first rank.
Values of CFalseNegative and CFalsePositive when
Tree 55 was Ranked First
• Tree 55 fills up the lower area in the range of CFalseNegative
and CFalsePositive values.
Values of CFalseNegative and CFalsePositive when
Tree 1 was Ranked First
[Figure: scatter plot of CFalsePositive (0 to 900) against CFalseNegative (0 to 10 x 10^9)]
• Tree 1 fills up the major area in the range of CFalseNegative and
CFalsePositive.
Values of CFalseNegative and CFalsePositive for
the Three First Ranked Trees
• Trees 7, 55 and 1 fill up the entire area in the range of
CFalseNegative and CFalsePositive among themselves.
Values of CTot, CFalseNegative and CFalsePositive
for First Ranked Trees
• This graph shows total costs for trees 7, 55 and 1 in the
respective regions in which they were ranked first.
• Each tree’s total cost is a hyperplane which cuts other
hyperplanes as it gains and then loses first rank.
Values of CTot, CFalseNegative and CFalsePositive
for Trees 1, 7 and 55 (Even When They
Were not Ranked First).
This graph shows the extended CTot hyperplanes for trees
7, 55 and 1 for all regions.
Values of CFalseNegative and P(X=1) when
Tree 7 was Ranked First
• Tree 7 again fills up the major area in the range of CFalseNegative
and P(X=1).
Values of CFalseNegative and P(X=1) when
Tree 55 was Ranked First
• Tree 55 fills up the rest of the area in the range of CFalseNegative
and P(X=1).
Values of CFalseNegative and P(X=1) for First
Ranked Trees
• Together trees 7 and 55 fill up the entire region of CFalseNegative
and P(X=1).
Variations of CTot, CFalseNegative and P(X=1)
for First Ranked Trees
• This graph has CTot on the 3rd axis for trees 7 and 55 in the
respective regions in which they were optimal.
• Each tree’s total cost is a conic surface.
Values of CFalsePositive and P(X=1) When
Tree 7 was Ranked First
• Tree 7 fills up the major area in the range of CFalsePositive and
P(X=1).
Values of CFalsePositive and P(X=1) when Tree 55
was Ranked First
• Tree 55 fills up the lower area in the range of CFalsePositive and
P(X=1).
Values of CFalsePositive and P(X=1) when
Tree 1 was Ranked First
[Figure: scatter plot of P(X=1) (0 to 1 x 10^-5) against CFalsePositive (0 to 900)]
• Tree 1 fills up the major area in the range of CFalsePositive and
P(X=1).
Values of CFalsePositive and P(X=1) for First
Ranked Trees
• Trees 7, 55 and 1 fill up the entire area in the range of
CFalsePositive and P(X=1) among themselves.
Values of CTot, CFalsePositive and P(X=1) for
First Ranked Trees
• This graph shows total costs for trees 7, 55 and 1 in the
respective regions in which they were optimal.
• Each tree’s total cost is a hyperplane which cuts other
hyperplanes as it gains and then loses first rank.
Modeling Sensor Errors
•One Approach to Sensor Errors: Modeling
Sensor Operation
•Threshold Model:
–Sensors have different discriminating power
–Many use counts (e.g., Gamma radiation
counts)
–See if count exceeds
threshold
–If so, say attribute is present.
Modeling Sensor Errors
Threshold Model:
•Sensor i has discriminating power Ki,
threshold Ti
•Attribute present if counts exceed Ti
•Calculate fraction of objects in each category
whose readings exceed T
•Seek threshold values that minimize all costs:
inspection, false positive/negative
•Assume readings of category 0 containers
follow a Gaussian distribution and similarly
category 1 containers
•Simulation approach
Probability of Error for Individual
Sensors
• For ith sensor, the type 1 (P(Yi=1|X=0)) and type 2
(P(Yi=0|X=1)) errors are modeled using Gaussian
distributions.
– State of nature X=0 represents absence of a bomb.
– State of nature X=1 represents presence of a bomb.
– i represents the outcome (count) of sensor i.
– Σi is variance of the distributions
[Figure: characteristics of a typical sensor i: the distributions of readings for X=0 and X=1, separated by Ki, with threshold Ti]
Modeling Sensor Errors
The probability of false positive for the ith sensor is computed as:
P(Yi=1|X=0) = 0.5 erfc[Ti/√2]
The probability of detection for the ith sensor is computed as:
P(Yi=1|X=1) = 0.5 erfc[(Ti-Ki)/(Σi√2)]
erfc = complementary error function, erfc(x) = Γ(1/2, x²)/√π
The following experiments have been done using sensors A, B,
C and using:
KA = 4.37; ΣA = 1
KB = 2.9; ΣB = 1
KC = 4.6; ΣC = 1
We then varied the individual sensor thresholds TA, TB and TC
from -4.0 to +4.0 in steps of 0.4. These values were chosen since
they gave us an “ROC curve” (see later) for the individual sensors
over a complete range of P(Yi=1|X=0) and P(Yi=1|X=1).
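The two probabilities can be computed directly with the standard library's `math.erfc`. The sketch below uses the K and Σ values listed above and sweeps one sensor's threshold over the stated range. The function and variable names are illustrative.

```python
from math import erfc, sqrt

K = {"A": 4.37, "B": 2.9, "C": 4.6}      # discriminating power of each sensor
SIGMA = {"A": 1.0, "B": 1.0, "C": 1.0}   # spread parameter of each sensor

def p_false_positive(sensor, threshold):
    """P(Yi=1 | X=0) = 0.5 erfc[Ti / sqrt(2)]."""
    return 0.5 * erfc(threshold / sqrt(2))

def p_detection(sensor, threshold):
    """P(Yi=1 | X=1) = 0.5 erfc[(Ti - Ki) / (Sigma_i sqrt(2))]."""
    return 0.5 * erfc((threshold - K[sensor]) / (SIGMA[sensor] * sqrt(2)))

# Thresholds from -4.0 to +4.0 in steps of 0.4.
thresholds = [round(-4.0 + 0.4 * step, 1) for step in range(21)]

# (false positive, detection) pairs for sensor A, one per threshold.
roc_A = [(p_false_positive("A", t), p_detection("A", t)) for t in thresholds]
for t, (pf, pd) in zip(thresholds, roc_A):
    print(f"T={t:+.1f}  PF={pf:.4f}  PD={pd:.4f}")
```

Raising the threshold lowers both the false positive and the detection probability, which is what traces out the curve.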
Frequency of First Ranked Trees for
Variations in Sensor Thresholds
[Figure: bar chart of frequency of first rank (0 to 18000) against tree number (0 to 60)]
• 68,921 experiments were conducted, as each Ti was varied through its
entire range.
• The above graph has frequency counts of the number of experiments when
a particular tree was ranked first. There are 15 such trees. Tree 37 had the
highest frequency of attaining rank one.
Stroud Saeger Experiments: Our
Sensitivity Analysis: 4 Sensors
• Second set of computer experiments: 4
attributes or types of sensors, A, B, C, D.
• Same values as before.
• Experiment 1: Fix values of two of CFalseNegative,
CFalsePositive, P(X=1) and vary the third.
• Experiment 2: Fix a value of one of CFalseNegative,
CFalsePositive, P(X=1) and vary the other two
through their interval of possible values. Do
10,000 experiments each time.
• Look for the variation in the highest ranked tree.
Stroud Saeger Experiments: Our
Sensitivity Analysis: 4 Sensors
• Experiment 1: Fix values of two of
CFalseNegative, CFalsePositive, P(X=1) and vary the
third.
CTot vs CFalseNegative for Ranked 1 Trees
(Trees 11485(9651) and 10129(349))
Only two trees ever were ranked first, and one, tree 11485, was
ranked first in 9651 out of 10,000 runs.
CTot vs CFalsePositive for Ranked 1 Trees (Tree
no. 11485 (10000))
One tree, number 11485, was ranked first every time.
CTot vs P(X=1) for Ranked 1 Trees (Tree
no. 11485(8372), 10129(488), 11521(1056))
Three trees dominated first place. Trees 10201 (60), 10225 (17) and
10153 (7) also achieved first rank but with relatively low frequency.
Tree Structure and corresponding
Boolean Expressions
[Figure: two decision trees built from sensors a, b, c and d]
Tree number 11485, Boolean Expr: 0101011101111111
Tree number 10129, Boolean Expr: 0001011101111111
Stroud Saeger Experiments: Our
Sensitivity Analysis: 4 Sensors
• Experiment 2: Fix the values of one of
CFalseNegative, CFalsePositive, P(X=1) and vary the
others.
Frequency of First Ranked Trees when Two
Parameters (CFalseNegative and CFalsePositive) were Varied
Keeping P(X=1) Constant at Randomly Selected
Values.
[Figure: frequency of first rank (0 to 2 x 10^5) against tree number (0 to 12000)]
Trees coming first: 9541, 10129, 10153, 10201, 11485, 11521
• 10,000 randomized experiments with randomly selected values of P(X=1)
• The experiments were repeated for 20 different randomly selected values of
P(X=1)
Frequency of First Ranked Trees when Two
Parameters (CFalseNegative and P(X=1)) were Varied
Keeping CFalsePositive Constant at Randomly Selected
Values.
[Figure: frequency of first rank (0 to 14 x 10^4) against tree number (0 to 12000)]
Trees coming first: 505, 4695, 5105, 5129, 7353, 9541, 10129, 10153, 10201, 10225, 11485, 11521
• 10,000 randomized experiments with randomly selected values of CFalsePositive
• The experiments were repeated for 20 different randomly selected values of
CFalsePositive
Frequency of First Ranked Trees when Two
Parameters (P(X=1) and CFalsePositive) were Varied
Keeping CFalseNegative Constant at Randomly
Selected Values.
[Figure: frequency of first rank (0 to 15 x 10^4) against tree number (0.95 x 10^4 to 1.2 x 10^4)]
Trees coming first: 9541, 10129, 10153, 10201, 10225, 11485, 11521
• 10,000 randomized experiments with randomly selected values of CFalseNegative
• The experiments were repeated for 20 different randomly selected values of
CFalseNegative
Variation of CTot wrt CFalseNegative and CFalsePositive,
for Tree Ranked First (Tree nos. 11485 and
10129)
[Figure: surface plot of CTot, with the regions for trees 11485 and 10129 labeled]
CTot = CFalsePositive *P(X=0)*P(Y=1|X=0) + CFalseNegative *P(X=1)*P(Y=0|X=1) + Cutil
Variation of CTot wrt CFalseNegative and P(X=1),
for Tree Ranked First(Tree no.
11485(8121),10129(728) and 11521(984))
[Figure: surface plot of CTot, with the regions for trees 11485 and 10129 labeled]
Trees 505, 5105, 5129, 9541, 10153, 10201 and 10225 also attained rank 1, but
with very low frequency (<100).
CTot = CFalsePositive *P(X=0)*P(Y=1|X=0) + CFalseNegative *P(X=1)*P(Y=0|X=1) + Cutil
Variation of CTot wrt CFalsePositive and P(X=1),
for Tree Ranked First(Tree no.
11485(7162),10129(1690) and 11521(851))
[Figure: surface plot of CTot, with the regions for trees 11485 and 10129 labeled]
Trees 10153, 10201 and 10225 also attained first rank, 80, 195 and 22 times
respectively.
CTot = CFalsePositive *P(X=0)*P(Y=1|X=0) + CFalseNegative *P(X=1)*P(Y=0|X=1) + Cutil
Receiver Operating Characteristic
(ROC) Curve
• The ROC curve is the plot of the probability
of correct detection (PD) vs. the probability
of false positive (PF).
• The ROC curve is used to select an operating
point, which provides the tradeoff between
PD and PF.
• Each sensor has a ROC curve, and the
combination of the sensors into a decision
tree has a composite ROC curve.
• The parameter which is varied to get
different operating points on the ROC curve
is the sensor threshold, or a combination of
thresholds for the decision tree.
• Equal Error Rate (EER) is the operating
point on the ROC curve where PF = 1 – PD.
• We can use ROC curves to identify optimal
thresholds for sensors.
[Figure: ROC curve of PD against PF, each from 0 to 1, with the operating point and EER marked; below it, the sensor reading distributions for X=0 and X=1, separated by Ki, with threshold Ti]
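For a single sensor under the Gaussian threshold model used earlier in the talk, the equal error rate can be found in closed form. The sketch below assumes the X=0 readings have unit spread and the X=1 readings have spread sigma, as in the earlier erfc formulas, so that PF = 1 – PD at the threshold T = K/(1 + sigma). The function name is illustrative.

```python
from math import erfc, sqrt

def equal_error_rate(k, sigma=1.0):
    """EER of one sensor: PF = 0.5 erfc(T/sqrt(2)) evaluated where PF = 1 - PD."""
    threshold = k / (1.0 + sigma)  # solves T = (k - T) / sigma
    return 0.5 * erfc(threshold / sqrt(2))

# Discriminating powers quoted earlier for sensors A, B and C.
for name, k in (("A", 4.37), ("B", 2.9), ("C", 4.6)):
    print(name, round(equal_error_rate(k), 4))
```

Under these assumptions the results come out close to the sensor EERs quoted for A, B and C later in the talk.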
Receiver Operating Characteristic
(ROC) Curve
• We seek operating
characteristics of sensors that
place us in the upper left hand
corner of the ROC curve.
• Here, PF is small and PD is
large.
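[Figure: ROC curve of PD against PF, each from 0 to 1, with the operating point and EER marked; below it, the sensor reading distributions for X=0 and X=1, separated by Ki, with threshold Ti]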
Performance of Sensors Against that of Tree 37
(Most Frequent Tree Attaining Rank 1)
• The black, blue and red dotted lines represent performance
characteristics (ROC curve) of sensors A, B and C.
• The green dots represent the performance characteristics
(P(Y=1|X=0), P(Y=1|X=1)) of the tree over all combinations
of sensor thresholds (Ti).
Performance of Sensors Against that of
Tree 37
• This zoomed-in figure of the ROC curve displays the region of high detection
probabilities and low false positive probabilities.
• Points lying on the diagonal line are the Equal Error Rates for this tree and
the sensors. The tree achieves an equal error rate of 0.0027, while sensors A, B
and C have EERs of 0.0145, 0.0738 and 0.0107.
Best Possible ROC Curve for Tree 37
• Assuming performance probabilities (P(Y=1|X=1) and P(Y=1|X=0)) to be
monotonically related (in the sense that P(Y=1|X=1) can be called a monotonic
function of P(Y=1|X=0)), we can find an ROC curve for the tree consisting of
the set containing maximum P(Y=1|X=1) value corresponding to given
P(Y=1|X=0) value.
• The blue dots represent such an ROC curve, the “best” ROC curve for tree 37.
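One way to extract such a "best" curve from a cloud of (P(Y=1|X=0), P(Y=1|X=1)) points is sketched below. It keeps a point only if its detection probability exceeds that of every point with an equal or lower false positive probability, so the result is monotone. This is an illustrative reading of the construction described above, not necessarily the exact procedure used.

```python
def best_roc(points):
    """Return the upper envelope of (PF, PD) points as a monotone ROC curve."""
    frontier = []
    best_pd = -1.0
    # Sort by PF ascending; for ties, put the highest PD first.
    for pf, pd in sorted(points, key=lambda p: (p[0], -p[1])):
        if pd > best_pd:
            frontier.append((pf, pd))
            best_pd = pd
    return frontier

# Hypothetical operating points, for illustration only.
sample = [(0.1, 0.5), (0.1, 0.7), (0.2, 0.6), (0.3, 0.9), (0.05, 0.2)]
print(best_roc(sample))
```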
Conclusions from Sensitivity
Analysis
• Considerable lack of sensitivity to
modification in parameters for trees using 3 or
4 sensors.
• Very few optimal trees.
• Very few boolean functions arise among
optimal trees.
Some Complications
•More complicated cost models; bringing in
costs of delays
•More than two values of an attribute
(present, absent, present with
probability > 75%, absent with probability
at least 75%)
(ok, not ok, ok with probability > 99%,
ok with probability between 95% and
99%)
•Inferring the boolean function from
observations (partially defined boolean
functions)
Some Research Challenges
•Explain why conclusions are so insensitive to
variation in parameter values.
•Explore the structure of the optimal trees and
compare the different optimal trees.
•Develop less brute force methods for finding
optimal trees that might work if there are more
than 4 attributes.
•Develop methods for
approximating the optimal tree.
Pallet VACIS
Closing Remark
•Recall that the “cost” of inspection includes the
cost of failure, including failure to foil a terrorist
plot.
•There are many ways to lower the total “cost”
of inspection:
–Use more efficient orders of inspection.
–Find ways to inspect more containers.
–Find ways to cut down on delays at inspection lanes.
Research Team
• Saket Anand, Rutgers, ECE graduate student
• Endre Boros, Rutgers, Operations Research
• Elsayed Elsayed, Rutgers, Ind. & Systems Engineering
• Liliya Fedzhora, Rutgers, Operations Res. grad. student
• Paul Kantor, Rutgers, Schl. of Infor. & Library Studies
• Abdullah Karaman, Rutgers, Ind. & Syst. Eng. grad. student
• Alex Kogan, Rutgers, Business School
• Paul Lioy, Rutgers/UMDNJ, Environmental and Occupational Health Sciences Institute
• David Madigan, Rutgers, Statistics
• Richard Mammone, Rutgers, Center for Advanced Information Processing
• S. Muthukrishnan, Rutgers, Computer Science
• Saumitr Pathak, Rutgers, ECE graduate student
• Richard Picard, Los Alamos, Statistical Sciences Group
• Fred Roberts, Rutgers, DIMACS Center
• Kevin Saeger, Los Alamos, Homeland Security
• Phillip Stroud, Los Alamos, Systems Engineering and Integration Group
• Hao Zhang, Rutgers, Ind. & Systems Eng. graduate student
Collaborators on Sensitivity Analysis:
• Saket Anand
• David Madigan
• Richard Mammone
• Saumitr Pathak
Research Support:
• Office of Naval Research
• National Science Foundation
Los Alamos National Laboratory:
• Rick Picard
• Kevin Saeger
• Phil Stroud