Algorithms for Port of Entry Inspection: Finding Optimal Binary Decision Trees

Fred S. Roberts
Rutgers University
Port of Entry Inspection Algorithms
•Goal: Find ways to intercept illicit
nuclear materials and weapons
destined for the U.S. via the
maritime transportation system
•Goal: “inspect all containers arriving at ports”
•Even carefully inspecting 8% of containers in the
Port of NY/NJ might bring international trade to
a halt (Larrabee 2002)
Port of Entry Inspection Algorithms
•Aim: Develop decision support algorithms that
will help us to “optimally” intercept illicit
materials and weapons subject to limits on delays,
manpower, and equipment
•Find inspection schemes that minimize total
“cost” including “cost” of false alarms (“false
positives”) and failed alarms (“false negatives”)
Mobile VACIS: truck-mounted gamma ray imaging system
Port of Entry Inspection Algorithms
•My work on port of entry inspection has gotten
me and my students to some remarkable places.
Me on a Coast Guard boat on a tour of the harbor in Philadelphia.
Thanks to Capt. David Scott, Captain of the Port, for taking us on the tour.
The work on port inspection, together with other
work, has led to a new DHS center based at Rutgers:
CCICADA, founded in 2009 as a DHS University Center of
Excellence.
CCICADA has a wide variety of workshops,
tutorials, and programs for students and
faculty that emphasize the mathematical
sciences and homeland security.
For more information: http://ccicada.org
Sequential Decision Making Problem
• Stream of containers arrives at a port
• The Decision Maker’s Problem:
• Which to inspect?
• Which inspections next based on previous
results?
• Approach:
– “decision logics” – Boolean methods
– combinatorial optimization methods
– Builds on ideas of Stroud
and Saeger at Los Alamos
National Laboratory
– Need for new models
and methods
Sequential Diagnosis Problem
•Such sequential diagnosis problems arise in many
areas:
–Communication networks (testing connectivity, paging
cellular customers, sequencing tasks, …)
–Manufacturing (testing machines, fault diagnosis,
routing customer service calls, …)
–Medicine (diagnosing patients, sequencing treatments,
…)
Sequential Decision Making Problem
•Containers arriving to be classified into categories.
•Simple case: 0 = “ok”, 1 = “suspicious”
•Inspection scheme: specifies which inspections are
to be made based on previous observations
Sequential Decision Making Problem
For Container Inspection
•Containers have attributes, each
in a number of states
•Sample attributes:
–Levels of certain kinds of chemicals or
biological materials
–Whether or not there are items of a certain
kind in the cargo list
–Whether cargo was picked up in a certain port
Sequential Decision Making Problem
•Currently used attributes:
–Does ship’s manifest set off an “alarm”?
–What is the neutron or Gamma emission
count? Is it above threshold?
–Does a radiograph image come up positive?
–Does an induced fission test come up positive?
Gamma ray detector
Sequential Decision Making Problem
•We can imagine many other attributes
• The project I have worked on is concerned with
general algorithmic approaches.
•We seek a methodology not tied to today’s
technology.
•Detectors are evolving quickly.
Sequential Decision Making Problem
•Simplest Case: Attributes are in state 0 or 1
(absent or present)
•Then: Container is a bit string like 011001
•So: Classification is a decision function F that
assigns each bit string to a category.
011001
F(011001)
If attributes 2, 3, and 6 are present, assign container to
category F(011001).
Sequential Decision Making Problem
•If there are two categories, 0 and 1 (“safe” or
“suspicious”), the decision function F is a
Boolean function.
Example:
F(000) = F(111) = 1, F(abc) = 0 otherwise
This classifies a container as positive iff it has
none of the attributes or all of them.
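To make the decision-function idea concrete, here is a minimal Python sketch (not from the talk) of the two-category example above; the function name and encoding are illustrative assumptions.

```python
# Minimal sketch (not from the talk): a decision function F over 3 binary
# attributes, matching the example F(000) = F(111) = 1, F(abc) = 0 otherwise.

def F(bits: str) -> int:
    """Classify a container described by a bit string of attribute states."""
    return 1 if bits in ("000", "111") else 0

print(F("000"), F("111"), F("011"))  # 1 1 0
```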
Sequential Decision Making Problem
•What if there are three categories, 0, ½, and 1?
Example:
F(000) = 0, F(111) = 1, F(abc) = 1/2 otherwise
This classifies a container as positive if it has all of
the attributes, negative if it has none of the
attributes, and uncertain if it has some but not all
of the attributes.
•I won’t discuss this case.
Sequential Decision Making Problem
•Given a container, test its attributes until we know
enough to calculate the value of F.
•An inspection scheme tells us in which order to
test the attributes to minimize cost.
•Even this simplified problem is hard
computationally.
Sequential Decision Making Problem
•This assumes F is known.
•Simplifying assumption: Attributes are
independent.
•At any point we stop inspecting and output the
value of F based on outcomes of inspections so
far.
•Complications: There may be precedence relations among
the components (e.g., can't test attribute a4 before
testing a6).
•Or: cost may depend on attributes tested before.
•F may depend on variables that cannot be
directly tested or for which tests are too costly.
Sequential Decision Making Problem
•Such problems are hard computationally.
•There are many possible Boolean functions F.
•Even if F is fixed, problem of finding a good
classification scheme (to be defined precisely below)
is NP-complete.
•Several classes of Boolean functions F allow for
efficient inspection schemes:
- k-out-of-n systems
- Certain series-parallel systems
- Read-once systems
- “regular” systems
- Horn systems
Sensors and Inspection Lanes
•n types of sensors measure presence or absence of the n
attributes.
•Many copies of each sensor.
•Complication: different characteristics of sensors.
•Entities come for inspection.
•Which sensor of a given type to
use?
•Think of inspection lanes and
waiting on line for inspection
•Besides efficient inspection
schemes, could decrease costs by:
–Buying more sensors
–Changing the allocation of containers to sensor lanes.
Binary Decision Tree Approach
•Sensors measure presence/absence of attributes:
so 0 or 1
•Use two categories: 0, 1 (safe or suspicious)
•Binary Decision Tree:
–Nodes are sensors or categories
–Two arcs exit from each sensor node, labeled
left and right.
–Take the right arc when sensor says the
attribute is present, left arc otherwise
Binary Decision Tree Approach
•Reach category 1 from the
root only through the path
a0 to a1 to 1.
•Container is classified in
category 1 iff it has both
attributes a0 and a1 .
•Corresponding Boolean
function:
• F(11) = 1, F(10) = F(01)
= F(00) = 0.
Figure 1
Binary Decision Tree Approach
•Reach category 1 from the
root only through the path a1
to a0 to 1.
•Container is classified in
category 1 iff it has both
attributes a0 and a1 .
•Corresponding Boolean function:
• F(11) = 1, F(10) = F(01) = F(00)
= 0.
•Note: Different tree, same
function
Figure 1
Binary Decision Tree Approach
•Reach category 1 from the
root only through the path a0
to 1 or a0 to a1 to 1.
•Container is classified in
category 1 iff it has attribute
a0 or attribute a1 .
•Corresponding Boolean function:
• F(11) = 1, F(10) = F(01) = 1,
F(00) = 0.
Figure 1
Binary Decision Tree Approach
•Reach category 1 from the root by:
a0 L, a1 R, a2 R, 1; or
a0 R, a2 R, 1.
•Container classified in category 1 iff it has
a1 and a2 but not a0, or it has a0 and a2
(with or without a1).
•Corresponding Boolean
function:
• F(111) = F(101) = F(011) = 1,
F(abc) = 0 otherwise.
Figure 2
Binary Decision Tree Approach
•This binary decision
tree corresponds to the
same Boolean function
F(111) = F(101) =
F(011) = 1, F(abc) = 0
otherwise.
However, it has one less
observation node ai. So,
it is more efficient if all
observations are equally
costly and equally likely.
Figure 3
Binary Decision Tree Approach
•So we have seen that a given Boolean function
may correspond to different binary decision trees.
•How do we find a low-cost or least-cost binary
decision tree corresponding to a Boolean function?
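As an illustration of this point, here is a hedged Python sketch (the tuple encoding is my assumption, not the talk's notation) showing two different binary decision trees that compute the same Boolean function F(x0 x1) = 1 iff both attributes are present.

```python
# Hedged sketch: a tree is a leaf category 0/1 or a tuple
# (attribute_index, left, right), where the left branch is taken when the
# attribute is absent and the right branch when it is present.

def classify(tree, bits: str) -> int:
    """Walk the tree, branching on the tested attribute at each node."""
    while not isinstance(tree, int):
        attribute, left, right = tree
        tree = right if bits[attribute] == "1" else left
    return tree

# The tree of Figure 1: test a0 first, then a1; category 1 only via a0 -> a1 -> 1.
t1 = (0, 0, (1, 0, 1))
# A different tree (attributes tested in the other order) computing the same F.
t2 = (1, 0, (0, 0, 1))

for x in ("00", "01", "10", "11"):
    assert classify(t1, x) == classify(t2, x)
```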
Binary Decision Tree Approach
•Even if the Boolean function F is fixed, the
problem of finding the “least cost” binary
decision tree for it is very hard (NP-complete).
•For small n = number of attributes, can try to
solve it by trying all possible binary decision trees
corresponding to the Boolean function F.
Port of Long Beach
•Even for n = 4, not practical. (n = 4 at Port of
Long Beach-Los Angeles)
Binary Decision Tree Approach
Promising Approaches:
•Heuristic algorithms, approximations to optimal.
•Special assumptions about the Boolean function F.
•For “monotone” Boolean functions, integer
programming formulations give promising
heuristics.
•Stroud and Saeger (Los Alamos
National Lab) enumerate all
“complete, monotone” Boolean functions
and calculate the least expensive
corresponding binary decision trees.
•Their method is practical for n up to 4, but not for n = 5.
Binary Decision Tree Approach
Monotone Boolean Functions:
•Given two bit strings x1x2…xn, y1y2…yn
•Suppose that xi ≥ yi for all i implies that
F(x1x2…xn) ≥ F(y1y2…yn).
•Then we say that F is monotone.
•Then 11…1 has the highest probability of being in
category 1.
Binary Decision Tree Approach
Monotone Boolean Functions:
•Given two bit strings x1x2…xn, y1y2…yn
•Suppose that xi ≥ yi for all i implies that
F(x1x2…xn) ≥ F(y1y2…yn).
•Then we say that F is monotone.
•Example:
•n = 4, F(x) = 1 iff x has at least two 1’s.
•F(1100) = F(0101) = F(1011) = 1, F(1000) = 0,
etc.
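A brute-force monotonicity check over all bit strings is easy to write for small n; this hedged Python sketch (my own helper, not the Stroud-Saeger enumeration) verifies the n = 4 example above.

```python
# Illustrative sketch: check monotonicity of F(x) = 1 iff x has at least two 1's.

from itertools import product

def F(x):
    return 1 if sum(x) >= 2 else 0

def is_monotone(F, n):
    """F is monotone if x >= y componentwise implies F(x) >= F(y)."""
    points = list(product((0, 1), repeat=n))
    return all(F(x) >= F(y)
               for x in points for y in points
               if all(xi >= yi for xi, yi in zip(x, y)))

print(is_monotone(F, 4))  # True
```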
Binary Decision Tree Approach
Incomplete Boolean Functions:
•A Boolean function F is incomplete if its value can be
calculated by examining at most n-1 of the attributes,
i.e., knowing the value of the input string on those
attributes suffices.
•Example: F(111) = F(110) = F(101) = F(100) =
1, F(000) = F(001) = F(010) = F(011) = 0.
•F(abc) is determined without knowing b (or c).
•F is incomplete.
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•Stroud and Saeger: algorithm for enumerating
binary decision trees implementing complete,
monotone Boolean functions.
•Feasible to implement up to n = 4.
•Then you can find least cost tree by enumerating
all binary decision trees corresponding to a given
complete, monotone Boolean function and
repeating this for all complete, monotone Boolean
functions.
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•Stroud and Saeger: algorithm for enumerating
binary decision trees implementing complete,
monotone Boolean functions.
•n = 2:
–There are 6 monotone Boolean functions.
–Only 2 of them are complete, monotone
–There are 4 binary decision trees for
calculating these 2 complete, monotone
Boolean functions.
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•n = 3:
–9 complete, monotone Boolean functions.
–60 distinct binary trees for calculating them
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•n = 4:
–114 complete, monotone Boolean functions.
–11,808 distinct binary decision trees for
calculating them.
–(Compare 1,079,779,602 BDTs for all Boolean
functions)
Binary Decision Tree Approach
Complete, Monotone Boolean Functions:
•n = 5:
–6894 complete, monotone Boolean functions
–263,515,920 corresponding binary decision
trees.
•Combinatorial explosion!
•Need alternative approaches; enumeration not
feasible!
•(Even worse: compare 5 × 10^18 BDTs
corresponding to all Boolean functions)
Cost Functions
•So far, we have figured one binary decision tree
is cheaper than another if it has fewer nodes.
•This is oversimplified.
•There are more complex costs involved than
number of sensors in a tree.
Cost Functions
•Stroud-Saeger method applies to more
sophisticated cost models, not just cost =
number of sensors in the BDT.
•Using a sensor has a cost:
–Unit cost of inspecting one item with it
–Fixed cost of purchasing and deploying it
–Delay cost from queuing up at the sensor
station
•Preliminary problem: disregard fixed and delay
costs. Minimize unit costs.
Cost Functions: Delay Costs
•Tradeoff between fixed costs and delay costs:
adding more sensors cuts down on delays.
•More sophisticated models describe the process
of containers arriving
•There are differing delay times for inspections
•Use “queuing theory” to find average delay
times under different models
Cost Functions
•Unit Cost Complication: How many nodes of
the decision tree are actually visited during
average container’s inspection? Depends on
“distribution” of containers.
•Answer can also depend on probability of
sensor errors and probability of “bomb” in a
container.
Cost Functions:
Unit Costs
Tree Utilization
•In our early models, we assume we are given
probability of sensor errors and probability of
bomb in a container.
•This allows us to calculate “expected” cost of
utilization of the tree Cutil.
Cost Functions
OTHER COSTS:
•Cost of false positive: Cost of additional
tests.
–If it means opening the container, it’s
expensive.
•Cost of false negative:
–Complex issue.
–What is cost of a bomb going off in
Manhattan?
Cost Functions: Sensor Errors
•One Approach to False Positives/Negatives:
Assume there can be Sensor Errors
•Simplest model: assume that all sensors checking
for attribute ai have same fixed probability of
saying ai is 0 if in fact it is 1, and similarly
saying it is 1 if in fact it is 0.
•More sophisticated analysis later describes a
model for determining probabilities of sensor
errors.
•Notation: X = state of nature (bomb or no bomb);
Y = outcome (of a sensor or of the entire inspection process).
Probability of Error for the Entire Tree
Example tree: sensor A at the root. If A reports the attribute absent, output
category 0; if A reports it present, go to B. If B reports it present, output 1;
otherwise go to C, which outputs 1 if it reports the attribute present and 0 otherwise.
[Figure: the tree shown twice, under state of nature X = 0 (absence of a bomb)
and X = 1 (presence of a bomb)]
Probability of false positive (P(Y=1|X=0)) for this tree:
P(Y=1|X=0) = P(YA=1|X=0) P(YB=1|X=0) + P(YA=1|X=0) P(YB=0|X=0) P(YC=1|X=0)
Probability of false negative (P(Y=0|X=1)) for this tree:
P(Y=0|X=1) = P(YA=0|X=1) + P(YA=1|X=1) P(YB=0|X=1) P(YC=0|X=1)
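Under the slides' assumption that sensor outcomes are conditionally independent given X, the tree-level error probabilities can be computed by a simple recursion. Here is a hedged Python sketch for the example tree above; the encoding and helper name are mine, and the per-sensor probabilities are illustrative values in the spirit of the Stroud-Saeger experiments described later.

```python
# Hedged sketch (assumption: sensor readings are conditionally independent
# given the state of nature X). A tree is a leaf category 0/1 or a tuple
# (sensor, subtree_if_absent, subtree_if_present).

def p_says_one(tree, p1):
    """Probability the inspection outputs category 1, given P(Y_i = 1) per sensor."""
    if tree in (0, 1):
        return float(tree)
    sensor, left, right = tree
    return (1 - p1[sensor]) * p_says_one(left, p1) + p1[sensor] * p_says_one(right, p1)

# The tree of the slide: A at the root; A "present" leads to B; B "absent" leads to C.
tree = ("A", 0, ("B", ("C", 0, 1), 1))

p_false_positive = p_says_one(tree, {"A": 0.10, "B": 0.01, "C": 0.001})     # uses P(Yi=1|X=0)
p_false_negative = 1 - p_says_one(tree, {"A": 0.90, "B": 0.99, "C": 0.999})  # uses P(Yi=1|X=1)
print(p_false_positive, p_false_negative)
```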
Cost Function used for Evaluating
the Decision Trees.
CTot = CFalsePositive *PFalsePositive + CFalseNegative *PFalseNegative +
Cutil
CFalsePositive is the cost of false positive (Type I error)
CFalseNegative is the cost of false negative (Type II error)
PFalsePositive is the probability of a false positive occurring
PFalseNegative is the probability of a false negative occurring
Cutil is the expected cost of utilization of the tree.
Cost Function used for Evaluating
the Decision Trees.
CFalsePositive is the cost of false positive (Type I error)
CFalseNegative is the cost of false negative (Type II error)
PFalsePositive is the probability of a false positive occurring
PFalseNegative is the probability of a false negative occurring
Cutil is the expected cost of utilization of the tree.
PFalsePositive and PFalseNegative are calculated from the tree.
Cutil is calculated from tree and probabilities of bomb in
container and probability of sensor errors.
CFalsePositive, CFalseNegative are input – given information.
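A one-line worked version of this cost function in Python; the numeric inputs below are placeholders, not values from the talk.

```python
# Minimal sketch of the total-cost evaluation described above.

def total_cost(c_fp, c_fn, p_fp, p_fn, c_util):
    """C_Tot = C_FalsePositive*P_FalsePositive + C_FalseNegative*P_FalseNegative + C_util."""
    return c_fp * p_fp + c_fn * p_fn + c_util

# Placeholder inputs: P_fp, P_fn come from the tree (previous sketch);
# C_fp, C_fn are given; C_util is the expected inspection cost of the tree.
print(total_cost(c_fp=500.0, c_fn=1e9, p_fp=0.02, p_fn=1e-4, c_util=5.0))
```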
Stroud Saeger Experiments
• Stroud-Saeger ranked all trees formed
from 3 or 4 sensors A, B, C and D
according to increasing tree costs.
• Used cost function defined above.
• Values used in their experiments:
– CA = .25; P(YA=1|X=1) = .90; P(YA=1|X=0) = .10;
– CB = 10; P(YB=1|X=1) = .99; P(YB=1|X=0) = .01;
– CC = 30; P(YC=1|X=1) = .999; P(YC=1|X=0) = .001;
– CD = 1; P(YD=1|X=1) = .95; P(YD=1|X=0) = .05;
– Here, Ci = unit cost of utilization of sensor i.
• Also fixed were: CFalseNegative, CFalsePositive, P(X=1)
Sensitivity Analysis
• When parameters in a model are not known
exactly, the results of a mathematical analysis
can change depending on the values of the
parameters.
• It is important to do a sensitivity analysis: let the
parameter values vary and see if the results
change.
• So, do the least cost trees change if we change
values like probability of a bomb, cost of a false
positive, etc?
Stroud Saeger Experiments: Our
Sensitivity Analysis
• We have explored the sensitivity of the Stroud-Saeger
conclusions to variations in the values of three parameters:
CFalseNegative, CFalsePositive, P(X=1)
• Extensive computer experimentation.
• Fascinating results.
• To start, we estimated
high and low values
for the parameters.
Stroud Saeger Experiments: Our
Sensitivity Analysis
– CFalseNegative was varied between 25 million and 10
billion dollars
• Low and high estimates of direct and indirect costs
incurred due to a false negative.
– CFalsePositive was varied between $180 and $720
• Cost incurred due to a false positive:
(4 men × (3–6 hrs) × ($15–30/hr))
– P(X=1) was varied between 1/10,000,000 and
1/100,000
Stroud Saeger Experiments: Our
Sensitivity Analysis
• n = 3 (use sensors A, B, C)
• Varied the parameters CFalseNegative, CFalsePositive, P(X=1)
• We chose the value of one of these parameters from its interval of values
• Then explored the highest-ranked tree as the other two parameters were chosen at random in their intervals of values
• 10,000 experiments for each fixed value
• We looked for the variation in the top-ranked tree and how the top rank related to the choice of parameter values
• Very surprising results
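The experimental loop can be sketched in Python as follows; this is hedged, not the authors' code: `cost_of_tree` and the candidate tree list are placeholders standing in for the C_Tot evaluation over the 60 three-sensor trees, while the parameter ranges are the ones given above.

```python
# Hedged sketch of the randomized sensitivity experiments.

import random

C_FN_RANGE = (25e6, 10e9)     # dollars, per the slides
C_FP_RANGE = (180.0, 720.0)   # dollars, per the slides
P_BOMB_RANGE = (1e-7, 1e-5)   # P(X=1), per the slides

def run_experiments(trees, cost_of_tree, n_trials=10_000, fixed_p_bomb=None):
    """Count how often each tree comes out cheapest when parameters are drawn at random.
    Trees must be hashable (e.g., the tuple encoding used in earlier sketches)."""
    wins = {}
    for _ in range(n_trials):
        c_fn = random.uniform(*C_FN_RANGE)
        c_fp = random.uniform(*C_FP_RANGE)
        p_bomb = fixed_p_bomb if fixed_p_bomb is not None else random.uniform(*P_BOMB_RANGE)
        best = min(trees, key=lambda t: cost_of_tree(t, c_fn, c_fp, p_bomb))
        wins[best] = wins.get(best, 0) + 1
    return wins
```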
Frequency of Top-ranked Trees when
CFalseNegative and CFalsePositive are Varied
[Figure: bar chart of frequency vs. tree number, showing how often each tree was ranked 1st through 5th]
• 10,000 randomized experiments (randomly selected values of CFalseNegative and
CFalsePositive from the specified range of values) for the median value of P(X=1).
• The above graph has frequency counts of the number of experiments when a
particular tree was ranked first or second or third and so on.
• Only three trees (7, 55 and 1) ever came first. 6 trees came second,
10 came third, 13 came fourth.
Frequency of Top-ranked Trees when
CFalseNegative and P(X=1) are Varied
[Figure: bar chart of frequency vs. tree number, showing how often each tree was ranked 1st through 5th]
• 10,000 randomized experiments for the median value of CFalsePositive.
• Only 2 trees (7 and 55) ever came first. 4 trees came second. 7
trees came third. 10 and 13 trees came 4th and 5th respectively.
Frequency of Top-ranked Trees when
P(X=1) and CFalsePositive are Varied
[Figure: bar chart of frequency vs. tree number, showing how often each tree was ranked 1st through 5th]
• 10,000 randomized experiments for the median value of CFalseNegative.
• Only 3 trees (7, 55 and 1) ever came first. 6 trees came second. 10
trees came third. 13 and 16 trees came 4th and 5th respectively.
Most Frequent Tree Groups Attaining
the Top Three Ranks.
• Trees 7, 9 and 10
[Figure: the three decision trees 7, 9 and 10, built from sensors A, B, C]
All three decision trees have been generated from the same
Boolean function: 00000111, representing F(000)F(001)…F(111).
Both Tree 9 and Tree 10 are ranked second and third more than
99% of the time when Tree 7 is ranked first.
Most Frequent Tree Groups Attaining
the Top Three Ranks
• Trees 55, 57 and 58
[Figure: the three decision trees 55, 57 and 58]
All three trees correspond to the same Boolean function: 01111111.
Tree 57 is ranked second 96% of the time and tree 58 third
79% of the time when tree 55 is ranked first.
Most Frequent Tree Groups Attaining
the Top Three Ranks
• Trees 1, 3, and 2
[Figure: the three decision trees 1, 3 and 2]
All three trees correspond to the same Boolean function: 00000001.
Tree 3 is ranked second 98% of the time and tree 2 is ranked third
80% of the time when tree 1 is ranked first.
Most Frequent Tree Groups Attaining
the Top Three Ranks
• Challenge: Why so few trees?
• Why these trees?
• Why so few Boolean functions?
• Why these Boolean functions?
Stroud Saeger Experiments:
Sensitivity Analysis: 4 Sensors
• Second set of computer experiments: n = 4
(use sensors, A, B, C, D).
• Same values as before.
• Experiment 1: Fix values of two of CFalseNegative,
CFalsePositive, P(X=1) and vary the third through
their interval of possible values.
• Experiment 2: Fix a value of one of CFalseNegative,
CFalsePositive, P(X=1) and vary the other two.
• Do 10,000 experiments each time.
• Look for the variation in the highest ranked tree.
Stroud Saeger Experiments: Our
Sensitivity Analysis: 4 Sensors
• Experiment 1: Fix values of two of
CFalseNegative, CFalsePositive, P(X=1) and vary the
third.
CTot vs CFalseNegative for Ranked 1 Trees
(Trees 11485(9651) and 10129(349))
Only two trees ever were ranked first, and one, tree 11485, was
ranked first in 9651 out of 10,000 runs.
CTot vs CFalsePositive for Ranked 1 Trees (Tree
no. 11485 (10000))
One tree, number 11485, was ranked first every time.
CTot vs P(X=1) for Ranked 1 Trees (Tree
no. 11485(8372), 10129(488), 11521(1056))
Three trees dominated first place. Trees 10201(60), 10225(17) and
10153(7) also achieved first rank but with relatively low frequency.
Tree Structure For Top Trees
[Figure: the two top-ranked trees, drawn with sensors a, b, c, d]
Tree number 11485 — Boolean Expr: 0101011101111111
Tree number 10129 — Boolean Expr: 0001011101111111
Note how close the Boolean expressions are.
Most Frequent Tree Groups Attaining
the Top Three Ranks
• Same challenge as before: Why so few trees?
• Why these trees?
• Why so few Boolean functions?
• Why these Boolean functions?
Stroud Saeger Experiments: Our
Sensitivity Analysis: 4 Sensors
• Experiment 2: Fix the values of one of
CFalseNegative, CFalsePositive, P(X=1) and vary the
others.
Stroud Saeger Experiments: Our
Sensitivity Analysis: 4 Sensors
• Experiment 2: Fix the values of one of
CFalseNegative, CFalsePositive, P(X=1) and vary
the others.
• Similar
results
Conclusions from Sensitivity
Analysis
• Considerable lack of sensitivity to
modification in parameters for trees using 3 or
4 sensors.
• Very few optimal trees.
• Very few Boolean functions arise among
optimal and near-optimal trees.
• Surprising results.
New Idea: Searching through a
Generalized Tree Space
• Sometimes adding more possibilities results in being
able to do more efficient searches.
• We expand the space of trees from those
corresponding to Stroud and Saeger’s “Complete
and Monotonic” Boolean Functions to “Complete
and Monotonic” BDTs.
• Advantages:
– Unlike Boolean functions, BDTs may not have to consider
all sensor inputs to give a final decision.
– Allow more potentially useful trees to participate in the
analysis
– Help define an irreducible tree space for search operations
Revisiting Monotonicity
• Monotonic Decision Trees
– A binary decision tree will be called monotonic
if all the left leaves are class “0” and all the
right leaves are class “1”.
• Example:
[Figure: a truth table for an example Boolean function F(abc) and several binary decision trees that all compute it]
All these trees correspond to the same monotonic Boolean function.
Only one of them is a monotonic BDT.
Revisiting Completeness
• Complete Decision Trees
– A binary decision tree will be called complete if every
sensor occurs at least once in the tree and, at any non-leaf
node in the tree, its left and right sub-trees are not
identical.
• Example:
[Figure: a truth table for an example Boolean function F(abc) and decision trees illustrating completeness]
The CM Tree Space
Complete, monotonic BDTs

No. of attributes | Distinct BDTs | Trees from CM Boolean functions | Complete, monotonic BDTs
2                 | 74            | 4                               | 4
3                 | 16,430        | 60                              | 114
4                 | 1,079,779,602 | 11,808                          | 66,936
Tree Neighborhood and Tree Space
• Define tree neighborhood by giving operations for
moving from one tree in CM Tree Space to another.
• We have developed an algorithm for finding low-cost
BDTs by searching through CM Tree Space, moving from a
tree to one of its neighbors.
Search Operations in Tree Space
• Split
Pick a leaf node and replace it with a sensor that is
not already present in that branch, and then insert
arcs from that sensor to 0 and to 1.
[Figure: a SPLIT operation — a leaf is replaced by a new sensor node with arcs to 0 and 1]
Search Operations
• Swap
Pick a non-leaf node in the tree and swap it with its
parent node such that the new tree is still
monotonic and complete and no sensor occurs
more than once in any branch.
[Figure: a SWAP operation — a non-leaf node is exchanged with its parent]
Search Operations
• Merge
Pick a parent node of two leaf nodes and make it a
leaf node by collapsing the two leaf nodes below it,
or pick a parent node with one leaf node, collapse
both the parent node and its one leaf node, and
shift the sub-tree up in the tree by one level.
[Figure: a MERGE operation — a parent of leaf nodes is collapsed into a leaf, or a parent with one leaf child is removed and its sub-tree shifted up]
Search Operations
• Replace
Pick a node with a sensor occurring more than
once in the tree and replace it with any other
sensor such that no sensor occurs more than once
in any branch.
[Figure: a REPLACE operation — a repeated sensor is replaced by a different sensor]
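As one concrete example, here is a hedged Python sketch of the Split move in the tuple encoding used in the earlier sketches (leaves 0/1, internal nodes (sensor, left, right)); the talk's version additionally checks that the new sensor does not already occur on that branch so the result stays in CM tree space.

```python
# Hedged sketch of the SPLIT move (CM-space constraints not fully enforced).

def split(tree, path, new_sensor):
    """Replace the leaf reached by `path` ('L'/'R' choices from the root) with a
    new sensor node whose left child is 0 and right child is 1."""
    if not path:
        assert tree in (0, 1), "split must be applied at a leaf"
        return (new_sensor, 0, 1)
    sensor, left, right = tree
    if path[0] == "L":
        return (sensor, split(left, path[1:], new_sensor), right)
    return (sensor, left, split(right, path[1:], new_sensor))

# Split the left leaf of a one-sensor tree on sensor "b".
print(split(("a", 0, 1), "L", "b"))   # ('a', ('b', 0, 1), 1)
```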
Tree Neighborhood and Tree Space
• Define tree neighborhood by using these four
operations for moving from one tree in CM
Tree Space to another.
• Irreducibility
– Theorem: Any tree in the CM tree space can be
reached from any other tree by applying these
neighborhood operations repeatedly
– An irreducible CM tree space helps “search” for
the cheapest trees using neighborhood operations
Tree Neighborhood and Tree Space
Sketch of Proof of the
Theorem:
• Simple Tree:
– A simple tree is defined
as a CM tree in which
every sensor occurs
exactly once in such a
way that there is exactly
one path in the tree with
all sensors in it.
Tree Neighborhood and Tree Space
Sketch of Proof of the Theorem:
• To Prove: Given any two trees τ1, τ2 in CM tree
space, τ2 can be reached from τ1 by a sequence of
neighborhood operations
• We prove this in three different steps:
– 1. Any tree τ1 can be converted to a simple tree
τs1
– 2. Any simple tree τs1 can be converted to any
other simple tree τs2
– 3. Any simple tree τs2 can be converted to any
tree τ2
Tree Space Traversal
• Naïve Idea: Greedy Search
1. Randomly start at any tree in the CM tree space
2. Find its neighboring trees using the above operations
3. Move to the neighbor with the lowest cost
4. Iterate until we find a minimum
– Problem: The CM tree space is highly multimodal (more than one local minimum)!
– Therefore, we implement a stochastic search algorithm with simulated annealing to find the best tree
Tree Space Traversal
• Stochastic Search
– Randomly start at any tree in CM space
– Find its neighboring trees, and evaluate each one for its
total cost
– Select next move according to a probability distribution
over the neighboring trees
• To deal with the multimodality of the tree space, we
introduce Simulated Annealing:
– Make more random jumps initially, gradually decrease the
randomness and finally converge at the overall minimum
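A hedged sketch of such a search follows; `neighbors` and `cost` are placeholders for the CM-tree-space neighborhood operations and the C_Tot evaluation described earlier, and the schedule parameters are arbitrary.

```python
# Hedged sketch of stochastic search with simulated annealing over tree space.

import math
import random

def anneal(start_tree, neighbors, cost, t0=1.0, cooling=0.95, steps=1000):
    current, current_cost = start_tree, cost(start_tree)
    best, best_cost = current, current_cost
    temperature = t0
    for _ in range(steps):
        candidate = random.choice(neighbors(current))
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse trees with a probability
        # that shrinks as the temperature drops.
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, current_cost + delta
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost
```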
Results: Searching CM Tree Space
• We were able to perform experiments for 3, 4 and 5
sensors successfully by searching CM Tree Space.
• Results show improvement compared to the extensive
search method over BDTs corresponding to complete,
monotone Boolean functions. E.g., n = 4 (66,936 trees)
– 100 different experiments were performed
– Each experiment was started 10 times randomly at some tree in
CM Tree Space and chains were formed by making stochastic
moves in the neighborhood, until we find a local minimum
– Only 4890 trees were examined on average for every
experiment
– Global minimum was found 82 out of 100 times while the
second best tree was found 10 times
– The method found trees that were less costly than those found
by earlier searches of BDTs corresponding to complete,
monotonic Boolean functions.
Genetic Algorithms-based Approach
• Structure-based neighborhood moves allow
very short moves only. Therefore,…
• Techniques like Genetic Algorithms and
Evolutionary Techniques may suggest ways
for getting more efficiently to better trees,
given a population of good trees
Genetic Algorithms-based Approach
• Started implementing genetic algorithms-based
techniques for tree space traversal
• Basically, we try to get “better” trees from the
current population of “good” trees using the
basic genetic operations on them:
– Selection
– Crossover
– Mutation
• Here, “better” decision trees correspond to
lower cost decision trees than the ones in the
current population (“good”).
Genetic Algorithms-based Approach
• Selection:
– Select a random, initial population of N trees from
CM tree space
• Crossover:
– Performed k times between every pair of trees in
the current best population, bestPop
Genetic Algorithms-based Approach
• For each crossover operation between two
trees, we randomly select a node in each tree
and exchange their subtrees
– However, we impose certain restrictions on
the selection of nodes, so that the resultant
trees still lie in CM tree space
Genetic Algorithms-based Approach
• Mutation:
– Performed after every m generations of the
algorithm
– We do two types of mutations:
• 1. Generate all neighbors of the current best
population and put them into the gene pool
• 2. Replace a fraction of the trees of bestPop
with random trees from the CM tree space
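For illustration, here is a hedged Python sketch of subtree crossover in the tuple tree encoding; the node-selection restrictions that keep offspring inside CM tree space are omitted, and the stopping probability is arbitrary.

```python
# Hedged sketch of subtree crossover between two trees (CM constraints omitted).

import random

def random_path(tree, path=""):
    """Pick a random node, returned as an 'L'/'R' path from the root."""
    if tree in (0, 1) or random.random() < 0.3:
        return path
    return random_path(tree[1], path + "L") if random.random() < 0.5 \
        else random_path(tree[2], path + "R")

def get(tree, path):
    for step in path:
        tree = tree[1] if step == "L" else tree[2]
    return tree

def put(tree, path, subtree):
    if not path:
        return subtree
    sensor, left, right = tree
    if path[0] == "L":
        return (sensor, put(left, path[1:], subtree), right)
    return (sensor, left, put(right, path[1:], subtree))

def crossover(t1, t2):
    """Exchange randomly chosen subtrees of t1 and t2, producing two offspring."""
    p1, p2 = random_path(t1), random_path(t2)
    return put(t1, p1, get(t2, p2)), put(t2, p2, get(t1, p1))
```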
Genetic Algorithms-based Approach
• Only ~1600 trees had to be examined to
obtain the 10 best trees for 4 sensors!
Modeling Sensor Errors
•One Approach to Sensor Errors: Modeling
Sensor Operation
•Threshold Model:
–Sensors have different discriminating power
–Many use counts (e.g., Gamma radiation
counts)
–See if count exceeds
threshold
–If so, say attribute is present.
Modeling Sensor Errors
Threshold Model:
•Sensor i has discriminating power Ki,
threshold Ti
•Attribute present if counts exceed Ti
•Seek threshold values that minimize the
overall cost function, including costs of
inspection, false positive/negative
•Assume readings of category 0 containers
follow a Gaussian distribution, and similarly
for category 1 containers
•Simulation approach
Probability of Error for Individual
Sensors
• For the ith sensor, the type 1 (P(Yi=1|X=0)) and type 2
(P(Yi=0|X=1)) errors are modeled using Gaussian
distributions.
– State of nature X=0 represents absence of a bomb.
– State of nature X=1 represents presence of a bomb.
– The outcome of sensor i is a count; the attribute is declared
present (Yi=1) when the count exceeds the threshold Ti.
– Σi is the spread (standard deviation) of the count distributions.
– PD = prob. of detection, PF = prob. of false pos.
[Figure: characteristics of a typical sensor — the count distributions P(count|X=0) and P(count|X=1), whose means are separated by Ki, with threshold Ti]
Modeling Sensor Errors
The probability of false positive for the ith sensor is computed as:
P(Yi=1|X=0) = 0.5 erfc[Ti/√2]
The probability of detection for the ith sensor is computed as:
P(Yi=1|X=1) = 0.5 erfc[(Ti−Ki)/(Σ√2)]
erfc = complementary error function, erfc(x) = Γ(1/2, x²)/√π
The following experiments have been done using sensors A, B,
C and using:
KA = 4.37; ΣA = 1
KB = 2.9; ΣB = 1
KC = 4.6; ΣC = 1
We then varied the individual sensor thresholds TA, TB and TC
from −4.0 to +4.0 in steps of 0.4. These values were chosen since
they gave us an “ROC curve” for the individual sensors over a
complete range of P(Yi=1|X=0) and P(Yi=1|X=1).
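A worked Python version of these formulas; math.erfc is the standard complementary error function, Σ is taken as 1 as in the experiments, and the threshold used in the print statement is arbitrary.

```python
# Per-sensor error probabilities under the Gaussian threshold model above.

import math

def p_false_alarm(T, sigma=1.0):
    """P(Y_i = 1 | X = 0) for threshold T."""
    return 0.5 * math.erfc(T / (sigma * math.sqrt(2)))

def p_detection(T, K, sigma=1.0):
    """P(Y_i = 1 | X = 1) for threshold T and discriminating power K."""
    return 0.5 * math.erfc((T - K) / (sigma * math.sqrt(2)))

# Sensor A from the slides: K_A = 4.37, sigma_A = 1, with an arbitrary threshold.
print(p_false_alarm(2.0), p_detection(2.0, 4.37))
```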
Frequency of First Ranked Trees for
Variations in Sensor Thresholds
[Figure: bar chart of the frequency with which each tree was ranked first as sensor thresholds were varied]
• Extensive Search: 68,921 experiments were conducted, as each Ti was
varied through its entire range. (n = 3)
• The above graph has frequency counts of the number of experiments when
a particular tree was ranked first. There are 15 such trees. Tree 37 had the
highest frequency of attaining rank one.
Modeling Sensor Errors
•A number of trees ranking first in other
experiments also ranked first here.
•Similar results in case of n = 4.
•4,194,481 experiments.
•244 different trees were ranked first in at least one
experiment.
•Trees ranked first in other experiments also
frequently appeared first here.
•Conclusion: considerable insensitivity to change
of threshold.
New Approaches to Optimum
Threshold Computation
• Extensive search over a range of thresholds
(e.g., -4.0 to +4.0 in steps of 0.4) has some
practical drawbacks:
– Large number of threshold values for every sensor
– Large step size
– Grows exponentially with the number of sensors
(computationally infeasible for n > 4)
• A non-linear optimization approach proves
more satisfactory:
– A combination of Gradient Descent and modified
Newton’s methods
Problems with Standard Approaches
• Gradient Descent Method:
– Too small step size results in large number of
iterations to reach the minimum
– Too big step size results in skipping the minimum
• Newton’s Method:
– The convergence depends largely on the starting
point. This method occasionally drifts in the wrong
direction and hence fails to converge.
• Solution: combination of gradient descent and
Newton’s methods
• This works well.
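A hedged one-dimensional illustration of the combination follows; the real problem adjusts all sensor thresholds jointly against the tree cost, and the toy cost and step parameters here are made up.

```python
# Hedged sketch: take a Newton step when the local curvature is safely positive,
# otherwise fall back to a plain gradient-descent step.

def minimize(grad, hess, x0, lr=0.1, tol=1e-8, max_iter=200):
    x = x0
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if abs(g) < tol:
            break
        step = g / h if h > tol else lr * g
        x -= step
    return x

# Toy cost (t - 1.5)^2, minimized at threshold T = 1.5.
print(minimize(grad=lambda t: 2 * (t - 1.5), hess=lambda t: 2.0, x0=-4.0))
```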
Results: Threshold Optimization
• Costs of false positive CFalsePositive and false
negative CFalseNegative and prior probability of
occurrence of a bad container, P(X=1), were
fixed as medians of the min and max values
given by Stroud and Saeger (same as we used in
earlier experiments)
• We were able to converge to a (hopefully close-to-minimum)
cost every time with a modest number of iterations
changing thresholds.
Results: Threshold Optimization
• We were able to converge to a (hopefully close-to-minimum) cost every
time with a modest number of iterations changing thresholds. For example:
– For 3 sensors, it took an average of 0.081 seconds (as
opposed to 0.387 seconds using extensive search) to
converge to a cost for all 114 trees studied
– For 4 sensors, it took an average of 0.196 seconds (as
opposed to more than 2 seconds using extensive search)
to converge to a cost for all 66,936 trees studied
• In each case, the minimum cost attained with the new algorithm
was lower, and often much lower, than that
attained with extensive search.
Results: Threshold Optimization
[Figure: “Tree costs at optimum thresholds” — total cost for each tree, comparing the combined optimization method with extensive search]
Many times the minimum obtained using the
optimization method was considerably less
than the one from the extensive search
technique.
Closing Comments
• Very few optimal trees; optimality insensitive to
changes in parameters.
• Extensive search techniques become practically
infeasible beyond a very small number of sensors
• Studying an irreducible tree space helps us to
“search” for the best trees rather than evaluating all
the trees for their cost
• A new stochastic search algorithm allows us to
search for optimum inspection schemes beyond 4
sensors successfully
• Our new threshold optimization algorithms provide
faster ways to arrive at a low tree cost; cost is lower
and often much lower than in extensive search
Discussion and Future Work
•Future Work: Explain why conclusions are so
insensitive to variation in parameter values.
•Future Work: Explore the structure of the
optimal trees and compare the different optimal
trees.
•Future Work: Develop methods for
approximating the optimal tree.
Pallet VACIS
Discussion and Future Work
•Future work: More than two values of an
attribute
(present, absent, present with
probability > 75%, absent with probability
at least 75%)
(ok, not ok, ok with probability > 99%,
ok with probability between 95% and
99%)
•Future work: In the Boolean function model:
inferring the Boolean function from
observations (partially defined Boolean
functions)
Discussion and Future Work
•Future work: Need for more complicated
cost models; bringing in costs of delays
Discussion and Future Work
• Future work: Because of the rapid growth in
number of trees in CM Tree Space when the
number of sensors grows, it is necessary to try
to reduce the number of trees we need to
search through.
• A notion of tree equivalence could be
incorporated when the number of sensors goes
beyond 5 or 6
• We hope that incorporating this into our model
will enable us to extend it to a large
number of sensors
Collaborators on this Work:
• Saket Anand
• David Madigan
• Richard Mammone
• Sushil Mittal
• Saumitr Pathak
Research Support:
• Dept. of Homeland Security University Programs
• Domestic Nuclear Detection Office
• Office of Naval Research
• National Science Foundation
Los Alamos National Laboratory:
• Rick Picard
• Kevin Saeger
• Phil Stroud
This work has gotten me places I never
thought I’d go.
More information:
http://ccicada.org
http://dimacs.rutgers.edu
froberts@dimacs.rutgers.edu