Applying Diffusion Distance for Multi-Scale Analysis of an Experience Space WeiLi Ge

Manifold Learning and Its Applications: Papers from the AAAI Fall Symposium (FS-10-06)
Applying Diffusion Distance for Multi-Scale
Analysis of an Experience Space
Meng Su, Xiaocong Fan
The Behrend College
The Pennsylvania State University
Erie, PA 16563, USA
Email: {mengsu, xfan}@psu.edu

WeiLi Ge
School of Information Engineering
Zhengzhou University
Zhengzhou, 450001, China
Email: iewlge@zzu.edu.cn
The diffusion geometry framework (Coifman and Lafon 2006) introduces the diffusion distance (along with diffusion maps and diffusion wavelets) and offers a general foundation for multiscale analysis on massive data sets. For a data set $X$ with $n$ observations, suppose a pairwise similarity matrix $W = \{w_{i,j}\}$ can be built. The $n \times n$ matrix $W$ is called a kernel function, representing some notion of affinity or similarity between pairs of points of $X$. A typical choice is the Gaussian kernel $w_{i,j} = e^{-(\|x_i - x_j\|/\varepsilon)^2}$, where $\varepsilon$ is a scale (precision) parameter of the Gaussian distribution.
$W$ is then normalized to obtain a matrix $P = D^{-1} W$, where $D$ is the diagonal matrix with entries $D_{ii} = \sum_{j=1}^{n} w_{i,j}$. The matrix $P$ is the diffusion operator on $X$, where each entry $p(i, j) = w_{i,j}/D_{ii}$ can be viewed as the transition probability of going from node $i$ to node $j$ in one time step of the Markov chain on $X$. Since $P$ encodes the geometric information about the data set $X$, the transitions directly reflect the local geometry defined by the immediate neighbors of each node in the graph of the data. Carrying this "random walk" view further, $P^t$ gives the probability of transitions from one node to another in $t$ steps.
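As a concrete illustration of the construction above, the following is a minimal NumPy sketch. It assumes a small random toy data set rather than the experience data, and the function names `gaussian_kernel` and `diffusion_operator` are hypothetical names chosen for this example.

```python
import numpy as np

def gaussian_kernel(X, eps):
    """Pairwise Gaussian affinities w_ij = exp(-(||x_i - x_j|| / eps)^2)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return np.exp(-(dist / eps) ** 2)

def diffusion_operator(W):
    """Row-normalize W: P = D^{-1} W with D_ii = sum_j w_ij."""
    degrees = W.sum(axis=1)
    return W / degrees[:, None], degrees

# Toy data (an assumption for illustration, not the paper's data set).
rng = np.random.default_rng(0)
X = rng.random((6, 3))

W = gaussian_kernel(X, eps=1.0)
P, degrees = diffusion_operator(W)

# P^t holds the t-step transition probabilities of the random walk.
P4 = np.linalg.matrix_power(P, 4)
```

Each row of `P` and of `P4` is a probability distribution over the nodes, which is what allows the rows to be compared as distributions later on.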
A family of diffusion distances $D_t$ at step $t$ is defined as
Abstract
Diffusion distance has been shown to be significantly more effective than Euclidean distance in multi-scale recognition of similar experiences in Recognition-Primed Decision making (Fan and Su 2010). In this paper, we first examine the experience data set used in the previous study. The visualization of the data set (using the first three dominant eigenvectors of the diffusion space) suggests the applicability of the diffusion approach. Second, we investigate two approaches to the computation of diffusion distance: Spectrum-based and Probability-Matching-based. Specifically, by the 'Spectrum-based' approach we refer to the one derived in terms of the eigenvalues/eigenvectors of the normalized diffusion matrix (Coifman and Lafon 2006). We use the term 'Probability-Matching' to refer to the use of various probability distances, of which the original $L^2$ diffusion distance is a special case. Our preliminary results indicate that the $L^2$ diffusion distance performs at least as well as the Spectrum-based distance. Furthermore, when the Spectrum-based approach is applied, we have to use embedding and extension techniques to label new experience data (Lieu and Saito 2009), whereas no such re-computation is necessary when the $L^2$ diffusion distance is used: we do not need to re-compute the diffusion matrix, and hence the diffusion map, each time a new data point is added. The $L^2$ approach is therefore more natural and robust, especially for labeling a single new experience. The numerical examples also show an improvement in performance. We are currently working on several other Probability-Matching approaches (e.g., the Earth Mover's Distance).
$$D_t(x, y)^2 \triangleq \sum_{z=1}^{n} \frac{\left(p(z, t|x) - p(z, t|y)\right)^2}{w(z)} = \left\| P^t_{x\cdot} - P^t_{y\cdot} \right\|^2_{L^2(X)} \tag{1}$$

where $p(z, t|x)$ is the probability that the random walk goes from node $x$ to node $z$ in $t$ steps, $w(z) = \sum_x w_{z,x}$, and $P^t_{x\cdot}$ is the row vector of the matrix $P^t$ for point $x$.
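Definition (1) can be evaluated directly, without any eigen-decomposition. The sketch below is one way to do so with NumPy, assuming a Gaussian kernel on random toy points; the function name `diffusion_distance_sq` is hypothetical. It materializes an n × n × n array, so it is only meant for small illustrative inputs.

```python
import numpy as np

def diffusion_distance_sq(W, t):
    """All pairwise squared diffusion distances D_t(x, y)^2 from definition (1)."""
    w = W.sum(axis=1)                        # w(z) = sum_x w_{z,x}
    P = W / w[:, None]                       # P = D^{-1} W
    Pt = np.linalg.matrix_power(P, t)        # row x holds p(., t | x)
    # diff[x, y, z] = p(z, t | x) - p(z, t | y)
    diff = Pt[:, None, :] - Pt[None, :, :]
    return (diff ** 2 / w[None, None, :]).sum(axis=-1)

# Toy data (an assumption for illustration only).
rng = np.random.default_rng(1)
X = rng.random((5, 2))
dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
W = np.exp(-(dist / 1.0) ** 2)

D2 = diffusion_distance_sq(W, t=2)
```

The result is a symmetric matrix with a zero diagonal, as expected of squared distances.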
Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction
Multi-agent research, including large-scale agent organization and experience-based decision making, often needs to identify similar events or patterns of events buried in a massive, high-dimensional vector space. Such events may cross policy boundaries, where multi-scale information analysis is the key to effective reasoning at multiple policy/priority levels. Diffusion geometry provides a natural framework for studying the clustering and labeling of such high-dimensional data, and the results can support accurate decision making.

Diffusion distances can be computed using the eigenvectors $\{\psi_k\}$ $(0 \le k \le L)$ and eigenvalues $\{\gamma_k\}$ of $P$ (Coifman and Lafon 2006), where $1 = \gamma_0 > |\gamma_1| \ge |\gamma_2| \ge \cdots \ge |\gamma_L|$:

$$D_t(x, y)^2 = \sum_{k=1}^{L} \gamma_k^{2t} \left(\psi_k(x) - \psi_k(y)\right)^2 \tag{2}$$

A family of diffusion maps $\{\Psi_t\}_{t \in \mathbb{N}}$ is defined by

$$\Psi_t(x) \triangleq \begin{pmatrix} \gamma_1^t \psi_1(x) \\ \gamma_2^t \psi_2(x) \\ \vdots \\ \gamma_L^t \psi_L(x) \end{pmatrix}. \tag{3}$$
That is, the ordinary Euclidean distance in the diffusion space measures the intrinsic diffusion distance on the data. Euclidean distance often fails to capture the global spatial relation among the points of a massive data set, whereas the diffusion distance has a global meaning for data sets with a nonlinear geometric structure (manifold) and is very robust to noisy data.
Since in many practical applications the spectrum of the matrix $P$ has a spectral gap, with only a few eigenvalues close to 1 and all the others much smaller than 1, the diffusion distance at a large enough scale $t$ can be well approximated using only the first $\delta$ eigenvectors. This observation provides a theoretical justification for dimensionality reduction.
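The spectral route can be sketched in the same way. The code below assumes a symmetric Gaussian kernel on toy data and uses the symmetric matrix $D^{-1/2} W D^{-1/2}$, which shares its eigenvalues with $P$, to obtain the right eigenvectors of $P$; this particular normalization and the function name `diffusion_map` are choices made for this example. It then compares the spectral form (2) with the direct form (1) and builds a truncated embedding that keeps only the leading coordinates.

```python
import numpy as np

def diffusion_map(W, t, n_components=None):
    """Rows are diffusion map coordinates gamma_k^t * psi_k(x), k = 1..L."""
    d = W.sum(axis=1)
    # S = D^{-1/2} W D^{-1/2} is symmetric and has the same eigenvalues as P.
    S = W / np.sqrt(np.outer(d, d))
    gammas, V = np.linalg.eigh(S)
    order = np.argsort(-np.abs(gammas))       # decreasing |gamma_k|
    gammas, V = gammas[order], V[:, order]
    psi = V / np.sqrt(d)[:, None]             # right eigenvectors of P
    coords = psi[:, 1:] * gammas[1:] ** t     # drop the trivial k = 0 term
    return coords if n_components is None else coords[:, :n_components]

def pairwise_sq_dist(A):
    """Squared Euclidean distances between the rows of A."""
    return ((A[:, None] - A[None, :]) ** 2).sum(axis=-1)

# Toy data (an assumption for illustration only).
rng = np.random.default_rng(2)
X = rng.random((8, 3))
W = np.exp(-(np.linalg.norm(X[:, None] - X[None, :], axis=-1) / 1.0) ** 2)
t = 2

spectral_d2 = pairwise_sq_dist(diffusion_map(W, t))                    # formula (2)
truncated_d2 = pairwise_sq_dist(diffusion_map(W, t, n_components=3))   # first 3 only

# Direct evaluation of definition (1) for comparison.
d = W.sum(axis=1)
Pt = np.linalg.matrix_power(W / d[:, None], t)
direct_d2 = ((Pt[:, None] - Pt[None, :]) ** 2 / d).sum(axis=-1)
```

Because the truncated embedding only drops non-negative terms from the sum in (2), `truncated_d2` never exceeds `spectral_d2`; how close the two are depends on the spectral gap of the data at hand.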
However, calculating the diffusion distance by formula (2) requires evaluating eigenvalues and eigenvectors. This is especially awkward if it has to be done every time a new data point is added, because the matrix changes. Lieu and Saito proposed computing the diffusion distance directly from definition (1) for $t = 1$, an approach they named NCM (Node Connectivities Matching) (Lieu and Saito 2009). They applied this approach to image pixel type recognition and showed that it outperforms the methods based on the diffusion distance defined by (2).
The rest of the paper is organized as follows. We frame our problem in Section 2. In Section 3, we introduce our method, which generalizes NCM to multi-scale levels for any $t$, present our experimental results, and discuss our further work. The last section summarizes the results.
Table 2: Course of action and related parameters

| Seq | Action | Description |
|---|---|---|
| 1 | AgentRecall(Squad, X1) | Recall X1 number of squads |
| 2 | AgentAssignment(Squad, X2) | Assign X2 number of squads |
| 3 | AgentRecall(EOD, X3) | Recall X3 number of EODs |
| 4 | AgentAssignment(EOD, X4) | Assign X4 number of EODs |
| 5 | ReadinessRecover(X5, X6) | Recover X6 number of agents' readiness to X5 percentage |
| 6 | MoveTo/RushTo/CreepOver | X7: get to the target's location |
| 7 | CaptureInsurgent/Monitor/DisperseF/DisperseW/DisperseP/RemoveIED | X8: approach to handle threats |
We can frame our problem as follows. Suppose an agent has a large set $E$ of experiences (knowledge collected from domain experts) about decision making on a certain task type, and the set $F = \{F_i \mid 1 \le i \le m\}$ of features (types of information) relevant to this task type is fixed. Each experience $e_i = \langle S_i, A_i \rangle \in E$ has two parts: a feature-based situation description $S_i = (f_1, f_2, \cdots, f_m)$, where $f_i$ is a value for feature $F_i$, and a course of action $A_i = (a_1, a_2, \cdots, a_k)$ (i.e., a solution successfully implemented in the situation described by $S_i$).
Given a new decision situation $D = (d_1, d_2, \cdots, d_m)$, where $d_i$ is a value for feature $F_i$, the feature-matching problem is to find a set $E^* \subset E$ such that the experiences in $E^*$ are similar to $D$ in terms of all the features being considered. The solution construction problem is to synthesize a new course of action such that the solution part of each experience $e_i \in E^*$ has an appropriate influence on the newly synthesized solution. Our objective here is to examine whether the use of diffusion distance, as compared with the traditional Euclidean distance, performs better in computing the set $E^*$, so that solutions of higher quality can be produced afterward.
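The two problems above can be sketched as a k-nearest-neighbor search followed by a per-part vote over the selected solutions. The sketch assumes that the distances from $D$ to every experience (Euclidean or diffusion) have already been computed, uses the majority vote and 0/1 per-part scoring described in Section 3 as one simple instance of solution construction, and assumes equal part weights since the weights are not given. All function names and the toy labels are hypothetical.

```python
from collections import Counter
import numpy as np

def k_most_similar(distances_to_D, k):
    """Indices of the k experiences closest to the new situation D (the set E*)."""
    return np.argsort(distances_to_D)[:k]

def majority_vote_label(labels, neighbor_idx):
    """Vote part by part over the labels of the selected experiences."""
    selected = [labels[i] for i in neighbor_idx]
    return tuple(Counter(parts).most_common(1)[0][0] for parts in zip(*selected))

def label_score(predicted, known, weights=None):
    """Weighted sum of per-part correctness (0 or 1); equal weights assumed by default."""
    correct = np.array([p == k for p, k in zip(predicted, known)], dtype=float)
    weights = np.ones_like(correct) if weights is None else np.asarray(weights, dtype=float)
    return float(correct @ weights)

# Toy labels with three parts each (the paper's labels have eight parts).
labels = [(1, 0, "MoveTo"), (1, 1, "RushTo"), (2, 0, "MoveTo"), (1, 0, "CreepOver")]
distances = np.array([0.2, 0.9, 0.1, 0.3])   # assumed distances from D to each experience

idx = k_most_similar(distances, k=3)
predicted = majority_vote_label(labels, idx)
score = label_score(predicted, known=(1, 1, "MoveTo"))
```

Swapping the distance vector between Euclidean and diffusion distances changes only the input to `k_most_similar`, which is what makes the two measures directly comparable.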
Experiences in real-world problems are high-dimensional data. For this study, we use a data set $E$ of 16,383 decision-making experiences about how C2 teams react to potential threats that emerge unexpectedly in a metropolis. Members of a C2 team, consisting of an S2 agent (intelligence cell) and an S3 agent (operations cell), need to work collaboratively to handle three types of targets: crowds, insurgents, and IEDs (Improvised Explosive Devices). Two types of friendly units are under S3's charge: Squad units and EOD (Explosive Ordnance Disposal) teams. The S3 agent, when making decisions on how to handle a specific threat, needs to consider information about 34 features (which are classified as target-specific, situation-specific, or weather-related) and decide on resource allocation actions appropriate for that threat.
Table 1 gives a portion of the situation description of three example experiences, and Table 2 gives the fixed set of action types and the corresponding parameters. All the experiences in $E$ are complete (both the situation description part and the course of action part are available). Since the courses of action of all the experiences in $E$ have the same sequence as shown in Table 2, all that matters is the values of parameters X1 through X8. For each experience $e_i \in E$,
2 Problem Definition
Naturalistic decision making (NDM) focuses on how people actually make decisions in realistic settings that typically involve ill-structured problems, uncertain dynamic environments, shifting/competing goals, and time stress (Zsambok and Klein 1997). One particular model is Klein's Recognition-Primed Decision framework (RPD) (Klein 1989). The RPD model is based on the supposition that in complex situations human experts usually make decisions based on the recognition of similarities between the current decision situation and previous decision experiences. Cognitive studies have shown that over 95% of human decisions conform to the RPD model in various time-stressed situations (Klein 1998).
The RPD model (Klein 1989) has a recognition phase and an evaluation phase. In the recognition phase, a decision maker synthesizes the observed information about the current decision situation into appropriate cues or patterns of cues, then employs a strategy called "feature-matching" to recall experiences that worked before in similar situations. These similar experiences are then used to construct candidate solutions, each of which is a course of action that might be applicable to the current situation to achieve the goal of concern. In the evaluation phase, the RPD model stresses Simon's satisficing criterion (Simon 1955) rather than objective optimization: a decision maker considers the candidate solutions one by one, terminating the evaluation as soon as a workable solution is obtained.
Table 1: Feature-based situation description of example experiences
(column groups, left to right: Target-specific, Situation-specific, Weather)

| Crowd | Insurgent | IED | Level | Speed | CloseToRoute | UnitReadiness | ... | PrecipitationType | Rate | Visibility |
|---|---|---|---|---|---|---|---|---|---|---|
| No | Yes | No | XHigh | Slow | Yes | 85 | ... | Rain | High | Haze |
| Yes | No | No | Low | Fast | Yes | 80 | ... | Hail | Light | Fog |
| No | No | Yes | High | None | No | 60 | ... | Snow | High | Fog |

the values of X1 through X8 can be concatenated into one string, which will be referred to as the label of $e_i$ below.
3 Methodology and Experiment
Fan and Su (2010) described an approach for comparing diffusion distance with the traditional Euclidean distance in identifying similar experiences. It is a scale-up approach at a diffusion level $i$, where the diffusion distance, evaluated by formula (2) for $t = i$, is applied to the "transformed" experience space for similar-experience identification. Diffusion maps can filter out high-frequency noise, which suggests that noise is reduced as the diffusion level increases. They exploited this property of the diffusion process to design an anytime algorithm for solution construction in recognition-primed decision making. However, calculating the diffusion distance by formula (2) requires evaluating eigenvalues and eigenvectors, which is especially awkward if it has to be repeated every time a new data point is added, because the matrix changes. Lieu and Saito (2009) proposed computing the diffusion distance directly from definition (1) for $t = 1$, an approach named NCM (Node Connectivities Matching).
As explained in Section 2, the data set $E$ used in this experiment includes 16,383 decision-making experiences, which are represented in a 34-dimensional feature space. Since the value ranges of the 34 features differ (some are indicator variables, some are percentages, some are integers with fixed ranges), the data set is first standardized so that all the features have the same range $[0, 1]$. We denote this standardized set by $X_{n \times m}$, where $n = 16383$ and $m = 34$. Because of the huge number of different possible labels ($\gg 2^8$ actions, see Table 2) for each experience, we did not try to cluster the experiences by label. However, when we tested a small number of clusters (e.g., 6), the distinguishable cluster pattern in the diffusion space (Figure 1) showed that the diffusion distance can reflect the intrinsic features of the experience data more effectively.
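The text does not say which standardization was used. Min-max scaling is one common way to bring features with different ranges into $[0, 1]$, and the sketch below assumes it; the function name `min_max_scale`, the handling of constant features, and the toy values are all illustrative assumptions.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column (feature) of X linearly onto [0, 1]."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0        # assumption: constant features are mapped to 0
    return (X - lo) / span

# Toy rows mixing an indicator, a percentage, and a bounded integer feature.
raw = np.array([[0, 85, 3],
                [1, 80, 7],
                [0, 60, 5]], dtype=float)
scaled = min_max_scale(raw)
```

After scaling, every feature contributes on the same scale to the norm $\|x_i - x_j\|$ used in the Gaussian kernel.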
As stated in Section 1, from the labeled set $X$, we first build a symmetric matrix $W$, where $w_{i,j}$ is the similarity between points $x_i$ and $x_j$. $W$ can be viewed as a graph in which points $x_i, x_j \in X$ are connected by an edge of weight $w_{i,j}$.
For any new unlabeled experience $y$, we connect $y$ with all the points in $X$ and define the similarity vector $p = (w_{i,y})_{i=1}^{n}$, where $w_{i,y} = e^{-(\|x_i - y\|/\varepsilon)^2}$, and $q = p/\|p\|_2$.
It is easy to obtain that the diffusion distance at level $t$ from $y$ to $x_i$ is

$$D_t(x_i, y)^2 = \left\| P^t_{x_i\cdot} - q^T P^t \right\|^2_{L^2(X)} \tag{4}$$
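Formula (4) can be sketched as follows. The code assumes that $P^t$ is computed once from the labeled set and reused for every new point, and that the $L^2(X)$ norm carries the same $1/w(z)$ weighting as in definition (1); both are readings of the text rather than details it states. The function names and the toy data are hypothetical.

```python
import numpy as np

def precompute_walk(X, eps, t):
    """Return P^t and the weights w(z) for the labeled set X."""
    dist = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    W = np.exp(-(dist / eps) ** 2)
    w = W.sum(axis=1)
    return np.linalg.matrix_power(W / w[:, None], t), w

def distance_sq_to_new_point(X, Pt, w, y, eps):
    """Squared diffusion distance from a new point y to every x_i, following (4)."""
    p = np.exp(-(np.linalg.norm(X - y, axis=1) / eps) ** 2)   # similarity vector p
    q = p / np.linalg.norm(p)                                 # q = p / ||p||_2
    diff = Pt - q @ Pt                                        # row i: P^t_{x_i.} - q^T P^t
    return (diff ** 2 / w).sum(axis=1)                        # 1/w(z) weighting assumed

# Toy data (an assumption for illustration only).
rng = np.random.default_rng(3)
X = rng.random((7, 4))
y = rng.random(4)

Pt, w = precompute_walk(X, eps=1.0, t=4)
d2 = distance_sq_to_new_point(X, Pt, w, y, eps=1.0)
nearest = np.argsort(d2)[:3]     # candidate similar experiences for y
```

Labeling a new point costs one kernel evaluation against the labeled set and one vector-matrix product, with no eigen-decomposition and no change to `Pt`.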
Our approach evaluates the diffusion distance directly at any level $t$ of the random walk from definition (4). It takes into account all incidences relating the unlabeled new data to the labeled training data instead of performing a spectral embedding. This makes it robust to noise, saves time, and improves accuracy.

Figure 1: The clusters of the experience data set in the diffusion space

We applied this algorithm to the data set of 16,383 decision-making experiences described above at $t = 1, 2, 4, 8,$ and $16$, and compared its performance with the result obtained using the traditional Euclidean distance.
The performance was evaluated in terms of the recoverability of labels. In particular, suppose we are given a decision situation $D$ together with its label $\zeta_D$. After a set $E^* = \{e_1, e_2, \cdots, e_k\}$ of $k$ similar experiences is identified for $D$, we can generate another label $\zeta_D^*$ for $D$ by majority vote: each part (C1 through C8) of $\zeta_D^*$ is determined by majority vote of the corresponding part of the experiences in $E^*$. The score of $\zeta_D^*$ is the weighted sum of the correctness (0 or 1) of each part as compared with the corresponding part of the known label $\zeta_D$.
We chose $\varepsilon = 1.0$ and varied the parameter $k$ (kNN) over 3, 10, and 30 for the nearest neighbor search.
Figure 2(a) plots the result. It shows that the performance of using directly calculated diffusion distances (levels 1-16) can be significantly better than using Euclidean distance in the original space (level O). For instance, the level-1 performance was lower than the level-O performance, but the performance increased considerably to its peak as the diffusion level increased from 1 to 4 (regardless of the value of kNN). However, it is about the same as using the 'Spectrum-based' diffusion distance
in Figure 2(b) (Fan and Su 2010). We can extend the definition of diffusion distance in (1) from $L^2$ to various other probability or statistical distances (Lieu and Saito 2009). We are currently experimenting with several such Probability-Matching approaches (e.g., the Earth Mover's Distance). More results will be available and, hopefully, will be presented at the Symposium.

Figure 2: Labeling performance (y-axis, scale $\times 10^5$) versus diffusion level (x-axis: O, 1, 2, 4, 8, 16) for kNN = 3, 10, and 30. (a) Performance on the $L^2$ diffusion distance definition (Probability-Matching); (b) performance on the eigenvalue/eigenvector (Spectrum-based) diffusion distance

4 Conclusion
In this study, we first investigated the agent's large experience data set in its mapped diffusion space and visualized its k-means clustering (e.g., k = 6) in the 3-dimensional space spanned by the first three dominant eigenvectors of the diffusion matrix. The "fireworks" pattern of the clusters suggests the applicability of the diffusion geometry approach to the experience data set. Second, we compared two approaches to the computation of the diffusion distance between two experience vectors: one is derived in terms of the eigenvalues/eigenvectors of the diffusion matrix; the other follows the original definition of diffusion distance. Our preliminary results indicate that the second method performs at least as well as the Spectrum-based distance on our data set. Moreover, the second method is simpler because the evaluation of the eigenvalues and eigenvectors is not necessary. Furthermore, our second approach, which is basically the $L^2$ histogram discriminant of the diffusion distribution of each point, can be generalized to any Probability-Matching approach (e.g., the Earth Mover's Distance).

References
Coifman, R. R., and Lafon, S. 2006. Diffusion maps. Appl. Comput. Harmon. Anal. 21(1):5–30.
Fan, X., and Su, M. 2010. Using geometric diffusions for recognition-primed multi-agent decision making. In van der Hoek; Kaminka; Lesperance; Luck; and Sen, eds., Proc. of 9th Int. Conf. on Autonomous Agents and Multiagent Systems (AAMAS 2010).
Klein, G. A. 1989. Recognition-primed decisions. In Rouse, W. B., ed., Advances in man-machine systems research, volume 5. Greenwich, CT: JAI Press. 47–92.
Klein, G. A. 1998. Recognition-primed decision making. In Sources of power: How people make decisions. MIT Press. 15–30.
Lieu, L., and Saito, N. 2009. Signal classification by matching node connectivities. In Statistical Signal Processing, 2009. SSP '09. IEEE/SP 15th Workshop on, 81–84.
Simon, H. 1955. A behavioral model of rational choice. Quarterly Journal of Economics 69:99–118.
Zsambok, C. E., and Klein, G., eds. 1997. Naturalistic Decision Making. Lawrence Erlbaum Associates.