Two-stage Framework for Visualization of Clustered High Dimensional Data

Jaegul Choo∗    Shawn Bohn†    Haesun Park∗

∗ College of Computing, Georgia Institute of Technology, 266 Ferst Drive, Atlanta, GA 30332, USA
  e-mail: {joyfull, hpark}@cc.gatech.edu
† National Visualization and Analytics Center, Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA 99354, USA
  e-mail: shawn.bohn@pnl.gov
ABSTRACT
In this paper, we discuss dimension reduction methods for 2D visualization of high dimensional clustered data. We propose a two-stage framework for visualizing such data based on dimension reduction methods. In the first stage, we obtain the reduced dimensional data by applying a supervised dimension reduction method, such as linear discriminant analysis, which preserves the original cluster structure in terms of its criterion. The resulting optimal reduced dimension depends on the optimization criterion and is often larger than 2. In the second stage, the dimension is further reduced to 2 for visualization purposes by another dimension reduction method such as principal component analysis. The role of the second stage is to minimize the loss of information due to reducing the dimension all the way to 2. Using this framework, we propose several two-stage methods, and present their theoretical characteristics as well as experimental comparisons on both artificial and real-world text data sets.
Keywords: dimension reduction, linear discriminant analysis,
principal component analysis, orthogonal centroid method, 2D projection, clustered data, regularization, generalized singular value
decomposition
Index Terms: H.5.2 [INFORMATION INTERFACES AND PRESENTATION]: User Interfaces—Theory and methods
1 INTRODUCTION
Within the visual analytics community, various types of information content are represented using high dimensional signatures. To
make these signatures useful they often need to be transformed into
a lower dimension (i.e., 2D or 3D) for a variety of visual representations such as scatter plots. Many researchers in this community
have used a wide assortment of dimension reduction techniques,
e.g., self-organizing map (SOM) [12], principal component analysis (PCA) [11], multidimensional scaling (MDS) [2], etc. However,
it is not always clear why a certain technique has been chosen over
another, especially to the end user. Typically, the goal of dimension reduction can be viewed in terms of two aspects: efficiency and accuracy. Efficiency, as defined here, is the time needed to compute the reduction, but accuracy may not be as simple to quantify. Many would agree to quantify accuracy as a measure of how well the relationships in the high dimensional space are preserved in the reduced dimensional space. Note that most techniques either directly or indirectly work on this principle.
There are other properties that are important to those interpreting
the semantics of the reduced space. Specifically, we note that while local neighborhood preservation is important, its importance depends upon the analysis task. No single reduction technique will provide the complete
view as various properties of the space are obscured or lost. We
have mentioned that typically the primary objective is relationship
preservation. However, there are at least two others: outlier and
macro structure visualization. Outliers are conceptually easy (i.e.,
a variance beyond some threshold), but more difficult to quantify,
as we do not necessarily know which set of outliers are important
to accentuate to the user. Certain techniques (e.g., PCA) tend to show outliers readily, but in showcasing those outliers they also tend to compress the rest of the reduced space. Other techniques (e.g., SOM) maximize space usage well, but do so at the expense of masking or even hiding those outliers. Likewise, macro structures of the high dimensional space may be masked or heavily distorted during the reduction. Macro structures are the larger-order groupings (e.g., clusters) that exist in the original dimensional space. We recognize that such structures are important in dimension reduction research and to the visual analytics community. However, few existing studies focus on data representation specifically for visualization of clustered data [20, 13, 3].
We propose theoretical measures for these properties and efficient algorithms that will aid not only researchers but ultimately users/analysts in better understanding which balance of properties is important and for which analytic tasks.
2 MOTIVATION
The focus of this paper is the fundamental characteristics of dimension reduction techniques for visualizing high dimensional data in
the form of a 2D scatter plot when the data has cluster structure.
The role of dimension reduction here is to give a 2-dimensional
representation of data while preserving cluster structure as much as
possible. To this end, supervised dimension reduction methods that
incorporate cluster information such as linear discriminant analysis (LDA) [4] or orthogonal centroid method (OCM) [10] can be
naturally considered.
However, one of the issues is that with many dimension reduction methods designed to preserve the cluster structure in the data,
the theoretically optimal reduced dimension, which is the smallest
dimension that is acceptable with respect to the optimization criteria of the dimension reduction method, is usually larger than 2. For
example, in LDA, the minimum reduced dimension that preserves
the cluster structure quality measure defined as a trace maximization problem is one less than the number of clusters in the data in
general [8, 7].
In this case, one may simply choose the two dimensions that
contribute most to such a measure. However, with only two dimensions, such a measure may become significantly smaller than the
original quantity after dimension reduction. This results in loss of
information that hinders visualization in properly reflecting the true
cluster relationship of the data. A similar situation may occur when using PCA to visualize data that do not have a cluster structure. Even though PCA finds the principal axes that maximally capture the variance of the data, when the resulting 2-dimensional representation maintains only a small fraction of the total variance, the relationships of the data in 2 dimensions are likely to be highly inconsistent with those in the original dimension.
Such loss of information is inevitable in that the dimension has to
be reduced to 2. Our main motivation is to deal with such loss more
carefully by separating the loss-introducing stage from the original dimension reduction methods. Based on this idea, we propose
the two-stage framework of dimension reduction for visualization.
In this framework, a supervised dimension reduction method is applied in the first stage so that the original dimension is reduced to
the minimum dimension achievable while preserving the quality of
cluster measure as defined in a dimension reduction method. The
reduced dimension achieved in the first stage is often larger than
2. Thus in the second stage, we find another dimension reducing
transformation that minimizes the loss introduced in further reducing the dimension all the way to 2. This two-stage framework provides us with a means to flexibly apply different types of dimension reduction techniques in each stage and to systematically analyze their effects, which helps us understand the behavior of the overall dimension reduction methods for visualization of clustered data.
The issues then are the design of the most appropriate dimension
reduction methods, the modeling of optimization criteria, and the
corresponding solution methods.
In this paper, we present both theoretical and empirical answers
to these issues. Specifically, we propose several two-stage methods
utilizing linear dimension reduction methods such as LDA, orthogonal centroid method (OCM), and principal component analysis
(PCA), and we present their theoretical justifications by modeling
the optimization criteria for which each method provides the optimal solution. Also, we illustrate and compare the effectiveness of
the proposed methods by showing empirical visualizations on synthetic and real-world data sets. Although nonlinear dimension reduction methods such as MDS or other manifold learning methods
such as isometric feature mapping [16] and locally linear embedding [14] may also be utilized for the effective 2D visualization of
high dimensional data, our focus in this paper is on linear methods.
The linear methods are computationally more efficient in general,
and unlike most of the manifold learning methods, they also provide
dimension reducing transformations that can be applied to map and
visualize unseen data points in the same space where the existing
data are visualized.
Our approach of successively applying two dimension reduction methods should be distinguished from previous works [18, 19, 21] in that they usually aim at improving the computational efficiency, scalability, or applicability of a certain dimension reduction method, e.g., LDA.
The rest of this paper is organized as follows. In Section 3, LDA,
OCM, and PCA are described based on a unified framework of the
scatter matrices and their trace optimization problems. In Section 4,
we formulate two-stage dimension reduction methods, and in Section 5, several two-stage methods for visualization are proposed and
compared along with their criteria. Experimental comparisons are
given using artificial and real-world data sets in Section 6, and conclusion and future work are addressed in Section 7.
3 DIMENSION REDUCTION AS TRACE OPTIMIZATION PROBLEM
In this section, we introduce the notions of scatter matrices used in defining cluster quality and optimization criteria for dimension reduction.
Suppose a dimension reducing linear transformation G^T ∈ R^{l×m} maps an m-dimensional data vector x to a vector z in an l-dimensional space (m > l):
G^T : x ∈ R^{m×1} → z = G^T x ∈ R^{l×1}.   (1)
Suppose also that a data matrix A = [a_1  a_2  ···  a_n] ∈ R^{m×n} is given, where the columns a_j, j = 1, ..., n, of A represent n data items in an m-dimensional space, and they are partitioned into k clusters. Without loss of generality, for simplicity of notation, we further assume that A is partitioned as
A = [A_1  A_2  ···  A_k],   where A_i ∈ R^{m×n_i} and \sum_{i=1}^{k} n_i = n.
Let N_i denote the set of column indices that belong to cluster i, and n_i the size of N_i. The i-th cluster centroid c^{(i)} and the global centroid c are defined, respectively, as
c^{(i)} = \frac{1}{n_i} \sum_{j ∈ N_i} a_j   and   c = \frac{1}{n} \sum_{j=1}^{n} a_j.
The scatter matrix within the i-th cluster S_w^{(i)}, the within-cluster scatter matrix S_w, the between-cluster scatter matrix S_b, and the total (or mixture) scatter matrix S_t are defined [9, 15], respectively, as
S_w^{(i)} = \sum_{j ∈ N_i} (a_j − c^{(i)})(a_j − c^{(i)})^T,
S_w = \sum_{i=1}^{k} S_w^{(i)} = \sum_{i=1}^{k} \sum_{j ∈ N_i} (a_j − c^{(i)})(a_j − c^{(i)})^T,   (2)
S_b = \sum_{i=1}^{k} \sum_{j ∈ N_i} (c^{(i)} − c)(c^{(i)} − c)^T = \sum_{i=1}^{k} n_i (c^{(i)} − c)(c^{(i)} − c)^T
    = \frac{1}{n} \sum_{i=1}^{k−1} \sum_{j=i+1}^{k} n_i n_j (c^{(i)} − c^{(j)})(c^{(i)} − c^{(j)})^T,   and   (3)
S_t = \sum_{j=1}^{n} (a_j − c)(a_j − c)^T.   (4)
Note that the total scatter matrix S_t is related to S_w and S_b as [9]
S_t = S_w + S_b.   (5)
When G^T in Eq. (1) is applied to the matrix A, the scatter matrices S_w, S_b, and S_t in the original dimensional space are reduced to the l × l matrices
G^T S_w G,   G^T S_b G,   and   G^T S_t G,
respectively. By computing the trace of the scatter matrices as
trace(S_w) = \sum_{i=1}^{k} \sum_{j ∈ N_i} (a_j − c^{(i)})^T (a_j − c^{(i)}) = \sum_{i=1}^{k} \sum_{j ∈ N_i} \|a_j − c^{(i)}\|_2^2,   (6)
trace(S_b) = \sum_{i=1}^{k} \sum_{j ∈ N_i} (c^{(i)} − c)^T (c^{(i)} − c) = \sum_{i=1}^{k} n_i \|c^{(i)} − c\|_2^2   (7)
           = \frac{1}{n} \sum_{i=1}^{k−1} \sum_{j=i+1}^{k} n_i n_j \|c^{(i)} − c^{(j)}\|_2^2,   and   (8)
trace(S_t) = \sum_{j=1}^{n} (a_j − c)^T (a_j − c) = \sum_{j=1}^{n} \|a_j − c\|_2^2,   (9)
we obtain values that can be used to measure the cluster quality. Note that from Eqs. (7) and (8), trace(S_b) can be viewed both as a weighted sum of the squared distances between each centroid and the global centroid and as a weighted sum of the squared pairwise distances between cluster centroids.
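For concreteness, the scatter matrices and traces above can be computed directly from a data matrix. The following NumPy sketch is only an illustration (the column-wise layout of A and the label array are assumptions of this sketch, not notation from the paper):

```python
import numpy as np

def scatter_matrices(A, labels):
    """Within-, between-, and total scatter matrices of Eqs. (2)-(4).
    A is m x n with one data item per column; labels[j] is the cluster
    index of column j."""
    labels = np.asarray(labels)
    c = A.mean(axis=1, keepdims=True)                 # global centroid
    Sw = np.zeros((A.shape[0], A.shape[0]))
    Sb = np.zeros_like(Sw)
    for i in np.unique(labels):
        Ai = A[:, labels == i]
        ci = Ai.mean(axis=1, keepdims=True)           # cluster centroid c^(i)
        Sw += (Ai - ci) @ (Ai - ci).T                 # Eq. (2)
        Sb += Ai.shape[1] * (ci - c) @ (ci - c).T     # Eq. (3)
    St = (A - c) @ (A - c).T                          # Eq. (4); numerically St = Sw + Sb, Eq. (5)
    return Sw, Sb, St
```

The quantities in Eqs. (6)-(9) are then simply np.trace(Sw), np.trace(Sb), and np.trace(St).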
The cluster structure quality can be defined by analyzing how well the clusters can be discriminated from one another. High-quality clusters usually have small trace(S_w) and large trace(S_b), corresponding to small variance within each cluster and large distances between clusters. Subsequently, dimension reduction methods may be designed to maximize trace(G^T S_b G) and minimize trace(G^T S_w G) in the reduced dimensional space. This simultaneous optimization can be approximated by a single criterion as
J_{b/w}(G) = \max trace((G^T S_w G)^{-1} (G^T S_b G)),   (10)
which is the criterion of LDA. In addition, one may focus on maximizing the distances between clusters, which can be represented as
the criterion of OCM, i.e.,
J_b(G) = \max_{G^T G = I} trace(G^T S_b G).   (11)
On the other hand, regardless of cluster dependent terms, Sw and
Sb , the trace of the total scatter matrix St can be maximized as
J_t(G) = \max_{G^T G = I} trace(G^T S_t G),   (12)
which turns out to be the criterion of PCA. In Eqs. (11) and (12),
without the constraint, GT G = I, Jb (G) and Jt (G) can become arbitrarily large.
In what follows, LDA, OCM, and PCA are discussed based on
such maximization criteria, and their properties relevant to visualization are identified.
3.1 Linear Discriminant Analysis (LDA)
Conceptually, in LDA, we are looking for a dimension reducing
transformation that keeps the between-cluster relationship as remote as possible by maximizing trace(GT Sb G) while keeping the
within cluster relationship as compact as possible by minimizing
trace(GT Sw G). As shown in Eq. (10), the criterion of LDA can be
written as
Jb/w (G) = max trace((GT Sw G)−1 (GT Sb G)).
(13)
It can be shown that for any G ∈ R^{m×l} where m > l,
trace((G^T S_w G)^{-1} (G^T S_b G)) ≤ trace(S_w^{-1} S_b),   (14)
meaning that the cluster structure quality measured by trace(S_w^{-1} S_b) cannot be increased after dimension reduction [4]. By setting the
derivative of Eq. (13) with respect to G to zero, which gives the
first order optimality condition, it can be shown that the solution
of LDA, where we denote it as GLDA , has the columns which are
the leading generalized eigenvectors u of the generalized eigenvalue
problem,
Sb u = λ Sw u.
(15)
Since the rank of Sb is at most k − 1, LDA achieves the upper bound
of trace((GT Sw G)−1 (GT Sb G)) in Eq. (14) for any l such that l ≥
k − 1, i.e.,
trace((G_{LDA}^T S_w G_{LDA})^{-1} (G_{LDA}^T S_b G_{LDA})) = trace(S_w^{-1} S_b)   for l ≥ k − 1,   (16)
which indicates that trace(S_w^{-1} S_b) is preserved between the original space and the reduced dimensional space obtained by G_{LDA}.
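As a minimal sketch of how G_LDA can be computed when S_w is nonsingular (the singular, undersampled case is revisited in Section 6.1), a standard generalized symmetric eigensolver suffices; this function is illustrative and not the authors' implementation:

```python
import numpy as np
from scipy.linalg import eigh

def lda_transform(Sw, Sb, k):
    """G_LDA: the k-1 leading generalized eigenvectors of
    S_b u = lambda * S_w u (Eq. (15)); assumes S_w is nonsingular."""
    evals, evecs = eigh(Sb, Sw)            # generalized symmetric eigenproblem
    order = np.argsort(evals)[::-1]        # decreasing generalized eigenvalues
    return evecs[:, order[:k - 1]]         # m x (k-1) transformation G_LDA
```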
3.2 Orthogonal Centroid Method (OCM)
Orthogonal centroid method (OCM) [10] focuses only on maximizing trace(GT Sb G) under the constraint of GT G = I. The criterion
of OCM is shown as
J_b(G) = \max_{G^T G = I} trace(G^T S_b G).   (17)
It is known that for any G ∈ Rm×l where m > l such that GT G =
I,
trace(GT Sb G) ≤ trace(Sb ),
(18)
which means the cluster structure quality measured by trace(Sb )
cannot be increased after dimension reduction. The solution of Eq.
(17) can be obtained by setting the columns of G as the leading
eigenvectors of Sb . Since Sb has at most k − 1 nonzero eigenvalues,
the upper bound of trace(GT Sb G) in Eq. (18) can be achieved for
any l such that l ≥ k − 1, i.e.,
trace(GT Sb G) = trace(Sb ) for l ≥ k − 1.
(19)
Eq. (19) indicates trace(Sb ) is preserved between the original and
the reduced dimensional spaces.
An advantage of OCM is that it achieves the upper bound of trace(G^T S_b G) more efficiently by using a QR decomposition, avoiding the eigendecomposition. The algorithm of OCM is as follows. First, the centroid matrix C is formed so that each column of C is a cluster centroid vector, i.e., C = [c^{(1)}  c^{(2)}  ···  c^{(k)}]. Then the reduced QR decomposition [5] of C is computed as C = Q_k R, where Q_k ∈ R^{m×k} with Q_k^T Q_k = I and R ∈ R^{k×k} is upper triangular. The solution of OCM, G_{OCM}, is found as
G_{OCM} = Q_k.
Note that the columns of G_{OCM} are orthonormal basis vectors for the subspace spanned by the centroids, and l = k in this case. Finally, OCM achieves
trace(G_{OCM}^T S_b G_{OCM}) = trace(S_b),   where l = k.
By using the equivalence between the two expressions for S_b in Eq. (3), one can show that each pairwise distance between cluster centroids is also preserved in the reduced dimensional space obtained by OCM.
Another important property of OCM is that, by projecting the data onto the subspace spanned by the centroids, the order of the similarities between any particular point and the centroids is preserved in terms of both the Euclidean norm and the cosine similarity measure [10, 7]. In other words, for any vector q ∈ R^{m×1} and cluster centroids c^{(i)} and c^{(j)}, we have
\|q − c^{(i)}\|_2 < \|q − c^{(j)}\|_2  ⇒  \|G_{OCM}^T q − G_{OCM}^T c^{(i)}\|_2 < \|G_{OCM}^T q − G_{OCM}^T c^{(j)}\|_2,   and
\frac{q^T c^{(i)}}{\|q\|_2 \|c^{(i)}\|_2} < \frac{q^T c^{(j)}}{\|q\|_2 \|c^{(j)}\|_2}  ⇒  \frac{(G_{OCM}^T q)^T (G_{OCM}^T c^{(i)})}{\|G_{OCM}^T q\|_2 \|G_{OCM}^T c^{(i)}\|_2} < \frac{(G_{OCM}^T q)^T (G_{OCM}^T c^{(j)})}{\|G_{OCM}^T q\|_2 \|G_{OCM}^T c^{(j)}\|_2}.
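A sketch of the OCM algorithm described above (illustrative only; the centroid matrix is built column by column as in the text):

```python
import numpy as np

def ocm_transform(A, labels):
    """G_OCM = Q_k from the reduced QR decomposition of the centroid
    matrix C = [c^(1) ... c^(k)], one centroid per column."""
    labels = np.asarray(labels)
    C = np.column_stack([A[:, labels == i].mean(axis=1)
                         for i in np.unique(labels)])
    Q, _ = np.linalg.qr(C)      # reduced QR: Q is m x k with orthonormal columns
    return Q
```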
3.3 Principal Component Analysis (PCA)
PCA is a well-known dimension reduction method that captures the
maximal variance in the data. The criterion of PCA can be written
as
J_t(G) = \max_{G^T G = I} trace(G^T S_t G).   (20)
For any G ∈ Rm×l where m > l such that GT G = I, we have
trace(GT St G) ≤ trace(St ),
(21)
which means trace(St ) cannot be increased after dimension reduction. The solution of Eq. (20), where we denote it as GPCA , can
be obtained by setting the columns of G as the leading eigenvectors of St . Since the rank of St is at most min(m, n), PCA achieves
the upper bound of trace(GT St G) in Eq. (21) for any l such that
l ≥ min(m, n), i.e.,
trace(GTPCA St GPCA ) = trace(St ) for l ≥ min(m, n).
Table 1: Comparison of dimension reduction methods (G^T : x ∈ R^{m×1} → y ∈ R^{l×1}). It is assumed that S_b and S_t are full rank.
• LDA. Optimization criterion: J_{b/w}(G) = \max trace((G^T S_w G)^{-1} (G^T S_b G)); algorithm: generalized eigendecomposition; smallest dimension achieving the criterion upper bound: k − 1.
• OCM. Optimization criterion: J_b(G) = \max_{G^T G = I} trace(G^T S_b G); algorithm: QR decomposition; smallest dimension achieving the criterion upper bound: k.
• PCA. Optimization criterion: J_t(G) = \max_{G^T G = I} trace(G^T S_t G); algorithm: symmetric eigendecomposition; smallest dimension achieving the criterion upper bound: min(m, n).

In many applications of PCA, however, l is usually chosen as a fixed value less than the rank of S_t for the purpose of dimension reduction or noise reduction. The noisy subspace corresponds to the eigenvectors of S_t with the smallest eigenvalues, and it is removed by PCA for a better representation of the data.
Although S_t is related to S_b and S_w as in Eq. (5), S_t by itself does not contain any information on cluster labels. That is, unlike LDA and OCM, PCA ignores the cluster structure represented by S_b and/or S_w, which is why PCA is considered an unsupervised dimension reduction method.
Usually, PCA assumes that the global centroid is zero, which is achieved by subtracting the empirical mean of the data from each data vector. The centered data can be represented as A − ce^T, where e is an n-dimensional vector whose components are all 1's.
PCA has the unique property that, given a fixed l, it produces the best reduced dimensional representation in the sense of minimizing the difference between the centered matrix A − ce^T and its projection GG^T(A − ce^T) onto the reduced dimensional space, where G has orthonormal columns, i.e.,
G_{PCA} = \arg\min_{G, G^T G = I_l} \|GG^T(A − ce^T) − (A − ce^T)\|,
where the matrix norm \|·\| is either the Frobenius norm or the Euclidean norm.
The three discussed methods are summarized in Table 1.
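For completeness, a small illustrative sketch of PCA as used here (centering by the global centroid and keeping the l leading eigenvectors of S_t; an SVD of the centered matrix would give the same subspace):

```python
import numpy as np

def pca_transform(A, l):
    """G_PCA: the l leading eigenvectors of S_t computed from the
    centered data A - c e^T, where c is the global centroid."""
    Ac = A - A.mean(axis=1, keepdims=True)       # centered data, A - c e^T
    St = Ac @ Ac.T                               # total scatter matrix
    evals, evecs = np.linalg.eigh(St)
    return evecs[:, np.argsort(evals)[::-1][:l]]
```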
4 FORMULATION OF TWO-STAGE FRAMEWORK FOR VISUALIZATION
Suppose we want to find a dimension reducing linear transformation V^T ∈ R^{2×m} that maps an m-dimensional data vector x to a vector z in a 2-dimensional space (m ≫ 2):
V^T : x ∈ R^{m×1} → z = V^T x ∈ R^{2×1}.   (22)
Further assume that it is composed of two stages of dimension reduction as follows. In the first stage, a dimension reducing linear transformation G^T ∈ R^{l×m} maps an m-dimensional data vector x to a vector y in the l-dimensional space (l ≪ m):
G^T : x ∈ R^{m×1} → y = G^T x ∈ R^{l×1},   (23)
where l is fixed as the minimum optimal dimension under the first-stage criterion. When l ≤ 2, we have no further dimension reduction to do after the first stage. However, the optimal l in many methods and for many data sets is larger than 2, and so we assume that l > 2.
In the second stage, another dimension reducing linear transformation H^T ∈ R^{2×l} maps the l-dimensional data vector y to a vector z in the 2-dimensional space (l > 2):
H^T : y ∈ R^{l×1} → z = H^T y ∈ R^{2×1}.   (24)
Such consecutive dimension reductions performed by G^T followed by H^T can be combined, resulting in a single dimension reducing transformation V^T as
V^T = H^T G^T.   (25)
In the next section, the discussion focuses on various ways of choosing the first-stage dimension reducing transformation G and the second-stage dimension reducing transformation H, with the purpose of constructing the combined dimension reducing transformation V^T = H^T G^T for 2-dimensional visualization according to various optimization criteria.
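The composition in Eq. (25) is a single matrix product. As a minimal sketch (assuming G and H come from any of the methods discussed in the next section):

```python
import numpy as np

def two_stage_projection(A, G, H):
    """Compose a first-stage reduction G (m x l) and a second-stage
    reduction H (l x 2) into V = G H (so V^T = H^T G^T, Eq. (25)),
    and return the 2D coordinates Z = V^T A of the data matrix A."""
    V = G @ H
    Z = V.T @ A            # 2 x n coordinates for the scatter plot
    return V, Z
```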
5 TWO-STAGE METHODS FOR 2D VISUALIZATION
All the proposed two-stage methods start from one of the supervised dimension reduction methods, such as LDA or OCM, that are designed for clustered data. In the first stage (by G^T ∈ R^{l×m} in Eq. (23)), the dimension is reduced by LDA or OCM to the smallest dimension that satisfies Eq. (16) or (19), respectively. Therefore, in the first stage, the cluster structure quality measured either by trace(S_w^{-1} S_b) or by trace(S_b) is preserved. Then we perform the second-stage dimension reduction (by H^T ∈ R^{2×l} in Eq. (24)) that minimizes the loss of information, either by applying the same criterion used in the first stage or by using J_t in Eq. (20), i.e., that of PCA. As seen in Section 3.3, Eq. (20) gives the best approximation of the first-stage results in the sense of minimizing the difference in terms of the Frobenius/Euclidean norm.
In what follows, we describe each of the two-stage methods in detail, and derive their equivalent single-stage methods (by V^T ∈ R^{2×m} in Eq. (22)) in case they exist.

5.1 Rank-2 LDA
In this method, LDA is applied in the first stage, and trace(S_w^{-1} S_b) is preserved in the l-dimensional space where l = k − 1. In the second stage, the same criterion J_{b/w}(H) is used to reduce the l-dimensional first-stage results to 2-dimensional data.
The criterion of the second-stage dimension reducing matrix H can be formulated as
H_{b/w} = \arg\max_{H ∈ R^{l×2}} trace((H^T (G_{LDA}^T S_w G_{LDA}) H)^{-1} (H^T (G_{LDA}^T S_b G_{LDA}) H)).   (26)
Assuming the columns of G_{LDA}, which are the generalized eigenvectors of Eq. (15), are in decreasing order of their corresponding generalized eigenvalues, i.e., G_{LDA} = [u_1  u_2  ···  u_{k−1}] where λ_1 ≥ λ_2 ≥ ··· ≥ λ_{k−1}, the solution of Eq. (26) is
H_{b/w} = [e_1  e_2],
where e_1 and e_2 are the first and the second standard unit vectors, i.e., e_1 = [1  0  ···  0]^T ∈ R^{l×1} and e_2 = [0  1  0  ···  0]^T ∈ R^{l×1}. This solution is equivalent to choosing the two dimensions with the most leading generalized eigenvalues from the first-stage result, and the resulting two-stage method can be represented as a single-stage dimension reduction method by V ∈ R^{m×2} which directly maximizes J_{b/w}, i.e.,
V_{b/w} = \arg\max_{V ∈ R^{m×2}} J_{b/w}(V) = \arg\max_{V ∈ R^{m×2}} trace((V^T S_w V)^{-1} (V^T S_b V)).   (27)
The solution of Eq. (27) becomes
V_{b/w} = G_{LDA} H_{b/w} = [u_1  u_2],
where u_1 and u_2 are the leading generalized eigenvectors of Eq. (15). This solution is also known as reduced-rank linear discriminant analysis [6].
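In code, the second stage of rank-2 LDA amounts to selecting the first two columns of the first-stage result; a hedged sketch, reusing a G_LDA whose columns are sorted by decreasing generalized eigenvalue as assumed above:

```python
import numpy as np

def rank2_lda_from_first_stage(G_lda, A):
    """Rank-2 LDA: with H_{b/w} = [e_1 e_2], the combined transformation
    V_{b/w} = G_LDA H_{b/w} of Eq. (27) is just the first two columns of
    G_LDA (the leading generalized eigenvectors u_1, u_2 of Eq. (15))."""
    V_bw = G_lda[:, :2]
    return V_bw, V_bw.T @ A     # transformation and 2D coordinates
```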
5.2 LDA followed by PCA
In this method, LDA is applied in the first stage, and trace(S_w^{-1} S_b) is preserved in the l-dimensional space where l = k − 1. In the second stage, PCA is applied in order to obtain the best approximation of the l-dimensional first-stage results in terms of the Frobenius/Euclidean norm.
The second-stage dimension reducing matrix H is obtained by solving
H_t = \arg\max_{H ∈ R^{l×2}, H^T H = I} trace(H^T (G_{LDA}^T S_t G_{LDA}) H),   (28)
where the solution is the two leading eigenvectors of the total scatter matrix of the first-stage result, G_{LDA}^T S_t G_{LDA}.
From Eq. (5), we have
G_{LDA}^T S_t G_{LDA} = G_{LDA}^T (S_b + S_w) G_{LDA}.   (29)
Since LDA conceptually maximizes trace(G^T S_b G) and minimizes trace(G^T S_w G), the result is expected to satisfy
trace(G_{LDA}^T S_b G_{LDA}) ≫ trace(G_{LDA}^T S_w G_{LDA}),
which means that G_{LDA}^T S_t G_{LDA} is dominated by G_{LDA}^T S_b G_{LDA}, i.e.,
G_{LDA}^T (S_b + S_w) G_{LDA} ≃ G_{LDA}^T S_b G_{LDA}.
In this case, the principal axes that PCA gives in the second stage better reflect those of the between-cluster matrix of the first-stage result, G_{LDA}^T S_b G_{LDA}, and they may in turn discriminate the clusters clearly in the 2-dimensional space. In this sense, LDA followed by PCA achieves a clear cluster structure as well as a good approximation of the first-stage result.
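A sketch of LDA followed by PCA, under the same assumptions as the earlier sketches (columns of A are data items, G_lda is the m x (k-1) first-stage transformation); this illustrates Eq. (28) and is not the authors' code:

```python
import numpy as np

def lda_plus_pca(A, G_lda):
    """Second-stage PCA (Eq. (28)) applied to the first-stage LDA output.
    Returns the combined m x 2 transformation V = G_lda @ H_t, where H_t
    holds the two leading eigenvectors of G_lda^T S_t G_lda."""
    Y = G_lda.T @ A                          # first-stage result, (k-1) x n
    Yc = Y - Y.mean(axis=1, keepdims=True)   # center by the global centroid
    St_red = Yc @ Yc.T                       # total scatter of the first-stage result
    evals, evecs = np.linalg.eigh(St_red)
    H_t = evecs[:, np.argsort(evals)[::-1][:2]]
    return G_lda @ H_t
```

OCM followed by PCA (Section 5.3) can be sketched in the same way by passing G_OCM in place of G_lda.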
5.3 OCM followed by PCA
In this method, OCM is applied in the first stage, and trace(S_b) is preserved in the l-dimensional space where l = k. In the second stage, PCA is applied in order to obtain the best approximation of the l-dimensional first-stage results in terms of the Frobenius/Euclidean norm.
As in Section 5.2, the second-stage dimension reducing matrix H is obtained by solving
H_t = \arg\max_{H ∈ R^{l×2}, H^T H = I} trace(H^T (G_{OCM}^T S_t G_{OCM}) H),   (30)
where the solution is the two leading eigenvectors of the total scatter matrix of the first-stage result, G_{OCM}^T S_t G_{OCM}.
From Eq. (5), we have
G_{OCM}^T S_t G_{OCM} = G_{OCM}^T (S_b + S_w) G_{OCM}.   (31)
Unlike LDA, OCM does not minimize trace(G^T S_w G), as shown in Eq. (17). Therefore the following may not be the case:
trace(G_{OCM}^T S_b G_{OCM}) ≫ trace(G_{OCM}^T S_w G_{OCM}),
which means that G_{OCM}^T S_b G_{OCM} does not necessarily dominate G_{OCM}^T S_t G_{OCM}. Then the two principal axes of G_{OCM}^T S_t G_{OCM} obtained by PCA in the second stage tend to fail to reflect those of G_{OCM}^T S_b G_{OCM}, which may rather scatter the data points within each cluster, eventually preventing the visualization results from showing a clear cluster structure.
5.4 Rank-2 PCA on Sb
In this method, OCM is applied in the first stage, and trace(S_b) is preserved in the l-dimensional space where l = k. In the second stage, the same criterion J_b(H) is used to reduce the l-dimensional first-stage results to 2-dimensional data.
The second-stage dimension reducing matrix H is obtained by solving
H_b = \arg\max_{H ∈ R^{l×2}, H^T H = I} trace(H^T (G_{OCM}^T S_b G_{OCM}) H),   (32)
where the solution is the two leading eigenvectors of the between-cluster scatter matrix of the first-stage result, G_{OCM}^T S_b G_{OCM}. The columns of G_{OCM} form a basis of the subspace spanned by the centroids, and this subspace includes the range space of S_b. Accordingly, one can easily show that each eigenvector u_i^Y ∈ R^{l×1} of G_{OCM}^T S_b G_{OCM} is related to the corresponding eigenvector u_i ∈ R^{m×1} of S_b as
u_i^Y = G_{OCM}^T u_i,
with their corresponding eigenvalues matched as well, i.e., λ_i^Y = λ_i. Hence, the solution of Eq. (32) can be written as
H_b = [u_1^Y  u_2^Y] = G_{OCM}^T [u_1  u_2].   (33)
Using Eq. (33) and the relationship shown in Eq. (25), the single-stage dimension reducing transformation V_b can be built as
V_b^T = H_b^T G_{OCM}^T = [u_1  u_2]^T G_{OCM} G_{OCM}^T = [u_1  u_2]^T,   (34)
and therefore
V_b = \arg\max_{V ∈ R^{m×2}} J_b(V) = \arg\max_{V ∈ R^{m×2}} trace(V^T S_b V).   (35)
Eq. (34) holds since the eigenvectors u_1 and u_2 of S_b are in the range space of G_{OCM}. The criterion of Eq. (35) has been used in one of the successful visual analytic systems, IN-SPIRE, for the 2D representation of document data [17].
The discussed two-stage methods are summarized in Table 2.

Table 2: Summary of the optimization criteria of the two-stage dimension reduction methods (stage 1: G^T : x ∈ R^{m×1} → y ∈ R^{l×1}; stage 2: H^T : y ∈ R^{l×1} → z ∈ R^{2×1}).
• Rank-2 LDA. Stage 1 (preservation): trace((G^T S_w G)^{-1} (G^T S_b G)) = trace(S_w^{-1} S_b); Stage 2 (maximization): trace((H^T (G^T S_w G) H)^{-1} (H^T (G^T S_b G) H)).
• LDA + PCA. Stage 1 (preservation): trace((G^T S_w G)^{-1} (G^T S_b G)) = trace(S_w^{-1} S_b); Stage 2 (maximization): \max_{H^T H = I} trace(H^T (G^T S_t G) H).
• OCM + PCA. Stage 1 (preservation): trace(G^T S_b G) = trace(S_b); Stage 2 (maximization): \max_{H^T H = I} trace(H^T (G^T S_t G) H).
• Rank-2 PCA on S_b. Stage 1 (preservation): trace(G^T S_b G) = trace(S_b); Stage 2 (maximization): \max_{H^T H = I} trace(H^T (G^T S_b G) H).
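Because of the single-stage equivalence in Eq. (35), Rank-2 PCA on S_b can also be sketched without forming the OCM stage explicitly (illustrative only; in practice the k x k matrix G_OCM^T S_b G_OCM of Eq. (32) is cheaper to eigendecompose when k ≪ m):

```python
import numpy as np

def rank2_pca_on_sb(Sb):
    """V_b = [u_1 u_2]: the two leading eigenvectors of the (symmetric
    positive semidefinite) between-cluster scatter matrix S_b, Eq. (35)."""
    evals, evecs = np.linalg.eigh(Sb)
    return evecs[:, np.argsort(evals)[::-1][:2]]
```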
6 EXPERIMENTS
In this section, we present visualization results using the proposed
methods for several data sets, especially focusing on undersampled text data visualization where the data item is represented in
m-dimensional space and the number of the data items n is less
than m (m > n).
Table 3: Description of data sets.
• GAUSSIAN: original dimension m = 1100, number of data items n = 1000, number of clusters k = 10.
• MEDLINE: m = 22095, n = 500, k = 5.
• NEWSGROUPS: m = 16702, n = 770, k = 11.
• REUTERS: m = 3907, n = 800, k = 10.
6.1 Regularization on LDA for undersampled data
In undersampled cases, the LDA criterion shown in Eq. (13) cannot be applied directly because Sw is singular. In order to overcome this singularity problem, Howland et al. proposed a universal algorithmic framework of LDA using the generalized singular
value decomposition (LDA/GSVD) [8, 7]. Specifically, for the case
when m ≫ n ≫ k, which is usual for most undersampled problems, LDA/GSVD gives the solution for G such that GT Sw G = 0
while maintaining the maximum value of trace(G^T S_b G). This solution makes sense since the LDA criterion is formulated to minimize trace(G^T S_w G). However, it means that all of the data points belonging to a specific cluster are represented as a single point in the reduced dimensional space, which lessens the generalization ability for classification as well as the ability to visualize the individual data relationships within each cluster.
On the other hand, the fact that LDA makes G^T S_w G = 0 can be viewed as an advantage for visualization purposes, since LDA has the capability to fully minimize trace(G^T S_w G). By means of regularization on S_w, one can control trace(G^T S_w G), which determines the scatter of the data points within each cluster. In regularized LDA, which was originally proposed to avoid the singularity of S_w in the classification context, S_w is replaced by a nonsingular matrix S_w + γI,
where I is an identity matrix, and γ is a control parameter. In general, as γ is increased, the within-cluster distance, trace(GT Sw G),
also becomes larger with data points being more scattered around
their corresponding centroids. As γ is decreased, the within-cluster
distance becomes smaller, and the data points gather closer around
their centroids. Such manipulation of γ can be exploited in a visualization context because one can choose an appropriate value of
γ so that the second-stage method such as PCA, which maximizes
trace(GT St G) = trace(GT Sb G + GT Sw G), does not focus too much
on trace(GT Sw G). The results that follow are based on such choices
of γ .
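A sketch of the regularization just described: replace S_w by S_w + γI in the generalized eigenproblem of Eq. (15). This is illustrative only; for very high-dimensional text data one would work with factored forms or LDA/GSVD rather than explicit m x m matrices.

```python
import numpy as np
from scipy.linalg import eigh

def regularized_lda(Sw, Sb, gamma, l):
    """Regularized LDA: solve S_b u = lambda (S_w + gamma I) u and keep
    the l leading generalized eigenvectors. Larger gamma increases
    trace(G^T S_w G), spreading each cluster around its centroid;
    smaller gamma draws the points closer to their centroids."""
    m = Sw.shape[0]
    evals, evecs = eigh(Sb, Sw + gamma * np.eye(m))
    return evecs[:, np.argsort(evals)[::-1][:l]]
```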
6.2 Data Sets
The data sets tested are composed of one artificially-generated
Gaussian-mixture dataset (GAUSSIAN) and three real-world text
data sets (MEDLINE, NEWSGROUPS, and REUTERS) that are
clustered based on their topics. All the text documents are encoded as term-document matrices, where each dimension corresponds to a particular word and the value of a dimension is the frequency with which the corresponding word appears in the document. Each data set is set up to have an equal number of data items per cluster and a mean of zero, which is attained by subtracting the global mean (see Section 6.3).
The descriptions of data sets, which are also summarized in Table 3, are as follows.
The GAUSSIAN data set is a randomly generated Gaussian mixture with 10 clusters. Each cluster is made up of 100 data vectors,
which add up to 1000 in total, and the dimension is set to 1100,
which is slightly more than the number of the data items. In its visualization shown in Fig. (1), the clusters are labeled using letters
as
• ’a’, ’b’, . . . , and ’j’.
The MEDLINE data set is a document corpus related to medical science from the National Institutes of Health (available at http://www.cc.gatech.edu/~hpark/data.html). The original dimension is 22095, and the number of clusters is 5, where each cluster has 100 documents. The cluster labels that correspond to the document topics are shown as
• heart attack (’h’), colon cancer (’c’), diabetes (’d’), oral cancer
(’o’), and tooth decay (’t’),
where the letters in parentheses are used in the visualization shown
in Fig. (2).
The NEWSGROUPS data set [1] is a collection of newsgroup
documents, and originally composed of 20 topics. However, we
have chosen 11 topics for visualization, and each cluster is set to
have 70 documents. The original dimension is 16702, and the cluster labels are shown as
• comp.sys.ibm.pc.hardware (’p’), comp.sys.mac.hardware (’a’), misc.forsale (’f’), rec.sport.baseball (’b’), sci.crypt (’y’), sci.electronics (’e’), sci.med (’d’), soc.religion.christian (’c’), talk.politics.guns (’g’), talk.politics.misc (’p’), and talk.religion.misc (’r’),
where the letters in parentheses are used in the visualization shown
in Fig. (3).
The REUTERS data set [1] is the document collection that appeared in the Reuters newswire in 1987, and originally composed
of hundreds of topics. Among them, 10 topics related to economic
subjects are chosen for visualization, and each cluster has 80 documents. The original dimension is 3907, and the cluster labels are
shown as
• earn (’e’), acquisitions (’a’), money-fx (’m’), grain (’g’), crude (’r’),
trade (’t’), interest (’i’), ship (’s’), wheat (’w’), and corn (’c’),
where the letters in parentheses are used in the visualization shown
in Fig. (4).
6.3 Effects of Data Centering
Fig. 5 shows an example of applying OCM+PCA to the MEDLINE data set with and without data centering. Once the MEDLINE data set is encoded as a term-document matrix, every component is non-negative, which results in a global centroid that is significantly far from the origin. Performing PCA without data centering may then give a first principal axis that reflects the global centroid rather than one that discriminates the clusters. If we consider projecting the data onto each of the horizontal and vertical axes in Fig. 5, the former, which corresponds to the first principal axis, does not help in showing the cluster structure clearly, and only the vertical axis, which corresponds to the second principal axis from PCA, discriminates the clusters. We have found that such undesirable behavior is common without data centering, which is why we assume that the data are centered throughout this paper. Accordingly, all the results shown in Figs. 1-4 are obtained after data centering.
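The centering used throughout is a single subtraction of the global centroid before the scatter matrices (or the OCM/PCA stages) are computed; a one-line sketch under the same column-wise data layout assumed earlier:

```python
import numpy as np

def center_columns(A):
    """Return A - c e^T: subtract the global centroid from every column,
    so that the first principal axis reflects cluster structure rather
    than the offset of the centroid from the origin (cf. Fig. 5)."""
    return A - A.mean(axis=1, keepdims=True)
```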
6.4 Comparison of Visualization Results
The results of the four two-stage methods for the tested data sets are shown in Figs. 1-4. (In the electronic version of this paper, these figures can be arbitrarily magnified without losing resolution.)
In all cases, LDA-based methods show cluster structures more clearly than OCM-based methods. This demonstrates the effectiveness of LDA, which considers both within- and between-cluster measures, while OCM takes into account only the latter. Due to this difference, OCM generally produces a widely scattered data representation within each cluster. As a result, in the NEWSGROUPS data set, such a wide within-cluster variance significantly deteriorates the
cluster structure visualization even if OCM still attempts to maximize the between-cluster distance.
In the MEDLINE and the REUTERS data sets, all of the four
methods produce relatively similar results. However, we have
controlled the within-cluster variance in LDA-based methods using the regularization term γ I. In addition, the fact that rank-2
LDA and LDA+PCA produce almost identical results indicates that
GTLDA St GLDA is dominated by GTLDA Sb GLDA after LDA is applied in
the first stage as we expected.
Rank-2 LDA represents each cluster most compactly by minimizing the within-cluster radii both in the first and the second stage.
However, it may reduce the between-cluster distances as well because Jb/w maximizes the conceptual ratio of two scatter measures.
As can be seen in the two LDA-based methods applied to the NEWSGROUPS data set, while rank-2 LDA minimizes the within-cluster radii, it also places the centroids closer to each other as compared to those in LDA+PCA. Due to this effect, which one is preferable between rank-2 LDA and LDA+PCA depends on the data set to be visualized.
Overall, OCM+PCA and Rank-2 PCA on S_b show similar results. This suggests that G^T S_b G ≃ G^T S_t G, since the difference between the two methods lies only in whether PCA is applied to G^T S_b G or to G^T S_t G in the second stage. Since performing PCA on G^T S_b G is computationally more efficient than PCA on G^T S_t G, Rank-2 PCA on S_b can be
a good alternative to OCM+PCA in case efficient computation is
important.
Finally, these visualization results reveal interesting cluster relationships underlying the data. In Fig. (2), the clusters
for colon cancer (’c’) and oral cancer (’o’) are shown close to
each other. In Fig. (3), the clusters of soc.religion.christian (’c’)
and talk.religion.misc (’r’), those of comp.sys.ibm.pc.hardware
(’p’) and comp.sys.mac.hardware (’a’), and those of sci.crypt (’y’)
and sci.med (’d’) are closely located respectively in LDA-based
methods. In addition, the two clusters, misc.forsale (’f’) and
rec.sport.baseball (’b’), are shown to be the most distinctive, which
makes sense because those topics are quite irrelevant to the others.
In Fig. (4), the clusters of grain (’g’), wheat (’w’), and corn (’c’)
as well as those of money-fx (’m’) and interest (’i’) are visualized
very close.
7 CONCLUSION AND FUTURE WORK
According to our results, LDA-based methods are shown to be superior to OCM-based methods, since both within- and between-cluster relationships are taken into account in LDA. In particular, combined with PCA in the second stage, LDA+PCA achieves a clear discrimination between clusters as well as the best approximation of the results of LDA when the distance between data items is measured in terms of the Frobenius/Euclidean norm.
However, many classes, except for a few that are clearly unrelated, tend to overlap, especially when dealing with large numbers of data points and clusters. This is inherently due to the nature of the second-stage dimension reduction, in which only two axes are chosen so that the classes that contribute most to the second-stage criteria can be well discriminated. Such behavior can exaggerate the distances between particular clusters, and further work toward new criteria that better fit visualization is required.
In the MEDLINE and the REUTERS data sets, the visualization results seem to have a tail shape along specific directions. We often found this phenomenon to occur in many other data sets as well. It is still unclear what causes this and how it affects the visualization, e.g., the characteristics of information loss in the second stage. Finally, in order
to determine how much loss of information is introduced by each
method, more rigorous analysis based on various quantitative measures such as pairwise between-cluster distance and within-cluster
radii should be conducted.
ACKNOWLEDGEMENTS
The work of these authors was supported in part by the National
Science Foundation grants CCF-0732318 and CCF-0808863. Any
opinions, findings and conclusions or recommendations expressed
in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
REFERENCES
[1] A. Asuncion and D. Newman. UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences, 2007.
[2] T. F. Cox and M. A. A. Cox. Multidimensional Scaling. Chapman &
Hall, London, 1994.
[3] I. S. Dhillon, D. S. Modha, and W. S. Spangler. Class visualization of
high-dimensional data with applications. Computational Statistics &
Data Analysis, 41(1):59 – 90, 2002.
[4] K. Fukunaga. Introduction to Statistical Pattern Recognition, second
edition. Academic Press, Boston, 1990.
[5] G. H. Golub and C. F. van Loan. Matrix Computations, third edition.
Johns Hopkins University Press, Baltimore, 1996.
[6] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[7] P. Howland, M. Jeon, and H. Park. Structure preserving dimension reduction for clustered text data based on the generalized singular value
decomposition. SIAM Journal on Matrix Analysis and Applications,
25(1):165–179, 2003.
[8] P. Howland and H. Park. Generalizing discriminant analysis using
the generalized singular value decomposition. Pattern Analysis and
Machine Intelligence, IEEE Transactions on, 26(8):995–1006, Aug.
2004.
[9] A. Jain and R. Dubes. Algorithms for clustering data. Prentice-Hall,
Inc. Upper Saddle River, NJ, USA, 1988.
[10] M. Jeon, H. Park, and J. B. Rosen. Dimensional reduction based on
centroids and least squares for efficient processing of text data. In
Proceedings of the First SIAM International Workshop on Text Mining.
Chicago, IL, 2001.
[11] I. Jolliffe. Principal component analysis. Springer, 2002.
[12] T. Kohonen. Self-organizing maps. Springer, 2001.
[13] Y. Koren and L. Carmel. Visualization of labeled data using linear
transformations. In Information Visualization, 2003. INFOVIS 2003.
IEEE Symposium on, pages 23–30, Oct. 2003.
[14] S. T. Roweis and L. K. Saul. Nonlinear Dimensionality Reduction by
Locally Linear Embedding. Science, 290(5500):2323–2326, 2000.
[15] D. L. Swets and J. J. Weng. Using discriminant eigenfeatures for
image retrieval. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 18(8):831–836, 1996.
[16] J. B. Tenenbaum, V. d. Silva, and J. C. Langford. A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science,
290(5500):2319–2323, 2000.
[17] J. A. Wise. The ecological approach to text visualization. Journal
of the American Society for Information Science, 50(13):1224–1233,
1999.
[18] J. Ye and Q. Li. A two-stage linear discriminant analysis via QR-decomposition. Pattern Analysis and Machine Intelligence, IEEE
Transactions on, 27(6):929–941, June 2005.
[19] H. Yu and J. Yang. A direct LDA algorithm for high-dimensional data
with application to face recognition. Pattern Recognition, 34:2067–
2070, 2001.
[20] X. Zhang, C. Myers, and S. Kung. Cross-weighted Fisher discriminant analysis for visualization of DNA microarray data. In Acoustics, Speech,
and Signal Processing, 2004. Proceedings. (ICASSP’04). IEEE International Conference on, volume 5, pages V–589–92 vol.5, May 2004.
[21] W. Zhao, R. Chellappa, and A. Krishnaswamy. Discriminant analysis
of principal components for face recognition. Automatic Face and
Gesture Recognition, IEEE International Conference on, 0:336, 1998.
Figure 1: Comparison of the two-stage methods in the GAUSSIAN data set. Panels: (a) Rank-2 LDA, (b) LDA+PCA, (c) OCM+PCA, (d) Rank-2 PCA on Sb.

Figure 2: Comparison of the two-stage methods in the MEDLINE data set. Panels: (a) Rank-2 LDA, (b) LDA+PCA, (c) OCM+PCA, (d) Rank-2 PCA on Sb.

Figure 3: Comparison of the two-stage methods in the NEWSGROUPS data set. Panels: (a) Rank-2 LDA, (b) LDA+PCA, (c) OCM+PCA, (d) Rank-2 PCA on Sb.

Figure 4: Comparison of the two-stage methods in the REUTERS data set. Panels: (a) Rank-2 LDA, (b) LDA+PCA, (c) OCM+PCA, (d) Rank-2 PCA on Sb.

Figure 5: Example of effects of data centering in the MEDLINE data set. Panels: (a) OCM+PCA with data centering, (b) OCM+PCA without data centering.