A Comparison of Clustering, Biclustering and Hierarchical
Biclustering Algorithms

Sejun Kim
Department of Electrical and Computer Engineering
Missouri University of Science and Technology, Rolla, MO 65409
Abstract—Biclustering has proven to be a more powerful
method than conventional clustering algorithms for analyzing
high-dimensional data, such as gene microarray samples. It
involves finding a partition of the vectors and a subset of the
dimensions such that the correlations among the biclusters are
determined and automatically associated. Thus, it can be considered an unsupervised version of heteroassociative learning.
Biclustering ARTMAP (BARTMAP) is a recently introduced
algorithm that enables high-quality clustering by modifying
the ARTMAP structure, and it outperforms previous biclustering approaches. Hierarchical BARTMAP (HBARTMAP),
introduced here, offers a biclustering solution to problems in
which the degree of attribute-sample association varies. We
also have developed a hierarchical version of Interrelated Two-Way Clustering for comparison purposes and have compared
these results with other methods, including various clustering
algorithms. Experimental results on multiple genetic datasets
reveal that HBARTMAP can offer in-depth interpretation of
microarrays, which other conventional biclustering or clustering
algorithms cannot achieve. Thus, this paper contributes two hierarchical extensions of biclustering or co-clustering algorithms
and comparatively analyzes their performance in the context
of microarray data.
Index Terms—Adaptive Resonance Theory (ART), ARTMAP,
Hierarchical clustering, Biclustering, BARTMAP
I. INTRODUCTION
Clustering is a common data-mining technique used to
obtain information from raw data sets. However, major
challenges arise when large numbers of samples must be
analyzed, and these challenges escalate as techniques improve and the speed of data acquisition continues to increase,
especially regarding the ability to gather high-dimensional
data [1], such as gene expressions. The curse of dimensionality renders the conventional clustering of high-dimensional
data infeasible [2]–[5]. The two critical traits of bioinformatics data are noise and high dimensionality, both of which
diminish the robustness of clustering results [6]. Thus, biclustering was introduced to overcome computational obstacles
and provide higher quality analyses [7]–[13]. This approach
finds subsets of samples correlated to subsets of attributes.
Due to the simultaneous row and column decomposition of
the data matrix, biclustering, unlike clustering, can generate
various correlated segments within a matrix.
Sejun Kim is with the Applied Computational Intelligence Laboratory,
Department of Electrical & Computer Engineering, Missouri University of
Science & Technology, Rolla, MO 65409 (phone: 573-341-6811; fax: 573-341-4532; e-mail: skgcf@mst.edu).
D. C. Wunsch II is with the Department of Electrical & Computer
Engineering, Missouri University of Science & Technology, Rolla, MO
65409 (e-mail: dwunsch@mst.edu).
The amount of biological data being produced is increasing at a significant rate [14]–[16]. For instance, since the
publication of the H. influenzae genome [17], complete
sequences for over 40 organisms have been released, ranging
from 450 genes to over 100,000 genes. Given this data,
one can imagine the enormous quantity and variety of
information being generated in gene expression research. The
surge in data has resulted in the indispensability of computers
in biological research. Data sets, such as earth science data
and stock market measures, are collected at a rapid rate [18],
[19], as are microarray gene expression data in bioinformatics. The
discovery of biclusters has allowed sets with coherent values
to be searched across a subset of transactions or examples.
An important example of the utility of biclustering is the
discovery of transcription modules from microarray data,
which denote groups of genes that show coherent activity
only across a subset of all conditions constituting the data set,
and may reveal important information about the regulatory
mechanisms operating in a cell [20].
Neural networks have played a major role in data mining and clustering [21]–[24]. Adaptive Resonance Theory
(ART) [25] is one of the most well-known neural network-based clustering algorithms. The ARTMAP architecture is
a neural network for supervised learning composed of two
ART modules and an inter-ART module. Xu and Wunsch revised the ARTMAP architecture to develop Biclustering ARTMAP (BARTMAP) [26]. Biclustering through
BARTMAP is achieved by performing row-wise and column-wise Fuzzy ART clustering with the intervention of correlation calculations. The greatest advantage of ART is that
its structure, unlike other unsupervised clustering algorithms,
allows flexibility in the clustering process. This strength also
applies to BARTMAP, as the number of biclusters is adjusted
automatically.
This paper contains a discussion of Hierarchical
BARTMAP (HBARTMAP), which inherits the advantages
of BARTMAP. HBARTMAP also automatically generates a
BARTMAP tree with attention given to each cluster obtained
on every node, starting from the root BARTMAP node.
After generating the tree, this technique uses a correlation
comparison method to recursively calculate the measurement
of the row and column clusters from every terminal node and
eventually creates a full hierarchical biclustering classification. We will display these results as a heat map, illustrating
the relationship between data elements.
The remainder of the paper is organized as follows. Section
2 introduces ART and BARTMAP. Section 3 introduces
the HBARTMAP approach and Hierarchical Interrelated
Two-way Clustering (H-ITWC) for comparison, followed
by Section 4, which includes the experimental setup, data
description and results. Finally, the conclusion is provided
in Section 5.
II. BACKGROUND
A. Fuzzy Adaptive Resonance Theory (ART) and ARTMAP
Fuzzy Adaptive Resonance Theory (ART) is a neural
network-based unsupervised learning method proposed by
Carpenter and Grossberg [25]. The framework is composed
of two layers of neurons: the feature representation field F1 and the category representation field F2.
The neurons in layer F1 are activated by the input pattern,
while the prototypes of the formed clusters are stored in
layer F2 . The neurons in layer F2 that already represent
input patterns are said to be committed. Correspondingly,
the uncommitted neuron encodes no input patterns. The two
layers are connected via adaptive weights wj , emanating
from node j in layer F2 , which are initially set as 1. Once an
input pattern A is registered, the neurons in layer F2 compete
by calculating the category function
Tj =
|A ∧ wj |
,
α + |wj |
(1)
where ∧ is the fuzzy AND operator defined by
(A ∧ w)i = min(Ai , wi ),
(2)
(3)
The winning neuron, J , then becomes activated, and an
expectation is reflected in layer F1 and compared with the
input pattern. The orienting subsystem with the pre-specified
vigilance parameter ρ(0 ≤ ρ ≤ 1) determines whether the
expectation and the input pattern are closely matched. If the
match meets the vigilance criterion,
|A ∧ wJ |
ρ≤
,
|A|
(4)
(5)
where β (0 ≤ β ≤ 1) is the learning rate parameter, and
β = 1 corresponds to fast learning. This procedure is called
resonance, which suggests the name of ART. On the other
hand, if the vigilance criterion is not met, a reset signal
is sent back to layer F2 to ignore the winning neuron. A
new competition will occur among the remaining neurons,
excluding the ignored neurons. This new expectation then
is projected into layer F1 , and this process repeats until
(6)
where y b is the binary output vector of field F2 in
t
ARTb and y b
b
i = 1 only if the i h category wins in ART .
Similar to the vigilance mechanism in ARTa , the map field
also performs a vigilance test such that a match tracking
procedure is activated if
| xab|
,
|yb |
(7)
where ρab (0 ≤ ρab ≤ 1) is the map field vigilance
parameter. In this case, the ARTa vigilance parameter ρa
is increased from its baseline vigilance to a value just above
the current match value. This procedure ensures the shut-off
of the current winning neuron in ARTa , whose prediction
does not comply with the label represented in ARTb . Another
ARTa neuron then will be selected, and the match tracking
mechanism again will verify its appropriateness. If no such
neuron exists, a new ARTa category is created. Once the
map field vigilance test criterion is satisfied, the weight wJab
of the neuron J in ARTa is updated using the following
learning rule:
ab
wJab(new) = γ(y b ∧ wab
J (old)) + (1 − γ)wJ (old),
weight adaptation occurs, where learning begins and the
weights are updated using the following learning rule,
wJ (new) = β(x ∧ wJ (old)) + (1 − β)wJ (old),
xab = yb ∧ wjab ,
ρab >
and α > 0 is the choice parameter that breaks the tie when
more than one prototype vector is a fuzzy subset of the input
pattern, based on the winner-take-all rule,
TJ = max{Tj |∀j}.
the vigilance criterion is met. If an uncommitted neuron is
selected for coding, a new uncommitted neuron is created
to represent a potential new cluster, thus maintaining a
consistent supply of uncommitted neurons.
By incorporating two ART modules, which receive input
patterns (ARTa ) and corresponding labels (ARTb ), respectively, with an inter-ART module, the resulting ARTMAP
system can be used for supervised classifications [27]. The
vigilance parameter of ARTb is set to 1, which causes each
label to be represented as a specific cluster. The information regarding the input-output associations is stored in the
weights w ab of the inter-ART module. The j th row of the
weights of the inter-ART module wjab denotes the weight
vector from the jth neuron in ARTa to the map field. When
the map field is activated, the output vector of the map field
is
(8)
where γ(0 ≤ γ ≤ 1) is the learning rate parameter. Note
that with fast learning (γ = 1), once neuron J learns to
predict the ARTb category I , the association is permanent,
i.e., wJabI = 1 for all input pattern presentations.
In a test phase in which only an input pattern is provided to
ARTa without the corresponding label to ARTb , no match
tracking occurs. The class prediction is obtained from the
map field weights of the winning ARTa neuron. However,
if the neuron is uncommitted, the input pattern cannot be
classified solely based on prior experience.
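For concreteness, the following minimal Python sketch traces one Fuzzy ART presentation through Eqs. (1) and (3)-(5). It is an illustrative reading of the equations rather than an implementation used in the experiments; complement coding is omitted, and all function and variable names are ours.

import numpy as np

def fuzzy_art_step(A, weights, rho=0.75, alpha=0.001, beta=1.0):
    """One Fuzzy ART presentation: category choice (Eq. 1), vigilance
    test (Eq. 4), and resonance learning (Eq. 5). Returns the index of
    the winning (possibly newly committed) category."""
    T = [np.minimum(A, w).sum() / (alpha + w.sum()) for w in weights]  # Eq. (1)
    for J in np.argsort(T)[::-1]:                      # winner-take-all, Eq. (3), with resets
        w = weights[J]
        if np.minimum(A, w).sum() / A.sum() >= rho:    # vigilance criterion, Eq. (4)
            weights[J] = beta * np.minimum(A, w) + (1 - beta) * w  # Eq. (5)
            return J, weights
    weights.append(A.copy())                           # commit a new category
    return len(weights) - 1, weights

With fast learning ($\beta = 1$), the update collapses to $w_J \leftarrow A \wedge w_J$, matching the fast-learning case noted above.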
B. Biclustering ARTMAP (BARTMAP)
The BARTMAP architecture is derived from Fuzzy
ARTMAP, which also consists of two Fuzzy ART modules
communicating through the inter-ART module, as shown in Fig. 1. However, the inputs to the ARTb module are attributes (rows) instead of labels. Obviously, the inputs to the ARTa module are samples (columns), although the inputs to the modules can be exchanged, depending on the focus of the biclustering procedure. The objective of BARTMAP is to combine the clustering results of the attributes and samples of the data matrix from each ARTa and ARTb module to create biclusters that project the correlations of attributes and samples. Thus, BARTMAP can be categorized as two-way clustering.

Fig. 1. Structure of BARTMAP. Gene clusters first form in the ARTb module, and sample clusters form in the ARTa module with the requirement that members of the same cluster behave similarly across at least one of the formed gene clusters. The match tracking mechanism will increase the vigilance parameter of the ARTa module if this condition is not met.

The first step of BARTMAP is to create a set of $K_g$ gene clusters $G_i$, $i = 1, \cdots, K_g$, for $N$ genes by using the ARTb module, which behaves like standard Fuzzy ART. The goal of the following step is to create $K_s$ sample clusters $S_j$, $j = 1, \cdots, K_s$, for $M$ samples within the ARTa module while calculating the correlations between the attribute and sample clusters. When a new data sample is registered to the ARTa module, the candidate sample cluster that is eligible to represent this sample is determined based on the winner-take-all rule using the standard Fuzzy ART vigilance test. If this candidate cluster corresponds to an uncommitted neuron, learning will occur to create a new one-element sample cluster that represents this sample, as in Fuzzy ART. Otherwise, before updating the weights of the winning neuron, it will check whether the following condition is satisfied: a sample is absorbed into an existing sample cluster if and only if it displays behavior or patterns similar to the other members in the cluster across at least one gene cluster formed in the ARTb module.

The similarity between the new sample $s_k$ and the sample cluster $S_j = \{s_{j_1}, \cdots, s_{j_{M_j}}\}$ with $M_j$ samples across a gene cluster $G_i = \{g_{i_1}, \cdots, g_{i_{N_i}}\}$ with $N_i$ genes is calculated as the average Pearson correlation coefficient between the sample and all the samples in the cluster,

$$r_{kj} = \frac{1}{M_j} \sum_{l=1}^{M_j} r_{k,j_l}, \qquad (9)$$

where

$$r_{k,j_l} = \frac{\sum_{t=1}^{N_i} (es_{k,g_{it}} - \overline{es}_{k}^{G_i})(es_{j_l,g_{it}} - \overline{es}_{j_l}^{G_i})}{\sqrt{\sum_{t=1}^{N_i} (es_{k,g_{it}} - \overline{es}_{k}^{G_i})^2} \sqrt{\sum_{t=1}^{N_i} (es_{j_l,g_{it}} - \overline{es}_{j_l}^{G_i})^2}}, \qquad (10)$$

and

$$\overline{es}_{k}^{G_i} = \frac{1}{N_i} \sum_{t=1}^{N_i} es_{k,g_{it}}, \qquad (11)$$

$$\overline{es}_{j_l}^{G_i} = \frac{1}{N_i} \sum_{t=1}^{N_i} es_{j_l,g_{it}}. \qquad (12)$$
The sample sk is enclosed in cluster Sj only when rkj is
above some threshold η; learning will occur following the
Fuzzy ART updating rule.
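As a concrete reading of Eqs. (9)-(12), the sketch below computes the average Pearson correlation between a candidate sample and the members of a sample cluster, restricted to one gene cluster. It assumes expression values are stored in NumPy arrays with one row per sample; the function and variable names are ours, not BARTMAP's.

import numpy as np

def average_pearson(sample, cluster_members, gene_idx):
    """Average Pearson correlation (Eq. 9) between a candidate sample
    s_k and the members of sample cluster S_j, restricted to the genes
    of one gene cluster G_i (indices in gene_idx)."""
    x = sample[gene_idx]
    corrs = []
    for member in cluster_members:
        y = member[gene_idx]
        xc, yc = x - x.mean(), y - y.mean()            # means over G_i, Eqs. (11)-(12)
        denom = np.sqrt((xc**2).sum()) * np.sqrt((yc**2).sum())
        corrs.append((xc * yc).sum() / denom)          # Pearson coefficient, Eq. (10)
    return np.mean(corrs)                              # average over cluster members, Eq. (9)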
If the sample does not show any behaviors similar to those
of the sample cluster that the winning neuron represents for
any clusters of genes, the match tracking mechanism will
increase the ARTa vigilance parameter ρa from its baseline
vigilance to just above the current match value to disable the
current winning neuron in ARTa . This shut-off will force the
sample to be included into some other cluster or will create
a new cluster for the sample if no existing sample cluster
matches it well.
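Combining Eqs. (9)-(12) with the threshold $\eta$, the acceptance test that triggers (or avoids) match tracking might be sketched as follows, reusing average_pearson from the sketch above; gene_clusters is assumed to be a list of gene-index arrays, one per ARTb cluster.

def bartmap_accepts(sample, cluster_members, gene_clusters, eta):
    """Our sketch of the BARTMAP acceptance condition: the candidate
    sample must correlate with the winning cluster above eta across at
    least one gene cluster; otherwise match tracking resets the winner."""
    return any(average_pearson(sample, cluster_members, g_idx) >= eta
               for g_idx in gene_clusters)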
III. HIERARCHICAL BARTMAP
Fig. 2. Main idea of hierarchical biclustering. Within a subset, the
biclustering procedure is reiterated to obtain fewer results. In HBARTMAP,
increasing the vigilances of the ARTa and ARTb modules as well as the
correlation threshold by a preset interval enables diversification.
A. Method
The basic idea of Hierarchical BARTMAP (HBARTMAP)
is to reiterate BARTMAP within the obtained BARTMAP
results in order to obtain sub-biclusters, as shown in Fig. 2.
Such subdivision provides insight into reinterpreting the generated biclusters by conjugating or disbanding sub-biclusters
of the initial results. The overall procedure is as follows:
Algorithm 1 Pseudo Code of Overall HBARTMAP Algorithm
  Initialize BARTMAP and load data
  Run BARTMAP (whole data set)
  Bicluster evaluation
  for i = 1 to Number of Sample Biclusters do
    Run ChildBARTMAP(SampleBicluster[i], vi_a, vi_b, corth_i)
  end for

In this algorithm, vi_a, vi_b and corth_i are the increments applied to the vigilances of ARTa and ARTb and to the correlation threshold of the inter-ART module, respectively. BARTMAP does not have the ability to evaluate and pair the attribute and sample biclusters. Thus, a scatter search [28] is applied to evaluate each attribute-sample bicluster pair.
The correlation coefficient between two variables X and Y
measures the grade of linear dependency between them and
is defined by,
$$\rho(X, Y) = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n \sigma_X \sigma_Y}, \qquad (13)$$

where $\mathrm{cov}(X, Y)$ is the covariance of the variables $X$ and $Y$; $\bar{x}$ and $\bar{y}$ are the means of the values of the variables $X$ and $Y$; and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$, respectively.
Given a bicluster B composed of N samples and M
attributes, B = [g1 , · · · , gN ], the average correlation of B,
ρ(B), is defined as
$$\rho(B) = \frac{1}{\binom{N}{2}} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \rho(g_i, g_j), \qquad (14)$$

where $\rho(g_i, g_j)$ is the correlation coefficient between samples $i$ and $j$. Because $\rho(g_i, g_j) = \rho(g_j, g_i)$, only $\binom{N}{2}$ elements have been considered.
With the calculated average correlation of bicluster B, the
scatter search fitness function is applied, which is defined by
$$f(B) = (1 - \rho(B)) + \sigma_\rho + \frac{1}{N} + \frac{1}{M}, \qquad (15)$$

where $\sigma_\rho$ is the standard deviation of the values $\rho(g_i, g_j)$. The standard deviation term is included so that a high average correlation produced by a few strongly correlated pairs does not mask widely dispersed pairwise correlations. The best biclusters are those with the lowest fitness function values.

Algorithm 2 Pseudo Code of ChildBARTMAP Function
  Initialize BARTMAP and load data (SampleBicluster[i])
  Adjust variables (vi_a, vi_b, corth_i)
  Run BARTMAP
  Bicluster evaluation
  if Number of Sample Biclusters == 1 then
    return
  end if
  for i = 1 to Number of Sample Biclusters do
    if Number of Attributes(SampleBicluster[i]) ≥ 3 and Number of Samples(SampleBicluster[i]) ≥ 3 then
      Run ChildBARTMAP(SampleBicluster[i], vi_a, vi_b, corth_i)
    end if
  end for
  return
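As an illustration of Eqs. (13)-(15), the following sketch scores a bicluster with the scatter search fitness function. It assumes the bicluster is a NumPy array with $N$ samples as rows and $M$ attributes as columns; the names are ours.

import numpy as np

def fitness(B):
    """Scatter-search fitness of a bicluster B (Eqs. 13-15); lower is better."""
    N, M = B.shape
    R = np.corrcoef(B)                     # pairwise Pearson correlations, Eq. (13)
    iu = np.triu_indices(N, k=1)           # the C(N,2) unique sample pairs, Eq. (14)
    rho_mean = R[iu].mean()
    sigma_rho = R[iu].std()
    return (1 - rho_mean) + sigma_rho + 1.0/N + 1.0/M   # Eq. (15)

Lower scores are better: the $1/N$ and $1/M$ terms penalize trivially small biclusters, and $\sigma_\rho$ penalizes dispersed pairwise correlations.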
During the bicluster evaluation process, once the fitness
of every bicluster of a sample group is calculated, the
most highly correlated attributes begin to be sorted out in
accordance with a preset threshold. If the fitness of an
attribute is smaller than the fitness threshold, it is selected.
Once the attribute scan is complete, the process advances to
the next sample group and progresses through the selection
step again. However, to avoid previously selected attributes overlapping in different sample groups, such attributes are excluded from the search.
The ChildBARTMAP function shown in Algorithm 2 is a recursive function used to generate a tree of BARTMAP modules, each of which computes only its own subset of the data.
Biclustering with fewer than three samples or attributes has proven to be meaningless, so the recursion stops under such conditions.
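The recursion of Algorithms 1 and 2 can be summarized in the short Python sketch below. Here run_bartmap is a hypothetical stand-in for the BARTMAP routine of Section II-B that returns the sample biclusters as data matrices; the parameters mirror vi_a, vi_b and corth_i above, and the size test implements the three-sample/three-attribute stopping rule just stated.

def child_bartmap(data, run_bartmap, vi_a, vi_b, corth_i,
                  rho_a=0.1, rho_b=0.1, eta=0.5):
    """Recursively re-run BARTMAP inside each sample bicluster with
    tightened vigilances and correlation threshold (Algorithms 1-2).
    run_bartmap(data, rho_a, rho_b, eta) -> list of sub-bicluster
    matrices; a hypothetical stand-in, not the paper's API."""
    subs = run_bartmap(data, rho_a + vi_a, rho_b + vi_b, eta + corth_i)
    if len(subs) == 1:
        return [data]                      # no further subdivision possible
    leaves = []
    for sub in subs:
        n_samples, n_attributes = sub.shape
        if n_samples >= 3 and n_attributes >= 3:   # recursion stops below 3x3
            leaves += child_bartmap(sub, run_bartmap, vi_a, vi_b, corth_i,
                                    rho_a + vi_a, rho_b + vi_b, eta + corth_i)
        else:
            leaves.append(sub)             # terminal bicluster
    return leaves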
B. Hierarchical Interrelated Two-way Clustering (H-ITWC)
In order to compare and contrast HBARTMAP with other
biclustering algorithms, the hierarchical version of interrelated two-way clustering (ITWC) [29] also was programmed
using the same hierarchical scheme as HBARTMAP. The ITWC technique
is a clustering method used widely in cases in which information is spread out over a large body of experimental
data [30], [31]. It was developed to achieve clustering in
high-dimensional data spaces. ITWC takes an approach similar to that of BARTMAP. First, ITWC performs clustering
on the attribute side, which is the larger dimension, in order
to reduce it to a reasonable level. Then, scores such as
correlation coefficients [32]–[34] are applied to the patterns in order to sort out important samples.
ITWC involves five main steps:
• Step 1: Clustering in the gene dimension. In this step,
the data are clustered gene-wise into k groups using any
clustering method, such as K-means or self-organizing
maps (SOMs).
• Step 2: Clustering in the sample dimension. The samples
are clustered into two groups, $S_{i,a}$ and $S_{i,b}$, based on each gene cluster $i$.
• Step 3: Combining the clustering results. The results
from steps 1 and 2 are combined into $2^k$ groups. If $k = 2$, then the samples can be divided into the following four groups:
  - $C_1$ (all samples clustered into $S_{1,a}$ based on $G_1$ and into $S_{2,a}$ based on $G_2$);
  - $C_2$ (all samples clustered into $S_{1,a}$ based on $G_1$ and into $S_{2,b}$ based on $G_2$);
  - $C_3$ (all samples clustered into $S_{1,b}$ based on $G_1$ and into $S_{2,a}$ based on $G_2$);
  - $C_4$ (all samples clustered into $S_{1,b}$ based on $G_1$ and into $S_{2,b}$ based on $G_2$);
• Step 4: Finding heterogeneous groups. Among the sample groups $C_i$, two distinct groups are selected that satisfy the following condition: $\forall u \in C_s$, $\forall v \in C_t$, where $u$ and $v$ are samples, and $s$ and $t$ are the two selected groups, respectively. If $u \in S_{i,r_1}$ and $v \in S_{i,r_2}$, then $r_1 \neq r_2$ ($r_1, r_2 \in \{a, b\}$) for all $i$ ($1 \leq i \leq k$). The group $(C_s, C_t)$ is called a heterogeneous group.
• Step 5: Sorting and reducing. For each heterogeneous group, two patterns are introduced. The vector-cosine defined in Eq. (16) is calculated for each pattern, and then all genes are sorted according to the similarity values in descending order. The first one-third of the sorted gene sequence is kept, and the other two-thirds of the sequence is removed.

$$\cos(\theta) = \frac{\langle g_l, E \rangle}{\|g_l\|\,\|E\|} = \frac{\sum_{j=1}^{m} w_{i,j} e_j}{\sqrt{\sum_{j=1}^{m} w_{i,j}^2}\,\sqrt{\sum_{j=1}^{m} e_j^2}}, \qquad (16)$$
Steps 1 through 5 are reiterated until the terminal conditions are satisfied.
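For illustration, Step 5's vector-cosine ranking (Eq. (16)) might be realized as follows; the gene profiles and the pattern are NumPy vectors over the $m$ samples, and the names are ours.

import numpy as np

def vector_cosine(g, e):
    """Vector-cosine of Eq. (16) between gene profile g and pattern e."""
    return np.dot(g, e) / (np.linalg.norm(g) * np.linalg.norm(e))

def keep_top_third(gene_profiles, pattern):
    """Step 5: sort genes by similarity to the pattern in descending
    order and keep the first one-third of the sorted sequence."""
    sims = np.array([vector_cosine(g, pattern) for g in gene_profiles])
    order = np.argsort(sims)[::-1]
    return [gene_profiles[i] for i in order[: len(gene_profiles) // 3]]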
The H-ITWC algorithm shares the same tree structure as HBARTMAP; however, ITWC, rather than BARTMAP, is applied at each child node.
IV. EXPERIMENTAL RESULTS
A. Setup
The leukemia data set [35] consists of 72 samples, including bone marrow samples, peripheral blood samples
and childhood AML cases. Twenty-five of these samples
are acute myeloid leukemia (AML), and 47 are acute lymphoblastic leukemia (ALL), which is composed of two subcategories due to the influences of T-cells and B-cells. The
expression levels for 7,129 genes were measured across all
of the samples by high-density oligonucleotide microarrays.
The raw data were preprocessed through a linear transform to fit the HBARTMAP requirement of a [0, 1] interval. Similar
preprocessing was performed for H-ITWC.
The result of HBARTMAP is evaluated by comparing the
resulting clusters with the real structures in terms of external
criteria. Both the Rand index and the adjusted Rand index
[36] are applied. We assume that P is a pre-specified partition
of dataset X with N data objects, which also is independent of a clustering structure C resulting from the use of the BARTMAP algorithm. Therefore, a pair of data objects xi and xj will yield four different cases based on how xi and xj are placed in C and P.
• Case 1 xi and xj belong to the same cluster of C and
the same category of P.
• Case 2 xi and xj belong to the same cluster of C and
different categories of P.
• Case 3 xi and xj belong to different clusters of C and
the same category of P.
• Case 4 xi and xj belong to different clusters of C and
different categories of P.
Correspondingly, the number of pairs of samples for the four cases are denoted as $a$, $b$, $c$, and $d$, respectively. The total number of pairs of samples is $M(M-1)/2$, denoted as $L$; therefore, $a + b + c + d = L$. The Rand index then can be defined as follows, with larger values indicating greater similarity between C and P:

$$R = \frac{a + d}{L}. \qquad (17)$$

In order to correct the Rand index for randomness, it should be normalized so that its value is 0 when two partitions are randomly selected and 1 when two partitions are perfectly matched,

$$AdjR = \frac{R - E(R)}{\max(R) - E(R)}, \qquad (18)$$

where $E(R)$ is the expected value of $R$ under the baseline distribution, and $\max(R)$ is the maximum value of $R$. Specifically, the adjusted Rand index [37] assumes that the model of randomness takes the form of the generalized hypergeometric distribution, which is written as

$$AdjR = \frac{\binom{M}{2}(a + d) - \left((a + b)(a + c) + (c + d)(b + d)\right)}{\binom{M}{2}^2 - \left((a + b)(a + c) + (c + d)(b + d)\right)}. \qquad (19)$$
The adjusted Rand index has demonstrated consistently
good performance in previous studies compared to other
indices.
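Both indices follow directly from the four pair counts defined above; a minimal sketch, assuming integer cluster labels and our own function name, is:

from itertools import combinations

def rand_indices(labels_C, labels_P):
    """Rand index (Eq. 17) and adjusted Rand index (Eq. 19) from the
    pair counts a, b, c, d over all L = M(M-1)/2 sample pairs."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_C)), 2):
        same_C = labels_C[i] == labels_C[j]
        same_P = labels_P[i] == labels_P[j]
        if same_C and same_P:        a += 1   # Case 1
        elif same_C and not same_P:  b += 1   # Case 2
        elif not same_C and same_P:  c += 1   # Case 3
        else:                        d += 1   # Case 4
    L = a + b + c + d
    R = (a + d) / L                                       # Eq. (17)
    expected = (a + b) * (a + c) + (c + d) * (b + d)
    adj = (L * (a + d) - expected) / (L * L - expected)   # Eq. (19)
    return R, adj

Eq. (19) is Eq. (18) with $E(R)$ and $\max(R)$ evaluated under the generalized hypergeometric model, so both values come from the same counts.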
For additional performance comparison, a synthetic data set developed by Handl and Knowles [38] also was used for the simulation, which consists of 1286 samples and 100 attributes.
B. Results
Fig. 3. HBARTMAP heatmap result on the leukemia data set. The child
biclusters are generated within the bicluster generated from the parent
BARTMAP node. Certain portions of the attributes (genes) are ignored based
on the scatter search evaluation.
Fig. 3 depicts the resulting HBARTMAP heat map on
the leukemia data set showing that bicluster subsets are
being generated within the biclusters. The main focus of
the leukemia data set simulation is to perform biclustering
with HBARTMAP and to judge how precisely the conditions that this approach identifies match the external criteria. Fig. 4 depicts the Rand index and the adjusted Rand index result
of each layer, beginning with the biclustering result from the
root HBARTMAP node. Because the initial vigilance and
threshold parameters are set low, the result is rough. While
the leukemia data set has three conditions, the root node only
found two. As a result, deeper layers were evaluated, and the
growth of both the Rand index and the adjusted Rand index
is obvious. At layer 3, the Rand index was 0.9711, and the
adjusted Rand index was 0.8863, both higher than the best BARTMAP result (0.7926).
TABLE I
RAND INDEX AND ADJUSTED RAND INDEX WITH SYNTHETIC DATA

Algorithm               Rand Index   Adjusted Rand Index
HBARTMAP (2nd layer)    0.9576       0.9347
BARTMAP                 0.9817       0.8944
H-ITWC                  0.8779       0.6032
On the first iteration, HBARTMAP divides the data set
into two major clusters. Then, the algorithm performs
BARTMAP within each cluster to split it into two subclusters. HBARTMAP terminates when the subclusters cannot be divided any further, even with stricter parameters.
A comparison of the evaluation results with BARTMAP
and H-ITWC is shown in Table I. HBARTMAP clearly
performs better than H-ITWC and performs slightly better
than BARTMAP based on the Adjusted Rand Index criterion.
Fig. 4. Rand index and Adjusted Rand Index calculation of each layer. The
biclusters generated on layer 3 produce the best result based on Rand index
and adjusted Rand index evaluation.
Fig. 5 compares the two indices using BARTMAP, Fuzzy
ART, K-means, hierarchical clustering with four different
linkages and various versions of ITWC. The performances
of the well-known ITWC and SOM also are presented. This
comparison indicates that the hierarchical version of ITWC
with K-means is also improved; however, the increased performance is not as significant as that offered by HBARTMAP.
Fig. 6. The result of HBARTMAP run on the synthetic data. HBARTMAP ends with two layers: two subclusters each under two root clusters.
V. CONCLUSION
Fig. 5. Results from HBARTMAP and various clustering algorithms on
the leukemia data set in terms of Rand and adjusted Rand index. The
methods used for the comparison are BARTMAP (BAM), Fuzzy ART
(FA), K-means (KM), hierarchical clustering with single linkage (HC-S),
complete linkage (HC-C), average linkage (HC-A), Ward's method (HC-W), and interrelated two-way clustering with SOFM (ITWC-S), K-means (ITWC-K), and its hierarchical version with SOFM (H-ITWC-S).
Fig. 6 shows the clustering result of the synthetic data.
In this paper, we propose Hierarchical BARTMAP (HBARTMAP), a hierarchical approach to biclustering. The tasks of clustering high-dimensional data are achieved by incorporating scatter search into biclustering algorithms.
The experimental results imply that HBARTMAP provides
better biclustering than BARTMAP. The sudden increase in
the adjusted Rand index while searching each layer indicates
that the advanced version of BARTMAP can be implemented
effectively in high-dimensional data analysis. This suggests that utilizing the scatter search method in HBARTMAP biclustering was a major factor in the successful experiments.
ACKNOWLEDGEMENT
Partial support of this research from the National Science
Foundation (Grants 1102159 and 1238097), the Mary K. Finley Missouri Endowment, and the Missouri S&T Intelligent
Systems Center is gratefully acknowledged.
REFERENCES
[1] T. Havens, J. Bezdek, C. Leckie, L. Hall, and M. Palaniswami,
“Fuzzy c-means algorithms for very large data,” Fuzzy Systems, IEEE
Transactions on, vol. 20, pp. 1130–1146, Dec. 2012.
[2] R. E. Bellman, Dynamic Programming. Courier Dover Publications,
1957.
[3] R. Xu and D. C. Wunsch II, Clustering. IEEE Press Series on
Computational Intelligence, John Wiley & Sons, 2008.
[4] J. A. Hartigan, “Direct clustering of a data matrix,” Journal of the
American Statistical Association, vol. 67, no. 337, pp. 123–129, 1972.
[5] R. Xu, D. Wunsch, et al., “Survey of clustering algorithms,” Neural
Networks, IEEE Transactions on, vol. 16, no. 3, pp. 645–678, 2005.
[6] S. Kesh and W. Raghupathi, “Critical issues in bioinformatics and
computing,” Perspect Health Inf Manag, vol. 1, p. 9, 2004.
[7] Y. Cheng and G. M. Church, “Biclustering of expression data,” in
Proceedings of the Eighth International Conference on Intelligent
Systems for Molecular Biology, pp. 93–103, AAAI Press, 2000.
[8] S. Busygin, O. Prokopyev, and P. M. Pardalos, “Biclustering in
data mining,” Computers and Operations Research, vol. 35, no. 9,
pp. 2964–2987, 2008.
[9] S. C. Madeira and A. L. Oliveira, “Biclustering algorithms for biological data analysis: A survey,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 1, no. 1, pp. 24–45, 2004.
[10] P. Larranaga, B. Calvo, R. Santana, C. Bielza, J. Galdiano, I. Inza,
J. A. Lozano, R. Armananzas, G. Santafe, A. Perez, and V. Robles,
“Machine learning in bioinformatics,” Brief. Bioinformatics, vol. 7,
pp. 86–112, Mar 2006.
[11] M. Belkin and P. Niyogi, “Laplacian eigenmaps for dimensionality
reduction and data representation,” Neural Computation, vol. 15, no. 6,
pp. 1373–1396, 2003.
[12] J. Keller and M. Popescu, “Soft computing in bioinformatics,” in Fuzzy
Systems, 2005. FUZZ’05. The 14th IEEE International Conference on,
pp. 3–3, IEEE, 2005.
[13] J. Zhang, J. Wang, and H. Yan, “A neural-network approach for
biclustering of gene expression data based on the plaid model,” in
Machine Learning and Cybernetics, 2008 International Conference
on, vol. 2, pp. 1082–1087, IEEE, 2008.
[14] T. Reichhardt, “It’s sink or swim as a tidal wave of data approaches,”
Nature, vol. 399, pp. 517–520, June 1999.
[15] N. M. Luscombe, D. Greenbaum, and M. Gerstein, “What is bioinformatics? A proposed definition and overview of the field,” Methods Inf
Med, vol. 40, no. 4, pp. 346–358, 2001.
[16] R. Xu and D. C. Wunsch, “Clustering algorithms in biomedical
research: A review,” Biomedical Engineering, IEEE Reviews in, vol. 3,
pp. 120–154, 2010.
[17] R. D. Fleischmann, M. D. Adams, O. White, R. A. Clayton, E. F.
Kirkness, A. R. Kerlavage, C. J. Bult, J. F. Tomb, B. A. Dougherty,
and J. M. Merrick, “Whole-genome random sequencing and assembly
of Haemophilus influenzae Rd,” Science, vol. 269, pp. 496–512, Jul
1995.
[18] G. Pandey, G. Atluri, M. Steinbach, C. L. Myers, and V. Kumar,
“An association analysis approach to biclustering,” in Proceedings
of the 15th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, KDD ’09, (New York, NY, USA),
pp. 677–686, ACM, 2009.
[19] R. Xu, S. Damelin, B. Nadler, and D. C. Wunsch, “Clustering of high-dimensional gene expression data with feature filtering methods and
diffusion maps,” in BioMedical Engineering and Informatics, 2008.
BMEI 2008. International Conference on, vol. 1, pp. 245–249, IEEE,
2008.
[20] J. Ihmels, G. Friedlander, S. Bergmann, O. Sarig, Y. Ziv, and N. Barkai,
“Revealing modular organization in the yeast transcriptional network,”
Nat. Genet., vol. 31, pp. 370–377, Aug 2002.
[21] G. Lim and J. Bezdek, “Small targets in ladar images using fuzzy clustering,” in Fuzzy Systems Proceedings, 1998. IEEE World Congress on
Computational Intelligence., The 1998 IEEE International Conference
on, vol. 1, pp. 61–66, May 1998.
[22] H. Kim and B. Kosko, “Neural fuzzy motion estimation and compensation,” Signal Processing, IEEE Transactions on, vol. 45, no. 10,
pp. 2515–2532, 1997.
[23] Z. Hou, M. Polycarpou, and H. He, “Editorial to special issue: Neural
networks for pattern recognition and data mining,” Soft Computing - A Fusion of Foundations, Methodologies and Applications, vol. 12,
no. 7, pp. 613–614, 2008.
[24] P. Werbos, “Neurocontrol and elastic fuzzy logic: Capabilities, concepts, and applications,” Industrial Electronics, IEEE Transactions on,
vol. 40, no. 2, pp. 170–180, 1993.
[25] G. A. Carpenter, S. Grossberg, and D. B. Rosen, “Fuzzy ART: Fast
stable learning and categorization of analog patterns by an adaptive
resonance system,” Neural Networks, vol. 4, no. 6, pp. 759 – 771,
1991.
[26] R. Xu and D. C. Wunsch II, “Bartmap: A viable structure for
biclustering,” Neural Networks, vol. 24, no. 7, pp. 709 – 716, 2011.
[27] G. A. Carpenter, S. Grossberg, N. Markuzon, J. H. Reynolds, and
D. B. Rosen, “Fuzzy ARTMAP: A neural network architecture for
incremental supervised learning of analog multidimensional maps,”
IEEE Transactions on Neural Networks, vol. 3, no. 5, pp. 698–713,
1992.
[28] J. A. Nepomuceno, A. Troncoso, and J. S. Aguilar-Ruiz, “Biclustering
of gene expression data by correlation-based scatter search,” BioData
Mining, vol. 4, no. 1, 2011.
[29] C. Tang and A. Zhang, “Interrelated two-way clustering and its application on gene expression data,” International Journal on Artificial
Intelligence Tools, vol. 14, no. 4, pp. 577–597, 2005.
[30] C. Tang, L. Zhang, A. Zhang, and M. Ramanathan, “Interrelated twoway clustering: An unsupervised approach for gene expression data
analysis,” in Bioinformatics and Bioengineering Conference, 2001.
Proceedings of the IEEE 2nd International Symposium on, pp. 41–
48, IEEE, 2001.
[31] B. Chandra, S. Shanker, and S. Mishra, “A new approach: Interrelated
two-way clustering of gene expression data,” Statistical Methodology,
vol. 3, no. 1, pp. 93 – 102, 2006.
[32] T. Golub, D. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. Mesirov,
H. Coller, M. Loh, J. Downing, M. Caligiuri, et al., “Molecular
classification of cancer: Class discovery and class prediction by gene
expression monitoring,” Science, vol. 286, no. 5439, pp. 531–537,
1999.
[33] A. Jorgensen, “Clustering excipient near infrared spectra using different chemometric methods,” Technical report, Dept. of Pharmacy,
University of Helsinki, 2000.
[34] J. Devore, Probability and Statistics for Engineering and the Sciences.
Duxbury Press, 2011.
[35] M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, “Cluster
analysis and display of genome-wide expression patterns,” Proc. Natl.
Acad. Sci. U.S.A., vol. 95, pp. 14863–14868, Dec 1998.
[36] W. M. Rand, “Objective criteria for the evaluation of clustering
methods,” Journal of the American Statistical Association, vol. 66,
no. 336, pp. 846–850, 1971.
[37] D. Steinley, “Properties of the Hubert-Arabie adjusted Rand index,”
Psychol Methods, vol. 9, pp. 386–396, Sep 2004.
[38] J. Handl and J. Knowles, “Improvements to the scalability of multiobjective clustering,” in Evolutionary Computation, 2005. The 2005
IEEE Congress on, vol. 3, pp. 2372–2379, IEEE, 2005.