The Quantile Method for Symbolic Data Analysis
Manabu Ichino
School of Science and Engineering, Tokyo Denki University
ichino@mail.dendai.ac.jp
Keywords: Quantiles, Monotonicity, Visualization, PCA, Clustering
Abstract
The quantile method transforms a given (N objects)×(d variables) symbolic data table into a standard
{N×(m+1) sub-objects}×(d variables) numerical data table, where m is a preselected integer that
controls the granularity with which symbolic objects are represented. Each symbolic object is thus
represented by a set of m+1 d-dimensional numerical vectors, called quantile vectors. These vectors are
monotone: from the minimum to the maximum quantile vector, every variable value is nondecreasing.
Exploiting this monotonicity, we present three methods for symbolic data analysis.
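For interval-valued variables, the transformation can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes every variable is interval-valued and that values are uniformly distributed within each interval, so the k-th quantile of [a, b] is a + (k/m)(b - a). The function names `quantile_vectors` and `quantile_table` are hypothetical, and histogram-valued or categorical variables are not handled.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def quantile_vectors(obj: List[Interval], m: int) -> List[List[float]]:
    """Return the m+1 quantile vectors of one interval-valued object.

    Assumption: each variable is uniformly distributed on its interval
    [a, b], so its k-th m-quantile is a + (k / m) * (b - a).
    """
    if m < 1:
        raise ValueError("m must be a positive integer")
    return [[a + (k / m) * (b - a) for (a, b) in obj] for k in range(m + 1)]


def quantile_table(objects: List[List[Interval]], m: int) -> List[List[float]]:
    """Stack the quantile vectors of N objects into an N*(m+1) x d table."""
    rows: List[List[float]] = []
    for obj in objects:
        rows.extend(quantile_vectors(obj, m))
    return rows
```

For example, `quantile_vectors([(0.0, 4.0), (10.0, 20.0)], 4)` yields five 2-dimensional vectors running from (0, 10) to (4, 20); each coordinate is nondecreasing along the sequence, which is the monotonicity the three methods rely on.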
1) Visualization: We visualize each symbolic object by m+1 parallel monotone line graphs, one for each of
its quantile vectors [Ichino and Brito 2014]. Each line graph is composed of d-1 line segments that
accumulate the d variable values, each normalized to the range [0, 1].
2) PCA: When the given symbolic objects have a monotone structure in the representation space, this
structure confines their quantile vectors to a similar geometrical shape. We apply PCA to the quantile
vectors using rank-order correlation coefficients. We then reproduce each symbolic object in the factor
planes as a series of m arrow lines connecting its quantile vectors in order, from the minimum quantile
vector to the maximum quantile vector [Ichino 2011].
3) Clustering: We present a hierarchical conceptual clustering method based on the quantile vectors. We
define the concept size of the d-dimensional hyper-rectangle spanned by a set of quantile vectors. The
concept size serves both as the similarity measure between sub-objects, i.e., quantile vectors, and as
the measure of cluster quality [Ichino and Brito 2015].
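The three methods can be illustrated on a quantile table (rows are sub-objects, columns are variables) with the following Python sketch. It is a hypothetical illustration, not the procedures of the cited papers, and every function name in it is ours. Several details are assumptions. Variables are min-max normalized to [0, 1] over all sub-objects. The rank-order coefficient is taken to be Spearman's rho, and the sketch stops at the correlation matrix: its leading eigenvectors, from any symmetric eigen-solver, would give the factor axes. The concept size of a hyper-rectangle is taken to be the mean of its normalized side lengths. The clustering is a plain agglomerative loop over sub-objects that merges the pair of clusters with the smallest joint concept size.

```python
from itertools import accumulate
from typing import List

Vector = List[float]


def normalize01(rows: List[Vector]) -> List[Vector]:
    """Min-max normalize every variable to [0, 1] over all sub-objects."""
    d = len(rows[0])
    lo = [min(r[j] for r in rows) for j in range(d)]
    hi = [max(r[j] for r in rows) for j in range(d)]
    # A constant variable carries no spread, so it is mapped to 0.
    return [[(r[j] - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j in range(d)] for r in rows]


# 1) Visualization: cumulative sums of one normalized quantile vector give the
#    d vertices (d - 1 segments) of its monotone line graph.
def dag_line(v: Vector) -> List[float]:
    return list(accumulate(v))


# 2) PCA input. Assumption: the rank-order coefficient is Spearman's rho,
#    i.e. the Pearson correlation of (average) ranks.
def average_ranks(col: Vector) -> Vector:
    order = sorted(range(len(col)), key=col.__getitem__)
    r = [0.0] * len(col)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and col[order[j + 1]] == col[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied values share their mean 1-based rank
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman_matrix(rows: List[Vector]) -> List[Vector]:
    d, n = len(rows[0]), len(rows)
    cols = []
    for j in range(d):
        rk = average_ranks([r[j] for r in rows])
        mu = sum(rk) / n
        sd = (sum((x - mu) ** 2 for x in rk) / n) ** 0.5
        cols.append([(x - mu) / sd if sd > 0 else 0.0 for x in rk])
    # Mean product of standardized ranks = correlation of the ranks.
    return [[sum(a * b for a, b in zip(cols[p], cols[q])) / n
             for q in range(d)] for p in range(d)]


# 3) Clustering. Assumption: concept size = mean side length of the bounding
#    hyper-rectangle, with every variable already normalized to [0, 1].
def concept_size(vectors: List[Vector]) -> float:
    d = len(vectors[0])
    return sum(max(v[j] for v in vectors) - min(v[j] for v in vectors)
               for j in range(d)) / d


def cluster(rows: List[Vector], k: int) -> List[List[int]]:
    """Merge clusters of sub-objects until k remain, always joining the pair
    whose common hyper-rectangle has the smallest concept size."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > max(k, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                size = concept_size([rows[i] for i in clusters[a] + clusters[b]])
                if best is None or size < best[0]:
                    best = (size, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters
```

Because an object's quantile vectors are componentwise nondecreasing, the line graphs produced by `dag_line` for that object never cross (they may touch), and the hyper-rectangle they span is determined by the minimum and maximum quantile vectors alone.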
References
H.-H. Bock and E. Diday (2000). Analysis of Symbolic Data - Exploratory Methods for Extracting
Statistical Information from Complex Data. Heidelberg: Springer.
L. Billard and E. Diday (2007). Symbolic Data Analysis - Conceptual Statistics and Data Mining.
Chichester: Wiley.
E. Diday and M. Noirhomme-Fraiture (2008). Symbolic Data Analysis and the SODAS Software.
Chichester: Wiley.
M. Ichino and P. Brito (2014). The data accumulation graph (DAG) to visualize multi-dimensional
symbolic data. Workshop in Symbolic Data Analysis. Taipei, Taiwan.
M. Ichino (2011). The quantile method for symbolic principal component analysis. Statistical Analysis
and Data Mining, 4(2), pp. 184-198.
M. Ichino and P. Brito (2015). A hierarchical conceptual clustering based on the quantile method for
mixed feature-type data. (Submitted to the IEEE Trans. SMC).