International Biometric Society
LASSO CLUSTERING METHOD FOR CLASSIFICATION OF CANCER SUBTYPES
USING MICROARRAY DATA
Masaru Ushijima 1, Shinto Eguchi 2, Osamu Komori 2, Yoshio Miki 1, Masaaki Matsuura 1
1: Genome Center, Japanese Foundation for Cancer Research, Tokyo, Japan
2: Institute of Statistical Mathematics, Tokyo, Japan
Introduction: The importance of molecular portraits of cancer, and the possibility of
classifying a cancer into biologically and molecularly distinct groups, have been recognized
in both cancer research and clinical management. To find useful, high-performance
biomarkers for determining the subtype of each cancer patient, effective selection of a gene
set from molecular profile data is crucial. In this study, we developed a method for
selecting genes to classify breast cancer subtypes using a novel lasso-type clustering.
Material and Methods: The method we developed is based on a stepwise gene-set
reduction algorithm applied over a series of ordinary k-means clustering analyses. We
introduce an L1 penalty on the cluster centers obtained by ordinary k-means clustering.
In this algorithm, genes satisfying a condition derived from lasso theory with a prespecified
parameter are excluded from the candidate gene set. At each iteration, we evaluate the
number of subjects that fall into the same clusters in the previous and the current ordinary
k-means clustering results. When this overlap rate becomes large, we change the parameter
so that a high overlap rate is maintained at each step.
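The abstract does not state the lasso condition explicitly. A minimal sketch, assuming each gene is centered and the per-gene objective is the within-cluster sum of squares plus lambda times the sum over clusters of |c_kj|: minimizing over c_kj soft-thresholds the cluster mean at lambda / (2 n_k), so gene j is dropped when every cluster mean satisfies |mean_kj| <= lambda / (2 n_k). The overlap rate is taken here as the fraction of subjects assigned to the same cluster in two consecutive runs after matching cluster labels by permutation. The function names screen_genes and overlap_rate are hypothetical, and the rule for changing lambda is not reproduced.

```python
from itertools import permutations


def screen_genes(X, labels, k, lam):
    """Return indices of genes kept after L1 soft-thresholding of k-means centers.

    X: list of subjects, each a list of gene values (assumed centered per gene).
    labels: cluster index 0..k-1 for each subject.
    lam: lasso penalty (assumed form of the prespecified parameter).
    """
    kept = []
    for j in range(len(X[0])):
        for c in range(k):
            members = [row[j] for row, lab in zip(X, labels) if lab == c]
            if not members:
                continue
            mean = sum(members) / len(members)
            # A penalized center is nonzero only if |mean| exceeds lam / (2 n_k).
            if abs(mean) > lam / (2 * len(members)):
                kept.append(j)
                break
    return kept


def overlap_rate(prev, curr, k):
    """Fraction of subjects sharing a cluster in both runs, best over label permutations."""
    best = max(sum(perm[p] == c for p, c in zip(prev, curr))
               for perm in permutations(range(k)))
    return best / len(prev)
```

For four subtypes the permutation search covers only 24 label matchings, so brute force is adequate here.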
We applied our method to microarray data from 417 breast cancer patients treated at the
Cancer Hospital of JFCR to examine its performance. Additionally, we examined whether
each subject might belong to more than one subtype, based on iterative random selection
of subjects for k-means clustering analysis.
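The abstract does not say how a subject with more than one subtype was identified. One possible sketch, under stated assumptions: repeatedly draw a random subset of subjects, run k-means on it, align the resulting labels to a reference clustering of all subjects, and flag subjects whose most frequent subtype accounts for less than a cutoff share of their appearances. The subsampling fraction, the cutoff, the farthest-point initialization, and all function names are hypothetical choices, not taken from the abstract.

```python
import random
from itertools import permutations


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(X, k, rng, n_iter=100):
    # Lloyd's algorithm with farthest-point initialization (an assumption:
    # the abstract does not say how k-means was initialized).
    centers = [list(rng.choice(X))]
    while len(centers) < k:
        far = max(X, key=lambda x: min(sq_dist(x, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new = [min(range(k), key=lambda c: sq_dist(x, centers[c])) for x in X]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def align(labels, reference, k):
    # Permute cluster labels to agree as much as possible with the reference.
    best = max(permutations(range(k)),
               key=lambda p: sum(p[l] == r for l, r in zip(labels, reference)))
    return [best[l] for l in labels]


def ambiguous_subjects(X, k, reference, n_rep=200, frac=0.8, cutoff=0.8, seed=0):
    """Flag subjects whose subtype is unstable across random subsamples.

    reference: subtype labels from k-means on all subjects (assumed given).
    frac, cutoff: hypothetical tuning values, not taken from the abstract.
    """
    rng = random.Random(seed)
    n = len(X)
    m = max(k, round(frac * n))
    counts = [[0] * k for _ in range(n)]
    for _ in range(n_rep):
        idx = rng.sample(range(n), m)
        sub = kmeans([X[i] for i in idx], k, rng)
        sub = align(sub, [reference[i] for i in idx], k)
        for i, lab in zip(idx, sub):
            counts[i][lab] += 1
    flagged = [i for i, row in enumerate(counts)
               if sum(row) and max(row) / sum(row) < cutoff]
    return flagged, counts
```

The per-subject counts are returned alongside the flags so that the share of each subtype can be inspected directly rather than only thresholded.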
Results: Before the main analyses, we estimated the number of subtypes among our 417
patients using consensus clustering (Monti et al., Machine Learning, 2003) and found that
four subtypes (luminal A, luminal B, Her2-enriched, and basal-like) were stable for our
data.
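Consensus clustering (Monti et al., 2003) clusters many random subsamples and records, for each pair of subjects, how often they land in the same cluster when both are sampled; a stable number of clusters yields consensus values close to 0 or 1. The sketch below covers only the consensus-matrix step, assuming each resampled clustering is supplied as a pair of subject indices and labels. The name consensus_matrix is hypothetical, and the rule for choosing k by comparing the distribution of consensus values across candidate k is not shown.

```python
def consensus_matrix(n, runs):
    """Build an n x n consensus matrix from resampled clusterings.

    runs: list of (indices, labels) pairs, one per subsample clustering.
    Entry [i][j] = (#runs where i and j share a cluster) /
                   (#runs where both i and j were sampled), or 0.0 if never co-sampled.
    """
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for indices, labels in runs:
        for i, li in zip(indices, labels):
            for j, lj in zip(indices, labels):
                sampled[i][j] += 1
                if li == lj:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Normalizing by co-sampling counts, rather than by the total number of runs, keeps pairs that are rarely drawn together from being biased toward 0.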
To assess the prediction performance of our gene selection method, we randomly divided
the 417 subjects into a training set of 300 subjects for gene selection and a test set of 117
subjects for prediction, repeating this 1000 times. The median number of genes selected
from the 300 training subjects was 284, and the median concordance rate between the
prediction and the result of ordinary k-means clustering was 88.9% for the 117 test subjects.
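The abstract does not name the classifier used for prediction. A minimal sketch of the repeated train/test evaluation, assuming a nearest-centroid classifier on the selected genes, reference subtype labels from ordinary k-means on all subjects, and a hypothetical select_genes callable standing in for the lasso-type selection. It returns the median number of selected genes and the median concordance rate over repetitions.

```python
import random
import statistics


def nearest_centroid_predict(train_X, train_labels, test_X, genes, k):
    # Subtype centroids computed on the selected genes only (assumed classifier).
    centroids = []
    for c in range(k):
        members = [[x[g] for g in genes]
                   for x, lab in zip(train_X, train_labels) if lab == c]
        if members:
            centroids.append((c, [sum(col) / len(members) for col in zip(*members)]))
    preds = []
    for x in test_X:
        v = [x[g] for g in genes]
        dist, label = min((sum((a - b) ** 2 for a, b in zip(v, cen)), c)
                          for c, cen in centroids)
        preds.append(label)
    return preds


def median_concordance(X, labels, k, select_genes, n_train, n_rep=1000, seed=0):
    """Repeated random split; labels are reference k-means subtypes (assumed given)."""
    rng = random.Random(seed)
    rates, n_genes = [], []
    for _ in range(n_rep):
        idx = list(range(len(X)))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        train_X = [X[i] for i in train]
        train_labels = [labels[i] for i in train]
        genes = select_genes(train_X, train_labels)  # hypothetical gene selector
        preds = nearest_centroid_predict(train_X, train_labels,
                                         [X[i] for i in test], genes, k)
        rates.append(sum(p == labels[i] for p, i in zip(preds, test)) / len(test))
        n_genes.append(len(genes))
    return statistics.median(n_genes), statistics.median(rates)
```

With 417 subjects, a call such as median_concordance(X, labels, 4, select_genes, n_train=300) would mirror the 300/117 split and 1000 repetitions described above.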
In this analysis, we found that subjects with more than one subtype affected the prediction
performance. When we excluded the 53 subjects with more than one subtype, the median
accuracy rose to 92.2%, suggesting that these subjects may affect both training and
prediction.
Conclusion: We developed a novel gene selection method for multi-subtype classification
based on a lasso clustering method and confirmed its good performance in a prediction
framework. Furthermore, we found that subjects with more than one subtype affect the
prediction performance.
International Biometric Conference, Florence, ITALY, 6 – 11 July 2014