1-D DISCRETE WAVELET TRANSFORM FOR CLASSIFICATION OF
CANCER SAMPLES IN DNA MICROARRAY DATA
Adarsh Jose (1); Dale Mugler, PhD (1); Zhong-Hui Duan, PhD (2,3)
1. Department of Biomedical Engineering, The University of Akron
2. Department of Computer Science, The University of Akron
3. Integrated Biosciences Program, The University of Akron
Abstract
The main obstacle to applying supervised learning methods to the
classification of cancer samples from gene expression profiles is the
limited number of available samples. Selecting the relevant features is
therefore essential for optimizing the classification algorithms. A
feature (gene) selection method based on the 1-D discrete wavelet
transform is proposed for two-class problems in DNA microarray data.
Gene expression: The process by which information encoded in DNA is
converted into actual structures in cells. The subset of ‘expressed genes’
and their ‘expression levels’ together characterize the state of the cell.
DNA microarrays: Allow the expression levels of thousands of genes to be
measured simultaneously, so an entire genome can be probed at a single
point in time. The technique relies on base pairing between complementary
DNA and RNA strands (hybridization). Microarray technology thus
quantifies the notion of gene expression.
Classification problem of microarray data
Training sets with class labels -> Feature selection -> Training classifier -> Validation using testing set
Problem: The number of features (genes) is very large compared to the
number of samples.
Solution: Reduce the number of features by ‘selecting’ or ‘extracting’
the ‘most relevant’ features.
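The selection procedure this poster describes (level-3 wavelet approximation of each gene, then the absolute difference of class means) could be sketched roughly as follows. This is a minimal pure-Python illustration, not the authors' MATLAB implementation. It assumes a Haar wavelet, that each gene's signal is its expression across samples ordered class 1 first and class 2 second, that the score is computed on the reconstructed profile, and that odd lengths are padded by repeating the last value. The function names and the toy data are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_level_approximation(profile, levels=3):
    """Haar DWT to `levels`, discard all detail coefficients, then invert.

    Assumption: the profile is padded to a multiple of 2**levels by
    repeating its last value; the padding is trimmed before returning.
    """
    n = len(profile)
    pad = -n % (2 ** levels)
    approx = list(profile) + [profile[-1]] * pad
    # Analysis: keep only the low-pass (approximation) branch at each level.
    for _level in range(levels):
        approx = [(approx[i] + approx[i + 1]) / SQRT2
                  for i in range(0, len(approx), 2)]
    # Synthesis with zero details: each coefficient expands to two equal samples.
    for _level in range(levels):
        approx = [a / SQRT2 for a in approx for _copy in range(2)]
    return approx[:n]

def score_gene(profile, n_class1, levels=3):
    """Score = abs(mean(class1) - mean(class2)) on the smoothed profile."""
    smooth = haar_level_approximation(profile, levels)
    class1, class2 = smooth[:n_class1], smooth[n_class1:]
    return abs(sum(class1) / len(class1) - sum(class2) / len(class2))

def rank_genes(expression, n_class1, levels=3):
    """Return gene names sorted from highest to lowest score."""
    scores = {gene: score_gene(p, n_class1, levels)
              for gene, p in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (hypothetical): 8 class-1 samples followed by 8 class-2 samples.
toy = {
    "gene_a": [1.0] * 8 + [5.0] * 8,   # clear difference between classes
    "gene_b": [2.0, 4.0] * 8,          # fluctuation only, no class difference
}
ranking = rank_genes(toy, n_class1=8)
```

With this toy input the gene that separates the two classes ranks first, while the gene that only fluctuates from sample to sample scores close to zero once its detail coefficients are discarded.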
What is the wavelet transform?
Datasets
•  Leukemia dataset: 48 ALL (acute lymphoblastic leukemia) and 25 AML (acute myeloid leukemia) samples
•  B-cell lymphoma dataset: 58 DLBCL (diffuse large B-cell lymphoma) and 10 FCC samples
Results & Observations
•  The algorithm was tested for classification accuracy on
the oligonucleotide datasets using a k-nearest-neighbor (KNN)
classifier and three different validation methods, for different
numbers of variables.
•  The ‘Haar’ and ‘Bior1.5’ wavelets gave accuracies of up to 97%.
•  The average classification error was less than 11% on
both oligonucleotide datasets studied.
•  Shuffling the samples within each class had no effect
on the accuracy.
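To make the evaluation step concrete, the sketch below shows a KNN classifier scored by leave-one-out cross-validation. The poster does not say which value of k or which three validation methods were used, so k = 3 and leave-one-out are assumptions chosen only for illustration. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training samples closest to `query`."""
    order = sorted(range(len(train_x)),
                   key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(samples, labels, k=3):
    """Hold out each sample once, train on the rest, count correct votes."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, samples[i], k) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy data (hypothetical): two selected genes per sample, two classes.
samples = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
           [5.0, 5.1], [4.8, 5.2], [5.1, 4.9]]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
accuracy = leave_one_out_accuracy(samples, labels, k=3)
```

In a sketch like this, ranking the genes again inside each fold (on the training samples only) would avoid an optimistic bias in the accuracy estimate. The poster does not state whether this was done.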
Procedure
1-D Discrete Wavelet Transform
•  Breaks the signal down into different frequency bands.
•  Implemented by passing the signal through a series of
half-band high-pass and low-pass filters.
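The filter-bank idea can be illustrated with the Haar wavelet, the simplest case. In the sketch below each level splits the current approximation into a low-pass (approximation) band and a high-pass (detail) band, each downsampled by two. This is a pure-Python illustration with hypothetical function names, not the MATLAB Wavelet Toolbox routine used for the poster, and it assumes the signal length is divisible by 2 at every level.

```python
import math

def haar_step(signal):
    """One Haar level: half-band low-pass and high-pass outputs,
    each downsampled by 2."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s
              for i in range(0, len(signal), 2)]
    return approx, detail

def haar_dwt(signal, levels):
    """Apply the split repeatedly to the approximation band.

    Returns (final approximation, [detail at level 1, level 2, ...]).
    """
    approx, details = list(signal), []
    for _level in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_dwt(signal, levels=3)
```

For this 8-sample signal, three levels leave a single approximation coefficient plus detail bands of 4, 2, and 1 coefficients. Because the Haar filters are orthonormal, the sum of squared coefficients equals the sum of squared samples.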
[Figure: Wavelet decomposition examples]
•  The samples were grouped into the two classes.
•  The 1-D discrete wavelet transform of each gene's profile was
taken to level 3.
•  The gene expression profile was reconstructed using only the
level-3 approximation.
•  Score = abs(mean(class1) - mean(class2))
•  Genes were ranked by their scores.
Conclusions
•  1-D discrete wavelets can capture patterns in gene-expression
data, which makes them a potential tool for feature selection.
•  A complete error-estimation study still has to be carried out
with microarray data obtained from different platforms.
References
1. Golub TR, et al. Molecular classification of cancer: class
discovery and class prediction by gene expression monitoring.
Science 1999;286. www.sciencemag.org
2. Shipp MA, Ross KN, Tamayo P, Weng AP, Kutok JL, Aguiar
RC, Gaasenbeek M, Angelo M, Reich M, Pinkus GS, Ray TS,
Koval MA, Last KW, Norton A, Lister TA, Mesirov J, Neuberg
DS, Lander ES, Aster JC, Golub TR. Diffuse large B-cell lymphoma
outcome prediction by gene-expression profiling and supervised
machine learning. Nat Med 2002 Jan;8(1):68-74.
3. MATLAB manual: MATLAB Wavelet Toolbox, MATLAB
Bioinformatics Toolbox. MathWorks.