Proteomic Classification of Liver Cancer using Artificial Neural

Classification of Liver Cancer by Artificial Neural Network
Lam Yee Hong Brian
U990209278
Proteomic Classification of Liver Cancer
using Artificial Neural Network
By
Lam, Yee Hong Brian
Supervisor: Mr. Willy Tse
May, 2005
Submitted as part of the requirements for the award of the Degree in
Computing and Information Systems of the University of London.
Acknowledgments
I gratefully acknowledge the inspiration and guidance of my supervisor, Mr. Willy Tse (HKU
SPACE, The University of Hong Kong) throughout the course of this work.
I am greatly indebted to Dr. John Luk (Department of Surgery, The University of Hong Kong) for
providing tremendous support, including proteomic data and computing facilities, for this
study.
I would also like to thank my father, my brother, Ms. Marcella Ma and Mr. Stanley Lam for their useful
advice and suggestions, and all the staff at HKU SPACE and colleagues in the Department
of Surgery for their help and co-operation.
Last but not least, I would like to express my gratitude to my family for their patience and
support throughout these years of study.
Abstract
Hepatocellular carcinoma (HCC) is one of the deadliest cancers worldwide. Recent
advances in proteomic approaches have enabled several proteome-wide studies that identify
markers and offer insights into the mechanisms of HCC development. Given the relatively
large amount of data in the proteome, modern high-order data mining techniques may provide
a systematic way to search for meaningful and biologically significant patterns and trends
hidden in the proteomic dataset.
In this study, a proteomic dataset of 132 HCC-related tumour and non-tumour samples, each
consisting of 1433 variables, was used to construct and evaluate classification models
based on the artificial neural network (ANN) and classification and regression trees (CART)
algorithms. Both algorithms successfully segregated samples into their corresponding phenotypes
with high sensitivities and specificities (ANN: 89.4%, 89.4%; CART: 80.3%, 80.3%),
highlighting the usefulness and potential of data mining techniques in genomic and
proteomic expression profiling studies.
Table of Contents
Acknowledgments
Abstract
Table of Contents
List of Figures and Tables
1 Introduction
    Pathogenesis of Cancer
    Proteomics
    Data Mining
    Rationale and objectives of study
2 Terms of Reference
    Problem domain and problem statement
    Project objectives
    Expected deliverables
    The approach of project
3 Liver Cancer Proteomics – a brief review
    Proteomic approaches & current findings
    Data mining – an emerging discipline in HCC proteomics
4 Artificial Neural Network and Uses in Proteomics
    The artificial neuron model
    ANN Architectures
    Learning Rules
    Backpropagation algorithm
    Applications of ANN in HCC proteomics
5 Implementation and training of ANN
    Proteomic data structure & characteristics
    Data pre-processing
    Implementation of ANN
    Training & Optimization
6 Results
    Training performance
    Validation performance
7 Comparison with Classification and Regression Trees Algorithm
    Implementation of CART
    The classification tree
    Training performance
    Validation performance
    Comparison of ANN and CART algorithm
8 Discussion
    Applicability of ANN based classification algorithm in HCC proteomics
    ANN & CART: Which is better?
    Future Directions
9 Conclusion
References
Appendix
    Outputs of different classification models constructed in this study
    Companion CD
List of Figures and Tables
Figures
Figure 1.1 The Central Dogma of biology.
Figure 1.2 The cell cycle.
Figure 1.3 Proteomic based on 2D-E approach.
Figure 3.1 Major Proteomic technologies.
Figure 3.2 Chart for marker discovery and development using proteomic technologies.
Figure 4.1 The artificial neuron.
Figure 4.2 Transfer functions.
Figure 4.3 Topology of feed-forward network.
Figure 5.1 A sample image of 2D-E gel.
Figure 5.2 Raw spots intensities versus natural log-transformation.
Figure 5.3 A dendrogram of average linkage clustering of protein and samples.
Figure 5.4 A screenshot of neural network manager.
Figure 5.5 Network topology of ANN_1.
Figure 5.6 Network topology of ANN_2.
Figure 5.7 Network topology of ANN_3.
Figure 5.8 Network topology of ANN_4.
Figure 5.9 Network topology of ANN_5.
Figure 5.10 Network topology of ANN_6.
Figure 5.11 Network topology of ANN_7.
Figure 6.1 Training curves of ANNs.
Figure 6.2 Linear regression of ANNs.
Figure 6.3 Receiver operating characteristic curves of ANNs.
Figure 7.1 The optimal classification tree constructed by CART.
Figure 7.2 Receiver operating characteristic curves for ANN and CART.
Tables
Table 5.1 An overview of the proteomic data file in tab-delimited format.
Table 5.2 Detailed settings of ANNs used in this study.
Table 6.1 Learning sensitivities and specificities of ANN built in this study.
Table 6.2 The sensitivities and specificities of ANNs built in this study.
Table 7.1 Classification trees constructed by BPS.
Table 7.2 Training sensitivities and specificities of CART.
Table 7.3 Estimated sensitivities and specificities of CART.
Table 7.4 The sensitivities and specificities of CART built in this study.
Table 7.5 The area under curve of ANN_6, ANN_6 + hard-limit transformation and CART.
Table 8.1 Major advantages and disadvantages of ANN and CART.
Table a. Validation output of ANN and CART models constructed in this study.
1 Introduction
Liver cancer is one of the most life-threatening solid tumours worldwide, with more than one
million cases diagnosed each year. In Hong Kong, liver and related malignancies account for
12.5% of cancer cases and rank as the second leading cause of cancer death, with a mortality
rate of 21 per 100,000 population. Major risk factors for hepatocellular carcinoma (HCC) include chronic hepatitis
virus infections, in particular hepatitis B and hepatitis C; cirrhosis caused by either hepatitis
or alcoholism [1]; and chronic exposure to various cytotoxic substances such as arsenic [2]
and polyvinyl chloride (PVC) [3].
The geographical distribution of this cancer is uneven, with the highest incidence rates in Eastern
and South-Eastern Asia, sub-Saharan Africa and Melanesia [4, 5]. The remarkable variation in
incidence between regions coincides with the prevalence of chronic hepatitis
infections, as well as with dietary habits and environmental and genetic factors [6].
Liver cancer is usually diagnosed at a late stage of the disease, when there are few
effective treatment options, and the prognosis for patients with HCC is very poor. Currently,
hepatic resection remains the standard treatment for HCC; nevertheless, the procedure is
often insufficient because of the low resectability rate. In addition, recurrence occurs in
most cases (>60%) after resection, and life expectancy is short, about 6 months from
the time of diagnosis [7, 8].
To improve the management of liver cancer, there is an urgent need to
develop models for early cancer detection, as well as to understand the underlying
mechanisms of liver cancer development. Proteomic approaches (studies of protein
expression profiles) have recently gained popularity for providing a global view of protein
expression in a variety of studies, and they offer an excellent opportunity to uncover
the biologically significant patterns (or fingerprints) specific to liver cancer.
Recently, several proteomic studies have been conducted on cultured cells, tissues and
blood from liver cancer patients using different approaches [9-13]. Several proteins have
been identified, and most of them have been implicated in the development of
cancer, as well as in drug responses and toxicities. However, to the best of our knowledge, the
biological significance of the proteins found in these studies remains to be elucidated. In
addition, neither a liver cancer-specific pattern nor a reliable classification model has been
established.
Data mining is a broad set of statistical and analytical techniques for discovering hidden data
attributes, trends and patterns in large databases. Since the mid-nineties, data mining has been
gaining ground and is increasingly used in marketing, environmental, banking,
commercial and several medical applications.
In this study, data on the protein expression profiles of 66 patients were obtained from the
University of Hong Kong Medical Centre with permission. Preliminary examination of the
data revealed more than 1400 proteins, of which more than 90 showed changes with
statistical significance of p < 0.01. We employed an artificial neural network (ANN) to construct a classification
model to distinguish between normal and tumour tissues. In addition, the
discriminative performance of the ANN was compared with that of classification models built by
the classification and regression trees (CART) algorithm using the proteomic patterns generated from 2D-E.
Pathogenesis of Cancer
The Central Dogma of biology
The Central Dogma describes a series of complex processes: the generation of RNA from
the genetic code in genomic DNA by means of “transcription”, and the subsequent production of
a protein using the RNA as a template through the process of “translation”. Proteins and some
specialized RNAs (such as tRNA) are the functional units (machines) that carry out biological
functions (Figure 1.1).
DNA
  | Transcription: making a functional copy of genetic material
  v
RNA -> Function (in some cases, RNA itself is functional)
  | Processing of mRNA transcript + Translation: use the mRNA as template to build proteins, the functional unit of gene
  v
Protein
  | Post-translational modifications: final steps to make proteins functional
  v
Function
Figure 1.1 The Central Dogma of biology
The Central Dogma of biology describes the process by which a gene is transcribed from DNA to RNA and
then translated into protein to carry out functions.
In biology, the Central Dogma acts as the basis for most biological
processes; most studies of biological systems therefore focus on variations in the
entities involved in the dogma, in particular DNA sequences, the expression profiles of
RNA (by means of gene chips) and those of proteins (proteomic profiling).
The Cell-Cycle
The cell cycle is the recurring sequence of events that includes the duplication of a cell's
contents and its subsequent division. The cell cycle is divided into two major phases, namely
interphase and mitosis.
During interphase, the appropriate cellular components are copied. Interphase is made up of three
distinct sub-phases: G1, S, and G2. The G1 and G2 phases serve as checkpoints for the cell to
make sure that it is ready to proceed in the cell cycle, while S phase involves DNA
synthesis and the replication of chromosomes. Mitosis is the part of the cell cycle in which the cell
prepares for and completes cell division. The cell cycle is summarized in
Figure 1.2.
The cell cycle is a highly regulated process in which cell division and growth are tightly
controlled (i.e. the cell proceeds through or suspends the cycle depending on need). All the
machinery involved in the cell cycle, including proteins and chemical factors, is “turned on
or off” at specific times with high precision during the different phases of the process. In cancer,
however, some of this machinery malfunctions, resulting in a loss of control over cell
division and hence in either cell death or tumour growth.
Figure 1.2 The cell cycle
The cell cycle is divided into interphase, which is further divided into G1, S, and G2 sub-phases, and mitosis.
Interphase involves growth and preparation of genetic materials and machineries for cell division; and mitosis
involves the final steps of cell cleavage from one to two daughter cells. (Courtesy of Randy Poon, Department
of Biochemistry, The Hong Kong University of Science & Technology)
Cancer is a Result of Uncontrolled Cell Growth
Cancer is a class of diseases characterized by abnormal and uncontrolled (malignant) growth
of cells through the dysregulation of the cell cycle. The resulting mass, also known as a tumour
or neoplasm, can invade and destroy surrounding normal tissues. Cancer cells from the
tumour can also spread through the bloodstream or lymph system to start new cancers in
other parts of the body; this process is known as metastasis.
As mentioned earlier, cancer is often caused by malfunctions of the machinery involved
in the cell cycle, and in most cancer cases multiple dysfunctions of the cell
cycle process contribute to tumour growth.
Different cell types have different tendencies to become cancerous: the more actively a cell
cycles, the more likely it is to mutate and become cancerous. In the liver, hepatocytes,
which make up more than 90% of the organ mass, are highly active cells that are involved in
metabolism and detoxification processes. Cancer arising from the dysregulated growth of
hepatocytes is called hepatocellular carcinoma, or simply HCC. HCC is the most frequent
type of liver cancer and is one of the most common solid malignancies, together with
nasopharyngeal cancer (NPC) and colorectal cancer (CRC).
Proteomics
A disease arises when a gene or protein is over- or under-expressed, or when a mutation in a
gene results in a malformed protein, altering the protein's normal
function. In cancer, it is often the case that some genes or proteins are expressed in an
abnormal fashion, especially genes that are involved in the regulation of the cell cycle.
Because proteins (the machinery) are the end products and functional units of genes, it is of
great importance to study their expression profiles and functions in order to decipher the
pathological process of liver cancer.
Proteomics is defined as the systematic large-scale analysis of protein expression under
normal and perturbed states (in this study, liver cancer). Proteomics generally involves the
separation, identification, and characterization of all of the proteins in an experimental sample
in a global and comprehensive manner.
There is a broad range of technologies used in proteomics, but the central paradigm has been
the use of 2-D gel electrophoresis (2D-E) followed by mass spectrometry (MS). 2D-E is used
to separate the proteins first by intrinsic charge and then by molecular size, while MS is used
to identify them from their peptide mass fingerprints. Proteomic profiling using the 2D-E
approach is summarized in Figure 1.3.
Protein extraction from excised liver tissues
-> 1st dimension isoelectric focusing by pI
-> 2nd dimension gel electrophoresis
-> Image analysis and spot data warehousing
-> Data mining
-> MALDI-TOF/MS - MS/MS protein identification
Figure 1.3 Proteomics based on 2D-E approach
1) Proteins were extracted from tumour/non-tumour tissues and the protein concentration was quantified;
2) proteins were loaded onto 11 cm gradient gel strips for 1st dimension separation according to their intrinsic
charges over a range of pH 4-7;
3) the gel strips were subsequently subjected to 2nd dimension electrophoresis according to molecular weight;
4) the silver-stained 2D-E gels were digitized using a calibrated densitometer and the spots on the gels were
matched using specialized software. The spot data were then sent to a database for data warehousing;
5) the spot data were extracted from the database and subjected to data mining analyses;
6) meaningful spots were subjected to MALDI-TOF MS or MS/MS for identification based on trypsin-digested
peptide masses.
Large-scale approaches, including genomics and proteomics, can generate much more useful
information than the traditional hypothesis-driven approach to the study of biology. The two
approaches are not mutually exclusive, however; indeed, broad hypotheses can be formed
by selecting the appropriate data from an -omics experiment through the proper use of data
mining techniques, which are discussed in the following section.
Data Mining
Data mining is a set of analytical techniques for discovering hidden data attributes, trends and
patterns in large datasets, and it provides valuable insights into the function or environment of
certain scenarios. By uncovering the patterns and trends in a dataset, data mining also
makes the prediction of future events possible.
Data mining methods are all based on induction-based learning [14], the process of forming a
general concept definition by observing specific examples of the concept to be learned. Data
mining in general can be divided into four major steps:
1. Assemble a collection of data (from data warehouse) to analyze;
2. Transfer data to data mining software for analysis;
3. Interpret the results to examine whether what has been discovered is useful;
4. Apply what has been discovered to new situations.
Data mining strategies
Data mining can be classified into two major strategies, namely unsupervised and supervised.
Unsupervised clustering builds models from data without predefined classes, while supervised
learning builds models by using input attributes to predict output attribute values. Supervised
learning strategies, which include classification, estimation and prediction, can be further
divided according to whether the output attributes are categorical or numeric. The hierarchy of
data mining strategies is shown in Figure 1.4.
Data Mining Strategies
    Unsupervised Clustering
    Supervised Learning
        Classification
        Estimation
        Prediction
Figure 1.4 A hierarchy of data mining strategies
Unsupervised data mining techniques
Hierarchical clustering analysis
Clustering analysis was first used by Tryon [15] in 1939. It encompasses a number of
different algorithms for grouping objects of a similar kind into respective categories by
developing taxonomies of objects according to their degree of association. In hierarchical
clustering analysis, the similarity or dissimilarity between objects is measured by the Euclidean
distance:
\[
\mathrm{distance}(x, y) = \Bigl\{ \sum_{i} (x_i - y_i)^2 \Bigr\}^{1/2}
\]
Clustering analysis is used to discover structures in data without supervision; in other
words, it discovers structures in data without explaining why they
exist.
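The distance measure and the grouping step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in this study: the function names, the choice of average linkage as the merging rule, and the toy profiles are all assumptions made for the example.

```python
import math
from itertools import combinations


def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until
    n_clusters remain. The distance between two clusters is the mean
    pairwise Euclidean distance between their members (average linkage,
    an assumed choice for this sketch)."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy profiles with made-up values (not data from this study).
toy_profiles = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 2.9],
    [5.0, 5.5, 6.0],
    [5.2, 5.4, 6.1],
]
toy_clusters = average_linkage(toy_profiles, n_clusters=2)
```

In this toy case the two low-intensity profiles and the two high-intensity profiles end up in separate groups, without any class labels having been supplied.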
Supervised data mining techniques
Statistical regression
Statistical regression is a supervised learning technique that generalizes a set of numeric data
by creating a mathematical equation relating one or more input attributes to a single numeric
output attribute. Popular statistical regression techniques include linear regression, in which the
value of an output attribute is determined by a linear sum of weighted input attribute values:
x = a * y + b * z + k
where x is the output; y and z are inputs; a and b are weights; and k is a constant.
Logistic regression belongs to the generalized linear model, a broad class of models that
includes ordinary regression and ANOVA, as well as ANCOVA and log-linear regression. Logistic regression
allows one to predict a discrete outcome, such as group membership, from a set of variables
that may be continuous, discrete, dichotomous, or a mix of any of these. Generally, the
dependent or response variable is dichotomous, such as presence/absence or success/failure.
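As a minimal sketch of how a fitted logistic regression model might assign group membership, the weighted linear sum of the inputs can be passed through the logistic function and the resulting probability compared with a cut-off. The function names, the weights and the 0.5 cut-off are hypothetical values chosen for illustration; estimating the weights from data is not shown.

```python
import math


def logistic(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(inputs, weights, intercept):
    """Probability of the positive class: logistic of the weighted linear sum."""
    z = intercept + sum(w * v for w, v in zip(weights, inputs))
    return logistic(z)


def predict_class(inputs, weights, intercept, cutoff=0.5):
    """Dichotomous outcome: 1 if the probability reaches the cut-off (assumed 0.5)."""
    return 1 if predict_probability(inputs, weights, intercept) >= cutoff else 0
```

With hypothetical weights of 1.0 and -1.0 and a zero intercept, an input of (2.0, 1.0) gives a linear sum of 1.0 and is assigned to class 1, whereas (0.0, 3.0) gives -3.0 and is assigned to class 0.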
Artificial neural network
An artificial neural network (ANN) is a computational data mining tool based on the basic
neural structure and learning model of the brain [16]. An ANN learns from experience and stores
information as patterns, just as human brains do. Unlike traditional computer programmes,
an ANN offers a new way of solving complex classification problems by developing a
pattern from the learning data set through alterations in the neuronal inputs (in our case, the
protein expressions) and the weights in the perceptron layers, and by assigning class outcomes based
on the pattern built. Details of ANN are discussed in Chapter 4.
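To make the idea of weighted neuronal inputs concrete, the following Python sketch passes one input pattern forward through a small layered network. It is an assumption-laden illustration rather than the network built in this study: the log-sigmoid transfer function, the layer sizes and all weights are made up, and the learning step that adjusts the weights (backpropagation) is not shown.

```python
import math


def log_sigmoid(z):
    """Log-sigmoid transfer function (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias, transfer=log_sigmoid):
    """One artificial neuron: weighted sum of inputs plus bias, then transfer."""
    return transfer(bias + sum(w * x for w, x in zip(weights, inputs)))


def layer(inputs, weight_rows, biases, transfer=log_sigmoid):
    """A layer is a list of neurons that all see the same inputs."""
    return [neuron(inputs, w, b, transfer) for w, b in zip(weight_rows, biases)]


def feed_forward(inputs, layers):
    """Propagate inputs through successive layers.

    `layers` is a list of (weight_rows, biases) pairs, one pair per layer.
    """
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations


# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output neuron.
toy_network = [
    ([[0.5, -0.5, 0.0], [0.0, 0.5, -0.5]], [0.0, 0.0]),
    ([[1.0, -1.0]], [0.0]),
]
toy_output = feed_forward([1.0, 1.0, 1.0], toy_network)
```

The single output lies between 0 and 1 and could be read as a class score; during learning, the weights would be altered so that this score approaches the known class of each training sample.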
Classification & regression trees
Classification and regression tree (CART) analysis, on the other hand, is a statistically based,
tree-structured data mining technique proposed by the statistician Breiman and colleagues in 1984 [17, 18].
CART is a recursive-partitioning algorithm that builds a decision tree by recursively identifying a set of
if-then logical (univariate split) conditions through an exhaustive search over all
variables in the dataset, and it permits accurate prediction or classification of cases using a rather
simple and easy-to-understand graphical representation.
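The exhaustive search for a single univariate split can be sketched as follows. This is a simplified illustration under stated assumptions: the Gini impurity is assumed as the splitting criterion, class labels are coded 0/1, and the function names and toy data are hypothetical. A full tree would apply the same search recursively to each resulting subset, followed by pruning, neither of which is shown.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Search every variable and every midpoint between observed values.

    Returns (variable_index, threshold, weighted_impurity) for the rule
    "if row[variable_index] <= threshold then left, else right",
    or None when no variable has two distinct values to split on.
    """
    n = len(rows)
    best = None
    for var in range(len(rows[0])):
        values = sorted(set(r[var] for r in rows))
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[var] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[var] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (var, threshold, score)
    return best


# Toy data: 4 samples x 2 variables, labels 0 = non-tumour, 1 = tumour (made up).
toy_rows = [[1.0, 7.0], [2.0, 3.0], [8.0, 6.0], [9.0, 2.0]]
toy_labels = [0, 0, 1, 1]
toy_split = best_split(toy_rows, toy_labels)
```

In the toy data the first variable separates the two classes perfectly at a threshold of 5.0, so the search returns that variable with a weighted impurity of zero.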
Rationale and objectives of study
There is an urgent need to build a reliable and reproducible classification model for early
detection in order to provide better treatment for patients suffering from liver cancer. Models
based on proteomic profiles may therefore offer an excellent opportunity to
improve medical diagnosis and prognosis, and thus the quality of life of
patients suffering from HCC.
In this study, an ANN was implemented to construct a classification model based on the liver
cancer proteomic dataset in order to distinguish between normal and tumour tissues. In
addition, the discriminative performance of the ANN was compared with that of classification
models built by the CART algorithm.
2 Terms of Reference
Problem domain and problem statement
⇒ To classify tissues as tumour or normal based on their 2D-E proteomic profiles using
an artificial neural network;
⇒ To design and implement a neural network based classification model that achieves
reliable, sensitive and specific prediction of different disease phenotypes;
⇒ To carry out a comparative study of classification models built by different supervised learning
algorithms such as ANN and CART.
Project objectives
To build a classification model using artificial neural network technology in order to
differentiate cancerous tissues from normal tissues based on their protein expression patterns. In
addition, the discriminative performance of the ANN will be compared with that of classification
models built by the CART algorithm.
Expected deliverables
⇒ Literature review of current research on liver cancer and biomarker discovery, and of the
technologies and approaches that have been employed;
⇒ Literature review of major data mining approaches;
⇒ The neural network based classification model, which includes
o Conceptual design and implementation;
o Optimization;
o Evaluation of sensitivities and specificities;
⇒ Comparison with another known data mining model
o Classification & Regression Trees (CART).
The approach of project
⇒ Literature reviews will be based on articles and reviews published in
internationally recognized journals;
⇒ The protein expression dataset will be obtained from the work place with the supervisor’s agreement;
⇒ An artificial neural network (feed-forward backpropagation network) will be built using
MATLAB 7.0 Release 14 (it will consist of ~100 inputs and 2-3 perceptron layers);
⇒ Validation will be carried out by statistical means such as receiver operating characteristic (ROC) curves;
⇒ A CART based classification model will be implemented for comparison with the ANN on
discriminative performance.
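The validation measures named above can be illustrated with a short Python sketch. It is not the validation code of this project: the function names, the 0.5 hard-limit threshold and the toy scores are assumptions, and both classes are assumed to be present so that no division by zero occurs.

```python
def hard_limit(scores, threshold=0.5):
    """Turn continuous model outputs into 0/1 calls (threshold is assumed)."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(actual, predicted):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(actual, scores):
    """(1 - specificity, sensitivity) pairs as the threshold is lowered."""
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(actual, hard_limit(scores, threshold))
        points.append((1.0 - spec, sens))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy example: 1 = tumour, 0 = non-tumour; scores are made-up model outputs.
toy_actual = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
toy_sens, toy_spec = sensitivity_specificity(toy_actual, hard_limit(toy_scores))
toy_auc = area_under_curve(roc_points(toy_actual, toy_scores))
```

Sensitivity and specificity describe performance at one chosen threshold, whereas the area under the ROC curve summarizes performance over all thresholds, which is why both are useful when comparing two classifiers.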
3 Liver Cancer Proteomics – a brief review
The poor survival of patients with HCC is related to a lack of reliable tools for early
diagnosis. There is therefore an urgent need to discover and refine markers (or
fingerprints) that are highly sensitive and specific, in order to promote better HCC detection,
earlier intervention and successful treatment, thus improving long-term outcomes.
Proteomics, or the study of protein expression, has recently gained much popularity due to
its unique ability to delineate global changes in protein expression patterns that reflect
the cellular events taking place in the transformation from the normal to the disease state. In HCC
in particular, proteomics has been aimed at identifying changes in protein expression,
structure, modifications and sub-cellular localization [19]. Some of these changes may
indicate the formation of cancer and have thus generated great interest in the discovery of novel
markers for the detection of cancer.
In the last few years, remarkable advances have been made in several proteomic technologies
[20-22], which allow the proteome to be investigated in a more refined,
reliable and reproducible manner.
Proteomic approaches & current findings
Proteomics involves the combination of high-resolution protein separation, identification,
data warehousing and data analysis techniques. To date, most studies have focused on
the first two components, while the latter two are now starting to gain attention, as the
great amount of data being generated has led to an increasing need for better management
and more efficient use of data.
Nowadays, 2D gel electrophoresis followed by mass spectrometry (2D-E/MS), liquid
chromatography – tandem mass spectrometry (LC-MS/MS), surface enhanced laser
desorption ionization (SELDI, also known as protein chip) and protein microarray are the
four major technologies used in the study of proteomics. Chignard and Beretta [19] have
written an excellent review of several proteomic technology platforms (Figure 3.1) and
strategies for marker discovery (Figure 3.2) currently being adopted by several research
laboratories. However, detailed descriptions of these platforms are beyond the scope of this
report and are not included here.
Figure 3.1 Major Proteomics Technologies
The figure shows the 4 major proteomic technology platforms being used by most research laboratories: (A)
Two-dimensional gel electrophoresis (2D-E); (B) multidimensional protein identification technology; (C)
surface-enhanced laser desorption ionization (SELDI); and (D) protein microarray. (Adopted from [19])
Figure 3.2 Chart for marker discovery and development using proteomics technologies
(Top) The different proteomic strategies used to date for hepatocellular carcinoma marker discovery. (Bottom)
Subsequent steps toward test development and clinical validation. Abbreviations: SELDI, surface-enhanced
laser desorption ionization; 2D-PAGE, 2-dimensional polyacrylamide gel electrophoresis; 2D-LC,
2-dimensional liquid chromatography; LCM, laser capture microdissection. (Adopted from [19])
Several proteomic studies of HCC have been carried out by different groups in the past few
years [9, 12, 13, 22-28]. Proteins of different classes have been found to correlate with the
pathogenesis of HCC, and some of them have now been put on the list of new candidate markers
for the detection of HCC, pending clinical validation.
Because of the enormous amount of data generated from experiments, data warehousing is one of
the major issues in easing access to information in proteomics. Cho et al. [11] in
Korea attempted to build an HCC proteome database based on the Pedro standard [29, 30], which
was made available through the Internet (Available: http://yprcpdb.proteomix.org/).
Data mining – an emerging discipline in HCC proteomics
With the large pool of data available from the HCC proteome, data mining is gaining
popularity and is being used by several teams [27, 31, 32]. Common data mining techniques,
including unsupervised clustering of proteins and classification analyses (such as CART,
logistic regression and ANN), have been used recently to identify HCC-specific
protein markers, or patterns of protein expression that can act as a fingerprint for the
detection of HCC.
Data mining techniques are being further optimized and advanced in order to establish more
reliable, sensitive and specific methods for the detection of HCC, with the aim of improving
the medical management of HCC and the quality of life of patients.
4 Artificial Neural Network and Uses in Proteomics
The artificial neural network (ANN) is a branch of artificial intelligence research in
computer science. It tries to mimic the fault-tolerance and learning properties of the biological
brain. ANNs can be regarded as multivariate nonlinear analytical tools, and are
known to be good at recognizing patterns in noisy and complex data and at estimating their
nonlinear relations.
According to the DARPA Neural Network Study [33], ANN is defined as:
A neural network is a system composed of many simple processing elements operating in
parallel whose function is determined by network structure, connection strengths, and the
processing performed at computing elements or nodes.
The idea of ANN was introduced in 1943 by McCulloch and Pitts [34], based on the knowledge of
neurology at that time. Their models made several assumptions about how neurons
worked. Their networks were based on simple neurons which were considered to be binary
devices with fixed thresholds. The results of their model were simple logic functions such as
"a or b" and "a and b". In the 1950s and 1960s, different models of ANN emerged, such as the
Perceptron developed by Rosenblatt [35] and ADALINE (ADAptive LInear Element)
developed by Widrow and Hoff in 1960 [36].
There was a major dispute in the development of ANN in 1969, in which the idea was
criticized with the remark "...our intuitive judgment that the extension (to multilayer systems)
is sterile" [37], and research in ANN was largely suspended. Nevertheless, ANN regained its
momentum in the 1980s, and it has been widely used since the DARPA neural network study [33].
Nowadays, ANN is applied
in a variety of applications in areas such as aerospace, banking, finance, securities and speech,
as well as medicine [38].
The artificial neuron model
An ANN is composed of a network of artificial neurons. An artificial neuron is designed to
mimic the functions of neurons in the biological brain. The artificial neuron model consists
of inputs, weights, a bias and a transfer function. A typical artificial neuron (based on [38]) is
depicted below:
Figure 4.1 The artificial neuron. (Adopted from [38])
The output of a neuron depends on the neuron's inputs and on its transfer function. In the
neuron model above, the input p is transmitted through a connection that multiplies it by the
weight w, and the bias b is added to the product wp. The transfer function f, which is usually
a step function or a sigmoid function, takes the net input n = wp + b and
produces the output a = f(n). There are various kinds of transfer function, including the
hard-limit, linear and log-sigmoid transfer functions.
Figure 4.2 Transfer functions: Hard-limit (left); linear (middle); log-sigmoid (right). (Adopted from [38])
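The single-input neuron described above can be sketched in a few lines of Python. This is only an illustration of a = f(wp + b); the function names follow the transfer-function names used in this chapter, the hard-limit function is assumed here to return 0 or 1, and the numerical values of p, w and b are arbitrary.

```python
import math

def hardlim(n):
    """Hard-limit transfer function (assumed here to return 0 or 1)."""
    return 1.0 if n >= 0 else 0.0

def purelin(n):
    """Linear transfer function: the output equals the net input."""
    return n

def logsig(n):
    """Log-sigmoid transfer function: squashes n into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def neuron(p, w, b, f):
    """Single-input neuron: net input n = w*p + b, output a = f(n)."""
    n = w * p + b
    return f(n)

# Arbitrary illustrative values: n = 0.5 * 2.0 - 1.0 = 0
a = neuron(p=2.0, w=0.5, b=-1.0, f=logsig)
```

With these values the net input is zero, so the log-sigmoid neuron sits exactly at the midpoint of its output range.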
ANN Architectures
ANNs can be classified into two major topologies: feed-forward neural networks and recurrent
neural networks. In a feed-forward network, data flow from input to output in one direction
only; in contrast, in a recurrent network, the output of a neuron layer may be fed back to
a previous layer as input. Figure 4.3 shows the topology of a typical feed-forward network.
Feed-forward networks cannot perform temporal (time-dependent) computation, for which a
recurrent network is required.
Figure 4.3 Topology of a feed-forward network.
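The one-directional data flow of a feed-forward network can be illustrated as a loop over layers, each layer taking the previous layer's output as its input. The sketch below is a minimal illustration, not the MATLAB implementation used in this study; the layer sizes and weights are hypothetical, and the tan-sigmoid function is taken to be the hyperbolic tangent.

```python
import math

def layer_output(inputs, weights, biases, transfer):
    """Output of one layer: each neuron applies transfer(sum(w * x) + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        net = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(transfer(net))
    return outputs

def feed_forward(inputs, layers):
    """Pass the signal through the layers in order; nothing is fed back."""
    signal = list(inputs)
    for weights, biases, transfer in layers:
        signal = layer_output(signal, weights, biases, transfer)
    return signal

def purelin(n):
    return n

tansig = math.tanh  # tan-sigmoid transfer function

# Hypothetical 2-input network: 2 tan-sigmoid hidden neurons, 1 linear output
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0], tansig),
    ([[1.0, 1.0]], [0.25], purelin),
]
output = feed_forward([1.0, 1.0], layers)
```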
Learning Rules
A learning (or training) rule is the procedure for modifying the weights and biases, i.e. w and
b, of an ANN. Learning rules can be divided into two major categories, namely supervised and
unsupervised. Depending on the nature of the problem, both learning methods are widely used.
In unsupervised learning, the weights and biases are modified in response to network inputs
only, and there is no target output available. Most algorithms falling in this category perform
clustering operations, which categorize the input patterns into a number of classes.
In supervised learning, the weights and biases of the ANN are adjusted by means of a training
set, in which the inputs and the corresponding correct outputs (targets) are provided. As the
inputs are applied to the network, the network outputs are compared to the targets and
adjustments are
made in order to move the outputs closer to the targets. Supervised learning is also known as
inductive learning and the rule is usually used in classification tasks.
Backpropagation algorithm
Backpropagation is one of the supervised learning rules. It was created by generalizing the
Widrow-Hoff learning rule [36] to multiple-layer networks and nonlinear differentiable transfer
functions. Backpropagation is a gradient descent algorithm in which the network weights are moved
along the negative of the gradient of the performance function, which is calculated by
comparing network outputs with the corresponding targets. The term backpropagation refers to the
manner in which the gradient is computed and fed back through the network in order to make
adaptive changes to the weights and biases.
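The gradient descent idea can be illustrated on the simplest possible case, a single linear neuron, where the gradient of the mean square error with respect to w and b can be written down directly. In a multiple-layer network the same update is applied after the gradient has been propagated backwards through the layers by the chain rule, which is not shown here. The learning rate, number of epochs and training data below are arbitrary assumptions.

```python
def train_linear_neuron(samples, learning_rate=0.1, epochs=500):
    """Gradient descent on the mean square error of a = w*p + b."""
    w, b = 0.0, 0.0
    count = len(samples)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for p, target in samples:
            error = target - (w * p + b)
            # Partial derivatives of mean((target - a)^2) w.r.t. w and b
            grad_w += -2.0 * error * p / count
            grad_b += -2.0 * error / count
        # Move along the negative of the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical targets generated by t = 2p + 1
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w, b = train_linear_neuron(samples)
```

For this toy data set the weight and bias approach 2 and 1 respectively.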
Applications of ANN in HCC proteomics
Several types of ANN are being used in the area of medicine. The major uses
of ANN include classification analyses in medical diagnosis, as well as self-organizing maps
in gene/protein clustering analyses in sample profiling studies [31, 32]. In this
study, a multiple-layer perceptron (MLP) based feed-forward neural network model was
implemented and trained using the backpropagation learning algorithm in order to distinguish HCC
from normal tissues based on their proteomic profiles.
5 Implementation and training of ANN
Proteomic data structure & characteristics
Proteomic data format
According to the data warehouse specification, the data were stored in database tables using
MS-Access as the DBMS. In order to retrieve data from the database, the proteomic data were
exported into a tab-delimited text file with the attributes shown in Table 5.1.
Table 5.1 An overview of the proteomic data file in tab-delimited format.
The first column lists the protein spots; each remaining column holds the spot intensities of an
individual sample (in ppm).
Spot ID     Sample 1    Sample 2    …
000001      5.2         3971.9      …
000002      1512        12341.3     …
…           …           …
For each individual sample, the proteomic profile consisted of a list of spot intensities,
and this list acted as a proteomic fingerprint for classification analysis. In this study, data
from 132 samples with 1433 spots each were utilized in the implementation and evaluation of the
ANN. (Attached on the bundled CD, filename: proteome.xls)
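As an illustration of the file layout in Table 5.1, the following sketch reads a tab-delimited table into one intensity profile per sample. It assumes exactly the layout shown (a header row, a spot ID in the first column, one column per sample); the function name and the in-memory example text are hypothetical.

```python
import csv
import io

def read_spot_table(text):
    """Parse tab-delimited text into {sample name: {spot ID: intensity}}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    sample_names = header[1:]  # the first column holds the spot IDs
    profiles = {name: {} for name in sample_names}
    for row in reader:
        spot_id = row[0]
        for name, value in zip(sample_names, row[1:]):
            profiles[name][spot_id] = float(value)
    return profiles

# The two rows shown in Table 5.1
example = (
    "Spot ID\tSample 1\tSample 2\n"
    "000001\t5.2\t3971.9\n"
    "000002\t1512\t12341.3\n"
)
profiles = read_spot_table(example)
```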
Protein spot intensities & non-linearity
The intensity of each protein spot was measured according to the darkness of the silver stain on
the gel. (Silver reacts with lysine residues, a type of amino acid residue which exists in almost
every protein, and becomes dark brown in colour, as shown in Figure 5.1.) The darkness was
quantified by a densitometer and the values were expressed in ppm (parts per million).
Figure 5.1 A sample image of 2D-E gel.
Proteins were separated by means of intrinsic charge and molecular weight and formed spots on gel. In order to
visualize the proteins, silver was used to stain the lysine residues to give dark brown color.
The intensity of a protein spot did not follow a linear relationship with the quantity of protein,
and the distribution of spot intensities did not follow a normal distribution (Figure 5.2, left).
The highly skewed nature of spot intensities might give rise to inaccurate statistical results,
for example in the mean and variance; even the non-parametric Wilcoxon test assumes
symmetry in outcome variables [39]. In order to prevent such inaccuracies, a log-transformation
was applied prior to analysis, as recommended in [40].
Figure 5.2 Raw spots intensities versus natural log-transformation.
The highly skewed distribution of raw spot intensities (left) and the effect of log-transformation (right). Adopted
from [40]
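A natural log-transformation of a list of spot intensities can be sketched as below. How absent spots (zero intensity) were handled is not stated in this report, so marking them as missing (None) is purely an assumption of this example.

```python
import math

def log_transform(intensities):
    """Natural log of each positive intensity.

    Non-positive values have no logarithm; they are marked as missing
    (None) here, which is an assumption of this sketch.
    """
    return [math.log(x) if x > 0 else None for x in intensities]

# Intensities spanning several orders of magnitude become evenly spaced
raw = [1.0, 10.0, 100.0, 1000.0, 0.0]
transformed = log_transform(raw)
```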
Before the data were transferred to the data warehouse, the spot intensities were normalized such
that the total sum of intensities of the protein spots was the same among different samples,
based on the fact that the total amount of protein profiled for each sample was the same
(50 µg protein per sample). Normalization of the spots also removed the effects of
inter-experimental fluctuations and differential staining.
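Normalization to a common total can be sketched as follows. Scaling each sample so that its intensities sum to one million (consistent with values expressed in ppm) is an assumption of this example; the report only states that the totals were made equal among samples.

```python
def normalize_total(intensities, total=1_000_000.0):
    """Scale one sample's spot intensities so that they sum to `total`."""
    current_sum = sum(intensities)
    if current_sum <= 0:
        raise ValueError("sample has no positive intensity to normalize")
    return [x * total / current_sum for x in intensities]

# Two hypothetical samples with different overall staining strength
sample_a = normalize_total([1.0, 3.0])
sample_b = normalize_total([10.0, 30.0])
```

After scaling, the two hypothetical samples have identical profiles, which is the intended removal of overall staining differences.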
Data pre-processing
Before feeding proteomic data into the ANN, the following pre-processing procedures were
done in order to reduce noise and artifacts, and transform data into appropriate range for input:
1. Log-transformation;
2. False positive signals removal;
3. Statistical filtering using Student’s t-test;
4. Normalization of intensities into ranges of [0, 1];
5. Data conversion into MATLAB data format.
Log-transformation and Data Reduction
The proteomic data set was natural log-transformed in order to establish a symmetric and
linear distribution, and the background was subsequently subtracted.
In order to remove false positive signals, a filter was set such that only spots present in more
than 70% of the samples in either the tumour or the non-tumour group were retained. Out of 1433
spots, 1203 spots were filtered out and 230 spots were used for subsequent analyses.
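The presence filter can be expressed as a simple rule per spot. In this sketch a spot is counted as present in a sample when its intensity is greater than zero, which is an assumption; the report does not state how presence was determined.

```python
def presence(values):
    """Fraction of samples in which the spot is present (intensity > 0)."""
    present = sum(1 for v in values if v is not None and v > 0)
    return present / len(values)

def passes_presence_filter(tumour_values, non_tumour_values, threshold=0.70):
    """Keep a spot if it is present in more than 70% of either group."""
    return (presence(tumour_values) > threshold
            or presence(non_tumour_values) > threshold)

# Hypothetical spot present in 8 of 10 tumour samples and in no non-tumour sample
kept = passes_presence_filter([1.0] * 8 + [0.0] * 2, [0.0] * 10)
# Hypothetical spot present in exactly 70% and 50% of the groups
dropped = passes_presence_filter([1.0] * 7 + [0.0] * 3, [1.0] * 5 + [0.0] * 5)
```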
Further reduction was done by calculating the probability value (p) using Student’s t-test.
Only spots showing a statistically significant change, with a p-value smaller than 0.05, were
retained. After the statistical filter, 92 spots were left.
The transformation and data reduction
procedures were done in Microsoft Excel (Microsoft Corp., Redmond, WA) and are included
on the attached CD (Filename: proteome data.xls).
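For illustration, the t statistic of the two-sample Student's t-test with pooled variance can be computed as below. Only the statistic and its degrees of freedom are calculated; converting them to a p-value requires the t distribution, which a statistics package such as scipy.stats.ttest_ind provides. The example values are hypothetical, and comparing the tumour and non-tumour groups for each spot is assumed to be the use intended in this study.

```python
import math
from statistics import mean, variance

def student_t(group_a, group_b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1.0 / na + 1.0 / nb))
    return t, df

# Hypothetical log-intensities of one spot in two groups
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```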
Hierarchical clustering
Unsupervised hierarchical clustering was performed to give a general overview of the
proteomic patterns of the samples after data reduction. Using this technique, samples with
similar proteomic profiles were grouped into clusters, and samples that did not fall into the
cluster corresponding to their class were treated as outliers. Protein spots were likewise
grouped according to the similarity of their expression among samples. Clustering of the
proteomic data was performed using Cluster, developed by Eisen [41] (available:
http://rana.lbl.gov). A 2-dimensional average linkage algorithm was used to generate clusters
of samples and protein spots from the processed data, using the standard procedures stated
in the developer’s manual. Treeview, developed by the same group, was used to visualize
the clusters, and the result is shown in Figure 5.3. Samples were divided into two major
clusters, non-tumour (left cluster) and tumour (right cluster), and protein spots were also
divided into two major clusters, up-regulated (upper cluster) and down-regulated (lower
cluster).
Outliers were samples that were not grouped in the corresponding clusters
(highlighted in red), which indicates that their proteomic profiles were different from the others.
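The principle of average linkage clustering can be sketched in a few lines: starting from one cluster per sample, the two clusters with the smallest average pairwise distance are merged repeatedly. This is not the Cluster program used in this study; the Euclidean distance, the stopping rule (a fixed number of clusters) and the example profiles are assumptions of this sketch.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering; returns lists of profile indices."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average distance over all pairs drawn from the two clusters
        total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

# Four hypothetical samples, each described by two normalized spot intensities
profiles = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.95, 0.85]]
groups = average_linkage(profiles, 2)
```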
Figure 5.3 A dendrogram of average linkage clustering of protein spots (row) and samples (column).
Protein expression levels are depicted on a scale from green (down-regulated) to red
(up-regulated). Samples which were not grouped into the corresponding cluster are highlighted in
red at the top. A high resolution image is available on the CD attached at the back of the report.
(Filename: cluster.pdf)
Normalization
The spot intensities were normalized into the range [0, 1] and converted to MATLAB
format before being fed into the ANN. The file was saved on the attached CD (ann_data_set.mat).
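Scaling into the range [0, 1] is commonly done by min-max normalization, sketched below. Whether the scaling in this study was applied per spot or over the whole data set is not stated; the sketch simply scales whatever list of values it is given.

```python
def min_max_scale(values):
    """Linearly rescale values so that the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant list carries no information; map it to zeros
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]

scaled = min_max_scale([2.0, 4.0, 6.0])
```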
Implementation of ANN
MATLAB 7.0 Release 14 (The Mathworks Inc., Natick, MA) was used for the
implementation. The ANN was constructed using the “Neural Network Toolbox” bundled with
the software. A screenshot of the neural network manager (nntool) is shown below:
Figure 5.4 A screenshot of neural network manager.
Training & Optimization
Training and Simulation Datasets
The proteome data set, which comprised 66 non-tumour and 66 tumour samples (i.e. a total of 132
samples), was randomly divided into a training set and a simulation set, each consisting of
33 non-tumour and 33 tumour samples (see the sample counts given with Tables 6.1 and 6.2). The
data in the training dataset were used for training the network, while the simulation dataset
was used for evaluation of the network.
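A random division that keeps the two classes balanced in both sets can be sketched as below. The report states only that the division was random; splitting each class in half separately, and the seed used here, are assumptions of this example.

```python
import random

def stratified_split(sample_ids, labels, seed=0):
    """Randomly split each class in half: one half for training, one for simulation."""
    rng = random.Random(seed)
    training, simulation = [], []
    for cls in sorted(set(labels)):
        members = [s for s, label in zip(sample_ids, labels) if label == cls]
        rng.shuffle(members)
        half = len(members) // 2
        training += members[:half]
        simulation += members[half:]
    return training, simulation

# 132 hypothetical sample IDs: 66 tumour (T) and 66 non-tumour (NT)
ids = list(range(132))
labels = ["T"] * 66 + ["NT"] * 66
training, simulation = stratified_split(ids, labels)
```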
ANN Design & Optimizations
A total of 7 feed-forward backpropagation ANNs were designed, namely ANN_1 to ANN_7.
ANN_1, ANN_2 and ANN_3 were 2-layered networks with an increasing number of neurons (10, 20
and 30 respectively) in the 1st layer, and were trained by the Levenberg-Marquardt (LM)
algorithm, a fast variant of backpropagation based on numerical optimization, which was chosen
because it was the fastest training algorithm when compared to the others [38]. ANN_4 was
similar to ANN_2, but used the classical gradient descent algorithm (with momentum, GDM in
Table 5.2) instead of LM. ANN_5, ANN_6 and
ANN_7 were 3-layered networks with different numbers of neurons in the 1st
and 2nd layers. The detailed network topologies and settings are shown in Figures 5.5 to 5.11
and Table 5.2 respectively. The performance of the ANNs was evaluated using post-training
linear regression and receiver operating characteristic (ROC) curve analyses.
Figure 5.5 Network topology of ANN_1.
Figure 5.6 Network Topology of ANN_2.
Figure 5.7 Network Topology of ANN_3.
Figure 5.8 Network Topology of ANN_4.
Figure 5.9 Network Topology of ANN_5.
Figure 5.10 Network Topology of ANN_6.
Figure 5.11 Network Topology of ANN_7.
Table 5.2 Detailed settings for ANNs used in this study.
Tansig – Tan-sigmoid transfer function; Purelin – Linear transfer function; GDM – gradient descent
backpropagation with momentum; LM – Levenberg-Marquardt algorithm
                              ANN_1         ANN_2         ANN_3         ANN_4         ANN_5         ANN_6         ANN_7
No. of neuronal layers        2 (1 hidden)  2 (1 hidden)  2 (1 hidden)  2 (1 hidden)  3 (2 hidden)  3 (2 hidden)  3 (2 hidden)
No. of inputs                 92            92            92            92            92            92            92
No. of neurons in 1st layer   10 (Tansig)   20 (Tansig)   30 (Tansig)   20 (Tansig)   10 (Tansig)   20 (Tansig)   20 (Tansig)
No. of neurons in 2nd layer   1 (Purelin)   1 (Purelin)   1 (Purelin)   1 (Purelin)   5 (Tansig)    5 (Tansig)    5 (Tansig)
No. of neurons in 3rd layer   0             0             0             0             1 (Purelin)   1 (Purelin)   1 (Purelin)
Training function             LM            LM            LM            GDM           LM            LM            LM
Epoch                         100           100           100           10000         100           100           100
Performance goal              0             0             0             1e-30         0             0             0
6 Results
Training performance
In order to search for the best artificial neural network (ANN) classification model, 7 ANNs,
namely ANN_1, ANN_2, ANN_3, ANN_4, ANN_5, ANN_6 and ANN_7, were constructed
with increasing complexity. The training curves of the networks are shown in Figure 6.1.
Speed
All ANNs designed in this study showed convergence and reached a performance (mean square
error, MSE) of 1e-25 or lower, with the exception of ANN_4, for which the MSE was
0.0007 when the epoch limit was reached. Training time increased with network complexity:
ANN_1 took the shortest time (around 10 s¹), while ANN_7 took the longest (around 5 min¹).
Classification Performance
The learning outcomes of the ANN models are listed in Table 6.1. Linear regression analysis
was performed between the network outputs and the targets of the training dataset. The R-square
values revealed that all ANNs built were able to distinguish non-tumour and tumour samples in
the training dataset with high accuracy. In order to classify the outputs of the network into two
discrete classes for the calculation of sensitivities and specificities, the ANN output was passed
to a hard-limit transfer function f, where an input p < 0 (p being the ANN output) was defined as
non-tumour (a = -1) and p ≥ 0 was defined as tumour (a = 1). All ANNs built were able to
achieve perfect discrimination between the two classes in the training set.
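The thresholding step can be written directly from the definition above. The function names below are hypothetical and the example outputs are invented for illustration.

```python
def hard_limit(p):
    """Symmetric hard-limit: a = 1 when p >= 0, otherwise a = -1."""
    return 1 if p >= 0 else -1

def classify(outputs):
    """Map continuous ANN outputs to the two discrete classes."""
    return ["tumour" if hard_limit(p) == 1 else "non-tumour" for p in outputs]

# Hypothetical network outputs
classes = classify([-0.8, 0.0, 0.3])
```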
¹ Based on a UNIX time-sharing server with 24 x 1 GHz RISC processors and 32 GB of memory
Figure 6.1 Training curves of a) ANN_1, b) ANN_2, c) ANN_3, d) ANN_4, e) ANN_5, f) ANN_6 and g) ANN_7.
Table 6.1 Learning sensitivities and specificities of ANNs built in this study.
(No. of samples n = 33 for tumour and n = 33 for non-tumour)
                          ANN_1    ANN_2    ANN_3    ANN_4    ANN_5    ANN_6    ANN_7
Sensitivity, tumour       100%     100%     100%     100%     100%     100%     100%
Sensitivity, non-tumour   100%     100%     100%     100%     100%     100%     100%
Specificity, tumour       100%     100%     100%     100%     100%     100%     100%
Specificity, non-tumour   100%     100%     100%     100%     100%     100%     100%
R-square                  1.000    1.000    1.000    0.995    1.000    1.000    1.000
Validation performance
Sensitivities and specificities
The sensitivities and specificities of the ANN models were examined using the
simulation dataset, which was not included in the training dataset. As shown in Table 6.2,
ANN_1, ANN_2, ANN_3, ANN_6 and ANN_7 discriminated between tumour and non-tumour with
sensitivities and specificities all greater than 70%. In particular, the
discrimination performance of ANN_6 was the highest, with sensitivities of 90.91% for tumour
and 87.88% for non-tumour, and specificities of 88.24% for tumour and 90.63% for non-tumour.
A detailed list of ANN outputs is given in Table a in the appendix.
Table 6.2 The sensitivities and specificities of ANNs built in this study.
(n = 33 for tumour and n = 33 for non-tumour)
                          ANN_1    ANN_2    ANN_3    ANN_4    ANN_5    ANN_6    ANN_7
Sensitivity, tumour       72.73%   75.76%   72.73%   60.61%   63.64%   90.91%   78.79%
Sensitivity, non-tumour   84.85%   84.85%   81.82%   87.88%   87.88%   87.88%   90.91%
Specificity, tumour       82.76%   83.33%   80.00%   83.33%   84.00%   88.24%   89.66%
Specificity, non-tumour   75.68%   77.78%   75.00%   69.05%   70.73%   90.63%   81.08%
Statistical evaluation
Further evaluations of the ANNs were performed using the simulation dataset, and the
simulation outputs were analyzed using two statistical methods: firstly by linear regression and
secondly by receiver operating characteristic (ROC) curves.
Figure 6.2 Linear Regression of a) ANN_1, b) ANN_2, c) ANN_3, d) ANN_4, e) ANN_5, f) ANN_6 and g) ANN_7.
Figure 6.2 shows the linear regression between the network outputs and the targets of the
simulation set. All ANNs built showed a best linear fit with R-square (shown as R in the figure)
of over 0.5. ANN_6 had an R-square of 0.808, indicating that the network output correlated with
the target with satisfactory accuracy.
In addition, ROC analysis was performed as another statistical means of measuring the
discriminative performance of the ANNs constructed. Figure 6.3a shows the ROC curves of the
2-layered networks ANN_1 to ANN_4. An increase in the number of neurons in the hidden layer did
not correlate with an improvement in performance according to the area under the ROC curve
(AUC, where performance is best when AUC approaches 1). Instead, ANN_2, which
has 20 neurons in the hidden layer, performed best among the 2-layered networks.
ANN_4, whose training was incomplete (MSE = 0.0007) compared with the others, showed the
worst classification performance (AUC = 0.832).
For the 3-layered networks ANN_5 to ANN_7, the AUC was 0.846, 0.936 and 0.875 respectively
(values as given in the legend of Figure 6.3b). Among the three networks, ANN_6 performed best,
which was consistent with the other means of evaluation.
Figure 6.3 Receiver operating characteristic curves for
a) ANN_1, ANN_2, ANN_3 and ANN_4, b) ANN_5, ANN_6 and ANN_7.
The area under curve and the significance are listed in the legend.
Legend of panel a: ANN_1, AUC = 0.877, p = 1.4e-07; ANN_2, AUC = 0.893, p = 4.2e-08; ANN_3, AUC = 0.839, p = 2.2e-06; ANN_4, AUC = 0.832, p = 3.6e-06.
Legend of panel b: ANN_5, AUC = 0.846, p = 1.4e-06; ANN_6, AUC = 0.936, p = 1.2e-09; ANN_7, AUC = 0.875, p = 1.6e-07.
Both panels have 1 - Specificity on the horizontal axis (0.0 to 1.0), a vertical axis labelled "1 - Sensitivity" (0.0 to 1.0), and a reference line.
7 Comparison with Classification and Regression Trees Algorithm
Apart from the artificial neural network based classification algorithm, another
statistical algorithm, namely classification and regression trees (CART), was also
implemented in this study, and the discriminative performance of the two algorithms was compared
and evaluated.
Implementation of CART
Data pre-processing
Data pre-processing for CART was different from that for ANN. In the case of ANN, spots were
filtered based on their presence in the samples, and only spots showing statistically significant
changes in level were used as network inputs. For the CART algorithm, such filters may weaken
the performance due to its recursive partitioning nature.
Therefore, the only pre-treatment was log-transformation and no filter was used in the
implementation of CART, i.e. all 1433 spots were used for the construction of the CART based
classification model.
CART implementation
CART was implemented using Biomarker Pattern Software (BPS) Version 4.0 (Ciphergen
Biosystems Inc., Fremont, CA).
The GINI impurity criterion was used for splitting, and
estimation was performed using 10-fold cross-validation. As in the ANN implementation,
the same training and simulation sets were used.
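The Gini impurity used as the splitting criterion can be illustrated as follows: a node is pure (impurity 0) when it contains a single class, and a candidate split is scored by the size-weighted impurity of its two child nodes. This is a generic sketch of the criterion, not the BPS implementation, and the class counts are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given its class counts, e.g. [non-tumour, tumour]."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

def split_impurity(left_counts, right_counts):
    """Size-weighted Gini impurity of the two child nodes of a split."""
    n_left, n_right = sum(left_counts), sum(right_counts)
    weighted = n_left * gini(left_counts) + n_right * gini(right_counts)
    return weighted / (n_left + n_right)

root = gini([33, 33])                       # evenly mixed node
perfect = split_impurity([10, 0], [0, 10])  # hypothetical perfect split
partial = split_impurity([20, 10], [10, 20])
```

A split is preferred when it lowers the weighted impurity relative to the parent node.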
The classification tree
Based on CART, a number of trees were constructed, which are listed in Table 7.1. The
maximal tree was tree number 1, which consisted of 5 terminal nodes and had the lowest cost (best
performance). Tree number 2 performed equally well as tree number 1 with fewer terminal nodes;
therefore, it was chosen as the optimal tree and used for further analyses.
Table 7.1 Classification trees constructed by BPS.
The optimal tree with lowest misclassification cost was taken for analysis
Tree Number   Terminal Nodes   Cross-Validated Cost
1             5                0.273 ± 0.084
2**           4                0.273 ± 0.084
3             3                0.364 ± 0.095
4             2                0.333 ± 0.091
5             1                1.000 ± 0.000
** Optimal
The details of the optimal decision tree are shown in Figure 7.1. Classification was done
recursively from parent to its corresponding child nodes. In this study, the classification tree
consisted of 3 classifier nodes, namely SSP0026, SSP2201 and SSP3102. For every sample,
the classification scheme determined the route using the natural log-transformed intensity of
the classifier spots in a hierarchical manner and assigned a final class outcome (tumour or
non-tumour) in one of the four terminal nodes. The classification rules are listed below:
For sample going to terminal node 1 (non-tumour)
SSP0026 ≤ 9.072 and SSP2201 ≤ 3.899
For sample going to terminal node 2 (tumour)
SSP0026 ≤ 9.072 and SSP2201 > 3.899
For sample going to terminal node 3 (tumour)
SSP0026 > 9.072 and SSP3102 ≤ 6.928
For sample going to terminal node 4 (non-tumour)
SSP0026 > 9.072 and SSP3102 > 6.928
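The four rules above translate directly into nested comparisons. In the sketch below the thresholds and class assignments are taken from the rules listed in this section; the function name and argument names are hypothetical, and the arguments are the natural log-transformed intensities of the three classifier spots.

```python
def classify_sample(ssp0026, ssp2201, ssp3102):
    """Return (terminal node, class) for one sample, following the listed rules."""
    if ssp0026 <= 9.072:
        if ssp2201 <= 3.899:
            return 1, "non-tumour"
        return 2, "tumour"
    if ssp3102 <= 6.928:
        return 3, "tumour"
    return 4, "non-tumour"

# Hypothetical log-transformed intensities
node, outcome = classify_sample(ssp0026=9.5, ssp2201=2.0, ssp3102=6.0)
```

Only two of the three spots are ever consulted for a given sample, depending on which side of the SSP0026 threshold it falls.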
Root node: SSP0026 > 9.072?
  No  -> Node 2 (NT = 3, T = 27): SSP2201 > 3.899?
           No  -> Terminal node 1 (NT = 3, T = 0)
           Yes -> Terminal node 2 (NT = 0, T = 27)
  Yes -> Node 3 (NT = 30, T = 6): SSP3102 > 6.928?
           No  -> Terminal node 3 (NT = 0, T = 6)
           Yes -> Terminal node 4 (NT = 30, T = 0)
Figure 7.1 The optimal classification tree constructed by CART.
A decision tree was constructed using the GINI splitting criterion (minimal misclassification cost)
and 10-fold cross-validation. By evaluating the classifier criteria in the nodes and following the
tree path to a terminal node, a terminal class, tumour (T) or non-tumour (NT), is assigned to the
sample.
Training performance
Speed
The induction of the classification model using CART was very efficient: 5 candidate trees
were constructed by the algorithm in about 10 s².
Classification of training set and cross-validation estimation
The sensitivities and specificities of CART for the training set are listed in Table 7.2. The tree
correctly identified 96.97% of the samples in the training set. Table 7.3 shows the estimated
performance of CART based on the 10-fold cross-validation test. The predicted sensitivity and
specificity of the model were 86.37% and 86.65% respectively. The output of CART is listed
in Table a in the appendix.
² Based on a Windows PC with 1 x 2 GHz processor and 1 GB of memory, 10-fold cross-validation
Table 7.2 Training sensitivities and specificities of CART.
(n = 33 for tumour and n = 33 for non-tumour)
Training Performance (training set)
               Tumour    Non-tumour
Sensitivity    96.97%    96.97%
Specificity    96.97%    96.97%
Table 7.3 Estimated sensitivities and specificities of CART.
(n = 33 for tumour and n = 33 for non-tumour)
Estimated Performance (10-CV)
               Tumour    Non-tumour
Sensitivity    90.91%    81.82%
Specificity    83.33%    90.00%
Validation performance
As with the ANN models, the CART model was subjected to a validation test using the
same simulation dataset as the ANN, and its classification performance was evaluated. Out
of the 33 tumour samples in the dataset, CART correctly identified 27 samples and misclassified
6 as non-tumour, while 26 non-tumour samples were classified accurately and 7 were
misclassified as tumour. Table 7.4 lists the sensitivities and specificities of CART;
the model achieved an overall sensitivity of 80.31% and specificity of 80.33%,
which were lower than predicted.
Table 7.4 The sensitivities and specificities of CART built in this study.
(n = 33 for tumour and n = 33 for non-tumour)
Validation Performance (validation set)
               Tumour    Non-tumour
Sensitivity    81.82%    78.79%
Specificity    79.41%    81.25%
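The figures in Table 7.4 can be reproduced from the validation counts reported above (27 tumour samples correct and 6 missed; 26 non-tumour samples correct and 7 missed). The report does not give the formulas used; the sketch below assumes that the per-class sensitivity is the fraction of samples of a class that were correctly identified, and that the per-class "specificity" is the fraction of calls of a class that were correct, because these two definitions reproduce the tabulated values.

```python
def per_class_figures(tumour_correct, tumour_missed,
                      non_tumour_correct, non_tumour_missed):
    """Per-class figures under the definitions assumed in the lead-in."""
    return {
        # Correctly identified samples of a class / all samples of that class
        "sensitivity_tumour": tumour_correct / (tumour_correct + tumour_missed),
        "sensitivity_non_tumour":
            non_tumour_correct / (non_tumour_correct + non_tumour_missed),
        # Correct calls of a class / all calls of that class (assumed definition)
        "specificity_tumour":
            tumour_correct / (tumour_correct + non_tumour_missed),
        "specificity_non_tumour":
            non_tumour_correct / (non_tumour_correct + tumour_missed),
    }

cart = per_class_figures(27, 6, 26, 7)
percentages = {name: round(100 * value, 2) for name, value in cart.items()}
```

Under the same assumption, the overall figures quoted in the text correspond to the means of the two per-class values.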
Comparison of ANN and CART algorithm
Training Performance
The CART algorithm was more efficient than ANN: it took about 10 s to
construct the candidate trees listed in Table 7.1, whereas it took minutes to construct an
ANN with 2 hidden layers and one output layer.
Classification Performance
ROC curve analysis was performed in order to evaluate the classification performance of
ANN and CART. Figure 7.2 shows the ROC curves of ANN_6, ANN_6 after the hard-limit
transfer function, and CART, and Table 7.5 lists the corresponding AUC values of the three
models. ANN_6 achieved the highest AUC of 0.936, while CART achieved only 0.803.
Since CART classifies samples in a discrete manner, a hard-limit function with a cutoff value
of 0 was applied to the outputs of ANN_6 so that they could be compared with the discrete
output of CART. After the hard-limit transformation, Hardlim(ANN_6) achieved an AUC of 0.894,
which was still higher than that of CART.
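For a classifier with a discrete two-class output, the AUC can be computed by comparing every tumour sample with every non-tumour sample and counting ties as one half. The sketch below applies this to the CART validation counts reported earlier (27 of 33 tumour and 26 of 33 non-tumour samples correct). For Hardlim(ANN_6), the counts of 30 and 29 correct samples are not stated in the report; they are derived here from the 90.91% and 87.88% sensitivities in Table 6.2 and should be read as an assumption.

```python
def auc(positive_scores, negative_scores):
    """Probability that a positive outscores a negative; ties count one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Discrete outputs: 1 = called tumour, 0 = called non-tumour
cart_auc = auc([1] * 27 + [0] * 6, [0] * 26 + [1] * 7)
hardlim_ann6_auc = auc([1] * 30 + [0] * 3, [0] * 29 + [1] * 4)
```

With a single cutoff, this quantity equals the mean of the two per-class sensitivities, which is why the discrete models in Table 7.5 score lower than the continuous output of ANN_6.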
Figure 7.2 Receiver operating characteristic curves for ANN_6, ANN_6 + hard-limit transformation and
CART.
Sensitivity is plotted against 1 - Specificity (both from 0.0 to 1.0) for CART, ANN_6 and
Hardlim(ANN_6), together with a reference line. Diagonal segments are produced by ties.
Table 7.5 The area under curve of ANN_6, ANN_6 + hard-limit transformation and CART.
Model              AUC
CART               0.803
ANN_6              0.936
Hardlim(ANN_6)     0.894
8 Discussion
Artificial neural network (ANN), along with logistic regression and classification and
regression trees (CART), has been reported as one of the most efficient data mining techniques
in clinical practice. In this study, an attempt was made to construct a classification model
from 66 sets of tumour and non-tumour 2D-E proteomic profiles (a total of 132 samples) using
an artificial neural network. The resulting model classified liver samples as tumour or
non-tumour with 89.4% sensitivity and 89.4% specificity. In addition, the model was
statistically supported by an ROC curve with an AUC of 0.936.
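For reference, sensitivity and specificity can be computed from the target and predicted labels as in the short Python sketch below. The function name and the counts are assumptions: 59 correct classifications out of 66 in each class are used only because they reproduce the reported 89.4%. The actual confusion matrix is not restated in this section.

```python
def sensitivity_specificity(targets, predictions):
    """Return (sensitivity, specificity) for labels coded 1 = tumour, -1 = non-tumour."""
    pairs = list(zip(targets, predictions))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)    # tumour called tumour
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)   # tumour missed
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)  # non-tumour called non-tumour
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)   # non-tumour called tumour
    return tp / (tp + fn), tn / (tn + fp)


# Assumed counts (illustrative only): 66 tumour and 66 non-tumour samples,
# with 59 of each class classified correctly.
targets = [1] * 66 + [-1] * 66
predictions = [1] * 59 + [-1] * 7 + [-1] * 59 + [1] * 7

sensitivity, specificity = sensitivity_specificity(targets, predictions)
print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
```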
In addition to the ANN model, a classification model based on CART was also constructed
for comparison and evaluation. The discriminative performance of CART was lower than that
of ANN by 9.1 percentage points in both sensitivity and specificity. However, in terms of
training time and ease of use, CART may outperform ANN.
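The sensitivity and specificity figures can be related to the AUC values in Table 7.5 by a short worked equation. A classifier with discrete output has a single operating point on the ROC plane, so its curve consists of two straight segments through (1 - Sp, Se), and the area follows from two trapezoids. The sketch assumes that the reported sensitivity (Se) and specificity (Sp) refer to that operating point and that the 9.1% difference for CART is absolute, giving 80.3% for both measures.

```latex
\begin{aligned}
\mathrm{AUC}
  &= \tfrac{1}{2}\,(1-\mathrm{Sp})\,\mathrm{Se}
   + \tfrac{1}{2}\,\mathrm{Sp}\,(\mathrm{Se}+1) \\
  &= \tfrac{1}{2}\,(\mathrm{Se}+\mathrm{Sp}) \\[4pt]
\text{Hardlim(ANN\_6):}\quad & \tfrac{1}{2}\,(0.894+0.894) = 0.894 \\
\text{CART:}\quad & \tfrac{1}{2}\,(0.803+0.803) = 0.803
\end{aligned}
```

Under these assumptions both values agree with the AUC figures listed for Hardlim(ANN_6) and CART in Table 7.5.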
Applicability of ANN based classification algorithm in HCC proteomics
As mentioned earlier in the report, ANN has been used extensively in clinical practice,
particularly in the areas of medical diagnosis and profiling studies. In this study, the use
of ANN to classify tumour and non-tumour samples is one of the first attempts to apply data
mining technologies to proteomic profiles. The resulting model, which correctly segregates
most of the samples into their corresponding groups, may indicate the possibility of applying
a similar technique in other cancer-related proteomic studies, such as cancer prognosis.
Nevertheless, ANN is not without problems. Firstly, in terms of disease marker discovery,
ANN uses a set of variables and develops a classification model in a “black box” manner,
without a clear logical account of how the variables are used. If one is interested in the
potential markers rather than the classification model, ANN may not be a good choice compared
with CART and logistic regression. Secondly, the training of ANN is comparatively expensive in
terms of time and space, especially when dealing with a large number of inputs and complex
network topologies.
ANN & CART: Which is better?
Both ANN and CART are commonly used to perform classification tasks, especially for noisy
and non-linear datasets. In this study, both classification models performed well in
distinguishing tumour from non-tumour samples. Table 8.1 lists the major advantages and
disadvantages of ANN and CART in terms of classification analysis. The two algorithms take
different approaches to inductive learning and therefore classify in different ways. ANN uses
all the neuronal inputs to construct the classification model by adjusting its weights and
biases. CART, on the other hand, exhaustively searches for a hierarchical set of classifiers
and constructs a decision tree for classification. Both methods offer unique advantages and at
the same time suffer from intrinsic weaknesses. Therefore, neither method can replace the
other; the two are not mutually exclusive but complementary.
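The exhaustive search performed by CART at a single node can be sketched in Python as follows. This is a simplified illustration rather than the software used in the study: the Gini impurity criterion, the midpoint thresholds, the function names and the toy data are all assumptions. A full tree would apply the same search recursively to the two resulting subsets.

```python
def gini(labels):
    """Gini impurity of a set of labels coded 1 (tumour) or -1 (non-tumour)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(1 for y in labels if y == 1) / n
    return 2.0 * p * (1.0 - p)


def best_split(samples, labels):
    """Try every variable and every midpoint between adjacent observed values,
    and return (weighted impurity, variable index, threshold) of the best split."""
    n = len(labels)
    best = None
    for j in range(len(samples[0])):
        values = sorted(set(row[j] for row in samples))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [y for row, y in zip(samples, labels) if row[j] <= thr]
            right = [y for row, y in zip(samples, labels) if row[j] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, thr)
    return best


# Hypothetical data: six samples with two variables each (not study data).
samples = [[0.2, 5.1], [0.4, 4.8], [0.3, 1.2], [0.9, 1.0], [0.8, 5.0], [0.7, 0.9]]
labels = [-1, -1, 1, 1, -1, 1]

score, variable, threshold = best_split(samples, labels)
print(score, variable, threshold)
```

In this toy example the second variable separates the two classes completely, so the search selects it and ignores the first variable. This illustrates why a tree classifies on only a few major variables, whereas an ANN weights all of its inputs.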
Table 8.1 Major advantages and disadvantages of ANN and CART
Algorithm
ANN: Learns and develops a classification model through associative learning in a biological neuron-like manner
CART: Searches for a hierarchical set of classifiers and constructs a classification tree by recursive bipartitioning
Advantages
ANN: Good performance; robust to noise and non-linearity; classification based on the whole set of inputs; needs fewer samples than CART to build a good classification model
CART: Good performance; robust to noise and non-linearity; easy to implement; training is relatively fast; the decision tree is easy to understand; further investigation of the classifiers is possible
Disadvantages
ANN: “Black box” like operation, not recommended for biomarker discovery; computationally more complex than CART; decision making is not understandable
CART: Classification based on only a few major classifiers; hierarchical nature of classifiers may result in an overwhelming effect of the variables in parent nodes; a large number of samples is often needed to build a tree of satisfactory performance; tree composition may change completely with changes in the training data
Future Directions
The 2D-E proteomic profiling of hepatocellular carcinoma has generated a large pool of
global protein expression data, which contains rich information on the biology of the cancer.
In this study, ANN and CART were used to build preliminary classification models for
tumour/non-tumour differentiation based on the hidden patterns in the proteomic dataset.
With an increasing number of samples, both models may perform better than reported here.
In addition, it is worthwhile to apply similar algorithms in more sophisticated studies, such
as cancer recurrence and tumour staging, to help combat HCC.
9 Conclusion
In conclusion, both the ANN and CART models built in this study showed good predictive
ability in differentiating between tumour and non-tumour tissues based on their 2D-E
proteomic profiles, with accuracies of over 80%.
The two classification models are somewhat similar in discriminative performance and show
only a small difference under statistical evaluation. More importantly, the two models are
not mutually exclusive and are in fact complementary to each other.
Given these encouraging results, it is worthwhile to investigate the use of ANN, CART and
other data mining algorithms in more sophisticated studies, such as cancer recurrence, in
order to explore their potential and application in the management of HCC.
References
[1]
H. B. El-Serag, "Hepatocellular carcinoma: an epidemiologic view," J Clin
Gastroenterol, vol. 35, pp. S72-8, 2002.
[2]
M. P. Waalkes, J. Liu, H. Chen, Y. Xie, W. E. Achanzar, Y. S. Zhou, M. L. Cheng,
and B. A. Diwan, "Estrogen signaling in livers of male mice with hepatocellular
carcinoma induced by exposure to arsenic in utero," J Natl Cancer Inst, vol. 96, pp.
466-74, 2004.
[3]
K. Heinemann, S. N. Willich, L. A. Heinemann, T. DoMinh, M. Mohner, and G. E.
Heuchert, "Occupational exposure and liver cancer in women: results of the
Multicentre International Liver Tumour Study (MILTS)," Occup Med (Lond), vol. 50,
pp. 422-9, 2000.
[4]
P. Pisani, D. M. Parkin, F. Bray, and J. Ferlay, "Estimates of the worldwide mortality
from 25 cancers in 1990," Int J Cancer, vol. 83, pp. 18-29, 1999.
[5]
D. M. Parkin, "Global cancer statistics in the year 2000," Lancet Oncol, vol. 2, pp.
533-43, 2001.
[6]
C. J. Chen and D. S. Chen, "Interaction of hepatitis B virus, chemical carcinogen, and
genetic susceptibility: multistage hepatocarcinogenesis with multifactorial etiology,"
Hepatology, vol. 36, pp. 1046-9, 2002.
[7]
M. A. Feitelson, B. Sun, N. L. Satiroglu Tufan, J. Liu, J. Pan, and Z. Lian, "Genetic
mechanisms of hepatocarcinogenesis," Oncogene, vol. 21, pp. 2593-604, 2002.
[8]
B. W. Wong, J. M. Luk, I. O. Ng, M. Y. Hu, K. D. Liu, and S. T. Fan, "Identification
of liver-intestine cadherin in hepatocellular carcinoma--a potential disease marker,"
Biochem Biophys Res Commun, vol. 311, pp. 618-24, 2003.
63
Classification of Liver Cancer by Artificial Neural Network
Lam Yee Hong Brian
U990209278
[9]
D. B. Kristensen, N. Kawada, K. Imamura, Y. Miyamoto, C. Tateno, S. Seki, T.
Kuroki, and K. Yoshizato, "Proteome analysis of rat hepatic stellate cells,"
Hepatology, vol. 32, pp. 268-77, 2000.
[10]
G. S. Yoon, H. Lee, Y. Jung, E. Yu, H. B. Moon, K. Song, and I. Lee, "Nuclear
matrix of calreticulin in hepatocellular carcinoma," Cancer Res, vol. 60, pp. 1117-20,
2000.
[11]
S. Y. Cho, K. S. Park, J. E. Shim, M. S. Kwon, K. H. Joo, W. S. Lee, J. Chang, H.
Kim, H. C. Chung, H. O. Kim, and Y. K. Paik, "An integrated proteome database for
two-dimensional electrophoresis data analysis and laboratory information
management system," Proteomics, vol. 2, pp. 1104-13, 2002.
[12]
L. F. Steel, T. S. Mattu, A. Mehta, H. Hebestreit, R. Dwek, A. A. Evans, W. T.
London, and T. Block, "A proteomic approach for the discovery of early detection
markers of hepatocellular carcinoma," Dis Markers, vol. 17, pp. 179-89, 2001.
[13]
K. S. Park, S. Y. Cho, H. Kim, and Y. K. Paik, "Proteomic alterations of the variants
of human aldehyde dehydrogenase isozymes correlate with hepatocellular
carcinoma," Int J Cancer, vol. 97, pp. 261-5, 2002.
[14]
R. J. Roiger and M. W. Geatz, Data Mining: A Tutorial Based Primer: AddisonWesley, 2003.
[15]
R. Tryon, Cluster Analysis. Ann Arbor, MI: Edward Brothers, 1939.
[16]
D. Ballard, An Introduction to Natural Computation. Cambridge MA: MIT Press,
1997.
[17]
L. Breiman, J. Friedman, R. Olshen, and C. Stone, Classification and Regression
Trees. Wadsworth: Pacific Grove, 1984.
[18]
D. Steinberg and P. Colla, CART - Classification and Regression Trees. San Diego:
Salford Systems, 1997.
64
Classification of Liver Cancer by Artificial Neural Network
Lam Yee Hong Brian
U990209278
[19]
N. Chignard and L. Beretta, "Proteomics for hepatocellular carcinoma marker
discovery," Gastroenterology, vol. 127, pp. S120-5, 2004.
[20]
J. D. Wulfkuhle, L. A. Liotta, and E. F. Petricoin, "Proteomic applications for the
early detection of cancer," Nat Rev Cancer, vol. 3, pp. 267-75, 2003.
[21]
S. Hanash, "Disease proteomics," Nature, vol. 422, pp. 226-32, 2003.
[22]
P. R. Srinivas, M. Verma, Y. Zhao, and S. Srivastava, "Proteomics for cancer
biomarker discovery," Clin Chem, vol. 48, pp. 1160-9, 2002.
[23]
E. Zeindl-Eberhart, S. Haraida, S. Liebmann, P. R. Jungblut, S. Lamer, D. Mayer, G.
Jager, S. Chung, and H. M. Rabes, "Detection and identification of tumor-associated
protein variants in human hepatocellular carcinomas," Hepatology, vol. 39, pp. 540-9,
2004.
[24]
E. E. Schwegler, L. Cazares, L. F. Steel, B. L. Adam, D. A. Johnson, O. J. Semmes, T.
M. Block, J. A. Marrero, and R. R. Drake, "SELDI-TOF MS profiling of serum for
detection of the progression of chronic hepatitis C to hepatocellular carcinoma,"
Hepatology, vol. 41, pp. 634-42, 2005.
[25]
C. Li, Y. X. Tan, H. Zhou, S. J. Ding, S. J. Li, D. J. Ma, X. B. Man, Y. Hong, L.
Zhang, L. Li, Q. C. Xia, J. R. Wu, H. Y. Wang, and R. Zeng, "Proteomic analysis of
hepatitis B virus-associated hepatocellular carcinoma: Identification of potential
tumor markers," Proteomics, vol. 5, pp. 1125-39, 2005.
[26]
T. C. Poon and P. J. Johnson, "Proteome analysis and its impact on the discovery of
serological tumor markers," Clin Chim Acta, vol. 313, pp. 231-9, 2001.
[27]
T. C. Poon, T. T. Yip, A. T. Chan, C. Yip, V. Yip, T. S. Mok, C. C. Lee, T. W. Leung,
S. K. Ho, and P. J. Johnson, "Comprehensive proteomic profiling identifies serum
proteomic signatures for detection of hepatocellular carcinoma and its subtypes," Clin
Chem, vol. 49, pp. 752-60, 2003.
65
Classification of Liver Cancer by Artificial Neural Network
Lam Yee Hong Brian
U990209278
[28]
P. Shalhoub, S. Kern, S. Girard, and L. Beretta, "Proteomic-based approach for the
identification of tumor markers associated with hepatocellular carcinoma," Dis
Markers, vol. 17, pp. 217-23, 2001.
[29]
K. L. Garwood, C. F. Taylor, K. J. Runte, A. Brass, S. G. Oliver, and N. W. Paton,
"Pedro: a configurable data entry tool for XML," Bioinformatics, vol. 20, pp. 2463-5,
2004.
[30]
K. Garwood, T. McLaughlin, C. Garwood, S. Joens, N. Morrison, C. F. Taylor, K.
Carroll, C. Evans, A. D. Whetton, S. Hart, D. Stead, Z. Yin, A. J. Brown, A. Hesketh,
K. Chater, L. Hansson, M. Mewissen, P. Ghazal, J. Howard, K. S. Lilley, S. J. Gaskell,
A. Brass, S. J. Hubbard, S. G. Oliver, and N. W. Paton, "PEDRo: a database for
storing, searching and disseminating experimental proteomics data," BMC Genomics,
vol. 5, pp. 68, 2004.
[31]
H. Yokoo, T. Kondo, K. Fujii, T. Yamada, S. Todo, and S. Hirohashi, "Proteomic
signature corresponding to alpha fetoprotein expression in liver cancer cells,"
Hepatology, vol. 40, pp. 609-17, 2004.
[32]
T. C. Poon, A. Y. Hui, H. L. Chan, I. L. Ang, S. M. Chow, N. Wong, and J. J. Sung,
"Prediction of liver fibrosis and cirrhosis in chronic hepatitis B infection by serum
proteomic fingerprinting: a pilot study," Clin Chem, vol. 51, pp. 328-35, 2005.
[33]
DARPA, "Technical Report DARPA Neural Network Study Final Report October
1987-February 1988," Lincoln Laboratory, MIT 1989.
[34]
W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous
activity," Bulletin of Mathematical Biophysics, vol. 5, pp. 115-133, 1943.
[35]
F. Rosenblatt, "The Perceptron: A Probabilistic Model for Information Storage and
Organization in the Brain," Psychological Review, vol. 65, pp. 386-408, 1958.
66
Classification of Liver Cancer by Artificial Neural Network
Lam Yee Hong Brian
U990209278
[36]
B. Widrow and M. E. Hoff, "Adaptive switching circuits," presented at IRE
WESCON, New York, 1960.
[37]
M. L. Minsky and S. A. Papert, Perceptrons, An introduction to computational
geometry (Expanded edition). Cambridge, MA: MIT Press, 1987.
[38]
M. T. Hagan, H. B. Demuth, and M. Baele, Neural Network Design. Boston, MA:
PWS Publishing Co., 1997.
[39]
J. Gibbons, "From two independent samples: Mann-Whitney-Wilcoxon procedures,"
in Nonparametric Methods for Quantitative Analysis, 3rd ed. Columbus, OH:
American Sciences Press, Inc, 1997, pp. 171-188.
[40]
S. Meleth, J. Deshane, and H. Kim, "The case for well-conducted experiments to
validate statistical protocols for 2D gels: different pre-processing = different lists of
significant proteins," BMC Biotechnol, vol. 5, pp. 7, 2005.
[41]
M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein, "Cluster analysis and
display of genome-wide expression patterns," Proc Natl Acad Sci U S A, vol. 95, pp.
14863-8, 1998.
Appendix
Outputs of the different classification models constructed in this study
Table a. Validation output of ANN and CART models constructed in this study (-1 = non-tumour, 1 = tumour). HL_n denotes Hardlim(ANN_n).
Target  ANN_1  ANN_2  ANN_3  ANN_4  ANN_5  ANN_6  ANN_7  HL_1  HL_2  HL_3  HL_4  HL_5  HL_6  HL_7  CART
-1      -1.58  -0.97   0.30  -1.78  -0.99  -0.78  -1.18    -1    -1     1    -1    -1    -1    -1    -1
-1      -1.56  -2.52  -1.28  -1.27  -1.00  -1.02  -1.39    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.44  -0.38  -1.50  -1.08  -0.69  -1.03  -1.09    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.42  -0.96  -2.82  -1.39  -0.65  -1.04  -0.52    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.37  -1.37  -1.05  -1.15  -1.38  -1.01  -1.05    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.31  -1.54  -1.23  -1.04  -1.24  -0.93  -1.45    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.23  -1.41  -1.29  -1.22  -1.17  -1.02  -1.16    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.16  -0.57  -1.06  -0.79  -1.02  -0.90  -1.05    -1    -1    -1    -1    -1    -1    -1     1
-1      -1.14  -0.93  -1.04  -1.47  -1.19  -1.02  -1.62    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.12  -1.28  -0.35  -1.30  -0.86  -1.01  -0.52    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.07  -0.35  -1.02  -1.24  -1.35  -0.99  -1.18    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.05  -0.74  -1.02  -1.00  -1.12  -0.99  -1.60    -1    -1    -1    -1    -1    -1    -1     1
-1      -1.04  -1.98  -1.21  -1.28  -1.09  -0.99  -0.58    -1    -1    -1    -1    -1    -1    -1    -1
-1      -1.03  -0.86  -1.26  -0.94  -1.29  -1.01  -1.16    -1    -1    -1    -1    -1    -1    -1    -1
-1      -0.98  -1.75  -0.89  -0.35  -1.00  -1.00  -2.14    -1    -1    -1    -1    -1    -1    -1     1
-1      -0.96  -0.32  -0.86  -0.86  -1.00   0.93   1.13    -1    -1    -1    -1    -1     1     1    -1
-1      -0.96  -0.01  -0.66  -0.60   0.05  -0.98  -1.40    -1    -1    -1    -1     1    -1    -1     1
-1      -0.95  -1.79  -1.96  -1.16  -1.46  -1.03  -0.71    -1    -1    -1    -1    -1    -1    -1    -1
-1      -0.93  -1.42  -1.25  -1.29  -0.86  -0.73   0.73    -1    -1    -1    -1    -1    -1     1    -1
-1      -0.91  -1.47  -0.42  -0.95  -1.23  -1.04  -0.99    -1    -1    -1    -1    -1    -1    -1     1
-1      -0.86  -1.13  -1.18  -1.55  -1.77  -0.79  -0.16    -1    -1    -1    -1    -1    -1    -1    -1
-1      -0.85  -1.88  -0.13  -2.63  -1.01  -1.00  -0.61    -1    -1    -1    -1    -1    -1    -1     1
-1      -0.83  -0.75  -1.11  -0.77  -1.41  -1.00  -0.73    -1    -1    -1    -1    -1    -1    -1    -1
-1      -0.76  -1.16  -0.06  -1.23  -0.95  -0.71  -1.04    -1    -1    -1    -1    -1    -1    -1    -1
-1      -0.69  -0.45  -0.73  -0.73  -0.56  -0.76  -0.59    -1    -1    -1    -1    -1    -1    -1     1
-1      -0.62  -0.95  -0.73  -1.31  -1.55  -0.60  -0.20    -1    -1    -1    -1    -1    -1    -1    -1
-1      -0.56  -0.85  -0.14  -1.45  -0.94  -0.89  -1.38    -1    -1    -1    -1    -1    -1    -1    -1
-1      -0.01   0.57  -0.18  -0.03  -0.05  -0.83  -0.99    -1     1    -1    -1    -1    -1    -1    -1
-1       0.07  -2.47   0.70  -0.44  -0.59  -1.02  -0.88     1    -1     1    -1    -1    -1    -1    -1
-1       0.55   0.53   0.11   0.73   0.15   1.00  -0.54     1     1     1     1     1     1    -1    -1
-1       0.85   0.27   1.17   0.26   0.06   1.00  -0.55     1     1     1     1     1     1    -1    -1
-1       1.01   1.23   1.29   0.89   1.09   1.00   1.13     1     1     1     1     1     1     1    -1
-1       1.16   0.78   1.04   0.23  -0.41  -0.88  -0.34     1     1     1     1    -1    -1    -1    -1
1       -0.26  -0.74  -0.20  -0.04  -0.92  -0.02  -0.64    -1    -1    -1    -1    -1    -1    -1     1
1        2.19   2.27   0.34   2.27   1.21   0.98   0.40     1     1     1     1     1     1     1     1
1        1.34   1.33   1.36   0.12   3.06   1.00   1.15     1     1     1     1     1     1     1     1
1        0.08  -0.24  -0.29  -1.45   0.43   1.00   0.62     1    -1    -1    -1     1     1     1     1
1       -0.41   0.67  -1.60  -1.22  -0.97   1.00  -0.29    -1     1    -1    -1    -1     1    -1     1
1        1.56   1.43   0.95   1.76   3.58   1.00   1.71     1     1     1     1     1     1     1     1
1       -0.05   1.02  -0.64  -0.33   2.17   0.99  -0.56    -1     1    -1    -1     1     1    -1     1
1       -0.98  -0.36   0.28  -0.10  -0.59   1.00  -0.16    -1    -1     1    -1    -1     1    -1     1
1        1.96   1.61   0.66   0.92   3.60   1.00   0.42     1     1     1     1     1     1     1     1
1        0.20  -0.47  -0.11  -1.41  -1.16   0.96   0.42     1    -1    -1    -1    -1     1     1     1
1        1.95   3.03   1.41   2.68   2.34   1.00   1.21     1     1     1     1     1     1     1     1
1       -1.24  -0.31  -0.06  -0.92  -1.02  -0.55   0.67    -1    -1    -1    -1    -1    -1     1     1
1        0.25   0.89   2.21   0.89   1.23   1.00   0.25     1     1     1     1     1     1     1     1
1        1.43   1.13   1.42   2.47   1.06   1.00   1.09     1     1     1     1     1     1     1    -1
1        1.47   1.88   0.72   1.86   0.95   1.00   0.47     1     1     1     1     1     1     1     1
1        1.03  -0.57   0.39  -0.34  -0.03   1.00   1.21     1    -1     1    -1    -1     1     1    -1
1        1.02   1.04   1.17   1.36   0.72   1.00   1.11     1     1     1     1     1     1     1     1
1        1.67   0.93   1.87   0.40   1.97   1.00   1.27     1     1     1     1     1     1     1     1
1        2.47   1.59   1.17   1.52   1.13   1.00   0.89     1     1     1     1     1     1     1    -1
1        1.27   2.11   1.06   2.21   0.80   1.00   1.29     1     1     1     1     1     1     1     1
1       -0.94  -1.77  -1.67  -0.92  -1.31   0.59  -1.12    -1    -1    -1    -1    -1     1    -1    -1
1       -0.97   0.04   0.77  -0.50  -1.05   0.20  -1.44    -1     1     1    -1    -1     1    -1     1
1
-0.38
0.14
1.23
-1.41
-0.67
1
0.05
1
1.03
1
1
1
1
1.00
-0.02
-1
1
1.25
1.15
1.60
-0.45
1.00
0.95
1
1
1.40
-0.09
1.00
0.84
1.00
0.15
1
1
1.60
2.32
0.19
0.39
1.76
1.00
0.89
1
1
1
3.09
0.62
2.04
0.83
-0.71
1.00
0.98
1
1
1
-0.10
1.10
-0.16
-0.08
0.94
-0.94
0.35
-1
1
-1
1.55
0.23
1.25
0.68
0.76
1.00
2.25
1
1
1
1
1.60
1.21
0.72
2.06
0.76
1.00
0.50
1
1
1
1.22
0.23
1.55
1.85
0.78
1.00
0.94
1
1
0.91
1.37
1.01
0.64
1.12
1.00
0.53
1
1
0.18
-0.09
0.29
-0.65
-0.47
1.00
0.67
1
1
-1
-1
1
-1
1
1
1
-1
1
1
1
-1
1
1
1
1
1
1
1
1
1
1
1
-1
1
1
-1
-1
1
-1
1
1
1
1
1
1
-1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
-1
1
-1
-1
1
1
1
Companion CD
A CD is attached which includes this report in PDF format, the proteomic data, the data for
MATLAB and the PDF file of the clustering analysis.
CD Content:
1. dissertation.pdf (dissertation in PDF format)
2. proteomic data.xls (proteomic data and transformation in Excel format)
3. ann_data_set.mat (data used for training and simulation of the artificial neural network)
4. cluster.pdf (high-resolution dendrogram of average linkage clustering of samples and
proteins in PDF format)