2013 IEEE International Conference on Bioinformatics and Biomedicine
A Generative Framework for Prediction and
Informative Risk Factor Selection of Bone Diseases
Hui Li∗ , Xiaoyi Li∗ , Yuan Zhang† , Murali Ramanathan‡ , Aidong Zhang∗
∗ Department of Computer Science and Engineering, State University of New York at Buffalo, USA
{hli24,xiaoyili,azhang}@buffalo.edu
† College of Electronic Information and Control Engineering, Beijing University of Technology, Beijing, China
{zhangyuan}@emails.bjut.edu.cn
‡ Department of Pharmaceutical Sciences, State University of New York at Buffalo, USA
{murali}@buffalo.edu
Abstract—With the rapid development of the healthcare industry, overwhelming amounts of electronic health records (EHRs) have been documented and shared by healthcare institutions
and practitioners. It is important to take advantage of EHR data
to develop an effective disease risk management model that not
only predicts the progression of the disease, but also provides
a candidate list of informative risk factors (RFs) in order to
prevent the disease. Although EHRs are valuable sources due to
the comprehensive patient information, it is difficult to pinpoint
the underlying causes of the disease in order to assess the risk of
a patient in developing a target disease. Because EHR data are entangled, it is also challenging to discriminate between patients with and without the disease when selecting the RFs that cause it. To tackle these challenges,
we propose a disease memory (DM) framework which can extract
the integrated features by modeling the relationships among RFs
and more importantly between RFs and the target disease by
establishing a deep graphical model with two types of labels. The
variants of DM can model characteristics for patients with disease
and without disease respectively via training deep networks with
different samples. Experiments on a real bone disease data set
show that the proposed framework can successfully predict the
bone disease and select the informative RFs that are beneficial
and useful to aid clinical decision support. Most of the selected
RFs are validated by the medical literature and some new RFs may attract interest in medical research. The stable and promising
performance on evaluation metrics confirms the effectiveness of
our model.
I. INTRODUCTION
The Electronic Health Record (EHR) is a longitudinal electronic record of patient health information including diverse
information like demographics, medications, past medical history, laboratory data, and lifestyles. EHRs are valuable sources
for exploratory analysis and statistics to assist clinical decision-making and further medical research. Researchers have been
converting EHR data into risk factors (RFs) for the disease
risk analysis which includes two crucial tasks: disease risk
prediction and informative risk factor (RF) selection. With
the success of both tasks, patients can avoid unnecessary tests,
reduce the cost of public health care, and change their modifiable RFs for disease control or prevention. Usually, numerous
potential RFs need to be considered simultaneously since
observed and hidden reasons behind all RFs are worth learning
for the exploration of the disease progression. However, it is
an extremely challenging task to capture the disease characteristics and clinical nuances for predicting disease progression
and detecting the informative risk factors (RFs) due to the
complexity and diversity of the EHR data. The difficulties
manifest themselves in many aspects. First, it is hard to find a good
RF representation so that the salient integrated features can
be disentangled from heterogeneous information. Second, it
is difficult to discriminate the different roles of independent
features for both healthy and diseased patients.
Osteoporosis and bone fractures are common bone diseases
associated with aging and may be clinically silent but can
cause significant mortality and morbidity after onset. Over
the past few decades, osteoporosis has been recognized as
a common bone disease that affects more than 75 million
people in the United States, Europe and Japan, and it causes
more than 8.9 million fractures annually worldwide [1]. It is
reported that 20-25% of people with a hip fracture are unable
to return to independent living and 12-20% die within one year.
Although the diagnosis of osteoporosis is usually based on the
assessment of bone mineral density (BMD) using dual energy
X-ray absorptiometry (DXA), the World Health Organization
(WHO) embarked on a project to integrate information on
RFs to better predict the risk of bone disease in men and
women worldwide [2]. In this paper, we propose a novel
approach for the study of bone diseases in two aspects: bone
disease prediction and disease RF selection according to their significance.
Existing models usually fall into two categories: the expert
knowledge based model or the handcrafted feature set based
model. The first mentioned model mainly relies on a small
number of well-known RFs which have been validated by an
expert in this field, as in [3]. However, the information based
on the expert knowledge is limited so that some important
features might be discarded, thus affecting the predictive
performance. The second mentioned model tries to find the
informative RFs by calculating their statistical significance and
then measuring the predictive power. The assessment of
the relationship between a disease and a handcrafted RF is
based on the regression model [4], Artificial Neural Network
(ANN) [5], association rules and decision tree [6]. Although
these models are theoretically acceptable for analyzing the
risk dependence of several variables, they pay little attention
to the relationships among RFs and between RFs and the
target disease. Furthermore, they usually select statistically
significant features from an expert-supported candidate list,
Fig. 1: Overview of our framework for bone health. Top (Task 1): the original data set of 672 RFs is fed to the CDM, which produces 11 integrated risk features used in the two-phase (Phase 1 and Phase 2) training process. Bottom (Task 2): disease samples train the BDM and non-disease samples train the NDM, which produce a candidate informative RF list that is validated against medical knowledge.
which means useful information may still be lost if
the list is not comprehensive. Recently, mining the causality
relationship between RFs and a specific disease has attracted
considerable research attention. In [7], limited RFs are used
to construct a Bayesian network and the RFs are assumed
conditionally independent of one another. However, learning
the Bayesian network becomes difficult and even intractable as
the number of RFs increases.
II. PROBLEM DEFINITION
In this section, we define our problem by showing a
pipeline for the whole framework. Generally speaking, our
proposed system contains a two-task framework, as shown in
Fig. 1. The upper component of Fig. 1 shows the roadmap for
the first task: the bone disease prediction based on integrated
RFs. The bottom component of Fig. 1 shows the roadmap
for the second task: informative RF selection. Given patients’
information, our system can not only predict the risk of
osteoporosis and bone fractures, but also rank the informative
RFs and explain the semantics of each RF. The description of
each component is given as follows.
Task 1 – The Bone Disease Prediction Component. In
this component, we feed the original data set to the comprehensive disease memory (CDM). The training procedure of
CDM includes two steps: pre-training and fine-tuning.
In the pre-training step, we train CDM in an unsupervised
fashion. This pre-training procedure aims at capturing the
characteristics among all RFs with the ultimate goal of guiding
the learning towards basins of optima that support better
generalization. In the fine-tuning step, we take advantage of
two types of labeled information (osteoporosis and bone loss
rate) for the purpose of focusing on these two prediction tasks.
We use a greedy layer-wise learning algorithm to train a two-layer Deep Belief Network (DBN), which is the structure of
CDM. Besides, all RFs in the original data are projected onto
a new space of lower dimensionality by restricting the number of units in the output layer of the DBN. Therefore, the integrated risk features are extracted by the CDM module from the original data set. These lower-dimensional integrated risk features are a new representation of the original higher-dimensional RFs, which will be examined by a two-phase prediction module. The prediction module is composed of two classifiers, Logistic Regression (LR) and Support Vector Machine (SVM). In
Phase 1, we predict the risk of osteoporosis for all test samples.
We regard the osteoporotic bone as the positive output and the
normal bone as the negative output, because osteoporotic patients tend to have more severe bone fractures. In Phase 2, we further predict the risk of bone loss rate for all positive samples from Phase 1. The high bone loss rate, as the positive output, indicates a higher possibility of bone fractures.
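As a rough illustration of this two-phase cascade (a sketch, not the authors' exact pipeline), the snippet below assumes the integrated risk features have already been extracted and uses off-the-shelf scikit-learn classifiers on placeholder data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Placeholder data: 11 integrated risk features per patient (assumed shape).
X = rng.normal(size=(500, 11))
y_osteo = rng.integers(0, 2, size=500)   # Phase 1 label: osteoporotic vs. normal bone
y_loss = rng.integers(0, 2, size=500)    # Phase 2 label: high vs. low bone loss rate

# Phase 1: predict the risk of osteoporosis for all samples.
phase1 = LogisticRegression(max_iter=1000).fit(X, y_osteo)
positives = phase1.predict(X) == 1       # osteoporotic bone is the positive output

# Phase 2: predict the bone loss rate only for the Phase-1 positives.
phase2 = SVC(kernel="rbf").fit(X[positives], y_loss[positives])
high_loss = phase2.predict(X[positives])
print("Phase-1 positives:", int(positives.sum()), "Phase-2 high loss:", int(high_loss.sum()))
```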
Task 2 – The Informative RF selection Component.
Since we are not able to explain the semantics of the integrated
RFs extracted by the first component, we are required to select
the meaningful and significant RFs from all candidates in the
second component. Instead of feeding all samples into the training
procedure, we first split the original data set into two parts:
diseased samples and non-diseased samples. In the procedure
of training, we separately train bone disease memory (BDM)
using diseased samples and non-disease memory (NDM) using
non-diseased samples, shown as dashed arrows in the bottom
component of Fig. 1. Once the training session is complete,
both memories are used to reconstruct data respectively based
on the contrast group of samples. A two-layer DBN, as the
structure of NDM and BDM, has the property of reconstructing
samples. But it yields large reconstruction errors if we use
BDM to reconstruct non-diseased samples because of the
mismatch between the input data and the memory module.
The contrasts are valuable information to explain why a non-diseased person may develop the disease. Similarly, the differences are
obvious when reconstructing diseased samples using NDM. All
RFs cumulatively lead to the reconstruction errors. Our ultimate goal is to find the top-N individual RFs which contribute
greatly to the reconstruction errors. The top-N selected RFs
form a candidate informative RF list that will be validated
using the medical knowledge such as medical reports from
WHO and National Osteoporosis Foundation (NOF), as well
as biomedical literature from PubMed.
III. METHODOLOGY
In this section, we first introduce both single-layer and
multi-layer learning approaches which are preliminaries to our
proposed method. Then we propose our model focusing on the
prediction and informative RF selection for bone diseases.
A. Single-Layer Learning for the Latent Reasons
To have a good RF representation of latent reasons for the
data, we propose to use the Restricted Boltzmann Machine (RBM) [8]. An RBM is a generative stochastic graphical model that can learn a probability distribution over its set of inputs, with the restriction that its visible units and hidden units must form
a fully connected bipartite graph. Specifically, it has a single
layer of hidden units that are not connected to each other and
have undirected, symmetrical connections to a layer of visible
units. We show a shallow RBM in Fig. 2(a). The model defines
the following energy function $E : \{0, 1\}^{D+F} \rightarrow \mathbb{R}$:

$$E(v, h; \theta) = -\sum_{i=1}^{D}\sum_{j=1}^{F} v_i W_{ij} h_j - \sum_{i=1}^{D} b_i v_i - \sum_{j=1}^{F} a_j h_j, \qquad (1)$$
where θ = {a, b, W } are the model parameters. D and F are the
numbers of visible and hidden units, respectively. The joint distribution over
the visible and hidden units is defined by:
$$P(v, h; \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h; \theta)), \qquad (2)$$
where Z(θ) is the partition function that plays the role of a
normalizing constant for the energy function.
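For concreteness, the energy in Eq. (1) and the factorized conditionals it induces can be written in a few lines of numpy; this is a generic RBM sketch with toy dimensions, not code from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, b, a):
    # E(v, h; theta) = -v^T W h - b^T v - a^T h, as in Eq. (1)
    return -v @ W @ h - b @ v - a @ h

def p_h_given_v(v, W, a):
    # P(h_j = 1 | v) = sigmoid(a_j + sum_i v_i W_ij)
    return sigmoid(a + v @ W)

def p_v_given_h(h, W, b):
    # P(v_i = 1 | h) = sigmoid(b_i + sum_j W_ij h_j)
    return sigmoid(b + W @ h)

D, F = 6, 4                                   # toy numbers of visible / hidden units
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(D, F))
b, a = np.zeros(D), np.zeros(F)
v = rng.integers(0, 2, size=D).astype(float)
h = rng.integers(0, 2, size=F).astype(float)
print(energy(v, h, W, b, a), p_h_given_v(v, W, a))
```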
Exact maximum likelihood learning is intractable in RBM.
In practice, efficient learning is performed using Contrastive
Divergence (CD) [9]. In particular, each hidden unit activation
is penalized in the form $\sum_{j=1}^{F} \mathrm{KL}(\rho \,\|\, h_j)$, where $F$ is the total number of hidden units, $h_j$ is the activation of unit $j$, and $\rho$ is a predefined sparsity parameter, typically a small value close to zero (we use 0.05 in our model). So the overall cost of a sparse RBM used in our model is:

$$E(v, h; \theta) = -\sum_{i=1}^{D}\sum_{j=1}^{F} v_i W_{ij} h_j - \sum_{i=1}^{D} b_i v_i - \sum_{j=1}^{F} a_j h_j + \beta \sum_{j=1}^{F} \mathrm{KL}(\rho \,\|\, h_j) + \lambda \|W\|, \qquad (3)$$
where $\|W\|$ is the regularizer and $\beta$ and $\lambda$ are hyper-parameters (we tried different settings for both $\beta$ and $\lambda$ and found our model is not very sensitive to these parameters; we fix $\beta$ to 0.1 and $\lambda$ to 0.0001).

Fig. 2: (a) Shallow Restricted Boltzmann Machine, which contains a layer of visible units v that represent the data and a layer of hidden units h that learn to represent features capturing higher-order correlations in the data. The two layers are connected by a matrix of symmetrically weighted connections, W, and there are no connections within a layer. (b) A two-layer DBN in which the top two layers form an RBM and the bottom layer forms a multi-layer perceptron. It contains a layer of visible units v and two layers of hidden units h1 and h2.

Fig. 3: Bone disease prediction using a two-layer DBN model.
The advantage of the RBM is that it learns an expressive
representation of the input risk factors. Each hidden unit in
RBM is able to encode at least one high-order interaction
among the input variables. Given a specific number of latent reasons in the input, an RBM requires fewer hidden units to represent the problem complexity. Under this scenario, RFs can be analyzed by an RBM model with an efficient
CD learning algorithm. In this paper, we use RBM for an
unsupervised greedy layer-wise pre-training. Specifically, each
sample describes a state of visible units in the model. The goal
of learning is to minimize the overall energy so that the data
distribution can be better captured by the single-layer model.
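A compact sketch of one CD-1 update for such a sparse RBM is given below; it uses the usual mean-field simplifications and applies the sparsity term as a simple push of the mean hidden activation toward ρ rather than the exact KL gradient, so it is illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, b, a, lr=0.01, rho=0.05, beta=0.1, lam=1e-4, rng=None):
    """One CD-1 update on a batch V (n x D of 0/1 values) for a sparse RBM."""
    rng = rng or np.random.default_rng()
    # Positive phase: hidden probabilities driven by the data.
    ph = sigmoid(a + V @ W)
    h_sample = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one Gibbs step back to a reconstruction.
    pv = sigmoid(b + h_sample @ W.T)
    ph_recon = sigmoid(a + pv @ W)
    # Gradients: data statistics minus reconstruction statistics.
    dW = (V.T @ ph - pv.T @ ph_recon) / len(V)
    db = (V - pv).mean(axis=0)
    da = (ph - ph_recon).mean(axis=0)
    # Sparsity: push the mean hidden activation toward rho; decay the weights.
    da = da + beta * (rho - ph.mean(axis=0))
    W = W + lr * (dW - lam * W)
    b = b + lr * db
    a = a + lr * da
    return W, b, a
```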
B. Multi-Layer Learning for Mining Abstractive Reasons
The new representations learned by a shallow RBM (one
layer RBM) can model some directed hidden causalities behind
the RFs. But there are more abstractive reasons behind them
(i.e. the reasons of the reasons). To sufficiently model reasons
in different abstractive levels, we can stack more layers into
the shallow RBM to form a deep graphical model, namely,
a DBN [10]. DBN is a probabilistic generative model that
is composed of multiple layers of stochastic, latent variables.
The latent variables typically have binary values and are often
called hidden units. The top two layers form an RBM which can
be viewed as an associative memory. The lower layer forms a
multi-layer perceptron (MLP) [11] which receives top-down,
directed connections from the layers above. The states of the
units in the lowest layer represent a data vector.
We show a two-layer DBN in Fig. 2(b), in which the pre-training follows a greedy layer-wise training procedure. Specifically, one layer is added on top of the network at each step, and only that top layer is trained as an RBM using the CD strategy [9]. After each RBM has been trained, the weights are clamped, a new layer is added, and the above procedure is repeated. After pre-training, the values of the latent variables in every layer can be inferred by a single, bottom-up pass that starts with an observed data vector in the bottom layer and uses the generative weights in the reverse direction. The top layer of the DBN forms a compressed manifold of the input data, in which each unit in this layer has a distinct weighted non-linear relationship with all of the input factors.
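A minimal sketch of this greedy stacking is shown below, assuming binary input data and illustrative layer sizes (the 11-unit top layer mirrors the output dimensionality used later in the experiments):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(V, n_hidden, epochs=10, lr=0.01, seed=0):
    """Fit one RBM on binary data V with plain CD-1; returns (weights, hidden biases)."""
    rng = np.random.default_rng(seed)
    D = V.shape[1]
    W = rng.normal(scale=0.01, size=(D, n_hidden))
    b, a = np.zeros(D), np.zeros(n_hidden)
    for _ in range(epochs):
        ph = sigmoid(a + V @ W)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(b + h @ W.T)          # one-step reconstruction
        ph2 = sigmoid(a + pv @ W)
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
        b += lr * (V - pv).mean(axis=0)
        a += lr * (ph - ph2).mean(axis=0)
    return W, a

def pretrain_dbn(X, layer_sizes=(256, 11)):
    """Greedy layer-wise pre-training: train a layer, clamp it, feed its activations upward."""
    params, data = [], X
    for n_hidden in layer_sizes:
        W, a = train_rbm(data, n_hidden)
        params.append((W, a))
        data = sigmoid(a + data @ W)       # representation that trains the next RBM
    return params
```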
C. Bone Disease Prediction Using CDM
Our goal now is to disentangle the salient integrated
features from the complex EHR data for the bone disease
prediction. We define an integrated RF learning model based
on the given data set for two types of bone disease prediction (osteoporosis and bone loss rate) and the DBN structure introduced in the last section. Our general idea is shown
in Fig. 3, where a good RF representation for predicting
osteoporosis and bone loss rate is achieved by learning a
set of intermediate representations using a DBN structure at the bottom with a regression layer (classifiers) appended on top. This
multi-learning model can capture the characteristics from both
observed input (bottom-up learning) and labeled information
(top-down learning). The internal model, which memorizes the
trained parameters using the whole training data and preserves
the information for both normal and abnormal patients, is
termed the comprehensive disease memory (CDM). That
is, the learned representation model CDM discovers good
intermediate representations that can be shared across two
prediction tasks with the combination of knowledge from both
input layer with the original training data and output layer with
two types of class labels. The training procedure for CDM
will focus on two specific prediction tasks (osteoporosis and
bone loss rate) with all risk factors as the input and model
parameters as the output. It includes a pre-training stage and
a fine-tuning stage. In the first stage, the unsupervised pre-training stage, we apply the layer-wise CD learning procedure
for putting the parameter values in the appropriate range for
further supervised training. It guides the learning towards
basins of attraction of minima that support better risk factor
generalization from the training data set. So the result of the
pre-training procedure establishes an initialization point of the
fine-tuning procedure inside a region of parameter space in
which the parameters are henceforth restricted. In the second
stage, the fine-tuning (FT) stage, we take advantage of the two
types of labeled information to train our model in a supervised fashion.
In this way, the prediction errors for both prediction tasks will
be minimized. Specifically, we use parameters from the pre-training stage to calculate the prediction results for each sample
and then back propagate the errors between the predicted result
Fig. 4: Informative RF selection based on (a) BDM, which reconstructs the normal data set, and (b) NDM, which reconstructs the abnormal data set.
and the ground truth about osteoporosis from top to bottom
to update model parameters to a better state. Since we have
another type of labeled information, we then repeat the fine-tuning stage by calculating errors between the predicted result and the other ground truth about bone loss rate. After the two-stage training procedure, our CDM is well trained and can be
used to predict bone diseases.
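The sketch below approximates this two-stage fine-tuning by adjusting only a logistic output layer on top of the pre-trained representation, once per label type; the paper back-propagates the errors through the whole network, so this is a simplified stand-in with placeholder data:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def finetune_head(H, y, epochs=200, lr=0.1):
    """Gradient descent on a logistic output layer for one type of label."""
    w, c = np.zeros(H.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(H @ w + c)
        grad = p - y                        # cross-entropy gradient w.r.t. the logits
        w -= lr * H.T @ grad / len(y)
        c -= lr * grad.mean()
    return w, c

# H: top-layer activations from the pre-trained network (e.g. pretrain_dbn above);
# y_osteo and y_loss: the two types of labels used for fine-tuning (placeholders here).
rng = np.random.default_rng(0)
H = rng.random((200, 11))
y_osteo = rng.integers(0, 2, size=200)
y_loss = rng.integers(0, 2, size=200)
head_osteo = finetune_head(H, y_osteo)      # first fine-tuning pass (osteoporosis)
head_loss = finetune_head(H, y_loss)        # repeated for the second label (bone loss rate)
```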
D. Informative Risk Factor Selection Using BDM and NDM
In the previous section, we use CDM to model both
diseased patients and healthy patients together and establish
a comprehensive disease memory which captures the salience
of all RFs by a limited number of integrated RFs for predicting
osteoporosis and bone loss rate. In this section, we model
the diseased patients and healthy patients separately based on
their unique characteristics and identify the RFs that cause
the disease (or osteoporosis). We first define a pair of disease
memory models with a contrast pattern (diseased patients vs.
non-diseased patients). We term the bone disease memory
(BDM) model as a type of DM model which is trained by
the diseased samples so it only memorizes the characteristics
of those patients who suffer from osteoporosis or have a high bone loss rate. BDM is different from CDM in
that it is a disease-targeted model that implies possible latent
reasons to those abnormal patients. Given an abnormal sample,
our goal is to represent the latent reasons leading to his/her
disease. The top block of Fig. 4(a) shows a hierarchical latent
structure underlying the observed RFs, which is well trained
using the abnormal samples. To find informative RFs, we will
apply this model with the normal samples as the input data and
its reconstruction as output, as illustrated in Fig. 4(a). Note that
there are obvious contrasts between the input and output since
data reconstructed by BDM reflects abnormal cases, which are contrary to the input. Under this scenario, the differences
between both sides help us in finding the informative RFs.
Similarly, we term the non-disease memory (NDM) model as
a model which is trained by the non-diseased samples who
have normal bone and low bone loss rate and memorizes
their attributes. The structure of NDM is similar to that of BDM,
as shown in Fig. 4(b), but NDM is a non-disease targeted
model that keeps information about normal patients. Contrary
to BDM, the top block of NDM memorizes the characteristics
of normal patients since it is totally trained by the normal
samples. It has the same function as BDM in finding the informative RFs. Also, it can serve as a cross-validation for analyzing the informative RFs provided by BDM.
Distance Metrics. To find the informative RFs that cause a normal case to become abnormal, we track the distance for each column pair (each column is a risk factor) between the original data and the reconstructed data. Note that unreliable information also yields a large distance. To remove the unreliable information and purify the informative RF list, we first examine the validity of BDM. We calculate the distance $d_{dB}^{(k)}$ between the original disease samples and the data generated by BDM. We denote the distance for the $k$th RF between the original non-disease data and the data generated by BDM as $d_{nB}^{(k)}$. The cumulative distance for the $k$th RF is $d_{cB}^{(k)} = |d_{nB}^{(k)} - d_{dB}^{(k)}|$. We use the Root Mean Square Error (RMSE) to calculate both $d_{nB}^{(k)}$ and $d_{dB}^{(k)}$, and the absolute difference for $d_{cB}^{(k)}$. The sum of distances over all RFs is large since BDM and the input data follow diverse distributions. NDM has a similar function to BDM. The only difference is that NDM is used to generate samples with the disease samples as input, and the distance between the reconstructed data and the original data is $d_{dN}^{(k)}$; the validation for NDM uses the distance $d_{nN}^{(k)}$ between the original non-disease samples and the data generated by NDM. The cumulative distance can be calculated as $d_{cN}^{(k)} = |d_{dN}^{(k)} - d_{nN}^{(k)}|$. We then rank the distances $d_{cB}^{(k)}$ and $d_{cN}^{(k)}$ in descending order and find the top-N informative RFs. Ideally, the candidate informative RFs produced by either BDM or NDM are consistent and close to one another, because only the informative RFs cause a large distance if we successfully remove the unreliable data.
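A sketch of this distance computation is given below; `reconstruct` is a hypothetical function that passes samples through a trained memory model (BDM or NDM) and returns their reconstructions:

```python
import numpy as np

def column_rmse(X, X_hat):
    """RMSE between original and reconstructed data, one value per risk factor (column)."""
    return np.sqrt(((X - X_hat) ** 2).mean(axis=0))

def top_n_informative(X_disease, X_normal, reconstruct, n=20):
    """Rank RFs by the cumulative distance |d_n^(k) - d_d^(k)| for one memory model."""
    d_d = column_rmse(X_disease, reconstruct(X_disease))  # validity check on matching samples
    d_n = column_rmse(X_normal, reconstruct(X_normal))    # reconstruction of the contrast group
    d_c = np.abs(d_n - d_d)
    return np.argsort(d_c)[::-1][:n]                      # column indices of the top-N RFs
```

For BDM, `reconstruct` would be the memory trained on the diseased samples, so the contrast group is the normal data; for NDM the roles of the two groups are swapped, and the two resulting lists can be cross-checked as described above.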
IV. EXPERIMENTS

A. Data Set
The Study of Osteoporotic Fractures (SOF) is the largest
and most comprehensive study of risk factors (RFs) for bone
diseases, which includes 9704 Caucasian women aged 65 years
and older. It contains 20 years of prospective data about
osteoporosis, bone fractures, breast cancer, and so on. Potential
risk factors (RFs) and confounders were classified into 20
categories such as demographics, family history, lifestyle, and
medical history [12]. A number of potential RFs are grouped
and organized at the first and second visits which include 672
variables scattered into 20 categories as the input of our model.
The rest of the visits contain time-series dual-energy x-ray
absorptiometry (DXA) scan results on bone mineral density
(BMD) measure, which will be extracted and processed as the
label for our data set. Based on the WHO standard, a T-score of less than -1 (a T-score of -1 corresponds to a BMD of 0.82 when the reference BMD is 0.942 and the reference standard deviation is 0.122) indicates the osteopenia condition that is the precursor to osteoporosis, which is used as the first type of
label. The second type of label is the annual rate of BMD
variation. We use at least two BMD values in the data set to
calculate the bone loss rate and define the high bone loss rate
with greater than 0.84% bone loss in each year [13].
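As an illustration of how the second label might be derived from serial BMD measurements (the exact preprocessing is not spelled out here, so the helper below is an assumption):

```python
import numpy as np

def annual_bone_loss_rate(bmd_values, visit_years):
    """Percent BMD lost per year, estimated from the first and last available scans."""
    bmd = np.asarray(bmd_values, dtype=float)
    years = np.asarray(visit_years, dtype=float)
    percent_change = (bmd[0] - bmd[-1]) / bmd[0] * 100.0
    return percent_change / (years[-1] - years[0])

# High bone loss rate label: more than 0.84% BMD lost per year [13].
rate = annual_bone_loss_rate([0.85, 0.83, 0.80], [0, 2, 4])
print(round(rate, 3), rate > 0.84)   # 1.471 True
```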
B. Evaluation Metric
The error rate on a test data set is commonly used as the
evaluation method of the classification performance. Nevertheless, for most skewed medical data sets, the error rate could
still be low even when all minority-class samples are misclassified into the majority class. Thus, two alternative measurements are used
in this paper. First, Receiver Operating Characteristic (ROC)
curves are plotted to generally capture how the number of
correctly classified abnormal cases varies with the number of
normal cases incorrectly classified as abnormal. Since
in most medical problems, researchers usually attach great
importance to the fraction of examples classified as abnormal cases that are truly abnormal, Precision-Recall (PR) curves are also plotted to show this property.
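Both measurements can be computed with standard tooling, for example scikit-learn, as in the following sketch with placeholder scores and labels:

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)              # 1 = abnormal case
scores = y_true * 0.4 + rng.random(300) * 0.6      # placeholder classifier scores

fpr, tpr, _ = roc_curve(y_true, scores)
precision, recall, _ = precision_recall_curve(y_true, scores)
print("AUC of ROC:", round(auc(fpr, tpr), 3))
print("AUC of PR :", round(auc(recall, precision), 3))
```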
Fig. 5: Performance Comparison. (a) ROC and PR curves for single-layer learning. AUC of ROC: LR (Expert) 0.729, SVM (Expert) 0.601, LR (RBM) 0.638, SVM (RBM) 0.591, LR (RBM with FT) 0.795, SVM (RBM with FT) 0.785; AUC of PR: LR (Expert) 0.458, SVM (Expert) 0.343, LR (RBM) 0.379, SVM (RBM) 0.358, LR (RBM with FT) 0.594, SVM (RBM with FT) 0.581. (b) ROC and PR curves for multi-layer learning. AUC of ROC: LR (Expert) 0.729, SVM (Expert) 0.601, LR (DBN) 0.662, SVM (DBN) 0.631, LR (DBN with FT) 0.878, SVM (DBN with FT) 0.879; AUC of PR: LR (Expert) 0.458, SVM (Expert) 0.343, LR (DBN) 0.393, SVM (DBN) 0.386, LR (DBN with FT) 0.718, SVM (DBN with FT) 0.72.
C. Experiments and Results for Task 1
RFs are extracted based on the expert opinion [3], [14]
and summarized using the following variables: age, weight,
height, BMI, parent fall, smoke, excess alcohol, rheumatoid
arthritis, and physical activity. We apply two basic classifiers, LR and SVM, and choose the parameters by cross-validation for
fairness. Note that this is a supervised learning process since
all samples for this expert knowledge based model are labeled.
For fair comparison with the classification results using the
expert knowledge, we fix the number of output dimensions to be equal to the number of expert-selected RFs. Specifically, we fix
the number of units in the output layer to be 11, where each
unit in this layer represents a new integrated feature describing
complex relationships among all 672 input factors, rather than
a set of typical RFs selected by experts.
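A sketch of this setup is given below; `dbn_transform` is a hypothetical stand-in for the trained CDM projection from 672 RFs to 11 integrated features, and the data are placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_raw = rng.random((1000, 672))            # 672 original RFs (placeholder values)
y = rng.integers(0, 2, size=1000)          # osteoporosis label (placeholder)

def dbn_transform(X, n_features=11):
    """Hypothetical stand-in for the trained CDM: project 672 RFs to 11 integrated features."""
    proj = np.random.default_rng(1).normal(size=(X.shape[1], n_features))
    return np.tanh(X @ proj)

X_int = dbn_transform(X_raw)
for name, clf in [("LR", LogisticRegression(max_iter=1000)), ("SVM", SVC())]:
    print(name, cross_val_score(clf, X_int, y, cv=5).mean())
```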
Since the sample size is large and highly imbalanced in
Phase 1, we evaluate the performance using the area under the curve
(AUC) of ROC and PR curves. AUC indicates the performance
of a classifier: the larger the better (an AUC of 1.0 indicates
a perfect performance). The number of samples in Phase 2 is
small and balanced, thus we only evaluate the performance
using the classification error rate. We will present and discuss
the experiment results for both phases.
1) Phase 1– Osteoporosis Prediction: From Figure 5(a),
we observe that a shallow RBM without FT, “LR (RBM)” and “SVM (RBM)”, gets a sense of how the data are distributed, which
represents the basic characteristics of the data itself. Although
the performances are not always higher than the expert model
“LR (Expert)” and “SVM (Expert)”, this is a completely
unsupervised process without borrowing knowledge from any
types of labeled information. Achieving such a comparable
performance is not easy since the expert model is trained in
a supervised way. But we find that the model lacks focus on a specific task and thus leads to poor performance. Further
improvements may be possible by more thorough experiments
with a two-stage fine-tuning. So we take advantage of the
labeled information and transform from an unsupervised task
to a semi-supervised task because of the partially labeled data.
Figure 5(a) shows the classification results after using the two-stage fine-tuning to boost the performance of all classifiers, “LR
(RBM with FT)” and “SVM (RBM with FT)”. In particular, the AUC of PR of our model outperforms that of the expert system.
Since the capacity for the RBM model with one hidden
layer is usually small, it indicates a need for a more expressive model over the complex data. To satisfy this need, we
add a new layer of non-linear perceptron at the bottom of
RBM, which forms a DBN as shown in Fig. 2(b). This new
added layer greatly enlarges the overall model expressiveness.
More importantly, the deeper structure is able to extract more
abstractive reasons. As we expected, using a deeper structure
without labeled information, both LR (DBN) and SVM (DBN)
yield better performance than the shallow RBM model, as
illustrated in Fig. 5(b). And the model “LR (DBN with FT)”
and “SVM (DBN with FT)” further improve their behavior
because of the two-stage fine-tuning. The performance using
the DBN model improves at a 32% average rate on the ROC measure and an 80% average rate on the PR measure.
2) Phase 2 – Bone Loss Rate Prediction: In this section,
we show the bone loss rate prediction using the abnormal cases
after Phase 1. A high bone loss rate is an important predictor of
higher fracture risk. Our integrated risk features are good at
detecting this property since they integrate the characteristics
of the data itself and are nicely tuned with the help of two kinds
of labels. We compare the results between expert knowledge
based model and our DBN with fine-tuning model that yields
the best performance for Phase 1. Since our result is also fine-tuned by the bone loss rate, we can directly feed the 11 new integrated features into Phase 2. Table I shows that our model
achieves high predictive power when predicting bone loss rate.
In this case, the expert model fails because the limited features
are not sufficient to describe the bone loss rate which may
interact with other different RFs. This highlights the need for
a more complex model to extract the precise attributes from an
large number of potential RFs. Moreover, our CDM module takes
into account the whole data set, not only keeping the 672 risk
factor dimensions but also utilizing two types of labeled data.
TABLE I: Classification error rate comparison

              | LR-Error | SVM-Error
Expert        | 0.383    | 0.326
DBN with FT   | 0.107    | 0.094
D. Experiments and Results for Task 2
In this section, we will show experiments and results
on informative RF selection. Based on the proposed method
shown in Figure 1, we show a case study which lists the top
20 informative RFs selected using BDM and NDM in Table II.
Variable descriptions are taken from the data provider [12].
In this study osteoporosis appears to be associated with
several known risk factors that are well described in the
literature. Based on the universal rule used by FRAX [3], a popular fracture risk assessment tool developed by the WHO, some of the selected RFs, such as age, fracture history, family history, BMD, and excess alcohol intake, have already been used to evaluate patients' fracture risk. Some
researchers find that not only are well-known RFs associated
TABLE II: Informative risk factors generated by BDM and NDM

Category             | Variable | Description
Demographics         | AGE      | The patient's age at this visit
Fracture history     | IFX14    | Vertebral fractures
                     | INTX     | Intertrochanteric fractures
                     | FACEF    | Face fracture
                     | ANYF     | Follow-up time to 1st any fracture since current visit
History              | MHIP80   | Mom hip fracture after age 80
Exam                 | DSTBMC   | Distal radius bone mass content (gm/cm)
                     | PRXBMD   | Proximal radius bone mass density (gm/cm2)
Physical performance | TURNUM   | Number of steps in turn
                     | STEADY   | Steadiness of turn
                     | STEPUP   | Ability to step up one step
                     | STDARM   | Does participant use arms to stand up?
                     | GAID     | Aid used for pace tests (i.e. crutch, cane, walker)
Exercise             | 50TMWT   | Total number of times of activity/year at age 50
Life style           | DR30     | How often did you have 5 or more drinks one day
Breast cancer        | BRSTCA   | Breast cancer status
Blood pressure       | LISYS    | Systolic blood pressure lying down (mmHg)
                     | DIZZY    | Dizziness upon standing up
Vision               | CSHAVG   | Average contrast sensitivity

Fig. 6: Osteoporosis prediction based on informative RFs. AUC of the ROC curve (left) and AUC of the Precision-Recall curve (right), plotted against the number of informative RFs (0 to 50) for the Informative RF, Integrated RF, and Expert RF feature sets.
with osteoporosis and more falls, but lifestyle-related
behavioral and environmental risk factors are also important
causes of falls in older women. In Table II, some selected
RFs have been well studied like DIZZY, GAID, STDARM
and 50TMWT [15], [16]. The rest of the RFs may attract medical researchers’ interest and call attention to monitoring bone disease progression.
In general, it is probably not practical to acquire many
features from all participants. So what are the most important questions the physician needs to ask? And how many features are needed to achieve good predictive performance? Using
the proposed approach, we selected the top 50 informative RFs,
instead of using all of them, and fed them directly to the
Logistic Regression classifier for the osteoporosis prediction.
Fig. 6 shows that we only need the top 20 informative RFs to improve both the ROC and PR curves. The area under
the ROC curve and the precision-recall curve (AUC) for
our selected RFs (denoted as Informative RF) is even better
than the RFs selected using expert knowledge (denoted as
Expert RF) when the number of selected RFs is fixed to 20.
The proposed informative RF selection method exhibits great
power of predicting osteoporosis in that the selected RFs are
more significant than the remaining RFs. But the performance of
the prediction result of top 50 RFs selected by BDM and
NDM is always inferior to that of integrated RFs extracted by
CDM (denoted as Integrated RF), in that some information is discarded that might still contribute to enhancing the predictive performance.
V. CONCLUSIONS
We proposed to tackle the problem of bone disease prediction and informative risk factor (RF) selection by modeling the observed and latent reasons behind risk factors (RFs) using a deep graphical model pre-trained by the CD algorithm. We found an effective way of modeling the comprehensiveness and uniqueness of different samples. First, we combined two types of bone disease labeled information to train our model for the prediction task. Second, we formulated a reconstruction pattern comparison framework to select the informative RFs for bone diseases. Besides, a group of “disease memories” (DMs), including the comprehensive disease memory (CDM), bone disease memory (BDM) and non-disease memory (NDM), were well defined and applied in our experiments. Our extensive experimental results showed that the proposed method improves the prediction performance and has great potential to select the informative RFs for bone diseases.

VI. ACKNOWLEDGMENTS
The materials published in this paper are partially supported by the National Science Foundation under Grants No. 1218393, No. 1016929, and No. 0101244.

REFERENCES
[1] World Health Organization, “WHO scientific group on the assessment of osteoporosis at primary health care level,” 2004.
[2] W. H. O. S. group on the prevention and management of osteoporosis.
Report, Prevention and management of osteoporosis: report of a WHO
scientific group. WHO, 2003.
[3] http://www.shef.ac.uk/FRAX/.
[4] R. Bender, “Introduction to the use of regression models in epidemiology,” Methods Mol Biol, vol. 471, pp. 179–195, 2009.
[5] G. Lemineur, R. Harba, N. Kilic, O. Ucan, O. Osman, and L. Benhamou,
“Efficient estimation of osteoporosis using artificial neural networks,”
in Industrial Electronics Society. IEEE, 2007, pp. 3039–3044.
[6] C. Ordonez and K. Zhao, “Evaluating association rules and decision
trees to predict multiple target attributes,” Intelligent Data Analysis,
vol. 15, no. 2, pp. 173–192, 2011.
[7] H. Li, C. Buyea, X. Li, M. Ramanathan, L. Bone, and A. Zhang,
“3d bone microarchitecture modeling and fracture risk prediction,” in
Proceedings of the ACM Conference on Bioinformatics, Computational
Biology and Biomedicine. ACM, 2012, pp. 361–368.
[8] G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality
of data with neural networks,” science, 2006.
[9] M. A. Carreira-Perpinan and G. E. Hinton, “On contrastive divergence
learning,” 2005.
[10] G. E. Hinton, “Deep belief networks,” Scholarpedia, vol. 4, no. 5, p.
5947, 2009.
[11] F. Rosenblatt, Principles of neurodynamics; perceptrons and the theory
of brain mechanisms. Washington: Spartan Books, 1962.
[12] http://www.sof.ucsf.edu/interface/.
[13] J. Sirola, A.-K. Koistinen, K. Salovaara, T. Rikkonen, M. Tuppurainen,
J. S. Jurvelin, R. Honkanen, E. Alhava, and H. Kröger, “Bone loss rate
may interact with other risk factors for fractures among elderly women:
A 15-year population-based study,” Journal of osteoporosis, vol. 2010,
2010.
[14] Cummings, S.R., Nevitt, M.C., Browner, W.S., Stone, K., Fox, K.M.,
Ensrud, K.E., Cauley, J., Black, D., and Vogt, T.M., “Risk factors for
hip fracture in white women.” Study of Osteoporotic fractures research
group, vol. 332, pp. 767–773, 1995.
[15] K. A. Faulkner, J. A. Cauley, S. A. Studenski, D. Landsittel, S. Cummings, K. E. Ensrud, M. Donaldson, and M. Nevitt, “Lifestyle predicts
falls independent of physical risk factors,” Osteoporosis international,
vol. 20, no. 12, pp. 2025–2034, 2009.
[16] R. Bensen, J. D. Adachi, A. Papaioannou, G. Ioannidis, W. P. Olszynski,
R. J. Sebaldt, T. M. Murray, R. G. Josse, J. P. Brown, D. A. Hanley
et al., “Evaluation of easily measured risk factors in the prediction of
osteoporotic fractures,” BMC musculoskeletal disorders, vol. 6, no. 1,
p. 47, 2005.