An Ensemble Model for Mobile Device based Arrhythmia Detection

advertisement
An Ensemble Model for Mobile Device based Arrhythmia
Detection
Kang Li, Suxin Guo, Jing Gao and Aidong Zhang
Computer Science and Engineering Department
State University of New York at Buffalo
Buffalo, 14260, USA
{kli22, suxinguo, jing, azhang}@buffalo.edu
ABSTRACT
Recent advances in smart mobile device technology have resulted in global availability of portable computing devices
capable of performing many complex functions. With the
ultimate intent of promoting human’s well-being, mobile device based arrhythmia detection (MAD) has attracted lots
of attention recently. Without any guidance or supervision
from experts, the performance of arrhythmia detection is
usually unsatisfactory. Supervised learning can learn from
labeled cardiac cycles to detect arrhythmias for each mobile
device user if enough training data is provided. However, it
is time-consuming, costly and sometimes impossible to let
experts annotate enough training data for each user. To
tackle this problem, we take advantage of publicly available
and well annotated data to infer knowledge which can be
treated as experts for MAD. To reduce the space usage of
the framework, we extract from each source of labeled data
an expert model, which consists of a task-independent individual characteristic vector and a task-related preference
vector. Multiple experts are then integrated into an ensemble model for arrhythmia detection. Both space and
time complexities of this proposed approach are theoretically analyzed and experimentally examined. To evaluate
the performance of the method, we implement it on the
MIT-BIH Arrhythmia Dataset and compare it with seven
state-of-the-art methods in the area. Extensive experimental results show that the proposed algorithm outperforms
all the baseline methods, which validates the effectiveness of
the proposed algorithm in MAD.
Categories and Subject Descriptors
I.5.2 [Pattern Recognition]: Design Methodology—Classifier design and evaluation; J.3 [Computer Applications]:
Life and Medical Science—health
General Terms
Algorithms, Theory, Experimentation, Performance
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
BCB ’13, September 22 - 25, 2013, Washington, DC, USA
Copyright 2013 ACM 978-1-4503-2434-2/13/09 ...$15.00.
ACM-BCB 2013
Figure 1: A Mobile ECG Monitoring System.
Keywords
ECG, Arrhythmia Detection, Ensemble Model
1.
INTRODUCTION
Arrhythmias, which are irregular rates or rhythms of heartbeats, reveals abnormal heart activities. In serious cases,
during arrhythmias, hearts cannot pump enough blood to
the body, which can cause damages to brains and hearts, and
can even cause sudden cardiac death. Therefore, monitoring
heart activities and detecting arrhythmias are of great importance to people’s well-being. Arrhythmia detection can
be used to alarm heart disease onsets, accelerate first aids
and save people’s lives.
Electrocardiogram (ECG) plays a major role in monitoring heart activities and detecting arrhythmias, which interprets cardiac electrical activities over time as periodic
signals. Each period represents the dynamic patterns of
heart activity in a cardiac cycle (heart beat). In the task
of arrhythmia detection, cardiac cycles are divided into two
classes: Normal and abnormal. The former stands for normal states of the heart while the latter represents abnormal
cardiac cycles which likely lead to heart damages or even
sudden death. Our goal is to detect abnormal cardiac cycles
timely from ECG data.
Recent advances in smart mobile devices enable people
to integrate many complex functions to portable computing devices including smart phones. The advances in smart
health sensors further enable miniaturizing ECG assessing
equipments. Due to the importance of arrythmia detection
and the popular usage of smart phones, integrating ECG
monitoring and arrhythmia detecting functions to mobile
devices have aroused great interest. Fig. 1 [13] presents
such a cyber-physical system. In the system, ECG signals
collected through ECG electro-nodes are first sent to smart
266
phones. After processing the signals, data are then sent to
a remote data warehouse for long term data managements.
One simple way to infer abnormal cardiac cycles is to detect anomalous patterns from ECG data. This approach is
referred to as unsupervised learning as no annotated data
are included and no experts are consulted during this process. Such unsupervised arrhythmia detection methods usually suffer from relatively poor performance due to the lack
of supervision. Therefore, supervised learning approaches
should be preferred which learn from expert annotated ECG
data to predict abnormality of unlabeled ECG data. However, most of the existing supervised approaches fail to handle the following difficulties in MAD:
• Lack of labeled data: Existing supervised arrhythmia detection methods usually require sufficient labeled data for both normal and abnormal cardiac cycles in order to make accurate predictions on unlabeled
data. However, the labeling process needs professional
training and consumes lots of time. Furthermore, the
number of normal cardiac cycles is usually much larger
than that of abnormal ones, and thus collecting enough
labeled data for abnormal heart beats is especially difficult. Therefore, it is nearly impossible to obtain sufficient labeled data for each mobile device user.
• Limited space on mobile devices: Typically, in supervised learning, the more labeled instances are provided
in the training set, the more likely that the supervised
method will perform better. In MAD, we need to store
a big number of labeled cardiac cycles if we want to
guarantee the effectiveness of the supervised learning
approaches. However, due to limited storage space of
mobile devices, it is very difficult to store enough labeled data, which will in turn degrade the performance
of supervised learning approaches.
• Requirement of high efficiency: To issue timely alerts,
arrhythmia detection has to be made in real time on
the streaming data collected from mobile devices. This
requires a highly efficient algorithm.
To tackle these challenges, we propose an effective and
efficient framework to combine expert knowledge extracted
from multiple sources of well-annotated data and apply the
knowledge to mobile device users for the task of arrhythmia
detection. Specifically, to handle the lack of labeled data,
we take advantage of existing publicly available and well
annotated ECG sets to guide the classification of cardiac
cycles of mobile device users. To reduce the space usage,
we extract an expert model from each ECG data set in the
form of two vectors: A task-independent individual characteristic vector and a task-related preference vector. The
former represents the type of cardiac cycles that the expert
is professional with, and the latter encodes how the expert
decides if a cardiac cycle is normal or abnormal (arrhythmia). An ensemble model is then built upon the extracted
expert characteristics and preferences with higher weights
assigned to more helpful experts. Intuitively, if the ECG
data exhibit similar patterns to the target user, we trust the
expert constructed from this data set better because it is
more likely to make correct decisions on the cardiac cycles
of the target user. Cardiac cycles of the target user are classified according to the weighted integration of the experts’
preferences. Note that in this framework, the expert extraction and expert integration are derived off-line and then
applied to predict on mobile device users’ ECG data online.
ACM-BCB 2013
The model can handle the collected ECG signals of mobile
device users in a very fast speed.
Although there are many existing methods on detecting
arrhythmias using ECG signals, the proposed work in this
paper significantly differs from the existing work in both
the method and the aimed task. Here we summarize the
difference from the following two major categories of related
work.
One relevant topic is distribution matching [13, 2], which
has been used to tackle the arrhythmia detection problem on
ECG signals. Although it also utilizes publicly available and
well-labeled data sources to help detect arrythmia in target
users, these approaches adopt quite different assumptions
and strategies. They are developed based on the idea of
matching the distributions of source signals and target signals by reweighing the features with the hope that matching
distributions can be found between source and target users.
In contrast, in the proposed approach, we do not make any
assumptions about the distributions of the source or target
signals. Our model is designed based on the well-grounded
principle that an expert should be weighed higher if the
source data it learns from has similar characteristics compared to the target signals.
Another relevant study is multiple source transfer learning
(MSTL) [15, 4, 6], which can transfer knowledge from multiple labeled sources (sets of labeled ECG signals in this task)
to unlabeled data in a target domain (the target user’s ECG
signals in this task). Note that these approaches are not particularly designed for arrythmia detection, and thus they fail
to address some unique challenges in the proposed problem.
For example, they are unable to handle the class imbalance
problem (normal cardiac cycles are way more than abnormal cardiac cycles) and space shortage problem. Different
from these approaches, our proposed method is developed
to address these challenges and thus it is more suitable for
the MAD problem.
In summary, the contributions of this paper include:
• We propose a novel model which maps each source
labeled ECG set to two vectors while preserving the
characteristics and discriminating ability of the expert.
By the expert extraction model, space usage can be
significantly reduced to meet the requirement of the
mobile device based implementation.
• We propose an effective and efficient ensemble model,
which integrates decisions from all the experts for the
task of MAD. The contribution of each expert to the
model is derived based on how similar the source and
target signals are, so the experts which are closer to
the target can get higher weights in the combination.
We derive an efficient solution to infer the model, and
we theoretically analyze the time and space usage of
the framework.
• We evaluate the proposed ensemble model on real and
publicly available data for the task of arrhythmia detection. We compare the proposed approach with seven
state-of-the-arts algorithms. Experiments show that
the proposed approach outperforms the compared algorithms and it is highly efficient in both time and
space.
2.
2.1
METHOD
Notation and Problem Definition
267
Table 1: Notation
t
{ni }ti=1
nT
r
c
i
{Sn
}t
i ×r i=1
{Yni ×1 }ti=1
i
TnT ×r
YnT ×1
T
number of well labeled ECG sets
number of cardiac cycles in the t labeled ECG sets
number of cardiac cycles in the target ECG set
number of features of each cardiac cycle
number of classes in label matrices
t labeled ECG sets
t label matrices for the source ECG sets
target unlabeled ECG set
target label matrix
In the paper, scalers and constants are denoted by lowercase letters in {a, b, α, β, ...}. Matrices and vectors are denoted by upper-case letters in {A, B, Γ, Λ, ...}. We use As×t
to indicate that matrix A contains s rows and t columns.
Ai,: is the i-th row, A:,j is the j-th column and Aij is the
entry of the i-th row and j-th column.
The formulation of our model involves several mathematical concepts, which we explicitly introduce here. The
P trace
of a square matrix Us×s is defined as T r(Us×s ) = si=1 Uii .
The squared Frobenius norm of a matrix A is defined as
kAk2F = T r(A> A) = T r(AA> ). A◦B denotes the Hadamard
product of two equal size matrices A and B, by which we
have (A ◦ B)ij = Aij · Bij . Similarly, the Hadamard division
is defined as (A B)ij = Aij /Bij . N (A|B, σ 2 ) represents
that A follows a Gaussian distribution with mean B and
variance σ 2 . Is×t is an all ones matrix with Iij = 1 for
i ∈ [1, s] and j ∈ [1, t].
As listed in Table 1, we suppose there are t well labeled
ECG signals and different signals are collected from different
individuals. Viewing the t source ECG sets as t experts, we
denote the t experts as Sn1 1 ×r , ..., Snt t ×r . ni for i ∈ [1, t] is
the number of cardiac cycles in the i-th expert and r is the
number of features for each cardiac cycles. Without loss of
generality, we suppose the values in {S i }ti=1 have been scaled
to the range of nonnegative. The label matrices for the t
i
experts are Yn11 ×c , ..., Yntt ×c in which Yj,m
is the probability
that the j-th cardiac cycle in the i-th expert ECG set belongs
to the m-th class for i ∈ [1, t], j ∈ [1, ni ] and m ∈ [1, c].
Without loss of generality, we use the first class m = 1 to
denote the class of normal cardiac cycles and use m ∈ [2, c]
to denote c − 1 types of arrhythmias. For simplicity here, in
the paper we only consider two classes classification task, in
which we aim to determine if each cardiac cycle is normal
or abnormal.
We denote the ECG signals of a mobile device user as
TnT ×r , and the unknown label matrix for the target as
YnTT ×c .
The problem of arrhythmia detection can be expressed
as:given a target ECG signal TnT ×r , t source ECG sets
{Sni i ×r }ti=1 and the corresponding label sets {Ynii ×c }ti=1 ,
learn a mapping F : T → Y T .
2.2
Expert Extraction
Generally, there are two drawbacks of directly utilizing
the t source ECG sets for the learning of the target ECG
set. On the one hand, each source ECG set usually contains
over a thousand of cardiac cycles. When t is large, the source
ECG sets take huge amount of space, and cause high energy
usage on mobile devices. On the other hand, due to the
fact that ECG signals are collected using different assessing
equipments and under different conditions, there are noises
in the ECG sets, which reduces the discriminating ability of
ACM-BCB 2013
the source ECG sets.
To solve the above mentioned problems, in this section,
we propose a novel expert extraction method to map each
source ECG set to two vectors: a task-independent expert
characteristic vector and a task-related expert preference
vector. The former (characteristics) is the consensus shared
by the cardiac cycles from the same expert (source ECG
set), and captures the type of cardiac cycles that the expert
is professional with. The latter measures how the expert decide if a cardiac cycle is normal or abnormal (arrhythmias).
The extracted two vectors thus can capture both the characteristics and the discriminating ability of each expert while
significantly reducing space usage on mobile devices.
Without loss of generality, we assume that each cardiac
cycle can be expressed as:
i
i
Sj,:
= X1×f
Bf>×r + Λij,: + S .
(1)
In Eq. 1, S is the noise term, X i is the task-independent
characteristics of the i-th expert, f is the number of features
in the characteristics, B is the mapping matrix which maps
from the original feature space of S i to the characteristic
feature space of X i , and Λij,: is the unique property of the
j-th cardiac cycle in the i-th expert, which determines the
label of the cardiac cycle.
For each person, normal cardiac cycles are usually much
more than arrhythmias. This can be observed on patients
with severe heart diseases. In mathematics, this fact means
the feature vectors S i of cardiac cycles locate around a mean
X i B > and have low probability to be highly deviated with
large Λij,: . Such knowledge can be included into the model
with a prior distribution on Λi . Specifically, we place the
row-wise independent spherical Gaussian prior distribution
on Λi as:
!
!
r
i>
X
T r(Λi CΛ−1
)
kΛi:,k k2
i Λ
p(Λi ) ∝ exp −
= exp −
.
2CΛi
2
k=1
k
(2)
In Eq. 2, CΛi is a diagonal matrix with diagonal entries
CΛi = diag{CΛi , ..., CΛir }.
1
The conditional distributions over the existing expert ECG
i t
sets {S }i=1 are:
p(S i |X i , B) =
ni
r
Y
Y
i
N (Sjk
|(X i B > )jk , σS2 ),
j=1 k=1
in which σS2 is introduced by the noise term S and the
unique property of each cardiac cycle Λij,k .
Similarly, for the label matrix Y i , we have:
i
i
Yj,:
= Λij,: Wr×c
+ Y ,
in which Y is the noise term, and W i is the proposed taskrelated preference vector of the i-th expert. W i measures
i
how the i-th expert determines whether a cardiac cycle Sj,:
is normal or abnormal (arrhythmias).
The conditional distributions over the observed label matrices {Y i }ti=1 are:
p(Y i |Λi , W i ) =
ni
c
Y
Y
i
N (Yjm
|(Λi W i )jm , σY2 ),
j=1 m=1
in which σY2 is introduced by the noise Y .
268
By Bayesian inference, to obtain the optimal values for
{X i }ti=1 , B and {W i }ti=1 , we need to maximize the log posterior distribution as:
For each X i , we have:
>
>
∂JX
= −2I i S i B + 2I i I i X i B > B
∂X i
Jp = max log p(X, B, W |S, Y )
>
= max log
X,B,W
t
Y
X,B,W
t
Y
>
>
i
where Pc×f
= W i B.
Therefore, the updating rule for {X i }ti=1 is:
h >
>
>
X i ← X i ◦ (I i S i B + γI i H i P )
p(X i , B, W i |S i , Y i )
i=1
∝ max log
>
+ γ(−2I i H i P + 2I i I i X i P > P ),
X,B,W
p(S i |X i , B)p(Y i |Λi , W i ).
i=1
i
>
>
(I i I i X i B > B + γI i I i X i P > P ) .
(5)
By estimating {Λi }ti=1 as Λi = S i − IX i B > , we further
formulate the objective as:
(3) fix {X i }ti=1 and {W i }ti=1 , the objective J is formu"
!
lated
into:
ni
t
r
i
i
> 2
X
(Sjk
− Xj,:
Bk,:
)
1 XX
2
t h
i
X
Jp = max
−
log(2πσS ) +
>
X,B,W
2 j=1
σS2
JB = min
kS i − I i X i B > k2F + γkH i − I i X i B > W i k2F .
i=1
k=1
B≥0
!#
i=1
ni
c
i
i
i
>
i
2
(Y
−
(S
−
X
B
)W
)
1XX
:,m
jm
j,:
j,:
2
−
log(2πσY ) +
. For B, we have:
2 j=1 m=1
σY2
t h
X
>
>
>
∂JB
2
2
=
2BQi Qi − 2S i Qi
In the above objective Jp , σS and σY are parameters,
∂B
i=1
therefore, the objective is equivalent to:
i
>
>
>
+2γ(W i W i BQi Qi − W i H i Qi ) ,
t h
X
i
>
J = min
kSni i ×r − Ini i ×1 X1×f
Br×f
k2F
> >
X,B,W
in which Qif ×ni = X i I i .
(3)
i=1
i
Therefore, the updating rule for B is:
i
>
+γk(Sni i ×r − Ini i ×1 X1×f
Br×f
)W i − Y i k2F ,
" t
X i> i>
>
2
B ← B◦
S Q + γW i H i Qi
σS
in which γ = σ2 is a nonnegative parameter.
i=1
Y
(6)
#
This equivalent objective well fits our initial intuitions.
t X
i
>
i i>
i
i>
i i>
In J , kSni i ×r − Ini i ×1 X1×f
Br×f
k2F guarantees the extracted
BQ Q + γW W BQ Q
.
task-independent expert characteristics X i to well represent
i=1
each cardiac cycle in the i-th source ECG set. For the other
We summarize the solution in Alg. 1.
i
>
part, k(Sni i ×r −Ini i ×1 X1×f
Br×f
)W i −Y i k2F ensures that the
i
extracted task-related expert preference W can correctly
Algorithm 1 Expert Extraction for Mobile Device based
classify cardiac cycles in the i-th source ECG set. The paArrhythmia Detection
rameter γ then works as the trade-off coefficient which conRequire: t source ECG sets {S i }ti=1 , the corresponding latrols the importance of the above two parts.
bel matrices {Y i }ti=1 , number of features f for expert
To optimize the objective, we add nonnegative constraints
i t
i t
characteristics and nonnegative trade-off coefficient γ
to {X }i=1 , {W }i=1 and B as in [12] and use the iterative
Ensure:
Expert characteristics {X i }ti=1 , expert preferences
multiplicative update method [12]. By this technique, we
i t
{W }i=1 and mapping matrix B
iteratively update one aimed matrix while fixing all the other
1: Randomly initialize {X i }ti=1 , {W i }ti=1 and B with the
matrices. In detail, we
non-negative constraints
(1) fix B and {X i }ti=1 , the objective J is formulated into:
2: repeat
t
X
3:
Update {W i }ti=1 has:
i
JW = min
kΛi W i − Y i k2F .
>
>
W i ← W i ◦ (Λi Y ) (Λi Λi W i )
W ≥0
i=1
4:
i
For each W , we have:
Therefore, the updating rule for {W i }ti=1 is:
h >
i
>
W i ← W i ◦ (Λi Y ) (Λi Λi W i )
(4)
(2) fix B and {W i }ti=1 , the objective J is formulated into:
X≥0
t
X
>
kS i − I i X i B > k2F + γkH i − I i X i B > W i k2F ,
i=1
>
>
i
in which Hc×n
= W i Si − Y i
i
ACM-BCB 2013
>
>
>
X i ← X i ◦ (I i S i B + γI i H i P )
>
>
∂JW
= −2Λi Y + 2Λi Λi W i
∂W i
JX = min
Update {X i }ti=1has:
>
for i ∈ [1, t].
i
>
>
(I i I i X i B > B + γI i I i X i P > P ) .
5:
Update B as: h
Pt i > i >
>
B←B◦
Q + γW i H i Qi
i=1 S
i
P
>
>
>
ti=1 BQi Qi + γW i W i BQi Qi
.
6: until convergence
There are two types of stopping conditions that can be
applied to the Alg. 1. First, setting the largest iteration
number of the iterative updating algorithms. Second, setting
an upper-bound of the changes of a specified matrix. In the
269
implementation, we use the second method, and at the end
P
P
(s)
(s−1)
of s-th iteration we check if rj=1 fm=1 kBj,m − Bj,m k ≤
0.0001. If yes, the algorithm has reached convergence; if no,
jump to the next iteration.
Another issue that needs to be taken care of is undersampling the labeled source ECG sets to balance the number of cardiac cycles in each class. The reason for this operation is that in the objective J , data with imbalanced
class distributions cause W i put more weights to instances
in the majority class (normal cardiac cycles in the paper).
In the implementation, we randomly under-sample the normal cardiac cycles to be the same amount of the number of
arrhythmias in each expert ECG set.
By the expert extraction method, we are able to map each
labeled expert ECG set to two 1×f vectors while keeping the
discriminating ability of the expert. In the real world, since
there is nearly infinite number of possible labeled expert
ECG sets, storing all possible processed expert vectors on
mobile devices will have very low space and time efficiencies.
To solve this problem, we propose a straightforward strategy to efficiently select the processed experts and to control
the expert size as user-specified. The principle of the expert
selecting strategy is to maximize the coverage of the set of
experts while minimizing duplicates.
We define the similarity of the i-th and the j-th experts
as:
S(i, j) = Sv (X i , X j )Sv (W i , W j ),
where Sv (U, V ) is an off the shelf similarity measure on two
vectors U and V .
If two experts have high similarity, then they are highly
duplicated. A threshold pruning strategy can be used to select the experts: with a pre-defined threshold of the expert
similarity, iteratively check if two arbitrary experts have similarity higher than the threshold. If yes, randomly remove
one. Keep the iterations until no more to be removed.
2.3
Ensemble of Experts
Given t processed and selected experts, the most naive
way is to select the best expert for each mobile device user.
However, there are several significant drawbacks of this naive
strategy. First, choosing the best expert for a specific mobile
device user would either involve complex distribution matching of the user’s ECG and the experts’ signals, or require
labeled cardiac cycles of the user to supervise the choice of
the expert. Therefore, the naive strategy is quite inefficient
and quite expensive (requiring labeled cardiac cycles). Second, compared to the number of mobile device users, the
number of experts is much smaller. We may not be able to
find an expert that performs well enough for each mobile
device user.
To avoid the above drawbacks, in the section, we propose
an ensemble of experts model which integrates the selected
experts for MAD.
Specifically, the integration of opinions from the selected
t experts is achieved through:
p(Y T |T ) =
t
X
We model the conditional probability p(S i |T ) as:
p(S i |T ) = α + βD(X i , X T ),
in which X T is the mobile user characteristic vector, and we
discuss the details of estimating X T in the next section. α
and β are two coefficients that map the vector similarity of
X i and X T to the probability p(S i |T ).
D is a distance metric measuring the closeness of two characteristic vectors X i and X T . For simplicity, in the implementation, we use L2 norm as the distance metric:
v
u f
uX
D(X i , X T ) = t (Xki − XkT )2 .
k=1
The intuition behind the model of p(S i |T ) is that if an
expert has closer characteristics with the target T , then it
is more trustworthy in learning the target T . In our setting, D is a distance matrix, and the closer X T and X i are,
the smaller D(X T , X i ) is. To fit the intuition, we have the
constraint for β as β ≤ 0.
Manually setting the values of α and β is subjective and
may not well fit the real situations. Therefore, we propose a
cross-validation based learning method to automatically set
the values of α and β.
When α and β are optimized, for each of the t labeled
expert ECG sets, the label matrix {Y i }ti=1 should be close
to the label matrix estimated by all the other experts. By
this intuition, we obtain the objective for optimizing α and
β as:
Jαβ = min
α,β
p(Y T |T, S i )p(S i |T ).
(7)
In Eq. 7, p(Y T |T ) is the conditional probability of the target label matrix Y T given the target ECG set T . p(Y T |T, S i )
is the probability that the i-th expert S i classifying on the
ni h
t X
X
i
Yj,:
− (α + β · Di,: )Rij
i2
,
i=1 j=1
where Dil is the distance measured by D(X i , X l ) for l ∈ [1, t]
ij
and l 6= i. The auxiliary matrix Rt−1×c
is defined as:
ij
i
Rl,:
= Sj,:
− X lB> · W l.
Then we have:
ni
t
i
X X h ij ij >
∂Jαβ
i >
=
I 2R R (α · I + β · Di )> − 2Rij Yj,:
,
∂α
i=1 j=1
t
ni
i
X X i h ij ij >
∂Jαβ
i >
=
D 2R R (α · I + β · Di )> − 2Rij Yj,:
.
∂β
i=1 j=1
Setting
∂Jαβ
∂α
= 0 and
∂Jαβ
∂β
= 0, we then obtain:
>
ij i >
ij ij >
βDi
i=1
j=1 I R Yj,: − R R
α=
,
Pt Pni
ij ij > I >
i=1
j=1 IR R
P t P ni
>
i
i >
Rij Yj,:
− Rij Rij αI >
i=1
j=1 D
β=
.
Pt Pni
i ij ij > D i >
i=1
j=1 D R R
Pt
i=1
ACM-BCB 2013
target set T obtains label matrix Y T . p(S i |T ) is the probability that the i-th expert fits the classification task on the
target set T .
Among the three conditional probabilities, p(Y T |T, S i )
can be easily obtained as:
i
>
p(Y T |T, S i ) = T − InTT ×1 X1×f
Br×f
Wr×c .
Pni
270
Mobile Device based Implementation
Space Usage over Various Numbers of Experts
7
10
With extracted expert characteristics {X i }ti=1 , expert preferences {W i }ti=1 , mapping matrix B, and the ensemble of
experts model, we are able to classify ECG signals of mobile
device users. In this section, we explicitly discuss the details
of the mobile device based implementation.
To measure the closeness from the target ECG signal to
each existing expert, we extract the characteristics of the
target by the objective:
6
10
Log Space Usage
2.4
The Mixture of Experts Model
Multiple Sources Learning Baselines
Oversampling Baselines
Undersampling Baselines
5
10
4
10
JT = min kT̂n×r −
X T ≥0
T
>
In×1 X1×f
Br×f
k2F .
(8)
3
10
In the objective JT , T̂n×r is a part of the target ECG set
and n ≤ nT . In mobile device based implementation, the objective stands for learning the user characteristics using the
data collected in a certain period of time. The classification
on the user’s signal starts after the extracting period.
To solve the objective, an iterative multiplication approach
is derived. Since
∂JT
= −2I > T B + 2I > IXB > B,
∂X T
we can infer:
h
i
X T ← X T ◦ (I > T B) (I > IXB > B) .
Then the cardiac cycles in the target ECG signal can be
classified according to Eq. 7.
We summarize the algorithm for MAD in Alg. 2.
Algorithm 2 Mobile Device based Arrhythmia Detection
Require: t expert characteristics {X i }ti=1 , expert preferences {W i }ti=1 , the mapping matrix Br×f , ensemble
model coefficients α and β, and the target ECG set T
Ensure: Y T the label matrix which includes the probabilities that whether each cardiac cycles in T is arrhythmia
1: Randomly initialize X T with the non-negative constraints
2: repeat
3:
Update X T as: X T ← X T ◦ (I > T B) (I > IXB > B)
4: until convergence
5: Calculate the distance between target and expert characteristics:
DT (i) = D(X i , X T ) for i ∈ [1, t]
6: Classify cardiac
in T through:
P cycles
T
Yj,:
= ti=1 Tj,: − X i B > W i α + βDiT
2.5
Space Usage and Time Efficiency
In this section, we theoretically analyze the space usage
and the time efficiency of the proposed ensemble algorithm.
Space Usage
In this method, mobile devices need to store following ini
formation: t extracted expert characteristics {X1×f
}ti=1 ; t
i
extracted expert preferences {Wr×c
}ti=1 ; 1 mapping matrix
Br×f ; and two parameters of the ensemble model α and β.
T
Moreover, to avoid repeats on the calculations of Dt−1×1
and {X i B > }ti=1 , we also prefer to keep them in the mobile devices. Therefore, the overall space for the proposed
ACM-BCB 2013
10
20
30
40
50
60
70
80
90
100
Number of Experts
Figure 2: Theoretical Comparisons on Space Usages.
ensemble algorithm is:
SP ACEM = O (t · f + t · r · c + r · f + t + t · r + 1)
≤ O t · r · c + r2 .
In the SP ACEM , c is the fixed number of classes and
c ≥ 2. r is the number of features in each cardiac cycles,
which is usually determined by the sampling frequency of
the used ECG sensors. f is the input number of features
and 0 ≤ f ≤ r. Thus the overall space is linear w.r.t. the
number of experts.
In comparison, general multiple source learning baselines
for arrhythmia detection usually require mobile devices storing the t original ECG sets {Sni i ×r }ti=1 . For simplicity, suppose ni = n for i ∈ [1, t]. In these baselines, the minimum
space usage is:
SP ACEB = O (t · n · r) .
Several existing methods utilize over-sampling and undersampling to balance the number of cardiac cycles in each
class. Suppose for each expert ECG set, ρ · n of them are
arrhythmias, in which ρ ∈ (0, 0.5). Then for over-sampling
baselines, the space usages are at least:
SP ACEO = O ((1 − ρ) · t · n · r) .
And for under-sampling baselines, the space usages are at
least:
SP ACEU = O (ρ · t · n · r) .
Under the settings that ρ = 0.1, r = 40, c = 10 and n =
1000, the detailed comparisons among the above strategies
are demonstrated in Fig. 2. In comparison, the used space
of the proposed ensemble model is consistently 50 to 500
times less than the other baselines.
Along with the expert selection strategy proposed in Section 2.2, the space usage of the proposed model is low
enough for mobile devices.
2.5.2
2.5.1
0
Time Efficiency
The computation costs are as follows. The proposed ensemble model is divided into four parts.
(1) Expert Extraction. As shown in Alg. 1, this part
involves an iterative solution, in which we iteratively update
i
{W1×f
}ti=1 , {X i }ti=1 and Br×f .
Specifically, in each iteration, updating {W i }ti=1 takes
O(t · c · n · r2 ); updating {X i }ti=1 takes O(t · n · r · f 2 );
and updating B takes O(t · c · n · r2 · f 2 ).
271
Suppose q is the total number of iterations in the optimizing process, the overall time complexity for the expert
extraction is:
T IM EEE = O(q · t · c · n · r2 · f 2 ).
(2) Ensemble Model Learning. The aim of this part is
to learn the parameters α and β for the ensemble model.
ij
In the process, calculating the auxiliary matrix Rl.:
takes
O(t2 · c · n · r · f ). Calculating α and β takes O(t3 · c · n).
Thus the overall time complexity for the ensemble model
learning is:
2
3
T IM EM E = O(t · c · n · r · f + t · c · n).
(3) Target Characteristics Extraction. As shown in Alg. 2,
this part focuses on extracting the characteristic vectors X T
for the target ECG signals from mobile device users. The
time complexity is:
T IM ET E = O(q · n · r · f 2 ).
(4) Real Time Classification. With the above learned
model, the cardiac
in the target
ECG set canbe clasP cycles
T
sified as Yj,:
= ti=1 Tj,: − X i B > W i α + βDiT .
Since X i B > and D> have been calculated and stored in
the last step. The classification of each cardiac cycle in the
target ECG set can be achieved at the cost of:
T IM ERT = O(t · c · r).
Among the four steps, the experts extraction part and the
ensemble model learning part (step (1) and (2)) don’t need
input of any information from the target users, thus can be
finished before the implementation on mobile devices.
In the step (3), T IM ET E is linear w.r.t. the number of
iterations in the optimizing process as well as the number of
cardiac cycles for the target characteristics extraction. This
step can be finished on mobile devices before the classifications of cardiac cycles of the target users.
The final step, which requires real time processing, takes
only O(t · c · r). The complexity is linear to the number
of experts t, considering that r, the number of features in
each cardiac cycles, and c, the number of classes, are taskdependent constants.
We will experimentally evaluate the T IM ET E and the
T IM ERT in Section 3.
3.
EXPERIMENTS
In this section, we experimentally evaluate and demonstrate the effectiveness and efficiency of the proposed ensemble model for MAD.
In the experiments, we implement the proposed algorithm
on the MIT-BIH Arrhythmia Dataset [16], and compare the
method with seven state-of-the-art approaches in the area.
The results show that the proposed framework could successfully integrate decisions from different experts and identify arrhythmias in ECG signals with significant improvements over the compared methods on most cases. Besides,
the real time processing speed and low space usage further
validate that the proposed algorithm is suitable for the mobile device based implementation.
3.1
Dataset Descriptions
The MIT-BIH Arrhythmia Dataset [16] has been used by
many existing researches [13, 9] for experiments on arrhythmia detection. To fairly compare the proposed algorithm,
ACM-BCB 2013
we use the ECG signals of the same set of patients as in [9].
The patient numbers of the set are 100, 101, 103, 105, 109,
115, 121, 201, 202, 210, 215, 230 and 232. The selected ECG
signals are preprocessed following a similar way to [14].
Each patient’s ECG signal covers from 1008 to 1415 cardiac cycles ({ni }ti=1 ), and each cardiac cycles includes r = 39
dimensional features. The percentage of arrhythmias (ρ) in
each patient’s ECG set varies from 0.7% to 29.38% with only
9.09% on average.
3.2
Evaluation Metric
The most common evaluation metric for classification results is overall accuracy, which captures the percentage of
correctly classified instances over the whole set of instances.
However, for arrhythmia detection, the accuracy metric does
not work properly. For example, suppose all the instances in
an extremely imbalanced dataset are classified to the majority class, the evaluated accuracy is very high while all classifications on the minority are wrong. The ECG datasets
in experiments are very imbalanced, and in arrhythmia detection, we focus more on the arrhythmias/minority. Therefore, in the experiments, we have to use an evaluation metric
which can overcome the class imbalance.
Generally, for binary classifications, the classified results
can be divided into four cases: TP (true positive), TN (true
negative), FP (false positive) and FN (false negative), where
TP means the instances classified to be positive and in ground
truth they are positive, and similarly for TN, FP and FN,
respectively.
Without loss of generality, in the experiments, we assume
the class of arrhythmias to be the positive class. The ROC
curve, which illustrates the fraction of TP out of the positives vs. the fraction of FP out of the negatives. Thus ROC
can be used to evaluate the performance of arrhythmia detection.
In the experiments, we use area-under-the-curve (AUC)
which captures the area under the ROC curve to numerically evaluate the performance of each investigated methods.
AUC is in the range of [0, 1]. The higher AUC a method
can achieve, the better its performance is.
3.3
Compared Methods
We compare the proposed ensemble model with a set of 6
state-of-the-art approaches from the multiple sources transfer learning (MSTL) area, which are described as follows:
The first two compared approaches are two recent MSTL
methods CRC [15] and GCM [8]. CRC uses the assumption
that all source experts are closely related to the target sets
while GCM assumes that the majority of the source experts
are similar to the target. Based on the different assumptions, both methods seek to maximize the consensus among
experts on classifications of target instances.
Two recently proposed MSTL methods MDA [4] and LWE
[7] relax the above assumptions through weighing the importance of the source experts. Both of methods are developed
based on the assumption that if the predictions of an expert
are consistent among the instances which are close in the
feature space, then the expert should obtain a high weight.
The difference of the two methods is that MDA assigns a
weight to each expert while LWE put a weight on each instance of each expert.
Since the datasets we used in the experiments are highly
imbalanced, two MSTL methods DAM [6] and SLW [9] for
272
Data Index
100
101
103
105
109
115
121
201
202
210
215
230
232
CRC
0.666
0.611
0.511
0.522
0.620
0.576
0.534
0.600
0.600
0.617
0.620
0.614
0.652
GCM
0.777
0.779
0.626
0.654
0.739
0.679
0.610
0.699
0.715
0.699
0.760
0.679
0.771
MDA
0.760
0.742
0.478
0.714
0.700
0.654
0.655
0.843
0.818
0.830
0.537
0.334
0.724
LWE
0.722
0.423
0.543
0.718
0.753
0.720
0.492
0.854
0.795
0.819
0.632
0.610
0.948
DAM
0.925
0.753
0.648
0.725
0.879
0.746
0.572
0.894
0.847
0.899
0.544
0.674
0.855
LP
0.959
0.802
0.683
0.617
0.837
0.503
0.526
0.892
0.675
0.828
0.869
0.824
0.954
SLW
0.975
0.820
0.920
0.731
0.964
0.713
0.710
0.945
0.881
0.947
0.919
0.859
0.978
EE
1.000
0.987
0.955
0.713
0.949
0.860
0.893
0.926
0.930
0.932
0.935
0.680
0.982
Table 2: Experiments on the MIT-BIH Arrhythmia Dataset.
imbalanced dataset are also considered. DAM computes the
weight of each source by computing the Maximal Mean Discrepancy [1] between source samples and target samples.
SLW assigns weights to each expert based on the expert’s
ability of predicting accurately on the local region of the
target in the feature space.
Moreover, to demonstrate the benefits of MSTL, we include the Label Propagation (LP) [20] as another compared
method. The LP does not utilize any source expert, and
it purely relies on propagating labels from small amount of
labeled data in the target’s dataset.
The results of these compared algorithms on the MIT-BIH
Arrhythmia Dataset are collected from [9].
3.4
Performance Study
In the experiments, we iteratively fix one of the ECG set
as the target while viewing other sets as the source experts.
For all the cases, we set γ = 1000, and f = 16. In the
implementation, we run all the experiments in MATLAB
using a PC with a 2.30 GHz Intel Core i7 − 3610QM CPU
and 8.00 GB RAM.
We present the experimental results in Table 2. In the
table, the results of the proposed method are listed under
the EE (Ensemble of Experts), the name of each ECG set
is listed under the ”Data Index”, and the best performance
of the investigated approaches are marked in bold.
In the comparisons between the proposed method with
the LP approaches, the results of our method significantly
outperform the LP in 12 out of the 13 ECG signal sets.
Please notice that in experiments on LP, several target cardiac cycles are assumed to be labeled ahead of learning while
in the proposed ensemble model, no labeled cardiac cycles
in the targets are required for the learning. The superior
performance of our method over the LP method verifies the
benefits of transfer knowledge from publicly available and
well labeled ECG sets to the signals of general mobile device users. The knowledge transferring process not only does
not need labeled cardiac cycles of the targets, but also can
perform significantly better than the supervised LP method,
which proves that the proposed ensemble model is effective
in solving the problem of lack of labeled data in the task of
MAD.
Among the 6 compared state-of-the-art approaches in MSTL,
we notice that CRC generally performs badly on all the
cases. In 11 out of the 13 cases, the results are worse than
ACM-BCB 2013
those of the LP methods, which indicates that for arrhythmia detection, multiple sources transfer learning using CRC
could not achieve better performance than using the target
itself by label propagations. The reason under the bad performance of CRC is that the characteristics of each ECG set
could be quite different from the others, thus the consensus
of the diverse expert characteristics may not match the target well. GCM works better since it only assumes majority
of the experts are reasonable. However, for the task of arrhythmia detection, majority experts may be far from the
targets in the feature space. Thus the performance of GCM
is still lower than that of LP in 9 out of the 13 cases.
The two weighted MSTL methods MDA and LEW show
very vulnerable performance over different cases. In several ECG sets such as 103 for MDA and 101 for LWE, the
AUC scores are even lower than 0.5, which means performing worse than randomly guessing. Such vulnerability in
performance is mainly caused by the severe data imbalance
in ECG sets. Since the percentages and distributions of arrhythmias in different ECG sets are quite diverse, the weighing MSTL approaches MDA and LWE performs well when
the weighing fit targets and perform badly otherwise.
As for the two MSTL methods DAM and SLW for imbalanced data sets, the results are better than the results of
LP in most cases. Similar to the proposed ensemble model,
these two methods also assumes the importance of each expert is proportional to the distance between the expert and
the target. The difference among the two methods and our
method is that DAM uses Maximal Mean Discrepancy to estimate the distance, SLW uses percentage of shared regions
in feature space to capture the distance, and the proposed
ensemble model utilizes the extracted experts and target
characteristics to estimate the distance. The superiority of
these three methods over all the other compared approaches
proves the robustness of the assumption.
In the comparisons between the proposed method with the
other 7 investigated approaches, our model performs significantly the best in 8 out of the 13 cases. In the other 5
cases, the results are also close to the best or highly ranked
in the 8 investigated methods. Overall, the proposed ensemble model outperforms all the other investigated methods in
average AUC scores. This fact proves that the proposed ensemble model designated for MAD can outperform the stateof-the-art multiple sources transfer learning and supervised
classification techniques on performance.
273
−3
10
6
10
Log Rate of Change
2
10
Log Running Time
100
201
230
4
10
0
10
−2
10
−4
10
Testing Time for Each Cardiac Cycle
Running Time over Iterations
−4
1.5
100
201
230
−4
10
−6
200
400
600
800
1000
1200
1400
Number of Iterations
1600
1800
2000
Figure 3: Convergence Rate.
3.5
10
0
1.2
1.1
1
0.7
200
400
600
800
1000
1200
1400
Number of Iterations
Running Time and Space Usage
1800
2000
0
200
400
600
800
1000
1200
1400
Cardiac Cycle Index
Figure 5: Testing Time.
Space Usage
5
10
4
Space Usage (KB)
10
Running Time
As aforementioned in Section 2.5.2, in the mobile device
based implementation, we mainly need to consider the time
costs on two steps: target characteristics extraction and real
time classifications. We thus focus on testing the related
time costs T IM ET E and T IM ERT , respectively.
The time cost for the target characteristics extractions can
be observed in details in Fig. 3 and Fig. 4 where we show
the plots for patients 100, 201 and 230 (due to space limit,
we only show 3 cases).
In the two graphs, Fig. 3 shows the changing values of
Pr Pf
(s)
(s−1)
j=1
m=1 kBj,m − Bj,m k over the s-th iteration while
s varies from 1 to 2000. Obviously, from the plot, we can
observe that the change rate of B is less than 10−4 before
the 2000-th iteration which means in our settings, the target characteristics extraction process converges in less than
2000 iterations. Fig. 4 shows the time cost of each iteration
in the process. For all the three patients, in the target characteristics extraction process, each iteration takes less than
10−3 second. Therefore, overall, the target characteristics
extraction process takes less than 2 seconds.
The other part of the mobile device based implementation
of our method is the real time classifications of cardiac cycles
of the users. To evaluate this time cost, we illustrate the
classification time for each cardiac cycle of the patients 100,
201 and 230 in Fig. 5, respectively. Obviously, the time
cost for classifying each cardiac cycle is less than 2 × 10−4
seconds which is 4000 times faster than the regular time cost
of a cardiac cycle (0.8 second).
According to the above results, even though mobile devices have lower processing speed than the equipment we
used in the above simulations, the time costs of target characteristics extractions and cardiac cycle classifications are
low enough for the mobile device based implementation.
Moreover, as aforementioned in Section 2.5.2, the target characteristics extraction time T IM ET E is independent
of the number of the experts while the classification time
T IM ERT is linear to the number of the experts. Thus the
above experiment results support that the proposed ensemble model can utilize over 40, 000 experts in the learning
while keeping the real time processing speed.
To sum up, the experiments in this section proves that
the proposed ensemble model is highly efficient for MAD.
ACM-BCB 2013
1600
Figure 4: Running Time.
In this section, by numerically evaluating the running time
and space usage of our model in the implementation, we seek
to prove the superiority and efficiency of our model for MAD.
3.5.1
1.3
0.8
−5
−8
0
100
201
230
0.9
10
10
x 10
1.4
Running Time
Convergence Rate Changes over Iterations
Oversampling
Naive
Undersampling
Mixture of Experts
3
10
2
10
1
10
0
10
100
201
230
Dataset Index
Figure 6: Space Usage in Experiments.
3.5.2
Space Usage
As discussed in Section 2.5.1, for the mobile device based
t
t
implementation, we need to store X i i=1 , W i i=1 , B and several other parameters in the mobile devices. We experimentally evaluate the space usage and demonstrate the results
in Fig. 6. Besides the space usage of the proposed method
SP ACEM , the space costs of the three discussed baselines
strategies SP ACEB for naive multiple source transfer learning, SP ACEO for oversampling and SP ACEU for undersampling MSTL have also been experimentally evaluated,
respectively.
Obviously, the proposed ensemble model could save around
100 times space than all the baselines, and uses only around
10 KB for learning from 12 source experts.
When scaling the space usage to 40, 000 experts, the space
usage of our method is only around 40 MB, which is still
quite low for the mobile device based implementation. In the
contrast, for the most space-saving baseline over-sampling,
the space usage will be over 1 GB which is very large for mobile device based monitoring programs, and tends to waste
much energy in running.
Overall, the experiments in this section prove that the
proposed ensemble model is space efficient enough for MAD.
4.
RELATED WORK
In the areas of machine learning and data mining, arrhythmia detection can be categorized a specific problem in
anomaly detection [19, 18] which focuses on mining the irregular and rare instances in data sets. For the problem, one
general idea is to over-sampling the minority [5] or undersampling [11] the majority. As discussed in Section 2.2, in
the proposed method, we utilize the same idea to randomly
274
under-sampling the majority to balancing the trained expert
preferences {W i }ti=1 . Another general idea [3] in solving the
data imbalance is to weigh the importance of each instance
in a sense minority instances obtain higher weights while
majority instances obtain lower weights. The drawback of
these approaches is that they require sufficient labeled instances to guide the sampling and reweighing.
There are also several methods that do not require any
supervised information. One class support vector machine
(OCSVM) [17] assumes majority instances are densely located in the reproducing kernel Hilbert space and tries to
draw a round boundary to distinguish majority and minority. [10] utilizes another assumption that minority are either not in any cluster or far from their cluster centers. On
the one hand, although such unsupervised methods do not
require labeled instances, they always require the training
data to include sufficient minority to better learn the distributions of minority and majority. For MAD, such conditions
can hardly be satisfied due to the fact that there may be no
arrhythmia during the training data collecting period, no
matter how long the period is. On the other hand, such
methods usually need long learning time which is inefficient
for the mobile device based implementation.
5.
CONCLUSIONS
In this paper, we proposed an ensemble model for MAD.
To solve the problem of lack of labeled data for each mobile device user, we proposed to integrate information from
existing publicly available and well labeled ECG signals as
supervised experts to guide the learning of cardiac cycles
in the ECG signals of mobile device users. To reduce the
space usage as well as running time, each expert ECG set
is mapped into a characteristic vector and a preference vector before the implementation on mobile devices. Moreover,
to fully utilize multiple extracted experts, we proposed to
simultaneously consider the opinions of each expert following the principle that an expert that is closer to the target
has better predicting ability. Experiments on a real world
data set validated that the proposed method performs better than 7 state-of-the-art methods while keeps low space
usage and real time processing speed for MAD.
6.
ACKNOWLEDGMENTS
The materials published in this paper are partially supported by the National Science Foundation under Grants
No. 1218393, No. 1016929, and No. 0101244.
7.
REFERENCES
[1] Integrating structured biological data by kernel
maximum mean discrepancy. Bioinformatics, pages
e49–e57, 2006.
[2] M. T. Bahadori, Y. Liu, and D. Zhang. Learning with
minimum supervision: A general framework for
transductive transfer learning. Proc. of ICDM’11,
pages 61–70, 2011.
[3] X. Chang, Q. Zheng, and P. Lin. Cost-sensitive
supported vector learning to rank. Learning, pages
305–314, 2009.
[4] R. Chattopadhyay, J. Ye, S. Panchanathan, W. Fan,
and I. Davidson. Multi-source domain adaptation and
its application to early detection of fatigue. Procc of
KDD’11, pages 717–725, 2011.
ACM-BCB 2013
[5] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P.
Kegelmeyer. Smote: synthetic minority over-sampling
technique. Journal of Artificial Intelligence Research,
16(1):321–357, 2002.
[6] L. Duan, I. W. Tsang, D. Xu, and T.-S. Chua.
Domain adaptation from multiple sources via auxiliary
classifiers. Proc. of ICML’09, pages 1–8, 2009.
[7] J. Gao, W. Fan, J. Jiang, and J. Han. Knowledge
transfer via multiple model local structure mapping.
Proc. of KDD’08, pages 283–291, 2008.
[8] J. Gao, F. Liang, W. Fan, Y. Sun, and J. Han.
Graph-based consensus maximizaion among multiple
supervised and unsupervised models. Proc. of
NIPS’09, 2009.
[9] L. Ge, J. Gao, H. Ngo, K. Li, and A. Zhang. On
handling negative transfer and imbalanced
distributions in multiple source transfer learning.
Proc. of SDM’13, 2013.
[10] Z. He, X. Xu, and S. Deng. Discovering cluster-based
local outliers. Pattern Recognition Letters,
24(9-10):1641–1650, 2003.
[11] S. B. Kotsiantis and P. E. Pintelas. Mixture of expert
agents for handling imbalanced data sets. Annals of
Mathematics Computing and Teleinformatics,
1(1):46–55, 2003.
[12] D. D. Lee and H. S. Seung. Learning the parts of
objects by non-negative matrix factorization. Nature,
pages 788–91, 1999.
[13] K. Li, N. Du, and A. Zhang. Detecting ecg
abnormalities via transductive transfer learning. In
Proc. of ACM BCB’12, pages 210–217, 2012.
[14] P. Li, K. Chan, S. Fu, and S. Krishnan. An abnormal
ecg beat detection approach for long-term monitoring
of heart patients based on hybrid kernel machine
ensemble. Biomedical Engineering, pages 346–355,
2005.
[15] P. Luo, F. Zhuang, H. Xiong, Y. Xiong, and Q. He.
Transfer learning from multiple source domains via
consensus regularization. Proc. of CIKM’08, page 103,
2008.
[16] G. B. Moody and R. G. Mark. The impact of the
mit-bih arrhythmia database. IEEE Engineering in
Medicine and Biology Magazine, pages 45–50, 2001.
[17] B. Schölkopf, J. C. Platt, J. S. Shawe-Taylor, A. J.
Smola, and R. C. Williamson. Estimating the support
of a high-dimensional distribution. Neural
Computation, 13(7):1443–1471, 2001.
[18] C. G. M. Snoek, M. Worring, J. C. Van Gemert, J.-M.
Geusebroek, and A. W. M. Smeulders. The challenge
problem for automated detection of 101 semantic
concepts in multimedia. Proc. of MULTIMEDIA’06,
pages 421–430, 2006.
[19] L. I. Xin-fu, Y. U. Yan, and Y. I. N. Peng. A New
Method of Text Categorization on Imbalanced
Datasets, pages 10–13. 2008.
[20] D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and
B. Sch. Learning with local and global consistency.
Proc. of NIPS’03, page 595602, 2003.
275
Download