Prediction of Paroxysmal Atrial Fibrillation (PAF) Onset through
Analysis of Inter-beat Intervals (IBI)
By
Charles Q. Du
Submitted to the Department of Electrical Engineering and Computer Science
In Partial Fulfillment of the Requirements for the Degrees of
Bachelor of Science in Electrical Engineering and Computer Science
and Master of Engineering in Electrical Engineering and Computer Science
at the
Massachusetts Institute of Technology
May 9, 2003
Copyright 2003 Charles Q. Du. All rights reserved.
The author hereby grants to M.I.T. permission to reproduce and distribute
publicly paper and electronic copies of this thesis
and to grant others the right to do so.
Author
Department of Electrical Engineering and Computer Science
May 9, 2003
Certified by
Rosalind W. Picard
Accepted by
Arthur C. Smith
Chairman, Department Committee on Graduate Theses
Abstract
PAF is a type of progressive cardiac arrhythmia that poses severe health risks,
sometimes leading to ventricular arrhythmia and post-operative mortality. Some of the
difficulties with treating PAF include screening for patients with the disorder, detecting
episode occurrences, and predicting occurrences. To address these issues,
electrocardiogram (ECG) data from the PhysioNet Online Database was used to develop
a technique to screen, detect, and predict the onset of PAF. Methodologies explored
included Hidden Markov Modeling on inter-beat intervals, entropy, and heart-rate
spectrograms. Initial testing indicates the technique to be discriminant between PAF and
non-PAF (possibly other cardiac disorder) patients (89% sensitivity and 55% specificity).
Even more promising is its ability to discriminate between PAF patients and healthy
individuals (89% sensitivity and 81% specificity). Both results are from data not
involved in training. The IBI-based algorithm could be incorporated into medical devices
with the potential of contributing to new healthcare technology.
Acknowledgements
I wish to thank my thesis advisor, Prof. Rosalind Picard, for all her cheerful
encouragement, tireless support, and helpful criticisms. Her patient and insightful
feedback over the course of eight revisions helped me tremendously in shaping my
endeavors in this fascinating and critical field of study.
Next on my thank you list would be Prof. Roger Mark, for being a taskmaster and
mentor both. Thank you for your instruction in physiology and for keeping me in shape
this term. I swear the heavy lifting I did TAing for 022 was the only real exercise I had
time for. And to think, I was even paid to run around!
Many thanks also to Yuan Qi and Ashish Kapoor. Yuan for always being willing
to share his brilliant insights and works. Ashish for always being available, taking time
from his own busy schedule to offer helpful advice.
Thanks should also go to Dr George Moody and Wei Zong, for critical help in
collecting data and interpreting results.
And of course, to Amy, who fed me and kept me warm when I pulled my late
night work sessions.
To my parents, Miting and Jin.
Mom, Dad, I'll start working on a PhD...
as soon as I figure out what I want it in.
Promise!
Contents
1.0 Introduction
1.1 Paroxysmal Atrial Fibrillation
1.2 Prior Works
1.3 Specific Relevant Techniques
1.4 Statement of Goals
2.0 Theory
2.1 Electrophysiology
2.1.1 Electro-Cardiograms
2.1.2 Inter-beat Intervals
2.1.3 Spectrograms
2.2 Pattern Recognition
2.2.1 Hidden Markov Modeling
3.0 Methods
3.1 Toolkit
3.2 Raw Data
3.3 Terminologies
3.4 Techniques
3.4.1 Prematurity Weighting
3.4.2 Discrete HMMs on Entropy
3.4.3 Gaussian-mixture HMMs on IBIs
3.4.4 Gaussian-mixture HMMs on Spectrogram
3.5 Training and Testing
3.5.1 Event 1 Screening
3.5.2 Event 2 Detection
3.5.3 Event 3 Prediction
4.0 Results
4.1 Prematurity Weighting
4.2 Discrete HMMs on Entropy
4.3 Gaussian Mixture HMMs on IBIs
4.3.1 Event 1 Screening
4.3.2 Event 2 Detection
4.3.3 Event 3 Prediction
4.4 Gaussian Mixture HMMs on Spectrogram
5.0 Discussion
5.1 Event 1 Screening
5.2 Event 2 Detection
5.3 Event 3 Prediction
5.4 Entropy Failure Speculation
6.0 Conclusion
7.0 Recommended Future Works
Appendix A: Complete HMM Results
Appendix B: Spectrograms (with 20 frequency bins)
Appendix C: Spectrograms (with 50 frequency bins)
References
List of Figures
Figure 1 - Exponential weighting of PBs
Figure 2 - Demonstration of exponential weighting scheme
Figure 3 - Electrical anatomy of the human heart
Figure 4 - Comparing spectrograms of a PAF patient, a non-PAF patient, and a healthy individual
Figure 5 - Raw data used in analysis
Figure 6a - Analysis of type-P, N, and Q IBI series histogram
Figure 6b - Analysis of type-P1 & P2 IBI histograms
Figure 7 - Processing of data used in Event 1 screening
Figure 8 - Processing of data used in Event 2 detection
Figure 9 - Processing of data used in Event 3 prediction
Figure 10 - Discretization of entropy series
List of Tables
Table 1 - Wei's "last submission" Results
Table 2 - Prematurity Weighting Test Ranges
Table 3 - Results Overview
Table 4 - Prematurity Weighting Results
Table 5 - Discrete-output HMM on Entropy Results
Table 6 - Gaussian-mix HMMs on IBIs, Event 1
Table 7 - Gaussian-mix HMMs on IBIs, Event 2
Table 8 - Gaussian-mix HMMs on IBIs, Event 3
Table 9 - Gaussian-mix HMMs on Spectrograms, Event 1 (freq res 50)
Table 10 - Gaussian-mix HMMs on Spectrograms, Event 1 (freq res 20)
Table 11 - Gaussian-mix HMMs on Spectrograms, Event 2
Table 12 - Gaussian-mix HMMs on Spectrograms, Event 3
1.0 Introduction
1.1 Paroxysmal Atrial Fibrillation
Cardiac arrhythmic disorders have been known for over a hundred years, and
atrial fibrillation (AF) in particular has now been recognized as the most common of all
arrhythmias. Though comprehensive statistics are not available, best estimates indicate
that it is probably present in more than 1% of the population. AF is estimated to be
present in 4-6% of all people above the age of 65, but is also present in young and
middle-aged individuals [Stefaneli et al., 2002]. AF is associated with a high mortality rate,
largely due to stroke and congestive heart failure [Wyse et al., 2001].
It is found
incidentally in about 25% of all stroke admissions and has been shown to lead to poor
control of the ventricular rate.
In coronary bypass patients, AF may represent risk for
immediate post-operative mortality as well as prolonged hospitalization.
Usually AF is
associated with certain symptoms: chest discomfort, fatigue, dizziness, palpitations,
dyspnea, and syncope [Savelieva et al., 2000].
Interestingly, a significant portion of
patients diagnosed with the disorder suffer from no obvious symptoms, and are only
diagnosed incidentally during physical examinations, pre-operative assessments, or
population surveys. Even in patients that report symptomatic episodes, Holter recording
and transtelephonic recording have demonstrated a rate of asymptomatic episodes that
exceeds symptomatic episodes by more than twelve-fold.
Given that asymptomatic
episodes probably have the same health risk, a significant portion of the aging population
could potentially benefit from early detection and treatment.
Paroxysmal (sudden attack) AF is a progressive form of atrial fibrillation that could
lead to permanent AF or other cardiovascular disorders. Cases of paroxysmal AF pose an
additional problem for detection, as episodes may be intense but short periods that could
otherwise go unnoticed. To record these events as they occur, Holter devices are usually
used to record 24-hour ECG data, which are then analyzed for signs of arrhythmic disorder.
Treatment for these patients could include pharmacological suppression [Jais et al.,
2000], high-frequency pacing (HFP), or RF ablation. The primary therapeutic goal is the
restoration and maintenance of normal sinus rhythm (NSR), which leads to optimized
cardiac functions [Luederitz et al., 2000]. Recent medical technology advances include
implantable atrial defibrillators, which apply synchronized shock therapy upon
detection of atrial fibrillation.
In practice, detection of atrial fibrillation by atrial defibrillators is a well known
problem.
Patients that undergo the implantation procedures are known to have atrial
fibrillation, as determined through screening using Holter recording.
Detection by
implanted devices is also easier since the contacts used are in close contact with the
cardiac tissue. Different defibrillators might use different detection procedures.
One example is the detection system developed by Swerdlow et al. [Swerdlow et al., 2000].
This system classified cardiac patterns as atrial arrhythmic by tracking P-waves (pattern
matching ECG recording with idealized P-waves), and reported proper detection in 88%
of 190 AT (atrial tachycardia) episodes, and 98% of 132 AF episodes. Adjustment of
sensitivity to P-wave amplitude accounts for the tradeoff between false positives from
far-field R-waves and false negatives from missed P-waves. Note again that attainment of
such high detection levels is due in large part to the lack of signal distortion that results
from implantation. Surface ECGs (which are used in this particular study) demonstrate
much greater variability in performance due to variability in noise inputs. Further
considerations in practical detection of AF include distinguishing between AF and
ventricular fibrillation, which is the focus of dual-chamber detection algorithms.
Various research projects have focused on prevention of atrial fibrillation,
including a new approach based upon the genetic basis of the disease and curing through
gene therapy. As part of the Human Genome Project, researchers have identified genes
involved in diseases such as cardiomyopathy, Long QT syndrome, and atrial fibrillation.
The scrutiny in this area could go a long way towards explaining AF amongst the young
and middle aged, where no cause could be found in most cases. There is also potential
for helping members of around 100 families around the world which have been identified
with a familial form of the disease [Brugada et al., 1999]. Unfortunately, gene therapy
techniques have yet to be fully developed, and prior genetic knowledge of susceptibility
towards AF does not aid in predicting when exactly episodes will occur.
On a more traditional track, researchers have also looked at AF prevention
through permanent pacing or periodic pacing. The theory of this approach is that pacing
may suppress triggers of AF such as atrial premature beats, or perhaps reduce atrial
stretch which appears to predispose to AF. It has also been suggested that atrial pacing
may increase the benefits of antiarrhythmic drugs. To date, only small scale studies have
been done to indicate that atrial pacing may maintain NSR in patients with paroxysmal
AF as the primary or sole arrhythmia.
In such cases, interventions were applied after
detection of abnormally high resting parasympathetic tone, or periods of relative
bradycardia. The results have been mixed, with some patients remaining free from AF,
others requiring additional anti-arrhythmic medication, and a few developing permanent
AF after pacemaker implantation [Sopher et al., 2000].
The focus of current treatment has been post-onset intervention, with an emphasis
on specificity so as to prevent false positive detection [Schwartz et al., 2000].
An
additional area that deserves more attention is in the development of a reliable PAF
prediction scheme.
If an accurate and reliable system of heart attack prediction could be
developed, it would be a major breakthrough in healthcare efficiency, allowing for
interventions that prevent episodes that would otherwise result in significant tissue
damage. A prediction scheme that is both sensitive and specific to AF would not only
serve to initiate intervention, but also give a reliable measure of the intervention's
effectiveness.
1.2 Prior Works
Much of the prior work studied and cited in this study has been derived from
PhysioNet.
PhysioNet is a public service of the Research Resource for Complex
Physiologic Signals, funded by the National Center for Research Resources of the
National Institutes of Health.
It offers free access via the web to large collections of
recorded physiologic signals and related open-source software, and was a strong
motivating factor in directing research in cardiology.
PhysioNet organizes the annual Computers in Cardiology competitions, with the
goal to stimulate effort and advance the state of the art in facing a single clinically
significant problem, and to foster both friendly competition and wide-ranging
collaborations.
In the 2001 CinC competition, the challenge was to develop a fully
automated method to predict the onset of paroxysmal atrial fibrillation/flutter (PAF),
based on the ECG prior to the event.
The top scorers in the 2001 challenge were announced during the 25 September
plenary session of Computers in Cardiology in Rotterdam. The top score and the award in
PAF screening was obtained by Gunther Schreier and colleagues of the Austrian
Research Centers Seibersdorf (Graz, Austria), with a predictive accuracy [(TruePos +
TrueNeg) / all] of 82%. In PAF prediction, the top score was obtained by Wei Zong and
colleagues at the Harvard-MIT Division of Health Sciences and Technology (Cambridge,
Massachusetts, USA), with a sensitivity [TruePos / (TruePos + FalseNeg)] of 79%.
The CinC challenge serves as one of the benchmarks for the effectiveness of the
algorithm proposed in this paper. However, it is important to note the scoring metric
used in the Prediction part.
The score was not calculated based on the predictive
accuracy as a whole, but instead on the number of consecutive ECG pairs from PAF
patients that were correctly labeled as preceding or distal to an episode. The submitted
labels for non-PAF ECGs were not used in scoring at all. Thus, the scoring on Prediction
entirely focused on sensitivity, and was therefore biased toward schemes that considered
each ECG pair to be from a PAF patient. More detailed description of the CinC test set
will be covered in Section 3.2. Also missing from the data provided were continuation
sets from the ECGs in the CinC test set, limiting the amount of data available for the
purposes of PAF detection.
1.3 Specific Relevant Techniques
The technique used in the competition by Wei Zong and colleagues at MIT [Zong
et al., 2001] was especially insightful. They noted that the number and timing of atrial
premature beats (APBs) appeared to be "of significant value in terms of predicting
imminent PAF episodes". In their analysis, they used an exponential weighting system
that assigned a non-equal weighting value for each APB in an immediately preceding
interval (Fig 1), with higher weighting given to APBs that occurred a short time prior.
Figure 1 - Exponential weighting of APBs, figure taken from Wei et al., 2001
As can be seen, the parameters used in determining the weighting and classification
include the APB-threshold, window length (w), and exponential rate (tau).
The APB-threshold determines how much shorter than the time-averaged IBI length an
IBI must be for the beat to be defined as an APB.
The time-averaged IBI length (RRavg) at time n is derived by the formula:
RRavg(n) = 0.9 * RRavg(n-1) + 0.1 * RR(n). For example, if the APB-threshold is 20%,
and the RRavg at the time is 1 sec, any immediately following beat with an IBI shorter
than 0.8 seconds would be designated an APB. Note: strictly speaking, this would
indicate a premature beat, not necessarily an atrial premature beat. Presumably,
Wei's beat detector ignored other types of premature beats (ventricular, junctional, etc.).
For the purposes of this paper, these cardiac events will be referred to as Premature
Beats, or PBs. The window length determines how far back in time to look for APBs.
The exponential rate determines the relative weighting of APBs closer to the current time,
as compared to those in the more distant past. Through a simple final PAF-threshold
analysis on the weighted results, they were able to produce a method of discrimination
with high sensitivity and moderately high specificity.
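The prematurity rule and running average described above can be sketched in Python. The running-average update and the threshold test follow the text. The weight exp(-age / tau), the seeding of RRavg with the first interval, the default parameter values, and all function names are assumptions made for illustration, since the exact weighting function appears only in Figure 1.

```python
import math

def detect_premature_beats(ibis, pb_threshold=0.2):
    """Return indices of beats whose IBI is shorter than (1 - pb_threshold) * RRavg."""
    if not ibis:
        return []
    rr_avg = ibis[0]  # assumption: seed the running average with the first IBI
    premature = []
    for n in range(1, len(ibis)):
        # Compare against the average available before this beat is folded in.
        if ibis[n] < (1.0 - pb_threshold) * rr_avg:
            premature.append(n)
        # Running average from the text: RRavg(n) = 0.9 * RRavg(n-1) + 0.1 * RR(n)
        rr_avg = 0.9 * rr_avg + 0.1 * ibis[n]
    return premature

def weighted_pb_score(ibis, window=60.0, tau=20.0, pb_threshold=0.2):
    """Sum exponentially decaying weights of PBs within `window` seconds of the end."""
    beat_times = []
    t = 0.0
    for ibi in ibis:
        t += ibi
        beat_times.append(t)
    end = beat_times[-1] if beat_times else 0.0
    score = 0.0
    for n in detect_premature_beats(ibis, pb_threshold):
        age = end - beat_times[n]  # seconds between the PB and the end of the record
        if age <= window:
            # assumed weighting: recent PBs count more than distant ones
            score += math.exp(-age / tau)
    return score
```

A final classification step would compare the returned score against a PAF-threshold; that threshold value is not given in the text and would have to be tuned on training data.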
Figure 2 - Demonstration of weighting scheme. Note that the ECG that generated the
lower event log would be assigned a higher probability of having PAF under this
weighting scheme. Figure taken from Wei et al., 2001
Their utilization of weighted PB count was the original inspiration for this paper,
which focuses on inter-beat intervals, the time between successive heart beats,
which is significantly shorter in the instance of a PB. With Wei's permission, his "last
submission" answers were analyzed and are presented (Table 1). The results are
presented in terms of specificity (true-negatives / [true-negatives + false-positives]),
sensitivity (true-positives / [true-positives + false-negatives]), and predictive accuracy
([true-positives + true-negatives] / [all results]).
Wei's Results  TruePos  TrueNeg  FalsePos  FalseNeg  Sens.  Spec.  Pred.Acc
Screening*     22       14       8         6         79%    63%    72%
Prediction     22       44       28        6         79%    61%    66%
Table 1 - Wei's "last submission" record, not his best result, which was 79% Predictive Accuracy
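As a check on the definitions given above, the three measures can be recomputed from the counts in Table 1. This is a minimal sketch; the function name and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def screening_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, predictive accuracy) as fractions."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return sensitivity, specificity, accuracy

# Counts read from Table 1, in the order (TruePos, TrueNeg, FalsePos, FalseNeg)
table_1 = {"Screening": (22, 14, 8, 6), "Prediction": (22, 44, 28, 6)}

for name, counts in table_1.items():
    sens, spec, acc = screening_metrics(*counts)
    print(f"{name}: sens={sens:.1%} spec={spec:.1%} acc={acc:.1%}")
```

The recomputed values agree with the tabulated percentages to within rounding (for example, the screening specificity is 14/22, about 63.6%).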
For the CinC competition, where the test data sets were pairs of ECGs, the
algorithm described above was easily adapted. For Screening, the weighted value was
computed for each ECG, and the maximum value between each pair was used. For
Prediction, the weighted value for each ECG could be computed, and the ECG with the
higher value is designated as pre-PAF.
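The pair-wise adaptation just described reduces to two small decision rules, sketched here in Python. The function names, the `paf_threshold` parameter, the use of `>=` for the threshold test, and the handling of ties are assumptions for illustration; the text specifies only that the maximum of the pair is used for Screening and the larger value marks the pre-PAF record for Prediction.

```python
def screen_pair(score_a, score_b, paf_threshold):
    """Screening: flag the pair as PAF if the larger weighted value reaches the threshold."""
    return max(score_a, score_b) >= paf_threshold

def predict_pair(score_a, score_b):
    """Prediction: return which record ('A' or 'B') is designated pre-PAF.

    Assumption: ties are resolved in favor of record 'B'.
    """
    return "A" if score_a > score_b else "B"
```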
In fact, Wei also noticed the bias towards sensitivity in the competition for
prediction. He did not utilize any screening for non-PAF patients. Instead, he achieved a
high score simply by scoring each ECG pair based on the ECG with the higher APB count.
This approach, unfortunately, would not work in real world scenarios.
1.4 Goals
This study has multiple goals. We wish to develop a system that does a good job
with screening, detection, and prediction. To determine how well the system achieves
each of these goals, we set up a series of test events to demonstrate its viability. Here, we
will state the goals, and then outline the testing procedures in Section 3.3.
Test Event 1: PAF screening
Event 1 is to determine if subjects at risk of PAF can be distinguished from those
representing a larger population, based on their ECGs. The test will involve
comparing those ECGs in Group P (PAF) against those in Group N (non-PAF,
with other disease), as well as those in Group Q (normal).
Test Event 2: PAF detection
Event 2 is to determine if subjects identified as at risk of PAF could be reliably
identified as currently suffering a PAF attack or not. The test will involve those
ECGs in Group P-1c (no-current-attack PAF) against those in Group
P-2c (current-attack PAF), as well as against those in Groups N-c and Q-c.
Test Event 3: PAF prediction
Event 3 is intended to determine if subjects in Group P have distinctive and
detectable changes in their ECGs immediately before PAF. (In other words, is the
imminent onset of PAF predictable in an individual known to be at risk of PAF?)
A successful method for doing so should be able to determine for those in Group
P-2c the detectable "precursors" for those about to suffer an attack.
2.0 Theory
2.1 Electrophysiology
The normal functioning of the cardiovascular system depends in large part on its
electrical activity. The heart as a whole is controlled by the coordinated electrical
activity of its cells. This control must be communicated between what are called
pacemaker cells of the heart, both to determine the rate of beating, and to influence the
strength of contraction. Disturbances of the pacemaker communications disrupt the heart's
normal ability to generate and conduct electrical signals. These disturbances give rise to
abnormal electrical activity, which frequently leads to cardiac malfunction and death.
2.1.1 Electro-Cardiograms
To monitor the electrical behavior of the heart, cardiologists use a system called
Electro-Cardiograms, or ECGs for short. It is at its most basic a system of electrodes
used to measure electrical conduction on the skin. An ECG could be taken with as few as
two "leads", which are wires attached to the skin. By convention though, leads are
positioned on the body in such a way that six vertical and six horizontal "views" of
electrical activity could be monitored and presented.
Figure 3 - Electrical Anatomy of the Human Heart. Note the correspondence between
electrical stimulation and ECG readouts.
In general, different phases of the ECGs correspond to cardiac excitation in different
areas of the heart. The initial structure observed for a given heartbeat is the P wave,
which results from the spread of excitation through the atria from the SA node. During
the P-R interval that follows, the atria contract to expel blood into the ventricles as the
electrical impulse propagates down to the AV node. A sharp spike called the QRS
complex is observed in the ECG at the end of atrial contraction, followed immediately by
a Q-T interval during which ventricles contract. The last structure observed is normally
the T wave, which results from the repolarization of the ventricles.
A full ECG data stream usually consists of several leads containing electrical levels
sampled at around 128 Hz. The resulting information is also subject to noise, both from
muscular distortions and problems with the wire attachment. Problems would then arise
from analyzing the noisy data.
2.1.2 Inter-beat Intervals
The different analysis techniques presented in this study focus on inter-beat
intervals (IBIs), which are the measured time intervals between QRS complexes.
In
physiological terms, IBIs correspond to the time between initiation of consecutive
ventricular contractions.
Part of the advantage of IBIs is also their relative detection
reliability in the presence of various noise sources.
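Given the times at which QRS complexes are detected, the IBI series is simply the sequence of successive differences. The following Python sketch illustrates this; the function names are hypothetical, and the input is assumed to be a sorted list of beat times in seconds. The instantaneous heart rate conversion (60 divided by the IBI) is the standard definition rather than something stated in the text.

```python
def inter_beat_intervals(qrs_times):
    """IBIs in seconds from a sorted list of QRS detection times in seconds."""
    return [later - earlier for earlier, later in zip(qrs_times, qrs_times[1:])]

def instantaneous_heart_rate(ibis):
    """Instantaneous heart rate in beats per minute, one value per IBI."""
    return [60.0 / ibi for ibi in ibis]
```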
The physiological significance of Inter-beat Intervals has been established in
studies of multifocal triggering and irregular nodal conduction. Interatrial septum pacing
in particular [Padeletti et al., 2000] has also been demonstrated to be a safe and feasible
technique for reduction of arrhythmia in general as well as associated mortality.
2.1.3 Spectrograms
Efforts were also made to study additional features of the data, such as the
spectrogram and its entropy. Since the entropy is a scalar measure of the spectrum, a
good spectrum estimation technique was needed.
In bio-medical applications, the signals are non-stationary,
and therefore require non-standard spectrum
estimation techniques. To this end, Yuan Qi's algorithm [Qi et al., 2002] was used, with
results to be mentioned in Section 4.2.
The spectrogram could be thought of as an application of Short-Time Fourier
Transforms, with the frequencies on the y-axis and the time segments on the x-axis. The
power in each frequency at any particular time instant is represented by the amplitudes of
the vertical cross-sections at that time instant. In standard literature, the power spectrum
of heart rate for a normal person could be found represented by three frequency bands.
The low-frequency, medium-frequency, and high-frequency bands range from 0 to 0.5
Hz, and have been linked to various sympathetic and parasympathetic control
mechanisms. Due to the complex nature of biological controls and the natural variability
between individuals and physiological states, it is yet unclear how competing influences
could be segregated.
In the case of patho-electrocardiology in particular, it is unclear
how the frequencies of activity are influenced by the myriad of ectopic activities
associated with arrhythmias, each of which would introduce variability to instantaneous
heart rate (IHR) measurements. For example, the parasympathetic controls might detect the
sudden acceleration in SA nodal activity and act to suppress heart rate as a whole.
Another possibility would be the atrial irregularities resulting in insufficient atrial filling,
triggering the baro-receptor reflex to increase heart rate to compensate.
It is hard to
predict which effect would dominate.
On the spectrogram, heart rate irregularity due to any input would appear as
contributions to all frequency bands, or vertical streaks (Figure 4).
In order to take
advantage of the spectrogram features, a tradeoff would need to be made between time
resolution and frequency resolution.
Time resolution would be favored to better
determine the duration of ectopic behavior. On the other hand, to better examine the
changes in power spectrum preceding and following ectopic events, greater frequency
resolution would be necessary regardless of actual control influences (sympathetic,
parasympathetic, etc).
Entropy (defined as E[-log p(x)]) is a relative measure of variability present in the
power spectrum in the sense that it is maximal for a uniform distribution. In the case of
ectopic beats, one would expect high entropy values corresponding to times of episodic
occurrences.
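The entropy of one spectrogram column can be sketched as follows in Python. Treating the power spectrum as a probability distribution by normalizing it to sum to one, and using the natural logarithm, are assumptions made here for illustration; the text gives only the definition E[-log p(x)]. The function name is hypothetical.

```python
import math

def spectral_entropy(power):
    """Shannon entropy (in nats) of a power spectrum normalized to sum to 1."""
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total power")
    probs = [p / total for p in power]
    # Bins with zero power contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in probs if p > 0)
```

With this definition, a flat spectrum over K bins gives the maximum value log(K), while a spectrum concentrated in a single bin gives zero, which matches the statement that entropy is maximal for a uniform distribution.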
Figure 4 - Spectrograms of a PAF patient (pre-episode, type P-2, top), a non-PAF
patient (type N, middle), and a healthy individual (type Q). All favor time resolution;
note the vertical streaks.
2.2 Pattern Recognition
Pattern recognition is a field related to signal processing and artificial intelligence.
It is the general name given to a diverse set of techniques with
which one could classify and model signals. In general, there are two types of signal
models which are utilized, known as deterministic models and statistical models.
Deterministic models generally exploit some known properties of the signal, whereas
statistical models try to characterize the signal as a parametric random process.
2.2.1 Hidden Markov Models
Hidden Markov Models are a particular form of statistical models. In this specific
application, the HMM will be used to model the observations (features of time series of
IBIs) corresponding to the hidden physiological states (degree of PAF-behavior).
Compared with observable Markov models, HMMs have the advantage of being widely
applicable and could be applied with few presumptions about the underlying hidden
states. Every HMM has a set of parameters, which could be trained to better account for
certain sets of observed behavior. In the toolkit by Kevin Murphy (Section 3.1), the main
parameters are Q (number of states in the model), M (number of Gaussian mixtures), O
(number of possible output observations), and "diagonal" or "full" (for covariance
matrices).
Other important aspects of setting up the problem, such as initialization of
states, state transition matrices, and observation probability distribution in states are
handled by the toolbox.
Given a set of initial state parameters, an HMM could be
"trained" on sets of observation sequences to better describe the evolution of observations
(the number of times the HMM loops over each piece of training data is yet another
parameter).
After training HMMs on different sets of observations, we could then
evaluate different observation sequences using the HMMs.
Each HMM would yield the
probability that that particular observation sequence arose from its model, and the HMM
that returns the highest probability is the one that best fits the observations.
Two particular types of HMMs used in this study are the discrete-output HMMs and
Gaussian-mixture HMMs.
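The model-selection step described above (score a sequence under each trained HMM and keep the best) can be illustrated with a scaled forward algorithm for discrete-output HMMs. This Python sketch is not the MATLAB toolbox used in the study; the model parameters are assumed to be given rather than trained, and the function names and the `(prior, trans, emit)` tuple layout are hypothetical.

```python
import math

def log_likelihood(obs, prior, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-output HMM.

    prior[i]    - probability of starting in state i
    trans[i][j] - probability of moving from state i to state j
    emit[i][k]  - probability of emitting symbol k while in state i
    """
    n_states = len(prior)
    alpha = [prior[i] * emit[i][obs[0]] for i in range(n_states)]
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        log_prob += math.log(scale)
    return log_prob

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood for obs."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))
```

For example, with one HMM representing PAF records and another representing non-PAF records, `classify(sequence, {"PAF": paf_model, "non-PAF": other_model})` would return the label of the better-fitting model.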
Discrete-output HMMs model events with discrete observations.
The range of
observations fit a finite "alphabet", with the drawback that quantization of observations
could lead to loss of useful information. Gaussian-mixture HMMs are applied towards
modeling of outputs better characterized as samples out of a continuous distribution. By
adjusting the number of states and the number of Gaussian-mixtures, the model could
account for a wide range of observations. One of the parameters on the Gaussian-mixture
HMMs is "left-right", which allows for modeling using a particular form of state
transitions.
If set to "left-right", then the context would be a series of states where
transition is forbidden to any state with lower indices. In terms of physiology, this could
be used to characterize progressive changes over time.
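The "left-right" constraint amounts to an upper-triangular state transition matrix. The Python sketch below builds one such matrix; how the toolbox actually initializes these probabilities is not stated in the text, so the even split of forward probability, the `stay_prob` parameter, and the function name are assumptions for illustration.

```python
def left_right_transitions(n_states, stay_prob=0.5):
    """Build a transition matrix with no transitions to lower-indexed states."""
    trans = []
    for i in range(n_states):
        row = [0.0] * n_states
        n_forward = n_states - 1 - i
        if n_forward == 0:
            row[i] = 1.0  # the last state can only transition to itself
        else:
            row[i] = stay_prob
            for j in range(i + 1, n_states):
                # assumption: spread the remaining mass evenly over later states
                row[j] = (1.0 - stay_prob) / n_forward
        trans.append(row)
    return trans
```

Because entries below the diagonal are zero, training with the usual re-estimation formulas keeps them at zero, so the model can only progress forward through its states.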
3.0 Methods
Throughout the period of this research project, several public domain toolkits were
utilized to aid in the analysis of data. Each has proven to be quite useful, and will be
briefly presented in Section 3.1. The availability of these tools significantly helped in
speeding up the analysis. A description of the raw data is presented in Section 3.2. The
way the raw data was processed and organized, as well as how techniques were applied to
training and testing, is presented in Section 3.3.
3.1 Toolkits
WAVE is an extensible interactive graphical environment for manipulating sets of
digitized signals with optional annotations. Designed for workstations with the open
source XView toolkit, WAVE was built using the WFDB library developed for
physiologic signal processing, so it could be applied to any of a wide variety of data
formats supported by the WFDB library. In addition, a beta version of WAVE
re-implemented using the Gimp Tool Kit (GTK), which was portable to MS Windows, was
also used. The current release of GTKWave is still in development and should be
considered usable but potentially unstable. WAVE was used in pre-processing the ECGs
to extract the IBI information.
In addition to WAVE, the Hidden Markov Model (HMM) Toolbox for Matlab
written and provided by Kevin Murphy was also used extensively. The functions were
easy-to-use m-files that allowed for models ranging from discrete-output HMMs to
Gaussian-mixture HMMs. Both modeling techniques were explored in the course of the
analysis.
3.2 Raw Data
Figure 5 - Raw data. Note that all PAF episodes (if any) were only present in continuation sets.
Group P: 30-minute ECGs, PAF, 25 sets type P-1, 25 sets type P-2
P-cont.: 5-minute, 50 sets
Group N: 30-minute ECGs, non-PAF, possibly other cardiac disorders, 50 sets
N-cont.: 5-minute, 50 sets
Group Q
CinC test set: 30-minute ECGs, mixture of type P's and type N's, 100 sets total
The entirety of the data consists of 4 groups (Figure 5), organized according to the types of patients. Group P consists of data from patients with PAF. Group N contains data from patients with no PAF, but perhaps some other cardiovascular disease. Group Q consists of data from the MIT-BIH Normal Sinus Rhythm Database, recorded from patients with no detectable arrhythmias. Finally, a test set that was originally used for the CinC challenge was incorporated to study the relative effectiveness of the technique developed in this study. Each record contains 30 minutes' worth of ECG recordings. Each of the records in Group P and Group N has an additional corresponding 5-minute "continuation" record. A more detailed description is as follows.
The Group P records contain 25 type P-1 records and 25 type P-2 records. P-2 records contain the ECG immediately preceding an episode of PAF, which can be verified by examining the like-numbered continuation record (type P-2c). Thus, for example, record p16 (a type P-2 record) immediately precedes the episode of PAF in record p16c. The records of type P-1 contain 30 minutes of the ECG during a period that is distant from any episode of PAF (there are no PAF episodes during the 45-minute period before the beginning or after the end of the 30-minute record). The corresponding 5-minute continuation record (type P-1c) shows that (at least!) the minutes immediately following the record do not contain a PAF episode. Since the data was collected with no manual audits, a few of the 30-minute records in this group may contain very short bursts of PAF that escaped notice while the learning set was being compiled.
The 50 records in Group N come from subjects who do not have documented atrial fibrillation, either during the period from which the records were excerpted or at any other time. The subjects include healthy controls, patients referred for long-term ambulatory ECG monitoring, and patients in intensive care units with possibly other cardiac problems. It was interesting to note that during preliminary analysis, the author of this report printed IBI histograms of the presumed "normal" population only to discover obvious signs of arrhythmia (Figure 6). Since the absence of PAF alone does not establish "normal" electrical cardiovascular behavior, additional data not included in the original CinC challenge had to be found from which benchmarks of strictly normal patterns could be inferred.
Due to the discovery of abnormal ECG activity in the Group N patients, further data was sought out online, also from the PhysioNet online database. The data from the MIT-BIH Normal Sinus Rhythm Database were recorded from patients with no detectable arrhythmias. These 18 ECGs are classified in this report as belonging to Group Q, and used as a benchmark for comparison against the presence of abnormalities. Each of the original records in this group represented 20 hours' worth of ECG recording. For the purposes of this study, four sections of recordings were taken from each 20-hour record, and analyzed. The procedure with which the sections were extracted is discussed in Section 3.5.1.
Figure 6a - Type-P (example is actually P-2) IBI series histogram (top), Type-N (middle), Type-Q (bottom), each across seven consecutive 200-IBI segments. The interval lengths are reflected by the x-axis, and the y-axis reflects the number of IBIs of that length. Note that the pattern shows a tri-modal distribution suggestive of consistently irregular heart beats for type-N as well as type-P. A strictly normal IBI distribution would have a histogram consistently similar to that of type-Q, with a single Gaussian distribution and no outliers.
Figure 6b - Closer inspection of type P1 IBI histogram and its continuation P1c IBI histogram, as compared against that of type P2 and P2c. Note that the type P2 histogram contained greater spread, indicative of abnormal beats. Also, there was a noticeable distribution shift between the last 200-IBI histogram of P2 and P2c, whereas the distribution was consistent between P1 and P1c.
The final addition to the collection of data utilized was the test set provided by the
original CinC challenge. This test set consisted of 100 30-minute records, which were
classified either type P-1, P-2, or N. In the original competition, the test set was used to
evaluate relative success in screening and prediction. It is used for the same purpose in
this study. Since no continuation sets are available, no conclusions about detection could
be drawn from this data set.
Note that the ECGs for the CinC test set (only!) were
grouped so that consecutive pairs belonged to a single patient. If the patient was a PAF
patient, then one of the ECGs would be distal to any episodes while the other
immediately precedes an episode.
3.3 Terminologies
In evaluating the results from the different tests, a set of common measures will be used to determine relative success. From the field of health studies, the terms specificity (true-negatives / [true-negatives + false-positives]) and sensitivity (true-positives / [true-positives + false-negatives]) will be used, as well as predictive accuracy or accuracy ([true-positives + true-negatives] / [all results]). From signal processing, the terms detection rate (synonymous with sensitivity) and false-alarm rate (1 - specificity) could also be used to characterize test results.
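As a small illustration of these definitions, the following Python sketch computes all four measures from the counts of a confusion table. The function name and the example counts are illustrative assumptions, not part of the original analysis code.

```python
def screening_metrics(tp, tn, fp, fn):
    """Compute the evaluation measures defined above from confusion counts."""
    sensitivity = tp / (tp + fn)            # also called the detection rate
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "false_alarm_rate": 1.0 - specificity,
    }

# Illustrative counts: 19 true positives, 19 true negatives,
# 3 false positives and 9 false negatives.
metrics = screening_metrics(tp=19, tn=19, fp=3, fn=9)
print({name: round(100 * value) for name, value in metrics.items()})
# prints: {'sensitivity': 68, 'specificity': 86, 'accuracy': 76, 'false_alarm_rate': 14}
```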
3.4 Analysis Techniques
Four different analysis techniques were used in exploring how best to
discriminate between subsets of patients in each of the test event categories.
3.4.1 Prematurity Weighting
The first technique tried was a variant of the exponential weighting of PBs used by Wei Zong. The differences in the analysis were that 1) the IBI extraction in this study was done using WAVE, instead of the specially developed beat type detector used by Wei, and 2) the definition threshold for a PB in this study was not fixed at 15% (as Wei assumed in his analysis), but was allowed to be a test variable. The parameters involved were: Tau, the exponential rate used in the weighting function; W, the length of time the exponential weighting function extends into the past; PB threshold, the ratio of an IBI over the time-averaged IBI length below which the beat would be defined as a PB; and PAF threshold, the cutoff value of the weighted results between those designated as PAF or not. To determine the optimum parameters, the variables were tested over the following ranges (Table 2) in the training set. The optimum parameters determined were subsequently applied to the evaluation and testing sets.
Table 2
Parameter        Initial Value    End Value    Search Increment
Tau              1                10           1/3
W                5                30           1
PB threshold     .75              .95          .01
PAF threshold    20               320          1
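The search ranges in Table 2 can be enumerated as in the following Python sketch. Exact fractions are used so that the 1/3 and .01 increments do not accumulate rounding error. The `search` helper and its `score` argument are hypothetical placeholders for whatever training-set measure is being maximised; they are not taken from the original analysis code.

```python
from fractions import Fraction
from itertools import product

def grid(start, stop, step):
    """Inclusive list of exact rational values from start to stop."""
    start, stop, step = Fraction(start), Fraction(stop), Fraction(step)
    count = int((stop - start) / step) + 1
    return [start + k * step for k in range(count)]

# Ranges copied from Table 2.
TAU = grid(1, 10, Fraction(1, 3))
W = grid(5, 30, 1)
PB_THRESHOLD = grid("0.75", "0.95", "0.01")
PAF_THRESHOLD = grid(20, 320, 1)

def search(score, *grids):
    """Exhaustive search: return the parameter tuple with the highest score."""
    return max(product(*grids), key=lambda params: score(*params))

print(len(TAU), len(W), len(PB_THRESHOLD), len(PAF_THRESHOLD))
# prints: 28 26 21 301
```

With these ranges the full grid holds 28 x 26 x 21 x 301 combinations, so a call such as `search(score, TAU, W, PB_THRESHOLD, PAF_THRESHOLD)` evaluates the score function several million times.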
For the CinC test sets, the screening methodology followed Wei's original approach (Section 1.2), where the maximum weighted value of each ECG pair was taken to be representative of the person's state.
3.4.2 Discrete-output HMMs on Entropy
The second technique studied was the application of discrete-output HMMs on the entropy of the signal. The relevant parameters included both the number of discretized levels and the number of states. The level of discretization was chosen to be 100, which resulted in as little information loss as possible (Fig 10). The number of states was varied between 2 and 15. For the generation of the entropy signal, the inputs were taken as the instantaneous heart rate, the corresponding relative time markers, and a frequency range for the spectrogram. The frequency range was varied between 2, 4, and 6 Hz with no observable changes in analysis results.
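The quantization step needed before a discrete-output HMM can be applied might look like the following Python sketch. The text does not state which quantization rule was used, so uniform binning between the minimum and maximum of the series is an assumption, as are the function name and the toy input.

```python
def discretize(series, levels=100):
    """Map a real-valued series onto integer symbols 0 .. levels-1.

    Assumption: uniform bins spanning the observed minimum and maximum.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)  # a constant series carries a single symbol
    span = hi - lo
    # The maximum value would land on index `levels`, so clamp it into the top bin.
    return [min(int((x - lo) * levels / span), levels - 1) for x in series]

print(discretize([0.0, 1.0, 2.0, 3.0, 4.0], levels=4))  # prints: [0, 1, 2, 3, 3]
```

With 100 levels the bin width is one percent of the observed range, which bounds the quantization error of each sample.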
3.4.3 Gaussian-mixture HMMs on IBIs
The third analysis technique was the one given the most in-depth exploration. Gaussian-output HMMs were trained on classification types and tested in each of the event categories. The relevant parameters were the number of mixtures fitted (M), the number of states in the model (Q), and the type of transition matrix (left-right or not). The maximum number of iterations in training was set to be 5 for the entirety of the study, and the type of covariance matrix was set to 'diag'. The parameters were varied (M between 1 and 5, Q up to 3, and left-right between true and false), with all results that converged recorded in Appendix A. Important results are highlighted in Section 4.3.
3.4.4 Gaussian-mixture HMMs on Spectrograms
Finally, a last analysis technique was attempted, also using Gaussian-output HMMs. Instead of IBIs, the models were trained and tested on spectrograms of the heart rate signal. The same parameters apply as in the above subsection, except that the type of covariance matrix was set to be 'full'. Using covariance matrix type 'diag' resulted in numerical errors. Also of concern was the need to discard startup noise from the spectrogram estimation. To accomplish this, the algorithm was run with the full instantaneous heart rate input for the 30-minute ECG and the 5-minute continuation ECG appended together. The last 1400 samples of each output series were taken to be noise-free.
3.5 Training and Testing
For each of the tests, the raw data is processed (as will be detailed in the following sub-sections) and divided into training, evaluation, and testing sets. Except for the CinC Challenge test set, which provides the metric of comparison, all other data series are randomly divided into the three mentioned dataset types (training, evaluation, and testing) with a probability ratio of 1:1:1 (so each data series has an equal probability of being assigned to any one of the 3 groups). The three groups (which would on average contain roughly the same number of datasets) could then be used to train, evaluate, and test the algorithm, respectively. Since it is not known if any one person contributed more than one data set, we cannot say with certainty that the people in the training set were different from those in the evaluation and test sets. In the case of HMMs, the optimum parameters are experimentally determined using the training and evaluation sets. The algorithm would be applied to the test set and the CinC set only once using the determined optimum parameters. For the Prematurity Weighting method, the optimum parameters were determined on the training sets, and tested exactly once each on the evaluation and testing sets. This structure minimizes bias in test results.
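The random 1:1:1 assignment described above can be sketched in Python as follows. Each record is assigned independently, so the three sets are equal in size only on average. The function name, the split labels, and the record identifiers are illustrative assumptions.

```python
import random

def assign_splits(record_ids, seed=None):
    """Assign every record to train, eval or test with equal probability."""
    rng = random.Random(seed)
    splits = {"train": [], "eval": [], "test": []}
    for record_id in record_ids:
        # Independent draw per record: set sizes match only in expectation.
        splits[rng.choice(("train", "eval", "test"))].append(record_id)
    return splits

# Hypothetical identifiers for 50 records.
records = [f"p{number:02d}" for number in range(1, 51)]
splits = assign_splits(records, seed=0)
print({name: len(members) for name, members in splits.items()})
```

Fixing the seed makes a particular assignment reproducible; leaving it as `None` gives a fresh random assignment on every run.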
3.5.1 Event 1 : Screening
The training and testing were arranged differently for each Event. Event 1 (PAF screening) in particular was based on comparing results aimed at discriminating between Groups P, N, and Q, rather than focusing on Group P as in Events 2 & 3 (PAF detection & prediction, respectively).
The raw ECG data corresponding to the various patient groups were compiled and processed using the ihr, or instantaneous heart rate, function provided by WAVE. The resulting data is then inverted to derive 30 minutes' worth of inter-beat intervals for each record. All data are then loaded into Matlab for processing and analysis.
Figure 7 - Event 1 screening. Each data series now contains a uniform number of IBIs.
Group P: 1200 IBIs, PAF, 25 sets type P-1, 25 sets type P-2
Group N: 1200 IBIs, non-PAF, possibly other cardiac disorders, 50 sets
Group Q: 1200 IBIs, certified as non-arrhythmic, 72 sets
CinC test set: 1200 IBIs, mixture of type P and type N ECGs, 100 sets
Further pre-processing is done by reducing each record to a single 1200-IBI segment (with extra IBIs dropped from the beginning of the 30 minutes). Since each record reflects a patient with variable heart rate, every fixed-time interval segment would likely contain a different number of IBIs. By establishing a fixed number of IBIs per segment, this discrepancy between segments could be eliminated, while still capturing the sequence of heart beats that led up to a possible cardiac episode. The IBIs discarded from the beginning of the time series are known to be distal to PAF episodes, and assumed to be less relevant to PAF observation. The segment length (in other words, the number of IBIs included) was chosen to be 1200 because the number of IBIs contained in the different 30-minute time series of Groups P & N ranged from 1331 to 3379. In other words, the time series with the slowest average heart rate contained 1331 detected beats, and 1200 is an arbitrarily chosen segment length that all ECG time series could satisfy.
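A minimal Python sketch of this pre-processing step is given below. It assumes the instantaneous heart rate is expressed in beats per minute, so that an inter-beat interval in seconds is 60 divided by the rate. The function names are illustrative, and returning a too-short series unchanged is likewise an assumption rather than a documented choice.

```python
def ihr_to_ibi(ihr_bpm):
    """Invert instantaneous heart rate (assumed beats per minute) to IBIs in seconds."""
    return [60.0 / rate for rate in ihr_bpm if rate > 0]

def last_n_ibis(ibis, n=1200):
    """Keep the final n IBIs, dropping the extra ones from the beginning."""
    if len(ibis) <= n:
        return list(ibis)  # assumption: a short series is kept whole
    return ibis[-n:]

# A constant 80 beats per minute over 1331 beats gives IBIs of 0.75 s.
ibis = ihr_to_ibi([80.0] * 1331)
segment = last_n_ibis(ibis)
print(len(segment), segment[0])  # prints: 1200 0.75
```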
The single exception to the 1200 IBI rule was one of the CinC test set series, which contained only 1142 IBIs over 30 minutes. The discrepancy is minor, and individual performance tracking of that specific data series did not indicate strong biases in results.
The training and testing across techniques are kept consistent to allow
comparison. Prematurity weighting was the first technique attempted. It serves as the
benchmark for later comparison with other methods.
The algorithm was partially
reproduced from the original paper (as cited in Section 1.3) and tested on the CinC test
dataset, so as to verify the results originally reported.
The technique using discrete-output HMMs on entropy was implemented using the following procedure. Two separate HMMs were trained on the entropy estimates for Groups P & N (Fig 10), and then tested on the CinC test set. The data series used in this part of the analysis were derived from the full 30-minute ECGs using Yuan Qi's algorithm. The analysis was repeated using the 1200 IBI segments with no noticeable difference in results. Given the surprising lack of obvious features and results, the technique was only applied to Event 1 Screening. Attempts were made to probe into the reason behind the failure.
The technique using Gaussian-mixture HMMs on IBIs was more involved, with data from application to all 3 Events, and the promising results that were derived prompted further exploration. Group P and N data, as mentioned in Section 3.2, were each organized as 50 records, each reduced to the final 1200 IBIs of the original 30-minute record (the segment immediately preceding PAF in the case of type P-2). Group Q, which was originally eighteen 20-hour ECG recordings, was converted into IBI time series using the ihr function mentioned earlier. The first 8000 IBIs in each type-Q data series were separated into 4 consecutive 2000 IBI segments, giving 4 sets of eighteen segments. Each 2000 IBI segment was turned into a 1200 IBI segment by truncating the last 800 IBIs of the segment. The end result was the 72 sets of 1200 IBI segments, as shown in Figure 7 above. Over the course of the IBI analysis, the maximum iteration parameter was kept at 5, and the covariance matrix type set to be diagonal.
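The Group Q segmentation just described can be sketched in Python as follows: the first 8000 IBIs of each record are cut into four consecutive 2000-IBI blocks, and each block is truncated to its first 1200 IBIs. The function name and the synthetic input are illustrative assumptions.

```python
def q_segments(ibis, n_blocks=4, block_len=2000, keep=1200):
    """Cut the first n_blocks * block_len IBIs into blocks and keep the start of each."""
    if len(ibis) < n_blocks * block_len:
        raise ValueError("record is too short for the requested segmentation")
    return [
        ibis[k * block_len : k * block_len + keep]  # drops the last 800 of each block
        for k in range(n_blocks)
    ]

# Eighteen synthetic records of 9000 IBIs each stand in for the 20-hour recordings.
records = [[0.8] * 9000 for _ in range(18)]
segments = [segment for record in records for segment in q_segments(record)]
print(len(segments), len(segments[0]))  # prints: 72 1200
```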
Finally, the Event 1 testing was attempted using Gaussian-mixture HMMs trained on spectrograms directly. This approach builds on the use of entropy as a discriminatory measure, looking beyond variability in the power spectrum, and modeling instead the appearance of ectopic beats and possible shifts in rhythmic control frequencies. For this part, the covariance matrix type was set to be full, since the datasets modeled are larger, with 20 frequency bins per time sample for the purposes of this study (so each set of input observations was a 20x1200 array). A diagonal covariance matrix was attempted but generated too many numerical errors for the results to be useful.
3.5.2 Event 2 : Detection
For Event 2 (PAF detection), processing similar to that described for Event 1 was applied to each of the 5-minute continuation sets, instead of the 30-minute sets (Fig 8). The resulting IBIs are collected into a single 200 IBI segment each (with extra IBIs dropped from the end of the 5 minutes). The IBIs at the beginning are of more interest, since the data specification as introduced in the CinC competition stated that the PAF episode (when present) would be at the beginning of the 5-minute segment that has been marked off. A visual presentation of the processed data is given below in Figure 8. Since the CinC test sets did not include continuation sets, they could not be used in testing for Event 2.
Note that one of the continuation sets of type N contained only 60 IBIs over 5 minutes. Checking the original ECG showed that the record flat-lined about a minute into recording, possibly due to lead detachment.
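For the continuation records the truncation runs the other way: the first 200 IBIs are kept and the remainder is dropped from the end. A minimal Python sketch follows; the function name and the flag for short records (such as the 60-IBI record noted above) are illustrative assumptions.

```python
def continuation_segment(ibis, n=200):
    """Keep the first n IBIs and report whether the record was long enough."""
    segment = ibis[:n]
    return segment, len(segment) == n

segment, complete = continuation_segment([0.8] * 350)
print(len(segment), complete)  # prints: 200 True

short_segment, short_complete = continuation_segment([0.8] * 60)
print(len(short_segment), short_complete)  # prints: 60 False
```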
Figure 8 - Event 2 detection. The original continuation sets were processed into 200 IBI series. The methods are trained and tested on type P-1c, P-2c, N-c, and Q-c segments with the goal of reliably detecting the P-2c types.
P-1c: 200 IBIs, 25 sets
P-2c: 200 IBIs, 25 sets
N-c: 200 IBIs, 50 sets
Q-c: 200 IBIs, 50 sets
For training and testing, only Gaussian-mixtures were used. Separate HMMs were trained on the type P-1c, P-2c, N and Q sets, and then evaluated for log-likelihood matching against the 200 IBI segments in the evaluation and test sets. Note the distinction emphasized here between type P-1c's (no PAF episode) and type P-2c's (PAF episode). The ultimate goal is to distinguish type P-2c ECGs from all others. To accomplish this, the algorithm for randomized set assignments and testing was actually run twice: the first time testing for pairwise discrimination between P-2c, N, and Q, and the second time for pairwise discrimination between P-2c, P-1c, and Q (Fig 8).
3.5.3 Event 3 : Prediction
Event 3 (PAF prediction) used similar pre-processing procedures on the raw data as the previous test events. The time series were converted into IBIs and segmented. In this test case, however, each of the 1200 IBI segments was further divided into six consecutive 200 IBI segments (Fig 9).
The goal of the exercise was to determine whether a patient would soon be experiencing a PAF episode, before the episode occurs. In terms of this study, that means singling out the type P-2 datasets. The HMMs were trained on the first and last 200 IBI segments of each data series (the assumption being that the last 200 IBI segment prior to a PAF episode would contain features that could be used for prediction). The HMMs are then used to evaluate the log likelihood of matching the segments in the middle. If the segments in the middle were consistently more similar to the first segment (a higher log likelihood), then the record was classified as imminently-PAF. After trying out different comparative methods (using different combinations of segments), it was found that the 200 IBI segment immediately prior to the last 200 IBI segment was the most discriminant (which is what would be expected). This procedure was noteworthy in that the attempt was made to train on the acceptable state and the episodic state of an individual, rather than of a population of patients.
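The per-record comparison can be illustrated with a deliberately simplified Python sketch. A one-state, one-mixture HMM reduces to a single Gaussian over independent samples, so plain Gaussian log-likelihoods stand in for the trained HMMs here. All function names are illustrative assumptions, and the sketch only returns the log-likelihood difference for the fifth segment, leaving the classification threshold open.

```python
import math
from statistics import fmean, pvariance

def split_segments(ibis, seg_len=200, n_segments=6):
    """Divide the final seg_len * n_segments IBIs into consecutive segments."""
    total = seg_len * n_segments
    if len(ibis) < total:
        raise ValueError("series is shorter than the requested segmentation")
    tail = ibis[-total:]
    return [tail[k * seg_len : (k + 1) * seg_len] for k in range(n_segments)]

def fit_gaussian(segment):
    """Mean and variance of a segment (variance floored to stay positive)."""
    return fmean(segment), max(pvariance(segment), 1e-9)

def gaussian_loglik(segment, mean, var):
    """Log-likelihood of the samples under an i.i.d. Gaussian."""
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var) for x in segment
    )

def first_vs_last_score(ibis):
    """Positive when the fifth segment fits the first-segment model better."""
    segments = split_segments(ibis)
    first_model = fit_gaussian(segments[0])
    last_model = fit_gaussian(segments[-1])
    probe = segments[-2]  # the segment immediately prior to the last one
    return gaussian_loglik(probe, *first_model) - gaussian_loglik(probe, *last_model)

# Synthetic series: IBIs near 0.8 s for five segments, then near 0.6 s in the last.
series = [0.79, 0.81] * 500 + [0.59, 0.61] * 100
print(first_vs_last_score(series) > 0)  # prints: True
```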
Figure 9 - Event 3 prediction. The long 1200-IBI segments were truncated and the resulting 200-IBI segments are used for predictive testing.
Group P: six 200-IBI segments, PAF, 25 sets type P-1, 25 sets type P-2
P-cont.: 200 IBIs, 50 sets
Group N: six 200-IBI segments, non-PAF, possibly other cardiac disorders, 50 sets
N-cont.: 200 IBIs, 50 sets
Group Q: six 200-IBI segments, certified as non-arrhythmic, 72 sets
Q-cont.: 200 IBIs, 50 sets
The results from using the segmented tracking were found to be useful, but yet another classification was explored using the general form used in Event 1 Screening. Separate HMMs were trained on P-2 types (the patients about to suffer a PAF episode), P-1 types (PAF patients not about to suffer an episode), and Q types. Training was done using the contiguous 1200 IBI segments used previously in Event 1.
4.0 Results
The complete results of all parameters and techniques explored are presented in Appendix A. The important results are presented in order of techniques used in the sections to follow. Here is a quick overview of the best results achieved across the different techniques used for the set of test events:
Table 3 - Results Overview (all values in percent)
* discontinued due to poor results
** result cited in Abstract
*** results from CinC test set (see Section 4.3.1) yielded even better results, and were cited in Abstract

P vs. N (EvalSet)            Prematurity   Discrete-output   Gaussian-mix   Gaussian-mix
                             Weighting     HMM (Entropy)     HMMs (IBIs)    HMMs (Spectrograms)
Screening      Sensitivity   94            75                88***          87
               Specificity   64            14                54             24
               Accuracy      81            48                72             53
Detection      Sensitivity   100           *                 100            13
(P2c vs. Nc)   Specificity   93                              79             89
               Accuracy      95                              88             67
Prediction     Sensitivity   90            *                 43             27
               Specificity   85                              94             67
               Accuracy      87                              79             50

P vs. N (TestSet)            Prematurity   Discrete-output   Gaussian-mix   Gaussian-mix
                             Weighting     HMMs (Entropy)    HMMs (IBIs)    HMMs (Spectrograms)
Screening      Sensitivity   93            *                 89             93
               Specificity   44                              55             27
               Accuracy      67                              71             59
Detection      Sensitivity   91            *                 100            9
(P2c vs. Nc)   Specificity   82                              65             94
               Accuracy      86                              79             61
Prediction     Sensitivity   71            *                 42             43
               Specificity   56                              100            75
               Accuracy      62                              74             67

P vs. Q (EvalSet) **         Prematurity   Gaussian-mix   Gaussian-mix
                             Weighting     HMMs (IBIs)    HMMs (Spectrograms)
Screening      Sensitivity   94            88             57
               Specificity   93            92             93
               Accuracy      93            90             81
Detection      Sensitivity   100           100            15
(P2c vs. Qc)   Specificity   85            91             100
               Accuracy      87            93             74
Prediction     Sensitivity   92            63             33
               Specificity   95            90             77
               Accuracy      94            84             70

P vs. Q (TestSet)            Prematurity   Gaussian-mix   Gaussian-mix
                             Weighting     HMMs (IBIs)    HMMs (Spectrograms)
Screening      Sensitivity   95            89             53
               Specificity   91            81             95
               Accuracy      93            84             78
Detection      Sensitivity   100           100            6
(P2c vs. Qc)   Specificity   91            78             100
               Accuracy      94            83             44
Prediction     Sensitivity   80            82             30
               Specificity   96            87             83
               Accuracy      94            85             64
4.1 Prematurity Weighting
For this part of the analysis, the technique utilized was intended to reproduce and further expand upon the technique described in Wei Zong's paper (see Section 1.2). The end results of the simulation (76% classification accuracy ([True Pos + True Neg] / Total) for Event 1) closely matched the results (79% classification accuracy) reported by Wei (though the "last submission" records on PhysioNet only showed a 72% classification accuracy, see Wei's Results in Section 1.3). The end parameters were a PAF threshold of 317, with a PB defined as an IBI that is 23% shorter in duration than the weighted sum of previous IBIs. The difference in beat detectors used was probably the main variability factor, and might have resulted in the slight (3%) reduction in classification accuracy.
In comparing these results to the results to follow, it is useful to note that Wei Zong's APB method and the modified Prematurity Weighting method used here only utilize the last 10 minutes of the 30-minute ECG data. In later analysis using HMMs, when some IBIs are dropped from the beginning of the 30-minute ECG, the results of the PB method still serve as a useful benchmark. The following table presents the results tested on the CinC data set using optimized parameters derived from training sets.
Table 4
              Event 1: Screening    Event 2: Detection    Event 3: Prediction
CinC Set      P vs. N               P2c vs. P1c           P vs. N
Segments      Full 30-min           5-min                 Full 30-min
TruePos       19                    19                    17
TrueNeg       19                    22                    53
FalsePos      3                     3                     19
FalseNeg      9                     6                     11
Sens          68%                   76%                   61%
Spec          86%                   88%                   74%
PredAcc       76%                   82%                   70%
Parameters    paf threshold=317     paf threshold=24      paf threshold=20
              pb threshold=.77      pb threshold=.75      pb threshold=.75
              window=10             window=5              window=10
              tau=6.33              tau=6.33              tau=6
4.2 Approximate Entropy and Discrete-output HMM
Using the technique developed by Yuan Qi, the inter-beat interval time series in the CinC data sets were converted into an entropy series, and discretized for HMM analysis. The level of discretization was chosen to be 100, which resulted in as little information loss as possible (Fig 10).
Figure 10 - Discretized entropy series, with enough discrete levels to minimize information loss
The entropy series were then analyzed using discrete-output HMMs with the results in the following table. The results were less than satisfactory given the low predictive accuracy, and histogram plots (Fig 6) of the entropy data were examined for visible features that would allow for discrimination between the population groups. No extraordinary differing factors could be found. Another remedy that was attempted was adjusting the resolution of the spectrum estimation, also with no obvious improvement.
CinCSet    TruePos  TrueNeg  FalsePos  FalseNeg  Sensitivity  Specificity  Pred Acc  Parameters
P vs. N    21       3        19        7         75%          14%          48%       Q=2
P vs. N    21       2        20        7         75%          9%           46%       Q=3
P vs. N    21       3        19        7         75%          14%          48%       Q=4
P vs. N    18       5        17        10        64%          23%          46%       Q=15
Table 5 - O (number of discrete levels used) was 100, trained on Groups P & N and tested on CinC dataset for Event 1 Screening
4.3 Gaussian Mixture HMM on IBIs
When the results using the entropy did not meet expectations, the decision was made to focus instead on IBIs. Histogram plots of IBIs over different time segments revealed some interesting patterns (Fig 6), notably the tendency of IBIs to diverge into clusters of long IBIs and short IBIs. This observation supports the initial observation that PBs become more prevalent before and during PAF. Even though histograms were useful in identifying this change in the IBI distribution, they do not capture the sequential pattern of IBI changes over time. As a result, the IBI series were analyzed using Gaussian-mixture HMMs, which would capture both the ectopic events and additional information on timing and frequency.
The results for the analysis are presented below in the order of Events attempted.
4.3.1 Event 1: Screening
Table 6
EvalSet    TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N    14       7        6         2         88%   54%   72%   M=2, Q=1
P vs. Q    14       22       2         2         88%   92%   90%
N vs. Q    4        12       12        9         31%   50%   43%
*best predictive accuracy for both P vs. N and P vs. Q

EvalSet    TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N    14       5        8         2         86%   38%   66%   M=3, Q=2
P vs. Q    16       18       6         0         100%  75%   85%
N vs. Q    6        6        18        7         46%   25%   32%
*highest P vs. Q sensitivity

TestSet    TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N    16       11       9         2         89%   55%   71%   M=2, Q=1
P vs. Q    16       22       5         2         89%   81%   84%
N vs. Q    9        12       15        11        45%   44%   45%

CinCSet    TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N    27       8        14        1         96%   57%   70%   M=2, Q=1
*parameters of best predictive accuracy on the evaluation set were used on the test set and the CinC set
4.3.2 Event 2: Detection
Table 7
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. Nc    11       11       3         0         100%  79%   88%   M=1, Q=3
P2c vs. Qc    8        33       2         3         73%   94%   89%
Nc vs. Qc     7        29       6         7         50%   83%   73%
*highest predictive accuracy for P2c vs. Nc

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. Nc    7        14       0         4         64%   100%  84%   M=1, Q=2
P2c vs. Qc    8        33       2         3         73%   94%   89%
Nc vs. Qc     9        26       9         5         64%   74%   71%
*highest specificity for P2c vs. Nc

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. Nc    11       11       3         0         100%  79%   88%   M=1, Q=1
P2c vs. Qc    11       32       3         0         100%  91%   93%
Nc vs. Qc     9        23       12        5         64%   65%   64%
*highest predictive accuracy for P2c vs. Qc

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. Nc    11       11       6         0         100%  65%   79%   M=1, Q=3
P2c vs. Qc    9        21       6         0         100%  78%   83%
Nc vs. Qc     1        25       0         16        6%    100%  62%
*parameters of best predictive accuracy for P2c vs. Nc used for test set

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. P1c   7        7        4         1         88%   64%   74%   M=1, Q=3
P2c vs. Qc    8        21       6         0         100%  78%   83%
P1c vs. Qc    9        21       6         2         82%   78%   79%
*best predictive accuracy for P2c vs. P1c

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. P1c   8        5        2         1         89%   71%   81%   M=1, Q=3
P2c vs. Qc    9        21       6         0         100%  78%   83%
P1c vs. Qc    7        19       8         0         100%  70%   76%
*parameters of best predictive accuracy for P2c vs. P1c used for test set
4.3.3 Event 3: Prediction
Table 8
Using six 200 IBI segments:
Test series   TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P1 vs. P2     11       22       3         14        44%   88%   66%   M=1, Q=1
CinC          7        61       11        21        25%   85%   68%

Using 1200 IBI segments (similar to Event 1):
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2 vs. P1     4        6        2         4         50%   75%   63%   M=1, Q=3
P2 vs. Q      5        27       3         3         63%   90%   84%
P1 vs. Q      7        25       5         1         88%   83%   84%
*highest predictive accuracy for P2 vs. P1 and P2 vs. Q

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2 vs. P1     5        6        2         6         45%   75%   58%   M=1, Q=3
P2 vs. Q      9        20       3         2         82%   87%   85%
P1 vs. Q      6        20       3         2         75%   87%   84%

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2 vs. P1     18       41       31        10        64%   57%   59%   M=1, Q=3
*parameters used on TestSet and CinCSet

4.4 Gaussian Mixture HMMs on Spectrogram
Event 1: Screening
(Note: number of frequency bins set to 50)
Table 9
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N       2        28       0         16        11%   100%  65%   M=5, Q=1
*best predictive accuracy and sensitivity

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N       2        15       0         15        12%   100%  53%   M=5, Q=1
Event 1: Screening
(Note: number of frequency bins set to 20)
Table 10
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N       13       4        13        2         87%   24%   53%   M=5, Q=1
P vs. Q       8        26       2         6         57%   93%   81%   M=1, Q=2
*best predictive accuracy and sensitivity

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N       13       4        11        1         93%   27%   59%   M=5, Q=1
P vs. Q       7        21       1         8         53%   95%   78%   M=1, Q=2

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P vs. N       10       4        8         3         77%   33%   56%   M=5, Q=1
*best parameters used on testset and CinCset

Event 2: Detection
(Note: number of frequency bins set to 20)
Table 11
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. Nc    1        17       2         7         13%   89%   67%   M=1, Q=2
P2c vs. Qc    2        30       0         11        15%   100%  74%
*best predictive accuracy and sensitivity

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2c vs. Nc    1        16       1         10        9%    94%   61%   M=1, Q=2
P2c vs. Qc    1        11       0         15        6%    100%  44%
*best parameters used on testset

Event 3: Prediction
(Note: number of frequency bins set to 20)
Table 12
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2 vs. N      3        10       5         8         27%   67%   50%   M=3, Q=1
P2 vs. Q      2        24       7         4         33%   77%   70%
*best predictive accuracy and sensitivity

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred  Parameters
P2 vs. N      3        15       5         4         43%   75%   67%   M=3, Q=1
P2 vs. Q      3        15       3         7         30%   83%   64%
*best parameters used on testset
5.0 Discussion
5.1 Event 1: Screening
In the evaluation sets, Gaussian-mixture modeling for Event 1 demonstrated high
specificity (True Neg / [True Neg + False Pos]), ranging up to 92%, and high sensitivity
(True Pos / [True Pos + False Neg]), ranging up to 100%, in the comparison between
Group P and Group Q. The results were comparatively lower between Group P and
Group N, with a sensitivity high of 94% and a specificity high of 54%. This makes some
sense, since one would expect a greater difference between patients with PAF and those
who are healthy than between those with PAF and those who might have some
unspecified arrhythmia. A comparison between Group N and Group Q resulted in only
46% sensitivity (screening for N-types) and 92% specificity. This could be because some
of those in Group N have no arrhythmic disorders. The test set for Event 1 confirmed the
general distribution of results shown in the evaluation sets, using a one-state model with
a mixture of two Gaussians (Q = 1, M = 2). The final test was for discrimination between
the Group N and Group P records provided by the original CinC test set. The
Gaussian-mixture technique provided a sensitivity of 96% and a specificity of 57%, with
an overall accuracy of 70%. The approach would be even more promising if the
Gaussian-mixture outputs were used in conjunction with the Prematurity Weighting
method, which has a higher specificity than sensitivity.
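The sensitivity and specificity definitions given above can be checked against any row of the results tables. The short Python sketch below is illustrative only: the function name is hypothetical, and the formula for predictive accuracy (Pred) is an assumption, taken here as the fraction of all records classified correctly, which reproduces the tabulated values. The example uses the counts reported in Table 8 for the 200-IBI segments (P1 vs. P2).

```python
def screening_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, predictive accuracy) as fractions.

    Sensitivity and specificity follow the definitions in the text;
    predictive accuracy is assumed to be (TP + TN) / all records.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    pred = (tp + tn) / (tp + tn + fp + fn)
    return sens, spec, pred


# Counts from Table 8, six 200-IBI segments, P1 vs. P2
sens, spec, pred = screening_metrics(tp=11, tn=22, fp=3, fn=14)
print(f"Sens {sens:.0%}, Spec {spec:.0%}, Pred {pred:.0%}")
# prints "Sens 44%, Spec 88%, Pred 66%"
```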
Testing for Event 1 using discrete-output HMMs on entropy yielded disappointing
results by comparison. The sensitivity had a high of 75%, but the specificity was only
23%. Using Gaussian-mixture HMMs on spectrograms with a frequency resolution of 50
bins yielded a sensitivity high of only 12%, with a specificity around 100%. The
additional testing for Event 1 using Gaussian-mixture HMMs on spectrograms with a
frequency resolution of 20 bins yielded much better results, with a sensitivity of 57% in
the evaluation sets (P vs. Q) and a specificity around 93%. Applying the algorithm to the
CinC test set resulted in only a 56% prediction accuracy.
5.2 Event 2: Detection
In testing for Event 2, the algorithm for Gaussian-mixture HMMs on IBIs was
used. The results from the evaluation sets were very promising, with a high of 100%
accuracy in recognizing the immediate onset of PAF and up to 100% specificity,
depending on the parameters used. Since actual periods of PAF were not included in the
CinC test set, the only one-time test was performed on the randomized test set. The
result was 89% sensitivity (81% overall predictive accuracy) in discriminating between
PAF and non-PAF episodes. Given that PAF is an atrial disorder, and that the detection
technique uses only the time between QRS complexes (a primarily ventricular event),
these results further demonstrate the importance of IBIs as an overall indicator of cardiac
health, as opposed to localized ventricular state.
5.3 Event 3: Prediction
For Event 3, the results from the segmented HMM analysis (200 IBI blocks) were
the reverse of what was observed in Event 2, with a high specificity and a relatively low
sensitivity. Interestingly, when tested against the test set provided in the original CinC
competition, the technique returned a predictive accuracy of 68%, but with a much
higher specificity (85%) than sensitivity (25%), which would not have translated into a
high score in the competition. In hindsight, the test could have been better structured
given more training data for the healthy state of each individual, as the training data
might have been insufficient. Lack of data precluded further exploration in this study.
By comparison, the use of HMMs trained directly on the entire 1200 IBIs yielded
better sensitivity (64% on the CinC set) and a better score on the CinC test (18/22).
Also noteworthy is that when P-2 patients are trained and tested against type Q
patients, the sensitivity and specificity both rise to around 90%.
5.4 Entropy Failure Speculation
The failure of the entropy analysis was a major surprise given previous literature
and experiments [Vikman et al, 1999], which suggested that approximate entropy (ApEn)
would decrease prior to cardiac episodes. ApEn measures the logarithmic likelihood that
runs of patterns that are close to each other will remain close in the subsequent
incremental comparisons. A time series containing many repetitive patterns has a
relatively small ApEn; conversely, more random data produce higher values. The cause
of the failure with entropy is uncertain, but possibilities include mismatched resolutions
or discrepancies within each patient due to emotional or physical distress not
documented at the time of recording. Whereas the variability in the power spectrum
might not be sufficient to characterize PAF onset, the spectrogram itself offered two
features worth considering. The first was the vertical streaking corresponding to ectopic
beats, and the second was the shifting of control frequency (relative strengths of
sympathetic and parasympathetic output) due to ectopic behavior. The fact that direct
training on the spectrogram yielded better results is encouraging, though further analysis
of the tradeoffs between time resolution and frequency resolution could potentially
prove even more enlightening.
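For reference, the approximate entropy described in this section can be sketched in a few lines of Python. This is a minimal sketch of the standard ApEn definition, not the code used in this study: the function name, the template length m = 2, and the tolerance r = 0.2 times the standard deviation are assumptions (commonly used defaults), and the two series are synthetic.

```python
import numpy as np


def approximate_entropy(series, m=2, r=None):
    """Approximate entropy of a 1-D series (self-matches included)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)  # assumed default tolerance

    def phi(length):
        # All overlapping templates of the given length
        templates = np.array([x[i:i + length] for i in range(n - length + 1)])
        # Chebyshev distance between every pair of templates
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Fraction of templates within r of each template
        c = np.mean(dist <= r, axis=1)
        return np.mean(np.log(c))

    return phi(m) - phi(m + 1)


# Synthetic illustration: a strictly repeating pattern versus random intervals
repetitive = np.tile([0.8, 0.9, 1.0, 0.9], 50)
random_ibi = np.random.default_rng(0).normal(0.9, 0.1, 200)
print(approximate_entropy(repetitive), approximate_entropy(random_ibi))
```

The repetitive series gives a value near zero, while the random series gives a clearly larger value, matching the qualitative behavior described above.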
6.0 Conclusions
This study demonstrates the feasibility of screening, detection, and prediction of
PAF based solely on the pattern of inter-beat intervals. Features extracted from IBIs,
such as entropy and spectrograms, were also explored as possible discriminators. The
application of pattern recognition techniques builds on the previous work with threshold
and weighting functions for narrowly defined events. The use of Gaussian-mixture
HMMs in PAF analysis yielded unambiguous results. Especially noteworthy were the
results of comparisons between those with PAF and healthy patients. While the
techniques differentiate between PAF and non-PAF patients with only reasonable
accuracy, they are highly successful (~90% sensitivity and specificity) in discriminating
between healthy individuals and those with PAF.
7.0 Recommended Future Works
The key application of IBI-based PAF prediction lies in surface-lead ECG
processing, where the noise from muscular distortions and lead contact movements
contributes to a fundamental limit on the accurate tracking of small-amplitude P-waves.
By focusing on the larger and more distinctive QRS complexes, the techniques analyzed
in this study are better suited to commercial applications in the area of health and fitness
products. There are several specific areas in the determination of health state that this
author would like to see explored in the future.
In terms of continuing research on the use of IBIs in PAF analysis, the
patient-specific training described in Section 3.3.3 bears great potential. The ultimate
goal would be to enable devices to adapt to personal variability in physiology and
provide prediction measures based on individual case history. This type of analysis
would most likely require longer ECGs of PAF patients than were used in this study.
Training could then be applied to regions distal and proximal to PAF episodes for that
individual.
In terms of academic questions left to explore, there remains the question of
screening and prediction for other atrial arrhythmias. Are IBIs equally important in
identifying other arrhythmic episodes? Are these techniques valid in the presence of
artificial stimuli, such as drugs? Once prediction accuracies can be established to
within a certain threshold, many questions would also follow regarding measurement of
the exact preventative value of interventions, in terms of successfully prevented
arrhythmic episodes.
Besides academic pursuits that could draw on this research, there are many
potential uses of this technology to benefit public health and services. The prevalence of
PAF in the general population has never been accurately determined, and this study could
contribute much to making available to the public the necessary testing and evaluation
that would otherwise require access to a medical institution. Rather than relying on
skilled technicians, expensive hardware, and human expert interpretation, the algorithms
could be easily automated and calibrated to work with commercially available fitness
sensors. The ideas are currently being explored in various institutions. Among
commercial entities, Motorola in particular has taken a strong interest in incorporating
algorithms into mobile devices to provide valuable service to their customers.
Continuation along the current paths of research would be likely to yield great future
dividends in terms of quality health care that is both personal and pervasive.
Appendix A
Complete HMM Results
Gaussian-Mixture HMMs on IBIs
Event 1: Screening
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N            15        5         8         1   94%   39%   69%
P vs. Q            14       19         5         2   86%   79%   83%
N vs. Q             4        9        15         9   31%   38%   35%
Parameters: M = 3, Q = 1
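Toolkit commandlines used to generate above:
% initialize a model with Q = 1 state and M = 3 mixtures, diagonal covariance, not left-right
[priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 1, 3, 'diag', 0)
% train the model on the training data
[LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
% log-likelihood of the data under the trained model
loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)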
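The commandlines above follow an initialize, train, then score pattern. As a rough illustration of that pattern, the Python sketch below covers only the one-state case (Q = 1), in which a Gaussian-mixture HMM reduces to an ordinary Gaussian mixture over the observations. It is not the toolkit code: every function name here is hypothetical, the data are synthetic, and the decision rule (assign a record to whichever class model gives the higher log-likelihood) is an assumption made for the example.

```python
import numpy as np


def _gauss(x, mu, var):
    """Gaussian density of each sample under each mixture component."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_mixture(x, n_mix, n_iter=5, seed=0):
    """Fit a 1-D Gaussian mixture with a few EM iterations."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=n_mix, replace=False)  # random initial means
    var = np.full(n_mix, np.var(x) + 1e-6)         # broad initial variances
    w = np.full(n_mix, 1.0 / n_mix)                # equal initial weights
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        resp = w * _gauss(x[:, None], mu, var) + 1e-300
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0)
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var


def log_likelihood(x, model):
    """Total log-likelihood of a record under a fitted mixture."""
    w, mu, var = model
    x = np.asarray(x, dtype=float)
    return float(np.log((w * _gauss(x[:, None], mu, var)).sum(axis=1) + 1e-300).sum())


def classify(x, models):
    """Return the label of the model with the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))


def synthetic_ibi(n, ectopic_rate, rng):
    """Synthetic IBIs (seconds): regular beats plus occasional premature beats."""
    ibi = rng.normal(0.85, 0.03, n)
    early = rng.random(n) < ectopic_rate
    ibi[early] = rng.normal(0.55, 0.03, early.sum())
    return ibi


rng = np.random.default_rng(1)
models = {
    "P": fit_mixture(synthetic_ibi(1200, 0.15, rng), n_mix=2),  # ectopic-rich record
    "N": fit_mixture(synthetic_ibi(1200, 0.0, rng), n_mix=2),   # regular record
}
print(classify(synthetic_ibi(1200, 0.15, rng), models))
print(classify(synthetic_ibi(1200, 0.0, rng), models))
```

For Q greater than 1, the likelihood must also sum over state sequences (the forward algorithm), which this sketch does not attempt.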
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N            14        7         6         2   88%   54%   72%
P vs. Q            14       22         2         2   88%   92%   90%
N vs. Q             4       12        12         9   31%   50%   43%
Parameters: M = 2, Q = 1
Toolkit commandlines used to generate above:
[priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 1, 2, 'diag', 0)
[LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)
EvalSet
P vs. N
P vs. Q
N vs. Q
TruePos
10
TrueNeg
3
8
16
3
22
FalsePos
10
8
2
FalseNeg
6
8
10
EvalSet
P vs. N
P vs. Q
N vs. Q
15
18
FalsePos
8
6
4
12
12
EvalSet
Pvs.N
TruePos
14
TrueNeg
5
18
6
FalsePos
8
6
TruePos
15
Pvs.Q
16
N vs. Q
6
EvalSet
P vs. N
TruePos
TrueNeg
5
Sens
63%
50
23
Spec
23%
67
92
Pred
45%
60
68
FalseNeg
1
1
9
Sens
94%
94
31
Spec
38%
75
50
FalseNeg
2
Sens
86%
18
0
7
100
46
Spec
38%
75
Sens
94%
15
TrueNeg
3
FalsePos
10
FalseNeg
1
P vs. Q
16
12
12
0
100
N vs. Q
5
11
13
8
38
25
Spec
23%
50
46
Parameters
M =1
Q=1
Pred
69%
83
Parameters
M 5
Q =5
43
Pred
66%
Parameters
M=3
85
32
Q=2
Pred
62%
Parameters
M =2
70
Q=2
43
left-right
EvalSet
P vs. N
TruePos
14
TrueNeg
5
FalsePos
8
FalseNeg
2
Sens
86%
Spec
38%
Pred
66%
Parameters
M =2
P vs. Q
N vs. Q
13
4
19
14
5
10
3
9
81
31
79
58
80
49
Q=2
EvalSet
P vs. N
P vs. Q
N vs. Q
TruePos
TrueNeg
13
1
2
8
FalseNeg
3
1
9
Sens
81%
94
31
Spec
8%
15
4
FalsePos
12
22
16
Pred
48%
43
32
Parameters
M=e
Q=2
left-right
8
33
Toolkit commandlines used to generate above:
[priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 2, 1, 'diag', 1)
[LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)
EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N            15        3        10         1   94%   23%   62%
P vs. Q            16        8        16         0  100%   40%   60%
N vs. Q             4        8        16         9   31%   33%   32%
Parameters: M = 1, Q = 2
TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N            16       11         9         2   89%   55%   71%
P vs. Q            16       22         5         2   89%   81%   84%
N vs. Q             9       12        15        11   45%   44%   45%
Parameters: M = 2, Q = 1

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N            27        8        14         1   96%   57%   70%
Parameters: M = 2, Q = 1
Event 2: Detection

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc         10       13         1         1   91%   93%   92%
P2c vs. Qc          8       33         2         3   73%   94%   89%
Nc vs. Qc           7       29         6         7   50%   83%   73%
Parameters: M = 1, Q = 3

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc         11       11         3         0  100%   79%   88%
P2c vs. Qc         11       32         3         0  100%   91%   93%
Nc vs. Qc           3       32         3        11   21%   91%   71%
Parameters: M = , Q = 3, left-right

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc          8       14         0         3   73%  100%   88%
P2c vs. Qc          8       33         2         3   73%   94%   89%
Nc vs. Qc           9       27         8         5   64%   77%   73%
Parameters: M = 1, Q = 2, left-right
94
74
Pred
84%
89
71
Parameters
M= I
Q= 2
Spec
79%
91
65
Pred
88%
94
64
Parameters
M =1
Sens
64%
64
64
Spec
93%
Parameters
M= 2
Q =1
77
Pred
80%
91
73
FalseNeg
3
4
3
Sens
73%
64
79
Spec
93%
97
74
Pred
84%
89
76
Parameters
M= 3
FalseNeg
2
2
9
Sens
Spec
33
100%
33
44
96
92
Pred
89%
89
73
Parameters
M =
M= 3
Pred
74%
86
63
Parameters
M 1
Q 3
EvalSet
P2c vsNc
P2c vs. Qc
Nc vs. Qc
TruePos
7
8
9
TrueNeg
FalsePos
14
0
33
26
2
9
FalseNeg
4
3
5
Sens
Spec
64%
100%
73
64
EvalSet
P2c vsNc
TruePos
FalsePos
3
FalseNeg
Sens
0
0
100%
12
5
64
FalsePos
1
FalseNeg
4
4
5
P2c vs. Qc
I1
Nc vs. Qc
9
TrueNeg
11
32
23
EvalSet
P2c vsNc
P2c vs. Qc
Nc vs. Qc
TruePos
7
7
9
TrueNeg
13
35
27
Eval Set
P2c vsNc
P2c vs. Qc
Nc vs. Qc
TruePos
8
7
11
TrueNeg
13
34
26
FalsePos
1
TestSet
P2c vs. Nc
P2c vs. Qc
Nc vs. Qc
TruePos
1
1
7
TrueNeg
FalsePos
16
0
23
22
2
11
1
0
18
1
9
1
100
100
_
Q 1
Sequence Re-run to generate a larger test set of P2c's:
Eval Set
P2c vs. Nc
P2c vs. Qc
Nc vs. Qc
TestSet
P2c vs. Nc
P2c vs. Qc
Nc vs. Qc
TrueNeg
12
23
26
TruePos
8
8
2
TruePos
11
11
1
FalsePos
7
2
0
FalseNeg
Sens
0
100%
3
17
73
11
Spec
63%
92
100
TrueNeg
11
21
FalsePos
6
4
FalseNeg
0
2
Sens
100%
85%
Spec
65%
84%
Pred
79%
84%
25
0
16
6%
100%
62%
Parameters
M = I
Q=3
Event 3: Prediction
Using six 200 IBI segments:
TruePos
TrueNeg
FalseNeg
FalsePos
Sens
Spec
Pred
Parameters
Acc
P l vs. P2
11
22
3
14
44%
88%
66%
CinC
7
61
11
21
25%
85%
68%
M =
Q=
Test series
Using 1200 IBI segments (similar to Event 1):
EvalSet
P2-PI
P2-Q
TruePos
4
5
6
TrueNeg
6
27
28
FalsePos
2
3
2
FalseNeg
4
3
2
Sens
50%
63
75
Spec
75%
90
93
Pred
63%
84
89
Parameters
M=3
TruePos
3
6
7
TrueNeg
7
27
25
FalsePos
FalseNeg
5
2
Spec
88%
90
83
Pred
63%
87
84
Parameters
M =1
Q= 3
1
Sens
38%
75
88
TrueNeg
6
20
20
FalsePos
2
3
3
FalseNeg
5
2
2
Sens
55%
82
75
Spec
75%
87
87
Pred
63%
85
84
Parameters
M =1
Q=3
PI-Q
TruePos
6
9
6
CinCSet
P2 vs. PI
TruePos
18
TrueNeg
41
FalsePos
31
FalseNeg
10
Sens
64%
Spec
57%
Pred
59%
Parameters
M = 1,Q = 3
* Note: number of frequency bins set to 20
EvalSet
TruePos TrueNeg FalsePos
FalseNeg
P vs.N
13
4
13
2
Sens
87%
Spec
24%
I Pred
Parameters
M=5,Q=1
PI-Q
EvalSet
P2-Pl
P2-Q
PI-Q
TestSet
P2-PI
P2-Q
1
3
5
Q=1
Gaussian Mixture HMMs on Spectrogram
Event1: Screening
53%
Toolkit commandlines used to generate above:
[priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 1, 5, 'full', 0)
[LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)
EvalSet
Pvs.N
TruePos
12
TrueNeg FalsePos
4
11
FalseNeg
2
Sens
86%
Spec
27%
Pred
55%
Parameters
M=5,Q=2
EvalSet
TruePos
TrueNeg
FalseNeg
Sens
Spec
Pred
Parameters
FalsePos
Pvs.Q
|8
26
2
6
57%o
93%
81%
M=,Q=2
FalseNeg
1
Sens
93%
Spec
27%
Pred
59%
Parameters
M=5,Q=I
I Spec
Parameters
M=,Q=2
TestSet
Pvs.N
TruePos
TrueNeg
4
FalsePos
13
TestSet
Pvs.Q
TruePos
7
TrueNeg
21
FalsePos
FalseNeg
8
I Sens
1
53%
95%
Pred
1 78%
CinCSet
Pvs.N
TruePos
10
TrueNeg
4
FalsePos
8
FalseNeg
3
Sens
77%
Spec
33%
Pred
56%
11Parameters
M=5,Q= I
Pred
67%
74%
Parameters
M 1
Q= 2
11
Event 2: Detection
(Note: number of frequency
TruePos TrueNeg
EvalSet
1
17
P2c vs. Nc
2
130
P2c vs. Qc
TestSet
P2c vs. Nc
P2c vs. Qc
bins set to 20)
FalsePos
2
FalseNeg
7
Sens
13%
Spec
89%
0
11
15%
100%
Sens
9%
6%
Spec
94%
100%
Pred
61%
44%
Parameters
M 1
Sens
27%
33%
Spec
67%
77%
Pred
50%
70%
Parameters
M 3
Q 1
Sens
43%
30%
Spec
75%
Pred
67%
64%
Parameters
M=3
Q=1
TruePos
TrueNeg
FalsePos
FalseNeg
1
1
16
1
0
10
11
15
Q= 2
Event 3: Prediction
(Note: number of frequency bins set to 20)
FalseNeg
TruePos TrueNeg FalsePos
EvalSet
10
5
8
P2 vs. N
3
7
4
2
24
P2 vs. Q
TestSet
P2 vs. N
P2 vs. Q
TruePos
3
3
TrueNeg
15
15
FalsePos
5
3
FalseNeg
4
7
83%
Appendix B
Sample Spectrograms (frequency resolution of 20 bins)
Spectrograms of Type P (8 randomly selected):
[Figure: spectrogram panels (a)-(d)]
The four spectrograms above demonstrate expected behavior: frequent ectopic activity
is observable. The two spectrograms on the right precede PAF episodes (type P-2) and
appear to show a greater frequency of ectopic activity towards the end. The two on the
left do not precede PAF episodes (type P-1) and have scattered ectopic activity. Each
spectrogram is followed by the spectrogram of its continuation set (either showing PAF
or not), separated by the dark blue line.
[Figure: spectrogram panels (e)-(f)]
These two spectrograms (left, P-1; right, P-2) appear strangely quiescent for a patient
diagnosed with PAF.
[Figure: spectrogram panel (g)]
Relatively quiescent, with events near the end suggesting a precursor to a PAF episode,
but actually distant from any episode (type P-1).
[Figure: spectrogram panel (h)]
Type P-2 with strong ectopic behavior long before a reported episode. Visual inspection
of the ECG revealed premature beats, bursts of tachycardia, and indeed no PAF prior to
the continuation set.
Spectrograms of Type N (8 randomly selected):
[Figure: spectrogram panels (a)-(e)]
These five spectrograms of non-PAF patients appear relatively healthy. There are few
visible ectopic beats.
[Figure: spectrogram panels (f)-(h)]
These non-PAF patients have spectrograms that demonstrate ectopic behavior.
Spectrograms of Type Q (8 randomly selected):
[Figure: spectrogram panels (a)-(h)]
All eight spectrograms of type Q exhibit only minor irregularities, if any.
Appendix C
Sample Spectrograms (frequency resolution of 50 bins)
Spectrograms of Type P (same order as presented in Appendix B):
[Figure: spectrogram panels (a)-(d)]
Note the smearing effect in time compared to spectrograms with 20 frequency bins
(Appendix B). On the bottom two spectrograms, greater frequency resolution can be
observed in the horizontal bands.
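The trade-off noted here between smearing in time and resolution in frequency can be illustrated with a generic short-time Fourier transform. The Python sketch below is not the spectrogram procedure used in this study: the function name, the Hann window, the 50% overlap, and the treatment of the IBI series as evenly sampled per beat are all assumptions, and the input is synthetic. It only shows that asking for more frequency bins requires longer windows, which leaves fewer and coarser time frames for the same 1200-beat record.

```python
import numpy as np


def ibi_spectrogram(ibi, n_bins):
    """Magnitude spectrogram with n_bins frequency bins (rows) per time frame (columns)."""
    x = np.asarray(ibi, dtype=float)
    x = x - x.mean()                 # remove the mean interval
    win_len = 2 * (n_bins - 1)       # a real FFT of length L yields L // 2 + 1 bins
    hop = win_len // 2               # assumed 50% overlap between frames
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1)).T


rng = np.random.default_rng(0)
ibi = rng.normal(0.85, 0.03, 1200)   # synthetic 1200-beat record
for n_bins in (20, 50):
    spec = ibi_spectrogram(ibi, n_bins)
    print(n_bins, "bins:", spec.shape[1], "frames of", 2 * (n_bins - 1), "beats each")
```

With these assumptions, the 20-bin setting uses 38-beat windows and the 50-bin setting uses 98-beat windows, so a brief ectopic event is spread over a much longer stretch of the time axis in the 50-bin case.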
[Figure: spectrogram panel (e)]
[Figure: spectrogram panel (g)]
Type P-1.
[Figure: spectrogram panel (h)]
Type P-2 with strong ectopic behavior long before a reported episode; serious smearing
in time.
Spectrograms of Type N (8 randomly selected):
[Figure: spectrogram panels (a)-(e)]
These five spectrograms of non-PAF patients appear relatively healthy. There are few
visible ectopic beats.
[Figure: spectrogram panels (f)-(h)]
These non-PAF patients have spectrograms that demonstrate ectopic behavior.
References
Anselme, F., Saoudi, N., and Cribier, A. (2000). Pacing in Prevention of Atrial
Fibrillation: The PIPAF Studies. Journal of Interventional Cardiac Electrophysiology, 4
(Supplement 1): 177-184, January 2000.
Brugada, R., Brugada, J., and Roberts, R. (1999). Genetics of Cardiovascular Disease with
Emphasis on Atrial Fibrillation. Journal of Interventional Cardiac Electrophysiology,
3(1): 7-13, March 1999.
Chen, Y. J., Chen, S. A., Tai, C. T., Yu, W. C., Feng, A. N., Ding, Y. A., and Chang, M.
S. (1998). Electrophysiologic Characteristics of a Dilated Atrium in Patients with
Paroxysmal Atrial Fibrillation and Atrial Flutter. Journal of Interventional Cardiac
Electrophysiology, 2(2): 181-186, June 1998.
Fischer, A. and Mehta, D. (2002). Atrial Fibrillation after Atrial Flutter Ablation.
Journal of Interventional Cardiac Electrophysiology, 6(2): 181-182, June 2002.
Giorgberidze, I., Saksena, S., Mehra, R., Krol, R. B., Munsif, A. N., and Mathew, P.
(1997). Effects of High-Frequency Atrial Pacing in Atypical Atrial Flutter and Atrial
Fibrillation. Journal of Interventional Cardiac Electrophysiology, 1(2): 111-123,
September 1997.
Jais, P., Shah, D. C., Haissaguerre, M., Hocini, M., Garrigue, S., and Clementy, J. (2000).
Atrial Fibrillation: Role of Arrhythmogenic Foci. Journal of Interventional Cardiac
Electrophysiology, 4: 29-37, Jan 2002.
Luederitz, B. and Jung, W. (2000). Quality of Life in Atrial Fibrillation. Journal of
Interventional Cardiac Electrophysiology, 4(1): 201-209, January 2000.
Nisam, S. (1998). Can Implantable Defibrillators Reduce Non-arrhythmic Mortality?
Journal of Interventional Cardiac Electrophysiology, 2(4): 371-375, December 1998.
Padeletti, L., Porciani, M. C., Michelucci, A., Colella, A., Costoli, A., Ciapetti, C.,
Pieragnoli, P., Musilli, N., and Gensini, G. F. (2000). Prevention of Short Term
Reversible Chronic Atrial Fibrillation by Permanent Pacing at the Triangle of Koch.
Journal of Interventional Cardiac Electrophysiology, 4(4): 575-583, December 2000.
Qi, Y., Minka, T., and Picard, R. W. (2002). Bayesian Spectrum Estimation of Unevenly
Sampled Nonstationary Data. Proceedings of the International Conference on Acoustics,
Speech and Signal Processing, Orlando, FL, May 2002.
Saksena, S. (1999). Electrophysiologic Study in Patients with Atrial Fibrillation: An Idea
Whose Time Has Come Yet Again. Journal of Interventional Cardiac
Electrophysiology, 3(2): 101-107, July 1999.
Savelieva, I. and Camm, A. J. (2000). Clinical Relevance of Silent Atrial Fibrillation:
Prevalence, Prognosis, Quality of Life, and Management. Journal of Interventional
Cardiac Electrophysiology, 4(2): 369-382, June 2000.
Schwarz, M., Maglio, C., Akhtar, M., and Sra, J. (2000). Implantable Atrial Defibrillator
and Detection of Atrial Flutter. Journal of Interventional Cardiac Electrophysiology, 4
(1): 257-259, February 2000.
Sopher, S. M. and Camm, A. J. (2000). Atrial Pacing to Prevent Atrial Fibrillation?
Journal of Interventional Cardiac Electrophysiology, 4(1): 149-153, January 2000.
Stefaneli, C. B., Bradley, D. J., Leroy, S., Dick, M., Serwer, G. A., and Fischbach, P. S.
(2002). Implantable Cardioverter Defibrillator Therapy for Life-Threatening
Arrhythmias in Young Patients. Journal of Interventional Cardiac Electrophysiology,
6(3): 235-244, July 2002.
Swerdlow, C. D., Schöls, W., Dijkman, B., Jung, W., Sheth, N. V., Olson, W. H., and
Gunderson, B. D. (2000). Detection of Atrial Fibrillation and Flutter by a Dual-Chamber
Implantable Cardioverter-Defibrillator. American Heart Association Circulation,
101:878, 2000.
Timmermans, C., Rodriguez, L. M., Ayers, G. M., Siu, A., Smeets, J., Barenbrug, G. M.,
and Wellens, H. J. J. (2000). Design and Preliminary Data of the Metrix TM Atrioverter
Expanded Indication Trial. Journal of Interventional Cardiac Electrophysiology, 4
(Supplement 1): 197-199, January 2000.
Vikman, S., Maekikallio, T. H., Yli-Mayry, S., Pikkujamsa, S., Koivisto, A. M.,
Reinikainen, P., Airaksinen, K. E. J., and Huikuri, H. V. (1999). Altered Complexity and
Correlation Properties of R-R Interval Dynamics Before the Spontaneous Onset of
Paroxysmal Atrial Fibrillation. American Heart Association Circulation, 100:2079-2084,
1999.
Weise, D. G. (2001). Atrial Fibrillation: A Risk Factor for Increased Mortality - An
AVID Registry Analysis. Journal of Interventional Cardiac Electrophysiology, 5(3):
267-273, September 2001.
Zong, W., Mukkamala, R., and Mark, R. G. (2001). A Method for Predicting Paroxysmal
Atrial Fibrillation Based on ECG Arrhythmia Analysis. Computers in Cardiology,
28:125-128, 2001.