Prediction of Paroxysmal Atrial Fibrillation (PAF) Onset through Analysis of Inter-beat Intervals (IBI)

By Charles Q. Du

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

May 9, 2003

Copyright 2003 Charles Q. Du. All rights reserved.

The author hereby grants to M.I.T. permission to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others the right to do so.

Author: Charles Q. Du, Department of Electrical Engineering and Computer Science, May 9, 2003
Certified by: Rosalind W. Picard
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Abstract

PAF is a type of progressive cardiac arrhythmia that poses severe health risks, sometimes leading to ventricular arrhythmia and post-operative mortality. Some of the difficulties with treating PAF include screening for patients with the disorder, detecting episode occurrences, and predicting occurrences. To address these issues, electrocardiogram (ECG) data from the PhysioNet Online Database was used to develop a technique to screen, detect, and predict the onset of PAF.
Methodologies explored included Hidden Markov Modeling on inter-beat intervals, entropy, and heart-rate spectrograms. Initial testing indicates that the technique discriminates between PAF patients and non-PAF patients who may have other cardiac disorders (89% sensitivity and 55% specificity). Even more promising is its ability to discriminate between PAF patients and healthy individuals (89% sensitivity and 81% specificity). Both results are from data not involved in training. The IBI-based algorithm could be incorporated into medical devices with the potential of contributing to new healthcare technology.

Acknowledgements

I wish to thank my thesis advisor, Prof. Rosalind Picard, for all her cheerful encouragement, tireless support, and helpful criticisms. Her patient and insightful feedback over the course of eight revisions helped me tremendously in shaping my endeavors in this fascinating and critical field of study.

Next on my thank you list would be Prof. Roger Mark, for being a taskmaster and mentor both. Thank you for your instruction in physiology and for keeping me in shape this term. I swear the heavy lifting I did TAing for 022 was the only real exercise I had time for. And to think, I was even paid to run around!

Many thanks also to Yuan Qi and Ashish Kapoor: Yuan, for always being willing to share his brilliant insights and works, and Ashish, for always being available, taking time from his own busy schedule to offer helpful advice.

Thanks should also go to Dr George Moody and Wei Zong, for critical help in collecting data and interpreting results.

And of course, to Amy, who fed me and kept me warm when I pulled my late night work sessions.

To my parents, Miting and Jin. Mom, Dad, I'll start working on a PhD... as soon as I figure out what I want it in. Promise!

Contents

1.0 Introduction
1.1 Paroxysmal Atrial Fibrillation
1.2 Prior Works
1.3 Specific Relevant Techniques
1.4 Statement of Goals
2.0 Theory
2.1 Electrophysiology
2.1.1 Electro-Cardiograms
2.1.2 Inter-beat Intervals
2.1.3 Spectrograms
2.2 Pattern Recognition
2.2.1 Hidden Markov Modeling
3.0 Methods
3.1 Toolkit
3.2 Raw Data
3.3 Terminologies
3.4 Techniques
3.4.1 Prematurity Weighting
3.4.2 Discrete HMMs on Entropy
3.4.3 Gaussian-mixture HMMs on IBIs
3.4.4 Gaussian-mixture HMMs on Spectrogram
3.5 Training and Testing
3.5.1 Event 1 Screening
3.5.2 Event 2 Detection
3.5.3 Event 3 Prediction
4.0 Results
4.1 Prematurity Weighting
4.2 Discrete HMMs on Entropy
4.3 Gaussian Mixture HMMs on IBIs
4.3.1 Event 1 Screening
4.3.2 Event 2 Detection
4.3.3 Event 3 Prediction
4.4 Gaussian Mixture HMMs on Spectrogram
5.0 Discussion
5.1 Event 1 Screening
5.2 Event 2 Detection
5.3 Event 3 Prediction
5.4 Entropy Failure Speculation
6.0 Conclusion
7.0 Recommended Future Works
Appendix A: Complete HMM Results
Appendix B: Spectrograms (with 20 frequency bins)
Appendix C: Spectrograms (with 50 frequency bins)
References

List of Figures

Figure 1 - Exponential weighting of PBs
Figure 2 - Demonstration of exponential weighting scheme
Figure 3 - Electrical anatomy of the human heart
Figure 4 - Comparing spectrograms of a PAF patient, a non-PAF patient, and a healthy individual
Figure 5 - Raw data used in analysis
Figure 6a - Analysis of type-P, N, and Q IBI series histogram
Figure 6b - Analysis of type P-1 & P-2 IBI histograms
Figure 7 - Processing of data used in Event 1 screening
Figure 8 - Processing of data used in Event 2 detection
Figure 9 - Processing of data used in Event 3 prediction
Figure 10 - Discretization of entropy series

List of Tables

Table 1 - Wei's "last submission" Results
Table 2 - Prematurity Weighting Test Ranges
Table 3 - Results Overview
Table 4 - Prematurity Weighting Results
Table 5 - Discrete-output HMM on Entropy Results
Table 6 - Gaussian-mix HMMs on IBIs, Event 1
Table 7 - Gaussian-mix HMMs on IBIs, Event 2
Table 8 - Gaussian-mix HMMs on IBIs, Event 3
Table 9 - Gaussian-mix HMMs on Spectrograms, Event 1 (freq res 50)
Table 10 - Gaussian-mix HMMs on Spectrograms, Event 1 (freq res 20)
Table 11 - Gaussian-mix HMMs on Spectrograms, Event 2
Table 12 - Gaussian-mix HMMs on Spectrograms, Event 3

1.0 Introduction

1.1 Paroxysmal Atrial Fibrillation

Cardiac arrhythmic disorders have been known for over a hundred years, and atrial fibrillation (AF) in particular has now been recognized as the most common of all arrhythmias. Though comprehensive statistics are not available, best estimates suggest that it is probably present in more than 1% of the population. AF is estimated to be present in 4-6% of all people above the age of 65, but is also present in young and middle aged individuals [Stefaneli et al., 2002].

AF is associated with a high mortality rate, largely due to stroke and congestive heart failure [Wyse et al., 2001]. It is found incidentally in about 25% of all stroke admissions and has been shown to lead to poor control of the ventricular rate. In coronary bypass patients, AF may represent risk for immediate post-operative mortality as well as prolonged hospitalization.

Usually AF is associated with certain symptoms: chest discomfort, fatigue, dizziness, palpitations, dyspnea, and syncope [Savelieva et al., 2000]. Interestingly, a significant portion of patients diagnosed with the disorder suffer from no obvious symptoms, and are only diagnosed incidentally during physical examinations, pre-operative assessments, or population surveys. Even in patients that report symptomatic episodes, Holter recording and transtelephonic recording have demonstrated a rate of asymptomatic episodes that exceeds symptomatic episodes by more than twelve-fold.
Given that asymptomatic episodes probably have the same health risk, a significant portion of the aging population could potentially benefit from early detection and treatment.

Paroxysmal (sudden attack) AF is a progressive form of atrial fibrillation that could lead to permanent AF or other cardiovascular disorders. Cases of paroxysmal AF pose an additional problem for detection, as episodes may be intense but short periods that could otherwise go unnoticed. To record these events as they occur, Holter devices are usually used to record 24-hour ECG data, which are then analyzed for signs of arrhythmic disorder. Treatment for these patients could include pharmacological suppression [Jais et al., 2000], high-frequency pacing (HFP), or RF ablation. The primary therapeutic goal is the restoration and maintenance of normal sinus rhythm (NSR), which leads to optimized cardiac functions [Luederitz et al., 2000]. Recent medical technology advances include implantable atrial defibrillators, which apply synchronized shock therapy upon detection of atrial fibrillation.

In practice, detection of atrial fibrillation by atrial defibrillators is a well-known problem. Patients that undergo the implantation procedures are known to have atrial fibrillation, as determined through screening using Holter recording. Detection by implanted devices is also easier since the contacts used are in close contact with the cardiac tissue. Different defibrillators might use different detection procedures. One example is the detection system developed by Swerdlow et al. [Swerdlow et al., 2000]. This system classified cardiac patterns as atrial arrhythmic by tracking P-waves (pattern matching ECG recording with idealized P-waves), and reported proper detection in 88% of 190 AT (atrial tachycardia) episodes, and 98% of 132 AF episodes. Adjustment of sensitivity to P-wave amplitude accounts for the tradeoff between false positives from far-field R-waves and false negatives from missed P-waves.
Note again that attainment of such high detection levels is due in large part to the lack of signal distortion that results from implantation. Surface ECGs (which are used in this particular study) demonstrate much greater variability in performance due to variability in noise inputs. Further considerations in practical detection of AF include distinguishing between AF and ventricular fibrillation, which is the focus of dual-chamber detection algorithms.

Various research projects have focused on prevention of atrial fibrillation, including a new approach based upon the genetic basis of the disease and curing through gene therapy. As part of the Human Genome Project, researchers have identified genes involved in diseases such as cardiomyopathy, Long QT syndrome, and atrial fibrillation. The scrutiny in this area could go a long way towards explaining AF amongst the young and middle aged, where no cause could be found in most cases. There is also potential for helping members of around 100 families around the world which have been identified with a familial form of the disease [Brugada et al., 1999]. Unfortunately, gene therapy techniques have yet to be fully developed, and prior genetic knowledge of susceptibility towards AF does not aid in predicting when exactly episodes will occur.

On a more traditional track, researchers have also looked at AF prevention through permanent pacing or periodic pacing. The theory of this approach is that pacing may suppress triggers of AF such as atrial premature beats, or perhaps reduce atrial stretch, which appears to predispose to AF. It has also been suggested that atrial pacing may increase the benefits of antiarrhythmic drugs. To date, only small scale studies have been done to indicate that atrial pacing may maintain NSR in patients with paroxysmal AF as the primary or sole arrhythmia.
In such cases, interventions were applied after detection of abnormally high resting parasympathetic tone, or periods of relative bradycardia. The results have been mixed, with some patients remaining free from AF, others requiring additional anti-arrhythmic medication, and a few developing permanent AF after pacemaker implantation [Sopher et al., 2000].

The focus of current treatment has been post-onset intervention, with an emphasis on specificity so as to prevent false positive detection [Schwartz et al., 2000]. An additional area that deserves more attention is the development of a reliable PAF prediction scheme. If an accurate and reliable system of heart attack prediction could be developed, it would be a major breakthrough in healthcare efficiency, allowing for interventions that prevent episodes that would otherwise result in significant tissue damage. A prediction scheme that is both sensitive and specific to AF would not only serve to initialize intervention, but also give a reliable measure of the intervention's effectiveness.

1.2 Prior Works

Much of the prior work studied and cited in this study has been derived from PhysioNet. PhysioNet is a public service of the Research Resource for Complex Physiologic Signals, funded by the National Center for Research Resources of the National Institutes of Health. It offers free access via the web to large collections of recorded physiologic signals and related open-source software, and was a strong motivating factor in directing research in cardiology.

PhysioNet organizes the annual Computers in Cardiology (CinC) competitions, with the goal to stimulate effort and advance the state of the art in facing a single clinically significant problem, and to foster both friendly competition and wide-ranging collaborations. In the 2001 CinC competition, the challenge was to develop a fully automated method to predict the onset of paroxysmal atrial fibrillation/flutter (PAF), based on the ECG prior to the event.
The top scorers in the 2001 challenge were announced during the 25 September plenary session of Computers in Cardiology in Rotterdam. The top score and the award in PAF screening was obtained by Gunther Schreier and colleagues of the Austrian Research Centers Seibersdorf (Graz, Austria), with a predictive accuracy [(TruePos + TrueNeg) / all] of 82%. In PAF prediction, the top score was obtained by Wei Zong and colleagues at the Harvard-MIT Division of Health Sciences and Technology (Cambridge, Massachusetts, USA), with a sensitivity [TruePos / (TruePos + FalseNeg)] of 79%.

The algorithm proposed in this paper uses the CinC challenge as one of its benchmarks; however, it is important to note the scoring metric used in the Prediction part. The score was not calculated based on the predictive accuracy as a whole, but instead on the number of consecutive ECG pairs from PAF patients that were correctly labeled as preceding or distal to an episode. The submitted labels for non-PAF ECGs were not used in scoring at all. Thus, the scoring on Prediction focused entirely on sensitivity, and was therefore biased toward schemes that considered each ECG pair to be from a PAF patient. A more detailed description of the CinC test set will be covered in Section 3.2. Also missing from the data provided were continuation sets for the ECGs in the CinC test set, limiting the amount of data available for the purposes of PAF detection.

1.3 Specific Relevant Techniques

The technique used in the competition by Wei Zong and colleagues at MIT [Zong et al., 2001] was especially insightful. They noted that the number and timing of atrial premature beats (APBs) appeared to be "of significant value in terms of predicting
In their analysis, they used an exponential weighting system that assigned a non-equal weighting value for each APB in an immediately preceding interval (Fig 1), with higher weighting given to APBs that occurred a short time prior. Figure 1 - Exponential weighting of APBs, figure taken from Wei et al., 2001 11 As could be seen, the r parameters in used 1 determining the weighting and w 01 *k" (1"On) classification include apb-threshold, window length (w), and exponential rate (tau). The APB- threshold determines the ratio of an IBI over the time-averaged IBI length where the beat would be defined as an APB. The time-averaged IBI length (RRavg) at time n is derived by the formula: RRavg(n) = 0.9 * RRavg(n-1) + 0.1 *RR(n). For example, if the APB-threshold is 20%, and the RRavg at the time is I sec, any immediately following beats with an IBI shorter than 0.8 seconds would be designated an APB. Note: strictly speaking, this would indicate a premature beat, not necessarily than an atrial premature beat. Presumably, Wei's beat detector ignored other types of premature beats (ventricular, junctional, etc.). For the purposes of this paper, these cardiac events will be referred to as Premature Beats, or PBs. The window length determines how far back in time to look for APBs. The exponential rate determines the relative weighting of APBs closer to the current time, as compared to those in the more distant past. Through a simple final PAF-threshold 13 analysis on the weighted results, they were able to produce a method of discrimination with high sensitivity and moderately high specificity. Figure 2 - Demonstration of weighting scheme. Note that ECG the that generated the lower event " 0 1 " MC ZW s log would be assigned a of probability higher having PAF under this weighting scheme. 
(Figure 2 is taken from Zong et al., 2001.)

Their utilization of weighted PB count was the original inspiration for this paper, which focuses on inter-beat intervals, a measure of the time between heart beats, which is significantly smaller in the instance of a PB.

With Wei's permission, his "last submission" answers were analyzed and are presented (Table 1). The results are presented in terms of specificity (true-negatives / [true-negatives + false-positives]), sensitivity (true-positives / [true-positives + false-negatives]), and predictive accuracy ([true-positives + true-negatives] / [all results]).

Wei's Results    Screening*    Prediction
TruePos          22            22
TrueNeg          14            44
FalsePos         8             28
FalseNeg         6             6
Sens.            79%           79%
Spec.            63%           61%
Pred.Acc         72%           66%

Table 1 - Wei's "last submission" record (*not his best result, which was 79% Predictive Accuracy)

For the CinC competition, where the test data sets were pairs of ECGs, the algorithm described above was easily adapted. For Screening, the weighted value was computed for each ECG, and the maximum value between each pair was used. For Prediction, the weighted value for each ECG could be computed, and the ECG with the higher value was designated as pre-PAF.

In fact, Wei also noticed the bias towards sensitivity in the competition for prediction. He did not utilize any screening for non-PAF patients. Instead, he achieved a high score simply by scoring each ECG pair based on the ECG with the higher APB count. This approach, unfortunately, would not work in real world scenarios.

1.4 Goals

This study has multiple goals. We wish to develop a system that does a good job with screening, detection, and prediction. To determine how well the system achieves each of these goals, we set up a series of test events to demonstrate its viability. Here, we will state the goals, and then outline the testing procedures in Section 3.3.
Test Event 1: PAF screening

Event 1 is to determine if subjects at risk of PAF can be distinguished from those representing a larger population, based on their ECGs. The test will involve comparing those ECGs in Group P (PAF) against those in Group N (non-PAF, with other disease), as well as those in Group Q (normal).

Test Event 2: PAF detection

Event 2 is to determine if subjects identified as at risk of PAF could be reliably identified as currently suffering a PAF attack or not. The test will involve those ECGs in Group P-1c (no-current-attack PAF) against those in Group P-2c (current-attack PAF), as well as against those in Groups N-c and Q-c.

Test Event 3: PAF prediction

Event 3 is intended to determine if subjects in Group P have distinctive and detectable changes in their ECGs immediately before PAF. (In other words, is the imminent onset of PAF predictable in an individual known to be at risk of PAF?) A successful method for doing so should be able to identify, for those in Group P-2c, the detectable "precursors" of an imminent attack.

2.0 Theory

2.1 Electrophysiology

The normal functioning of the cardiovascular system depends in large part on its electrical activity. The heart as a whole is controlled by the coordinated electrical activity of its cells. This control must be communicated between what are called pacemaker cells of the heart, both to determine the rate of beating, and to influence the strength of contraction. Disturbances of the pacemaker communications disrupt the heart's normal ability to generate and conduct electrical signals. These disturbances give rise to abnormal electrical activity, which frequently leads to cardiac malfunction and death.

2.1.1 Electro-Cardiograms

To monitor the electrical behavior of the heart, cardiologists use a system called Electro-Cardiograms, or ECGs for short. It is at its most basic a system of electrodes used to measure electrical conduction on the skin.
An ECG could be taken with as few as two "leads", which are wires attached to the skin. By convention though, leads are positioned on the body in such a way that six vertical and six horizontal "views" of electrical activity could be monitored and presented.

Figure 3 - Electrical Anatomy of the Human Heart. Note the correspondence between electrical stimulation and ECG readouts.

In general, different phases of the ECGs correspond to cardiac excitation in different areas of the heart. The initial structure observed for a given heartbeat is the P wave, which results from the spread of excitation through the atria from the SA node. During the P-R interval that follows, the atria contract to expel blood into the ventricles as the electrical impulse propagates down to the AV node. A sharp spike called the QRS complex is observed in the ECG at the end of atrial contraction, followed immediately by a Q-T interval during which ventricles contract. The last structure observed is normally the T wave, which results from the repolarization of the ventricles.

A full ECG data stream usually consists of several leads containing electrical levels sampled at around 128 Hz. The resulting information is also subject to noise, both from muscular distortions and problems with the wire attachment. Problems would then arise from analyzing the noisy data.

2.1.2 Inter-beat Intervals

The different analysis techniques presented in this study focus on inter-beat intervals (IBIs), which are the measured time intervals between QRS complexes. In physiological terms, IBIs correspond to the time between initiation of consecutive ventricular contractions. Part of the advantage of IBIs is that they can be detected relatively reliably in the presence of various noise sources.

The physiological significance of inter-beat intervals has been established in studies of multifocal triggering and irregular nodal conduction.
Interatrial septum pacing in particular [Padeletti et al., 2000] has also been demonstrated to be a safe and feasible technique for reduction of arrhythmia in general as well as associated mortality.

2.1.3 Spectrograms

Efforts were also made to study additional features of the data, such as the spectrogram and its entropy. Since the entropy is a scalar measure of the spectrum, a good spectrum estimation technique was needed. In bio-medical applications, the signals are non-stationary, and therefore require non-standard spectrum estimation techniques. To this end, Yuan Qi's algorithm [Qi et al., 2002] was used, with results to be mentioned in Section 4.2.

The spectrogram could be thought of as an application of Short-Time Fourier Transforms, with the frequencies on the y-axis and the time segments on the x-axis. The power in each frequency at any particular time instant is represented by the amplitudes of the vertical cross-sections at that time instant.

In standard literature, the power spectrum of heart rate for a normal person is represented by three frequency bands. The low-frequency, medium-frequency, and high-frequency bands together span 0 to 0.5 Hz, and have been linked to various sympathetic and parasympathetic control mechanisms. Due to the complex nature of biological controls and the natural variability between individuals and physiological states, it is as yet unclear how competing influences could be segregated.

In the case of patho-electrocardiology in particular, it is unclear how the frequencies of activity are influenced by the myriad of ectopic activities associated with arrhythmias, each of which would introduce variability to instantaneous heart rate (ihr) measurements. For example, the parasympathetic controls might detect the sudden acceleration in SA nodal activity and act to suppress heart rate as a whole.
Another possibility would be the atrial irregularities resulting in insufficient atrial filling, triggering the baro-receptor reflex to increase heart rate to compensate. It is hard to predict which effect would dominate. On the spectrogram, heart rate irregularity due to any input would appear as contributions to all frequency bands, or vertical streaks (Figure 4).

In order to take advantage of the spectrogram features, a tradeoff would need to be made between time resolution and frequency resolution. Time resolution would be favored to better determine the duration of ectopic behavior. On the other hand, to better examine the changes in power spectrum preceding and following ectopic events, greater frequency resolution would be necessary regardless of actual control influences (sympathetic, parasympathetic, etc).

Entropy (defined as E[-log p(x)]) is a relative measure of variability present in the power spectrum in the sense that it is maximal for a uniform distribution. In the case of ectopic beats, one would expect high entropy values corresponding to times of episodic occurrences.

Figure 4 - Spectrogram of PAF patient (pre-episode, type P-2, top), non-PAF patient (type N, middle), and healthy individual (type Q, bottom). All favor time resolution; note the vertical streaks.

2.2 Pattern Recognition

Pattern recognition is a field related to signal processing and artificial intelligence. It is the general name given to a diverse set of techniques with which one could classify and model signals. In general, there are two types of signal models which are utilized, known as deterministic models and statistical models. Deterministic models generally exploit some known properties of the signal, whereas statistical models try to characterize the signal as a parametric random process.

2.2.1 Hidden Markov Models

Hidden Markov Models are a particular form of statistical models.
In this specific application, the HMM will be used to model the observations (features of time series of IBIs) corresponding to the hidden physiological states (degree of PAF-behavior). Compared with observable Markov models, HMMs have the advantage of being widely applicable and could be applied with few presumptions about the underlying hidden states.

Every HMM has a set of parameters, which could be trained to better account for certain sets of observed behavior. In the toolkit by Kevin Murphy (Section 3.1), the main parameters are Q (number of states in the model), M (number of Gaussian mixtures), O (number of possible output observations), and "diagonal" or "full" (for covariance matrices). Other important aspects of setting up the problem, such as initialization of states, state transition matrices, and observation probability distribution in states are handled by the toolbox.

Given a set of initial state parameters, an HMM could be "trained" on sets of observation sequences to better describe the evolution of observations (the number of times the HMM loops over each piece of training data is yet another parameter). After training HMMs on different sets of observations, we could then evaluate different observation sequences using the HMMs. Each HMM would yield a probability that the particular observation sequence arose from its model, and the HMM that returns the highest probability is the one that best fits the observations. Two particular types of HMMs used in this study are the discrete-output HMMs and Gaussian-mixture HMMs.

Discrete-output HMMs model events with discrete observations. The range of observations fit a finite "alphabet", with the drawback that quantization of observations could lead to loss of useful information.

Gaussian-mixture HMMs are applied towards modeling of outputs better characterized as samples out of a continuous distribution.
By adjusting the number of states and the number of Gaussian-mixtures, the model could account for a wide range of observations. One of the parameters on the Gaussian-mixture HMMs is "left-right", which allows for modeling using a particular form of state transitions. If set to "left-right", then the context would be a series of states where transition is forbidden to any state with lower indices. In terms of physiology, this could be used to characterize progressive changes over time.

3.0 Methods

Throughout the period of this research project, several public domain toolkits were utilized to aid in the analysis of data. Each has proven to be quite useful, and will be briefly presented in Section 3.1. The availability of these tools significantly helped in speeding up the analysis. A description of the raw data is presented in Section 3.2. The way the raw data was processed and organized, as well as how techniques were applied to training and testing, are presented in Section 3.3.

3.1 Toolkits

WAVE is an extensible interactive graphical environment for manipulating sets of digitized signals with optional annotations. Designed for workstations with the open source XView toolkit, WAVE was built using the WFDB library developed for physiologic signal processing, so it could be applied to any of a wide variety of data formats supported by the WFDB library. In addition, a beta version of WAVE re-implemented using the Gimp Tool Kit (GTK), which was portable to MS Windows, was also used. The current release of GTKWave is still in development and should be considered usable but potentially unstable. WAVE was used in pre-processing the ECGs to extract the IBI information.

In addition to WAVE, the Hidden Markov Model (HMM) Toolbox for Matlab written and provided by Kevin Murphy was also used extensively. The functions were easy-to-use m-files that allowed for models ranging from discrete-output HMMs to Gaussian-mixture HMMs.
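The model comparison described in Section 2.2.1, where each trained HMM scores an observation sequence and the best-scoring model wins, can be illustrated with a small sketch. This is plain Python and does not reproduce the interface of the Matlab toolbox; it covers only discrete-output HMMs, and the two models, their probabilities, and the symbol meanings (0 for a normal-length IBI, 1 for a short IBI) are hypothetical values chosen for illustration. The "irregular" model uses a left-right transition matrix, with no transition back to a lower-indexed state.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete-output HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability of emitting symbol k in state i
    """
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_lik

def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Hypothetical models (start, trans, emit); the numbers are illustrative only.
MODELS = {
    "steady": ([1.0], [[1.0]], [[0.95, 0.05]]),
    "irregular": (
        [1.0, 0.0],
        [[0.8, 0.2], [0.0, 1.0]],  # left-right: state 1 never returns to state 0
        [[0.9, 0.1], [0.4, 0.6]],
    ),
}
```

With these made-up parameters, a sequence of all normal-length intervals is assigned to "steady", while a sequence containing several short intervals is assigned to "irregular". In the study itself the model parameters come from training rather than being set by hand.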
Both modeling techniques were explored in the course of the analysis.

3.2 Raw Data

Figure 5 - Raw data. Note that all PAF episodes (if any) were only present in continuation sets.
  Group P: 30-minute ECGs, PAF, 25 sets type P-1, 25 sets type P-2; P-cont.: 5-minute, 50 sets
  Group N: 30-minute ECGs, non-PAF, possibly other cardiac disorders, 50 sets; N-cont.: 5-minute, 50 sets
  Group Q
  CinC test set: 30-minute ECGs, mixture of type P's and type N's, 100 sets total

The entirety of the data consists of 4 Groups (Figure 5), organized according to the types of patients. Group P consists of data from patients with PAF. Group N contains data from patients with no PAF, but perhaps some other cardiovascular disease. Group Q consists of data from the MIT-BIH Normal Sinus Rhythm Database, recorded from patients with no detectable arrhythmias. Finally, a test set that was originally used for the CinC challenge was incorporated to study the relative effectiveness of the technique developed in this study. Each record contains 30 minutes' worth of ECG recordings. Each of the records in Group P and Group N has an additional corresponding 5-minute "continuation" record.

A more detailed description is as follows. The Group P records contain 25 type P-1 records and 25 type P-2 records. P-2 records contain the ECG immediately preceding an episode of PAF, which can be verified by examining the like-numbered continuation record (type P-2c). Thus, for example, record p16 (a type P-2 record) immediately precedes the episode of PAF in record p16c. The records of type P-1 contain 30 minutes of the ECG during a period that is distant from any episode of PAF (there are no PAF episodes during the 45-minute period before the beginning or after the end of the 30-minute record). The corresponding 5-minute continuation record (type P-1c) shows that (at least!) the minutes immediately following the record do not contain a PAF episode.
Since the data was collected with no manual audits, a few of the 30-minute records in this group may contain very short bursts of PAF episodes that escaped notice while the learning set was being compiled.

The 50 records in Group N come from subjects who do not have documented atrial fibrillation, either during the period from which the records were excerpted or at any other time. The subjects include healthy controls, patients referred for long-term ambulatory ECG monitoring, and patients in intensive care units with possibly other cardiac problems. It was interesting to note that during preliminary analysis, the author of this report printed IBI histograms of the presumed "normal" population only to discover obvious signs of arrhythmia (Figure 6). Since the absence of PAF alone does not establish "normal" electrical cardiovascular behavior, additional data not included in the original CinC challenge had to be found from which benchmarks of strictly normal patterns could be inferred.

Due to the discovery of abnormal ECG activity in the Group N patients, further data was sought out online, also from the PhysioNet online database. The data from the MIT-BIH Normal Sinus Rhythm Database were recorded from patients with no detectable arrhythmias. These 18 ECGs are classified in this report as belonging to Group Q, and used as a benchmark for comparison against the presence of abnormalities. Each of the original records in this group represented 20 hours' worth of ECG recording. For the purposes of this study, four sections of recordings were taken from each 20-hour record and analyzed. The procedure by which the sections were extracted is discussed in Section 3.5.1.

Figure 6a - Type-P (example is actually P-2) IBI series histogram (top), Type-N (middle), Type-Q (bottom), each across seven consecutive 200-IBI segments. The interval lengths are reflected by the x-axis, and the y-axis reflects the number of IBIs of that length.
Note the pattern shows a tri-modal distribution suggestive of consistently irregular heart beats for type-N as well as type-P. A strictly normal IBI distribution would have a histogram consistently similar to that of type-Q, with a single Gaussian distribution and no outliers.
[Figure 6a histogram panels: x-axis ticks from 0 to 1.5, y-axis ticks from 0 to 100]

Figure 6b - Closer inspection of type P1 IBI histogram and its continuation P1c IBI histogram, as compared against that of type P2 and P2c. Note that the type P2 histogram contained greater spread, indicative of abnormal beats. Also, there was a noticeable distribution shift between the last 200-IBI histogram of P2 and P2c, whereas the distribution was consistent between P1 and P1c.
[Figure 6b histogram panels for P1, P1c, P2, and P2c: x-axis ticks from 0 to 1.5, y-axis ticks from 0 to 100]

The final addition to the collection of data utilized was the test set provided by the original CinC challenge. This test set consisted of 100 30-minute records, which were classified either type P-1, P-2, or N. In the original competition, the test set was used to evaluate relative success in screening and prediction. It is used for the same purpose in this study. Since no continuation sets are available, no conclusions about detection could be drawn from this data set. Note that the ECGs for the CinC test set (only!) were grouped so that consecutive pairs belonged to a single patient. If the patient was a PAF patient, then one of the ECGs would be distal to any episodes while the other immediately precedes an episode.
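The per-segment histograms of Figure 6 (one histogram for each consecutive block of 200 IBIs) can be reproduced along the following lines. This is a minimal Python sketch rather than the plotting code used for the figures; the function name, the bin count, and the treatment of out-of-range intervals are assumptions, while the 0 to 1.5 range mirrors the x-axis ticks of the figure.

```python
def segment_histograms(ibis, seg_len=200, lo=0.0, hi=1.5, n_bins=30):
    """Histogram each consecutive block of `seg_len` inter-beat intervals.

    Returns one list of bin counts per complete segment. A trailing partial
    segment is ignored, and IBIs outside [lo, hi) are not counted.
    """
    width = (hi - lo) / n_bins
    histograms = []
    for start in range(0, len(ibis) - seg_len + 1, seg_len):
        counts = [0] * n_bins
        for ibi in ibis[start:start + seg_len]:
            if lo <= ibi < hi:
                # min() guards against floating-point rounding at the top edge.
                counts[min(int((ibi - lo) / width), n_bins - 1)] += 1
        histograms.append(counts)
    return histograms
```

A record with a single tight peak in every segment would resemble the type-Q panels, while segments whose counts split into separate short-interval and long-interval bins would resemble the multi-modal type-P and type-N panels.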
3.3 Terminologies

In evaluating the results from the different tests, a set of common measures will be used to determine relative success. From the field of health studies, the terms specificity (true-negatives / [true-negatives + false-positives]) and sensitivity (true-positives / [true-positives + false-negatives]) will be used, as well as predictive accuracy or accuracy ([true-positives + true-negatives] / [all results]). From signal processing, the terms detection rate (synonymous with sensitivity) and false-alarm rate (1 - specificity) could also be used to characterize test results.

3.4 Analysis Techniques

Four different analysis techniques were used in exploring how best to discriminate between subsets of patients in each of the test event categories.

3.4.1 Prematurity Weighting

The first technique tried was a variant of the exponential weighting of PBs used by Wei Zong. The differences in the analysis were that 1) the IBI extraction in this study was done using WAVE, instead of the specially developed beat type detector used by Wei, and 2) the definition threshold for a PB in this study was not fixed at 15% (as Wei assumed in his analysis), but was allowed to be a test variable. The parameters involved were: Tau, the exponential rate used in the weighting function; W, the length of time the exponential weighting function extends into the past; PB threshold, the ratio of an IBI over the time-averaged IBI length below which the beat would be defined as a PB; and PAF threshold, the cutoff value of the weighted results between those designated as PAF or not. To determine the optimum parameters, the variables were tested over the following ranges (Table 2) in the training set. The optimum parameters determined were subsequently applied to the evaluation and testing sets.
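To make the roles of the four parameters concrete, the sketch below scores a record by an exponentially weighted count of premature beats and then evaluates a set of classifications with the measures of Section 3.3. It is a hedged illustration only: the exact weighting function of the original method is not reproduced in this section, so the decaying form exp(-age / tau), the units (seconds), the 8-beat trailing mean used as the "time-averaged IBI", and all function names are assumptions. In particular, thresholds reported in this study need not be on the same scale as the score computed here.

```python
import math

def pb_score(ibis, tau_s, window_s, pb_threshold, history=8):
    """Exponentially weighted count of premature beats (PBs) near the record's end.

    A beat is flagged as a PB when its IBI divided by the mean of the preceding
    `history` IBIs falls below `pb_threshold`. Each PB within the last
    `window_s` seconds contributes exp(-age / tau_s), where age is the time
    from that beat to the end of the record (assumed weighting form).
    """
    times, elapsed = [], 0.0
    for ibi in ibis:
        elapsed += ibi
        times.append(elapsed)
    score = 0.0
    for k in range(history, len(ibis)):
        age = elapsed - times[k]
        if age > window_s:
            continue
        baseline = sum(ibis[k - history:k]) / history
        if ibis[k] / baseline < pb_threshold:
            score += math.exp(-age / tau_s)
    return score

def is_paf(ibis, tau_s, window_s, pb_threshold, paf_threshold):
    """Designate a record as PAF when its weighted PB score exceeds the cutoff."""
    return pb_score(ibis, tau_s, window_s, pb_threshold) > paf_threshold

def screening_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and predictive accuracy from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

A parameter search then amounts to looping over candidate values of the four parameters, classifying every training record with `is_paf`, and keeping the setting whose `screening_metrics` are best.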
Table 2
                    Tau    W     PB threshold   PAF threshold
Initial Value       1      5     .75            20
End Value           10     30    .95            320
Search Increment    1/3    1     .01            1

For the CinC test sets, the screening methodology followed Wei's original approach (Section 1.2), where the max weighted value of each ECG pair was taken to be representative of the person's state.

3.4.2 Discrete-output HMMs on Entropy

The second technique studied was the application of discrete-output HMMs on the entropy of the signal. The relevant parameters included both the number of discretized levels and the number of states. The level of discretization was chosen to be 100, which resulted in as little information loss as possible (Fig 10). The number of states was varied between 2 and 15. For the generation of the entropy signal, the inputs were taken as the instantaneous heart rate, the corresponding relative time markers, and a frequency range for the spectrogram. The frequency range was varied between 2, 4, and 6 Hz with no observable changes in analysis results.

3.4.3 Gaussian-mixture HMMs on IBIs

The third analysis technique was the one given the most in-depth exploration. Gaussian-output HMMs were trained on classification types and tested in each of the event categories. The relevant parameters were the number of mixtures fitted (M), the number of states in the model (Q), and the type of transition matrix (left-right/not). The maximum number of iterations in training was set to be 5 for the entirety of the study, and the type of covariance matrix was set to 'diag'. The parameters were varied (M between 1 & 5, Q between 1 & 3, and left-right between true/false), with all results that converged recorded in Appendix A. Important results are highlighted in Section 4.3.

3.4.4 Gaussian-mixture HMMs on Spectrograms

Finally, a last analysis technique was attempted, also using Gaussian-output HMMs. Instead of IBIs, the models were trained and tested on spectrograms of the heart rate signal.
The same parameters apply from the above subsection, except that the type of covariance matrix was set to be 'full'. Using covariance matrix type 'diag' resulted in numerical errors. Also of concern was the need to discard startup noise from the spectrogram estimation. To accomplish this, the algorithm was run with the full instantaneous heart rate input for the 30-minute ECG and the 5-minute continuation ECG appended together. The last 1400 samples of each output series were taken to be noise-free.

3.5 Training and Testing

For each of the tests, the raw data is processed (as will be detailed in the following sub-sections), and divided into training, evaluation, and testing sets. Except for the CinC Challenge test sets, which provide the metric of comparison, all other data series are randomly divided into the three mentioned dataset types (training, evaluation, and testing) with a probability ratio of 1:1:1 (so each data series has an equal probability of being assigned to any one of the 3 groups). The three groups (which would on average contain roughly the same number of datasets) could then be used to train, evaluate, and test the algorithm, respectively. Since it is not known if any one person contributed more than one data set, we cannot say with certainty that the people in the training set were different from those in the evaluation and test sets. In the case of HMMs, the optimum parameters are experimentally determined using the training and evaluation sets. The algorithm would be applied to the test set and the CinC set only once using the determined optimum parameters. For the Prematurity Weighting method, the optimum parameters were determined on the training sets, and tested exactly once each on the evaluation and testing sets. This structure minimizes bias in test results.

3.5.1 Event 1 : Screening

The training and testing was arranged differently for each Event.
Event 1 (PAF screening) in particular was based on comparing results aimed at discriminating between Groups P, N, and Q, rather than focusing on Group P as in Events 2 & 3 (PAF detection & prediction, respectively). The raw ECG data corresponding to the various patient groups were compiled and processed using the ihr, or instantaneous heart rate, function provided by WAVE. The resulting data is then inverted to derive 30 minutes' worth of inter-beat intervals for each record. All data are then loaded into Matlab for processing and analysis.

Figure 7 - Event 1 screening. Each data series now contains a uniform number of IBIs.
  Group P: 1200 IBIs, PAF, 25 sets type P-1, 25 sets type P-2
  Group N: 1200 IBIs, non-PAF, possibly other cardiac disorders, 50 sets
  Group Q: 1200 IBIs, certified as non-arrhythmic, 72 sets
  CinC test set: 1200 IBIs, mixture of type P and type N ECGs, 100 sets

Further pre-processing is done by reducing each record to a 1200 IBI segment (with extra IBIs dropped from the beginning of the 30 minutes). Since each record reflects a patient with variable heart rate, every fixed-time interval segment would likely contain a different number of IBIs. By establishing a fixed number of IBIs per segment, this discrepancy between segments could be eliminated, while still capturing the sequence of heart beats that led up to a possible cardiac episode. The IBIs discarded from the beginning of the time series are known to be distal to PAF episodes, and assumed to be less relevant to PAF observation. The segment length (in other words, the number of IBIs included) was chosen to be 1200 because the number of IBIs contained in the different 30-minute time series of Groups P & N ranged from 1331 to 3379. In other words, the time series with the slowest average heart rate contained 1331 detected beats, and 1200 is an arbitrarily chosen segment length that all ECG time series could satisfy.
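The inversion and fixed-length segmentation described above can be sketched as follows. This Python fragment is illustrative rather than the Matlab processing used in the study: the function names are hypothetical, the heart rate is assumed to be expressed in beats per minute, and keeping a too-short record whole is an assumed policy.

```python
def hr_to_ibi(heart_rates_bpm):
    """Invert instantaneous heart rate (beats/min, assumed) into IBIs in seconds."""
    return [60.0 / hr for hr in heart_rates_bpm if hr > 0]

def last_segment(ibis, seg_len=1200):
    """Keep the final `seg_len` IBIs, dropping extras from the start of the record.

    Records with fewer than `seg_len` IBIs are returned whole (assumed policy).
    """
    if len(ibis) < seg_len:
        return list(ibis)
    return ibis[-seg_len:]
```

Dropping from the front keeps the beats closest to the end of the 30-minute record, which is where any approach to a PAF episode would be found.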
The single exception to the 1200 IBI rule was one of the CinC test set series, which contained only 1142 IBIs over 30 minutes. The discrepancy is minor, and individual performance tracking of that specific data series did not indicate strong biases in results.

The training and testing across techniques are kept consistent to allow comparison. Prematurity weighting was the first technique attempted. It serves as the benchmark for later comparison with other methods. The algorithm was partially reproduced from the original paper (as cited in Section 1.3) and tested on the CinC test dataset, so as to verify the results originally reported.

The technique using discrete-output HMMs on entropy was implemented using the following procedure. Two separate HMMs were trained on the entropy estimates for Groups P & N (Fig 10), and then tested on the CinC test set. The data series used in this part of the analysis were derived from the full 30-minute ECGs using Yuan Qi's algorithm. The analysis was repeated using the 1200 IBI segments with no noticeable difference in results. Given the surprising lack of obvious features and results, the technique was only applied to Event 1 Screening. Attempts were made to probe into the reason behind the failure.

The technique using Gaussian-mixture HMMs on IBIs was more involved, with data from application to all 3 Events, and the promising results that were derived prompted further exploration. Group P and N data, as mentioned in Section 3.2, were each organized as 50 records, each with a segment of 1200 IBIs preceding PAF derived from the original 30-minute records. Group Q, which was originally eighteen 20-hour ECG recordings, was converted into IBI time series using the ihr function mentioned earlier. The first 8000 IBIs in each type-Q data series were separated into 4 consecutive 2000 IBI segments, giving 4 sets of eighteen segments. Each 2000 IBI segment was turned into a 1200 IBI segment by truncating its last 800 IBIs.
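The Group Q extraction just described (first 8000 IBIs, four blocks of 2000, last 800 of each block discarded) can be written compactly. The following Python sketch uses a hypothetical function name and default arguments taken from the numbers in the text.

```python
def group_q_segments(ibis, n_blocks=4, block_len=2000, keep=1200):
    """Split the first n_blocks * block_len IBIs into blocks, keeping the first `keep` of each."""
    if len(ibis) < n_blocks * block_len:
        raise ValueError("record too short for the requested blocks")
    return [ibis[b * block_len:b * block_len + keep] for b in range(n_blocks)]
```

Applied to each of the eighteen type-Q recordings, this yields 18 x 4 = 72 segments of 1200 IBIs.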
The end result was the 72 sets of 1200 IBI segments, as shown in Figure 7 above. Over the course of the IBI analysis, the maximum iteration parameter was kept at 5, and the covariance matrix type set to be diagonal.

Finally, the Event 1 testing was attempted using Gaussian-mixture HMMs trained on spectrograms directly. This approach builds on the use of entropy as a discriminatory measure, looking beyond variability in the power spectrum, and modeling instead the appearance of ectopic beats and possible shifts in rhythmic control frequencies. For this part, the covariance matrix type was set to be full, since the datasets modeled are larger, with 20 frequency bins per time sample for the purposes of this study (so each set of input observations was a 20x1200 array). A diagonal covariance matrix was attempted but generated too many numerical errors for the results to be useful.

3.5.2 Event 2 : Detection

For Event 2 (PAF detection), processing similar to that described for Event 1 was applied to each of the 5-minute continuation sets, instead of the 30-minute sets (Fig 8). The resulting IBIs are collected into a single 200 IBI segment each (with extra IBIs dropped from the end of the 5 minutes). The IBIs at the beginning are of more interest, since the data specification as introduced in the CinC competition stated that the PAF episode (when present) would be at the beginning of the 5-minute segment that has been marked off. A visual presentation of the processed data is given below in Figure 8. Since the CinC test sets did not include continuation sets, they could not be used in testing for Event 2. Note that one of the continuation sets of type N contained only 60 IBIs over 5 minutes. Checking the original ECG showed that the record flat-lined about a minute into recording, possibly due to lead detachment.

Figure 8 - Event 2 detection. The original continuation sets were processed into 200 IBI series. The methods are trained and tested on type P-1c, P-2c, N-c, and Q-c segments with the goal of reliably detecting the P-2c types.
  P-2c: 200 IBIs, 25 sets
  P-1c: 200 IBIs, 25 sets
  N-c: 200 IBIs, 50 sets
  Q-c: 200 IBIs, 50 sets

For training and testing, only Gaussian-mixture HMMs were used. Separate HMMs were trained on the type P-1c, P-2c, N and Q sets, and then evaluated for log-likelihood matching against the 200 IBI segments in the evaluation and test sets. Note the distinction emphasized here between type P-1c's (no PAF episode) and type P-2c's (PAF episode). The ultimate goal is to distinguish type P-2c ECGs from all others. To accomplish this, the algorithm for randomized set assignments and testing was actually run twice: the first time testing for pairwise discrimination between P-2c, N, and Q, and the second time for pairwise discrimination between P-2c, P-1c, and Q (Fig 8).

3.5.3 Event 3 : Prediction

Event 3 (PAF prediction) used similar pre-processing procedures on the raw data as the previous test events. The time series were converted into IBIs and segmented. In this test case, however, each of the 1200 IBI segments was further divided into six consecutive 200 IBI segments (Fig 9). The goal of the exercise was to determine whether a patient would soon be experiencing a PAF episode, before the episode occurs. In terms of this study, that means singling out the type P-2 datasets. The HMMs were trained on the first and last 200 IBI segments of each data series (the assumption being that the last 200 IBI segment prior to a PAF episode would contain features that could be used for prediction). The HMMs are then used to evaluate the log likelihood of matching the segments in the middle. If the segments in the middle were consistently more similar to the first segment (a higher log likelihood), then the record was classified as imminently-PAF.
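The per-record comparison described above can be illustrated with the simplest model in the family explored here: one state with one Gaussian (M = 1, Q = 1), for which the HMM log likelihood reduces to a sum of independent Gaussian log densities. The Python sketch below is a simplified stand-in for the toolbox models, and it reads "consistently" as "for every middle segment", which is an assumption; all function names are hypothetical.

```python
import math
import statistics

def fit_gaussian(segment):
    """Mean and standard deviation of one IBI segment (floored to avoid zero variance)."""
    return statistics.mean(segment), max(statistics.pstdev(segment), 1e-6)

def gaussian_log_likelihood(segment, model):
    """Log likelihood of a segment under an i.i.d. Gaussian (a one-state, one-mixture HMM)."""
    mu, sigma = model
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in segment
    )

def middle_votes(ibis, seg_len=200):
    """For each middle segment, report which reference segment explains it better."""
    segments = [ibis[i:i + seg_len] for i in range(0, len(ibis) - seg_len + 1, seg_len)]
    first_model = fit_gaussian(segments[0])
    last_model = fit_gaussian(segments[-1])
    votes = []
    for seg in segments[1:-1]:
        ll_first = gaussian_log_likelihood(seg, first_model)
        ll_last = gaussian_log_likelihood(seg, last_model)
        votes.append("first" if ll_first > ll_last else "last")
    return votes

def imminent_paf(votes):
    """Rule from the text, with 'consistently' read as 'every middle segment' (assumed)."""
    return len(votes) > 0 and all(v == "first" for v in votes)
```

When the middle segments all resemble the first segment, the last segment is the one that stands apart, which is the signature the rule looks for; a record whose last segment looks like its first gives no such signal.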
After trying out different comparative methods (using different combinations of segments), it was found that the 200 IBI segment immediately prior to the last 200 IBI segment was the most discriminant (which is what would be expected). This procedure was noteworthy in that the attempt was made to train on acceptable state and episodic states for an individual, rather than a population of patients. Figure 9 - Event 3 prediction. The long 1200-IBI segments were truncated and the resulting 200-IBI segments are used for predictive testing. Group P six 200-IBI segments, PAF, 25 sets type P-1, 25 sets type P-2 P-cont. 200 IBIs, 50 sets Group N six 200-IBI segments, non-PAF, possibly other cardiac disoiders, 50 sets N-cont. 200 Group Q six 200-IBI segments, certified as non-archythmic, 72 sets IBIs, 50 sets Q-co1nt 200 IBIs, 50 sets The results from using the segmented tracking were found to be useful, but yet another classification was explored using the general form used in Event I Screening. Separate HMMs were trained on P-2 types (the patients about to suffer a PAF episode), 37 P-I types (PAF patients not about to suffer an episode), and Q types. Training was done using the contiguous 1200 IBI segments used previously in Event 1. 4.0 Results The complete results of all parameters and techniques explored are presented in Appendix A. The important results are presented in order of techniques used in the sections to follow. Here is a quick overview of the best results achieved across the different technique used for the set of test events: Table 3 - Results Overview * discontinued due to poor results **result cited in Abstract **results from CinC test set (see Section 4.3.1) yielded even better results, and were cited in Abstract P vs. 
N Prematurity Discrete-output Gaussian-mix Gaussian-mix (EvalSet) Weighting HMM(Entropy) HMMs(IBIs) HMMs Screening Sensitivity:94 Specificity:64 Sensitivity:75 Specificity: 14 Sensitivity:88*** Specificity:54 Sensitivity:87 Specificity:24 Accuracy:48 Detection Accuracy: 81 Sensitivity: 100 Accuracy: 72 Sensitivity: 100 Accuracy:53 Sensitivity: 13 (P2c Specificity:93 Specificity:79 Specificity:89 Nc) Accuracy: 95 Accuracy:88 Accuracy:67 Prediction Sensitivity:90 Specificity:85 Sensitivity:43 Specificity:94 Sensitivity:27 Specificity:67 Accuracy:79 Accuracy:50 (Spectrograms) vs. * * Accuracy:87 Gaussian-mix P vs. N Prematurity Discrete-output Gaussian-mix (TestSet) Weighting HMMs(Entropy) HMMs(IBIs) HMMs(Spectrograms) Screening Sensitivity:93 * Sensitivity:89 Sensitivity:93 Specificity:44 Specificity:55 Specificity:27 Detection Accuracy:67 Sensitivity:91 Accuracy:71 Sensitivity: 100 Accuracy:59 Sensitivity:9 (P2c vs. Nc) Prediction * Specificity:82 Specificity:65 Specificity:94 Accuracy: 86 Sensitivity:71 Accuracy:79 Sensitivity:42 Accuracy:61 Sensitivity:43 * Specificity:56 Specificity: 100 Specificity:75 Accuracy:62 Accuracy:74 Accuracy:67 38 P vs. Q Weighting Gaussian-mix HMMs(IBIs) Sensitivity:94 Sensitivity:88 Specificity:93 Specificity:92 Specificity:93 Detection Accuracy: 93 Sensitivity: 100 Accuracy: 90 Sensitivity: 100 Accuracy:81 Sensitivity: 15 (P2c Specificity:85 Specificity:91 Specificity:100 Qc) Accuracy:87 Accuracy:93 Accuracy:74 Prediction Sensitivity:92 Specificity:95 Sensitivity:63 Specificity:90 Sensitivity:33 Specificity:77 Accuracy:94 Accuracy:84 Accuracy:70 Prematurity Gaussian-mix Gaussian-mix HMMs(Spectrograms) Prematurity (EvalSet) Screening P vs. vs. 
Q ** Gaussian-mix HMMs(Spectrograms) Sensitivity:57 (TestSet) Weighting HMMs(IBs) Screening Sensitivity:95 Sensitivity:89 Sensitivity:53 Specificity:91 Specificity:81 Specificity:95 Detection Accuracy:93 Sensitivity: 100 Accuracy:84 Sensitivity: 100 Accuracy:78 Sensitivity:6 (P2c Specificity:91 Specificity:78 Specificity:100 Accuracy:94 Sensitivity:80 Accuracy: 83 Sensitivity:82 Accuracy:44 Sensitivity:30 Specificity:96 Specificity:87 Specificity:83 Accuracy:94 Accuracy:85 Accuracy:64 vs. Qc) Prediction 4.1 Prematurity Weighting For this part of the analysis, the technique utilized was intended to reproduce and further expand upon the technique described by Wei Zong's paper (see Section 1.2). The end results of the simulation (at 76% classification accuracy ([True Pos + True Neg] / Total) for Event 1) closely matched the results (79% classification accuracy) reported by Wei (though the "'last submission" records on Physionet classification accuracy, see Wei's Results in Section 1.3). only showed a 72% The end parameters were a threshold of 317, with a PB defined as an IBI that is 23% shorter in duration than the weighted sum of previous IBIs. The difference in beat detectors used was probably the 39 main variability factor, and might have resulted in the slight (3%) reduction in classification accuracy. In comparing these results to the results to follow, it is useful to note that Wei Zong's APB method and the modified Prematurity Weighting method used here only utilizes the last 10 minutes of the 30-minute ECG data. In later analysis using HMMs, when some IBIs are dropped from the beginning of the 30-minute ECG, the results of the PB method still serve as a useful benchmark. The following table presents the results tested on the CinC data set using optimized parameters derived from training sets. Table 4 Event 1: Screening CinC Set P vs. 
N Full 30min TruePos 19 TrueNeg 19 FalsePos 3 FalseNeg 9 Sens 68% Spec 86% PredAcc 76% Parameters paf threshold=317 pb threshold=.77 window= 10 tau = 6.33 segments Event 2: Detection CinC Set P2c vs. Plc 5-min TruePos 19 TrueNeg 22 FalsePos 3 FalseNeg 6 Sens 76% Spec 88% PredAcc 82% Parameters paf threshold=24 pb threshold=.75 window-5 tau=6.33 segments Event 3: Prediction CinC Set P vs. N Full 30-min segments TruePos 17 TrueNeg 53 FalsePos 19 FalseNeg 11 Sens 61% Spec 74% PredAcc 70% Parameters paf threshold = 20 pb threshold = .75 window= 10 tau = 6 4.2 Approximate Entropy and Discrete-output HMM Using the technique developed by Yuan Qi, the inter-beat interval time series in the CinC data sets were converted into an entropy series, and discretized for HMM analysis. 40 The level of discretization was chosen to be 100, which resulted in as little information loss as possible (Fig 10). Figure 10 - discretized entropy series, with enough discrete levels to minimize information loss The entropy series were then analyzed using 4 discrete-output HMMs with 3 2 the results in the following 0 500 1000 1500 2000 2500 3000 table. The results were less than satisfactory given the 80 60 20 0 low predictive accuracy, and A. 40 1) histogram plots (Fig 6) of 500 1000 1500 2000 2500 3000 the entropy data were examined for visible features that would allow for discrimination between the population groups. No extraordinary differing factors could be found. Another remedy that was attempted was adjusting the resolution of the spectrum estimation, also with no obvious improvement. CinCSet P vs. N P vs. N P vs. N P vs. 
N TruePos 21 21 21 18 TrueNeg 3 2 3 5 FalsePos 19 20 19 17 FalseNeg 7 7 7 10 Sensitivity 75% 75% 75% 64% Specificity 14% 9% 14% 23% Pred Acc 48% 46% 48% 46% Parameters Q=2 Q=3 Q=4 Q = 15 Table 5 - 0 (number of discrete levels used) was 100, trained on Groups P & N and tested on CinC dataset for Event 1 Screening 4.3 Gaussian Mixture HMM on IBIs When the results using the entropy did not meet expectations, the decision was made to focus instead on IBIs. Histogram plots of Bis over different time segments revealed 41 some interesting patterns (Fig 6), notably the tendency of IBIs to diverge into clusters of long IBIs and short 1131s. This observation supports the initial observation that PBs become more prevalent before and during PAF. Even though histograms were useful in identifying this pattern of IBI change over time, it doesn't capture the pattern of IBI changes over time. As a result, the IBI series were analyzed using Gaussian-mixture HMMs, which would capture the both the ectopic events, but also additional information on timing and frequency. The results for the analysis are presented below in the order of Events attempted. 4.3.1 Event I: Screening Table 6 EvalSet P vs. N P vs. Q N vs. Q TruePos 14 14 4 TrueNeg 7 22 FalsePos 6 2 12 12 FalseNeg 2 2 9 *best predictive accuracy for both P vs. N and P vs. Parameters M =2 Q=1 50 Pred 72% 90 43 Spec 38% 75 25 Pred 66% 85 32 Parameters M =3 Sens 88% 88 31 Spec 54% 92 Sens 86% Q 16 18 FalsePos 8 6 0 100 6 6 18 7 46 22 12 FalsePos 9 5 15 FalseNeg 2 2 Spec 55% 81 Pred 71% 84 444 5 Parameters M =2 Q1 11 Sens 89% 89 45 TrueNeg 8 FalsePos 14 FalseNeg 1 Sens 96% Spec 57% Pred 70% Parameters M =2, Q1 TrueNeg 5 EvalSet P vs. N P vs. Q N vs. Q TruePos 14 *highest P vs. Q sensitivity TestSet P vs. N P vs. Q N vs. Q TruePos 16 16 9 TrueNeg CinCSet P vs. 
N TruePos 27 11 FalseNeg 2 Q=2
*parameters of best predictive accuracy on the evaluation set were used on the test set and the CinC set

4.3.2 Event 2: Detection

Table 7

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    11       11       3         0         100%  79%   88%
P2c vs. Qc    8        33       2         3         73%   94%   89%
Nc vs. Qc     7        29       6         7         50%   83%   73%
Parameters: M = [illegible], Q = 3
*highest predictive accuracy for P2c vs. Nc

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    7        14       0         4         64%   100%  84%
P2c vs. Qc    8        33       2         3         73%   94%   89%
Nc vs. Qc     9        26       9         5         64%   74%   71%
Parameters: M = 1, Q = 2
*highest specificity for P2c vs. Nc

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    11       11       3         0         100%  79%   88%
P2c vs. Qc    11       32       3         0         100%  91%   93%
Nc vs. Qc     9        23       12        5         64%   65%   64%
Parameters: M = 1, Q = 1
*highest predictive accuracy for P2c vs. Qc

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    11       11       6         0         100%  65%   79%
P2c vs. Qc    9        21       6         0         100%  78%   83%
Nc vs. Qc     1        25       0         16        6%    100%  62%
Parameters: M = 1, Q = 3
*parameters of best predictive accuracy for P2c vs. Nc used for test set

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. P1c   7        7        4         1         88%   64%   74%
P2c vs. Qc    8        21       6         0         100%  78%   83%
P1c vs. Qc    9        21       6         2         82%   78%   79%
Parameters: M = 1, Q = 3
*best predictive accuracy for P2c vs. P1c

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. P1c   8        5        2         1         89%   71%   81%
P2c vs. Qc    9        21       6         0         100%  78%   83%
P1c vs. Qc    7        19       8         0         100%  70%   76%
Parameters: M = 1, Q = 3
*parameters of best predictive accuracy for P2c vs. P1c used for test set

4.3.3 Event 3: Prediction

Table 8

Using six 200 IBI segments:

                  TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P1 vs. P2         11       22       3         14        44%   88%   66%
CinC test series  7        61       11        21        25%   85%   68%
Parameters: M = 1, Q = 1

Using 1200 IBI segments (similar to Event 1):

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. P1     4        6        2         4         50%   75%   63%
P2 vs. Q      5        27       3         3         63%   90%   84%
P1 vs. Q      7        25       5         1         88%   83%   84%
Parameters: M = [illegible], Q = 3
*highest predictive accuracy for P2 vs. P1 and P2 vs. Q

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. P1     5        6        2         6         45%   75%   58%
P2 vs. Q      9        20       3         2         82%   87%   85%
P1 vs. Q      6        20       3         2         75%   87%   84%
Parameters: M = 1, Q = 3

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. P1     18       41       31        10        64%   57%   59%
Parameters: M = 1, Q = 3
*parameters used on TestSet and CinCSet

4.4 Gaussian Mixture HMMs on Spectrogram

Event 1: Screening (Note: number of frequency bins set to 50)

Table 9

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       2        28       0         16        11%   100%  65%
Parameters: M = 5, Q = 1
*best predictive accuracy and sensitivity

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       2        15       0         15        12%   100%  53%
Parameters: M = 5, Q = 1

Event 1: Screening (Note: number of frequency bins set to 20)

Table 10

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       13       4        13        2         87%   24%   53%
Parameters: M = 5, Q = 1
*best predictive accuracy and sensitivity

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. Q       8        26       2         6         57%   93%   81%
Parameters: M = 1, Q = 2

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       13       4        11        1         93%   27%   59%
Parameters: M = 5, Q = 1

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. Q       7        21       1         8         53%   95%   78%
Parameters: M = 1, Q = 2

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       10       4        8         3         77%   33%   56%
Parameters: M = 5, Q = 1
*best parameters used on testset and CinCset

Event 2: Detection (Note: number of frequency bins set to 20)

Table 11

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    1        17       2         7         13%   89%   67%
P2c vs. Qc    2        30       0         11        15%   100%  74%
Parameters: M = 1, Q = 2
*best predictive accuracy and sensitivity

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    1        16       1         10        9%    94%   61%
P2c vs. Qc    1        11       0         15        6%    100%  44%
Parameters: M = 1, Q = 2
*best parameters used on testset

Event 3: Prediction (Note: number of frequency bins set to 20)

Table 12

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. N      3        10       5         8         27%   67%   50%
P2 vs. Q      2        24       7         4         33%   77%   70%
Parameters: M = 3, Q = 1
*best predictive accuracy and sensitivity

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. N      3        15       5         4         43%   75%   67%
P2 vs. Q      3        15       3         7         30%   83%   64%
Parameters: M = 3, Q = 1
*best parameters used on testset

5.0 Discussion

5.1 Event 1: Screening

In the evaluation sets, Gaussian-mixture modeling for Event 1 showed high specificity (True Neg / [True Neg + False Pos]), ranging up to 92%, and high sensitivity (True Pos / [True Pos + False Neg]), ranging up to 100%, when comparing Group P with Group Q. Results were lower for Group P versus Group N, with a peak sensitivity of 94% and a peak specificity of 54%. This makes some sense, since one would expect a greater difference between patients with PAF and healthy individuals than between patients with PAF and those who might have some unspecified arrhythmia. A comparison between Group N and Group Q resulted in only 46% sensitivity (screening for N-types) and 92% specificity. This could be because some of those in Group N have no arrhythmic disorders.

The test set for Event 1 confirmed the general distribution of results shown in the evaluation sets, using one state of 2 Gaussians as the model for the system. The final test was discrimination between the Group N and Group P records provided by the original CinC test set. The Gaussian-mixture technique provided a sensitivity of 96% and a specificity of 57%, with an overall accuracy of 70%. These results would be even more promising if the Gaussian-mixture outputs were used in conjunction with the Prematurity Weighting method, which has higher specificity than sensitivity.

Testing for Event 1 using discrete-output HMMs on entropy yielded disappointing results by comparison. Sensitivity peaked at 75%, but specificity was only 23%.
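As a quick check on the definitions given in Section 5.1, the following minimal Python sketch computes sensitivity and specificity exactly as defined there, together with predictive accuracy. The predictive-accuracy formula, (True Pos + True Neg) / total, is an assumption inferred from the tabulated values rather than stated in the text, and the function name `screening_metrics` is hypothetical. The counts in the usage line are the Event 1 test-set figures for P vs. N listed in Appendix A.

```python
def screening_metrics(true_pos, true_neg, false_pos, false_neg):
    """Return (sensitivity, specificity, predictive accuracy) as fractions.

    Sensitivity and specificity follow the definitions in Section 5.1.
    Predictive accuracy as the overall fraction correct is an assumption
    that matches the tabulated "Pred" values.
    """
    positives = true_pos + false_neg   # records that truly belong to the target group
    negatives = true_neg + false_pos   # records that truly belong to the other group
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative record")
    sensitivity = true_pos / positives
    specificity = true_neg / negatives
    predictive_accuracy = (true_pos + true_neg) / (positives + negatives)
    return sensitivity, specificity, predictive_accuracy


# Event 1 test set, P vs. N (counts taken from Appendix A).
sens, spec, pred = screening_metrics(true_pos=16, true_neg=11, false_pos=9, false_neg=2)
print(f"Sens {sens:.0%}  Spec {spec:.0%}  Pred {pred:.0%}")
```

With these counts the sketch gives 89% sensitivity, 55% specificity, and 71% predictive accuracy, in agreement with the reported test-set row.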
Using Gaussian-mixture HMMs on spectrograms with a frequency resolution of 50 bins yielded a peak accuracy of only 12% in the evaluation sets, with a specificity around 100%. Additional testing for Event 1 using Gaussian-mixture HMMs on spectrograms with a frequency resolution of 20 bins yielded much better results, with a peak accuracy of 57% in the evaluation sets and a specificity around 93%. Applying the algorithm to the CinC test set resulted in a prediction accuracy of only 56%.

5.2 Event 2: Detection

In testing for Event 2, the algorithm for Gaussian-mixture HMMs on IBIs was used. The results from the evaluation sets were very promising, with up to 100% accuracy in recognizing the immediate onset of PAF and up to 100% specificity, depending on the parameters used. Since actual periods of PAF were not included in the CinC test set, the single one-time test was based on the randomized test set. The result was 89% accuracy in discriminating between PAF and non-PAF episodes. Given that PAF is an atrial disorder, and that the detection technique uses only the time between QRS complexes (a primarily ventricular event), these results further demonstrate the importance of IBIs as an overall indicator of cardiac health, as opposed to localized ventricular state.

5.3 Event 3: Prediction

For Event 3, the results from the segmented HMM analysis (200 IBI blocks) were the reverse of what was observed in Event 2, with high specificity and relatively low sensitivity. Interestingly, when tested against the test set provided in the original CinC competition, the technique returned a predictive accuracy of 68%, but with a much higher specificity (86%) than sensitivity (25%), which would not have translated into a high score in the competition. In hindsight, the test could have been better structured given more training data for the healthy state of each individual, as the training data might have been insufficient.
Lack of data precluded further exploration in this study. By comparison, HMMs trained directly on the entire 1200 IBIs yielded better sensitivity (64% on the CinC set) and a better score on the CinC test (18/22). Also noteworthy is that when P-2 patients are trained and tested against type Q patients, the sensitivity and specificity both rise to around 90%.

5.4 Entropy Failure Speculation

The failure of the entropy analysis was a major surprise given previous literature and experiments [Vikman et al, 1999], which suggested that approximate entropy (ApEn) would decrease prior to cardiac episodes. ApEn measures the logarithmic likelihood that runs of patterns that are close to each other will remain close in subsequent incremental comparisons. A time series containing many repetitive patterns has a relatively small ApEn; conversely, more random data produce higher values. The cause of the failure with entropy is uncertain, but possibilities include mismatched resolutions or discrepancies in each patient due to emotional or physical distress not documented at the time of recording.

Whereas the variability in the power spectrum might not be sufficient to characterize PAF onset, the spectrogram itself offered two features worth considering. The first was the vertical streaking corresponding to ectopic beats, and the second was the shifting of control frequency (relative strengths of sympathetic and parasympathetic output) due to ectopic behavior. The fact that direct training on the spectrogram yielded better results is encouraging, though further analysis of the tradeoffs between time resolution and frequency resolution could potentially prove even more enlightening.

6.0 Conclusions

This study demonstrates the feasibility of screening, detection, and prediction of PAF based solely on the pattern of inter-beat intervals. Features extracted from IBIs, such as entropy and spectrograms, were also explored as possible discriminators.
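The approximate entropy measure described in Section 5.4 can be illustrated with a short sketch. This is a plain-Python rendering of the standard ApEn definition, not the implementation used in this study. The function name, the template length m = 2, and the tolerance of 0.2 standard deviations are illustrative assumptions (commonly used defaults), and the two input series are synthetic rather than patient data.

```python
import math
import random
import statistics


def approximate_entropy(series, m=2, r_fraction=0.2):
    """Approximate entropy (ApEn) of a 1-D sequence such as inter-beat intervals.

    m          -- template length (assumed default, not taken from the thesis)
    r_fraction -- tolerance as a fraction of the population standard deviation
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("series is too short for the chosen m")
    tolerance = r_fraction * statistics.pstdev(series)

    def phi(length):
        # All overlapping templates of the given length.
        templates = [series[i:i + length] for i in range(n - length + 1)]
        log_sum = 0.0
        for a in templates:
            # Count templates within tolerance (Chebyshev distance). The
            # self-match keeps the count >= 1, so the logarithm is defined.
            matches = sum(
                1 for b in templates
                if max(abs(x - y) for x, y in zip(a, b)) <= tolerance
            )
            log_sum += math.log(matches / len(templates))
        return log_sum / len(templates)

    # Small when patterns that match at length m still match at length m + 1.
    return phi(m) - phi(m + 1)


# Synthetic data only: a strictly alternating rhythm versus random intervals.
regular = [0.8, 1.0] * 100
rng = random.Random(0)
irregular = [rng.uniform(0.6, 1.2) for _ in range(200)]

apen_regular = approximate_entropy(regular)
apen_irregular = approximate_entropy(irregular)
print(f"regular: {apen_regular:.3f}  irregular: {apen_irregular:.3f}")
```

The repetitive series yields an ApEn close to zero while the random series yields a clearly larger value, which is the behavior the text describes. The pairwise comparison is quadratic in the series length, which is acceptable for records of about 1200 IBIs but would need a faster method for long recordings.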
The application of pattern recognition techniques builds on previous work with threshold and weighting functions for narrowly defined events. The feasibility tests of Gaussian-mixture HMMs in PAF analysis yielded unambiguous results. Especially noteworthy were the results comparing those with PAF and healthy individuals. While the techniques differentiate between PAF and non-PAF patients with only reasonable accuracy, they are highly successful (~90% sensitivity and specificity) in discriminating between healthy individuals and those with PAF.

7.0 Recommended Future Works

The key application of IBI-based PAF prediction lies in surface-lead ECG processing, where noise from muscular distortions and lead contact movements places a fundamental limit on the accurate tracking of small-amplitude P-waves. By focusing on the larger and more distinctive QRS complexes, the techniques analyzed in this study are better suited to commercial applications in health and fitness products.

There are several specific areas in the determination of health state that this author would like to see explored in the future. In terms of continuing research on the use of IBIs in PAF analysis, the patient-specific training described in Section 3.3.3 holds great potential. The ultimate goal would be to enable devices to adapt to personal variability in physiology and provide prediction measures based on individual case history. This type of analysis would most likely require longer ECG recordings of PAF patients than were used in this study. Training could then be applied to regions distal and proximal to PAF episodes for that individual.

In terms of academic questions left to explore, there remains the question of screening and prediction for other atrial arrhythmias. Are IBIs equally important in identifying other arrhythmic episodes? Are these techniques valid in the presence of artificial stimuli, such as drugs? Once prediction accuracies could be established to within a certain threshold, many questions would follow regarding measurement of the exact preventative value of interventions, in terms of successfully prevented arrhythmic episodes.

Besides academic pursuits that could draw on this research, there are many potential uses of this technology to benefit public health and services. The prevalence of PAF in the general population has never been accurately determined, and this study could contribute much to making available to the public the testing and evaluation that would otherwise require access to a medical institution. Rather than relying on skilled technicians, expensive hardware, and human expert interpretation, the algorithms could be easily automated and calibrated to work with commercially available fitness sensors.

These ideas are currently being explored at various institutions. Among commercial entities, Motorola in particular has taken a strong interest in incorporating algorithms into mobile devices to provide valuable services to its customers. Continuation along the current paths of research would be likely to yield great future dividends in terms of quality health care that is both personal and pervasive.

Appendix A Complete HMM Results

Gaussian-Mixture HMMs on IBIs

Event 1: Screening

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       15       5        8         1         94%   39%   69%
P vs. Q       14       19       5         2         86%   79%   83%
N vs. Q       4        9        15        9         31%   38%   35%
Parameters: M = 3, Q = 1

Toolkit commandlines used to generate above:
    [priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 1, 3, 'diag', 0)
    [LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
    loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       14       7        6         2         88%   54%   72%
P vs. Q       14       22       2         2         88%   92%   90%
N vs. Q       4        12       12        9         31%   50%   43%
Parameters: M = 2, Q = 1

Toolkit commandlines used to generate above:
    [priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 1, 2, 'diag', 0)
    [LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
    loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       10       3        10        6         63%   23%   45%
P vs. Q       8        16       8         8         50%   67%   60%
N vs. Q       3        22       2         10        23%   92%   68%
Parameters: M = 1, Q = 1

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       15       5        8         1         94%   38%   69%
P vs. Q       15       18       6         1         94%   75%   83%
N vs. Q       4        12       12        9         31%   50%   43%
Parameters: M = 5, Q = 5

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       14       5        8         2         86%   38%   66%
P vs. Q       16       18       6         0         100%  75%   85%
N vs. Q       6        6        18        7         46%   25%   32%
Parameters: M = 3, Q = 2

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       15       3        10        1         94%   23%   62%
P vs. Q       16       12       12        0         100%  50%   70%
N vs. Q       5        11       13        8         38%   46%   43%
Parameters: M = 2, Q = 2, left-right

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       14       5        8         2         86%   38%   66%
P vs. Q       13       19       5         3         81%   79%   80%
N vs. Q       4        14       10        9         31%   58%   49%
Parameters: M = 2, Q = 2

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       13       1        12        3         81%   8%    48%
P vs. Q       15       2        22        1         94%   8%    43%
N vs. Q       4        8        16        9         31%   33%   32%
Parameters: M = 1, Q = 2, left-right

Toolkit commandlines used to generate above:
    [priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 2, 1, 'diag', 1)
    [LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
    loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       15       3        10        1         94%   23%   62%
P vs. Q       16       8        16        0         100%  40%   60%
N vs. Q       4        8        16        9         31%   33%   32%
Parameters: M = 1, Q = 2

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       16       11       9         2         89%   55%   71%
P vs. Q       16       22       5         2         89%   81%   84%
N vs. Q       9        12       15        11        45%   44%   45%
Parameters: M = 2, Q = 1

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       27       8        14        1         96%   57%   70%
Parameters: M = 2, Q = 1

Event 2: Detection

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    10       13       1         1         91%   93%   92%
P2c vs. Qc    8        33       2         3         73%   94%   89%
Nc vs. Qc     7        29       6         7         50%   83%   73%
Parameters: M = 1, Q = 3

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    11       11       3         0         100%  79%   88%
P2c vs. Qc    11       32       3         0         100%  91%   93%
Nc vs. Qc     3        32       3         11        21%   91%   71%
Parameters: M = [illegible], Q = 3, left-right

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    8        14       0         3         73%   100%  88%
P2c vs. Qc    8        33       2         3         73%   94%   89%
Nc vs. Qc     9        27       8         5         64%   77%   73%
Parameters: M = 1, Q = 2, left-right

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    7        14       0         4         64%   100%  84%
P2c vs. Qc    8        33       2         3         73%   94%   89%
Nc vs. Qc     9        26       9         5         64%   74%   71%
Parameters: M = 1, Q = 2

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    11       11       3         0         100%  79%   88%
P2c vs. Qc    11       32       3         0         100%  91%   94%
Nc vs. Qc     9        23       12        5         64%   65%   64%
Parameters: M = 1, Q = 1

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    7        13       1         4         64%   93%   80%
P2c vs. Qc    7        35       0         4         64%   100%  91%
Nc vs. Qc     9        27       8         5         64%   77%   73%
Parameters: M = 2, Q = 1

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    8        13       1         3         73%   93%   84%
P2c vs. Qc    7        34       1         4         64%   97%   89%
Nc vs. Qc     11       26       9         3         79%   74%   76%
Parameters: M = 3, Q = [illegible]

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    1        16       0         2         33%   100%  89%
P2c vs. Qc    1        23       1         2         33%   96%   89%
Nc vs. Qc     7        22       2         9         44%   92%   73%
Parameters: M = 3, Q = [illegible]

Sequence re-run to generate a larger test set of P2c's:

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    8        12       7         0         100%  63%   74%
P2c vs. Qc    8        23       2         3         73%   92%   86%
Nc vs. Qc     2        26       0         17        11%   100%  63%
Parameters: M = 1, Q = 3

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    11       11       6         0         100%  65%   79%
P2c vs. Qc    11       21       4         2         85%   84%   84%
Nc vs. Qc     1        25       0         16        6%    100%  62%
Parameters: M = 1, Q = 3

Event 3: Prediction

Using six 200 IBI segments:

                  TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P1 vs. P2         11       22       3         14        44%   88%   66%
CinC test series  7        61       11        21        25%   85%   68%
Parameters: M = 1, Q = 1

Using 1200 IBI segments (similar to Event 1):

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2-P1         4        6        2         4         50%   75%   63%
P2-Q          5        27       3         3         63%   90%   84%
P1-Q          6        28       2         2         75%   93%   89%
Parameters: M = 3, Q = 1

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2-P1         3        7        1         5         38%   88%   63%
P2-Q          6        27       3         2         75%   90%   87%
P1-Q          7        25       5         1         88%   83%   84%
Parameters: M = 1, Q = 3

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2-P1         6        6        2         5         55%   75%   63%
P2-Q          9        20       3         2         82%   87%   85%
P1-Q          6        20       3         2         75%   87%   84%
Parameters: M = 1, Q = 3

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. P1     18       41       31        10        64%   57%   59%
Parameters: M = 1, Q = 3

Gaussian Mixture HMMs on Spectrogram

Event 1: Screening
* Note: number of frequency bins set to 20

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       13       4        13        2         87%   24%   53%
Parameters: M = 5, Q = 1

Toolkit commandlines used to generate above:
    [priorli, transmatli, mixmatli, muli, Sigmali] = init_mhmm(mod_tri, 1, 5, 'full', 0)
    [LL, priorl, transmatl, mul, Sigmal, mixmatl] = learn_mhmm(mod_tri, priorli, transmatli, muli, Sigmali, mixmatli, 5);
    loglikl = log_lik_mhmm(mod_tri, priorl, transmatl, mixmatl, mul, Sigmal)

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       12       4        11        2         86%   27%   55%
Parameters: M = 5, Q = 2

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. Q       8        26       2         6         57%   93%   81%
Parameters: M = 1, Q = 2

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       13       4        11        1         93%   27%   59%
Parameters: M = 5, Q = 1

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. Q       7        21       1         8         53%   95%   78%
Parameters: M = 1, Q = 2

CinCSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P vs. N       10       4        8         3         77%   33%   56%
Parameters: M = 5, Q = 1

Event 2: Detection (Note: number of frequency bins set to 20)

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    1        17       2         7         13%   89%   67%
P2c vs. Qc    2        30       0         11        15%   100%  74%
Parameters: M = 1, Q = 2

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2c vs. Nc    1        16       1         10        9%    94%   61%
P2c vs. Qc    1        11       0         15        6%    100%  44%
Parameters: M = 1, Q = 2

Event 3: Prediction (Note: number of frequency bins set to 20)

EvalSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. N      3        10       5         8         27%   67%   50%
P2 vs. Q      2        24       7         4         33%   77%   70%
Parameters: M = 3, Q = 1

TestSet       TruePos  TrueNeg  FalsePos  FalseNeg  Sens  Spec  Pred
P2 vs. N      3        15       5         4         43%   75%   67%
P2 vs. Q      3        15       3         7         30%   83%   64%
Parameters: M = 3, Q = 1

Appendix B Sample Spectrograms (frequency resolution of 20 bins)

Spectrograms of Type P (8 randomly selected):

[Figure: spectrograms (a) through (d)]
The four spectrograms above demonstrate expected behavior: there is frequent ectopic activity observable. The two spectrograms on the right precede PAF episodes (type P-2) and appear to show a greater frequency of ectopic activity towards the end. The two on the left do not precede PAF episodes (type P-1) and have scattered ectopic activity. Each spectrogram is followed by the spectrogram of its continuation set (either showing PAF, or not), separated by the dark blue line.

[Figure: spectrograms (e) and (f)]
These two spectrograms (left, P-1; right, P-2) appear strangely quiescent for a patient diagnosed with PAF.

[Figure: spectrogram (g)]
Relatively quiescent, with events near the end, suggesting a precursor to a PAF episode, but actually distant from any episode, type P-1.

[Figure: spectrogram (h)]
Type P-2 with strong ectopic behavior long before a reported episode.
Visual inspection of ECG revealed premature beats, bursts of tachycardia, and indeed no PAF prior to the continuation set.

Spectrograms of Type N (8 randomly selected):

[Figure: spectrograms (a) through (e)]
These five spectrograms of non-PAF patients appear relatively healthy. There are few visible ectopic beats.

[Figure: spectrograms (f) through (h)]
These non-PAF patients have spectrograms that demonstrate ectopic behavior.

Spectrograms of Type Q (8 randomly selected):

[Figure: spectrograms (a) through (h)]
All eight spectrograms of type Q exhibit only minor irregularities, if any.

Appendix C Sample Spectrograms (frequency resolution of 50 bins)

Spectrograms of Type P (same order as presented in Appendix B):

[Figure: spectrograms (a) through (d)]
Note the smearing effect in time compared to spectrograms with 20 frequency bins (Appendix B). On the bottom two spectrograms, greater frequency resolution can be observed in the horizontal bands.

[Figure: spectrograms (e) and (f)]

[Figure: spectrogram (g)]
Type P-1.

[Figure: spectrogram (h)]
Type P-2 with strong ectopic behavior long before a reported episode, serious smearing in time.

Spectrograms of Type N (8 randomly selected):

[Figure: spectrograms (a) through (e)]
These five spectrograms of non-PAF patients appear relatively healthy. There are few visible ectopic beats.

[Figure: spectrograms (f) through (h)]
These non-PAF patients have spectrograms that demonstrate ectopic behavior.

References

Anselme, F., Saoudi, N., and Cribier, A. (2000). Pacing in Prevention of Atrial Fibrillation: The PIPAF Studies. Journal of Interventional Cardiac Electrophysiology, 4 (Supplement 1): 177-184, January 2000.

Brugada, R., Brugada, J., and Roberts, R. (1999). Genetics of Cardiovascular Disease with Emphasis on Atrial Fibrillation. Journal of Interventional Cardiac Electrophysiology, 3(1): 7-13, Mar 1999.

Chen, Y. J., Chen, S. A., Tai, C. T., Yu, W. C., Feng, A. N., Ding, Y. A., and Chang, M. S. (1998). Electrophysiologic Characteristics of a Dilated Atrium in Patients with Paroxysmal Atrial Fibrillation and Atrial Flutter. Journal of Interventional Cardiac Electrophysiology, 2(2): 181-186, June 1998.

Fischer, A. and Mehta, D. (2002). Atrial Fibrillation after Atrial Flutter Ablation. Journal of Interventional Cardiac Electrophysiology, 6(2): 181-182, Jun 2002.

Giorgberidze, I., Saksena, S., Mehra, R., Krol, R. B., Munsif, A. N., and Mathew, P. (1997). Effects of High-Frequency Atrial Pacing in Atypical Atrial Flutter and Atrial Fibrillation. Journal of Interventional Cardiac Electrophysiology, 1(2): 111-123, Sep 1997.

Jais, P., Shah, D. C., Haissaguerre, M., Hocini, M., Garrigue, S., and Clementy, J. (2000). Atrial Fibrillation: Role of Arrhythmogenic Foci. Journal of Interventional Cardiac Electrophysiology, 4: 29-37, Jan 2002.

Luederitz, B. and Jung, W. (2000). Quality of Life in Atrial Fibrillation. Journal of Interventional Cardiac Electrophysiology, 4(1): 201-209, Jan 2000.

Nisam, S. (1998). Can Implantable Defibrillators Reduce Non-arrhythmic Mortality? Journal of Interventional Cardiac Electrophysiology, 2(4): 371-375, Dec 1998.

Padeletti, L., Porciani, M. C., Michelucci, A., Colella, A., Costoli, A., Ciapetti, C., Pieragnoli, P., Musilli, N., and Gensini, G. F. (2000). Prevention of Short Term Reversible Chronic Atrial Fibrillation by Permanent Pacing at the Triangle of Koch. Journal of Interventional Cardiac Electrophysiology, 4(4): 575-583, Dec 2000.

Qi, Y., Minka, T., and Picard, R. W. (2002). Bayesian Spectrum Estimation of Unevenly Sampled Nonstationary Data. Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Orlando, FL, May 2002.

Saksena, S. (1999). Electrophysiologic Study in Patients with Atrial Fibrillation: An Idea Whose Time Has Come Yet Again. Journal of Interventional Cardiac Electrophysiology, 3(2): 101-107, Jul 1999.

Savelieva, I. and Camm, A. J. (2000). Clinical Relevance of Silent Atrial Fibrillation: Prevalence, Prognosis, Quality of Life, and Management. Journal of Interventional Cardiac Electrophysiology, 4(2): 369-382, Jun 2000.

Schwarz, M., Maglio, C., Akhtar, M., and Sra, J. (2000). Implantable Atrial Defibrillator and Detection of Atrial Flutter. Journal of Interventional Cardiac Electrophysiology, 4(1): 257-259, February 2000.

Sopher, S. M. and Camm, A. J. (2000). Atrial Pacing to Prevent Atrial Fibrillation? Journal of Interventional Cardiac Electrophysiology, 4(1): 149-153, Jan 2000.

Stefaneli, C. B., Bradley, D. J., Leroy, S., Dick, M., Serwer, G. A., and Fischbach, P. S. (2002). Implantable Cardioverter Defibrillator Therapy for Life-Threatening Arrhythmias in Young Patients. Journal of Interventional Cardiac Electrophysiology, 6(3): 235-244, July 2002.

Swerdlow, C. D., Schöls, W., Dijkman, B., Jung, W., Sheth, N. V., Olson, W. H., and Gunderson, B. D. (2000). Detection of Atrial Fibrillation and Flutter by a Dual-Chamber Implantable Cardioverter-Defibrillator. American Heart Association, Circulation, 101: 878, 2000.

Timmermans, C., Rodriguez, L. M., Ayers, G. M., Siu, A., Smeets, J., Barenbrug, G. M., and Wellens, H. J. J. (2000). Design and Preliminary Data of the Metrix™ Atrioverter Expanded Indication Trial. Journal of Interventional Cardiac Electrophysiology, 4 (Supplement 1): 197-199, January 2000.

Vikman, S., Maekikallio, T. H., Yli-Mayry, S., Pikkujamsa, S., Koivisto, A. M., Reinikainen, P., Airaksinen, K. E. J., and Huikuri, H. V. (1999). Altered Complexity and Correlation Properties of R-R Interval Dynamics Before the Spontaneous Onset of Paroxysmal Atrial Fibrillation. American Heart Association, Circulation, 100: 2079-2084, 1999.

Weise, D. G. (2001). Atrial Fibrillation: A Risk Factor for Increased Mortality - An AVID Registry Analysis. Journal of Interventional Cardiac Electrophysiology, 5(3): 267-273, Sep 2001.

Zong, W., Mukkamala, R., and Mark, R. G. (2001). A Method for Predicting Paroxysmal Atrial Fibrillation Based on ECG Arrhythmia Analysis. Computers in Cardiology, 28: 125-128, 2001.