2178 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 6, JUNE 2006 Statistical Models of Reconstructed Phase Spaces for Signal Classification Richard J. Povinelli, Senior Member, IEEE, Michael T. Johnson, Senior Member, IEEE, Andrew C. Lindgren, Student Member, IEEE, Felice M. Roberts, Student Member, IEEE, and Jinjin Ye, Student Member, IEEE Abstract—This paper introduces a novel approach to the analysis and classification of time series signals using statistical models of reconstructed phase spaces. With sufficient dimension, such reconstructed phase spaces are, with probability one, guaranteed to be topologically equivalent to the state dynamics of the generating system, and, therefore, may contain information that is absent in analysis and classification methods rooted in linear assumptions. Parametric and nonparametric distributions are introduced as statistical representations over the multidimensional reconstructed phase space, with classification accomplished through methods such as Bayes maximum likelihood and artificial neural networks (ANNs). The technique is demonstrated on heart arrhythmia classification and speech recognition. This new approach is shown to be a viable and effective alternative to traditional signal classification approaches, particularly for signals with strong nonlinear characteristics. Index Terms—Reconstructed phase spaces (RPSs), signal classification, statistical models. I. INTRODUCTION W E present this new approach based on reconstructed phase spaces (RPSs) as an alternative to traditional linear approaches to signal analysis and classification, which are typically based on frequency domain characteristics. The underlying assumption of traditional linear methods is that the salient information about signal characteristics is contained in the frequency power spectrum. From a stochastic process perspective, the first- and second-order statistics of the signal are represented by this power spectral information. However, there are many types of signals, both theoretical and experimental, for which a frequency domain representation is insufficient, because it is generally not possible to distinguish between signals that have the same power spectra but differing phase and/or higher order spectra. For example, signals generated through nonlinear differential or difference equations typ- Manuscript received July 22, 2004; revised July 11, 2005. This work was supported in part by the National Science Foundation by Grant IIS-0113508, by the Department of Education GAANN Fellowship, by an American Heart Association Fellowship, and by the Milwaukee Foundation Frank Rogers Bacon Research Assistantship. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Tulay Adali. R. J. Povinelli, M. T. Johnson, and F. M. Roberts are with the Electrical and Computer Engineering Department, Marquette University, Milwaukee, WI 53233 USA (e-mail: richard.povinelli@mu.edu; mike.johnson@mu.edu; felice.roberts@mu.edu). A. C. Lindgren is with the ATK Mission Research Corporation, Dayton, OH 45430 USA (e-mail: andrew.lindgren@atk.com). J. Ye is with the Electrical Engineering Department, University of California, Los Angeles, CA 90095 USA (e-mail: jjye@icsl.ucla.edu). Digital Object Identifier 10.1109/TSP.2006.873479 Fig. 1. Signals, broadband power spectra, and reconstructed phase spaces for two examples. ically exhibit broadband power spectral characteristics that are difficult to interpret, because the spectra do not contain sharp resonant peaks. An illustrative example of two signals that are indistinguishable using power spectral classification methods are seen in Fig. 1. The first signal in Fig. 1 is the logistic map , with . The second is the logistic map’s FT surrogate, which maintains the signal’s Fourier transform amplitude, but randomizes the phase [1]. Fig. 1 provides a graphical motivation for the signal analysis and classification approach presented in this paper. Signal classification techniques based on power spectral information can not distinguish between different signals that have the same power spectrum, but such signals may be distinguishable in a RPS. Additionally, the underlying theory, outlined in more detail in Section III, guarantees that the dynamics of any system are fully described by a RPS generated from any single state variable, provided that the dimension of the RPS is greater than twice the box counting dimension of the original system [2]. From a practical perspective for discrete-time signals, this 1053-587X/$20.00 © 2006 IEEE POVINELLI et al.: RPSs FOR SIGNAL CLASSIFICATION means that a RPS, which is a multidimensional plot of the time series signal against time-delayed versions of itself, contains all the information of the underlying system [3]. The particular models used here are statistical distributions that can be learned over RPSs and then used to classify unseen signals. Both nonparametric distributions based on binning and occurrence counts and parametric distributions based on Gaussian mixture model (GMM) distributions are illustrated. Two types of classifiers are applied: a Bayes maximum likelihood classifier and an artificial neural network (ANN) classifier. Thus, the proposed approach is particularly suited for differentiating between signals where the phase of the signal contains important differentiating information, because such differentiating phase information is captured by the RPS, or where the state structure captured by the RPS reveals state variables and relationships between state variables that provide greater differentiability across classes than the original state variable by itself. The new approach is applied to two domains—distinguishing heart arrhythmia and speech phoneme classification. The results from these two applications show that this new approach can be applied to a variety of signal classification problems. Previous work in signal classification is discussed in Section II. Background of the underlying dynamical system’s theory is given in Section III, with detailed descriptions of the distribution models, learning algorithms, and classification techniques in Section IV. Section V discusses the experimental setup and results, with conclusions in Section VI. II. PREVIOUS WORK In addition to the traditional signal analysis models based on spectrum amplitude such as autoregressive modeling [4] or cepstral analysis [5], there is also a large body of work on signal detection and classification in the field of communications [6], based on statistical decision theory. These methods are founded on the existence of an a priori underlying transmission or symbol model that can be used to analytically determine closed form conditional distributions for each signal class and derive a maximum likelihood detector. In the case of arbitrary signal classification, with complex underlying systems such as heart rhythms and speech, such complete time-domain signal models do not exist. The general approach to classification for these systems is based on machine learning principles, using preprocessing to identify relevant features for classifier models, rather than a symbol-based time-domain representation. Most research in using RPSs focuses on estimating dynamical invariants, which are not sensitive to initial conditions or smooth transformations of the space. These invariants may be classified into three categories: metric (Lyapunov exponents and dimension), natural measure (density) [7], and topology. Various methods of estimating dimension have been proposed, such as correlation dimension [8], minimum phase space volume [9], and box counting [7]. Density estimation techniques are typically histogram [10], [11] or GMM [12] based. Topological analysis techniques include templates [13] and global vector field reconstruction [14]. Dynamical systems methods have been used in many applications. Lyapunov exponents have been used for classifying 2179 signals as chaotic [14] and as additional features for speech recognition [15]. Fractal dimension, Richardson dimension, Lempel-Ziv complexity, and Hurst exponent have been used as features for classifying simulated and beta emission signals [16]. Estimates of dimensions have been used to analyze speech [17] and heart rate variability [18]. However, estimation of metric invariants is highly sensitive to noise and sample size. Without large sample sizes and the use of nonlinear filtering techniques, their estimations are suspect [19]. Topological analysis techniques such as templates [13] have been applied to simulated chaotic systems [20], voltage [20], and laser [21] time series. Global modeling techniques have been applied to the computation of Lyapunov exponents [14] and biological signal classification [22]. The topological analysis approach fits a functional model to the attractor. This approach is seen in [22]. However, there is relatively little literature directly applying RPSs to classification. Our approach differs from metric-based approaches by modeling the density of the RPS directly instead of calculating a measure of an average trajectory divergence (Lyapunov exponents) or estimating the dimension. The work presented here develops statistical features of RPSs, whereas topological analysis approach builds global vector reconstructions. III. PHASE SPACE RECONSTRUCTION THEORY The basis of this approach is that given access to the state structure of a system, a classification of such systems can be developed. We start by presenting a theoretical construct of the and problem. Given a finite-dimensional system state space , the dynamics of the system, a system is described . We then define a set of all possible dyby the pair with a topology . Without loss of generality, we namics on to be -dimensional, because given any , assume can be replaced by . The system classification then becomes one of partitioning according to the requirements of the classification problem with a particular dynamics identisuch that , where fied with a particular partition , . The problem for real world systems is how to gain access to and represent for a particular system. The approach used here is phase space reconstruction, also known as phase space embedding, and was first proposed in [23]. The methods for representing , which are the contributions of this work, are presented in the following section. The central premise is that a space and its associated dynamics, which are topologically equivalent to the original and its dynamics , can be recovered or system space unfolded from a time series of observations of a single state . variable for the original system Whitney showed that an -dimensional topological space can [24], where an embedding is a homeobe embedded in morphic mapping from one topological space to another. Takens [3] showed that it is a generic property that the map defined by (1) 2180 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 6, JUNE 2006 is an embedding, where is an -dimensional state space, is a twice continuously differentiable diffeomorphism that describes the dynamics of the system, and is a twice continuously differentiable function representing the observation of a single state variable. Working from these original theorems, Sauer and Yorke [2] extended Takens’ work by showing that almost every time-delay is an embedding, indicating that except for a set map of degenerate cases with measure zero, the topological equivalence property is guaranteed. Therefore, these theorems guarantee that for almost every time delay embedding, the reconstructed dynamics of such a map are topologically identical to the true dynamics of the system [2]. In addition, they found a , where tighter bound on the required dimension as is the boxcounting dimension of the attractor of the underlying system. In other words, we have a mechanism for obtaining a conto tinuous, one-to-one, and onto transformation from , where and is the trajectory matrix defined , , a seas follows. Given a time series quence of state variable observations, a trajectory matrix of dimension and time lag is defined as .. . .. . .. . (2) where each row vector in the matrix represents a single point in the space; (3) . A row vector is a point in where the RPS. The pattern traced out by in is typically referred to as an attractor, even when the technical definition of an attractor is not formally met. We adopt this terminology. Recall, that the system/signal classification problem in this work is addressed by transforming a signal from a particular system into a RPS, which has a mathematical correspondence with the underlying system. Therefore, given , the collection of all possible RPSs , the system/signal classification problem and a mechanism is to define a partition of such that for identifying a particular with a particular . As is discussed next, a statistical machine learning approach is taken for defining the partition and for identifying an with a . In general, the classification accuracy will depend on how of characterizes the signals well the model of a partition that are labeled as belonging to that partition and how different is from the models of other partitions. the model of IV. METHODS As aforementioned, the approach adopted here is based on direct statistical modeling of the RPS, through the estimation of a joint probability density function over that space. Both discrete nonparametric and continuous parametric distributions are implemented. There are three main steps to applying the new methods. The first step, data analysis, is to determine the time lag and embedding dimension of the RPS and compensate for any nonstationarity of the observation function. The second step is to generate statistical models of the attractors. This is done using both discrete and continuous models. The final step is to build classifiers. A. Data Analysis To construct a RPS from a signal, the dimension of the RPS and the time lag at which to sample the signal must be selected. Although proper selection of dimension and time lag may appear to be critical to the success of the methods presented in this paper, in practice the methods presented here are effective across a range of dimensions and time lags. In addition, existing methods for choosing time lag, such as the first minimum of the automutual information function or the first zero crossing of the autocorrelation [25], dimension, such as false nearest neighbor [26] or Cao’s method [27], or both [28], [29] do not optimize to classification accuracy. Hence, they provide only a first estimate of appropriate time lags and dimensions. Thus, as proposed in [12], the mode of distribution of the first minimum of the automutual information function across all signals is used as an initial estimate of the time lag. Similarly, the mean plus two standard deviations of the distribution of false nearest neighbor dimensions across all signals is used as an initial estimate of the dimension. We show that the best empirical time lag and dimension may differ from the initial estimate in Section V. is a twice continuously Recall from (1) that differentiable function representing the observation of a single state variable. For many applications, the gain of this function is not controlled across and sometime within signals. For the heart arrhythmia example presented in Section V, baseline wandering, where the mean of the signal changes when the electrical sensor is physically disturbed, is a problem. For both datasets, the gain varies across signals. Thus, a mechanism is needed to compensate for a time varying observation function. The type of compensation will depend on the nature of the observation function. For example, the electrocardiogram (ECG) signals can be low pass filtered to remove baseline wandering and standardized in . the time domain as follows: The next step is to form the RPSs as specified by (2) for statistical modeling. When a Bayes classifier is used, the training signals are used to form RPSs for each class by appending in a column fashion the ’s formed from the signals belonging to that class. This enables statistical models for each class to be learned. When the ANN classifier is used, all training signals are used to form a single RPS. This enables a consistent set of features to be extracted for training the ANN. B. Statistical Models The use of histograms as an estimate of the discrete probability mass function (pmf) of the attractor is straightforward. The space is divided into bins and occurrence counts in each bin over the training examples, divided by the total number POVINELLI et al.: RPSs FOR SIGNAL CLASSIFICATION 2181 Fig. 2. Bin-based distribution modeling (100 bins). Fig. 3. of points, provide a direct estimate of the posterior probability within each bin (4) in bin , the posterior probability for is . Given In order to improve the reliability of the pmf estimate, it is desirable to establish a binning system that more accurately reflects the underlying distribution. The distribution of points throughout the phase space is nonuniform; therefore, nonuniformly spaced bins are used. Fig. 2 illustrates the nonuniformly spaced bins. A two-step process is used to form the bins. First, along each dimension a set of intercepts is computed such that in that dimension the histogram formed by the intercepts is uniform. The outlying bins of this one-dimensional (1-D) histogram extend to infinity. Second, the higher-dimensional bins are form as hypercubes whose boundaries are formed by the intercepts determined in the first step. In forming a RPS for a signal of reasonable length, the intercepts are approximately the same in each dimension. Fig. 2 illustrates, as an example, the case of a 2-D RPS with nine intercept values per dimension, which forms a 10 by 10 bin mass function over the entire space. Disadvantages of the bin-based system include an exponentially increasing number of bins as RPS dimension increases and fixed bin boundaries. Thus, a GMM, which has soft boundaries and does not necessarily require an exponential number of mixtures with respect to RPS dimension, is studied. The probability is of (5) where is the number of mixtures, is a normal distribution with mean and covariance matrix , and is the mixture weight. With the constraint that ,a GMM is a probability model that can accurately reflect a wide range of distributions with arbitrary precision given enough mixture components. GMM-based distribution modeling (16 mixtures). Given training data, the parameters for the GMM can be estimated using the well-known Expectation-Maximization (EM) is determined emalgorithm [30]. The number of mixtures pirically, but is robust across a range of mixture components as shown in Section V. A visualization of a GMM is shown in Fig. 3, where the principle axes of the ellipses indicate the one standard deviation of each mixture in the model. Both the GMM and bin-based approaches require exponential increases in model complexity to effectively model arbitrary feature spaces as the dimensionality of such feature spaces is increased. Because of the structure and hard bound boundaries of the bin-based model, it suffers from these scalability issues when applied to a RPS, because it models the whole space. The GMM approach is not completely immune to scalability problems, but it does provide two features in dealing with scalability that the bin-based approach does not. First, through the use of EM, the GMM model more accurately models the distribution of points in the RPS, i.e., it models the attractor. An example of this is illustrated in Fig. 3. Second, if the number of mixtures is held constant, a GMM increases linearly in complexity as the RPS dimension is increased. C. Classifiers To use either the binning or GMM models for signal classification, training data from several different types of signals are normalized, embedded, and the selected statistical model is learned. We have used two classifiers: a Bayesian maximum likelihood and an ANN [31]. Bayes classification of a new test signal is accomplished by computing the conditional likelihoods of the signal under each learned model and selecting the model with the highest likelihood. The likelihoods are computed on a point-by-point basis from the learned attractor models (6) 2182 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 6, JUNE 2006 where is the th class. As aforementioned, attractor models are learned for each class individually. Thus, for the bin-based method each class has its own set of intercepts with corresponding probability mass functions, while for the GMM each class has unique mixture means and variances with corresponding cluster weights. The use of ANNs as a classifier requires that a set of features be extracted from the learned statistical model. This is done by using all training signals to form a single RPS. This enables a consistent set of features to be extracted for training so that there is a common set of intercepts in the case of bins or a common set of mixtures in the case of GMMs. For a GMM, this arrangement is often referred to as a tied-mixture model. Hence, the bin-based system with global intercepts is referred to as a tied-bin model. The features are the GMM or bin weights and are calculated, respectively, for a particular signal by determining the weights from the global GMM or the weights for the global bins for that signal. A set of ANNs is trained, one for each class, by using the bin counts or mixture weights as inputs. The training output is set to one when the signal belongs to that class and to zero when it does not. Testing is accomplished by computing the features over a new signal sample, and selecting the class label corresponding to the ANN with the greatest output. V. EXPERIMENTS AND RESULTS We now present the application of the RPS-based signal classification methods to two different domains. The first application is to distinguishing heart arrhythmias. This application shows how the new approach can accurately classify heart arrhythmias with only a 2-s signal. The second application is speech phoneme classification. Here we give initial results showing that for small datasets the new approach is similar to traditional spectral methods. A. Heart Arrhythmia Classification The goal of this application is to rapidly and accurately classify four different heart rhythms using signals generated by lead II of an electrocardiogram (ECG) [10]. These rhythms are sinus rhythm (SR) and the three arrhythmias: monomorphic ventricular tachycardia (MVT), polymorphic ventricular tachycardia (PVT), and ventricular fibrillation (VF). This is a clinically relevant problem because different therapies are applied depending on the type of rhythm. No therapeutic action is taken for SR. For VF and PVT, electronic shock is the most prevalent therapy. In the case of VF, collapse occurs within seconds and death within minutes unless the VF is corrected with the passage of a large electrical current through the heart muscle. However, shocking SR can sometimes induce VF. The data for these experiments were obtained from six patients during intercardiac defibrillator implantation. Data was collected from lead II of a 12 lead ECG. The signals were antialias filtered with a cutoff frequency of 200 Hz and subsequently digitized at 1200 Hz. The dataset includes 306 s of SR, 126 s of MVT, 116 s of PVT, and 114 s of VF. Because the data was collected during surgery and the chest was open, the lead placement was not ideal. This shows that the proposed methods Fig. 4. Reconstructed 2-D phase space for examples of SR, MVT, PVT, and VF with = 11. are somewhat robust to variances in the exact state variable that is measured. Data were examined by two experts, whose classification initially agreed on only 80% of four-second epochs. After consultation, they concurred on the remaining 20%. Fig. 4 provides an example of the RPSs for the four rhythm types. The signals are segmented into 2 s intervals and zero meaned and unit normalized. The initial estimates of time lag and dimension are 11 and 9, respectively, using the method proposed in [12]. Figs. 5 and 6, respectively, show classification accuracy versus dimension with lag held constant and accuracy versus time lag with dimension held constant using a 16 mixture full covariance GMM and a Bayes classifier (GMM:Bayes). A 10-fold cross validation is performed for all experiments. Data segments for each fold are randomly selected with the constraint that the proportion of classes be the same within each. The experiments are thus not patient independent, in that a patient’s data may appear in more than one fold. Classification accuracy is the total number of correctly classified signals divided by the total number of signals. As can be seen in Figs. 5 and 6, the initial estimates of time lag and dimension are reasonable, but not optimal. Additionally, it can be seen that the GMM:Bayes method is relatively robust for this problem across a range of time lags from 11 to 15 and dimensions from 13 to 19. Using the best empirical time lag and dimension, the number of mixtures is varied from 1 to 64. The accuracy increases from 75.8% to 92.2% as the number of mixtures increases from 1 to 8. From 16 to 64 mixtures, the accuracy is relatively stable in the range from 94.5% to 95.5%. Table I shows the best results for the two methods, bin-based statistical model and Bayesian classifier (bin:Bayes) and GMM:Bayes, with the structure of each RPS and number of mixtures/bins. The sensitivity results for the GMM:Bayes classifier are 100.0% for SR, 95.2% for MVT, 82.7% for PVT, and 96.5% for VF. Also seen in Table I are the results of two baseline techniques commonly used in automatic cardioverter defibrillators— a heart rate-based method and a gradient pdf-based method—and frequency-based approach. Details of the first two methods can POVINELLI et al.: RPSs FOR SIGNAL CLASSIFICATION 2183 Fig. 5. ECG classification accuracy versus dimension with = 11 for a 16 mixture GMM and Bayes classifier. Fig. 7. 2-D RPS for examples of normalized phonemes: /ao/, /ow/, /s/, /z/ with = 2. B. Speech Recognition Fig. 6. ECG classification accuracy versus time lag with d = 9 for a 16 mixture GMM and Bayes classifier. TABLE I ECG ACCURACY RESULTS FOR EACH METHOD be found in [32] and [33]. In short, the heart rate method estimates the heart rate for each of the four rhythms and classifies according to heart rate bands. The gradient pdf approach builds models of the gradient of the ECG signal and classifies using a Bayesian approach. The frequency method uses the centroid frequency of the power spectrum as a feature [34]. The best results were obtained when using a single mixture to model this feature across each class. A maximum likelihood classifier is used on the test signals. The GMM:Bayes method outperforms all other tested approaches including the bin:Bayes. The difficulty in adjusting the granularity of the bin:Bayes method in comparison to the GMM:Bayes method is also seen. The GMM:Bayes is able to successfully model a 21–dimensional space. To apply the bin-based method to such a space with only one division per dimension would require over one million bins. We also see that the initial estimates for time lag and dimension are reasonable, but not optimal, estimates. In this set of experiments, we apply our RPS-based classification methods to automatic speech recognition, specifically speaker dependent isolated phoneme classification [35]. The classification of isolated phonemes is directly related to the problem of continuous speech recognition, a necessary element of such important tasks as automated transcription and machine translation. We compare a traditional cepstral-based method to the RPS approaches. Examples of four speech phoneme 2-D RPS are shown in Fig. 7. The speech signals, which are sampled at 16 KHz, are taken from the TIMIT corpus [36]. The speech signals in the TIMIT corpus contain expertly labeled time-stamped phoneme boundaries, which can be used to extract the isolated phoneme data. The illustrated phonemes are of two different types, vowels, /ao/ and /ow/, and fricatives, /s/ and /z/. The vowels have smooth locally correlated trajectories, whereas the fricatives contain random locally uncorrelated trajectories, as would be expected from the associated differences in speech production mechanisms. We use a single speaker’s data and implement several different RPS models. For this speaker, there are 417 total phoneme utterances belonging to 47 classes. One class of the standard 48 is not present in this data set. For each method, a model is learned for each of the 47 classes, yielding 47 models. These 47 classes are folded into 39 classes as is consistent with the literature [37]. A leave-one-out cross-validation testing is used for the MFCC approach and bin-based statistical model combined with an ANN classifier (bin:ANN), while a class balanced 10-fold cross validation is used for all other experiments. The initial estimates of time lag and dimension are 2 and 12, respectively. In Fig. 8, we can see that these initial estimates are reasonable, but not optimal. We can also see that the method’s accuracy is stable across the range of dimensions from 10 to 18 for the GMM:Bayes method. The best results for each approach are given in Table II. 2184 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 6, JUNE 2006 Fig. 8. GMM:Bayes phoneme classification accuracy versus dimension with = 2 for a 16 mixtures. TABLE II SPEAKER DEPENDENT CLASSIFICATION RESULTS The baseline Mel-frequency cepstral coefficients (MFCC) approach uses eight mixtures to model the distribution of 12 MFCCs for each phoneme. This is compared to three RPS-based methods. The bin:Bayes method has the lowest accuracy of 24.0%. The bin:ANN with a tied-bin statistical model with 400 (20 20) bins in combination with an ANN classifier outperforms the bin:Bayes approach. The ANN uses bin counts as inputs to an ANN with a 400-10-3-1 (400 input neurons, 10 sigmoid neurons in the first hidden layer, three sigmoid neurons in the second hidden layer, and one linear output neuron) architecture. Two versions of the GMM:Bayes approach are shown. The first is directly comparable to the MFCC result. The second uses a larger dimension and number of mixtures to achieve an accuracy of 62.6%. The sensitivity results for this GMM:Bayes classifier are: vowels 61.8%, fricatives 71.9%, nasals 57.1%, semivowels 40.0%, stops 46.3%, and silence 82.5%. The results indicate that RPS methods are capable of discriminating between phonemes. From the results of the last experiment, we can see that for this speaker dependent task, a GMM:Bayes approach outperforms the MFCC approach. In contrast to the ECG classification, where the best RPS approach outperforms the best traditional approach by 24.5%, only the GMM:Bayes approach outperforms the MFCC approach and by only 11%. These results give an indication of how well RPS approaches will perform on tasks that are well characterized by linear models such as speech production. VI. CONCLUSION We have presented the use of RPS representations as a novel and theoretically well founded method for signal classification, and shown that statistical models of the RPSs are viable for capturing the information in such a space. The approach is applied to two signal classification tasks. A key advantage of such phase space signal models is that this representation is capable of capturing the full dynamic structure of any finite-dimensional generating system. In the limit as the amount of data and corresponding model dimension increases, the attractor structure can fully describe complex behaviors of a system of arbitrarily large order. Another advantage of this approach is its ability to distinguish signals of very short duration, where the frequency resolution needed for accurate classification is unattainable. This is useful in applications such as heart arrhythmia classification, where such rapid classification enables previously inconceivable therapy options. It is important to note the types of signal classification problems for which RPS-based methods will not work better than power spectral-based methods. Assuming the chosen modeling approach is of high enough order; power spectral-based methods will outperform RPS-based methods when the phase of the signal is not important for differentiating between classes, as would be the case for a linear system. This can be seen in the phoneme recognition task discussed above, as there is an ongoing debate as to how important phase is in speech recognition [38]. Additionally, RPS-based methods would underperform methods based on a single state variable when the state structure captured by the RPS fails to provide additional signal class differentiating information. REFERENCES [1] J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer, “Testing for nonlinearity in time series: The method of surrogate data,” Physica D, vol. 58, pp. 77–94, 1992. [2] T. Sauer, J. A. Yorke, and M. Casdagli, “Embedology,” J. Stat. Phys., vol. 65, pp. 579–616, 1991. [3] F. Takens, “Detecting strange attractors in turbulence,” in Proc. Dynam. Syst. Turbulence, Warwick, 1980, pp. 366–381. [4] G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control, Rev. ed. San Francisco, CA: Holden-Day, 1976. [5] B. Gold and N. Morgan, Speech and Audio Signal Processing. New York: Wiley, 2000. [6] J. G. Proakis, Digital Communications, 4th ed. Boston, MA: McGrawHill, 2001. [7] E. Ott, Chaos in Dynamical Systems. Cambridge, U.K.: Cambridge Univ. Press, 1993. [8] P. Grassberger and I. Procaccia, “Measuring the strangeness of strange attractors,” Physica D, vol. 9, pp. 189–208, 1983. [9] H. Leung, “System identification using chaos with application to equalization of a chaotic modulation system,” IEEE Trans. Circuits Syst. I, Fundam. Theory Appl., vol. 45, pp. 314–320, 1998. [10] R. J. Povinelli, F. M. Roberts, K. M. Ropella, and M. T. Johnson, “Are nonlinear ventricular arrhythmia characteristics lost, as signal duration decreases?,” in Proc. Comput. Cardiol., Memphis, TN, 2002, pp. 221–224. [11] D. M. Tumey, P. E. Morton, D. F. Ingle, C. W. Downey, and J. H. Schnurer, “Neural network classification of EEG using chaotic preprocessing and phase space reconstruction,” in Proc. 1991 IEEE 17th Ann. Northeast Bioeng. Conf., 1991, pp. 51–52. [12] R. J. Povinelli, M. T. Johnson, A. C. Lindgren, and J. Ye, “Time series classification using Gaussian mixture models of reconstructed phase spaces,” IEEE Trans. Knowl. Data Eng., vol. 16, pp. 779–783, 2004. [13] D. Sciamarella and G. B. Mindlin, “Unveiling the topological structure of chaotic flows from data,” Phys. Rev. E, vol. 64, pp. 036 209:1–7, 2001. [14] R. Brown, “Calculating Lyapunov exponents for short and/or noisy data sets,” Phys. Rev. E, vol. 47, pp. 3962–3969, 1993. [15] A. Petry, D. Augusto, and C. Barone, “Speaker identification using nonlinear dynamical features,” Chaos Solitons Fractals, vol. 13, pp. 221–231, 2002. [16] P. E. Rapp, T. A. A. Watanabe, P. Faure, and C. J. Cellucci, “Nonlinear signal classification,” Int. J. Bifurc. Chaos in Appl. Sci. Eng., vol. 12, pp. 1273–93, 2002. POVINELLI et al.: RPSs FOR SIGNAL CLASSIFICATION [17] N. Tishby, “A dynamical systems approach to speech processing,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), 1990, pp. 365–368. [18] Y. Ashkenazy, P. C. Ivanov, S. Havlin, C.-K. Peng, A. L. Goldberger, and H. E. Stanley, “Magnitude and sign correlations in heartbeat fluctuations,” Phys. Rev. Lett., vol. 86, pp. 1900–3, 2001. [19] M. Banbrook, S. McLaughlin, and I. Mann, “Speech characterization and synthesis by nonlinear methods,” IEEE Trans. Speech Audio Process., vol. 7, pp. 1–17, 1999. [20] H. D. I. Abarbanel, T. A. Carroll, L. M. Pecora, J. J. Sidorowich, and L. S. Tsimring, “Predicting physical variables in time-delay embedding,” Phys. Rev. E, vol. 49, pp. 1840–1853, 1994. [21] T. Sauer, “Time series prediction using delay coordinate embedding,” in Time Series Prediction: Forecasting the Future and Understanding the Past, A. S. Weigend and N. A. Gershenfeld, Eds. Reading, MA: Addison-Wesley, 1994, pp. 175–194. [22] J. Kadtke, “Classification of highly noisy signals using global dynamical models,” Phys. Lett. A, vol. 203, pp. 196–202, 1995. [23] N. H. Packard, J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, “Geometry from a time series,” Phys. Rev. Lett., vol. 45, pp. 712–716, 1980. [24] H. Whitney, “Differentiable manifolds,” Ann. Math., ser. 2nd, vol. 37, pp. 645–680, 1936. [25] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis. Cambridge, U.K.: Cambridge Univ. Press, 1997. [26] H. D. I. Abarbanel, Analysis of Observed Chaotic Data. New York: Springer, 1996. [27] L. Cao, “Practical method for determining the minimum embedding dimension of a scalar time series,” Physica D, vol. 110, pp. 43–50, 1997. [28] T. Gautama, D. P. Mandic, and M. M. V. Hulle, “A differential entropy based method for determining the optimal embedding parameters of a signal,” in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Hong Kong, 2003, pp. 29–32. [29] J. B. Vitrano and R. J. Povinelli, “Selecting dimensions and delay values for time-delay embedding using a genetic algorithm,” in Proc. Genetic Evolut. Computat. Conf., San Francisco, CA, 2001, pp. 1423–1430. [30] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Statist. Soc., ser. B, vol. 39, pp. 1–38, 1977. [31] T. M. Mitchell, Machine Learning. New York: McGraw-Hill, 1997. [32] I. Singer, Implantable Cardioverter-Defibrillator. Amonk, NY: Futura, 1994. [33] C. Cabo and D. S. Rosenbaum, Quantitative Cardiac Electrophysiology. New York: Marcel Dekker, 2002. [34] I. Romero and L. Serrano, “ECG frequency domain features extraction: A new characteristic for arrhythmias classification,” in Proc. 23rd Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., Istanbul, Turkey, 2001, pp. 2006–2008. [35] M. T. Johnson, R. J. Povinelli, A. C. Lindgren, J. Ye, X. Liu, and K. M. Indrebo, “Time-domain isolated phoneme classification using reconstructed phase spaces,” IEEE Trans. Speech Audio Process., vol. 13, no. 4, pp. 458–466, Jul. 2005. [36] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, N. Dahlgren, and N. L. Dahlgren. Darpa timit acoustic–phonetic continuous speech corpus. [CD-ROM], 1993. [37] K.-F. Lee and H.-W. Hon, “Speaker-independent phone recognition using hidden Markov models,” IEEE Trans. Acoust., Speech, Signal Process., vol. 37, pp. 1641–1648, 1989. [38] L. D. Alsteris and K. K. Paliwal, “Importance of window shape for phase-only reconstruction of speech,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Montreal, QC, Canada, 2004, pp. I573–I576. Richard J. Povinelli (S’85–M’97–SM’01) received the B.S. degree in electrical engineering and the B.A. degree in psychology from the University of Illinois, Champaign-Urbana, in 1987, the M.S. degree in computer and systems engineering from Rensselaer Polytechnic Institute, Troy, NY, in 1989, and the Ph.D. degree in electrical and computer engineering from Marquette University, Milwaukee, WI, in 1999. From 1987 to 1990, he was a software engineer with General Electric (GE) Corporate Research and Development. From 1990 to 1994, he was with GE Medical Systems, where he served as a Program Manager and then as a Global Project Leader. From 1995 to 1998, he held the positions of Lecturer and 2185 Adjunct Assistant Professor with the Department of Electrical and Computer Engineering, Marquette University, where, since 1999, he has been an Assistant Professor. His research interests include data mining of time series, chaos and dynamical systems, computational intelligence, and financial engineering. Dr. Povinelli is a member of the Association for Computing Machinery, American Society of Engineering Education, Tau Beta Pi, Phi Beta Kappa, Sigma Xi, Eta Kappa Nu, Upsilon Pi Epsilon, and the Golden Key. He was voted Young Engineer of the Year for 2003 by the Engineers and Scientists of Milwaukee, Inc. Michael T. Johnson (S’93–M’93–SM’02) received the B.S. degree in computer science engineering and the B.S. degree in engineering and electrical concentration, both from LeTourneau University, Long View, TX, in 1989 and 1990, respectively. He received the M.S.E.E. degree from the University of Texas, San Antonio, in 1994 and the Ph.D. degree from Purdue University, West Lafayette, IN, in 2000. He was a design engineer from 1990 to 1991 for Micronyx, Inc. and Microtechnology Services, Inc., Dallas, TX, and from 1991 to 1993 for Datapoint Corp., San Antonio. He was a Senior Engineer and Engineering Manager for SNC Manufacturing, Oshkosh, WI, from 1993 to 1996. Since 2000, he has been an Assistant Professor of Electrical and Computer Engineering with Marquette University, Milwaukee, WI. His primary research area is speech and signal processing, with an emphasis in speech recognition systems. Other areas of research include machine learning, statistical pattern recognition, and nonlinear signal processing. Dr. Johnson is a registered professional engineer in the state of Wisconsin. He is a member of the Association for Computing Machinery, the Acoustical Society of America, and the International Speech Communications Association. His honor society memberships include Sigma Xi, Eta Kappa Nu, and Upsilon Pi Epsilon. Andrew C. Lindgren (S’02) received the B.S. degree in physics in 2001 from DePaul University, Chicago, IL, and the M.S. degree in electrical and computer engineering in 2003 from Marquette University, Milwaukee, WI. His past research experiences include working in the area of automatic speech recognition and enhancement. He has conducted research at Argonne National Laboratory in the area of nondestructive evaluation techniques for industrial ceramics (1999–2001), and also carried out research at DePaul University (2000–2001) in the area of condensed matter physics that involved modeling magnetic reversal properties on ultrathin films. In 2003, he joined ATK Mission Research, and he has worked on space-time adaptive processing for phased-array radar systems, bistatic radar geolocation algorithms, automatic target recognition, and hyperspectral data exploitation. His current research interests include array signal processing, radar signal processing, automatic speech recognition and enhancement, and machine learning. 2186 Felice M. Roberts (S’96) received the B.S. degree in electrical engineering and computer science and the M.S. degree in electrical and computer engineering from Marquette University, Milwaukee, WI, in 1987 and 2000, respectively. She is currently working towards the Ph.D. degree in electrical and computer engineering at Marquette University. Her current research interests include signal processing and nonlinear analysis of biomedical signals, particularly ECGs. She worked for GE Medical Systems from 1986 to 1988 as an Applications Engineer in magnetic resonance imaging. She was a systems engineer with EDS, Detroit, MI, and Ruesselsheim, Germany, from 1988 to 1996, developing and maintaining CGS, a proprietary CAD/CAM/CAE system. IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 54, NO. 6, JUNE 2006 Jinjin Ye (S’02) received the B.S. degree in electronic science and engineering from Nanjing University, Nanjing, China, in 1999, and the M.S. degree in electrical and computer engineering from Marquette University, Milwaukee, WI, in 2004, respectively. He was with the Institute of Human-Computer Interaction and Media Integration, Tsinghua University, from 1999 to 2001, and with the Speech and Signal Processing Lab, Marquette University, from 2002 to 2004. Currently, he is pursuing the Ph.D. degree in electrical engineering at the University of California, Los Angeles (UCLA). His current research interests include signal processing, speech analysis, and robust speech recognition and enhancement.