Journal of Clinical Monitoring and Computing (2006) 20: 209–220
DOI: 10.1007/s10877-006-9023-2

IDENTIFYING OFFLINE MUSCLE STRENGTH PROFILES SUFFICIENT FOR SHORT-DURATION FES-LCE EXERCISE: A PAC LEARNING MODEL APPROACH

Randy D. Trumbower, PT, PhD,1 Sanguthevar Rajasekaran, PhD2 and Pouran D. Faghri, MD, MS, FACSM2,3

© Springer 2006

Trumbower RD, Rajasekaran S, Faghri PD. Identifying offline muscle strength profiles sufficient for short-duration FES-LCE exercise: a PAC learning model approach. J Clin Monit Comput 2006; 20: 209–220

ABSTRACT. Functional electrical stimulation-induced leg cycle ergometry (FES-LCE) provides therapeutic exercise for persons with spinal cord injury (SCI). However, there exists no systematic approach to predict whether an individual has sufficient thigh muscle strength necessary for FES-LCE exercise.

Objective. To develop and test a Probably Approximately Correct (PAC) learning model as a predictor of thigh muscle strengths sufficient for short-duration FES-LCE exercise and compare the model’s performance with other well-known statistical methods.

Methods. Six healthy male individuals with SCI (age 32.0 ± 12.5 years, height 1.8 ± 0.04 m, weight 79.12 ± 10.76 kg) participated in static and dynamic experiments. During static experiments, absolute crank torque measurements were used to estimate thigh muscle strengths in response to maximum FES intensities of 70 mA, 105 mA, and 140 mA at fixed crank positions on an FES-LCE. During dynamic experiments, changes in power output measurements were used to classify rider performance as ‘Fatigue’ or ‘No Fatigue’ during short-duration FES-LCE at maximum stimulation intensities of 70 mA, 105 mA, and 140 mA and flywheel resistance levels of 0/8th, 1/8th, and 2/8th kilopounds. A Probably Approximately Correct (PAC) learning model was developed to classify static offline muscle strength observations with online rider performances.
PAC’s discriminatory power was compared with logistic regression (LR), Fisher’s linear discriminant analysis (LDA), and an artificial neural network (ANN) model.

Results. PAC and ANN learning models correctly identified 100% of the training examples. PAC’s average performance on the validation set was 93.1%. The ANN and LR performed comparably, with 92.8% and 93.1% accuracy, respectively. The LDA method fared well on the validation set at 89.9%.

Conclusions. PAC performed well in identifying muscle strengths associated with the online performance criterion. Although PAC did not perform best during cross-validation, this model has many advantages over the other methods. PAC can adapt to changes in classification schemes and is more amenable to theoretical analyses than the other methods. PAC learning has an intuitive design and may be a practical choice for classifying muscle strength profiles with well-defined performance criteria.

KEY WORDS. PAC learning, statistical models, FES-LCE, SCI.

1 Sensory Motor Performance Program, Rehabilitation Institute of Chicago and Northwestern University, 2 School of Engineering, Department of Computer Science and Engineering, 3 School of Allied Health, Department of Health Promotion, University of Connecticut, Storrs, CT USA

Received 23 January 2006. Accepted for publication 9 April 2006.

Address correspondence to Pouran D. Faghri, University of Connecticut, Koons Hall, U-2101, 358 Mansfield Road, U-2101, Storrs, CT 06269-2101. E-mail: pouran.faghri@uconn.edu.

INTRODUCTION

The goal of functional electrical stimulation-induced semireclined leg cycling (FES-LCE) is to provide therapeutic exercise for persons with spinal cord injury (SCI) [1–3]. However, a major clinical concern is the ill-defined approach used to determine whether a prospective rider has the necessary muscle strength to successfully participate in FES-LCE exercise.
Most clinicians use trial-and-error to grade an individual’s response to FES and to determine when to initiate, and by how much to progress, a patient’s exercise regimen. There is no systematic method to evaluate an individual’s strength capabilities for FES-LCE exercise. To prescribe FES-LCE effectively, better-defined methods for identifying rider strength potentials in response to FES are called for.

For many clinicians and researchers, quadriceps (QUAD) strength training is considered preliminary to participation in FES-LCE. Petrofsky et al. [4] prescribed an open-chain weight-lifting program for prospective FES-LCE subjects to ensure their success during FES-LCE. The testing included FES-induced short-arc-quad sets in which subjects were required to repeatedly lift a 7 kg weight for 15 min, with each repetition consisting of a 3-second lift, a 1-second hold, and a 3-second release followed by a 6-second rest period. Other progressive-resistance QUAD strengthening protocols are also well documented [5–8]. However, these strength training procedures consider neither the contributions of stimulated hamstrings (HAM) and gluteus maximus (GLUT) muscles nor the QUAD response to FES at different joint configurations and musculotendon lengths that are more consistent with leg cycling exercise. Moreover, some studies include only persons with SCI who have one or more months of experience in FES-LCE, without reporting the subjects’ prior thigh strength in response to FES [1, 9–12].

Thigh muscle strength is an important determinant in FES-LCE therapeutic exercise. Inadequacies in muscle force production make FES-induced pedaling nearly impossible. During FES-LCE, a muscle’s force generating capability must be sufficient to accelerate the crank in the forward direction. The effectiveness of forward pedaling is ultimately dependent on individual muscle responses to FES and how these responses translate to pedal power.
Direct measurement of the maximal isometric force generating capabilities of the individual thigh muscles would be ideal for assessing individual muscle strengths, but it is not appropriate in fracture-prone persons with SCI [13]. Schutte et al. [14] proposed an alternative method using a musculoskeletal model to approximate individual thigh muscle strengths relative to able-bodied individuals. This approach, however, is not practical since (1) it does not provide a method to distinguish a rider’s ability from that of other persons with SCI, (2) it is based on a theoretical framework that does not consider the variability of individual muscle responses to different stimulation intensities, and (3) it defines strengths relative to able-bodied individuals, who do not rely on FES. More recently, Trumbower et al. [15] used a non-invasive method for estimating isometric muscle strength by recording pedal forces from individual leg muscles in response to different stimulation amplitudes while subjects were positioned on an FES-LCE system at different crank positions. It has previously been reported that during FES-LCE there exists a significant isometric component due to the high levels of FES during pedaling [4], which suggests that recorded isometric strengths of individual muscles may be a practical way of estimating online muscle force generating capacities during short-term FES-LCE. Although the non-invasive method appears promising, it alone cannot fully address the dynamic contributions of muscles needed to power the crank.

Currently, no studies have attempted to associate rider-specific isometric muscle forces in response to FES with online cycling power. The strength characteristics of individual muscles involved in FES-LCE may be useful information in defining the anaerobic power output (PO) capabilities during short-duration cycling exercise. PO is a common measure of performance during leg cycling [1, 6, 16–18] and is used to quantify the energy transfer from rider to bike.
A systematic approach to classify muscle force generating capabilities with PO levels required for short-duration FES-LCE is a necessary step when prescribing this type of exercise to prospective riders. However, the connection between a rider’s offline muscle strength and PO is not intuitive, and predicting the association must depend on statistical methods.

Classification models may be one way to associate muscle strength and PO and provide an indirect estimation of possible strength deficits in prospective riders. An algorithm that classifies offline muscle strength profiles with online performance criteria is one suitable approach. Logistic regression (LR) [19], artificial neural network (ANN) [20], and linear discriminant analysis (LDA) [21, 22] models have been used for various biomedical classification schemes. Recently, Probably Approximately Correct (PAC) learning has gained attention as an efficient classification model for its (1) ability to categorize arbitrary example sets quickly, (2) capacity for extensive analyses, (3) strong probabilistic framework [20, 23], and (4) ability to learn intuitive rules via inductive inference [24]. However, it is not clear whether a PAC learning model is feasible for classifying muscle strengths, because it has not previously been assessed for this type of clinical application.

Therefore, the purpose of this study was to develop and test a PAC learning model as a predictor of thigh muscle strength profiles sufficient for short-duration FES-LCE, and to compare it with more well-known approaches. It is hypothesized that the PAC model will perform well in correctly classifying prospective riders’ muscle strengths for FES-LCE exercise. PAC may provide an efficient means to classify an individual’s muscle strengths for FES-LCE and ultimately eliminate the guesswork when prescribing this exercise for potential new users.
Moreover, the proposed model may have implications for other FES applications that classify muscle strength profiles with well-defined performance criteria.

METHODS AND MATERIALS

Subjects

Six healthy male individuals with SCI of similar age (32.0 ± 12.5 years), height (1.80 ± 0.04 m), and weight (79.1 ± 10.8 kg) participated in this study (Table 1). All were regular users of FES-LCE (having used the system for at least 3 months). All subjects reviewed and signed a statement of informed consent approved by the University’s Institutional Review Board.

Instrumentation

A piezoelectric force sensor (PCB® Piezoelectronics, Inc., New York USA) was mounted on the right boot-pedal and measured normal and tangential pedal forces in the sagittal plane. Normal and tangential pedal force measurements were acquired with a LabVIEW® DAQ board (National Instruments Inc, USA) at a sampling rate of 180 samples·s⁻¹. Data were passed through a 5th order, zero lag, Butterworth lowpass digital filter at 10 Hz [26]. Normal and tangential force data were defined relative to the boot-pedal [15].

Experimental procedures

Each subject was fitted on the FES-LCE System (ERGYS®, Therapeutic Alliances, Inc., Fairborn OH, USA) and seat configurations were adjusted based on anthropometry. Offline static and online dynamic tests were performed on each subject with a minimum of 2 h rest between experiments. Stimulation parameters for all the tests consisted of a sinusoidal, biphasic waveform with a frequency of 50 Hz, pulse duration of 500 μs, and phase duration of 1000 μs.

Static test

Procedures for recording peak crank torque values for the QUAD, HAM, and GLUT muscles during offline static testing are described in the work by Trumbower et al. [27]. Absolute crank torque values corresponding to stimulation intensities of 70 mA, 105 mA, and 140 mA were calculated at pedal crank positions 0, 90, 135, and 180 degrees.
These quantities were considered independent observations for each subject’s QUAD, HAM, and GLUT muscle responses to FES. A one-way analysis of variance (ANOVA) using a Bonferroni post-hoc comparison test found no difference (p > 0.05) in the mean absolute crank torque values across the pedal crank positions.

Dynamic test

FES-LCE flywheel resistances were adjusted via an internal magnetic brake. Maximum stimulation intensities were preset at 70 mA, 105 mA, and 140 mA prior to cycling and referred to the highest level of stimulation allowed by the controller during cycling. The FES-LCE proportional feedback controller’s target cadence was set at 50 rpm for all tests. Levels of flywheel resistance and maximum stimulation intensity were randomly assigned prior to testing. Initially, subjects were provided a warm-up of active-assisted FES-LCE at 50 rpm. Following the warm-up, subjects pedaled for 2 min at a maximum stimulation intensity of 70 mA, 105 mA, or 140 mA and a flywheel resistance of 0/8th, 1/8th, or 2/8th kilopounds (KP). During the 2-minute cycling period, kinematic and kinetic data were collected for 30 seconds. After recording, subjects were given a 2 min cycling cool-down followed by 5 min of rest. The dynamic test was repeated for each of the 9 combinations of flywheel resistance and maximum stimulation intensity.

Table 1. Physical characteristics of SCI subjects

Subject    Age (yr)   Height (m)   Weight (kg)   Functional level**   Time since injury (yr)   ASIA score*
1          38         1.9          101           C5–C6                16                       C
2          54         1.7          77            T5–T6                7                        A
3          21         1.7          66            T5–T6                5                        A
4          20         1.7          79            T8–T9                5                        A
5          30         1.8          75            C4–C5                14                       C
6          22         1.8          76            T5–T6                2                        A
X̄ ± SD     31 ± 13    1.8 ± 0.1    79 ± 12                            8 ± 6

The mean and standard deviations (X̄ ± SD) for age, height, weight, and time since injury are presented.
* American Spinal Injury Association standard neurological classification [25].
** C – Cervical Level; T – Thoracic Level.

Table 2. Summary of the number of online observations associated with performance index of ‘Fatigue’ or ‘No Fatigue’

Classification     Observations     Observations     Observations     Total          %
                   (Max = 70 mA)    (Max = 105 mA)   (Max = 140 mA)   observations
No Fatigue         7                15               18               40             74.1
Fatigue >20%       2                2                0                4              7.4
Fatigue >30%       1                0                0                1              1.8
Fatigue >40%       1                1                0                2              3.7
Fatigue >50%       3                0                0                3              5.6
Fatigue >60%       4                0                0                4              7.4
Total                                                                 54             100.0

Observations were made at maximum stimulation intensities of 70, 105, and 140 mA. An index value less than 20% corresponded to ‘No Fatigue’, while index values greater than 20% were classified as ‘Fatigue’.

Performance index

A performance index (PI) was used to quantify cycling performance by changes in PO during short-duration FES-LCE [28]. Similar measurements of PO and anaerobic capacity have been used with able-bodied individuals during short spurts of stationary leg pedaling [29, 30]. During this study, PO changes were evaluated during short-duration cycling under fixed maximum stimulation intensities and flywheel resistances while the FES-LCE controller required riders to maintain cycling speed at 50 rpm. The instantaneous power (P) was defined as

$$P = \tau \, \dot{\vartheta}_{cr} \tag{1}$$

where τ is the instantaneous crank torque and ϑ̇cr is the instantaneous crank velocity (Equation (1)). Crank torque was defined as the moment of the pedal forces about the crank center, where the subject applied the pedal forces to the boot-pedal. Average crank power (P̄) was calculated for every crank revolution as

$$\bar{P} = \frac{1}{T} \int_{0}^{T} \tau \, \dot{\vartheta}_{cr} \, dt \tag{2}$$

where T is defined as one crank period (Equation (2)). The PI quantified the extent of decline in power during leg cycling as

$$PI = \frac{\bar{P}_{initial} - \bar{P}_{final}}{\bar{P}_{initial}} \times 100 \tag{3}$$

where P̄initial was the peak of the first 3 average crank power values, which represented the initial observed crank power, and P̄final was the peak of the last 3 average crank power values, which represented the final observed crank power (Equation (3)). A positive PI corresponded to a reduction in PO.
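As an illustrative sketch of Equations (1) to (3), the following Python snippet computes the per-revolution average power by trapezoidal integration and then the PI. The function names, the fixed sampling interval `dt`, and the treatment of a PI of exactly 20% as ‘No Fatigue’ are assumptions made for this example; they are not taken from the study's software.

```python
def average_crank_power(torque, crank_velocity, dt):
    """Eq. (2): average of tau * theta_dot over one crank revolution.

    torque and crank_velocity are equally spaced samples (N*m and rad/s)
    covering a single revolution; dt is the assumed sampling interval (s).
    """
    power = [t * w for t, w in zip(torque, crank_velocity)]  # Eq. (1)
    # Trapezoidal approximation of the integral of power over the period.
    integral = sum((power[i] + power[i + 1]) * dt / 2.0
                   for i in range(len(power) - 1))
    period = dt * (len(power) - 1)
    return integral / period


def performance_index(revolution_powers):
    """Eq. (3): percent decline from the peak of the first 3 to the peak
    of the last 3 per-revolution average power values."""
    p_initial = max(revolution_powers[:3])
    p_final = max(revolution_powers[-3:])
    return (p_initial - p_final) / p_initial * 100.0


def classify_performance(pi, threshold=20.0):
    """Label a PI value; exactly 20% is assumed to count as 'No Fatigue'."""
    return "Fatigue" if pi > threshold else "No Fatigue"
```

A positive return value from `performance_index` corresponds to a drop in power between the start and end of the recording, matching the sign convention stated above.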
During short-duration leg cycling at maximum stimulation intensity, a reduction in PO indicates the onset of cycling fatigue, where recruited muscle fibers (primarily type II [31]) are unable to sustain the PO required for steady-state cycling at the preset target speed of 50 rpm. A PI threshold level was set at 20% to separate rider performances into two classes: ‘Fatigue’ or ‘No Fatigue’. This threshold represented a separation between individuals who had sufficient muscle response to FES to maintain steady-state cycling and those who did not. The selected cutoff percentage considered the speed restrictions of the current FES-LCE controller, which does not permit cycling to continue if cadence falls below 35 rpm [1]. Furthermore, the strong dependence of the PI on cycling speed [28, 32] suggests that 20% is a reasonable threshold level to delineate reductions in PO that are likely to cause early termination of exercise. Table 2 summarizes the number of online observations associated with PI classes: ‘Fatigue’ or ‘No Fatigue’.

Muscle strength classification

Muscle strength in the general sense is defined as a muscle’s ability to generate force [33]. For this study, muscle strengths for the QUAD (SQUAD), HAM (SHAM), and GLUT (SGLUT) were specifically defined by the absolute crank torque produced by each muscle in response to FES during offline static testing. Muscle strengths in response to 70 mA, 105 mA, and 140 mA were classified according to whether the online PI at the respective stimulation intensity was greater than 20% (‘Fatigue’) or less than 20% (‘No Fatigue’). Figure 1 represents muscle strengths at 70 mA, 105 mA, and 140 mA. A total of 80 offline observations were made on six subjects. Of those observations, 35 were classified as ‘No Fatigue’ and 45 were classified as ‘Fatigue’.

Fig. 1. Scatterplot of muscle strengths defined as the absolute crank torque for QUAD (SQUAD), HAM (SHAM), and GLUT (SGLUT) muscles. The plot illustrates two muscle strength classifications: ‘Fatigue’ (Stars) and ‘No Fatigue’ (Squares).

Probably approximately correct learning model

The goal of the PAC learning model is to learn a concept C. In general, C can be thought of as a function of x, where x is a set of variables. It may be computationally difficult to learn C exactly. Taking into account computational efficiency, PAC admits learning an approximation G to C. The error in learning, E(G), is defined as the probability that C(x) ≠ G(x) for an arbitrary element x. The variables of concern in this study are all possible SQUAD, SHAM, and SGLUT vectors. The concept to be learnt is the decision of whether the subject is under fatigue or not. In other words, the concept C has only two possible values, namely, ‘Fatigue’ and ‘No Fatigue’. Our goal is to learn an approximation G to C. In this case, E(G) is the fraction of the muscle strengths in the hypothesis space for which G yields an incorrect answer.

Our PAC learner was built based on the “one-clause-at-a-time” (OCAT) learning algorithm [34, 35]. The PAC learner was trained and tested in Matlab® (The Mathworks, Inc., Natick, MA). The Boolean formula was initiated as G = 0 and clauses were sequentially added based on a fast heuristic method [36] that chooses an optimal clause to add while removing the correctly classified negative examples (Figure 2). The optimal clause was the simplest clause, based on Occam’s Razor,1 that correctly classified all positive examples and as many negative examples as possible. After processing all the examples in this fashion, the algorithm built the Boolean formula.

DATA ANALYSES

The PAC learning model was compared to three well-known learning models: (1) logistic regression (LR), (2) linear discriminant analysis (LDA), and (3) artificial neural networks (ANN).
Model evaluation and analyses were performed using Matlab® (The Mathworks, Inc., Natick, MA) and SPSS 12.0® (SPSS, Inc., Chicago, IL USA).

Fig. 2. Pseudocode for the OCAT approach [35] using a fast heuristic method [36] to build a Boolean formula f in Conjunctive Normal Form (CNF). The maximum ratio refers to the ratio between the number of positive examples accepted and the number of negative examples rejected in the built clause if ai is used in the built clause.

1 Occam’s Razor, in reference to machine learning, is the development of a learner that is as simple as possible to achieve good generalization performance.

Model evaluation

To evaluate how well each model performed in an unsupervised state, a 10-fold cross-validation was performed. The 10-fold cross-validation method was performed by randomly dividing the data set into two equally sized subsets corresponding to training and validation sets. In consideration of the small number of observations, the cross-validation was repeated 10 times, each time selecting random samples of observations (i.e., positive and negative examples) to train the models and setting aside the remaining observations for validation. Random selection of the observations was made using a set of values randomly generated with a Bernoulli distribution having a probability parameter of 0.70. The average errors for the training and validation sets were then computed over the 10 repetitions.
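The repeated random split described above can be sketched as follows. This is a minimal illustration, assuming that each observation is sent to the training set with probability 0.70 and that degenerate splits (an empty training or validation set) are simply redrawn; the `fit` and `predict` callables and all function names are hypothetical placeholders for any of the four models, not the authors' implementation.

```python
import random


def bernoulli_split(n_observations, p_train=0.70, rng=None):
    """Send each observation index to training with probability p_train."""
    rng = rng or random.Random()
    train, valid = [], []
    for i in range(n_observations):
        (train if rng.random() < p_train else valid).append(i)
    return train, valid


def error_rate(model, predict, X, y, indices):
    """Fraction of the indexed observations that the model misclassifies."""
    wrong = sum(predict(model, X[i]) != y[i] for i in indices)
    return wrong / len(indices)


def repeated_validation(X, y, fit, predict, repeats=10, seed=0):
    """Average training and validation error over repeated random splits."""
    rng = random.Random(seed)
    train_errors, valid_errors = [], []
    for _ in range(repeats):
        train, valid = bernoulli_split(len(y), rng=rng)
        while not train or not valid:  # assumed handling: redraw empty splits
            train, valid = bernoulli_split(len(y), rng=rng)
        model = fit([X[i] for i in train], [y[i] for i in train])
        train_errors.append(error_rate(model, predict, X, y, train))
        valid_errors.append(error_rate(model, predict, X, y, valid))
    return sum(train_errors) / repeats, sum(valid_errors) / repeats
```

Passing the model as a pair of callables keeps the splitting logic identical for every classifier being compared, so differences in the averaged errors reflect the models rather than the partitions.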
Discrimination criteria, including model accuracy, sensitivity, and specificity, were used to assess the quality of the classification models [37]. Model accuracy was defined as the average percentage of correct classification. Model sensitivity was defined as the ratio of the number of ‘No Fatigue’ values correctly classified to the total number of actual ‘No Fatigue’ values (i.e., true positive ratio). Model specificity was defined as the ratio of the number of ‘Fatigue’ values correctly classified to the total number of actual ‘Fatigue’ values (i.e., true negative ratio). Sensitivity and specificity were calculated as shown in Equations 4 and 5, respectively [38].

$$\text{Sensitivity} = P(\text{correct prediction} \mid \text{‘No Fatigue’ did occur}) \tag{4}$$

$$\text{Specificity} = P(\text{correct prediction} \mid \text{‘Fatigue’ did occur}) \tag{5}$$

Logistic regression (LR) model

The LR model used a logarithmic (logit) function to constrain the predicted probability of the performance outcome to the range from 0 (‘Fatigue’) to 1 (‘No Fatigue’). The regression coefficients were estimated using a nonlinear optimization routine (maximum likelihood method) based on the maximum natural log of the odds of the performance outcome occurring or not occurring [19]. Covariates were defined by SQUAD, SHAM, and SGLUT values. The LR analysis employed a forward stepwise inclusion method using a P-value of 0.05 at entry. For each observation, the predicted response was ‘No Fatigue’ if the case’s probability was greater than the cutoff value of 0.5.
$$\operatorname{logit}(p) = \ln\left(\frac{p}{1 - p}\right) \tag{6}$$

$$\operatorname{logit}(p) = b_0 + \sum_{i=1}^{N} b_i x_i \tag{7}$$

The probability p of the dichotomous event occurring is related to a set of predictor variables (i.e., SQUAD, SHAM, and SGLUT), where b0 is the intercept and bi corresponds to the coefficients associated with these variables xi (Equations (6) and (7)).

Linear discriminant analysis (LDA)

The LDA was developed based on a linear combination of the SQUAD, SHAM, and SGLUT predictor values (Equation (8)).

$$D_{ik} = b_{0k} + b_{1k} x_{i1} + b_{2k} x_{i2} + \cdots + b_{qk} x_{iq} \tag{8}$$

where Dik is the value of the kth discriminant function (‘Fatigue’ or ‘No Fatigue’) for the ith case, q is the number of predictor variables, bjk is the value of the jth coefficient of the kth function, and xij is the value of the ith case of the jth predictor. Function coefficients for SQUAD, SHAM, and SGLUT were calculated for the separate functions for each PI class (i.e., ‘Fatigue’ or ‘No Fatigue’), and average Wilks’ λ scores were also calculated to determine which variables better discriminated between groups.

Artificial neural network (ANN) model

The ANN model was designed as a multilayer perceptron built using the Matlab® Neural Network Toolbox (The Mathworks, Inc., Natick, MA) with interconnected neurons (Figure 3). The network topology consisted of 1 hidden layer and 1 output layer. This architecture has been shown to be effective in modeling nonlinear classifications and relationships between arbitrary input-output pairs [20]. Only 1 neuron was used in the linear output layer. The hidden layer was defined by a logistic function yj(n), where vj(n) is the sum of all the weighted synaptic inputs and bias bi (Equations (9) and (10)).

$$y_j(n) = \frac{1}{1 + e^{-v_j(n)}} \tag{9}$$

$$v_j(n) = \sum_{i=1}^{N} w_i x_i + b_i \tag{10}$$

A back-propagation algorithm [39] batch trained the ANN until the mean-square-error between the predicted and actual PI value was less than a preset threshold value of 0.01.
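To make Equations (9) and (10) concrete, the sketch below runs a single forward pass through a network with one logistic hidden layer and one linear output neuron. The standard logistic form 1/(1 + e^(−v)) is assumed for the hidden units, and every weight and bias passed in is a hypothetical placeholder rather than a value from the trained network.

```python
import math


def logistic(v):
    """Eq. (9): logistic activation applied to a hidden neuron's input."""
    return 1.0 / (1.0 + math.exp(-v))


def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: logistic hidden layer, then a linear output neuron.

    x is the strength profile [S_QUAD, S_HAM, S_GLUT]; hidden_weights holds
    one weight list per hidden neuron. All parameters are illustrative.
    """
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        v = sum(w * xi for w, xi in zip(weights, x)) + bias  # Eq. (10)
        hidden.append(logistic(v))
    # Linear output layer: weighted sum of hidden activations plus a bias.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias
```

Training would adjust the weights and biases by back-propagation; this fragment only shows how a prediction is produced once those parameters are fixed.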
Training of the ANN required fewer than 30 iterations. To reduce possible over-training, the process was repeated to determine the minimum number of hidden layer neurons needed to meet this criterion. Similar to the LR method, the predicted response was ‘No Fatigue’ if the output was greater than the cutoff value of 0.5.

Fig. 3. Artificial neural network (ANN) topology used for mapping offline muscle strength profiles (QUAD, HAM, and GLUT) to performance index (PI). In this figure, the 2 layer ANN consisted of 6 neurons within a hidden ‘tansig’ layer and a neuron in the ‘linear’ output layer. Weighted synaptic inputs and biases (bi) were included in both layers.

Probably approximately correct (PAC) model

The concept to be learnt was a Boolean function f defined by the offline muscle strength profiles (SQUAD, SHAM, and SGLUT). Each variable assumed values from the interval [0.00, 200.00]. These data were transformed into their equivalent binary data. That is, each variable was discretized and converted into 15 binary variables based on a precision adjustment of 10^2. Thus the transformed database was defined on 45 binary variables. Input to the PAC learner was the binary set of variables composed of positive and negative examples. Examples were defined as assignments to the offline observations. A positive example was an assignment that satisfied the formula of ‘No Fatigue’. A negative example was an assignment under which the formula evaluated as ‘Fatigue’. The formulated Boolean functions were in conjunctive normal form (CNF), composed of a conjunction of disjunctions of literals ai (i.e., variables and their negations)

$$f(a_1, a_2, \ldots, a_n) = \bigwedge_{j=1}^{k} \left( \bigvee_{i \in \upsilon_j} a_i \right) \tag{11}$$

where ai is either xi (binary value) or x̄i (negation), υj is the set of indices of the atoms in the jth disjunction, and k is the number of clauses (Equation (11)) [36].
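The encoding and the clause-by-clause construction can be sketched as below. The fixed-point encoding follows from the stated range and precision (200.00 × 10^2 = 20000, which fits in 15 bits). The greedy scoring rule `pos / (neg + 1)`, the fallback clause, and all names are assumptions chosen so that the sketch always terminates; it is a simplified variant of the one-clause-at-a-time idea, not the authors' Matlab implementation or the exact heuristic of [36].

```python
def encode(value, bits=15, scale=100):
    """Fixed-point binary code: value * 10^2 as an unsigned 15-bit integer."""
    n = int(round(value * scale))
    if not 0 <= n < 2 ** bits:
        raise ValueError("value outside the encodable range")
    return [(n >> k) & 1 for k in reversed(range(bits))]


def encode_profile(s_quad, s_ham, s_glut):
    """Concatenate the three strengths into 45 binary variables."""
    return encode(s_quad) + encode(s_ham) + encode(s_glut)


# A literal is (index, polarity): (2, True) is x_2 and (0, False) is NOT x_0.
def literal_true(literal, example):
    index, polarity = literal
    return bool(example[index]) == polarity


def clause_accepts(clause, example):
    """A clause is a disjunction: true if any of its literals is true."""
    return any(literal_true(lit, example) for lit in clause)


def cnf_accepts(formula, example):
    """Eq. (11): a CNF formula is true if every clause is true."""
    return all(clause_accepts(clause, example) for clause in formula)


def build_clause(positives, negatives, literals):
    """Greedily grow one disjunction that accepts every positive example."""
    clause, uncovered, rejected = [], list(positives), list(negatives)
    while uncovered:
        def score(lit):
            pos = sum(literal_true(lit, e) for e in uncovered)
            neg = sum(literal_true(lit, e) for e in rejected)
            return pos / (neg + 1)  # assumed smoothing, avoids division by zero
        best = max(literals, key=score)
        clause.append(best)
        uncovered = [e for e in uncovered if not literal_true(best, e)]
        rejected = [e for e in rejected if not literal_true(best, e)]
    return clause, rejected


def ocat(positives, negatives):
    """Add one clause at a time until every negative example is rejected."""
    n_vars = len((positives + negatives)[0])
    literals = [(i, p) for i in range(n_vars) for p in (True, False)]
    formula, remaining = [], list(negatives)
    while remaining:
        clause, rejected = build_clause(positives, remaining, literals)
        if not rejected:  # assumed fallback so each pass rejects a negative
            target = remaining[0]
            if target in positives:
                raise ValueError("example labelled both positive and negative")
            clause = [(i, not bool(target[i])) for i in range(n_vars)]
        formula.append(clause)
        remaining = [e for e in remaining if clause_accepts(clause, e)]
    return formula


# Two-clause formula (x3 OR NOT x1) AND (NOT x2 OR x3 OR x4), 0-based indices.
example_formula = [[(2, True), (0, False)], [(1, False), (2, True), (3, True)]]
```

Because every clause accepts all positive examples and each pass removes at least one negative example, the returned formula is consistent with the training set whenever no profile carries both labels.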
The Boolean function (x3 ∨ x̄1) ∧ (x̄2 ∨ x3 ∨ x4) is an example of a derived CNF formula, where (x3 ∨ x̄1) and (x̄2 ∨ x3 ∨ x4) are the two formula clauses.

RESULTS

A total of 80 offline observations were included in model training and validation. Table 3 summarizes means and standard errors for thigh muscle strengths corresponding to ‘Fatigue’ and ‘No Fatigue’ at the 3 maximum stimulation intensities. The mean SQUAD values (20.0 ± 1.4 Nm) classified as ‘No Fatigue’ were more than twice the recorded mean SHAM values (8.4 ± 0.8 Nm) and nearly 20 times the mean SGLUT values (1.0 ± 0.2 Nm). The mean SQUAD values classified as ‘Fatigue’ were markedly lower (3.1 ± 0.4 Nm), as were the SHAM values (1.1 ± 0.2 Nm). The GLUT muscle registered the smallest mean strength values of the 3 muscles when classified as ‘Fatigue’ (0.1 Nm) and ‘No Fatigue’ (1.0 Nm).

Table 4 summarizes the classification performances for LR, LDA, ANN, and PAC. The PAC and ANN models correctly identified 100% of the training set for both ‘No Fatigue’ and ‘Fatigue’ examples. Overall accuracy decreased on the validation sets for PAC (93.1%) and ANN (89.8%). Conversely, the LR model increased in accuracy from 91.3% on the training set to 93.1% on the validation set. The LDA recorded accuracies of 93.8% during training and 89.9% during validation. The sensitivity scores were lowest for PAC at 90.2%. However, PAC recorded the highest percentage for specificity at 95.0%. An average of approximately 4 CNF clauses and 20 ± 3 atoms were used to build the Boolean formulas. During cross-validation, the lowest accuracy of PAC was recorded with the only Boolean formula containing 5 clauses.

DISCUSSION

PAC learning has been shown to be effective in classification problems where the relationship pairs are not intuitive [35, 40]. This study indicates that using a PAC model may be beneficial for classifying thigh muscle strengths not sufficient for short-duration FES-LCE.
The model performed well in correctly identifying the training examples (100%) and validation sets (93.1%). In particular, this study explored the utility of a PAC learning model as compared with well-known classification methods (i.e., LDA, LR, and ANN). Overall validation accuracy for the analyzed models ranged from 89.9% to 93.1%, which suggests that classifying offline thigh muscle strengths based on the performance criterion is possible.

Table 3. Group statistics for SQUAD, SHAM, and SGLUT values associated with the online performance index ‘Fatigue’ and ‘No Fatigue’ at 70, 105, and 140 mA

                                                  Mean ± Standard error
Performance index   Stimulation intensity (mA)   SQUAD (Nm)    SHAM (Nm)    SGLUT (Nm)
Fatigue             70                           2.5 ± 0.4     0.8 ± 0.2    0.1 ± 0.0
                    105                          5.8 ± 1.2     2.4 ± 0.9    0.3 ± 0.1
                    140                          3.3 ± 0.0     3.8 ± 0.0    0.1 ± 0.0
                    Total                        3.1 ± 0.4     1.1 ± 0.2    0.1 ± 0.0
No Fatigue          70                           18.8 ± 4.3    6.0 ± 1.6    0.5 ± 0.1
                    105                          18.7 ± 2.7    8.5 ± 1.5    0.9 ± 0.2
                    140                          21.1 ± 1.8    8.8 ± 1.1    1.2 ± 0.3
                    Total                        20.0 ± 1.4    8.4 ± 0.8    1.0 ± 0.2
Total                                            10.6 ± 1.0    4.4 ± 0.5    0.5 ± 0.1

Table 4. Comparison of Probably Approximately Correct (PAC) learning model with logistic regression (LR), linear discriminant analysis (LDA), and artificial neural network (ANN) models for predicting PO performance based on offline muscle strength profiles

                (A) Predicted, training sets (%)   (B) Predicted, validation sets (%)
Model method    Accuracy                           Sensitivity   Specificity   Accuracy
LR              91.3                               93.0          92.9          93.1
LDA             93.8                               97.3          82.6          89.9
ANN             100.0                              91.4          93.2          92.2
PAC             100.0                              90.2          95.0          93.1

(A) summarizes the predicted PO performances from the training set and (B) summarizes the discrimination power of the models on the validation set.

The discrimination power of PAC was defined by its sensitivity, specificity, and accuracy. These assessment tools are important in determining the likelihood of true positive or true negative classifications.
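For readers who want to reproduce these three criteria from paired label vectors, a minimal sketch follows. It treats ‘No Fatigue’ as the positive class, as in the definitions given in the Model evaluation section; the function name and the dictionary return type are choices made for this example only.

```python
def discrimination(actual, predicted, positive="No Fatigue"):
    """Return accuracy, sensitivity, and specificity as percentages.

    Sensitivity is the share of actual 'No Fatigue' cases predicted as
    'No Fatigue'; specificity is the share of actual 'Fatigue' cases
    predicted as 'Fatigue'. Both classes must be present in `actual`.
    """
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == positive and p == positive for a, p in pairs)
    true_neg = sum(a != positive and p != positive for a, p in pairs)
    n_pos = sum(a == positive for a in actual)
    n_neg = len(actual) - n_pos
    return {
        "accuracy": 100.0 * (true_pos + true_neg) / len(actual),
        "sensitivity": 100.0 * true_pos / n_pos,
        "specificity": 100.0 * true_neg / n_neg,
    }
```

Reporting all three values together matters here because a model can reach high accuracy while still mislabeling most of the smaller class.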
A true positive classification labeled a prospective rider as sufficient to ride (‘No Fatigue’) when the rider was indeed capable. A true negative classification labeled a prospective rider as insufficient (‘Fatigue’) when the rider was incapable of short-duration FES-LCE. From our results, PAC was better at classifying true negative observations (95.0%) than true positive observations (90.2%). The PAC model performed best, overall, in terms of discriminating those strength profiles not sufficient for FES-LCE. This has major clinical relevance, because clinicians are required to safely identify prospective riders who are too weak for FES-LCE so that the potential for injury from overstimulation of highly atrophied muscles is minimized.

This study, in addition to assessing the feasibility of using a PAC learning model for FES-LCE performance classification, provided insight into the contribution of individual muscle strengths to pedal power under defined FES conditions. It was clear during analyses that the contributions of the QUAD and HAM muscles dominated all classification models. For instance, the LDA Wilks’ λ scores for the SQUAD, SHAM, and SGLUT variables were 0.42, 0.51 and 0.73, respectively. The low λ score recorded for SQUAD indicated that it was the best predictor variable for distinguishing between ‘Fatigue’ and ‘No Fatigue’. Moreover, the ‘Fatigue’ coefficients for SQUAD (0.37) and SHAM (0.47) were smaller than the calculated coefficients for ‘No Fatigue’, indicating that SQUAD and SHAM values that generated large absolute crank torque were less likely to result in ‘Fatigue’ during short-duration FES-LCE. This was not the case for the GLUT muscle, for which the models showed no improved performance with the addition of the SGLUT variable. This finding is consistent with previous work that showed little contribution of the GLUT during FES-LCE [15]. Crago et al.
[41] found that stimulation intensity determines the number of motor units recruited and the force generated. During this study, a low maximum stimulation intensity of 70 mA resulted in reductions in PO classified as ‘Fatigue’ in 5 of the 6 tested subjects; this stimulation intensity may not have excited a sufficient number of motor units to maintain PO during short-duration FES-LCE. Conversely, stimulation intensities of 105 mA and 140 mA contributed to larger torque generation and PO and, as a result, fewer ‘Fatigue’ classifications. Because individuals with SCI differ vastly in how their muscles respond to FES, user-specific muscle strength profiles characteristic of performance were computed in this study and should be further considered for online identification methods.

Although there are a number of benefits to using any one of the compared approaches, the analyses were limited to a small population (n = 6), and drawing conclusions about which approach is preferable is premature. Larger sample sizes would presumably provide more user confidence and thus strengthen or weaken one approach compared to another. For instance, PAC improves its predictive power through learning from larger training sets [20], which is not necessarily the case for the parametric models (i.e., LDA and LR).

The LDA, LR, and ANN are cited frequently in the literature and are considered good classification methods [37]. In particular, LR and ANN are the most frequently used models in medical research, followed by LDA, as defined by the number of indexes found in PubMed (Logistic Regression: 59,935 indexes; Neural Networks: 9912 indexes; Linear Discriminant Analysis: 1273 indexes). Aside from frequent use, there are important distinctions worth noting between the compared methods. The LDA and LR are customary parametric methods that draw on many statistical assumptions.
For instance, the LDA [21] assumes that the predictors are not highly correlated with each other and that the mean and variance of the three predictors are also not correlated [42]. The LR requires a dichotomous performance outcome; other methods become better suited in situations that involve more than two outcomes with inherent ordering (ordinal regression) or without inherent ordering (multinomial logistic regression), where the performance measure and predictors are scaled (linear regression), or where the dependent variable is scaled and some or all of the predictors are categorical (generalized linear model univariate regression) [43]. If the studied classification schemes were modified, whether by adding muscles as predictors or by providing additional performance subsets, the functionality of these parametric methods would inevitably falter.

A significant advantage that both PAC and ANN have over LR and LDA is better adaptation to changes in the classification schemes. This is due to their intrinsic ability to model any function. PAC and ANN are non-parametric models that do not require strict assumptions about the data distribution, unlike LR and LDA. This data tolerance avoids potential error from incorrect assumptions and demands minimal user knowledge of statistics. However, during this study, ANN presented limitations not evident from its predictive performance. ANN’s performance was easily assessed, but finding the optimal topology was not straightforward. An appropriate design was found by trial-and-error, as is typical when using ANN. In many cases, this type of model selection can lead to over-fitting errors on training data, thus reducing overall generalizability [20].

The PAC learning model does not have the inherent problems of parametric models or the design limitations of ANN. It provides a number of favorable intrinsic properties not available in other methods, thus making it a worthy alternative.
The OCAT algorithm formulated in this study in fact reduces the risk of overfitting by reducing the model complexity to its simplest form. By definition, PAC learns a Boolean function that completely represents both positive and negative examples while minimizing the size of clauses and the total number of literals used [36]. PAC learning models may be used to evaluate multiple rules on multiple outcome spaces. The capacity of PAC learning is not limited to three variables (i.e., SQUAD, SHAM, and SGLUT), nor is it limited to a dichotomous outcome measure (i.e., ‘Fatigue’ or ‘No Fatigue’). PAC is promising for learning probability subsets that define the likelihood of online rider performance for n-dimensional muscle strengths without significantly altering the learning framework. Prior to this preliminary work, PAC learning had never been applied to this type of clinical application. Based on PAC’s overall performance, it deserves consideration as a decision-support system for FES-LCE.

Another crucial advantage that PAC learning offers over the other methods is its amenability to theoretical analyses. For instance, high-probability convergence bounds can be proven for PAC learning algorithms. On the other hand, even for simple ANNs, convergence proofs are difficult to obtain.

Other classification models such as support vector machines [44], k-nearest neighbors [45], and decision trees [46] may also be further explored. These particular models differ from PAC and the other compared methods in that they do not provide a functional form along with parameters to describe the input-output relationship and behavior [37]. The contributions of model parameters were interpretable when using LDA, LR, and PAC. In contrast, the ANN lacked interpretability of its weights and biases at the level of the individual strength predictors and offered little intuitive understanding of its functionality.
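To illustrate what learning a Boolean function one clause at a time can look like, the following Python sketch builds a conjunction of clauses in which every clause accepts all positive examples and each new clause rejects some of the negatives that remain. It is a simple greedy heuristic written for illustration under stated assumptions, not the OCAT formulation used in this study: the strength measurements are assumed to have been binarized beforehand (for example, "torque above a threshold"), and all names and toy examples are hypothetical.

```python
from typing import List, Tuple

Example = Tuple[int, ...]   # binarized attributes, e.g. (quad_high, ham_high, glut_high)
Literal = Tuple[int, bool]  # (attribute index, value the literal requires)
Clause = List[Literal]      # a disjunction (OR) of literals


def literal_true(lit: Literal, x: Example) -> bool:
    index, required = lit
    return bool(x[index]) == required


def clause_true(clause: Clause, x: Example) -> bool:
    return any(literal_true(lit, x) for lit in clause)


def learn_clause(pos: List[Example], neg: List[Example]) -> Clause:
    """Greedily build one clause that accepts every positive example
    while accepting as few of the given negative examples as possible."""
    candidates = [(i, v) for i in range(len(pos[0])) for v in (True, False)]
    clause: Clause = []
    uncovered = list(pos)   # positives the clause does not accept yet
    rejected = list(neg)    # negatives the clause still rejects
    while uncovered:
        def score(lit: Literal) -> Tuple[float, int]:
            gained = sum(literal_true(lit, x) for x in uncovered)
            lost = sum(literal_true(lit, x) for x in rejected)
            return (gained / (1 + lost), gained)

        best = max((lit for lit in candidates if lit not in clause), key=score)
        clause.append(best)
        uncovered = [x for x in uncovered if not literal_true(best, x)]
        rejected = [x for x in rejected if not literal_true(best, x)]
    return clause


def learn_cnf(pos: List[Example], neg: List[Example]) -> List[Clause]:
    """Add clauses until every negative example is rejected by some clause."""
    if not pos:
        raise ValueError("at least one positive example is required")
    cnf: List[Clause] = []
    remaining = list(neg)
    while remaining:
        clause = learn_clause(pos, remaining)
        still_accepted = [x for x in remaining if clause_true(clause, x)]
        if len(still_accepted) == len(remaining):
            raise ValueError("greedy search found no clause that rejects a negative")
        cnf.append(clause)
        remaining = still_accepted
    return cnf


def predict(cnf: List[Clause], x: Example) -> bool:
    """True means the profile is classified as positive ('No Fatigue')."""
    return all(clause_true(clause, x) for clause in cnf)


# Hypothetical binarized profiles (not data from this study).
positives = [(1, 1, 0), (1, 1, 1)]
negatives = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
cnf = learn_cnf(positives, negatives)
```

The learned object is a short list of clauses over named attributes, so it can be read directly as a rule. Favoring literals that cover many positives while accepting few negatives keeps clauses short, which is one simple way of limiting model complexity.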
The PAC model provided a Boolean formula that was easy to interpret without the need for statistical understanding. Additional control parameters ε and δ are also provided for PAC learning. Here ε is known as the accuracy parameter and δ is the confidence parameter. We say that a PAC algorithm is capable of learning a concept with parameters ε and δ if the probability that the error in learning is greater than ε is at most δ [23]. These control parameters are helpful in addressing the sample complexity issues that are part of any learning system. The focus of sample complexity is on how large the sample should be for a learner to acquire sufficient information to learn a new concept (i.e., performance index). Further study is needed to explore the issues of sample complexity and parameter control of PAC learning, as well as the computational complexity, as they relate to this clinical application.

Overall, the PAC model learns examples well, but any model is only as good as the data it learns from. Although the PAC model had strong sensitivity and specificity for this study, its performance was defined by measurements that may not have truly identified the individual muscle strengths in response to FES. Trumbower and Faghri [15] reported overflow responses in some individuals during FES-LCE, which induced inappropriate reflexive responses in leg muscles leading to cocontraction and spastic reflexive withdrawal of thigh muscles. Spasticity was not quantified or predicted during this study, but should be further assessed to ensure that the data sets represent the overall population well. Also keep in mind that this study did not evaluate how close the predictions of the PAC model were to real underlying probabilities, because data were collected from individuals already capable of riding the FES-LCE system.
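As a rough illustration of how the accuracy parameter ε and the confidence parameter δ discussed above relate to sample size, the sketch below evaluates a standard textbook bound for a learner that returns a hypothesis consistent with its training data from a finite hypothesis class H: m ≥ (1/ε)(ln|H| + ln(1/δ)). This bound was not derived or applied in this study. The function name and the choice of H (all Boolean functions of three binarized strength attributes, so |H| = 2^(2^3) = 256) are assumptions made only for the example.

```python
import math


def pac_sample_bound(hypothesis_count: int, epsilon: float, delta: float) -> int:
    """Smallest integer m with m >= (ln|H| + ln(1/delta)) / epsilon.

    With at least m independent training examples, a consistent learner over a
    finite hypothesis class has error above epsilon with probability at most delta.
    """
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


# Assumed setting: three binarized attributes and every Boolean function allowed.
n_attributes = 3
hypothesis_count = 2 ** (2 ** n_attributes)  # 256 candidate Boolean functions
m = pac_sample_bound(hypothesis_count, epsilon=0.1, delta=0.05)
```

The bound grows in proportion to 1/ε but only logarithmically in 1/δ and in the size of the hypothesis class, so demanding higher accuracy costs far more examples than demanding higher confidence. It is a worst-case guarantee and may be much larger than the number of examples needed in practice.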
Future calibration studies are suggested to assess whether there are differences between average model observations and average outcomes of larger sample sizes, including persons without the muscle strength to participate in FES-LCE exercise. Calibration may be used to determine whether there are statistically significant differences between the expected and observed outcomes [48] and would be vital when developing a PAC learning system that may help in the identification of a prospective rider’s potential.

The reader must use caution when drawing conclusions about the predictability of PAC or the other modeling methods. The models were developed to classify FES-induced offline strength characteristics as predictors of online performance. The criterion used for classifying online performance was based on short-duration anaerobic FES-LCE and did not attempt to infer a direct link between offline muscle responses to FES and a rider’s muscle activation dynamics, muscle contraction dynamics, exercise duration potential, or aerobic capacity. The classification models’ predictive power for classifying offline thigh muscle strengths was defined by a dichotomous online performance criterion and does not suggest that one model type is superior to the others. Future studies are needed to fully evaluate the performance criterion in a larger population of persons with SCI, with and without FES-LCE experience, before one can ascertain the potential of using a PAC learning model as a decision-support system for this clinical application.

In summary, FES-LCE is a therapeutic exercise aimed at improving strength and cardiovascular fitness in persons with SCI. However, many individuals may not be appropriate for FES-LCE training because no systematic method exists for identifying whether or not a prospective rider has the muscle strengths in response to FES needed for FES-LCE exercise. The results of this study indicate that the tested models were good estimators.
However, PAC may be a favorable choice for systematic classification of offline muscle strengths of prospective riders of FES-LCE because of its intuitive design. The approach may readily assist clinicians in identifying muscle response characteristics that are indicative of persons who are likely to perform FES-LCE and in using those characteristics to classify riders as ‘strong’ or ‘weak’. Future classification schemes using this approach may consider probability subsets that identify a percent likelihood of sufficient strength profiles that further classify a rider’s potential. This supervised learning approach may remove the clinical uncertainties in prescribing FES-LCE exercise based on muscle strength profiles in persons with SCI.

References

1. Faghri PD, Glaser RM, Figoni SF. Functional electrical stimulation leg cycle ergometer exercise: training effects on cardiorespiratory responses of spinal cord injured subjects at rest and during submaximal exercise. Arch Phys Med Rehabil 1992; 73: 1085–1093.
2. Glaser RM. Functional neuromuscular stimulation. Exercise conditioning of spinal cord injured patients. Int J Sports Med 1994; 15: 142–148.
3. Petrofsky J. Bicycle ergometer for paralyzed muscles. Journal of Clinical Engineering 1984; 9: 13–19.
4. Petrofsky J, Stacy. The effect of training on endurance and the cardiovascular responses of individuals with paraplegia during dynamic exercise induced by functional electrical stimulation. European Journal of Applied Physiology 1992; 64: 487–492.
5. Faghri P, Glaser R. Feasibility of using two FES exercise modes for spinal cord injured patients. Clinical Kinesiology 1989; 44.
6. Hooker SP, Figoni SF, Glaser RM, Rodgers MM, Ezenwa BN, Faghri PD. Physiologic responses to prolonged electrically stimulated leg-cycle exercise in the spinal cord injured. Arch Phys Med Rehabil 1990; 71: 863–869.
7. Leeds EM, Klose KJ, Ganz W, Serafini A, Green BA. Bone mineral density after bicycle ergometry training. Arch Phys Med Rehabil 1990; 71: 207–209.
8. Ragnarsson K, Pollack S, O’Daniel W, Edgar R, Petrofsky J, Nash M. Clinical evaluation of computerized functional electrical stimulation after spinal cord injury: a multicenter pilot study. Arch Phys Med Rehabil 1988; 69: 672–677.
9. Figoni SF. Perspectives on cardiovascular fitness and SCI. J Am Paraplegia Soc 1990; 13: 63–71.
10. Gurney AB, Robergs RA, Aisenbrey J, Cordova JC, McClanahan L. Detraining from total body exercise ergometry in individuals with spinal cord injury. Spinal Cord 1998; 36: 782–789.
11. Franco JC, Perell KL, Gregor RJ, Scremin AM. Knee kinetics during functional electrical stimulation induced cycling in subjects with spinal cord injury: a preliminary study. J Rehabil Res Dev 1999; 36: 207–216.
12. Gerrits HL, de Haan A, Sargeant AJ, Dallmeijer A, Hopman MT. Altered contractile properties of the quadriceps muscle in people with spinal cord injury following functional electrical stimulated cycle training. Spinal Cord 2000; 38: 214–223.
13. Vestergaard P, Krogh K, Rejnmark L, Mosekilde L. Fracture rates and risk factors for fractures in patients with spinal cord injury. Spinal Cord 1998; 36: 790–796.
14. Schutte L, Rodgers M, Zajac F, Glaser R. Improving the efficacy of electrical-stimulation induced leg cycle ergometry, in Mechanical Engineering, Stanford University, Palo Alto CA 1993.
15. Trumbower RD, Faghri PD. Crank torque profile of leg muscles at different stimulation intensities and pedal crank positions on a recumbent leg cycle ergometer. Presented at ACRM Conference 2005.
16. Figoni SF, Rodgers MM, Glaser RM, Hooker SP, Faghri PD, Ezenwa BN, Mathews T, Suryaprasad AG, Gupta SC. Physiologic responses of paraplegics and quadriplegics to passive and active leg cycle ergometry. J Am Paraplegia Soc 1990; 13: 33–39.
17. Figoni S, Rodgers M, Glaser R, Hooker S, Faghri P, Ezenwa B, Mathews T, Suryaprasad A, Gupta S. Physiologic responses of paraplegics and quadriplegics to passive and active leg cycle ergometry. J Am Paraplegia Soc 1990; 13: 33–39.
18. Gfohler M, Lugner P. Cycling by means of functional electrical stimulation. IEEE Trans Rehabil Eng 2000; 8: 233–243.
19. Myers R. Classical and modern regression with applications, 2nd edition, Duxbury Press, Belmont 1990.
20. Haykin S. Neural networks, 2nd edition 1999.
21. Fisher R. The use of multiple measurements in taxonomic problems. Ann Eugen 1936; 10: 422–429.
22. Fukunaga K. Introduction to statistical pattern recognition. Academic Press, San Diego 1990.
23. Valiant L. A theory of the learnable. Communications of the ACM 1984; 27: 1134–1142.
24. Angluin D, Smith C. Inductive inference: theory and methods. Computing Surveys 1983; 15: 237–269.
25. Center NSCIS. The 2004 annual statistical report for the model spinal cord injury care systems. University of Alabama at Birmingham, Birmingham 2004.
26. Winter D. Biomechanics and Motor Control of Human Movement, 2nd edition, John Wiley & Sons, Inc, New York, 1990.
27. Trumbower R, Faghri P. FES-induced pedal force generation of individual leg muscles during a single crank rotation in persons with SCI, Spinal Cord, Submitted.
28. Faghri PD, Trumbower RD. Short-duration FES-induced leg cycling dynamics at different stimulation intensities and flywheel resistances, presented at IFESS 2005, Montreal, Canada 2005.
29. Thorstensson A, Karlsson J. Fatiguability and fiber composition of human skeletal muscle. Acta Physiol Scand 1976; 98: 318–322.
30. Vandewalle H, Peres G, Monod H. Standard anaerobic exercise tests. Sports Med 1987; 4: 268–289.
31. Burnham R, Martin T, Stein R, Bell G, MacLean I, Steadward R. Skeletal muscle fibre type transformation following spinal cord injury. Spinal Cord 1997; 35: 86–91.
32. McCartney N, Obminski G, Heigenhauser GJ. Torque-velocity relationship in isokinetic cycling exercise. J Appl Physiol 1985; 58: 1459–1462.
33. Siff M. Biomechanical foundations of strength and power training, in V. Zatsiorsky (ed.), Biomechanics in sport, Blackwell Scientific Ltd., London 2000, pp. 103–139.
34. Triantaphyllou E, Soyster A, Kumara S. Generating logical expressions from positive and negative examples via a branch-and-bound approach. Computers Ops Res 1994; 21: 185–197.
35. Sanchez S, Triantaphyllou E, Chen J, Liao T. An incremental learning algorithm for constructing Boolean functions from positive and negative examples. Computers & Operations Research 2002; 29: 1677–1700.
36. Deshpande A, Triantaphyllou E. A greedy randomized adaptive search procedure (GRASP) for inferring logical clauses from examples in polynomial time and some extensions. Mathematical and Computer Modelling 1998; 27: 75–99.
37. Dreiseitl S, Ohno-Machado L. Logistic regression and artificial neural network classification models: a methodology review. J Biomed Inform 2002; 35: 352–359.
38. Subasi A, Ercelebi E. Classification of EEG signals using neural network and logistic regression. Comput Methods Programs Biomed 2005; 78: 87–99.
39. Rumelhart D, McClelland J. Parallel distributed processing: explorations in the microstructure of cognition. Vol. 1, MIT Press, Cambridge 1986.
40. Kearns M, Vazirani U. An introduction to computational learning theory. MIT Press 1994.
41. Crago PE, Peckham PH, Thrope GB. Modulation of muscle force by recruitment during intramuscular stimulation. IEEE Trans Biomed Eng 1980; 27: 679–684.
42. Johnson R, Wichern D. Applied multivariate statistical analysis, 4th edition. Prentice-Hall, Saddle River 1998.
43. Windows SF. SPSS 12.0 for Windows, Release 12.0.0 ed. SPSS, Inc Chicago 2003.
44. Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge 2000.
45. Dasarathy B. Nearest neighbor pattern classification techniques. IEEE Computer Society Press, Silver Spring 1991.
46. Breiman L, Friedman J, Olshen R, Stone C. Classification and regression trees. Chapman & Hall, New York 1984.
47. Trumbower RD, Faghri PD. Relationship between isometric pedal force generation and stimulation intensity of individual leg muscles involved in FES-induced leg cycling. Presented at IFESS 2005, Montreal, Canada 2005.
48. Hosmer D, Lemeshow S. Applied logistic regression, 2nd ed. Wiley, New York, 2000.