Optimum Feature Selection and Extraction
for Fault Diagnosis and Prognosis
Abhinav Saxena and George Vachtsevanos
Intelligent Control Systems Laboratory, School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, GA 30332-0250
Abstract— Fault diagnosis and failure prognosis of critical dynamic systems, such as aircraft and industrial processes, rely on degradation or fatigue models and on measurements typically acquired on-line in real time. Such sensor data must be pre-processed to remove artifacts and improve the signal-to-noise ratio. Furthermore, the data must be processed appropriately so that useful information can be extracted in compact form and used to detect incipient failures and predict the remaining useful life of failing components. We present a methodology to select an optimum feature vector from a list of candidate features, and to prioritize and rank the features to meet set performance objectives. The enabling technologies include genetic programming tools, data fusion and model-based approaches for feature selection and extraction. We suggest a multi-core processing environment for the efficient and expedient implementation of these technologies. Performance metrics are defined to assess the efficacy of the methodology. Typical examples from aircraft systems are used to demonstrate the proposed techniques.
I. Introduction
This paper introduces a hybrid hardware/software realization for a fundamental problem in the area of Prognostics and Health Management (PHM) systems: How do we process raw sensor data on board an aircraft in order to detect incipient failures of critical system components/subsystems and predict the remaining useful life of such failing components? The extraction of useful information from raw data (vibration, temperature, usage patterns, etc.) constitutes the cornerstone for accurate and timely fault diagnosis and failure prognosis. When multiple sensors are monitored and a large vector of features or Condition Indicators is to be extracted, computational resources are taxed severely in on-line, real-time applications. The feature extraction problem is elaborated and examples are presented. A multi-core Cell processor environment is suggested in order to address the computational complexity issues. Parallel processing of multiple sensors and features is accomplished efficiently and effectively through this platform. The parallel processing and pipelining capabilities of this new processing tool promise to facilitate and expedite the implementation of diagnostic, prognostic and control technologies on board aircraft.
In the following section we describe the feature extraction and selection process and point out various areas where improvement is desired. We then introduce the Cell processing environment and show how it can be used for feature extraction. We conclude with future directions for making use of this processor.
II. Feature Selection and Extraction
Feature or condition indicator selection and extraction
constitute the cornerstone for accurate and reliable fault
diagnosis. The classical image recognition and signal
processing paradigm of data → information → knowledge
becomes most relevant and takes center stage in the fault
diagnosis case, particularly since such operations must be
performed on-line in a real-time environment [1]. Figure 1
depicts a conceptual schematic for feature extraction and
fault mode classification.
Figure 1 A general scheme for feature extraction and fault
mode classification.
Feature extraction involves reducing the amount of
resources required to describe a large set of data accurately.
In many cases, this is considered a data preprocessing step.
When performing analysis of complex data, one of the major
problems stems from the number of variables involved.
Analysis with a large number of variables generally requires
a large amount of memory and computational power, or a
classification algorithm that overfits the training sample
and generalizes poorly to new samples. Feature extraction is
a general term for methods of constructing combinations of
the variables to get around these problems while still
describing the data with sufficient accuracy. A diagnostic
feature is a system parameter (or derived system parameter)
that is sensitive to the functional degradation of one or more
components contained in the system. Diagnostic features can
be used to predict the occurrence of an undesirable system
event or failure mode.
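As a concrete illustration of "constructing combinations of the variables," the sketch below uses principal component analysis (PCA). PCA is one common extraction method that we swap in here for illustration; it is not claimed to be the method used in this work. The data are synthetic, and the names `latent` and `mixing` are hypothetical: ten correlated variables driven by two hidden factors are reduced to two features.

```python
import numpy as np

def pca_features(X: np.ndarray, k: int) -> np.ndarray:
    """Project the rows of X (samples x variables) onto the k directions of
    largest variance; each output column is a linear combination of the inputs."""
    centered = X - X.mean(axis=0)
    # The rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T

rng = np.random.default_rng(4)
latent = rng.normal(size=(500, 2))       # two hidden degradation factors (hypothetical)
mixing = rng.normal(size=(2, 10))        # ten observed, correlated variables
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))
Z = pca_features(X, 2)                   # 10 variables -> 2 features
```

Here nearly all of the variance of the ten inputs survives in the two extracted features. The resulting features are also mutually uncorrelated, which matches one of the properties of "good" features listed below.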
Fault diagnosis depends mainly on extracting a set of
features from sensor data that can distinguish between fault
classes of interest and detect and isolate a particular fault at
an early stage of its initiation. These features should be fairly
insensitive to noise and to within-fault-class variations. The
latter could arise because of fault location, size, etc. in the
frame of a sensor. “Good” features must have the following properties:
• Computationally inexpensive to measure
• Mathematically definable
• Explainable in physical terms
• Characterized by large interclass mean distance
and small intraclass variance
• Insensitive to extraneous variables
• Uncorrelated with other features
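The fourth and last properties in the list above can be checked numerically. The following is a minimal sketch. It assumes a Fisher-style ratio (μ1 − μ2)²/(σ1² + σ2²) to compare mean distance with spread, and the Pearson coefficient to measure correlation. Both measures, and the synthetic data, are our choices rather than measures prescribed here.

```python
import numpy as np

def fisher_ratio(feature_a: np.ndarray, feature_b: np.ndarray) -> float:
    """Squared distance between class means over the sum of class variances.
    Larger values mean well-separated, compact classes."""
    mean_gap = feature_a.mean() - feature_b.mean()
    return float(mean_gap ** 2 / (feature_a.var() + feature_b.var()))

def feature_correlation(f1: np.ndarray, f2: np.ndarray) -> float:
    """Pearson correlation between two features measured on the same samples."""
    return float(np.corrcoef(f1, f2)[0, 1])

rng = np.random.default_rng(0)
# Hypothetical values of one feature for a no-fault and a faulty class.
baseline = rng.normal(loc=1.0, scale=0.1, size=200)
faulty = rng.normal(loc=1.5, scale=0.1, size=200)
score = fisher_ratio(baseline, faulty)
```

With true class means of 1.0 and 1.5 and a standard deviation of 0.1, the population value of the ratio is 0.25/0.02 = 12.5. The sample estimate `score` lands near that value.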
Much attention has been focused over the past years on the
feature extraction problem, whereas feature selection has
relied primarily on expertise, observations, past historical
evidence, and our understanding of fault signature
characteristics. In selecting an “optimum” feature set, we are
seeking to address such questions as: Where is the
information? How do fault (failure) mechanisms relate to the
fundamental “physics” of complex dynamic systems? Fault
modes may induce changes in:
• The energy (power) of the system
• Its entropy
• Power spectrum
• Signal magnitude
• Chaotic behavior
• Other
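As one example of quantifying the entropy change listed above, the sketch computes the Shannon entropy of a signal's normalized power spectrum (spectral entropy). The 50 Hz tone and the broadband noise added to stand in for a fault are hypothetical. A concentrated spectrum gives low entropy, and energy spread across many frequencies raises it.

```python
import numpy as np

def spectral_entropy(signal: np.ndarray) -> float:
    """Shannon entropy (bits) of the normalized one-sided power spectrum."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    p = power / power.sum()
    p = p[p > 0]                      # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())

fs = 1000.0                           # hypothetical sampling rate, Hz
t = np.arange(1024) / fs
healthy = np.sin(2 * np.pi * 50 * t)
# Hypothetical fault signature: broadband content added to the same tone.
rng = np.random.default_rng(1)
faulty = healthy + 0.5 * rng.standard_normal(t.size)
h_healthy, h_faulty = spectral_entropy(healthy), spectral_entropy(faulty)
```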
Feature selection is application-dependent. We are seeking
those features, for a particular class of fault modes, from a
large candidate set that possess properties of fault
distinguishability and detectability while achieving a reliable
fault classification in the minimum amount of time. Feature
extraction, on the other hand, is viewed as an algorithmic
process where, given sensor data, features are extracted in a
computationally efficient manner while preserving the
maximum information content. Thus, the feature extraction
process converts the fault data into an N-dimensional
feature space, such that one class of faults is clustered
together and can be distinguished from other classes.
However, in general, not all faults of a class need N
features to form a compact cluster. It is only the faults that
are in the overlapping region of two or more classes that
govern the number of features required to perform
classification. Classically, feature selection is carried out by
ranking the features on the basis of various metrics. These
metrics include [1]:
• Distinguishability: quantifies a feature’s ability
to differentiate between various classes by
finding the area of the region in which the two
classes overlap. The smaller the area, the
higher the ability of the feature to distinguish
between the classes.
• Detectability or isolability: the extent to
which the diagnostic scheme can detect the
presence of a particular fault; it relates to the
smallest failure signature that can be detected
and the percentage of false alarms.
• Identifiability: tracks the similarity of features
as they identify a fault mode k while also
distinguishing it from other fault classes; and
• Degree of certainty: combines the body of
evidence collected from all other metrics until
a desired level of false positives and false
negatives is obtained.
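The distinguishability metric can be approximated from data by estimating the area shared by the two class-conditional densities. The following is a minimal sketch that assumes histogram density estimates on a common grid and then ranks candidate features by increasing overlap. The bin count, the feature names, and the data are hypothetical.

```python
import numpy as np

def overlap_area(feature_a: np.ndarray, feature_b: np.ndarray, bins: int = 50) -> float:
    """Approximate area shared by two class-conditional densities:
    0 means fully separable, 1 means identical distributions."""
    lo = min(feature_a.min(), feature_b.min())
    hi = max(feature_a.max(), feature_b.max())
    edges = np.linspace(lo, hi, bins + 1)
    dens_a, _ = np.histogram(feature_a, bins=edges, density=True)
    dens_b, _ = np.histogram(feature_b, bins=edges, density=True)
    return float(np.sum(np.minimum(dens_a, dens_b) * np.diff(edges)))

rng = np.random.default_rng(2)
# Hypothetical candidates: (no-fault samples, faulty samples) for each feature.
candidates = {
    "rms": (rng.normal(1.0, 0.1, 500), rng.normal(1.6, 0.1, 500)),
    "kurtosis": (rng.normal(3.0, 0.5, 500), rng.normal(3.3, 0.5, 500)),
}
# Smaller overlap = better distinguishability, so sort ascending.
ranking = sorted(candidates, key=lambda name: overlap_area(*candidates[name]))
```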
Typical features or Condition Indicators (CIs) in the time
domain may include peak values, rms, energy, kurtosis, etc.
In the frequency domain, we focus primarily on features for
rotating equipment that exhibit a marked difference between
baseline (no-fault) and faulty data [1]. For example, we
seek in this category a comparison (amplitude, energy, etc.)
of certain sidebands to dominant frequencies, when the
sensor signals are transformed via an FFT routine to the
frequency domain [2]. Other possible features are extracted
through coherence and correlation calculations. When the
information is shared between the time and frequency
domains, it might be advantageous to extract features in the
wavelet domain, offering an appropriate tradeoff between
the two domains. When multiple features are extracted for a
particular fault mode, it might be desirable to combine or
fuse uncorrelated features to enhance the fault detectability.
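A sketch of the time-domain indicators and of a sideband comparison follows. It assumes a single vibration record, a known dominant (e.g., gear-mesh) frequency `f_mesh`, and a modulation frequency `f_mod`. These names, the ±1 Hz band width `tol_hz`, and the synthetic signals are illustrative assumptions.

```python
import numpy as np

def time_domain_cis(x: np.ndarray) -> dict:
    """Common time-domain condition indicators for one vibration record."""
    centered = x - x.mean()
    return {
        "peak": float(np.max(np.abs(x))),
        "rms": float(np.sqrt(np.mean(x ** 2))),
        "energy": float(np.sum(x ** 2)),
        # Non-excess kurtosis: about 3 for Gaussian data, 1.5 for a pure sine.
        "kurtosis": float(np.mean(centered ** 4) / np.mean(centered ** 2) ** 2),
    }

def sideband_ratio(x, fs, f_mesh, f_mod, n_sidebands=2, tol_hz=1.0):
    """Spectral energy at f_mesh +/- k*f_mod (k = 1..n_sidebands) over energy at f_mesh."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)

    def band_energy(f0):
        return spectrum[np.abs(freqs - f0) <= tol_hz].sum()

    side = sum(band_energy(f_mesh + s * k * f_mod)
               for k in range(1, n_sidebands + 1) for s in (-1, 1))
    return float(side / band_energy(f_mesh))

fs, n = 2000.0, 2000                  # 1 s record -> 1 Hz frequency bins
t = np.arange(n) / fs
healthy = np.sin(2 * np.pi * 400 * t)                      # hypothetical 400 Hz mesh tone
faulty = (1 + 0.5 * np.sin(2 * np.pi * 20 * t)) * healthy  # 20 Hz modulation -> sidebands
cis = time_domain_cis(healthy)
```

For the amplitude-modulated record, each first-order sideband has amplitude 0.25 relative to the carrier. The ratio therefore evaluates to 2 × 0.25² = 0.125, while the pure tone gives essentially zero.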
There are several methods for feature fusion at different
levels. The most popular methods include Bayesian
inference, Dempster-Shafer fusion, weighting/voting
schemes, fuzzy logic inference, and neural network fusion.
In past studies, we also exploited genetic programming
in order to derive artificial features from baseline ones
that meet prescribed performance metrics. In a Genetic
Programming-based feature fusion approach, we define an
appropriate fitness function and use genetic operators to
construct new feature populations from old ones. These
features are non-linear combinations of multiple features
and perform better than individual features [3].
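A toy version of GP-based feature construction is sketched below. It is not the implementation from [3]. The following are all assumptions made for illustration: the primitive set (add, sub, mul), the Fisher-style fitness, tournament selection with elitism, subtree crossover and mutation, the hypothetical size cap `MAX_NODES`, and the synthetic data. In the data, a large nuisance term shared by both raw features hides the class difference, and a combination such as x1 − x0 recovers it.

```python
import random
import numpy as np

OPS = {"add": np.add, "sub": np.subtract, "mul": np.multiply}  # assumed primitive set
MAX_NODES = 31  # hypothetical cap on tree size to limit bloat

def random_tree(n_features, depth, rng):
    """Grow a random expression tree; leaves are ("x", feature_index)."""
    if depth == 0 or rng.random() < 0.3:
        return ("x", rng.randrange(n_features))
    return (rng.choice(sorted(OPS)), random_tree(n_features, depth - 1, rng),
            random_tree(n_features, depth - 1, rng))

def evaluate(tree, X):
    if tree[0] == "x":
        return X[:, tree[1]]
    return OPS[tree[0]](evaluate(tree[1], X), evaluate(tree[2], X))

def fitness(tree, X, y):
    """Fisher-style separation of classes 0 and 1 by the derived feature."""
    f = evaluate(tree, X)
    a, b = f[y == 0], f[y == 1]
    denom = a.var() + b.var()
    if denom == 0:
        return 0.0
    score = (a.mean() - b.mean()) ** 2 / denom
    return float(score) if np.isfinite(score) else 0.0

def subtrees(tree, path=()):
    """Yield (path, subtree) pairs; a path lists child positions (1 or 2)."""
    yield path, tree
    if tree[0] != "x":
        yield from subtrees(tree[1], path + (1,))
        yield from subtrees(tree[2], path + (2,))

def replace_at(tree, path, new):
    if not path:
        return new
    node = list(tree)
    node[path[0]] = replace_at(tree[path[0]], path[1:], new)
    return tuple(node)

def evolve(X, y, pop_size=40, generations=20, seed=0):
    rng = random.Random(seed)
    n = X.shape[1]
    cache = {}
    def fit(t):
        if t not in cache:
            cache[t] = fitness(t, X, y)
        return cache[t]
    # Seed with the raw features so the winner never scores below the best one.
    pop = [("x", i) for i in range(n)]
    pop += [random_tree(n, 3, rng) for _ in range(pop_size - n)]
    for _ in range(generations):
        ranked = sorted(pop, key=fit, reverse=True)
        nxt = ranked[:2]                                   # elitism
        while len(nxt) < pop_size:
            p1 = max(rng.sample(ranked, 3), key=fit)       # tournament selection
            p2 = max(rng.sample(ranked, 3), key=fit)
            path, _ = rng.choice(list(subtrees(p1)))       # subtree crossover
            _, donor = rng.choice(list(subtrees(p2)))
            child = replace_at(p1, path, donor)
            if rng.random() < 0.2:                         # subtree mutation
                path, _ = rng.choice(list(subtrees(child)))
                child = replace_at(child, path, random_tree(n, 2, rng))
            if sum(1 for _ in subtrees(child)) > MAX_NODES:
                child = p1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fit)

# Hypothetical data: a shared nuisance term masks the class shift in x1.
data_rng = np.random.default_rng(3)
y = np.repeat([0, 1], 100)
common = data_rng.normal(0.0, 3.0, 200)
X = np.column_stack([common + data_rng.normal(0, 0.2, 200),
                     common + 1.0 * y + data_rng.normal(0, 0.2, 200)])
best = evolve(X, y)
```

The raw features are seeded into the initial population, and the two best individuals survive each generation. As a result, the evolved feature never scores below the best single feature on the training data. A held-out set would still be needed to guard against overfitting.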
III. Multi-Core Cell Processing Environment
The Cell processor is a microprocessor architecture jointly
developed by Sony, Toshiba, and IBM, an alliance known as
"STI". As shown in Figure 2, the Cell processor can be
broadly divided into four components [4,5]:
(1) external input and output structures,
(2) the main processor called the Power Processing
Element (PPE) (a two-way simultaneous
multithreaded Power ISA v.2.03 compliant core),
(3) eight fully-functional co-processors called the
Synergistic Processing Elements (SPE), and
(4) a specialized high-bandwidth circular data bus
connecting the PPE, input/output elements and the
SPEs, called the Element Interconnect Bus or EIB.
The advantage of the Cell processor lies in its capability
to achieve high performance in mathematically intensive
tasks such as decoding/encoding MPEG streams, generating
or transforming three-dimensional data, or undertaking
Fourier analysis of data by an efficient allocation of its
processing and communication resources. The PPE, which is
capable of running a conventional operating system, has
control over the SPEs and can start, stop, interrupt and
schedule processes running on the SPEs. Despite having
Turing-complete architectures, the SPEs are not fully
autonomous and require the PPE to initiate them before they
can do any useful work. Therefore, the PPE acts as a
manager and most of the "horsepower" of the system comes
from the SPEs. Both the PPE and SPE are RISC
architectures with a fixed-width 32-bit instruction format.
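The manager/worker division of labor described above can be illustrated, by analogy only, with a host-side thread pool. A manager starts a fixed set of workers, dispatches one sensor channel per task, and collects the results. This is not Cell SDK code, and it does not model the SPE local stores or data transfers. The worker count of eight mirrors the eight SPEs, and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

N_WORKERS = 8  # plays the role of the eight SPEs in this analogy

def extract_features(channel: np.ndarray) -> dict:
    """Work unit handed to one worker: cheap condition indicators for one sensor."""
    return {"rms": float(np.sqrt(np.mean(channel ** 2))),
            "peak": float(np.max(np.abs(channel)))}

def manager(channels: list) -> list:
    """Manager role (the PPE in the analogy): start the workers, dispatch one
    channel per task, and collect results in channel order."""
    with ThreadPoolExecutor(max_workers=N_WORKERS) as pool:
        return list(pool.map(extract_features, channels))

# Twelve hypothetical sensor channels: more tasks than workers, so some queue.
channels = [np.full(100, float(k)) for k in range(12)]
results = manager(channels)
```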
Although the Cell was developed for high-speed gaming
applications (Sony PlayStation 3), its capabilities have led to
explorations in blade servers, home cinema, supercomputing
and cluster computing [6]. Recently, Cell-based
computer systems have been developed for embedded
applications such as medical imaging, industrial inspection,
aerospace and defense, seismic processing, and
telecommunications [7]. We intend to explore Cell
capabilities for onboard real-time data processing for
enhanced diagnostics and prognostics for aircraft systems.
Cell Processing for Feature Extraction
In a recent study, we identified how Cell capabilities can be
appropriately utilized for the various steps of a diagnostics and
prognostics system [8]. Among these steps, feature extraction
can itself be carried out in different ways. More specifically,
the feature extraction process can utilize Cell capabilities in
three different ways, as depicted in Figure 3.
Figure 3 Utilizing Cell capabilities in feature extraction.
Figure 2 Cell architecture schematic.
Features or Condition Indicators may be extracted from a
variety of data and sensors. Some of these features are
computationally inexpensive, such as peak value, rms,
energy, etc., while others may involve considerable
computations. For example, features that are based on
chaotic indicators, such as Lyapunov exponents, dimension,
etc., derived from a system that exhibits chaos, typically
entail a large computational cost. Similarly, derived features
employing genetic programming tools may belong to the
same category. It is desirable, therefore, to handle the
processing of multiple sensors, as well as the estimation of
multiple features, via a parallel processing architecture that
optimizes resource requirements and affords a real-time
solution to the problem. The pipelining capabilities of the Cell
processor can be exploited when the outputs of the feature
extraction module are fused or used for classification
purposes. The problem can be set up as a constrained
optimization problem aimed at minimizing processing time
while maximizing processor utilization.
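One simple heuristic for this allocation problem is longest-processing-time-first (LPT) list scheduling. It sorts the feature-extraction tasks by estimated cost and gives each one to the currently least-loaded processing element. The sketch below reports the makespan (total time) and the utilization. The task names and millisecond costs are hypothetical, and LPT is a greedy stand-in for the constrained optimization, not its solution.

```python
import heapq

def lpt_schedule(task_costs: dict, n_units: int = 8):
    """Assign each task (largest first) to the least-loaded unit.
    Returns (assignment, makespan, utilization); costs must be positive."""
    loads = [(0.0, u) for u in range(n_units)]       # (current load, unit id)
    heapq.heapify(loads)
    assignment = {u: [] for u in range(n_units)}
    for name, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, unit = heapq.heappop(loads)             # least-loaded unit
        assignment[unit].append(name)
        heapq.heappush(loads, (load + cost, unit))
    makespan = max(load for load, _ in loads)
    utilization = sum(task_costs.values()) / (n_units * makespan)
    return assignment, makespan, utilization

# Hypothetical per-task costs in milliseconds for two sensor channels.
costs_ms = {"lyapunov_ch1": 40.0, "lyapunov_ch2": 40.0, "gp_feature": 25.0,
            "sideband_ch1": 8.0, "sideband_ch2": 8.0, "kurtosis_ch1": 2.0,
            "kurtosis_ch2": 2.0, "rms_ch1": 1.0, "rms_ch2": 1.0}
plan, makespan, util = lpt_schedule(costs_ms, n_units=8)
```

In this example, each 40 ms task bounds the makespan at 40 ms, so utilization is only 127/320 ≈ 0.40. Splitting the expensive chaotic-indicator computations into smaller pieces is the kind of trade-off such an optimization would weigh.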
IV. Preliminary Results
Given the unique architecture of the Cell processor, it has
been a challenge to write algorithms that utilize this
technology to its full potential. Our research team has
therefore focused on two fronts: first, porting existing
single-processor algorithms to the Cell, and second,
developing efficient parallel counterparts of these
condition-based maintenance (CBM)/PHM algorithms. In the
initial phase, we focused on developing parallel algorithms
for Fast Fourier Transforms (FFTs), which form the basis of
most signal processing techniques. Our research partners have
successfully demonstrated the effectiveness of their FFT
algorithms, achieving speedups of up to four times over
other implementations and architectures (Figure 4).
We are also working on parallelizing other algorithms that
will pave the way for an overall onboard parallelized
architecture suitable for real-time aircraft applications.
V. Conclusions and Future Directions
In this paper we have described our efforts in exploring the
high performance capabilities of the Cell platform for
computationally expensive feature extraction algorithms for
on-board diagnostics and prognostics. We have identified
multiple ways in which the Cell can be employed depending
on computational demands in specific cases. We also intend
to explore possibilities for maximizing the Cell utilization
through intelligent resource allocation in a constrained
optimization problem setting. However, there are a few
limitations that one needs to keep in mind before expecting
the Cell to deliver. The architecture emphasizes
efficiency per watt, prioritizes bandwidth over latency, and
favors peak computational throughput over simplicity of
program code [10]. For these reasons, the Cell is widely
regarded as a challenging environment for software
development. Even though several research teams are
building software libraries for the Cell platform, full
utilization of its capabilities is only beginning to be
realized. Software adoption remains a key issue in whether
the Cell ultimately delivers on its performance potential.
Despite these challenges, research has indicated that the Cell
excels at several types of scientific computation. We are
collaborating with multiple such research teams to explore
ways in which its capabilities can be exploited to their maximum
potential on an aircraft.
Acknowledgment
We gratefully acknowledge the support from AveTec Corp.
We also thank our collaboration partner teams from Pratt &
Whitney, Hartford CT, the University of Connecticut,
Hartford, and the College of Computing at Georgia Tech.
Figure 4 FFT implementation performance comparisons
with other architectures and Cell implementations [9].
[Plot omitted. Implementations compared: Cell @ GT (8 SPEs), FFTW on Cell, Intel Core Duo, Intel Pentium 4, AMD Opteron, IBM Power 5. Source: Bader et al. (2007).]
