A Data Mining Approach to Fault Detection for
Isolated Inverter-based Microgrids
E. Casagrande, W. L. Woon, Member, IEEE, H. H. Zeineldin, Member, IEEE, and N. H. Kan’an, Student
Member, IEEE
Abstract—This paper investigates the problem of fault protection in a microgrid containing inverter-based distributed generators (IBDGs). Due to the low magnitude of short circuit currents
generated by IBDGs, traditional protection techniques which
rely on current (fuses and overcurrent relays) may fail to protect
such networks. This paper addresses the problem of finding
suitable features derived from local electrical measurements
that can be used by statistical classifiers to better discriminate
fault events from normal network events. Given a series of
simple electrical features, a study of feature selection and data
mining techniques is conducted in the context of fault detection
in isolated microgrids with inverter-based DGs. Two statistical
classifiers are compared and implemented in this framework:
Naive Bayes and Decision Trees. The proposed approach is tested
on a facility scale microgrid consisting of three IBDGs.
Index Terms—microgrid, inverter, fault analysis, feature selection, data mining
I. INTRODUCTION
A. Fault Protection in Microgrids
The protection of transmission and distribution lines
against faults is an important step in safeguarding the
electric power system. Short circuits may occur due to mechanical or natural reasons and can seriously damage electrical
equipment if they are not quickly and properly addressed. The
occurrences of these faults cannot be completely eliminated
during the planning phase of the system and devices such as
relays, circuit breakers, re-closers and fuses have traditionally
been deployed for the protection of power networks. The
literature extensively discusses how to optimally design, place
and tune these devices for a typical 20th century power grid,
i.e. a central generator and a radial distribution structure.
In the last few decades, this scenario has been revised
following the public debate over climate change and the
consequent demand for sustainable production of energy. Innovative solutions for Distributed Energy Resources (DERs) such
as Distributed Generators (DG), storage devices and electric
vehicles have been developed to take advantage of renewable
sources of energy such as solar and wind power [1]. The
integration of these new elements into the grid has resulted in
significant improvements in the energy efficiency of traditional
transmission and distribution systems.
Microgrids and Inverter-based Distributed Generators (IBDGs) are important components of this next generation of
power systems. A microgrid is a low voltage distribution
system that operates connected to the global medium voltage
grid or runs as a standalone system by using local DGs in a
self sufficient and coordinated way (islanding mode) [2][3].
This work was supported by the Masdar Institute of Science and Technology. E. Casagrande, H. H. Zeineldin and W. L. Woon are with the
Masdar Institute of Science and Technology, Abu Dhabi, UAE (email:
ecasagrande@masdar.ac.ae, hzainaldin@masdar.ac.ae, wwoon@masdar.ac.ae).
While IBDG microgrids are believed to be an effective
approach for the future sustainable production of energy, they
have also raised some issues regarding the design, management
as well as the protection of these novel types of power
networks [4][5]. From a protection perspective, adding DGs
to the grid has been shown to influence the
fault current amplitude, direction and duration [6][7]. With
IBDGs, the maximum fault current generated by the inverter
is limited to twice its rated current [8], as opposed to the
case of a synchronous machine, which can generate 4 to 10
times its rated current [9]. In addition, IEEE Std 1547.2 states that fault current
from inverter-based distributed resources is very limited (about
100% to 200% of normal load current) [10]. A similar protection problem, where fault current levels are comparable with
load currents, is the case of High Impedance Faults (HIFs) [11].
Methods developed for detecting HIF should be capable of
distinguishing between faulty and healthy feeders. Some of
the proposed approaches for HIF detection include harmonic
component methods [12], Kalman filtering [13], wavelet transform [14][15] and artificial neural networks [15]. Although
both HIFs and IBDG microgrids share the problem of low
fault currents, the response of a system to an HIF would be
different from its response to a fault on an IBDG microgrid.
Traditional relays which are designed to trip in the presence
of high short circuit currents may be unable to detect the lower
fault currents produced by IBDGs [16][17][18]. Alternatively,
reducing the relay threshold can result in an increase in false
tripping, since naturally occurring events such as load switching
may resemble network faults. This unacceptable situation is
accentuated in microgrids as these tend to be smaller in size
and hence more prone to aperiodic and unbalanced loading
conditions; a further issue is the variability of the power
generation due to the intrinsic nature of the renewables, i.e.
intermittency of the wind and the solar sources.
In summary, integrating DGs and IBDGs into microgrids
may result in malfunctions and false tripping of the protective
devices if they are designed using the traditional approach
(overcurrent).
B. Objectives and Contributions
Motivated by the above-mentioned problems, this paper
proposes a data mining approach to protecting IBDG microgrids. The protection of a power system can be framed as a
classification task [19]: any event occurring on the network
which perturbs the system can be categorized by its type and
location. A fault is a particular kind of event for which the
new state is harmful to the electric system. In practice, a wide
range of pattern recognition techniques can be employed to
develop statistical classifiers which can be trained to detect the
presence of abnormal electric measurements. These classifiers
can be implemented inside smart relays distributed throughout
the distribution network.
The main focus of this paper is on the challenge of feature
extraction/selection (FES) [20], which is a fundamental component of the previously outlined pattern recognition scheme. It is
an important preprocessing stage that aims to identify a limited
subset of the available features which can then be used as an
input for the statistical classifier. These features are derived
from a wide range of electrical measurements gathered by the
digital relays. A basic aim of FES is to reduce the amount of
data that the classifier would have to handle; in most cases
this will lead to valuable improvements in performance and
computational time. Finally, it is also hoped that this process
will help to identify the electrical quantities that are relevant
for the detection of faults in an isolated IBDG microgrid and
that are also robust to false tripping. Specifically, the main
objectives of this paper are as follows:
1) Design and simulate a facility scale microgrid using
appropriate numerical simulations. This will be used
to generate potentially relevant features in the form of
interpretable electrical measurements.
2) The previous step might produce a very large number of
features, hence motivating the use of FES algorithms.
We will test the utility of FES methods in selecting
informative subsets from these features.
3) Two classification algorithms, i.e. Naive Bayes and
Decision Trees [21], will be applied on these feature
subsets. The performance of these tests will provide
valuable insights into the saliency of the electrical quantities used and will also help to validate the overall
methodology.
II. METHODS AND DATA
The single line diagram shown in Fig. 1 is an example of
a facility microgrid and will henceforth be referred to as M.
M has previously been used to investigate the stability and
viability of load sharing between inverters [22]. More details
about the system under study will be given in Section II-C.
Placing relays at both ends of each line section
in M is necessary if the purpose is to disconnect
only the line involved in a fault, leaving the remaining portion
of the microgrid energized. For a standalone system, shutting
down all DGs for a fault condition will de-energize the whole
microgrid. Thus, the possibility of detecting and isolating
faults is an attractive feature since unaffected portions of the
standalone system can remain energized.
Microprocessor-based relays are powerful and flexible enough to
run the advanced data analysis algorithms developed
in this paper. These algorithms are required to perform the
following two fundamental protection tasks of fault detection:
• Fault discrimination refers to the capability of distinguishing
between fault conditions and any other normal events in
the network.
• Fault localization indicates the ability of the relay to
identify the point in the microgrid system where the
anomalous event has occurred.
[Figure: single line diagram of the microgrid showing relays 1 to 4 and the loads Load1 and Load2.]
Fig. 1. Example of IBDGs microgrid used during the simulation study. The
relays are numbered as used in the paper.
The purpose of this paper is to find suitable features from
electrical measurements in order to carry out the above protective tasks using classifiers. In the case of fault localization, a
relay is not required to identify the exact position of the fault
in the grid, but only the zone in which the fault has occurred.
It is worth noting that although fault locators and protective
devices are closely related, they differ mainly in the
accuracy of fault location, the speed of determining the
fault position and the data window used [23]. Methods such
as distributed voltage measurement [24] and single ended
methods [25] are used for finding the exact fault position
accurately and not only for indicating the general area or zone.
Protective relaying is the main scope of this paper.
A. Statistical Classifiers
The general form of a classifier which is to be implemented
in each of the relays is depicted in Fig. 2. The multidimensional time series of voltage and current, i.e. $T = (V, I)^T$, is
measured locally by the relay and sampled at frequency $f_S$.
$T$ is considered to span the time interval $t_0 \le t < t_1$. At $t_E$,
with $t_0 \le t_E < t_1$, an event such as load switching, capacitor
bank switching or a fault is considered to occur in M.
$T$ is partitioned into (overlapping) sliding windows
$y = (y_1, \dots, y_L)^T$ which are processed by a feature extraction
block. $L$ denotes the length of each window in terms of
number of samples while $\tau_L$ is its time span. The feature
extraction block computes for each $y$ a K-dimensional vector
$x = (x_1, \dots, x_K)^T$ of features which is used as an input
to the statistical classifier. This statistical classifier ideally
maps the corresponding vector $x$ (and hence the corresponding
sliding window) to a nested classification space $C$ defined by
the two protection tasks. As shown in Fig. 2, this mapping
is implemented by two separate discrimination functions,
$g_F(\cdot)$ and $g_L(\cdot)$, as opposed to a single global and more
complex $g(\cdot)$. Each protective task can be specifically
optimized in this manner. $g_F(\cdot)$ maps $x$ to either the fault or the non-fault
class $(\omega_F, \omega_{NF})$. $g_L(\cdot)$ maps $x$ to either the in-zone or the out-of-zone
class $(\omega_{IN}, \omega_{OUT})$.
[Figure: pre-fault and post-fault sliding windows around $t_E$ feed a Feature Extraction - Selection block; Fault Detection separates Fault from Non Fault, and Fault Location separates In zone from Out zone.]
Fig. 2. Sliding windows approach for the voltage and current time series
measured by each relay. Each sliding window is classified accordingly. The
first window after the event is taken for training the classifiers.
The training of the statistical classifiers is based on an
ensemble of N fault and non-fault time series $T$ collected
from M. The training dataset is denoted as D and consists of a
set of N feature vectors $x_D$ extracted from the corresponding
sliding windows $y_D$ in $T$. Each $y_D$ is taken as the first $L$
samples recorded after the occurrence of the event at $t_E$, as shown
in Fig. 2. Once the classifier is trained, each of the relays can
distinguish normal events from faulty ones by recording and
analyzing successive windows of electrical measurements.
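The windowing step can be sketched in Python. This is a minimal illustration and not the authors' implementation: the function names, the 50% overlap and the 50 Hz test signal are assumptions made for the example, while the sampling rate (8 kHz) and window span (50 ms) are the values reported in Section II-D, which give L = 8000 × 0.05 = 400 samples per window.

```python
import math

def sliding_windows(series, length, step):
    """Yield overlapping windows of `length` samples, advancing by `step`."""
    for start in range(0, len(series) - length + 1, step):
        yield series[start:start + length]

def rms(window):
    """Root-mean-square value of one window (an mVA/mIA-style feature)."""
    return math.sqrt(sum(v * v for v in window) / len(window))

def training_window(series, event_index, length):
    """First `length` samples recorded after the event (the training window)."""
    return series[event_index:event_index + length]

F_S = 8000   # sampling frequency in Hz (Section II-D)
L = 400      # samples per window: 50 ms at 8 kHz

# Hypothetical test signal: a unit-amplitude 50 Hz sine (frequency is assumed).
signal = [math.sin(2 * math.pi * 50 * n / F_S) for n in range(1600)]

windows = list(sliding_windows(signal, L, step=L // 2))  # 50% overlap (assumed)
features = [rms(w) for w in windows]
```

Each entry of `features` corresponds to one window, so a relay would hand the classifier one feature vector per window as new samples arrive.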
In this paper, D is collected from simulations of M using
Matlab (Simulink). This methodology differs from the standard pattern recognition practice where the learning process
is supported by data gathered from the real system. This is
necessary since a faulty event is harmful to the distribution
system and thus cannot be safely reproduced for learning
purposes. The machine learning approach to protection as
discussed in this paper can be described as learning from
simulation: the EMTD simulator provides an unlimited amount
of data, thus shifting the problem from one of data collection
to the problem of data generation.
A wide range of machine learning techniques can be used in
this context, though the relative performance of each technique
tends to be system and data dependent. Previous experience in
implementing islanding detection methods for microgrids has
motivated this paper to pursue and compare the Naive Bayes
and the j48 decision tree classifiers [26] [27]. The practical
realization of these methods as well as the FES analysis is done using the Waikato Environment for Knowledge
Analysis tool (WEKA) [28]. WEKA implements subroutines
for some of the most common machine learning algorithms.
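The Naive Bayes is a simple and fast probabilistic classifier,
often taken as a starting point when analysing big datasets. It assumes
statistical independence of the features given the class, i.e.
$p(x_1, \dots, x_K \mid \omega) = \prod_{i=1}^{K} p(x_i \mid \omega)$, where K is the total number of features used. The Naive
Bayes classifier is based on maximizing the posterior:
$$p(\omega \mid x_1, \dots, x_K) = \frac{1}{Z} \prod_{i=1}^{K} p(x_i \mid \omega) \, P(\omega) \qquad (1)$$
where $Z$ is the evidence scaling factor and $p(x_i \mid \omega) = n_{x_i|\omega} / n_\omega$,
with $n_{x_i|\omega}$ the number of data points with value $x_i$ in class $\omega$ and $n_\omega$
the number of data points in class $\omega$.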
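The posterior of Eq. (1) can be illustrated with a small count-based sketch. It assumes that the features have already been discretised into bins (the labels "low" and "high" below are hypothetical), estimates each class-conditional probability as the relative frequency described above, and is not the WEKA implementation used in the paper.

```python
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    """Count class occurrences and per-class feature values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    for x, w in zip(samples, labels):
        for i, v in enumerate(x):
            value_counts[(w, i)][v] += 1
    return class_counts, value_counts

def posterior(x, class_counts, value_counts):
    """Posterior over classes: prior times product of relative frequencies."""
    total = sum(class_counts.values())
    scores = {}
    for w, n_w in class_counts.items():
        score = n_w / total                           # prior P(w)
        for i, v in enumerate(x):
            score *= value_counts[(w, i)][v] / n_w    # n_{x_i|w} / n_w
        scores[w] = score
    z = sum(scores.values())                          # evidence scaling factor Z
    return {w: s / z for w, s in scores.items()} if z > 0 else scores

# Hypothetical toy data: two discretised features per event.
samples = [("low", "high"), ("low", "high"), ("high", "high"),
           ("high", "low"), ("high", "high"), ("low", "low")]
labels = ["fault", "fault", "fault", "normal", "normal", "normal"]

model = train_naive_bayes(samples, labels)
post = posterior(("low", "high"), *model)
predicted = max(post, key=post.get)
```

A value never seen for a class gives a zero factor in this sketch; practical implementations usually smooth the counts to avoid that.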
The j48 decision tree is an enhanced version of the C4.5
algorithm implemented in WEKA. It belongs to the family of
discriminative classifiers (Naive Bayes is generative). It has
generally better performance than Naive Bayes but is computationally more intensive. While other types of classifiers can
be employed for FES analysis, these complementary properties
make these two classifiers sufficient for the investigation in this paper. As
discussed in the next section, the FES analysis is applied to
find the best set of electrical features and check the efficiency
of these two different learning methods applied to the dataset.
B. Feature Extraction and Selection
Given the dataset D of N sliding windows yD in T ,
seeking the set of features x to give as an input to the statistical
classifiers can in general be decomposed into two steps: feature
extraction and feature selection.
Feature extraction refers to the generation of a number of
numerical indexes which capture informative aspects of the
system being measured, referred to as features or attributes.
In this paper, traditional electrical measurements are computed
as listed in Table I.
Unfortunately, many of these
features often turn out to be redundant, overly noisy or otherwise
inappropriate for use in classification. Feature selection is the
process whereby a subset of the K “best” features is identified;
in this context, the “goodness” of a feature may be determined
in a number of ways, which in turn has led to the creation of a
variety of different feature selection approaches.
TABLE I
Electrical features used in the analysis
Features                  Description
mVA, mVB, mVC             RMS voltages in y
mIA, mIB, mIC             RMS currents in y
thdVA, thdVB, thdVC       Voltages total harmonic distortions
thdIA, thdIB, thdIC       Currents total harmonic distortions
thetaA, thetaB, thetaC    Power factor angles
V0, Vplus, Vneg           Voltages symmetrical components
I0, Iplus, Ineg           Currents symmetrical components
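The paper does not state how the symmetrical components in Table I are computed. A minimal sketch, assuming the standard Fortescue transform applied to the three phase phasors (the function name and the example phasors are hypothetical), is:

```python
import cmath
import math

# 120 degree rotation operator used by the Fortescue transform.
A = cmath.exp(2j * math.pi / 3)

def symmetrical_components(pa, pb, pc):
    """Return (zero, positive, negative) sequence phasors of phases a, b, c."""
    zero = (pa + pb + pc) / 3
    positive = (pa + A * pb + A * A * pc) / 3
    negative = (pa + A * A * pb + A * pc) / 3
    return zero, positive, negative

# Hypothetical balanced set with a-b-c phase rotation and unit magnitude.
va = cmath.rect(1.0, 0.0)
vb = cmath.rect(1.0, -2 * math.pi / 3)
vc = cmath.rect(1.0, 2 * math.pi / 3)
v0, vplus, vneg = symmetrical_components(va, vb, vc)
```

For a balanced set only the positive sequence is non-zero, so non-zero negative or zero sequence magnitudes are a natural indicator of unbalance.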
Since the number of possible subsets is exponential in the
number of features, a number of heuristic approaches have
been devised to facilitate the process of feature selection.
These can be divided into two main types: wrappers and filters.
1) Wrapper based methods: A wrapper is an FES technique which is based on the generalization property of a
learning machine: given a statistical classifier, the best subset
of features is the one with the lowest classification error
amongst all possible subsets. Exhaustively searching the full
space of subsets is exponential in the number of features and is
hence impractical in most realistic scenarios. Heuristic
or stochastic search techniques are therefore frequently used and can
provide an acceptable compromise between the quality of the
resulting features and computational time. In this study, a
greedy best-first algorithm is employed to search the full
F-dimensional space.
2) Filter based methods: These methods work by ranking
each of the F features individually based on their information
content with respect to the target class. A variety of statistical
metrics are used to measure this content. In this paper, we
employ a method known as Information Gain (IG), which is an
information theoretic measure based on the concept of Mutual
Information. It is defined as:
$$I(x_j, \omega) = H(\omega) - H(\omega \mid x_j), \qquad (2)$$
where $\omega$ is the class label, i.e. $(\omega_F, \omega_{NF})$ or $(\omega_{IN}, \omega_{OUT})$,
and $H(\omega)$ and $H(\omega \mid x_j)$ are the marginal and conditional
entropies, respectively, of the class label.
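As an illustration of Eq. (2), the following sketch computes the Information Gain of a single feature from empirical frequencies. It assumes the feature values are already discretised (continuous measurements would first need binning) and uses hypothetical toy data; it is not the WEKA routine used in the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Empirical entropy of the class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Marginal class entropy minus class entropy conditioned on the feature."""
    n = len(labels)
    groups = {}
    for v, w in zip(feature_values, labels):
        groups.setdefault(v, []).append(w)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: one informative and one uninformative feature.
labels = ["fault", "fault", "normal", "normal"]
informative = ["high", "high", "low", "low"]
uninformative = ["high", "low", "high", "low"]

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
```

A feature that determines the class recovers the full class entropy, while a feature independent of the class scores zero, which is what makes the score usable for ranking.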
A filter method gives an indication of the relevance of
each single feature and it is an important preliminary step
of analysis. However, it may fail to take into account the
effects of redundancy in subsets of variables. A simple method
to consider the relevance of groups of features is to form
nested subsets of features (subset-filter method):
$\{x_1\}, \{x_1, x_2\}, \dots, \{x_1, \dots, x_F\}$ with $I(x_1, \omega) > I(x_2, \omega) > \dots > I(x_F, \omega)$. Each of these nested subsets can be evaluated
using the classification error of a learning machine.
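The subset-filter idea can be sketched as follows. The helper names and the error function are hypothetical placeholders: in practice the error of each subset would come from cross-validating a classifier on the corresponding columns.

```python
def nested_subsets(ranked_features):
    """Prefixes of a ranking: {x1}, {x1, x2}, ..., {x1, ..., xF}."""
    return [ranked_features[:k] for k in range(1, len(ranked_features) + 1)]

def best_nested_subset(ranked_features, error_of):
    """Pick the prefix with the lowest error according to `error_of`."""
    return min(nested_subsets(ranked_features), key=error_of)

# Ranking by decreasing Information Gain (illustrative feature names).
ranking = ["Vneg", "Ineg", "Iplus", "mIA"]

# Hypothetical error function that happens to favour three features.
def toy_error(subset):
    return abs(len(subset) - 3)

chosen = best_nested_subset(ranking, toy_error)
```

Only F subsets are evaluated instead of all possible ones, at the cost of never testing combinations that skip a highly ranked feature.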
Similarly, a method applied in this paper and known as the
filter-wrapper method is designed to combine the best
qualities of both filters and wrappers. A subset of the top
M relevant variables based on the filter results is retained.
A wrapper method takes the N × M matrix of reduced
data and can search the space of $2^M - 1$ feature subsets to
find the best possible subset of K elements with the lowest
classification error. With M reasonably low, an exhaustive
search is performed as well as a greedy best-first search, as
discussed in the results section.
The classification errors were computed using 5-fold cross
validation, i.e. using only the dataset D. The errors in WEKA
are computed as the mean of the sum squared errors in each
of the folds.
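The exhaustive filter-wrapper search with cross validation can be sketched as below. Several parts are assumptions made for illustration: a nearest-neighbour rule stands in for the Naive Bayes and j48 learners, the error is the plain misclassification rate rather than the WEKA error measure, the folds are interleaved, and the data are synthetic.

```python
from itertools import combinations

def kfold_error(rows, labels, columns, classify, k=5):
    """Mean misclassification rate over k folds, using only `columns`."""
    n = len(rows)
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        train = [([rows[i][c] for c in columns], labels[i]) for i in train_idx]
        wrong = sum(
            classify(train, [rows[i][c] for c in columns]) != labels[i]
            for i in test_idx
        )
        errors.append(wrong / len(test_idx))
    return sum(errors) / k

def nearest_neighbour(train, x):
    """Stand-in learner: label of the closest training vector."""
    return min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))[1]

def exhaustive_wrapper(rows, labels, candidate_columns, classify, k=5):
    """Evaluate every non-empty subset of the candidate columns."""
    best = None
    for size in range(1, len(candidate_columns) + 1):
        for subset in combinations(candidate_columns, size):
            err = kfold_error(rows, labels, subset, classify, k)
            if best is None or err < best[0]:
                best = (err, subset)
    return best

# Synthetic data: column 0 separates the classes, columns 1 and 2 do not.
rows = [(0.01 * i, (i * 7 % 10) / 10, (i * 3 % 10) / 10) for i in range(10)]
rows += [(1.0 + 0.01 * i, (i * 7 % 10) / 10, (i * 3 % 10) / 10) for i in range(10)]
labels = ["normal"] * 10 + ["fault"] * 10

best_error, best_subset = exhaustive_wrapper(rows, labels, [0, 1, 2], nearest_neighbour)
```

With M candidates the loop visits $2^M - 1$ subsets, which is why the search is only practical after the filter has reduced the candidate list.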
C. Test System
The microgrid M under study was previously depicted in
Fig. 1. M consists of three identical inverter-based DGs and
two loads, L1 and L2, which are connected to a 380 V three-bus
system. Each IBDG incorporates a three-leg Voltage-Sourced
Inverter (VSI), an LC filter, and a coupling inductor as shown
in Fig. 1. The control interface implemented in each IBDG is
based on a droop control technique. It is a well-known structure
adopted for IBDG microgrids since it allows power sharing
among the inverters in an autonomous fashion, i.e. without a
central controller.
Microgrids vary in structure, site and operation. For example, microgrids could be operated in grid-connected mode, in
islanded mode, or have the capability to operate in both modes.
In addition, the dynamic behavior of the microgrid during transient events could depend on the type of DG (inverter-based
or synchronous). Lastly, the power management strategy (master/slave versus droop) can have an effect on the performance
of the microgrid. Thus, prior to discussing the case study
results, the scope of this paper needs to be clarified. Firstly, the paper
focuses on a small scale isolated microgrid with inverter-based
DGs equipped with the droop power management approach.
Secondly, the microgrid can be divided into smaller isolated
sections, and it is assumed that each section has a sufficient
amount of power to feed its loads. Lastly, the DG interface
control is equipped with current limiters. As mentioned earlier,
for such a system, short circuit currents are not high enough to
operate protective devices such as fuses and relays.
D. Data Generation
Data is generated by numerically simulating M using
Matlab/Simulink. The switching time for fault or non-fault
events has been set to $t_E = 0.7$ s to guarantee that the
microgrid has reached a steady-state condition. A sliding window of
length $\tau_L = 50$ ms is taken from each simulation for $t > 0.7$ s
to build the dataset D. The sampling rate of the time series
is chosen as $f_S = 8$ kHz. A total of N = 21465 simulations
were conducted by varying the active and reactive
power consumption of the two loads of M in Fig. 1, i.e. Load1 and Load2.
The data generation process needs to cover faithfully the
classification space in Fig. 2, taking into account several fault
and no-fault conditions for a variety of initial conditions and
locations. The internal IBDG parameters, i.e. the inverter
control settings, are fixed in all the simulations. The following
simulation settings were considered:
• 162 microgrid initial conditions are considered. Half of
these initial conditions set the total load of the system to
consume 25% of the maximum generated power, while the
remaining half set it to 60%. This total load is then divided between
Load1 and Load2, taking into account unbalanced loading
conditions.
• 11583 load switching events are considered at either
bus 1 or bus 2. Unbalanced load switching is considered
as well.
• Capacitor switching events, for each of the initial conditions, are considered, taking the total load of the system
to have a final power factor of either 0.9 or 0.99.
• 9234 faults were simulated at different locations on line
1 and line 2. Different fault types are taken into
account, with fault resistance values Rf =
0.001, 0.01, 1, 5, 9 Ω.
The features in Table I computed from D were normalized
in order to better encode their statistical distribution and
remove the dependence on the size of the feature [20]. Each
feature is normalized by the transformation x′ = x/ |x|.
Figs. 3(a) and 3(b) show examples of simulations of
the phase A RMS current and voltage time series for load
switching, capacitor switching and a three-phase-to-ground
fault. These figures show that it is difficult to
differentiate faults from other events using only the magnitude
of the resulting currents. On the other hand, these figures show
that there could be more subtle differences in the waveforms
(using other features such as the voltage, as shown in Fig. 3(b))
which are identifiable using statistical classifiers. Fig. 3(c)
shows the distribution of the magnitude of the current mIA
(computed on the first time window with τL = 50ms) in all
datasets in D. Using only this feature to discriminate fault
from non-fault events would be extremely difficult. Similarly,
Fig. 3(d) depicts the histogram of thetaA, which highlights
the difficulty of using this feature alone for fault localization.
[Figure: (a) Current Ia and (b) Voltage Va, RMS versus Time (s), for Fault, Load and Capacitor events; (c) histogram of mIA and (d) histogram of ThetaA, counted in number of Simulations.]
Fig. 3. (a) and (b) show the current and voltage of phase A for different types of events. (c) shows the distribution of the feature mIA for fault/no fault
(blue/red in (c)). (d) shows the distribution of the feature thetaA for faults in Line 1/Line 2 (blue/red in (d)).
TABLE II
Filters results for τL = 50ms
Discrimination          Localization
Feature   IG            Feature   IG
Vneg      0.681         Iplus     0.2838375
Ineg      0.651         thetaB    0.2694525
Iplus     0.6035        Ineg      0.2611675
mIA       0.601         thetaA    0.2444775
thdVB     0.58275       thetaC    0.22349
mVC       0.57675       Vplus     0.199085
mIB       0.566         mIA       0.18167
thdVA     0.5325        mIB       0.165265
mIC       0.48625       mIC       0.1396375
mVB       0.473         thdIA     0.1116375
[Figure: Fault Discrimination; Index InfoGain per feature for Relays 1 to 4. Features on the horizontal axis, in order: Ineg, Vneg, Iplus, mIA, mIB, thdVB, mVC, mIC, thdVA, Vplus, mVB, mVA, thdVC, thetaA, V0, I0, thdIA, thetaC, thetaB, thdIB, thdIC.]
Fig. 4. Value of IG across relays for τL = 50ms for fault discrimination
III. RESULTS
The results section is divided into two parts: Section III-A
highlights the relevance of the selected features for the
protection of the microgrid against faults. Section III-B discusses
the behavior and the computational performance of the various
FES methods.
A. Robust Features for Fault Discrimination and Localization
1) Filter analysis: Figs. 4 and 5 show the InfoGain (IG)
scores for fault discrimination and localization when using
a time window of τL = 50ms. This analysis is performed
independently in each of the relays. Since each of the relays
displays different IG values, the features in these figures are
ranked by their maximum IG computed across relays. Table II
provides the overall standing of the 10 best features, ranked
by the mean IG across the relays.
By visual inspection, these results suggest that localization
is a harder problem to solve than discrimination. Figs. 4
and 5 indicate that discrimination-specific features tend to
have higher IG values compared to localization ones. In
fact, Table II reports that the first 5 features have IG ∼ 0.6
for discrimination but IG ∼ 0.2 for localization. Secondly,
localization features are also more variable across relays than
discrimination ones, as can be seen in both figures.
Interestingly, both Figs. 4 and 5 show that the symmetrical
components of V and I tend to have very high values of IG,
indicating that these features contain a lot of information regarding both the presence and location of faults. In particular,
symmetrical components of I are in the top 3 positions in both
discrimination and localization values of IG, as can be seen in
Table II. Symmetrical components are widely used in short
circuit analysis and, used in conjunction with
other features, they become important for
the problem of fault detection. For fault localization only,
the phase angles also seem to be relevant according to the
results. Table II shows that phase angles are ranked in the
first 5 positions together with symmetrical components. Since
these features contain information about power flow directions,
it is reasonable that they may contain information about the
location of a fault. For fault discrimination, the total harmonic
distortion of the voltage also has some relevance. As cited in [5], non-linearity due to the saturation
effect of the power electronics during faults may be a possible
reason for its presence among the best features.
[Figure: Fault Location; Index InfoGain per feature for Relays 1 to 4. Features on the horizontal axis, in order: Iplus, Ineg, thetaB, thetaA, thetaC, mIA, mIB, Vplus, mIC, thdIA, mVC, Vneg, thdVB, thdIB, thdVC, thdVA, I0, mVA, mVB, thdIC, V0.]
Fig. 5. Value of IG across relays for τL = 50ms for fault localization
These results are already of significant interest since they
indicate new candidate features that could be valid substitutes for the magnitude of I, whose use is known to be an
issue for inverter-based microgrids. However, the magnitude
of I still has a role in fault detection since it appears in the
top 10 positions in Table II.
2) Wrapper analysis: Tables III and IV display the best
feature subsets and their respective classification accuracies
by using three different wrapper methods for both fault discrimination and localization. Accuracies are calculated using
5-fold cross validation. In particular, a wrapper method with
a greedy search technique having all features as initial set
and two wrappers with greedy and exhaustive searches with
the top 10 features in Table II as initial set are considered.
The results of these three wrappers are compared to the best
classifiers built by using the full set of features as input and to the
one which uses only the top 10 features in Table II.
As for the filter analysis, since wrappers are applied independently for each of the relays, a procedure to "average" the
individual outcomes is considered. Tables III and IV report the
final subsets of features that are common to at least 2 of the
relays. These final features are listed following the ranking of
Table II to better visualize common items.
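The consensus rule described above can be sketched in a few lines. The function name and the per-relay subsets below are hypothetical; only the rule itself (keep features selected by at least two relays, listed in Table II order) follows the text, and the placement of features outside the ranking is an assumption.

```python
from collections import Counter

def consensus_subset(per_relay_subsets, ranking, min_relays=2):
    """Features chosen by at least `min_relays` relays, in ranking order."""
    counts = Counter(f for subset in per_relay_subsets for f in set(subset))
    common = {f for f, c in counts.items() if c >= min_relays}
    ranked = [f for f in ranking if f in common]
    # Features absent from the ranking are appended alphabetically (assumed).
    unranked = sorted(common - set(ranking))
    return ranked + unranked

# Discrimination ranking from Table II.
ranking = ["Vneg", "Ineg", "Iplus", "mIA", "thdVB",
           "mVC", "mIB", "thdVA", "mIC", "mVB"]

# Hypothetical wrapper outcomes for relays 1 to 4.
per_relay = [
    ["Vneg", "mIA", "mVA"],
    ["Vneg", "Ineg"],
    ["Ineg", "mIA", "thetaA"],
    ["Vneg", "thetaA"],
]

final_features = consensus_subset(per_relay, ranking)
```

Here `mVA` is dropped because only one relay selected it, while `thetaA` is kept but listed last because it is outside the top 10 ranking.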
The results of Tables III and IV highlight the following.
Firstly, the wrapper accuracies are similar for a given
detection task and classifier. Secondly, there are minor differences between the various wrappers in terms of selected
features: different wrappers in each of the fault detection tasks
seem to broadly converge to similar sets of features. These
results suggest that some of these features, being method independent, are more important than others for fault detection.
From the composition of the selected subsets, these tables
seem to confirm some of the preliminary remarks made in the
previous filter analysis: while the magnitude of current (and
voltage) are among the most frequently selected features, other
electrical measurements may help to improve classification
accuracy. Some of the symmetrical components are repeatedly
present in both fault discrimination (Vneg, Ineg and Iplus) and
localization (Iplus, Ineg and Vplus) in Tables III and IV. All
the three phase angles are included in the best feature subsets
for the fault localization problem emphasizing their relevance
for this particular task. The total harmonic distortion terms
are showing up for both discrimination and localization tasks.
However, they do not seem to be occurring consistently across
the tables. Thus, while present in the tables, total harmonic
distortion terms may be less relevant than symmetrical components and phase angles for fault detection.
B. Statistical Classifier Based Relays
Tables III and IV display information about the different performance of both classifiers and wrappers. Firstly, as
discussed, for both classifiers the results of greedy searches
are comparable to the computationally expensive exhaustive
search (when applied to a subset of the data). Therefore, a
reduced greedy search, i.e. the filter-wrapper method, may be
an attractive choice from a computational perspective since the
final results are broadly equivalent. Secondly, the j48 achieves
better results than the Naive Bayes in both fault discrimination
and localization problems (j48 accuracies are close to 100%).
This makes the j48 the best choice for both tasks. As depicted
in Table IV, the difference between classifiers is greatest in
the case of fault localization as the Naive Bayes accuracies
are reduced to about 70%. Since classification performance for
the j48 is also slightly higher for fault discrimination (∼99%
instead of ∼98%), the previous tables confirm the observation
previously made in Section III-A that fault localization is a
more difficult problem than discrimination.
The different behaviors of the two classifiers can be further
inspected by the help of Figs. 6-9. These figures plot the
accuracies across relays by using the subset-filter method as
explained in Section II-B: a series of nested subsets with an
increasing number of features are constructed from the ranking
of the IG values taken from Table II.
As shown in Figs. 6 and 7, the Naive Bayes produces a result
that is consistent with the main motivation behind a FES
analysis. The classifier reaches its maximum accuracy between 5
and 10 features. Increasing the number of features beyond these
values degrades its performance. Therefore, feature selection is
a necessary practice for Naive Bayes in order to optimize its
classification accuracy. The subset-filter method, for instance,
may be used as a valid approach for the Naive Bayes to find the
optimal number of features. The behavior of the j48 is different
from that of the Naive Bayes, as its classification accuracy
increases as the number of features fed into the decision tree
increases, as shown in Figs. 8 and 9.
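The subset-filter method used for Figs. 6-9 can be sketched as follows: features are ranked by information gain (IG) and a series of nested subsets (top 1, top 2, ...) is built from the ranking, each of which would then be passed to the classifier. This is a minimal illustration that assumes already-discretized features; the toy data and feature values are hypothetical, and the discretization applied to continuous measurements in the actual study is not reproduced here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """IG of one discrete feature: H(class) - H(class | feature)."""
    total = len(labels)
    conditional = 0.0
    for v in set(values):
        subset = [lab for x, lab in zip(values, labels) if x == v]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def nested_subsets(columns, labels):
    """Rank features by IG and return the nested subsets top-1, top-2, ..."""
    ranking = sorted(
        columns,
        key=lambda name: information_gain(columns[name], labels),
        reverse=True,
    )
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


# Toy, already-discretized data (hypothetical values, not from the paper).
labels = ["fault", "fault", "normal", "normal"]
columns = {
    "Vneg": ["high", "high", "low", "low"],   # separates the classes perfectly
    "mIA": ["high", "low", "low", "low"],     # partially informative
    "thdIA": ["low", "high", "low", "high"],  # uninformative
}

subsets = nested_subsets(columns, labels)
for subset in subsets:
    print(len(subset), subset)
```

Plotting the classifier accuracy obtained on each nested subset against the subset size gives curves of the kind shown in Figs. 6-9.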
TABLE III
Best subsets using wrappers for the fault discrimination problem

| Classifier | Method | Accuracy (%) | Features |
|---|---|---|---|
| Naive Bayes | Classifier with all features | 95.834 | ALL |
| Naive Bayes | Classifier with top 10 features from filter | 97.042 | Vneg Ineg Iplus mIA thdVB mVC mIB thdVA mIC mVB |
| Naive Bayes | Greedy search wrapper from all features | 97.835 | Vneg Ineg mIA thdVB mVC mIB mVA thetaA thetaC |
| Naive Bayes | Greedy search filter-wrapper using only top 10 features | 97.812 | Vneg Ineg mIA thdVB mVC mIB mVB |
| Naive Bayes | Exhaustive search filter-wrapper using only top 10 features | 97.812 | Vneg Ineg mIA thdVB mVC mIB mVB |
| J48 | Classifier with all features | 99.859 | ALL |
| J48 | Classifier with top 10 features from filter | 99.802 | Vneg Ineg Iplus mIA thdVB mVC mIB thdVA mIC mVB |
| J48 | Greedy search wrapper from all features | 99.857 | Vneg Iplus mVC mIB mIC mVB mVA thdIB V0 thetaB thetaC |
| J48 | Greedy search filter-wrapper using only top 10 features | 99.816 | Vneg Ineg Iplus thdVB mVC mIB thdVA mIC mVB |
| J48 | Exhaustive search filter-wrapper using only top 10 features | 99.816 | Vneg Ineg Iplus thdVB mVC mIB thdVA mIC mVB |
TABLE IV
Best subsets using wrappers for the fault localization problem

| Classifier | Method | Accuracy (%) | Features |
|---|---|---|---|
| Naive Bayes | Classifier with all features | 66.459 | ALL |
| Naive Bayes | Classifier with top 10 features from filter | 67.621 | Iplus thetaB Ineg thetaA thetaC Vplus mIA mIB mIC thdIA |
| Naive Bayes | Greedy search wrapper from all features | 68.687 | Iplus thetaB Ineg thetaA thetaC mIA mVB mVC |
| Naive Bayes | Greedy search filter-wrapper using only top 10 features | 69.388 | Iplus thetaB Ineg thetaA thetaC Vplus |
| Naive Bayes | Exhaustive search filter-wrapper using only top 10 features | 67.912 | Iplus thetaB Ineg thetaA thetaC Vplus mIA |
| J48 | Classifier with all features | 98.732 | ALL |
| J48 | Classifier with top 10 features from filter | 98.482 | Iplus thetaB Ineg thetaA thetaC Vplus mIA mIB mIC thdIA |
| J48 | Greedy search wrapper from all features | 98.611 | Iplus thetaB thetaA thetaC Vplus mIA mIB mVA mVB mVC thdVB V0 Vneg |
| J48 | Greedy search filter-wrapper using only top 10 features | 98.499 | Iplus thetaB Ineg thetaA thetaC Vplus mIA mIB mIC thdIA |
| J48 | Exhaustive search filter-wrapper using only top 10 features | 98.499 | Iplus thetaB Ineg thetaA thetaC Vplus mIA mIB mIC thdIA |

Fig. 6. Filter-Subset results for the Naive Bayes for fault discrimination (classification accuracy versus number of input features, one curve per relay, Relays 1 to 4).
Fig. 7. Filter-Subset results for the Naive Bayes for fault localization (classification accuracy versus number of input features, one curve per relay, Relays 1 to 4).
Fig. 8. Filter-Subset results for the j48 for fault discrimination (classification accuracy versus number of input features, one curve per relay, Relays 1 to 4).
Fig. 9. Filter-Subset results for the j48 for fault localization (classification accuracy versus number of input features, one curve per relay, Relays 1 to 4).
This result can also be seen in Tables III and IV, where the
accuracy of the j48 classifiers built using all features is
higher than that of any of the wrapper methods proposed, i.e. a
maximum of 99.859% for discrimination and of 98.732% for
localization. The behavior of the j48 can be explained by
referring to the algorithm for decision tree induction. The
building of the tree already incorporates some built-in feature
selection procedures: the j48 exploits and prunes the set of
features given as input, producing an optimal subset of
attributes which sit at the decision nodes of the tree.
Therefore, it seems from these figures that decision tree methods
may not need a separate feature selection approach. In order to
save computational time and to obtain a higher classification
accuracy, all features may be given to the decision tree, which
will apply its own FES method.
However, as shown in Fig. 10, FES may be a reasonable step to
take in order to regulate the complexity of the j48 decision
trees. Fig. 10 presents two factors that determine the
complexity of a tree: its size, i.e. the number of nodes, and the
number of features used at its decision nodes. Fig. 10 plots both
quantities, averaged across relays, as a function of the size of
the nested subsets and for the localization task only. Similar
considerations apply to the fault discrimination task as well.

Fig. 10. Measure of the complexity of the j48 trees for the filter-subset
method in the case of fault discrimination (panel title: j48 Tree Complexity; axes: Size of the Tree and Features on the Tree versus Number of Input Features).

Fig. 10 shows that the size of the trees decreases and seems to
stabilize beyond 10-15 features, while the number of features
used in the tree increases almost linearly as the size of the
nested subsets increases. Therefore, using the j48 without FES
may produce a tree with a high number of decision features (high
complexity). However, inspecting both Fig. 9 and Fig. 10 shows
that similar accuracies can be achieved by building a tree with
fewer decision features (but a slightly higher number of nodes).
Since decision trees are well known to be prone to overfitting,
a tree with lower complexity may be a better choice. In practice,
depending on the computational requirements of the problem, FES
analysis may still be a useful step to conduct for the j48
classifier in order to avoid overfitting.
Table V reports the tree complexities in the case of fault
localization for the same wrapper methods considered in Table IV.
Table V shows that all wrappers achieve their best performance by
building trees which can be roughly positioned in Fig. 10 at the
point of intersection of the two curves. This outcome indicates
that the wrapper methods select trees with a lower complexity
than the j48 with the full feature set, while achieving similar
performance.

TABLE V
Complexity of decision trees using wrappers

| Method | Size | Num. Features |
|---|---|---|
| Greedy wrapper from all features | 223 | 10 |
| Greedy filter-wrapper from top 10 | 226 | 9 |
| Exhaustive filter-wrapper from top 10 | 243 | 7 |

IV. DISCUSSION
The main results from the simulation of a facility scale
microgrid with IBDGs can be summarized as follows:
• Symmetrical components are among the best features for
both fault detection tasks, while phase angles are relevant
for fault localization. Employing these quantities helps to
avoid some of the problems associated with using only
the magnitude of the current as a feature.
• The feature selection method depends on the choice
of the classifier and on the computational requirements
set before the analysis. FES analysis is shown to be
important for both classifiers as it may help regulate their
computational complexity. The j48 outperforms the Naive
Bayes in both tasks and is thus considered the better
choice.
• The FES methods described in this paper can be taken
together as a general framework for enhancing data
mining solutions in microgrids: a filter is an excellent
starting point since the feature ranking provides a first
indication of the most relevant attributes. This ranking can
then be explored by a subset-filter method to eliminate
irrelevant features and to check the complexity
of the classifier. Lastly, a reduced greedy search may be
used as a final step when selecting the best set of features.
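The two complexity measures plotted in Fig. 10 and reported in Table V, the size of the tree and the number of features used at its decision nodes, can be illustrated with a minimal sketch. The nested-dictionary tree representation and the example tree are hypothetical, and counting leaves as nodes is an assumption about how the tree size is defined.

```python
def tree_size(node):
    """Number of nodes in the tree, counting decision nodes and leaves."""
    if "feature" not in node:  # a leaf only carries a class label
        return 1
    return 1 + sum(tree_size(child) for child in node["children"])


def features_used(node):
    """Set of distinct features tested at the decision nodes."""
    if "feature" not in node:
        return set()
    used = {node["feature"]}
    for child in node["children"]:
        used |= features_used(child)
    return used


# Hypothetical localization tree: the same feature may be tested more than
# once, so the tree size and the number of features are different quantities.
toy_tree = {
    "feature": "Iplus",
    "children": [
        {"label": "line 1"},
        {
            "feature": "thetaA",
            "children": [
                {"label": "line 2"},
                {
                    "feature": "Iplus",
                    "children": [{"label": "line 2"}, {"label": "line 3"}],
                },
            ],
        },
    ],
}

size = tree_size(toy_tree)
used = features_used(toy_tree)
print("size of the tree:", size)
print("features on the tree:", sorted(used))
```

Because a feature can be reused at several decision nodes, a tree may grow in size while the number of distinct features stays fixed, which is why the two curves in Fig. 10 can move in opposite directions.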
Given the previous results, a few issues can be further
investigated. Firstly, in order to obtain a more generally
applicable result, the microgrid analyzed should be expanded in
size, i.e. a larger number of lines, and should be integrated with
other types of elements such as motor switching or nonlinear
loads. Large scale microgrids can further clarify the role of
symmetrical components and phase angles as discriminatory
features as well as the behavior of the two statistical
classifiers. The j48 overfitting problem in particular needs to be
further assessed with a larger group of input features. Secondly,
generating data may easily become infeasible for such large
microgrids. Machine learning techniques require "good" examples
to train the statistical classifier based relays. The problem of
deciding what constitutes a "good" example is crucial. In this
paper some typical cases of load switching and faults were
simulated numerically as common sense examples. Finding
an intelligent algorithm which either samples the space of
possible simulations or gives an indication of what to simulate
(a "good" example) may save computational resources and make the
classifiers more robust against hard classification cases.
V. CONCLUSION
This paper is motivated by the problem of protecting an
isolated IBDG microgrid against faults. Traditional overcurrent
protection schemes may fail due to the limited magnitude of the
fault currents circulating in the microgrid. The main
contribution of this paper is a data mining framework which
extracts and selects electrical features that discriminate faults
more reliably than the current magnitude used by traditional
overcurrent methods. These features are used as input to
statistical classifiers which can be implemented inside smart
digital relays. Within this data mining framework, this paper
discussed a few possible feature selection techniques based on
two statistical classifiers: Naive Bayes and j48. The simulation
of a facility scale microgrid with IBDGs is used to test the
proposed data mining framework. The main results show the
importance of using a feature selection approach for optimizing
computational time and classification performance. Moreover, the
study on this paper's dataset has shown that there are
alternatives to overcurrent features for fault discrimination and
localization, such as symmetrical components (for fault
discrimination and localization) and phase angles (for
localization).
R EFERENCES
[1] N. Jenkins, R. Allan, P. Crossley, D. Kirschen, and G. Strbac, Embedded
Generation (Power & Energy Ser. 31). INSPEC, Inc., 2000.
[2] R. H. Lasseter, “Microgrids,” in Proc. IEEE Power Engineering Society
Winter Meeting, vol. 1, 2002, pp. 305–308.
[3] R. Lasseter, J. Eto, B. Schenkman, J. Stevens, H. Vollkommer, D. Klapp,
E. Linton, H. Hurtado, and J. Roy, “CERTS microgrid laboratory test
bed,” Power Delivery, IEEE Transactions on, vol. 26, pp. 325–332, 2011.
[4] H. H. Zeineldin, E. F. El-Saadany, and M. M. A. Salama, “Distributed
generation micro-grid operation: Control and protection,” in Proc. Power
Systems Conf.: Advanced Metering, Protection, Control, Communication, and Distributed Resources PS ’06, 2006, pp. 105–111.
[5] J. Wei, H. Zheng-you, and B. Zhi-qian, “The overview of research on
microgrid protection development,” in Intelligent System Design and
Engineering Application (ISDEA), 2010 International Conference on,
vol. 2, oct. 2010, pp. 692 –697.
[6] H. J. Laaksonen, “Protection principles for future microgrids,” vol. 25,
no. 12, pp. 2910–2918, 2010.
[7] B. Hussain, S. Sharkh, S. Hussain, and M. Abusara, “Integration of
distributed generation into the grid: protection challenges and solutions,”
in Developments in Power System Protection (DPSP 2010). Managing
the Change, 10th IET International Conference on. IET, 2010, pp. 1–5.
[8] J. Keller and B. Kroposki, “Understanding fault characteristics of
inverter-based distributed energy resources,” National Renewable Energy
Laboratory, Tech. Rep. NREL/TP-550-46698, January 2010.
[9] D. Turcotte and F. Katiraei, “Fault contribution of grid-connected
inverters,” in Proc. IEEE Electrical Power & Energy Conf. (EPEC),
2009, pp. 1–5.
[10] “IEEE application guide for IEEE std 1547, IEEE standard for interconnecting distributed resources with electric power systems,” IEEE Std
1547.2-2008, pp. 1 –207, 15 2009.
[11] M. Michalik, M. ukowicz, W. Rebizant, S.-J. Lee, and S.-H. Kang, “New
ann-based algorithms for detecting hifs in multigrounded mv networks,”
Power Delivery, IEEE Transactions on, vol. 23, no. 1, pp. 58 –66, 2008.
[12] B. Russell and R. Chinchali, “A digital signal processing algorithm for
detecting arcing faults on power distribution feeders,” Power Delivery,
IEEE Transactions on, vol. 4, no. 1, pp. 132 –140, jan 1989.
[13] A. Girgis, W. Chang, and E. Makram, “Analysis of high-impedance fault
generated signals using a kalman filtering approach,” Power Delivery,
IEEE Transactions on, vol. 5, no. 4, pp. 1714 –1724, oct 1990.
[14] S.-J. Huang and C.-T. Hsieh, “High-impedance fault detection utilizing a
morlet wavelet transform approach,” Power Delivery, IEEE Transactions
on, vol. 14, no. 4, pp. 1401 –1410, oct 1999.
[15] I. Baqui, I. Zamora, J. Mazn, and G. Buigues, “High impedance fault
detection methodology using wavelet transform and artificial neural
networks,” Electric Power Systems Research, vol. 81, no. 7, pp. 1325–
1333, 2011.
[16] T. Loix, T. Wijnhoven, and G. Deconinck, “Protection of microgrids
with a high penetration of inverter-coupled energy sources,” in Proc.
CIGRE/IEEE PES Joint Symp. Integration of Wide-Scale Renewable
Resources Into the Power Delivery System, 2009, pp. 1–6.
[17] E. Sortomme, G. J. Mapes, B. A. Foster, and S. S. Venkata, “Fault
analysis and protection of a microgrid,” in Proc. 40th North American
Power Symp. NAPS ’08, 2008, pp. 1–6.
[18] R. M. Tumilty, M. Brucoli, G. M. Burt, and T. C. Green, “Approaches
to network protection for inverter dominated electrical distribution
systems,” in Power Electronics, Machines and Drives, 2006. The 3rd
IET International Conference on, mar. 2006, pp. 622 –626.
[19] N. Perera and A. D. Rajapakse, “Recognition of fault transients using a
probabilistic neural-network classifier,” vol. 26, pp. 410–419, 2011.
[20] I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, Eds., Feature Extraction, Foundations and Applications. Springer, 2006.
[21] M. A. Bramer, Principles of Data Mining, ser. Undergraduate Topics in
Computer Science. London, UK: Springer, 2007.
[22] N. Pogaku, M. Prodanovic, and T. Green, “Modeling, analysis and
testing of autonomous operation of an inverter-based microgrid,” Power
Electronics, IEEE Transactions on, vol. 22, no. 2, pp. 613 –625, 2007.
[23] M. Saha, J. Izykowski, and E. Rosolowski, Fault Location on Power
Networks, ser. Power Systems, Springer, Ed. Springer, 2009.
[24] M. Tremblay, R. Pater, and F. Zavoda, “Accurate fault-location technique
based on distributed power-quality measurements,” in Proceeding 19th
CIRED, 2007.
[25] C37.114, IEEE Guide for Determining Fault Location on AC Transmission and Distribution Lines, IEEE Power Engineering Society Std.,
2005.
[26] N. Kan’an, L. Farouk, H. Zeineldin, and W. Woon, “Effect of dg location
on multi-parameter passive islanding detection methods,” in Power and
Energy Society General Meeting, 2010 IEEE, july 2010, pp. 1 –6.
[27] W. K. A. Najy, H. H. Zeineldin, A. H. K. Alaboudy, and W. L.
Woon, “A bayesian passive islanding detection method for inverterbased distributed generation using esprit,” IEEE Transactions on Power
Delivery, vol. 26, no. 4, pp. 2687–2696, 2011.
[28] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H.
Witten, “The weka data mining software: an update,” SIGKDD Explor.
Newsl., vol. 11, no. 1, pp. 10–18, 2009.
Erik Casagrande received an M.Sc. in control engineering in 2003 and an M.Sc.
in numerical modeling in 2005 from Padova University, Italy. He received the
Ph.D. degree from the Neural Computing Research Group, Aston University,
Birmingham, U.K. in 2010. Currently, he is a postdoc with the Masdar Institute
of Science and Technology, Abu Dhabi, United Arab Emirates. His research
interests include machine learning and data mining techniques for biomedical
data, natural language processing and smart grids.
H. H. Zeineldin (M’06) received the B.Sc. and M.Sc. degrees in electrical
engineering from Cairo University, Cairo, Egypt, in 1999 and 2002, respectively, and the Ph.D. degree in electrical and computer engineering from
the University of Waterloo, Waterloo, ON, Canada. He was with Smith and
Andersen Electrical Engineering Inc., where he was involved with projects
involving distribution system design, protection, and distributed generation.
He then was a Visiting Professor at the Massachusetts Institute of Technology
(MIT), Cambridge. Currently, he is an Associate Professor with the Masdar
Institute of Science and Technology, Abu Dhabi, United Arab Emirates. His
current interests include power system protection and distributed generation.
Wei Lee Woon (M’08) received the B.Eng. degree in electronic engineering
(Hons.) from the University of Manchester Institute of Science and Technology, Manchester, U.K. (now merged with the University of Manchester),
in 1997, and the Ph.D. degree from the Neural Computing Research Group,
Aston University, Birmingham, U.K., in 2002. Upon graduation, he joined
the Malaysia University of Science and Technology (MUST) as an Assistant
Professor, where he served until 2007. He subsequently joined the Masdar
Institute of Science and Technology, Abu Dhabi, United Arab Emirates. He
has also worked as a Visiting Researcher at the Massachusetts Institute of
Technology, Cambridge, and at the RIKEN Brain Science Institute, Tokyo,
Japan. His research interests include technology mining, analysis of distributed
generation systems, and EEG signal analysis.
N. H. Kan’an (S’10) received the B.Sc. degree in electrical power engineering
from Yarmouk University, Irbid, Jordan, in 2008. He received his M.Sc. degree
in engineering systems and management from Masdar Institute, Abu Dhabi,
UAE, in 2011. Prior to joining Masdar Institute, Mr. Kan’an worked in the power
systems substations division of Asea Brown Boveri (ABB) near-east for a year.
Currently, he is working towards his Ph.D. degree in electrical engineering at
the University of Connecticut (UCONN), Storrs, USA. His current research
interests include distributed generation control, modeling and simulation, and
the real-time simulation of bulk power systems.