A Data Mining Approach to Fault Detection for Isolated Inverter-based Microgrids

E. Casagrande, W. L. Woon, Member, IEEE, H. H. Zeineldin, Member, IEEE, and N. H. Kan’an, Student Member, IEEE

Abstract: This paper investigates the problem of fault protection in a microgrid containing inverter-based distributed generators (IBDGs). Due to the low magnitude of the short circuit currents generated by IBDGs, traditional protection techniques which rely on current (fuses and overcurrent relays) may fail to protect such networks. This paper addresses the problem of finding suitable features, derived from local electrical measurements, that can be used by statistical classifiers to better discriminate fault events from normal network events. Given a series of simple electrical features, a study of feature selection and data mining techniques is conducted in the context of fault detection in isolated microgrids with inverter-based DGs. Two statistical classifiers are implemented and compared in this framework: Naive Bayes and Decision Trees. The proposed approach is tested on a facility-scale microgrid consisting of three IBDGs.

Index Terms: microgrid, inverter, fault analysis, feature selection, data mining

I. INTRODUCTION

A. Fault Protection in Microgrids

The protection of transmission and distribution lines against faults is an important step in safeguarding the electric power system. Short circuits may occur due to mechanical or natural causes and can seriously damage electrical equipment if they are not quickly and properly addressed. The occurrence of these faults cannot be completely eliminated during the planning phase of the system, and devices such as relays, circuit breakers, re-closers and fuses have traditionally been deployed for the protection of power networks. The literature extensively discusses how to optimally design, place and tune these devices for a typical 20th century power grid, i.e. a central generator and a radial distribution structure.
In the last few decades, this scenario has been revised following the public debate over climate change and the consequent demand for sustainable production of energy. Innovative solutions for Distributed Energy Resources (DERs) such as Distributed Generators (DGs), storage devices and electric vehicles have been developed to take advantage of renewable sources of energy such as solar and wind power [1]. The integration of these new elements into the grid has resulted in significant improvements in the energy efficiency of traditional transmission and distribution systems. Microgrids and Inverter-based Distributed Generators (IBDGs) are important components of this next generation of power systems.

A microgrid is a low voltage distribution system that either operates connected to the global medium voltage grid or runs as a standalone system by using local DGs in a self-sufficient and coordinated way (islanding mode) [2][3]. While IBDG microgrids are believed to be an effective approach for the future sustainable production of energy, they have also raised some issues regarding the design, management and protection of these novel types of power networks [4][5].

From a protection perspective, adding DGs to the grid has been shown to influence the fault current amplitude, direction and duration [6][7]. With IBDGs the maximum fault current generated by the inverter is limited to twice its rated current [8], whereas a synchronous machine can generate 4 to 10 times its rated current [9]. In addition, IEEE Std 1547.2 states that fault current from inverter-based distributed resources is very limited (about 100% to 200% of normal load current) [10].

This work was supported by the Masdar Institute of Science and Technology. E. Casagrande, H. H. Zeineldin and W. L. Woon are with the Masdar Institute of Science and Technology, Abu Dhabi, UAE (email: ecasagrande@masdar.ac.ae, hzainaldin@masdar.ac.ae, wwoon@masdar.ac.ae).
A similar protection problem, where fault current levels are comparable with load currents, is the case of High Impedance Faults (HIF) [11]. Methods developed for detecting HIF should be capable of distinguishing between faulty and healthy feeders. Some of the proposed approaches for HIF detection include harmonic component methods [12], Kalman filtering [13], wavelet transform [14][15] and artificial neural networks [15]. Although HIF and IBDG microgrids share the problem of low fault currents, the response of a system to a HIF differs from its response to a fault on an IBDG microgrid.

Traditional relays, which are designed to trip in the presence of high short circuit currents, may be unable to detect the lower fault currents produced by IBDGs [16][17][18]. Reducing the relay threshold is not a satisfactory alternative: it can increase false tripping, because naturally occurring events such as load switching may resemble network faults. This unacceptable situation is accentuated in microgrids, as these tend to be smaller in size and hence more prone to aperiodic and unbalanced loading conditions. A further issue is the variability of the power generation due to the intrinsic nature of the renewables, i.e. the intermittency of wind and solar sources. In summary, integrating DGs and IBDGs into microgrids may result in malfunctions and false tripping of the protective devices if they are designed using the traditional (overcurrent) approach.

B. Objectives and Contributions

Motivated by the above mentioned problems, this paper proposes a data mining approach to protecting IBDG microgrids. The protection of a power system can be framed as a classification task [19]: any event occurring on the network which perturbs the system can be categorized by its type and location. A fault is a particular kind of event for which the new state is harmful to the electric system.
In practice, a wide range of pattern recognition techniques can be employed to develop statistical classifiers which can be trained to detect the presence of abnormal electric measurements. These classifiers can be implemented inside smart relays distributed throughout the distribution network.

The main focus of this paper is on the challenge of Feature Extraction/Selection (FES) [20], which is a fundamental component of the previously outlined pattern recognition scheme. It is an important preprocessing stage that aims to identify a limited subset of the available features which can then be used as an input for the statistical classifier. These features are derived from a wide range of electrical measurements gathered by the digital relays. A basic aim of FES is to reduce the amount of data that the classifier has to handle; in most cases this leads to valuable improvements in performance and computational time. Finally, it is also hoped that this process will help to identify the electrical quantities that are relevant for the detection of faults in an isolated IBDG microgrid and that are also robust to false tripping.

Specifically, the main objectives of this paper are as follows:

1) Design and simulate a facility-scale microgrid using appropriate numerical simulations. This will be used to generate potentially relevant features in the form of interpretable electrical measurements.
2) The previous step might produce a very large number of features, hence motivating the use of FES algorithms. We will test the utility of FES methods in selecting informative subsets from these features.
3) Two classification algorithms, i.e. Naive Bayes and Decision Trees [21], will be applied to these feature subsets. The performance of these tests will provide valuable insights into the saliency of the electrical quantities used and will also help to validate the overall methodology.

II. METHODS AND DATA

The single line diagram described in Fig.
1 is an example of a facility microgrid and will henceforth be referred to as M. M has previously been used to investigate the stability and viability of load sharing between inverters [22]. More details about the system under study are given in Section II-C. Placing relays at both ends of each line section in M is necessary if the purpose is to disconnect only the line involved in a fault, leaving the remaining portion of the microgrid energized. For a standalone system, shutting down all DGs for a fault condition will de-energize the whole microgrid. Thus, the possibility of detecting and isolating faults is an attractive feature, since unaffected portions of the standalone system can remain energized.

[Figure: single line diagram with relays 1 to 4, Load1 and Load2]
Fig. 1. Example of IBDGs microgrid used during the simulation study. The relays are numbered as used in the paper.

Microprocessor-based relays are powerful and flexible enough to run the advanced data analysis algorithms developed in this paper. These algorithms are required to carry out the following two fundamental protection tasks of fault detection:

• Fault discrimination refers to the capability to distinguish between fault conditions and any other normal events in the network.
• Fault localization indicates the ability of the relay to identify the point in the microgrid system where the anomalous event has occurred.

The purpose of this paper is to find suitable features from electrical measurements in order to carry out the above protective tasks using classifiers. In the case of fault localization, a relay is not required to identify the exact position of the fault in the grid, but only the zone in which the fault has occurred. It is worth noting that although fault locators and protective devices are closely related, they differ mainly in the accuracy of the fault location, the speed of determining the faulty position and the data window used [23].
Methods such as distributed voltage measurement [24] and single ended methods [25] are used for finding the exact fault position accurately and not only for indicating the general area or zone. Protective relaying is the main scope of this paper.

A. Statistical Classifiers

The general form of a classifier which is to be implemented in each of the relays is depicted in Fig. 2. The multidimensional time series of voltage and current, i.e. $T = (V, I)^T$, is measured locally by the relay and sampled at frequency $f_S$. T is considered to span the time interval $t_0 \le t < t_1$. At $t_E$, with $t_0 \le t_E < t_1$, an event such as load switching, capacitor bank switching or a fault is considered to occur in M. T is partitioned into (overlapping) sliding windows $y = (y_1, \dots, y_L)^T$ which are processed by a feature extraction block. L denotes the length of each window in number of samples, while $\tau_L$ is its time span. The feature extraction block computes for each y a K-dimensional vector $x = (x_1, \dots, x_K)^T$ of features which is used as an input to the statistical classifier. This statistical classifier ideally maps the corresponding vector x (and hence the corresponding sliding window) to a nested classification space C defined by the two protection tasks. As shown in Fig. 2, this mapping is implemented by the two single discrimination functions $g_F(\cdot)$ and $g_L(\cdot)$, as opposed to a global and more complex $g(\cdot)$. Each single protective task can be specifically optimized in this manner.

[Figure: time axis with pre-fault and post-fault intervals around $t_E$; sliding windows $(y_1, \dots, y_L)^T$, with the training window first after the event; feature extraction and selection producing $(x_1, \dots, x_K)^T$; fault detection (fault / non-fault) followed by fault location (in zone / out zone)]
Fig. 2. Sliding windows approach for the voltage and current time series measured by each relay. Each sliding window is classified accordingly. The first window after the event is taken for training the classifiers.
$g_F(\cdot)$ maps x to either the fault or the non-fault class $(\omega_F, \omega_{NF})$. $g_L(\cdot)$ maps x to either the in-zone or the out-of-zone class $(\omega_{IN}, \omega_{OUT})$. The training of the statistical classifiers is based on an ensemble of N fault and non-fault time series T collected from M. The training dataset is denoted as D and consists of a set of N feature vectors $x_D$ extracted from the corresponding sliding windows $y_D$ in T. Each $y_D$ is taken as the first L samples recorded after the occurrence of the event at $t_E$, as shown in Fig. 2. Once the classifier is trained, each of the relays can distinguish normal events from faulty ones by recording and analyzing successive windows of electrical measurements.

In this paper, D is collected from simulations of M using Matlab (Simulink). This methodology differs from the standard pattern recognition practice where the learning process is supported by data gathered from the real system. This is necessary since a faulty event is harmful to the distribution system and thus cannot be safely reproduced for learning purposes. The machine learning approach to protection as discussed in this paper can be described as learning from simulation: the EMTD simulator provides an unlimited amount of data, thus shifting the problem from one of data collection to one of data generation.

A wide range of machine learning techniques can be used in this context, though the relative performance of each technique tends to be system and data dependent. Previous experience in implementing islanding detection methods for microgrids has motivated this paper to pursue and compare the Naive Bayes and the j48 decision tree classifiers [26][27]. The practical realization of these methods, as well as the FES analysis, is done using the Waikato Environment for Knowledge Analysis tool (WEKA) [28]. WEKA implements some of the most common machine learning algorithms.
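The window-by-window operation of a relay described in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the paper: the function names `sliding_windows`, `classify_window` and `relay_decisions`, the threshold rules standing in for $g_F$ and $g_L$, and the toy signal are all assumptions made for the example.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Features = List[float]


def sliding_windows(series: Sequence[float], length: int,
                    step: int) -> Iterator[Tuple[int, Sequence[float]]]:
    """Yield (start index, window) for windows of `length` samples.

    Windows overlap whenever step < length.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    for start in range(0, len(series) - length + 1, step):
        yield start, series[start:start + length]


def classify_window(x: Features,
                    g_fault: Callable[[Features], bool],
                    g_loc: Callable[[Features], bool]) -> str:
    """Nested decision: the location classifier runs only after a fault is detected."""
    if not g_fault(x):
        return "non-fault"
    return "fault/in-zone" if g_loc(x) else "fault/out-of-zone"


def relay_decisions(series, length, step, extract, g_fault, g_loc):
    """Apply feature extraction and the two classifiers to every window."""
    return [(start, classify_window(extract(w), g_fault, g_loc))
            for start, w in sliding_windows(series, length, step)]


# Toy demonstration with hypothetical stand-ins for the trained classifiers.
signal = [1.0] * 8 + [3.0] * 8                    # level change at sample 8
extract = lambda w: [max(abs(v) for v in w)]      # one illustrative feature
g_fault = lambda x: x[0] > 2.0                    # placeholder for g_F
g_loc = lambda x: x[0] > 2.5                      # placeholder for g_L
decisions = relay_decisions(signal, 4, 4, extract, g_fault, g_loc)
print(decisions)
```

Splitting the decision into two functions mirrors the text: each classifier can be trained and tuned for its own task, and the location classifier is never consulted for windows labelled as non-fault.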
Naive Bayes is a simple and fast probabilistic classifier, often taken as a starting point when analysing big datasets. It assumes statistical independence of the features given the class, i.e. $p(x_1, \dots, x_K \mid \omega) = \prod_{i=1}^{K} p(x_i \mid \omega)$, where K is the total number of features used. The Naive Bayes classifier is based on maximizing the posterior:

$$p(\omega \mid x_1, \dots, x_K) = \frac{1}{Z} P(\omega) \prod_{i=1}^{K} p(x_i \mid \omega) \qquad (1)$$

where Z is the evidence scaling factor and $p(x_i \mid \omega) = n_{x_i|\omega} / n_{\omega}$, with $n_{x_i|\omega}$ the number of data of $x_i$ in $\omega$ and $n_{\omega}$ the number of data of $\omega$.

The j48 decision tree is WEKA's implementation of the C4.5 algorithm. It belongs to the family of discriminative classifiers (Naive Bayes is generative). It generally has better performance than Naive Bayes but is computationally more intensive. While other types of classifiers can be employed for FES analysis, the complementary properties of these two classifiers make them sufficient for the investigation in this paper. As discussed in the next section, the FES analysis is applied to find the best set of electrical features and to check the efficiency of these two different learning methods when applied to the dataset.

B. Feature Extraction and Selection

Given the dataset D of N sliding windows $y_D$ in T, seeking the set of features x to give as an input to the statistical classifiers can in general be decomposed into two steps: feature extraction and feature selection. Feature extraction refers to the generation of a number of numerical indexes, referred to as features or attributes, which capture informative aspects of the system being measured. In this paper, traditional electrical measurements are computed as listed in Table I. Unfortunately, what often happens is that many of these features turn out to be redundant, overly noisy or otherwise inappropriate for use in classification.
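As an illustration of the feature extraction step, the sketch below computes two of the quantities used in this paper: the RMS value of a window and the symmetrical components of a three-phase set. It is a minimal example under stated assumptions, not the authors' code. In particular, it assumes that the fundamental-frequency phasors of the three phases are already available (the paper does not state how they are estimated from a window), and the function names are chosen for the example only. The symmetrical components follow the standard Fortescue transformation.

```python
import cmath
import math

# Fortescue operator a = 1 at an angle of 120 degrees.
A = cmath.exp(2j * math.pi / 3)


def rms(samples):
    """Root mean square of the samples in one window."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def symmetrical_components(pa, pb, pc):
    """Return (zero, positive, negative) sequence phasors.

    pa, pb, pc are the complex phase phasors (assumed to be given).
    """
    zero = (pa + pb + pc) / 3
    positive = (pa + A * pb + A * A * pc) / 3
    negative = (pa + A * A * pb + A * pc) / 3
    return zero, positive, negative


# A balanced positive-sequence set has only a positive-sequence component.
v0, vplus, vneg = symmetrical_components(1 + 0j, A * A, A)
print(abs(v0), abs(vplus), abs(vneg))

# A quantity present on phase A only splits equally over the three sequences.
i0, iplus, ineg = symmetrical_components(1 + 0j, 0j, 0j)
print(abs(i0), abs(iplus), abs(ineg))
```

The second case hints at why negative and zero sequence quantities are natural candidate features: they vanish for a balanced set and become non-zero as soon as the three phases are no longer balanced.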
Feature selection is the process whereby a subset of the K “best” features is identified; in this context, the “goodness” of a feature may be determined in a number of ways, which in turn has led to the creation of a variety of different feature selection approaches.

TABLE I
Electrical features used in the analysis

Features                 Description
mVA, mVB, mVC            RMS voltages in y
mIA, mIB, mIC            RMS currents in y
thdVA, thdVB, thdVC      Voltages total harmonic distortions
thdIA, thdIB, thdIC      Currents total harmonic distortions
thetaA, thetaB, thetaC   Power factor angles
V0, Vplus, Vneg          Voltages symmetrical components
I0, Iplus, Ineg          Currents symmetrical components

Since the number of possible subsets is exponential in the number of features, a number of heuristic approaches have been devised to facilitate the process of feature selection. These can be divided into two main types: wrappers and filters.

1) Wrapper based methods: A wrapper is an FES technique which is based on the generalization property of a learning machine: given a statistical classifier, the best subset of features is the one with the lowest classification error amongst all possible subsets. Exhaustively searching the full space of subsets is exponential in the number of features and is hence impractical in most realistic scenarios. Hence, heuristic or stochastic search techniques are frequently used and can provide an acceptable compromise between the quality of the resulting features and computational time. In this study, a greedy best-first algorithm is employed to search the full F-dimensional space, where F is the total number of available features.

2) Filter based methods: These methods work by ranking each of the F features individually based on their information content with respect to the target class. A variety of statistical metrics are used to measure this content. In this paper, we employ a method known as Information Gain (IG), which is an information theoretic measure based on the concept of Mutual Information.
It is defined as:

$$I(x_j, \omega) = H(\omega) - H(\omega \mid x_j), \qquad (2)$$

where $\omega$ is the class label, i.e. $(\omega_F, \omega_{NF})$ or $(\omega_{IN}, \omega_{OUT})$, and $H(\omega)$ and $H(\omega \mid x_j)$ are the marginal and conditional entropies, respectively, of the class label.

A filter method gives an indication of the relevance of each single feature and is an important preliminary step of the analysis. However, it may fail to take into account the effects of redundancy in subsets of variables. A simple method to consider the relevance of groups of features is to form nested subsets of features (subset-filter method): $\{x_1\}, \{x_1, x_2\}, \dots, \{x_1, \dots, x_F\}$ with $I(x_1, \omega) > I(x_2, \omega) > \dots > I(x_F, \omega)$. Each of these nested subsets can be evaluated using the classification error of a learning machine.

Similarly, a method applied in this paper and known as the filter-wrapper method is designed to combine the best qualities of both filters and wrappers. A subset of the top M relevant variables based on the filter results is retained. A wrapper method then takes the N × M matrix of reduced data and searches the space of $2^M - 1$ feature subsets to find the subset of K elements with the lowest classification error. With M reasonably low, an exhaustive search is performed as well as a greedy best-first search, as discussed in the results section.

The classification errors were computed using 5-fold cross validation, i.e. using only the dataset D. The errors in WEKA are computed as the mean of the sum squared errors in each of the folds.

C. Test System

The microgrid M under study was previously depicted in Fig. 1. M consists of three identical inverter-based DGs and two loads, L1 and L2, which are connected to a 380 V three bus system. Each IBDG incorporates a three-leg Voltage-Sourced Inverter (VSI), an LC filter and a coupling inductor, as shown in Fig. 1. The control interface implemented in each IBDG is based on a droop control technique.
It is a well-known structure adopted for IBDG microgrids since it allows power sharing among the inverters in an autonomous fashion, i.e. without a central controller.

Microgrids vary in structure, site and operation. For example, microgrids could be operated in grid connected mode, in islanded mode, or have the capability to operate in both modes. In addition, the dynamic behavior of the microgrid during transient events could depend on the type of DG (inverter-based or synchronous). Lastly, the power management strategy (master/slave versus droop) can have an effect on the performance of the microgrid. Thus, prior to discussing the case study results, the scope of this paper needs to be clarified. First, the paper focuses on a small scale isolated microgrid with inverter-based DGs equipped with the droop power management approach. Secondly, the microgrid can be divided into smaller isolated sections, and it is assumed that each section has a sufficient amount of power to feed its loads. Lastly, the DG interface control is equipped with current limiters. As mentioned earlier, for such a system, short circuit currents are not high enough to operate protective devices such as fuses and relays.

D. Data Generation

Data is generated by numerically simulating M using Matlab/Simulink. The switching time for fault or non-fault events has been set to $t_E = 0.7$ s to guarantee that the microgrid has reached a steady-state condition. A sliding window of length $\tau_L = 50$ ms is taken from each simulation for $t \ge t_E$ to build the dataset D. The sampling rate of the time series is chosen as $f_S = 8$ kHz. A total of N = 21465 simulations were conducted by varying the active and reactive power consumption of the two loads of M in Fig. 1, i.e. Load1 and Load2. The data generation process needs to cover the classification space in Fig. 2 faithfully, taking into account several fault and non-fault conditions for a variety of initial conditions and locations.
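The timing parameters above fix the size of each training window. As a small worked example (assuming, for illustration only, that sample index 0 corresponds to t = 0 s; the helper name is hypothetical), the first window after the event covers $L = f_S \tau_L = 8000 \times 0.05 = 400$ samples:

```python
F_S = 8000      # sampling rate f_S in Hz (from the text)
T_E = 0.7       # event time t_E in s (from the text)
TAU_L = 0.050   # window span tau_L in s (from the text)


def training_window_bounds(f_s, t_e, tau_l):
    """Return (start, stop) sample indices of the first window after t_E.

    Assumes that sample index 0 corresponds to t = 0 s.
    """
    start = round(t_e * f_s)
    length = round(tau_l * f_s)
    return start, start + length


start, stop = training_window_bounds(F_S, T_E, TAU_L)
print(start, stop, stop - start)
```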
The internal IBDG parameters, i.e. the inverter control settings, are fixed in all the simulations. The following simulation settings were considered:

• 162 microgrid initial conditions are considered. Half of these initial conditions set the total load of the system to consume 25% of the maximum generated power, while the remaining half set it to 60%. This total load is then divided between Load1 and Load2, taking into account unbalanced loading conditions.
• 11583 load switching events are considered, at either bus 1 or bus 2. Unbalanced load switching is considered as well.
• Capacitor switching events are considered for each of the initial conditions, taking the total load of the system to a final power factor of either 0.9 or 0.99.
• 9234 faults were simulated at different locations of line 1 and line 2. Different fault typologies are taken into account, with different values of the fault resistance $R_f$ = 0.001, 0.01, 1, 5 and 9 Ω.

The features in Table I computed from D were normalized in order to better encode their statistical distribution and remove the dependence on the size of the feature [20]. Each feature is normalized by the transformation $x' = x/|x|$.

Figs. 3(a) and 3(b) show examples of simulations of the phase A RMS current and voltage time series, respectively, for load switching, capacitor switching and a three phase-to-ground fault. These figures show that it is difficult to differentiate faults from other events using only the magnitude of the resulting currents. On the other hand, they also show that there could be more subtle differences in the waveforms (using other features such as the voltage, as shown in Fig. 3(b)) which are identifiable using statistical classifiers. Fig.
3(c) shows the distribution of the magnitude of the current mIA (computed on the first time window with $\tau_L = 50$ ms) in all datasets in D. Using only this feature to discriminate fault from non-fault events would be extremely difficult. Similarly, Fig. 3(d) depicts the histogram of ThetaA, which highlights the difficulty of using this feature alone for fault localization.

[Figure: (a) Current Ia and (b) Voltage Va, event RMS versus time (s), for fault, load and capacitor events; (c) histogram of mIA and (d) histogram of ThetaA over the simulations]
Fig. 3. (a) and (b) show the current and voltage of phase A for different types of events. (c) shows the distribution of the feature mIA for fault/no fault (blue/red in (c)). (d) shows the distribution of the feature ThetaA for fault in Line 1/Line 2 (blue/red in (d)).

TABLE II
Filter results for $\tau_L = 50$ ms

Discrimination           Localization
Feature   InfoGain       Feature   InfoGain
Vneg      0.681          Iplus     0.2838375
Ineg      0.651          thetaB    0.2694525
Iplus     0.6035         Ineg      0.2611675
mIA       0.601          thetaA    0.2444775
thdVB     0.58275        thetaC    0.22349
mVC       0.57675        Vplus     0.199085
mIB       0.566          mIA       0.18167
thdVA     0.5325         mIB       0.165265
mIC       0.48625        mIC       0.1396375
mVB       0.473          thdIA     0.1116375

[Figure: InfoGain of each feature for relays 1 to 4, fault discrimination; features in the order Ineg, Vneg, Iplus, mIA, mIB, thdVB, mVC, mIC, thdVA, Vplus, mVB, mVA, thdVC, thetaA, V0, I0, thdIA, thetaC, thetaB, thdIB, thdIC]
Fig. 4. Value of IG across relays for $\tau_L = 50$ ms for fault discrimination.

III. RESULTS

The results section is divided into two parts: Section III-A highlights the relevance of the selected features for the protection of the microgrid against faults. Section III-B discusses the behavior and the computational performance of the various FES methods.

A. Robust Features for Fault Discrimination and Localization

1) Filter analysis: Figs.
4 and 5 show the InfoGain (IG) scores for fault discrimination and localization when using a time window of $\tau_L = 50$ ms. This analysis is performed independently in each of the relays. Since each of the relays displays different IG values, the features in these figures are ranked by their maximum IG computed across relays. Table II provides the overall standing of the 10 best features, taken using the mean of IG across the relays.

[Figure: InfoGain of each feature for relays 1 to 4, fault localization; features in the order Iplus, Ineg, thetaB, thetaA, thetaC, mIA, mIB, Vplus, mIC, thdIA, mVC, Vneg, thdVB, thdIB, thdVC, thdVA, I0, mVA, mVB, thdIC, V0]
Fig. 5. Value of IG across relays for $\tau_L = 50$ ms for fault localization.

By visual inspection, these results suggest that localization is a harder problem to solve than discrimination. Figs. 4 and 5 indicate that discrimination-specific features tend to have higher IG values than localization ones. In fact, Table II reports that the first 5 features have IG ≈ 0.6 for discrimination but only IG ≈ 0.2 for localization. Secondly, localization features are also more variable across relays than discrimination ones, as can be seen in both figures.

Interestingly, both Figs. 4 and 5 show that the symmetrical components of V and I tend to have very high values of IG, indicating that these features contain a lot of information regarding both the presence and location of faults. In particular, symmetrical components of I are in the top 3 positions for both discrimination and localization, as can be seen in Table II. Symmetrical components are widely used in short circuit analysis and, used in conjunction with other features, they become important for the problem of fault detection.

For fault localization only, the phase angles also seem to be relevant according to the results. Table II shows that phase angles are ranked in the first 5 positions together with symmetrical components.
Since the phase angle contains information about power flow directions, it is reasonable that it may contain information about the location of a fault. For fault discrimination, the total harmonic distortion of the voltage has some relevance. As cited in [5], non-linearity due to the saturation effect of the power electronics during faults may be a possible reason for its presence among the best features.

These results are already of significant interest since they indicate new candidate features that could be valid substitutes for the magnitude of I, which is known to be an issue for inverter-based microgrids. However, the magnitude of I still has a role in fault detection since it appears in the top 10 positions in Table II.

2) Wrapper analysis: Tables III and IV display the best feature subsets and their respective classification accuracies, obtained using three different wrapper methods for both fault discrimination and localization. Accuracies are calculated using 5-fold cross validation. In particular, the three methods are a wrapper with a greedy search starting from the full set of features, and two wrappers, with greedy and exhaustive searches respectively, starting from the top 10 features in Table II. The results of these three wrappers are compared to the classifier built using the full set of features as input and to the one which uses only the top 10 features in Table II.

As in the filter analysis, since wrappers are applied independently to each of the relays, a procedure to "average" the individual outcomes is needed. Tables III and IV report the final subsets of features that are common to at least 2 of the relays. These final features are listed following the ranking of Table II to better visualize common items.

The results of Tables III and IV highlight the following: Firstly, the wrapper accuracies are similar for the same detection task and classifier.
Secondly, there are minor differences between the various wrappers in terms of selected features: the different wrappers in each of the fault detection tasks seem to broadly converge to similar sets of features. These results suggest that some of these features, being method independent, are more important than others for fault detection.

From the composition of the selected subsets, these tables seem to confirm some of the preliminary remarks made in the previous filter analysis: while the magnitudes of current (and voltage) are among the most frequently selected features, other electrical measurements may help to improve classification accuracy. Some of the symmetrical components are repeatedly present in both fault discrimination (Vneg, Ineg and Iplus) and localization (Iplus, Ineg and Vplus) in Tables III and IV. All three phase angles are included in the best feature subsets for the fault localization problem, emphasizing their relevance for this particular task. The total harmonic distortion terms show up for both discrimination and localization tasks. However, they do not occur consistently across the tables. Thus, while present in the tables, total harmonic distortion terms may be less relevant than symmetrical components and phase angles for fault detection.

B. Statistical Classifier Based Relays

Tables III and IV display information about the different performance of both classifiers and wrappers. Firstly, as discussed, for both classifiers the results of greedy searches are comparable to those of the computationally expensive exhaustive search (when applied to a subset of the data). Therefore, a reduced greedy search, i.e. the filter-wrapper method, may be an attractive choice from a computational perspective since the final results are broadly equivalent. Secondly, the j48 achieves better results than the Naive Bayes in both fault discrimination and localization problems (j48 accuracies are close to 100%).
This makes the j48 the best choice for both tasks. As depicted in Table IV, the difference between classifiers is greatest in the case of fault localization, as the Naive Bayes accuracies are reduced to about 70%. Since classification performance for the j48 is also slightly higher for fault discrimination (≈99% instead of ≈98%), the previous tables confirm the observation made in Section III-A that fault localization is a more difficult problem than discrimination.

The different behaviors of the two classifiers can be further inspected with the help of Figs. 6-9. These figures plot the accuracies across relays obtained using the subset-filter method explained in Section II-B: a series of nested subsets with an increasing number of features is constructed from the ranking of the IG values taken from Table II.

As shown in Figs. 6 and 7, the Naive Bayes produced a result which agrees with the main motivation behind an FES analysis. The classifier reaches its maximum accuracy with between 5 and 10 features. Increasing the number of features beyond these values degrades its performance. Therefore, feature selection is a necessary practice for Naive Bayes in order to optimize its classification accuracy. The subset-filter method, for instance, may be used as a valid approach to find the optimal number of features for the Naive Bayes. The behavior of the j48 differs from that of the Naive Bayes: its classification accuracy increases as the number of features fed into the decision tree increases, as shown in Figs. 8 and 9.
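The subset-filter procedure behind Figs. 6-9 can be sketched as follows. This is a toy illustration under several assumptions and does not reproduce the WEKA analysis of the paper: the features are taken to be already discretized, the classifier is a small count-based Naive Bayes with Laplace smoothing (added here to avoid zero counts), accuracy is measured on the training data instead of by 5-fold cross validation, and the feature names `x1` to `x3` and their values are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(feature, labels):
    """I(x_j, w) = H(w) - H(w | x_j) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [w for f, w in zip(feature, labels) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_by_info_gain(columns, labels):
    """Feature names sorted by decreasing information gain."""
    return sorted(columns, key=lambda k: info_gain(columns[k], labels),
                  reverse=True)


def naive_bayes_accuracy(columns, labels, subset):
    """Training-set accuracy of a count-based Naive Bayes using `subset`."""
    n = len(labels)
    n_class = Counter(labels)
    joint = Counter((k, columns[k][i], labels[i])
                    for k in subset for i in range(n))
    n_values = {k: len(set(columns[k])) for k in subset}

    def log_posterior(i, w):
        score = math.log(n_class[w] / n)
        for k in subset:
            count = joint[(k, columns[k][i], w)]
            score += math.log((count + 1) / (n_class[w] + n_values[k]))
        return score

    hits = sum(max(n_class, key=lambda w: log_posterior(i, w)) == labels[i]
               for i in range(n))
    return hits / n


def subset_filter(columns, labels):
    """Accuracy of each nested subset {x1}, {x1, x2}, ... in IG order."""
    ranking = rank_by_info_gain(columns, labels)
    return [(k, naive_bayes_accuracy(columns, labels, ranking[:k]))
            for k in range(1, len(ranking) + 1)]


# Invented toy data: "F" = fault, "N" = non-fault.
labels = ["F", "F", "F", "F", "N", "N", "N", "N"]
columns = {
    "x1": [1, 1, 1, 1, 0, 0, 0, 0],   # separates the classes perfectly
    "x2": [1, 1, 1, 0, 1, 0, 0, 0],   # partly informative
    "x3": [0, 1, 0, 1, 0, 1, 0, 1],   # carries no information
}
ranking = rank_by_info_gain(columns, labels)
results = subset_filter(columns, labels)
print(ranking, results)
```

Plotting the second element of each pair in `results` against the first gives a curve of the kind shown in Figs. 6-9; the number of features at which the curve peaks is the subset size the method would retain.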
TABLE III
Best subsets using wrappers for the fault discrimination problem

Naive Bayes
Method                                                        Accuracy (%)  Features
Classifier with all features                                  95.834        ALL
Classifier with top 10 features from filter                   97.042        Vneg Ineg Iplus mIA thdVB mVC mIB thdVA mIC mVB
Greedy search wrapper from all features                       97.835        Vneg Ineg mIA thdVB mVC mIB mVA thetaA thetaC
Greedy search filter-wrapper using only top 10 features       97.812        Vneg Ineg mIA thdVB mVC mIB mVB
Exhaustive search filter-wrapper using only top 10 features   97.812        Vneg Ineg mIA thdVB mVC mIB mVB

J48
Method                                                        Accuracy (%)  Features
Classifier with all features                                  99.859        ALL
Classifier with top 10 features from filter                   99.802        Vneg Ineg Iplus mIA thdVB mVC mIB thdVA mIC mVB
Greedy search wrapper from all features                       99.857        Vneg Iplus mVC mIB mIC mVB mVA thdIB V0 thetaB thetaC
Greedy search filter-wrapper using only top 10 features       99.816        Vneg Ineg Iplus thdVB mVC mIB thdVA mIC mVB
Exhaustive search filter-wrapper using only top 10 features   99.816        Vneg Ineg Iplus thdVB mVC mIB thdVA mIC mVB

TABLE IV
Best subsets using wrappers for the fault localization problem

Naive Bayes
Method                                                        Accuracy (%)
Classifier with all features                                  66.459
Classifier with top 10 features from filter                   67.621
Greedy search wrapper from all features                       68.687
Greedy search filter-wrapper using only top 10 features       69.388
Exhaustive search filter-wrapper using only top 10 features   67.912

J48
Method                                                        Accuracy (%)
Classifier with all features                                  98.732
Classifier with top 10 features from filter                   98.482
Greedy search wrapper from all features                       98.611
Greedy search filter-wrapper using only top 10 features       98.499
Exhaustive search filter-wrapper using only top 10 features   98.499

[Features column of Table IV: ALL for the classifiers with all features. The per-method feature lists are not recoverable row by row from the source; the features appearing in them are Iplus, Ineg, Vplus, Vneg, V0, thetaA, thetaB, thetaC, mIA, mIB, mIC, mVA, mVB, mVC, thdIA and thdVB.]

[Fig. 6. Filter-Subset results for the Naive Bayes for fault discrimination. Classification accuracy versus number of input features, Relays 1 to 4.]
[Fig. 7. Filter-Subset results for the Naive Bayes for fault localization. Classification accuracy versus number of input features, Relays 1 to 4.]
[Fig. 8. Filter-Subset results for the j48 for fault discrimination. Classification accuracy versus number of input features, Relays 1 to 4.]
[Fig. 9. Filter-Subset results for the j48 for fault localization. Classification accuracy versus number of input features, Relays 1 to 4.]

The same trend for the j48 can also be noticed in Tables III and IV, where the accuracies of the j48 classifiers built using all features are higher than those of any of the proposed wrapper methods, i.e. a maximum of 99.859% for discrimination and of 98.732% for localization. The behavior of the j48 can be explained by referring to the algorithm for decision tree induction. The building of the tree already incorporates some built-in feature selection procedures: the j48 exploits and prunes the set of features given as input, producing an optimal subset of attributes which sit at the decision nodes of the tree. Therefore, it seems from these figures that decision tree methods may not need a separate feature selection approach. In order to save computational time and to obtain a higher classification accuracy, all features may be given to the decision tree, which will apply its own built-in FES.

However, as illustrated in Fig. 10, FES may be a reasonable step to take in order to regulate the complexity of the j48 decision trees. Fig. 10 presents two factors that determine the complexity of a tree: its size, i.e. the number of nodes, and the number of features used at its decision nodes. Fig. 10 plots both quantities averaged across relays, as a function of the size of the nested subsets and relative to the localization task only. Similar considerations apply to the fault discrimination task as well. Fig. 10 shows that the size of the trees decreases and seems to stabilize beyond 10-15 features, while the number of features used in the tree increases almost linearly as the size of the nested subsets increases. Therefore, using the j48 without FES may produce a tree with a high number of decision features (high complexity). However, by inspecting both Fig. 9 and Fig. 10, similar accuracies can be achieved by building a tree with fewer decision features (but a slightly higher number of nodes). Since decision trees are well known to be prone to overfitting, using a tree with lower complexity may be a better choice. In practice, depending on the computational requirements of the problem, FES analysis may still be a useful step to conduct for the j48 classifier in order to avoid overfitting.

[Fig. 10. Measure of the complexity of the j48 trees for the filter-subset method in the case of fault discrimination. Size of the tree and number of features on the tree versus number of input features.]

Table V is computed to investigate the tree complexities in the case of fault localization using the same wrapper methods considered in Table IV. Table V shows that all wrappers achieve their best performance by building trees which can be roughly positioned in Fig. 10 at the point of intersection of the two curves. This outcome indicates that the wrapper methods select trees with a lower complexity than the j48 with the full feature set, while achieving similar performance.

TABLE V
Complexity of decision trees using wrappers

Method                                  Size  Num. Features
Greedy wrapper from all features        223   10
Greedy filter-wrapper from top 10       226   9
Greedy exhaustive wrapper from top 10   243   7

IV. DISCUSSION

The main results from the simulation of a facility scale microgrid with IBDGs can be summarized as follows:

• Symmetrical components are among the best features for both fault detection tasks while phase angles are relevant for fault localization. Employing these quantities helps to avoid some of the problems associated with using only the magnitude of the current as a feature.
• The feature selection method depends on the choice of the classifier and on the computational requirements set before the analysis. FES analysis is shown to be important for both classifiers as it may help regulate their computational complexity.
• The j48 outperforms the Naive Bayes in both tasks and is thus considered the better choice to employ.

The FES methods described in this paper can be taken together as a general framework for enhancing data mining solutions in microgrids: a filter is an excellent starting point since the feature ranking provides a first indication of the most relevant attributes. This ranking can be further explored by a subset-filter method to eliminate irrelevant features and to check the complexity of the classifier. Lastly, a reduced greedy search may be used as a final step when selecting the best set of features.

Given the previous results, a few issues can be further investigated.
Firstly, in order to obtain a more generally applicable result, the microgrid analyzed should be expanded in size, i.e. a larger number of lines, and should include other types of elements and events such as motor switching or nonlinear loads. Large scale microgrids can further clarify the role of symmetrical components and phase angles as discriminatory features as well as the behavior of the two statistical classifiers. The j48 overfitting problem in particular needs to be further assessed with a larger group of input features. Secondly, generating data may easily become infeasible for such large microgrids. Machine learning techniques require "good" examples to train the statistical classifier based relays. The problem of deciding what constitutes a "good" example is crucial. In this paper some typical cases of load switching and faults were simulated numerically as common sense examples. Finding an intelligent algorithm which either samples the space of possible simulations or gives an indication of what to simulate (i.e. which examples are "good") may save computational resources and make the classifiers more robust against hard classification cases.

V. CONCLUSION

This paper is motivated by the problem of protecting an isolated IBDG microgrid against faults. Traditional overcurrent protection schemes may fail due to the limited magnitude of the fault currents circulating in the microgrid. The main contribution of this paper is to propose a data mining framework which extracts and selects electrical features that detect faults more reliably than traditional overcurrent methods. These features are used as input to statistical classifiers which can be implemented inside smart digital relays. Within this data mining framework, this paper discussed a few possible feature selection techniques based on two statistical classifiers: Naive Bayes and j48. The simulation of a facility scale microgrid with IBDGs is used to test the proposed data mining framework.
The main results show the importance of using a feature selection approach for optimizing computational time and classification performance. Moreover, the study on this paper's dataset has shown that there are alternatives to overcurrent features for fault discrimination and localization, such as symmetrical components (for fault discrimination and localization) and phase angles (for localization).

REFERENCES

[1] N. Jenkins, R. Allan, P. Crossley, D. Kirschen, and G. Strbac, Embedded Generation (Power & Energy Ser. 31). INSPEC, Inc., 2000.
[2] R. H. Lasseter, “Microgrids,” in Proc. IEEE Power Engineering Society Winter Meeting, vol. 1, 2002, pp. 305–308.
[3] R. Lasseter, J. Eto, B. Schenkman, J. Stevens, H. Vollkommer, D. Klapp, E. Linton, H. Hurtado, and J. Roy, “CERTS microgrid laboratory test bed,” IEEE Transactions on Power Delivery, vol. 26, pp. 325–332, 2011.
[4] H. H. Zeineldin, E. F. El-Saadany, and M. M. A. Salama, “Distributed generation micro-grid operation: Control and protection,” in Proc. Power Systems Conf.: Advanced Metering, Protection, Control, Communication, and Distributed Resources PS ’06, 2006, pp. 105–111.
[5] J. Wei, H. Zheng-you, and B. Zhi-qian, “The overview of research on microgrid protection development,” in 2010 International Conference on Intelligent System Design and Engineering Application (ISDEA), vol. 2, Oct. 2010, pp. 692–697.
[6] H. J. Laaksonen, “Protection principles for future microgrids,” vol. 25, no. 12, pp. 2910–2918, 2010.
[7] B. Hussain, S. Sharkh, S. Hussain, and M. Abusara, “Integration of distributed generation into the grid: protection challenges and solutions,” in 10th IET International Conference on Developments in Power System Protection (DPSP 2010). Managing the Change. IET, 2010, pp. 1–5.
[8] J. Keller and B. Kroposki, “Understanding fault characteristics of inverter-based distributed energy resources,” National Renewable Energy Laboratory, Tech. Rep. NREL/TP-550-46698, January 2010.
[9] D. Turcotte and F. Katiraei, “Fault contribution of grid-connected inverters,” in Proc. IEEE Electrical Power & Energy Conf. (EPEC), 2009, pp. 1–5.
[10] “IEEE application guide for IEEE Std 1547, IEEE standard for interconnecting distributed resources with electric power systems,” IEEE Std 1547.2-2008, pp. 1–207, 15 2009.
[11] M. Michalik, M. Łukowicz, W. Rebizant, S.-J. Lee, and S.-H. Kang, “New ANN-based algorithms for detecting HIFs in multigrounded MV networks,” IEEE Transactions on Power Delivery, vol. 23, no. 1, pp. 58–66, 2008.
[12] B. Russell and R. Chinchali, “A digital signal processing algorithm for detecting arcing faults on power distribution feeders,” IEEE Transactions on Power Delivery, vol. 4, no. 1, pp. 132–140, Jan. 1989.
[13] A. Girgis, W. Chang, and E. Makram, “Analysis of high-impedance fault generated signals using a Kalman filtering approach,” IEEE Transactions on Power Delivery, vol. 5, no. 4, pp. 1714–1724, Oct. 1990.
[14] S.-J. Huang and C.-T. Hsieh, “High-impedance fault detection utilizing a Morlet wavelet transform approach,” IEEE Transactions on Power Delivery, vol. 14, no. 4, pp. 1401–1410, Oct. 1999.
[15] I. Baqui, I. Zamora, J. Mazón, and G. Buigues, “High impedance fault detection methodology using wavelet transform and artificial neural networks,” Electric Power Systems Research, vol. 81, no. 7, pp. 1325–1333, 2011.
[16] T. Loix, T. Wijnhoven, and G. Deconinck, “Protection of microgrids with a high penetration of inverter-coupled energy sources,” in Proc. CIGRE/IEEE PES Joint Symp. Integration of Wide-Scale Renewable Resources Into the Power Delivery System, 2009, pp. 1–6.
[17] E. Sortomme, G. J. Mapes, B. A. Foster, and S. S. Venkata, “Fault analysis and protection of a microgrid,” in Proc. 40th North American Power Symp. NAPS ’08, 2008, pp. 1–6.
[18] R. M. Tumilty, M. Brucoli, G. M. Burt, and T. C. Green, “Approaches to network protection for inverter dominated electrical distribution systems,” in The 3rd IET International Conference on Power Electronics, Machines and Drives, Mar. 2006, pp. 622–626.
[19] N. Perera and A. D. Rajapakse, “Recognition of fault transients using a probabilistic neural-network classifier,” vol. 26, pp. 410–419, 2011.
[20] I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, Eds., Feature Extraction, Foundations and Applications. Springer, 2006.
[21] M. A. Bramer, Principles of Data Mining, ser. Undergraduate Topics in Computer Science. London, UK: Springer, 2007.
[22] N. Pogaku, M. Prodanovic, and T. Green, “Modeling, analysis and testing of autonomous operation of an inverter-based microgrid,” IEEE Transactions on Power Electronics, vol. 22, no. 2, pp. 613–625, 2007.
[23] M. Saha, J. Izykowski, and E. Rosolowski, Fault Location on Power Networks, ser. Power Systems. Springer, 2009.
[24] M. Tremblay, R. Pater, and F. Zavoda, “Accurate fault-location technique based on distributed power-quality measurements,” in Proceeding 19th CIRED, 2007.
[25] C37.114, IEEE Guide for Determining Fault Location on AC Transmission and Distribution Lines, IEEE Power Engineering Society Std., 2005.
[26] N. Kan’an, L. Farouk, H. Zeineldin, and W. Woon, “Effect of DG location on multi-parameter passive islanding detection methods,” in 2010 IEEE Power and Energy Society General Meeting, July 2010, pp. 1–6.
[27] W. K. A. Najy, H. H. Zeineldin, A. H. K. Alaboudy, and W. L. Woon, “A Bayesian passive islanding detection method for inverter-based distributed generation using ESPRIT,” IEEE Transactions on Power Delivery, vol. 26, no. 4, pp. 2687–2696, 2011.
[28] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, “The WEKA data mining software: an update,” SIGKDD Explor. Newsl., vol. 11, no. 1, pp. 10–18, 2009.

Erik Casagrande received an M.Sc. in control engineering in 2003 and an M.Sc. in numerical modeling in 2005 from Padova University, Italy. He received the Ph.D. degree from the Neural Computing Research Group, Aston University, Birmingham, U.K., in 2010.
Currently, he is a postdoc with the Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates. His research interests include machine learning and data mining techniques for biomedical data, natural language processing and smart grids.

H. H. Zeineldin (M’06) received the B.Sc. and M.Sc. degrees in electrical engineering from Cairo University, Cairo, Egypt, in 1999 and 2002, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Waterloo, ON, Canada. He was with Smith and Andersen Electrical Engineering Inc., where he was involved with projects involving distribution system design, protection, and distributed generation. He then was a Visiting Professor at the Massachusetts Institute of Technology (MIT), Cambridge. Currently, he is an Associate Professor with the Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates. His current interests include power system protection and distributed generation.

Wei Lee Woon (M’08) received the B.Eng. degree in electronic engineering (Hons.) from the University of Manchester Institute of Science and Technology, Manchester, U.K. (now merged with the University of Manchester), in 1997, and the Ph.D. degree from the Neural Computing Research Group, Aston University, Birmingham, U.K., in 2002. Upon graduation, he joined the Malaysia University of Science and Technology (MUST) as an Assistant Professor, where he served until 2007. He subsequently joined the Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates. He has also worked as a Visiting Researcher at the Massachusetts Institute of Technology, Cambridge, and at the RIKEN Brain Science Institute, Tokyo, Japan. His research interests include technology mining, analysis of distributed generation systems, and EEG signal analysis.

N. H. Kan’an (S’10) received the B.Sc. degree in electrical power engineering from Yarmouk University, Irbid, Jordan, in 2008. He received his M.Sc. degree in engineering systems and management from Masdar Institute, Abu Dhabi, UAE, in 2011. Prior to joining Masdar Institute, Mr. Kan’an worked in the power systems substations division of Asea Brown Boveri (ABB) Near East for a year. Currently, he is working towards his Ph.D. degree in electrical engineering at the University of Connecticut (UCONN), Storrs, USA. His current research interests include distributed generation control, modeling and simulation, and the real-time simulation of bulk power systems.