Fifteenth National Power Systems Conference (NPSC), IIT Bombay, December 2008

Understanding Power System Behavior Through Mining Archived Operational Data

Sarasij Das, P S Nagendra Rao

Abstract: This paper is the outcome of an attempt to mine recorded power system operational data in order to gain new insight into practical power system behavior. Data mining, in general, is essentially finding new relations between data sets by analyzing well-known or recorded data. In this effort we make use of the recorded data of the Southern regional grid of India. Some interesting relations at the total system level between frequency, total MW/MVAr generation and average system voltage have been obtained. The aim of this work is to highlight the potential of data mining for power system applications and also some of the concerns that need to be addressed to make such efforts more useful.

I. INTRODUCTION

Advances in electronics, computer and information technology are fueling major changes in the area of power system instrumentation. More and more microprocessor-based digital instruments are replacing analog meters. Data logging is becoming automatic and frequent. The vast quantities of data generated by extensive deployments of digital instruments are creating information pressure on utilities. The legacy SCADA-based data management systems do not support management of such huge volumes of data. The present practice is to store the acquired data in SCADA for only a few months and then delete it. In a few cases, after removal from the SCADA system, these data are stored on compact discs. At present the usefulness of historical data is not fully explored, so utilities do not give importance to storing such data efficiently.

The traditional integrated power industry is going through a deregulation process. The market principle is bound to force competition between power utilities, which in turn demands a higher focus on profit.
To optimize system operation and planning, utilities need better decision-making processes that depend on the availability of reliable system information. It is expected that in this context historical data is going to be a vital asset. In [1] some possible applications of historical power system data are presented.

Apart from the business perspective, historical data is also important from another point of view. The electric power system is a very complex system. Most of the mathematical models used for analyzing or predicting system behavior are based on several assumptions. The availability of detailed measurements of power system parameters could provide an opportunity to validate many such models.

Sarasij Das obtained his M.Sc(Engg) in Electrical Engineering from the Indian Institute of Science, Bangalore. He is currently with Power Research Development Consultant Pvt. Ltd. (e-mail: sarasijdas@gmail.com). P S Nagendra Rao is with the Electrical Engineering Department, Indian Institute of Science, Bangalore (e-mail: nagendra@ee.iisc.ernet.in).

The importance of data has been widely recognized in recent times. Power system data management has been discussed in several works [1][2][3]. Data warehousing technology is being proposed to meet the future requirements of power systems. In [4] data mining is mentioned as a feature of a power system data warehouse. In [5] it is mentioned that power system operation can be greatly improved through data analysis and/or assimilation.

In our work real system data is analyzed to find interrelations between several system parameters. This analysis can be viewed as a small attempt at data mining. Data mining is essentially an analysis of data sets in order to discover new relations between various quantities that are not obvious from the recorded data in its normal form.
For our investigation, data on five system parameters (voltage, frequency, MW and MVAr generation, and system demand) of the southern regional grid of India, collected from the Southern Regional Load Despatch Center (SRLDC), have been used.

This paper is organized as follows. Section II outlines some features of the Southern Regional Grid. In Section III a brief description of the data set used is given. Section IV presents the results of data analysis. The paper is concluded in Section V.

II. SOUTHERN REGIONAL GRID FEATURES

The southern regional grid of India covers an area of 6,51,000 sq. km and encompasses four states, namely Andhra Pradesh, Karnataka, Kerala and Tamilnadu, and the Union Territory of Pondicherry. This region comprises several central and state owned generating stations, independent power producers, distribution companies and state transmission utilities. In India there are five major regional grids, including the southern one. After August 2006 four of the five regions, excluding the southern region, are synchronously interconnected. The southern region is connected with the other regional grids only in an asynchronous manner. It is connected to the Western region through the HVDC back-to-back link at Ramagundam-Chandrapur, and to the Eastern region through the Jepore-Gazuwaka back-to-back link and the point-to-point HVDC line between Kolar and Talcher. The total installed capacity of the southern region at the beginning of 2007 was about 37370 MW. Some other salient features of the southern regional grid are [6]:
• Covers approximately 19% of the geographical area, 22% of the population and 29% of the installed capacity of the country
• 30-70% hydro-thermal mix
• 3300 MW of wind generating plants
• 8000 MW capacity of independent power producers
• 2000 MW capacity HVDC Talcher-Kolar double circuit interconnection with the Eastern Region
• 400/220 KV transmission system

III. DATA DESCRIPTION

The control center of SRLDC at Bangalore is equipped with computerized load despatch and communication facilities. Around 320 Remote Terminal Units (RTUs) are used for real-time system monitoring and grid management. These RTUs communicate with the control center through SCADA. From the collected data, only five system parameters (voltage, MW generation, MVAr generation, frequency and system demand) were made available to us for use in this work. These data were in the form of Microsoft Excel files stored on compact discs. The data cover the period from January 2004 to June 2006. The data logging interval is 1 minute for all parameters. Some salient features of the data for the five parameters are given below.

A. Voltage
Voltage data consists of measurements at the 26 buses of the 400 KV grid. All voltages have been stored (in the Excel file) as integers with a least count of 1 KV. Among the 26 bus voltages, the Kolar bus voltage remains constant at 400 KV all the time.

B. Frequency
Frequency data consists of the frequency of the region. The precision of the frequency data is 0.01216 Hz.

C. MW Generation
MW generation data consists of information from 60 generation buses. The precision of the stored (Excel file) data is 1 MW. The 60 outputs at generation buses represent either unit outputs or the total station output. The plants are hydro, thermal or nuclear.

D. MVAr Generation
MVAr generation data corresponds to 88 generation units. The precision of the stored (Excel file) data is 1 MVAr. The 88 generation units include hydro, thermal and nuclear power units. In this case the data corresponds to individual units in all cases.

E. System Demand
System demand data consists of the system demand of the four states and the total demand of the region. The precision of the stored (Excel file) data is 1 MW.

One of the major challenges in handling real data is the issue of outliers. Outliers are basically records whose features are distinct from the other records of the group. Outliers are classified into four classes. The first class arises from data entry error. For power system data this type of error is generated by malfunction of SCADA or communication channels, etc. The second class of outlier is the outcome of some extraordinary event. In the context of power systems, faults, switching operations, etc. could be the cause of such events. This type of outlier should be retained in the data set. The third class of outlier comprises extraordinary observations for which there is no explanation. These outliers must be retained to capture some characteristics of the system (not known/explained). The fourth and final class of outlier consists of observations that fall within the range of each of the variables but are unique in their combination of values across the variables.

In the data made available to us, it appears that no attempt had been made to identify and replace outliers. The investigated data set contains all four types of outliers. The outliers identified in the data set are:
- Considerable change in value for one or two consecutive instances
- Occurrence of 0 values for a considerable time period
- Occurrence of values not possible from the system point of view
Identifiable outliers are substituted with the average of the previous and next non-outlier values.

IV. ANALYSIS OF PARAMETER INTERRELATIONS

An overview of some features of the collected data set has been given in the last section. In this section we investigate the interrelation of some of the measured system variables. In Figures 1 and 2 system frequency vs. average system voltage is plotted for 06/06/2006 and 16/4/2006. By average system voltage we mean the average voltage of all 400 KV buses. It is seen that the points are clustered around one of the diagonals. To some extent, higher voltages correspond to higher frequency and lower voltages correspond to lower frequency. This correlation is relatively well defined in Figure 2.

Fig. 1. System frequency vs. average system voltage on 06/06/2006 (x-axis: system average voltage; y-axis: system frequency)
Fig. 2. System frequency vs. average system voltage on 16/04/2006 (x-axis: average system voltage; y-axis: system frequency)

In Figures 3 and 4 total system demand vs. system frequency is presented for 6/6/2006 and 16/4/2006. By and large, higher system demand corresponds to lower system frequency, and at lower demands system frequency is high.

Fig. 3. Total system demand vs. system frequency on 06/06/2006 (x-axis: system frequency; y-axis: total system demand)
Fig. 4. Total system demand vs. system frequency on 16/04/2006 (x-axis: system frequency; y-axis: total system demand)

Figure 5 presents the total system demand vs. average system voltage plot for 19/1/2006. Time instants corresponding to some data points are also indicated in the figure. It can be seen that from 02:00 a.m. the average system voltage gradually decreases as the system demand increases with time. After 6:20 a.m. the plot shows random changes. But after 7:00 p.m., until the end of the day, the average system voltage increases as the total system demand decreases.

Fig. 5. Total system demand vs. average system voltage on 19/01/2006 (x-axis: average system voltage; y-axis: total system demand; selected points labelled with time of day)

In Figure 6 the plot of Figure 5 is shown without the random portion of the graph. It can be seen that the plot takes two distinct paths at the start and end of the day. For the same system demand, average system voltage is lower for the morning portion of the day and higher during the night. This corresponds to a 'hysteresis' type of variation.

In Figure 7 total system demand vs. total MVAr generation is presented for the same day. It is interesting to see that this plot also shows a 'hysteresis' loop when the random portions are excluded. Figure 7 is similar to Figure 5, the only difference being that in Figure 7 the plot moves anticlockwise with time whereas in Figure 5 it moves clockwise. In Figure 8 the plot of Figure 7 is shown with the random variation region excluded.

In Figure 9 total MVAr generation vs. average system voltage is plotted. From the figure it can be seen that an almost linear relationship exists between total MVAr generation and average system voltage. As average system voltage increases MVAr injection drops, and generators absorb MVAr at high average system voltages.

In Figure 10 average generator power factor vs. total system demand is plotted for 19/1/2006. It can be seen that as system demand increases in the morning the power factor starts improving. After 6 a.m. the power factor enters a random fluctuation zone but remains high in value. At night, as system demand starts decreasing, the power factor starts to decrease.

In Figure 11 total system demand vs. average system voltage is plotted for 6/6/2006. In this case also random movement is seen during the middle of the day while the remaining part exhibits hysteresis type of behavior. We have seen this hysteresis across several days taken from several months. More careful investigations are necessary to identify specific performance patterns. What is evident is that there is some interesting behavior seen in these plots. Further study could help to understand this in a better way.
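The morning/night comparison described for Figures 5 and 6 can be illustrated with a small Python sketch. It first applies the substitution rule stated in the data description (an identified outlier is replaced by the average of the previous and next non-outlier values) and then compares the mean voltage of the morning and night branches at equal demand. The paper does not say how the branches were separated or compared, so the function names, the 500 MW demand bins and the use of the 02:00 to 06:20 and 19:00 to midnight windows (taken from the description of 19/01/2006) are assumptions made only for illustration.

```python
from collections import defaultdict
from statistics import mean


def fill_outliers(values, is_outlier):
    """Replace each flagged sample with the average of the nearest
    non-outlier values before and after it (rule quoted in the paper).
    Assumption: if only one valid neighbour exists, that value is used."""
    good = [i for i, v in enumerate(values) if not is_outlier(v)]
    cleaned = list(values)
    for i, v in enumerate(values):
        if not is_outlier(v):
            continue
        before = [values[g] for g in good if g < i][-1:]
        after = [values[g] for g in good if g > i][:1]
        if before or after:  # leave the sample untouched if nothing is valid
            cleaned[i] = mean(before + after)
    return cleaned


def branch_means(minutes, demand, voltage, start, end, bin_mw=500):
    """Mean voltage per demand bin for samples with start <= minute < end."""
    bins = defaultdict(list)
    for t, d, v in zip(minutes, demand, voltage):
        if start <= t < end:
            bins[int(d // bin_mw) * bin_mw].append(v)
    return {b: mean(vs) for b, vs in bins.items()}


def hysteresis_gap(minutes, demand, voltage):
    """Night-minus-morning mean voltage for demand bins present in both
    branches. Window limits are assumptions based on the 19/01/2006 text."""
    morning = branch_means(minutes, demand, voltage, 2 * 60, 6 * 60 + 20)
    night = branch_means(minutes, demand, voltage, 19 * 60, 24 * 60)
    return {b: night[b] - morning[b] for b in sorted(morning.keys() & night.keys())}


# Made-up samples for illustration only (not taken from the SRLDC data).
minutes = [120, 180, 1200, 1260]        # minutes since midnight
demand = [15000, 15200, 15100, 15300]   # MW
voltage = fill_outliers([410, 0, 414, 416], lambda v: v == 0)  # KV
print(hysteresis_gap(minutes, demand, voltage))
```

A positive gap in a demand bin means the night branch sits at a higher voltage than the morning branch for that demand, which is the pattern reported for the days examined. With real one-minute data the bin width would need tuning so that each bin holds enough samples from both branches.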
In Figure 12 total system MVAr generation vs. average system voltage is presented for 6/6/2006. The relationship between MVAr and average system voltage is almost linear (in an average sense). As average system voltage increases, large MVAr is consumed by generators, and as average system voltage becomes low, generators inject large MVAr into the grid. A near-linear relationship between the parameters is evident for other days also.

Fig. 6. Total system demand vs. average system voltage plot (after removing random portion) on 19/01/2006 (x-axis: average system voltage; y-axis: total system demand; morning and night data points marked)
Fig. 7. Total system demand vs. total MVAr generation plot on 19/01/2006 (x-axis: total MVAr generation; y-axis: total system demand in MW; selected points labelled with time of day)
Fig. 8. Total system demand vs. total MVAr generation plot (after removing random portion) on 19/01/2006 (x-axis: total MVAr generation; y-axis: total system demand; morning and night data points marked)
Fig. 9. Total MVAr generation vs. average system voltage plot on 19/01/2006 (x-axis: average system voltage; y-axis: total MVAr generation)

Till now we have discussed the inter-parameter relationships considering the whole system. Parameter relationships are also investigated for individual generation buses. In Figure 13 scatter plots of MVAr and voltage at different generator buses are presented for 6/6/2006. It can be seen that the MVAr vs. voltage scatter plot at the RGM generator bus is different from the other plots. For each voltage at the bus two distinct MVAr values are seen.
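One way to put a number on the near-linear relation between MVAr generation and voltage is an ordinary least-squares line fit. The paper reports no fitted coefficients, so the Python sketch below is only an illustration: the function name `fit_line`, the sample values and the sign convention (positive MVAr for injection, negative for absorption) are assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R^2.
    Requires at least two distinct x values and a non-constant y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up values for illustration only (not taken from the SRLDC data).
avg_voltage_kv = [404, 408, 412, 416]
total_mvar = [600, 200, -200, -600]   # assumed: > 0 injection, < 0 absorption

slope, intercept, r_squared = fit_line(avg_voltage_kv, total_mvar)
crossover_kv = -intercept / slope     # voltage at which net MVAr is zero
print(slope, intercept, r_squared, crossover_kv)
```

A negative slope corresponds to the behavior described in the text (less injection, then absorption, as voltage rises), and an R^2 close to 1 would support calling the relation near linear. Applied per bus, the same fit gives one slope per generating station for comparison.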
Scatter plots (for all the 8 generating stations) indicate a near-linear relationship between bus voltage and MVAr output of the units. Figure 14 shows the plot of the average MVAr generation at each voltage value vs. bus voltage for the generating stations (corresponding to Figure 13). The slope of the MVAr-voltage plot is different for different generating stations in Figure 14. The nominal voltage at the VTS bus is 220 KV. As the voltage increases, the MVAr injection at the VTS bus also increases. On the other hand, MAP and MTPS inject less MVAr with increasing voltage. KAI, SHVT and SIM absorb large MVAr at the higher voltage levels while injecting a small amount at lower voltages. RTPS absorbs smaller MVAr at higher voltage as compared to the absorption at lower voltages.

Fig. 10. Total system demand vs. power factor plot on 19/01/2006 (x-axis: total system demand; y-axis: power factor angle in degrees; selected points labelled with time of day)
Fig. 11. Total system demand vs. average system voltage on 06/06/2006 (x-axis: average system voltage; y-axis: total system demand; selected points labelled with time of day)
Fig. 12. Total system MVAr generation vs. average system voltage on 06/06/2006 (x-axis: system average voltage; y-axis: system VAr generation)
Fig. 13. Scatter plot of voltage vs. MVAr injection at different buses on 06/06/2006 (panels: RGM, SIM, RTPS, VTS, MTPS, SHVT, MAP, KAI; x-axis: voltage in KV)
Fig. 14. Plot of voltage vs. VAr injection (averaged for each voltage) at different buses on 06/06/2006 (panels: RGM, SIM, RTPS, VTS, MTPS, SHVT, MAP, KAI; x-axis: voltage in KV)

In Figure 15 the plot of voltage vs. MVAr (averaged for each voltage) is shown for 13/3/2006. In this case RGM absorbs large MVAr at the higher side of the voltage range while injecting small MVAr at the lower side of the voltage range. The nature of the variations at the other generating stations, except RTPS, remains the same as on 6/6/2006. In Figure 15 RTPS injects a small MVAr at high voltage while injecting large MVAr at low voltage.

Fig. 15. Plot of voltage vs. VAr injection (averaged for each voltage) at different buses on 13/03/2006 (panels: RGM, SIM, RTPS, VTS, MTPS, SHVT, MAP, KAI; x-axis: voltage in KV)

In Figure 16 voltage vs. VAr injection (averaged for each voltage) is shown for the whole month of March 2006. In this case also a near-linear relationship can be seen between VAr and voltage. For SHVT and MAP the plots are not linear at the extremities. Actually, the number of sample points is very small in this range.

Fig. 16. Plot of voltage vs. VAr injection (averaged for each voltage) for the month of March 2006 (panels: MAP, KAI, SHVT, MTPS, SIMH; x-axis: bus voltage)

In Figure 17 total system demand (averaged for each frequency) vs. system frequency is plotted for the month of December 2005. Except at higher frequencies the graph appears to be nearly quadratic. As system demand increases system frequency decreases and vice versa. At the higher end of frequencies the graph is irregular due to an insufficient number of sample points. In Figure 18 the same plot is shown for the month of April 2006. Here also the graph tends to show a similar nature except at the higher frequency range.

Fig. 17. Plot of system demand averaged at each frequency vs. system frequency for the month of December 2005 (x-axis: system frequency; y-axis: system demand averaged over each frequency)
Fig. 18. Plot of system demand averaged at each frequency vs. system frequency for the month of April 2006 (x-axis: system frequency; y-axis: system demand averaged over each frequency)

V. CONCLUSION

The simple analysis attempted in this work brings out several interesting characteristics of the overall system behavior that are not readily available anywhere for the southern regional grid of India. By performing similar analysis on other systems the similarities/differences in their behavior can be found. It must also be pointed out that the primary aim of the present work is to emphasize that new relations/insights can be obtained by analyzing operational data. The results presented are incidental.

The present investigation had some constraints beyond our control. For example, all the parameters contained outliers. Erroneous outliers should be identified and eliminated at the time of archiving for the sake of meaningful analysis; it is difficult to do so later. We have identified some of the erroneous values and substituted them with reasonable values. With more system information it would have been possible to identify more outliers. Much work is needed to find suitable ways to identify and replace erroneous values at the time of recording/archiving. Application of statistical methods for outlier identification and replacement can be an interesting issue for further research.

To get the real benefit from data analysis, utilities have to focus on eliminating the limitations of the present data storage practices. The limitations observed are:
1) The data set has many errors
2) The data is not complete
3) The data set acquisition/storage was not motivated by data mining considerations

The aim of this investigation is to argue that if some of these limitations are overcome (which can in fact be done fairly easily), then mining such data could be extremely profitable from the point of view of efficient planning and operation of power systems.

ACKNOWLEDGMENT

The authors would like to thank Southern Regional Load Despatch Center, Bangalore for providing system data for this work.

REFERENCES

[1] M. Werner and U. Hermansson, "Integrated Utility Data Warehousing A Prerequisite To Keep Up With Competition On Electricity Markets", Power Systems Management and Control, 2002. Fifth International Conference on (Conf. Publ. No. 488), 17-19 April 2002, pp 130-135
[2] D. Shi, Y. Lee, X. Duan, Q. H. Wu, "Power systems data warehouse", IEEE Computer Applications in Power, July 2001, Vol. 14, No. 3, pp 49-55
[3] Lin Xu, "Data modelling and processing in deregulated power systems", Ph.D Thesis, Washington State University, May 2005
[4] Xiaofeng He, Gang Wang and Jiancang Zhao, "Research on the SCADA/EMS System Data Warehouse Technology", IEEE/PES Transmission and Distribution Conference and Exhibition: Asia and Pacific, Dalian, China, 2005, pp 1-6
[5] E. H. Abed, N. S. Namachchivaya, T. J. Overbye, M. A. Pai, P. W. Sauer and A. Sussman, "Data-Driven power systems Operations", Lecture Notes In Computer Science, 2006, NUMB 3993, pp 448-455
[6] http://www.srldc.org/Downloads/Srldc