Overview of Real-Time Database Management System Design for Power System SCADA System

Jian Wu, Yong Cheng, and Noel N. Schulz
Department of Electrical and Computer Engineering
Mississippi State University
schulz@ece.msstate.edu

0-4244-0169-0/06/$20.00 © 2006 IEEE.

Abstract

A Supervisory Control and Data Acquisition (SCADA) system is a communication and control system used for monitoring, operation and maintenance of energy infrastructure grids. Compared with traditional applications, a SCADA system imposes strict deadlines on critical tasks. The real-time database used in a SCADA system is therefore subject to special time constraints. The real-time database in SCADA extends a traditional database to include an in-memory database. Such real-time database management systems are designed to operate in the harsh environment of real-time systems, with strict requirements for resource utilization, and to provide the performance and reliability required by real-life applications. This paper introduces the main principles of real-time databases, discusses their implementation in power system SCADA, and briefly introduces a sample database.

Key words: SCADA, Real-time, Database, In-memory Database

1. Introduction

A Supervisory Control and Data Acquisition (SCADA) system is a communication and control system used for monitoring, operation and maintenance of energy infrastructure grids. Compared with traditional applications, a SCADA system imposes strict deadlines on critical tasks. It also has distinct characteristics: the system must maintain a large volume of shared data and control data; each real-time task has a rigorous time requirement; and the data used for analysis and processing is time-variant. Real-time data has a short life cycle; for example, all remote measurement and remote signal data must be updated every 5 seconds, and any decision or derivation based on obsolete data is invalid [1,2].

Therefore, the database used in a SCADA system is subject to special time constraints. The database should be able to process persistent static data and maintain its integrity and consistency, handle dynamic data within its time constraints during processing, and ensure concurrent and efficient data access [3,4]. This kind of database is classified as a real-time database. The data in a real-time SCADA database, and the processing of that data, have strict time constraints. Correctness depends not only on the logical result of a computation but also on whether the computation completes within a given time interval. Therefore, real-time theory must be combined with traditional database technology to realize a real-time database in a SCADA system.

2. Technical Issues in Real-time Database Implementation

Like traditional database management systems, real-time database management systems (RTDBMS) serve as repositories for data and provide efficient storage, loading and manipulation of data. They must provide the basic functions of a general DBMS. Moreover, to meet the temporal requirements of the managed data, the timing constraints on transactions, and the performance requirements, a real-time database should support on-line query, location, deletion, and insertion. An in-memory database is often used to achieve the required speed. The functionalities of a real-time database can be summarized as follows [3,4]:
• Provides a platform to share data with the entire Energy Management System (EMS);
• Provides an open database interface and realizes database functions such as fast load, input, deletion, inquiry, and access;
• Provides snapshots, history data storage, copy and reutilization;
• Provides validity checks;
• Provides security protection.

To realize these functionalities, several technical issues in real-time database implementation must be addressed, as elaborated below.

2.1 Real-time Database Model Design and Implementation

A relational database is good at data representation and management. However, when a relational database is used in a SCADA system, which is a large-scale, complicated real-time system with data management, monitoring, analysis and decision functions, some shortcomings are exposed. For instance, since different vendors of power system applications developed real-time databases independently, there is no standard data representation, which causes integration problems. The Common Information Model (CIM) proposed by the Electric Power Research Institute (EPRI) can be used to address this issue [5].

2.2 Real-time Task Scheduling and its Concurrency Controls

In addition to maintaining database consistency as in conventional databases, real-time database systems must also handle transactions within time constraints. Scheduling real-time transactions is far more complex than traditional real-time scheduling. The system must guarantee that the most time-critical task is executed as early as possible. That is, the system should be able to schedule the execution sequence of transactions according to their priority [2].

2.3 Data Storage and Sharing Memory Management

Traditional database operation is based on disk I/O, but the delay of disk I/O and its uncertainty can be fatal to a real-time task. Therefore, another essential issue for real-time databases is to eliminate this delay and uncertainty, which requires the support of a "database in memory". To ensure that real-time tasks access the database quickly and accurately, two problems should be solved [6]:
(1) Guarantee that all key load and store operations happen in memory only.
(2) Realize data sharing among tasks.

2.4 Database Security and Recovery

If the database system crashes, a traditional database system uses a log to restore the database to its state at the time of the crash. Restoration of a real-time database is more complex for the following reasons [3]:
(1) The restoration process blocks real-time tasks from accessing the database and causes them to overrun their deadlines. An active real-time task may also access transient data and make a wrong decision.
(2) Much of the data in a real-time database is temporary and volatile. The influence of inconsistent or incorrect data is sometimes "temporary" and "noncontinuous". For example, an incorrect remote measurement affects only the state estimation performed in the same time interval, not those performed after it. Therefore, the recovery time for different types of crashed data depends on the application that uses the data.

3. Classification of Data Objects in a Dispatch System

Before building a data schema in a real-time database, it is necessary to know the characteristics of the data objects. The data in a SCADA system can be divided into three types of data objects: Real-time Data Objects, Static Data Objects and Derived Data Objects [1,2,3].

3.1 Real-time Data Object

Real-time data, such as remote measurements and remote signals, reflects the status of a power system. The common characteristic of this data is its strict time constraint. The data is written into the real-time database periodically. A real-time data object is associated with a time stamp and a life cycle, and it is valid only for the corresponding sampling time. The task of querying samples from a Remote Terminal Unit (RTU) and writing them to the database is managed by the real-time task dispatch management described in the next section. Once the value of a real-time data object is written into the database, it cannot be changed by other tasks. New sample data is stored in the real-time database as a new data object [5].

3.2 Static Data Object

A static data object is constant. It is a special data type in real-time databases, and its value does not change with time. The time stamp associated with the data is the system creation time or the renewal time. This type of data, such as transmission line parameters and transformer impedances, is widespread in SCADA systems.

3.3 Derived Data Object

A derived data object is calculated from a group of real-time data objects and other data objects. Therefore, a derived data object also has a time stamp and a life cycle. The time stamp of a derived data object is its derivation time. Its life cycle is the time interval between this derivation time and the next derivation time. The value of a derived data object in a database may be renewed, and it may or may not be preserved depending on its function. This type of data includes the power grid network model, the magnitude and angle of bus voltages, and branch currents obtained from state estimation in the EMS. The data could be the solution from one module or the raw data of some task.

4. Real-time Database System Structure

In order to support the functionalities above, a real-time database can be an integration of a traditional relational database and an in-memory database. The general architecture is shown in Figure 1 [1,2]. It includes the real-time task dispatch management, the in-memory database, the I/O dispatch and the relational database.

[Figure 1 block diagram, top to bottom: Real-time application; Real-time task dispatch management; In-memory database (Database model management, Real-time resource management, Data operation, Network communication); I/O Dispatch; Relational Database]

Figure 1. Real time database system structure

The real-time task dispatch management coordinates the activities of real-time tasks that access the real-time database. The in-memory database keeps the main active range of data in memory to avoid disk I/O during real-time operations and to enhance efficiency. It is the core part of the database and is elaborated in Section 5. The relational database in this architecture can be used as an interface for further development, as well as a storage medium for the in-memory database. The I/O dispatch is responsible for data synchronization between the in-memory database and the relational database, through which the real-time database can be seamlessly integrated with a traditional database.

4.1 Real-time Task Dispatch Management

In SCADA, there are nesting, communication and cooperation relations among real-time tasks. The derived data object of one task may be the input data object of another task. Data objects and tasks depend on each other, so it is necessary to coordinate the activities of real-time tasks. The task dispatch management should provide the following functions to share database resources and resolve conflicts [8]:
• Start and stop tasks at fixed times;
• Allow conditions to be set up that trigger tasks;
• Specify the execution sequence among tasks;
• Constrain the number of instances of a global task.

4.2 I/O Dispatch

The I/O dispatch is responsible for synchronizing data between the in-memory database and the relational database. The in-memory database has a one-to-one relation with the relational database. The data in the relational database needs to be automatically synchronized with the in-memory database. It is important to decide when to write data from the in-memory database to the relational database and when to export data from the relational database to the in-memory database. There is no general solution. One possible way is to provide both periodic update and forced update in both directions between the relational database and the in-memory database. For different applications, the two methods can be applied to different objects according to their characteristics.

5. In-Memory Database

An in-memory database is the core part of a real-time database. Its functions cover the database data model, data operation, real-time resource management, and network communication. A traditional database is disk-based. Its processing time is nondeterministic because it involves disk access, data transfer between internal and external memory, buffer management, wait queues and lock management. This property makes a traditional database system unable to meet the requirements of real-time transactions, which demand high efficiency and deterministic timing. The introduction of an in-memory database makes it possible for most transactions to take place in memory, which avoids disk I/O during real-time transactions, reduces uncertainty and enhances efficiency [6,8].

5.1 Data Model of Database

An in-memory database, as an extension of a relational database, is similar to a relational database. The table is the key part of an in-memory database. A table describes a group of data object entities of the same type. While a table is two-dimensional in a relational database, a table in an in-memory database can be multi-dimensional. Although a two-dimensional table is deceptively simple to design and operate, this simplicity requires complex joins of many tables when analyzing data over history or over one property [5]. A multi-dimensional table structure allows an in-memory database to process data over history or over one property while saving storage space in memory. Furthermore, it speeds up users' access to data from any dimension and facilitates data analysis.

5.2 Real-time Resource Management

5.2.1 Classification of Data Storage Form

There are two forms of data storage in a database: storage in the in-memory database and storage on disk. In a SCADA system the amount of data is extremely large, and not all of it must be put into the in-memory database. The storage form can be decided according to the following data characteristics [3]:
• Timeliness: in a real-time database each data object has a life cycle; data with a short life cycle must be kept in the in-memory database.
• Effectiveness: data that is accessed frequently must be stored in the in-memory database.
• Crucial nature: to guarantee system efficiency, crucial data should be kept in the in-memory database.

5.2.2 Data Security and Restoration in Database

A real-time task cannot be interrupted during the restoration process, which means that data restoration should not affect the operation of the real-time system. A traditional database is usually restored by using a rollback segment, but it is difficult for this method to satisfy real-time requirements. A simple and reliable restoration method is to create a mirror database for essential data. A mirror database can back up the entire database locally or at other points of the network. When data is corrupted at a local site, it can be restored from the backup point, which enhances the security of essential data [7].

5.3 Database Operation

5.3.1 Fast Access Interface

Fast data access is a basic requirement of a real-time database. In order to enhance the operating efficiency of system analysis and decision software, a real-time system needs to provide a fast database access interface. Through this interface, database access can be as efficient as operating on in-memory variables. One possible realization is to map an entire partition of the database to shared memory and return the result in a C structure through an Application Program Interface (API). Such a mechanism provides effective data manipulation but bypasses database security and data integrity checks, which introduces a certain degree of risk. However, the guideline for this interface design in a real-time database is to get partially correct but prompt data rather than accurate but obsolete data. Thus, such risk is tolerable [1,2].

5.3.2 Standard I/O

To make up for the shortcomings of the fast access interface, the database can provide another access interface: standard I/O. This access interface needs to provide database security and data integrity checks. It also needs to provide an effective network access mechanism and strict concurrency control. Because any manipulation of data must go through the database server, this access mechanism is less efficient than the fast access interface. It can be used in distributed applications, human-machine interfaces (HMI), and interfaces with other applications [1,2].

5.4 Communication

The basic requirements of real-time database network communication are high efficiency and reliability. Many mechanisms, such as client/server interaction, streaming transmission, and creation of a mirror database at other sites, can be used to realize data distribution. These mechanisms have different advantages and disadvantages and are applicable to different application areas [7,8,9].

5.4.1 Client/Server Interaction

In client/server interaction, a client accesses the remote database (server) through the local real-time database. For each access there is a series of interactions between the client and the remote server to establish the connection. This is the typical client/server model used by the majority of database management systems. As the messages are small and the interaction frequency is high, this mechanism can easily cause network congestion, low efficiency and server overload. The advantage of client/server interaction is that no data synchronization is necessary among databases, and it is suitable for sharing small amounts of real-time data.

5.4.2 Streaming Transmission

Similar to client/server interaction, streaming transmission delivers data to the client on request. The basic principle of streaming transmission, however, is that the server delivers data to the client periodically, as requested by the client at the first connection. Thus, it avoids the frequent connection establishment of client/server interaction. The server can also reorganize messages according to the client request to increase the average message size and reduce the number of frames passing through the network. Thus, transmission efficiency is enhanced.

5.4.3 Creation of Mirror Database at Other Site

A mirror database at a different site can help keep data synchronized efficiently. It is applicable to special applications in which massive data is transferred. The main principle is to mirror some basic unit of the database, such as the whole database, a partition of the database, or tables, to other sites. This technique helps to quickly start SCADA on one node using the mirror database specified at another node. Moreover, it can help realize fast remote updates of the database.

6. Real time database implementation

Since designs of EMS databases have revolved around the stringent real-time performance requirements stated above, specialized proprietary databases are used to cope with the high transaction rates, high volumes of information, and diversity of data characteristic of the EMS environment. At the same time, to make a real-time database plug-compatible with application software, standard interfaces are required to provide data access [10].

One of the leading real-time databases is the PI system, developed by OSIsoft. The PI system is a large real-time/historical database used for data collection, storage and monitoring. The PI database uses swinging-door compression and a particular filtering technique to process raw data before it enters the archive, achieving efficient archive storage without losing significant data. The PI database platform also provides standard data access through its Data Access Package (DAP), including API, SDK, ODBC and OLEDB [11].

7. Summary

The real-time database in SCADA extends a traditional database to include an in-memory database. Such RTDBMS are designed to operate in the harsh environment of real-time systems, with strict requirements for resource utilization, and to provide the performance and reliability required by real-life applications. In implementing a real-time database for a dispatch automation system, the use of a standard data model, the CIM, should be considered for the data objects. An in-memory database is the core part of a real-time database. Special mechanisms are required for real-time task scheduling and concurrency control, data storage management and crash recovery. A fast access interface also needs to be provided to handle data access. Above all, to date a real-time database is not an off-the-shelf commercial product; the in-memory database needs to be designed according to the application, and both the management system design and the database schema design need to be considered. This paper discusses the technical issues in implementing a real-time database and provides general guidelines for the key functions. Finally, a sample real-time database, the PI database platform, is briefly introduced.

8. Acknowledgements

This research work has been made possible through the support of the DoD MURI Fund # N00014-04-1-0404.

9. References

[1] J. Wu, "SCADA data processing in MIS", ZheJiang Power, December 1999.
[2] J. Wu, "Interconnection between MIS and SCADA", Transaction of Power system and its automation, October 1999.
[3] J. Ordieres Mere, F. Ortega, A. Bello, C. Menendez and V. Vallina, "Operational information system in a power plant", Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, October 1997, vol. 4, pp. 3285 - 3288.
[4] W. L. Feijo, F. A. B. Lemos, A. V. Zampieri, A. Manzoni and A. Franceschi, "Computational system for training and simulation of volt/VAr control actions", Proceedings of the IEEE Power Systems Conference and Exposition, October 2004, vol. 2, pp. 1007 - 1012.
[5] J. Wu and N. N. Schulz, "CIM-Oriented Database Design and Data Exchanging in Power System Applications", Proceedings of North America Power System Conference, Iowa, October 2005.
[6] Xiong Ming, K. Ramamritham, J. Haritsa and J. A. Stankovic, "MIRROR: a state-conscious concurrency control protocol for replicated real-time databases", Proceedings of the Fifth IEEE Real-Time Technology and Applications Symposium, June 1999, pp. 100 - 110.
[7] P. S. Yu, Y. K. Wu, K. J. Lin and S. H. Son, "On real-time databases: concurrency control and scheduling", Proceedings of the IEEE Real-Time Technology and Applications Symposium, January 1994, vol. 82, no. 1, pp. 140 - 157.
[8] V. F. Wolfe, L. C. DiPippo, J. J. Prichard, J. Peckham and P. J. Fortier, "The design of real-time extensions to the Open Object Oriented Database system", Proceedings of WORDS, October 1994, pp. 86 - 93.
[9] H. Hayashi, Y. Takabayashi, H. Tsuji and M. Oka, "Rapidly increasing application of Intranet technologies for SCADA (supervisory control and data acquisition system)", Proceedings of the IEEE Transmission and Distribution Conference and Exhibition 2002, October 2002, vol. 1, pp. 22 - 25.
[10] H. Zhou and H. Lu, "Distributed real-time database platform study of DMS", Proceedings Transmission and Distribution Conference & Exhibition, December 2005.
[11] OSI, "PI Analysis Framework User's Guide", available: http://www.osisoft.com.
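
As an illustration of the swinging-door compression mentioned in Section 6, the following is a minimal sketch of the general swinging-door trending idea. It is not OSIsoft's implementation and it omits the filtering step the paper mentions. The function name swinging_door, the (time, value) tuple format and the deviation parameter dev are assumptions made for this example, and time stamps are assumed to be strictly increasing. A sample is archived only when the two "doors" hinged at the last archived value plus and minus dev can no longer enclose every sample received since then.

```python
import math


def swinging_door(points, dev):
    """Compress a list of (time, value) samples (hypothetical helper).

    dev is the allowed compression deviation, an assumed parameter.
    Returns the subset of samples that would be archived.
    """
    if not points:
        return []

    archived = [points[0]]
    t0, v0 = points[0]          # last archived point (the hinge)
    up = -math.inf              # steepest slope seen from the upper hinge
    low = math.inf              # shallowest slope seen from the lower hinge
    prev = points[0]            # most recent sample that passed the test

    for t, v in points[1:]:
        s_up = (v - (v0 + dev)) / (t - t0)
        s_low = (v - (v0 - dev)) / (t - t0)
        new_up = max(up, s_up)
        new_low = min(low, s_low)

        if new_up > new_low:
            # The doors have opened past parallel: archive the previous
            # sample and restart the doors from it.
            archived.append(prev)
            t0, v0 = prev
            up = (v - (v0 + dev)) / (t - t0)
            low = (v - (v0 - dev)) / (t - t0)
        else:
            up, low = new_up, new_low

        prev = (t, v)

    # Always keep the final sample so the trend can be reconstructed.
    if archived[-1] != prev:
        archived.append(prev)
    return archived
```

For example, swinging_door([(0, 0.0), (1, 0.0), (2, 0.0), (3, 5.0), (4, 5.0)], 0.5) keeps (0, 0.0), (2, 0.0), (3, 5.0) and (4, 5.0) and drops the redundant flat sample (1, 0.0). A larger dev discards more samples at the cost of a coarser reconstructed trend.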