This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.

IEEE TRANSACTIONS ON SMART GRID

Queuing-Based Energy Consumption Management for Heterogeneous Residential Demands in Smart Grid

Yi Liu, Chau Yuen, Senior Member, IEEE, Rong Yu, Member, IEEE, Yan Zhang, Senior Member, IEEE, and Shengli Xie, Senior Member, IEEE

Abstract—In this paper, energy consumption management is considered for households (users) in a residential smart grid network. In each house, there are two types of demands, essential and flexible demands, where the flexible demands are further categorized into delay-sensitive and delay-tolerant demands. The delay-sensitive demands have higher priority to be served than the delay-tolerant demands. Meanwhile, in order to decrease the delay of delay-tolerant demands, such demands are allowed to be upgraded to the high-priority queue (i.e., the same queue that serves the delay-sensitive demands) with a given probability. An optimization problem is then formulated to minimize the total electricity cost and the operation delay of flexible demands by obtaining the optimal energy management decisions. Based on adaptive dynamic programming, a centralized algorithm is proposed to solve the optimization problem. In addition, a distributed algorithm is designed for practical implementation, and a neural network is employed to estimate the pricing or demands when such system information is not known. Simulation results show that the proposed schemes can provide effective management for household electricity usage and reduce the operation delay for the flexible demands.

Index Terms—Energy consumption management, heterogeneous demands, operation delay, smart grid.

Manuscript received November 13, 2014; revised February 24, 2015 and April 21, 2015; accepted May 4, 2015. This work was supported in part by the Energy Innovation Research Program Singapore under Grant NRF2012EWT-EIRP002-045; in part by the Programs of Natural Science Foundation of China under Grant 61403086, Grant 61333013, Grant 61422201, Grant 61370159, Grant U1301255, and Grant U1201253; in part by the Guangdong Province Natural Science Foundation under Grant S2011030002886; in part by the High Education Excellent Young Teacher Program of Guangdong Province under Grant YQ2013057; in part by the Science and Technology Program of Guangzhou under Grant 2014J2200097 (Zhujiang New Star Program); in part by the Research Council of Norway under Project 240079/F20; and in part by the European Commission FP7 Project CROWN under Grant PIRSES-GA-2013-627490. Paper no. TSG-01118-2014. (Corresponding author: Shengli Xie.)

Y. Liu, R. Yu, and S. Xie are with the School of Automation, Guangdong University of Technology, Guangzhou 510006, China (e-mail: yiliu115@gmail.com; yurong@ieee.org; shlxie@gdut.edu.cn).

C. Yuen is with the Singapore University of Technology and Design, Singapore 487372 (e-mail: yuenchau@sutd.edu.sg).

Y. Zhang is with Simula Research Laboratory, Oslo 1325, Norway (e-mail: yanzhang@simula.no).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSG.2015.2432571

I. INTRODUCTION

SMART GRID, which uses advanced metering infrastructure and a central scheduler, enables two-way flows of electricity and information between the users and utility companies [1]–[3]. Such two-way exchange can create an automated and distributed advanced energy delivery network. The main advantage of this network is that the utility company can implement demand response management (DRM) programs to control the energy consumption of the users.
To implement effective DRM, the smart meters collect data on the electricity usage of the houses and communicate with the central scheduler. Considering heterogeneous demands caused by different appliances, the central scheduler manages and schedules the electricity consumption of the users while minimizing their cost as an incentive. Meanwhile, the users’ preferences regarding energy usage are also considered by the central scheduler in many studies [4]–[6].

From the utility companies’ point of view, DRM can reduce peak electricity loads and increase the reliability of the power grid. By using DRM, utilities and system operators can encourage users, through incentives, to individually and voluntarily manage their loads, e.g., by shifting high-energy loads to off-peak hours. Koutsopoulos and Tassiulas [7] focused on minimizing the grid operational cost for the utility by designing a power demand task scheduling policy. Tasdighi et al. [8] proposed an optimal scheduling model for a microgrid based on temperature-dependent thermal load modeling.

From the users’ point of view, DRM can be used to find the optimal energy consumption schedule that minimizes the users’ electricity bill while reducing their dissatisfaction with energy usage. Mohsenian-Rad and Leon-Garcia [9] proposed an optimal residential load control mechanism which attempts to minimize both the electricity payment and the waiting time for the residential appliances. Neely et al. [10] investigated the problem of allocating energy from renewable sources to flexible users that are served within a specified delay window, incurring a cost of drawing energy from other (possibly nonrenewable) sources if the renewable supply is not sufficient to meet the deadlines. According to the aforementioned literature, a typical source of user dissatisfaction with electricity usage is the operation delay, which is caused by the service priority of hierarchical loads in residential networks.
Hence, on top of energy cost minimization, DRM for users also aims to minimize the operation delay by adjusting the energy consumption according to the queue length of different classes of demands.

Recently, there have been several studies detailing DRM approaches to minimizing energy cost with constrained users’ operation delay in smart grid [4], [11]–[14]. Chen et al. [4] investigated the cost minimization problem for an end user, such as a home, a community, or a business, which is equipped with renewable energy devices when electrical appliances allow different levels of delay tolerance. Zhu et al. [11] proposed a mechanism to schedule both optimal power and optimal operation time for power-shiftable appliances and time-shiftable appliances, respectively, according to the power consumption patterns of all individual appliances. Guo et al. [12] investigated the minimization of the total energy cost of multiple residential households with inelastic and elastic energy loads, where any feasible control decision must satisfy the average delay constraint for the elastic loads. Gatsis and Giannakis [13] focused on minimizing the electricity provider cost plus the total user dissatisfaction, subject to the individual constraints on the operation period. Zhao et al. [14] introduced a general architecture of an energy management system in a home area network and formulated a joint optimization problem to minimize the electricity expense for users as well as the operation delay. In these studies, the operation delay is treated as a significant component of DRM schemes that account for users’ preferences.

© 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
However, how to ensure fairness for the loads with low serving priority is still an open issue.

The focus of this paper is to manage the energy consumption of a residential smart grid by minimizing both the electricity cost and the operation delay for users. Each household has essential and flexible demands. According to their different delay requirements, the flexible demands are categorized into two types: 1) delay-sensitive; and 2) delay-tolerant demands. Delay-sensitive demands always have priority over delay-tolerant demands, i.e., delay-tolerant demands can only be served when there are no delay-sensitive demands in the system. When the residential grid is highly loaded and a large portion of the demands is delay-sensitive, the prioritized scheduling may cause excessive delay for the delay-tolerant demands. Inspired by [15], an upgrade-by-probability (UBP) scheme, which is used in wireless communication networks, is developed to decrease such delay in smart grid. In this scheme, the delay-tolerant demands can be promoted (i.e., jump) to the high-priority queue with probability β. Jumped delay-tolerant demands are treated as if they were delay-sensitive, i.e., they are served ahead of newly arriving delay-tolerant demands.

In this paper, an energy management strategy is studied for household appliances under the queuing-based demand framework of the residential smart grid. The main contributions of this paper are as follows.

1) To ensure fairness for the demands with low serving priority, the UBP scheme is used to decrease the delay of delay-tolerant demands if they are not served for a long time. An optimization problem is formulated to minimize both the total electricity cost and the operation delay for flexible demands.

2) A centralized algorithm based on adaptive dynamic programming (ADP) is proposed to solve the optimization problem. A discrete-time policy iteration algorithm of ADP is developed to obtain the optimal controller for the residential smart grid systems.
3) A distributed algorithm is designed for practical implementation, and a neural network is employed to approximate the management decision when the real-time system information is not known.

In addition, numerical results show that the proposed schemes can provide effective management for household electricity usage and reduce the operation delay for the flexible demands.

The rest of this paper is organized as follows. In Section II, the system models are introduced. The optimization problem to minimize both energy cost and operation delay is formulated and solved in a centralized manner in Section III. Section IV studies the distributed algorithm, which uses a neural network method to solve the proposed problem with historical information. Section V presents the numerical results of the proposed algorithms. Finally, this paper is concluded in Section VI.

II. SYSTEM MODEL

A. Residential Grid Networks

The energy of a residential grid network is supplied by the power grid and shared by several users (homes) through the power line, as shown in Fig. 1. Each user is equipped with a smart meter, which not only distributes the electricity from the power line to all appliances in each home, but also collects each user’s demands and preference information. Through the communication line, the smart meters are capable of reporting the collected information to the central scheduler. Based on all the collected information, the central scheduler will optimize the consumption over the day and schedule all appliances in the residential grid network.

Fig. 1. System model.

The demands in the residential grid network are categorized into two types: 1) essential; and 2) flexible demands. For the essential appliances such as TVs, electric stoves, and lamps, which have a fixed power requirement and operational period, the central scheduler will ensure a continuous supply of power throughout the optimization period.
The scheduling process would only affect the flexible demands. For flexible appliances, such as optional lighting (OL), water heating, and clothes dryers (CD), the central scheduler will be able to control the switch and provide sufficient electricity corresponding to the power pattern during the scheduled periods. More specifically, the flexible demands can be buffered in a queue before being served, i.e., the energy request can be delayed. Considering the different delay requirements, the flexible demands are classified as delay-sensitive and delay-tolerant demands. Without loss of generality, the delay-sensitive demands are generated by appliances such as CD and OL, which are sensitive to the operation delay. The delay-tolerant demands are generated by appliances such as heating, ventilation and air conditioning (HVAC) and water heating (WH), which are insensitive to the operation delay.

B. Cost Function for Demands

Let $\mathcal{N}$ denote the set of users, where the number of users is $N \triangleq |\mathcal{N}|$. For each user $n \in \mathcal{N}$, let $e_{n,b}(t)$, $e_{n,s}(t)$, and $e_{n,r}(t)$ denote the electricity consumed to serve the essential, delay-sensitive, and delay-tolerant demands of user $n$ at time $t$, respectively. Let $l_n(t)$ denote the total energy consumption of user $n$ at each time slot $t \in \mathcal{T} \triangleq \{1, \ldots, T\}$, where $T$ is the total number of unit time slots. Then

$$l_n(t) = e_{n,b}(t) + e_{n,s}(t) + e_{n,r}(t), \quad t \in \mathcal{T}.$$

Based on this definition, the total energy consumption across all users at each $t \in \mathcal{T}$ can be calculated as $L(t) \triangleq \sum_{n \in \mathcal{N}} l_n(t)$. To indicate the energy cost in the residential grid system, an energy cost function should be carefully selected.
The energy cost under this model can be approximated by the quadratic function [5]

$$f(L(t)) = p(t) L(t)^2 \tag{1}$$

where $p(t) > 0$ at each time slot $t \in \mathcal{T}$.

C. Prioritization of Demands With Upgrade-by-Probability

Without loss of generality, we assume that the delay-sensitive demands $d_{n,s}(t)$ enter the high-priority queue, while the delay-tolerant demands $d_{n,r}(t)$ are buffered in the low-priority queue. At the beginning of a time slot $t$, when there are demands in the high-priority queue, the demands in the low-priority queue cannot be served. This policy may cause significant delay for the delay-tolerant demands if the arrival rate of delay-sensitive demands is high, because the high-priority queue may always have buffered demands. Hence, the UBP scheme is proposed to allow the demands in the low-priority queue to upgrade to the high-priority queue with probability β. This upgrade is only possible when the high-priority queue is nonempty at the beginning of a slot. The UBP scheme is presented as follows.

For any user, let $u_{n,H}(t)$ and $u_{n,L}(t)$ denote the lengths of the high- and low-priority queues of user $n$ at the beginning of $t$, respectively. The queuing demands can be described by the pair $(u_{n,H}(t), u_{n,L}(t))$. For the different cases of the high- and low-priority demand queues, we have the following expressions.

1) $u_{n,H}(t) = 0$: In this case, the high-priority queue is empty at the beginning of time slot $t$ and the newly arriving delay-sensitive demands are queued in the high-priority queue. The demands in the low-priority queue will be served during time slot $t$, provided that the low-priority queue is not empty at the beginning of $t$. Then

$$\begin{aligned} u_{n,H}(t+1) &= d_{n,s}(t) \\ u_{n,L}(t+1) &= \left[u_{n,L}(t) - e_{n,r}(t)\right]^{+} + d_{n,r}(t) \end{aligned} \tag{2}$$

where $[\cdot]^{+}$ denotes the maximum of the argument and zero.

2) $u_{n,H}(t) > 0$ and $u_{n,L}(t) > 0$: In this case, the high-priority queue is nonempty at the beginning of time slot $t$. Hence, the high-priority queue is served during time slot $t$.
If the low-priority queue is nonempty as well, the entire low-priority queue is upgraded to the high-priority queue with probability β. This upgrading process happens at the end of time slot $t$. The newly arriving delay-tolerant demands will also upgrade to the high-priority queue. Then, we have the following expressions.

a) With probability β

$$\begin{aligned} u_{n,H}(t+1) &= u_{n,H}(t) - e_{n,s}(t) + d_{n,s}(t) + u_{n,L}(t) + d_{n,r}(t) \\ u_{n,L}(t+1) &= 0. \end{aligned} \tag{3}$$

b) With probability $1 - \beta$

$$\begin{aligned} u_{n,H}(t+1) &= u_{n,H}(t) - e_{n,s}(t) + d_{n,s}(t) \\ u_{n,L}(t+1) &= u_{n,L}(t) + d_{n,r}(t). \end{aligned} \tag{4}$$

3) $u_{n,H}(t) > 0$ and $u_{n,L}(t) = 0$: If the high-priority queue is not empty and the low-priority queue is empty at the beginning of $t$, the newly arriving delay-tolerant demands will upgrade to the high-priority queue with probability β. Hence, we can obtain the following.

a) With probability β

$$\begin{aligned} u_{n,H}(t+1) &= u_{n,H}(t) - e_{n,s}(t) + d_{n,s}(t) + d_{n,r}(t) \\ u_{n,L}(t+1) &= 0. \end{aligned} \tag{5}$$

b) With probability $1 - \beta$

$$\begin{aligned} u_{n,H}(t+1) &= u_{n,H}(t) - e_{n,s}(t) + d_{n,s}(t) \\ u_{n,L}(t+1) &= d_{n,r}(t). \end{aligned} \tag{6}$$

III. ENERGY CONSUMPTION OPTIMIZATION

A. Problem Formulation

In this paper, two objectives are jointly minimized: 1) the total energy cost of electricity consumption for all households and 2) the queue lengths (i.e., the delays) of the delay-sensitive and delay-tolerant queues in each household. Hence, the optimization problem is given by

$$\min_{e_{n,s}(t),\, e_{n,r}(t),\, \forall n} \; \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \left[ \omega_1 f(L(t)) + \omega_2 \sum_{n=1}^{N} \left( u_{n,H}(t) + u_{n,L}(t) \right) \right] \tag{7}$$

$$\text{s.t.} \quad L(t) \le L_{\max} \tag{8}$$

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_{n,H}(t) \le u^{n}_{H,\max} \tag{9}$$

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_{n,L}(t) \le u^{n}_{L,\max} \tag{10}$$

where $L_{\max}$ is the limit of the total energy consumption across all users at time slot $t$, and $u^{n}_{H,\max}$ and $u^{n}_{L,\max}$ are the limits of the average lengths of the high- and low-priority queues at user $n$, respectively. $\omega_1$ and $\omega_2$ are weights, which can be adjusted based on the users’ preference. Constraint (8) ensures that the total energy consumed by all users is limited. Constraints (9) and (10) ensure that the average delays of the flexible demands are finite. Let $E[u_{n,H}] \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_{n,H}(t)$ and $E[u_{n,L}] \triangleq \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_{n,L}(t)$. In the next subsection, $E[u_{n,H}]$ and $E[u_{n,L}]$ are calculated based on the UBP scheme in the priority queuing model.

B. Priority Virtual Queuing Analysis

The arrivals $d_{n,s}(t)$, $t \in \mathcal{T}$, and $d_{n,r}(t)$, $t \in \mathcal{T}$, are assumed to be independent and identically distributed over time. However, $d_{n,s}(t)$ and $d_{n,r}(t)$ can be correlated within one time slot. Let $\lambda_1$ and $\lambda_2$ denote the arrival rates of the delay-sensitive and delay-tolerant demands, respectively. To guarantee the stability of the queues, we assume $d^{\max}_{n,s} \le e^{\max}_{n,s}$ and $d^{\max}_{n,r} \le e^{\max}_{n,r}$, where $d^{\max}_{n,s}$ and $d^{\max}_{n,r}$ are the maximum amounts of arriving delay-sensitive and delay-tolerant demands at user $n$, respectively, and $e^{\max}_{n,s}$ and $e^{\max}_{n,r}$ are the maximum amounts of energy used to serve delay-sensitive and delay-tolerant demands at user $n$, respectively. The joint probability generating function (pgf) of $d_{n,s}(t)$ and $d_{n,r}(t)$ is defined as

$$B(z_1, z_2) \triangleq E\left[ z_1^{d_{n,s}(t)} z_2^{d_{n,r}(t)} \right].$$

Then, the marginal pgfs of the delay-sensitive and delay-tolerant demand arrivals per time slot are given by $B_1(z) \triangleq B(z, 1)$ and $B_2(z) \triangleq B(1, z)$, respectively. Furthermore, the total demand arrival during time slot $t$ is denoted by $d_n(t)$, where $d_n(t) = d_{n,s}(t) + d_{n,r}(t)$. Its pgf is given by $B_c(z) \triangleq B(z, z)$. The corresponding arrival rate, i.e., the mean number of arrived demands per time slot, is $\lambda_c \triangleq B_c'(1) = \lambda_1 + \lambda_2$. Let $U_t(z_1, z_2)$ denote the joint pgf of $u_{n,H}(t)$ and $u_{n,L}(t)$. Then, we have

$$U_t(z_1, z_2) \triangleq E\left[ z_1^{u_{n,H}(t)} z_2^{u_{n,L}(t)} \right]. \tag{11}$$
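As a brief reminder of how a pgf yields the means used in what follows, the standard identities are collected below. This is a sketch under the assumption that the steady-state limit $U(z_1, z_2) = \lim_{t \to \infty} U_t(z_1, z_2)$ exists and that the total backlog is $u_c = u_{n,H} + u_{n,L}$; neither is spelled out explicitly at this point in the text.

```latex
\begin{align*}
\lambda_1 &= \left.\frac{dB_1(z)}{dz}\right|_{z=1}, &
\lambda_2 &= \left.\frac{dB_2(z)}{dz}\right|_{z=1}, \\
E[u_{n,H}] &= \left.\frac{\partial U(z_1,z_2)}{\partial z_1}\right|_{z_1=z_2=1}, &
E[u_{n,L}] &= \left.\frac{\partial U(z_1,z_2)}{\partial z_2}\right|_{z_1=z_2=1}, \\
E[u_c] &= \left.\frac{dU(z,z)}{dz}\right|_{z=1} = E[u_{n,H}] + E[u_{n,L}].
\end{align*}
```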
Therefore, the expressions of the mean values of $u_{n,H}$ and $u_{n,L}$ are given as follows [15]:

$$E[u_{n,H}] = \frac{(1 - \beta\lambda_2)\left(U(0,1) - 1\right) + \lambda_1 + \beta E[u_c] - \beta U^{(2)}(0,1)}{\beta} \tag{12}$$

$$E[u_{n,L}] = \frac{(1 - \beta\lambda_2)\left(1 - U(0,1)\right) - \lambda_1 + \beta U^{(2)}(0,1)}{\beta} \tag{13}$$

where $E[u_c]$ is the mean total amount of demands, which can be calculated via $U(z, z)$, and $U^{(2)}(0,1) \triangleq \left( dU(0,z)/dz \right)|_{z=1}$.

C. Centralized Algorithm

The original problem (7)–(10) is a nonlinear optimization problem due to the random market prices $p(t)$, random delay-sensitive demands $d_{n,s}(t)$, and delay-tolerant demands $d_{n,r}(t)$. To solve the problem mathematically, we simplify the optimization problem under the assumption that the demands, as well as the market prices, are ergodic across time slots. This assumption is usually valid because the power load and market behaviors statistically recur in daily or seasonal patterns. Based on this assumption, the original optimization problem can be rewritten as

$$\min_{e_{n,s}(t),\, e_{n,r}(t),\, \forall n} \; \omega_1 E[f(L(t))] + \omega_2 \sum_{n=1}^{N} \left( E[u_{n,H}] + E[u_{n,L}] \right) \tag{14}$$

$$\text{s.t.} \quad L(t) \le L_{\max} \tag{15}$$

$$E[u_{n,H}] \le u^{n}_{H,\max} \tag{16}$$

$$E[u_{n,L}] \le u^{n}_{L,\max}. \tag{17}$$

The dynamic programming approach is employed to solve the problem in (14) subject to the constraints (15)–(17). For each stage $k$, $k = 1, 2, \ldots$, let $d_{n,s}(k)$ and $d_{n,r}(k)$ be the delay-sensitive and delay-tolerant demands of user $n$ at stage $k$. The state vector is defined as $x(k) = (d_s(k), d_r(k), a(k))$, where the vectors $d_s(k) = \{d_{1,s}(k), \ldots, d_{N,s}(k)\}$ and $d_r(k) = \{d_{1,r}(k), \ldots, d_{N,r}(k)\}$ are the delay-sensitive and delay-tolerant demands of all $N$ users, respectively, and $a(k)$ is the market price for buying electricity at stage $k$. Then, the management decision is defined as $e(k) = (e_s(k), e_r(k))$, where $e_s(k) = \{e_{1,s}(k), \ldots, e_{N,s}(k)\}$ and $e_r(k) = \{e_{1,r}(k), \ldots, e_{N,r}(k)\}$. In addition, we define a feasible energy consumption management set corresponding to the $n$th user as follows:

$$\mathcal{E}_n = \left\{ e(k) \mid L(t) \le L_{\max},\; E[u_{n,H}] \le u^{n}_{H,\max},\; E[u_{n,L}] \le u^{n}_{L,\max} \right\}.$$
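The feasibility conditions in the set above depend on the long-run averages E[u_{n,H}] and E[u_{n,L}]. As a rough cross-check on closed-form values, those averages can also be estimated by simulating the queue updates (2)–(6) for a single user. The sketch below is illustrative only: the Poisson arrivals, the fixed per-slot service amount `serve`, the clamping of the service to the current backlog, and all function and parameter names are assumptions made for this example, not part of the paper.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) sample with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def ubp_step(u_h, u_l, d_s, d_r, serve, beta, rng):
    """One slot of the UBP queue update for a single user, following (2)-(6)."""
    if u_h == 0:
        # Case 1: high-priority queue empty, so the low-priority queue is served.
        return d_s, max(u_l - serve, 0) + d_r
    # Cases 2 and 3: the high-priority queue is served first.
    e_s = min(serve, u_h)  # assumption: never serve more than the backlog
    if rng.random() < beta:
        # Upgrade: low-priority backlog and new delay-tolerant arrivals
        # join the high-priority queue, which empties the low-priority queue.
        return u_h - e_s + d_s + u_l + d_r, 0
    # No upgrade: delay-tolerant arrivals wait in the low-priority queue.
    return u_h - e_s + d_s, u_l + d_r


def average_queues(beta, lam_s=0.6, lam_r=0.3, serve=2, slots=20000, seed=1):
    """Estimate the time-averaged queue lengths (E[u_H], E[u_L])."""
    rng = random.Random(seed)
    u_h = u_l = 0
    sum_h = sum_l = 0
    for _ in range(slots):
        sum_h += u_h
        sum_l += u_l
        d_s, d_r = poisson(rng, lam_s), poisson(rng, lam_r)
        u_h, u_l = ubp_step(u_h, u_l, d_s, d_r, serve, beta, rng)
    return sum_h / slots, sum_l / slots
```

Calling `average_queues(beta)` for several values of β gives empirical averages that can be compared against the limits $u^{n}_{H,\max}$ and $u^{n}_{L,\max}$ in (16) and (17).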
In the proposed management method, the scheduler validly controls the users’ energy consumption only if $e(k) \in \mathcal{E}_n$. According to Bellman’s principle of optimality, the optimal performance index function $F^*(x(k))$ satisfies the following Hamilton–Jacobi–Bellman (HJB) equation:

$$F^*(x(k)) = \min_{e(k) \in \mathcal{E}_n} \left\{ U(x(k), e(k)) + F^*(x(k+1)) \right\}$$

where $U(x(k), e(k)) = a(k) L(k)^2 + \sum_{n=1}^{N} \left( u^{n}_{H,t} + u^{n}_{L,t} \right)$ is the utility function. Define the law of optimal control as

$$e^*(x(k)) = \arg\min_{e(k) \in \mathcal{E}_n} \left\{ U(x(k), e(k)) + F^*(x(k+1)) \right\}.$$

Hence, the HJB equation can be written as

$$F^*(x(k)) = U\left(x(k), e^*(x(k))\right) + F^*(x(k+1)).$$

To achieve the optimal management decision $e^*(x(k))$, the optimal performance index function $F^*(x(k))$ should be obtained. Generally, $F^*(x(k))$ cannot be obtained unless all the $e(k)$ are considered. If the traditional dynamic programming method is used to obtain $F^*(x(k))$ at every time step, the curse of dimensionality is encountered. Moreover, the optimal management is discussed over an infinite horizon, which means that the length of the control sequence is infinite. In this case, it is nearly impossible to obtain the optimal management directly from the HJB equation. To address these problems, a new iterative algorithm based on ADP [19] should be developed.

Next, a discrete-time policy iteration algorithm based on adaptive dynamic programming is developed to obtain the optimal management for the residential smart grid systems. The goal of the developed policy iteration algorithm is to construct an iterative management decision $v_i(x(k))$, which drives an arbitrary initial state $x(0)$ to the equilibrium. Meanwhile, this iterative management decision also makes the iterative performance index function reach the optimum.

In the developed policy iteration algorithm, the performance index function and the management decision are updated by iterations. Let $v_0(x(k))$ be an arbitrary admissible management decision. For $i = 0$, let $V_0(x(k))$ be the iterative performance index function constructed by $v_0(x(k))$ that satisfies the following generalized HJB (GHJB) equation:

$$V_0(x(k)) = U(x(k), v_0(x(k))) + V_0(x(k+1)). \tag{18}$$

Then, the iterative management decision is computed by

$$v_1(x(k)) = \arg\min_{e(k) \in \mathcal{E}_n} \left\{ U(x(k), e(k)) + V_0(x(k+1)) \right\}. \tag{19}$$

For $i = 1, 2, \ldots$, let $V_i(x(k))$ be the iterative performance index function constructed by $v_i(x(k))$, which satisfies the following GHJB equation:

$$V_i(x(k)) = U(x(k), v_i(x(k))) + V_i(x(k+1)) \tag{20}$$

and the iterative management decision is updated by

$$v_{i+1}(x(k)) = \arg\min_{e(k) \in \mathcal{E}_n} \left\{ U(x(k), e(k)) + V_i(x(k+1)) \right\}. \tag{21}$$

According to (18)–(21), the iterative performance index function $V_i(x(k))$ is used to approximate $F^*(x(k))$ and the iterative management decision $v_i(x(k))$ is used to approximate $e^*(x(k))$. As the GHJB equation (20) is generally not the HJB equation, it is necessary to determine whether the algorithm is convergent. In [19], it is proved that $V_i(x(k))$ and $v_i(x(k))$ converge to the optimal ones when $i \to \infty$. Finally, the detailed implementation of the iteration algorithm is given in Algorithm 1. The computing complexity is $O(D\, i_{\max})$, where $D$ is the domain size of the variables $e_{n,s}$ and $e_{n,r}$, and $i_{\max}$ is the maximum number of iterations of the algorithm.

Algorithm 1: Centralized ADP Algorithm
Initialization:
01: Select randomly an array of initial state $x(0)$;
02: Select a computation precision $\varepsilon$;
03: Give the initial admissible management decision $v_0(x(k))$;
04: Give the max iteration of computation $i_{\max}$;
Iteration:
05: Let the iteration index $i = 0$;
06: Construct the iterative performance index function $V_0(x(k))$ according to $v_0(x(k))$ by (18);
07: Update the iterative management decision by (19);
08: Let $i = i + 1$. Construct the iterative performance index function $V_i(x(k))$, which satisfies the GHJB (20);
09: Update the iterative management decision $v_{i+1}(x(k))$ by (21);
10: If $V_{i-1}(x(k)) - V_i(x(k)) < \varepsilon$, go to step 12. Else go to step 11;
11: If $i < i_{\max}$, then go to step 08. Else, go to step 13;
12: return $v_i(x(k))$ and $V_i(x(k))$. The optimal management decision is achieved;
13: return The optimal management decision is not achieved within $i_{\max}$ iterations.

IV. DISTRIBUTED ENERGY CONSUMPTION MANAGEMENT

A. Distributed Algorithm

The algorithm described above should be able to run in a distributed manner in order to be implemented in practice. Hence, in this section, a distributed algorithm is designed to solve the optimization problem. For each user, the optimization problem is

$$\min_{e_{n,s}(t),\, e_{n,r}(t),\, \forall n} \; \omega_1 E[f(L_n(t))] + \omega_2 \left( E[u_{n,H}] + E[u_{n,L}] \right) \tag{22}$$

$$\text{s.t.} \quad \sum_{n=1}^{N} L_n(t) \le L(t) \le L_{\max} \tag{23}$$

$$E[u_{n,H}] \le u^{n}_{H,\max} \tag{24}$$

$$E[u_{n,L}] \le u^{n}_{L,\max}. \tag{25}$$

Let $x_n(k) = (d_{n,s}(k), d_{n,r}(k), a_n(k))$, $n = 1, \ldots, N$. For iteration index $l = 1, 2, \ldots$, the iterative performance index function $V_l(x_n(k))$ for state $x_n$ is constructed by $v_l(x_n(k))$, which satisfies the following GHJB equation:

$$V_l(x_n(k)) = U(x_n(k), v_l(x_n(k))) + V_l(x_n(k+1)) \tag{26}$$

where $U(x_n(k), v_l(x_n(k)))$ is the utility function for user $n$. Let $e_{n,l}(k) = \{e^{l}_{n,s}(k), e^{l}_{n,r}(k)\}$ denote the control action of user $n$ at iteration $l$. The iterative control law is updated by

$$v_{l+1}(x_n(k)) = \arg\min_{e_{n,l}(k) \in \mathcal{E}_n} \left\{ U(x_n(k), e_{n,l}(k)) + V_l(x_n(k+1)) \right\}. \tag{27}$$

Then, the distributed algorithm is shown in Algorithm 2, which iteratively solves problem (22) at the users and the centralized scheduler, respectively, in a distributed fashion.

Algorithm 2: Distributed ADP Algorithm
Initialization: Set $\beta_0(k)$ arbitrarily to some nonnegative value;
Iteration: Let the iteration index $l = 0$;
At user $n$
01: Select randomly initial state $x_n(0)$ and initial admissible management decision $v_0(x_n(k))$;
02: Give computation precision $\varepsilon$ and the max iteration of computation $l_{\max}$;
03: If $\beta_l(k)$ is received, construct the iterative performance index function $V_l(x_n(k))$, which satisfies the following GHJB equation
$$V_l(x_n(k)) = U(x_n(k), v_l(x_n(k))) + V_l(x_n(k+1)) + \beta_l(k) L_n(k);$$
04: Update the iterative management decision $v_{l+1}(x_n(k))$ by (27);
05: If $V_{l-1}(x_n(k)) - V_l(x_n(k)) < \varepsilon$, return $v_l(x_n(k))$ and $V_l(x_n(k))$;
06: The optimal management decision is achieved;
At the centralized scheduler
07: If $e_{n,l}(k)$ for all users are received, then
08: The centralized scheduler calculates $L^{(l)}(k)$, and updates the Lagrange multiplier $\beta_l(k)$ as follows:
$$L^{(l)}(k) = \arg\min_{0 \le L^{(l)}(k) \le L_{\max}} \left\{ f\left(L^{(l)}(k)\right) + \sum_{n=1}^{N} \left( u_{n,H}(k) + u_{n,L}(k) \right) \right\}$$
$$\beta_{l+1}(k) = \left\{ \beta_l(k) - \phi \left[ L^{(l)}(k) - \sum_{n=1}^{N} L^{(l)}_n(k) \right] \right\}^{+}$$
where $\phi > 0$ is a constant step-size;
09: End;
10: Set $l \leftarrow l + 1$;
11: End;
12: If $l < l_{\max}$, then go to step 03. Else, go to step 13;
13: return The optimal management decision is not achieved within $l_{\max}$ iterations.

Let $\beta_l(k)$ denote the Lagrangian multiplier at the $l$th iteration for stage $k$. At user $n$, if $\beta_l(k)$ is received, each user individually calculates its own local energy consumption management vector $e_{n,l}(k)$ in line 04. The users locally solve the management problem according to the newly announced $\beta_l(k)$. Then, each user sends its new value of $e_{n,l}(k)$ to the centralized scheduler. At the centralized scheduler, the algorithm starts with some random initial conditions, i.e., the scheduler assumes a random Lagrange multiplier $\beta_l(k)$. This assumption implies that, at the beginning, the scheduler has no prior information about the users.
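The scheduler-side multiplier update in line 08 of Algorithm 2 has the form of a projected subgradient step on a dual variable. A minimal sketch, assuming the scheduler's target load and the loads reported by the users are plain numbers for a single stage; the function name, argument names, and sample values are hypothetical:

```python
def update_multiplier(beta_l, target_load, user_loads, step):
    """Projected update: beta_{l+1} = [beta_l - step * (L - sum_n L_n)]^+."""
    # Positive mismatch: users request less than the target, so the
    # multiplier (the "price" on load) is lowered; negative: it is raised.
    mismatch = target_load - sum(user_loads)
    # Projection onto the nonnegative reals keeps the multiplier feasible.
    return max(0.0, beta_l - step * mismatch)


# Illustration with made-up numbers: the aggregate request (12) exceeds
# the target (10), so the multiplier increases.
beta_next = update_multiplier(0.5, 10.0, [4.0, 8.0], 0.1)
```

The step size plays the role of φ in Algorithm 2; a constant value is used here as in the algorithm listing.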
Once the management vectors $e_{n,l}(k)$ from all users are received, the scheduler updates $\beta_l(k)$ and broadcasts the updated multiplier to all users in line 08. Finally, the loop in lines 01–13 is executed until a predefined stopping criterion is satisfied.

B. Neural Network-Based Distributed Management

In practice, the messages $\beta_l(k)$ or $e_{n,l}(k)$ exchanged between the central scheduler and the users might be lost due to a malfunction at the transmitter or noise at the receiver. Without these messages, user $n$ cannot calculate the performance index function $V_l(x_n(k))$ or the admissible management decision $v_l(x_n(k))$. In this section, the user is allowed to estimate $V_l(x_n(k))$ and $v_l(x_n(k))$ from historical information. A three-layer neural network [19], [20] is employed by user $n$ to approximate $V_l(x_n(k))$ and compute the optimal $v_l(x_n(k))$. Let $J$ denote the number of hidden-layer neurons, $Y$ the weight matrix between the input and hidden layers, and $W$ the weight matrix between the hidden and output layers. Then, the output of the three-layer neural network is

$$\hat{F}(X, Y, W) = W^{T} \sigma\left(Y^{T} X + b\right) \tag{28}$$

where $\sigma(Y^{T} X) \in \mathbb{R}^{J}$, $[\sigma(z)]_j = \left(e^{z_j} - e^{-z_j}\right) / \left(e^{z_j} + e^{-z_j}\right)$, $j = 1, \ldots, J$, are the activation functions (hyperbolic tangents), and $b$ is the threshold value. Next, we introduce the critic network and the action network, both of which are chosen as three-layer feedforward neural networks.

1) Critic Network: In our algorithm, the critic network is used to approximate the performance index function $V_l(x_n(k))$. For user $n$, the output of the critic network is

$$\hat{V}_l^{j}(x_n(k)) = W_{cl}^{jT} \sigma(Z_c(k)) \tag{29}$$

where $Z_c(k) = Y_c^{T} x_n(k) + b_c$. The error function of the critic network is defined as

$$\varepsilon_{cl}^{j}(k) = \hat{V}_l^{j}(x_n(k)) - V_l(x_n(k)). \tag{30}$$

The objective function in the critic network training is

$$E_{cl}^{j}(k) = \frac{1}{2} \left(\varepsilon_{cl}^{j}(k)\right)^{2}. \tag{31}$$

Therefore, the gradient-based weight updating rule in the critic network is as follows:

$$\begin{aligned}
W_{cl}^{j+1}(k) &= W_{cl}^{j}(k) + \Delta W_{cl}^{j}(k) \\
&= W_{cl}^{j}(k) - \alpha_c \left[ \frac{\partial E_{cl}^{j}(k)}{\partial \hat{V}_{l+1}^{j}(k)} \frac{\partial \hat{V}_{l+1}^{j}(k)}{\partial W_{cl}^{j}(k)} \right] \\
&= W_{cl}^{j}(k) - \alpha_c \varepsilon_{cl}^{j}(k) \sigma(Z_c(k))
\end{aligned} \tag{32}$$

where $\alpha_c > 0$ is the learning rate of the critic network. With sufficiently precise training, $V_{l+1}(x_n(k))$ can be approximated by the critic network. However, it is not easy to obtain the performance index function $V_l(x_n(k))$ when the critic network is not trained. $V_l(x_n(k))$ can be obtained as follows:

$$V_l(x_n(k)) = \sum_{j=0}^{\theta-1} U\big(x_n(k+j), v_l(x_n(k+j))\big) + V_l(x_n(k+\theta)).$$

As $v_l(x_n(k+j))$ is an admissible management decision, $V_l(x_n(k+\theta)) \to 0$ for $\theta \to \infty$. Hence, if $\theta$ is chosen sufficiently large, then

$$V_l(x_n(k)) = \sum_{j=0}^{\theta-1} U\big(x_n(k+j), v_l(x_n(k+j))\big).$$

2) Action Network: In this network, the state vector $x_n(k)$ is used as the input, and the iterative management decision is produced as the output. The output is

$$\hat{v}_l^{j}(x_n(k)) = W_{al}^{jT} \sigma(Z_a(k)) \tag{33}$$

where $Z_a(k) = Y_a^{T} x_n(k) + b_a$. The output error of the action network is defined as

$$\varepsilon_{al}^{j}(k) = \hat{v}_l^{j}(x_n(k)) - v_l(x_n(k)). \tag{34}$$

The weights of the action network are updated to minimize the following performance error measure:

$$E_{al}^{j}(k) = \frac{1}{2} \left(\varepsilon_{al}^{j}(k)\right)^{T} \varepsilon_{al}^{j}(k). \tag{35}$$

The calculation of the weights is similar to that in the critic network. By using the gradient descent rule, we obtain

$$\begin{aligned}
W_{al}^{j+1}(k) &= W_{al}^{j}(k) + \Delta W_{al}^{j}(k) \\
&= W_{al}^{j}(k) - \alpha_a \left[ \frac{\partial E_{al}^{j}(k)}{\partial \varepsilon_{al}^{j}(k)} \frac{\partial \varepsilon_{al}^{j}(k)}{\partial \hat{v}_{l+1}^{j}(k)} \frac{\partial \hat{v}_{l+1}^{j}(k)}{\partial W_{al}^{j}(k)} \right] \\
&= W_{al}^{j}(k) - \alpha_a \sigma(Z_a(k)) \left(\varepsilon_{al}^{j}(k)\right)^{T}
\end{aligned} \tag{36}$$

where $\alpha_a > 0$ is the learning rate of the action network. If the training precision is achieved, then $v_{l+1}(x_n(k))$ can be approximated by the action network. The weight convergence property of the neural networks is shown in the following theorem.

Theorem 1: Let the target performance index function and the target iterative management decision be expressed as

$$V_{l+1}(x_n(k)) = W_{cl}^{*T} \sigma(Z_c(k)) \tag{37}$$

$$v_l(x_n(k)) = W_{al}^{*T} \sigma(Z_a(k)) \tag{38}$$

and let the critic and action networks be trained by (32) and (36), respectively. If the learning rates $\alpha_c$ and $\alpha_a$ are both small enough, the weights of the critic network $W_{cl}(k)$ and of the action network $W_{al}(k)$ will asymptotically converge to the optimal weights $W_{cl}^{*}(k)$ and $W_{al}^{*}(k)$, respectively.

Proof: The proof of Theorem 1 is given in the Appendix.

V. NUMERICAL RESULTS

Fig. 2. Total operation delay in terms of different β by using centralized algorithm.

Fig. 3. Operation delay for delay-sensitive and delay-tolerant demands in terms of different β by using centralized algorithm.

In this section, the RELOAD database [24], which provides hourly load profiles of different practical demands including HVAC, WH, lighting, clothes drying, freezing, etc., is used to model different demands. Two hundred users are selected and each user has three types of appliances: 1) essential appliances: normal lights, cooking machine, freezing, etc.; 2) delay-sensitive appliances: CD and OL; and 3) delay-tolerant appliances: HVAC and WH. The market price for purchasing electricity from the grid is $p(t) = 0.3$ cents during daytime hours, i.e., from 8:00 A.M. to 12:00 A.M. at night, and $p(t) = 0.2$ cents during the night, i.e., from 12:00 A.M. to 8:00 A.M. the next day. Neural networks are used to implement the distributed algorithm. The critic network and the action network are chosen as three-layer neural networks. For each iteration step, the critic network and the action network are trained for 80 steps using a learning rate of $\alpha = 0.02$ [19].
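As a rough illustration of the training configuration just described (80 steps, learning rate 0.02), the critic update (32) can be sketched in NumPy for a single state sample. The layer sizes, the random initialization, the state vector, and the target value below are hypothetical choices made only for this sketch, and the input-to-hidden weights and threshold are held fixed as a simplifying assumption; none of these values are taken from the simulations reported here.

```python
import numpy as np

def critic_output(W, Y, b, x):
    """Critic estimate W^T tanh(Y^T x + b), following the form of (28)-(29)."""
    return W @ np.tanh(Y.T @ x + b)

def train_critic(W, Y, b, x, v_target, alpha=0.02, steps=80):
    """Repeat update (32): W <- W - alpha * eps * sigma(Z_c)."""
    W = W.copy()
    sigma = np.tanh(Y.T @ x + b)      # hidden activations; fixed because Y, b, x are fixed
    errors = []
    for _ in range(steps):
        eps = W @ sigma - v_target    # critic error, as in (30)
        errors.append(abs(eps))
        W -= alpha * eps * sigma      # gradient step on 0.5 * eps**2, as in (31)-(32)
    return W, errors

rng = np.random.default_rng(0)
n_inputs, n_hidden = 3, 8             # hypothetical network sizes
Y = rng.normal(size=(n_inputs, n_hidden))
b = rng.normal(size=n_hidden)
W0 = rng.normal(size=n_hidden)
x = np.array([0.5, -0.2, 1.0])        # hypothetical state vector
v_target = 2.0                        # hypothetical target performance index

W, errors = train_critic(W0, Y, b, x, v_target)
print("initial |error|:", errors[0])
print("final   |error|:", errors[-1])
```

In this single-sample setting each step scales the error by $1 - \alpha \|\sigma\|^2$. With eight tanh units $\|\sigma\|^2 \le 8$, so a learning rate of 0.02 keeps that factor between 0.84 and 1 and the error shrinks monotonically, in line with the small-learning-rate condition of Theorem 1.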
In addition, to provide a reference for the operation delay of the proposed management scheme, a baseline (no-priority) scheme, in which the delay-sensitive and delay-tolerant demands are served in random order without priority, is also evaluated [5].

A. Operation Delay

To illustrate the operation delay, $\log_{10}(\text{actual demand queue})$ is used to measure the operation delay of delay-sensitive demands and delay-tolerant demands. For the proposed scheme, the jumping probability β is set to 0, 0.2, and 0.6, respectively. The two weights for the user's usage preference are both set to 1. The performance of the proposed control scheme under different combinations of the weights is evaluated in Section V-C.

In Fig. 2, we show the total operation delay for different values of β under the centralized algorithm. It is observed that the proposed energy scheduling scheme obtains a lower delay than the no-priority scheduling scheme. This is because the proposed scheme can probabilistically assign low-priority demands to the high-priority queue for early service, which reduces the total delay.

Fig. 3 compares the operation delay of delay-sensitive and delay-tolerant demands under the centralized algorithm. For delay-tolerant demands, the operation delay of the proposed scheme with β = 0 is higher than that of the no-priority scheme. This is because the delay-tolerant demands cannot be served unless the delay-sensitive queue is empty, whereas the no-priority scheme serves both types of demands randomly and can therefore yield a lower delay for delay-tolerant demands. The operation delay of the proposed scheme with β = 0.2 and β = 0.6 is close to or lower than that of the no-priority scheme. This is expected, since more delay-tolerant demands are upgraded to the delay-sensitive queue as the probability β increases. For delay-sensitive demands, the opposite trend is observed. That is, as β increases, more delay-tolerant demands jump to the high-priority queue, which may lead to a higher delay for delay-sensitive demands.

Fig. 4. Operation delay for delay-sensitive and delay-tolerant demands in terms of different β by using distributed algorithm.

Fig. 4 compares the operation delay of delay-sensitive and delay-tolerant demands under the distributed algorithm. It is observed that the operation delay obtained by the distributed algorithm is close to, but higher than, that obtained by the centralized algorithm. The reason is that the distributed scheme needs to estimate the lost information to obtain the energy schedule, which may lead to a suboptimal schedule for each user. In contrast, the centralized scheme is able to achieve optimal management of energy consumption based on knowledge of all information. Still, the trend of the delay for both types of demands with different β holds.

Fig. 5. (a) Original profiles of delay-sensitive and delay-tolerant demands. (b) Queue length of delay-sensitive and delay-tolerant demands when β = 0. (c) Queue length of delay-sensitive and delay-tolerant demands when β = 0.2. (d) Queue length of delay-sensitive and delay-tolerant demands when β = 0.6.

In Fig. 5, we show the original profiles of the delay-sensitive and delay-tolerant demands for 200 users, together with the delay-sensitive and delay-tolerant queues obtained when the proposed scheduling method is applied with different values of β. For the same reason explained for Fig. 3, as β increases, the length of the delay-sensitive queue increases and the length of the delay-tolerant queue decreases. We emphasize that the value of β can significantly influence the performance of our scheduling scheme and should be carefully selected.

B. Energy Cost

Fig. 6. Energy cost for 200 users in terms of different β.

Fig. 6 compares the energy cost of the no-priority scheme and the proposed centralized and distributed algorithms with different β for 200 users. It is observed that the energy costs achieved by the proposed schemes with β = 0 and β = 0.2 are lower than that of the no-priority scheme, which indicates cost savings when the proposed energy management is adopted for appliances. It is also noted that the energy cost with β = 0.6 is higher than that with β = 0 and β = 0.2, and even higher than that of the no-priority scheme. This is because the high-priority (delay-sensitive) queue becomes longer when low-priority (delay-tolerant) demands jump to it with a large β. To serve the enlarged high-priority queue, more energy must be consumed, which incurs a higher energy cost. Moreover, for a reason similar to that explained in Section V-A, the energy cost of the distributed algorithm is close to, but higher than, that of the centralized algorithm.

C. Impact of Control Weight

In this section, the performance of the proposed energy management scheme, which accounts for users' preferences, is evaluated when β = 0.2. The energy cost and operation delay of the proposed management scheme under the centralized algorithm are evaluated for three weight criteria, written as {first weight, second weight}: 1) {100, 1}; 2) {50, 1}; and 3) {1, 1}.

Fig. 7. Energy cost for 200 users in terms of different combinations of the weights.

Fig. 8. Delay for 200 users in terms of different combinations of the weights.

Fig. 7 shows the energy cost for the different weight criteria under the centralized algorithm. Note that, with the second weight fixed at 1, the energy cost decreases as the first weight increases. This indicates that the importance of the energy cost in problem (7) is determined by the first weight. Inspired by this observation, users can increase the first weight if they focus on minimizing the energy cost.
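The cost and delay trade-off governed by the two weights can be illustrated with a deliberately simplified scalar example. This is only a toy sketch and not problem (7): the quadratic objective, the names `w_cost` and `w_delay`, and the price and backlog values are all assumptions introduced for illustration.

```python
def best_energy(w_cost, w_delay, price, backlog):
    """Minimizer of the toy objective
        w_cost * price * e**2 + w_delay * (backlog - e)**2
    over the served energy e, obtained by setting the derivative to zero."""
    return w_delay * backlog / (w_cost * price + w_delay)

price, backlog = 0.3, 10.0            # hypothetical price and queued demand
for w_cost in (1, 50, 100):           # the three weight criteria, with w_delay = 1
    e = best_energy(w_cost, 1, price, backlog)
    cost = price * e**2               # toy energy cost
    unserved = backlog - e            # toy proxy for operation delay
    print(f"w_cost={w_cost:3d}  energy={e:.3f}  cost={cost:.3f}  unserved={unserved:.3f}")
```

In this toy model a larger cost weight shrinks the served energy, so the cost falls while the unserved backlog grows, which is qualitatively the same direction of change as reported for the weight criteria.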
Moreover, the operation delay under the different weight criteria is shown in Fig. 8. It is observed that the operation delay of both delay-sensitive and delay-tolerant demands increases as the first weight increases. This is because a higher delay is incurred when the energy cost is the dominant criterion in the optimization problem (7).

VI. CONCLUSION

In this paper, a demand-queuing-based energy management scheme is presented for the residential smart grid. By allowing delay-tolerant demands to jump to the delay-sensitive demand queue, the central scheduler is able to minimize the energy cost and operation delay for users. A centralized algorithm was proposed for the central scheduler to find the optimal consumed energy for both delay-sensitive and delay-tolerant demands. Moreover, a distributed algorithm was proposed to solve the minimization problem. To deal with communication errors between the users and the central scheduler, the proposed distributed algorithm employs neural networks to estimate the missing messages. Numerical results showed that the proposed energy management scheme is able to balance the tradeoff between the operation delay and energy consumption in residential smart grid networks. The effectiveness of the proposed distributed algorithm with lost messages has also been verified.

APPENDIX

Proof of Theorem 1: Let $\tilde{W}_{cl}^{j}(k) = W_{cl}^{j}(k) - W_{cl}^{*}(k)$ and $\tilde{W}_{al}^{j}(k) = W_{al}^{j}(k) - W_{al}^{*}(k)$. From (32) and (36),

$$\tilde{W}_{cl}^{j+1}(k) = \tilde{W}_{cl}^{j}(k) - \alpha_c \varepsilon_{cl}^{j}(k) \sigma(Z_c(k))$$

$$\tilde{W}_{al}^{j+1}(k) = \tilde{W}_{al}^{j}(k) - \alpha_a \sigma(Z_a(k)) \left(\varepsilon_{al}^{j}(k)\right)^{T}.$$

Consider the following Lyapunov function candidate:

$$L\left(\tilde{W}_{cl}^{j}, \tilde{W}_{al}^{j}\right) = \operatorname{tr}\left\{ \tilde{W}_{cl}^{jT} \tilde{W}_{cl}^{j} + \tilde{W}_{al}^{jT} \tilde{W}_{al}^{j} \right\}. \tag{39}$$

The difference of the Lyapunov function candidate is given by

$$\begin{aligned}
\Delta L\left(\tilde{W}_{cl}^{j}, \tilde{W}_{al}^{j}\right) &= \operatorname{tr}\left\{ \tilde{W}_{cl}^{(j+1)T} \tilde{W}_{cl}^{(j+1)} + \tilde{W}_{al}^{(j+1)T} \tilde{W}_{al}^{(j+1)} \right\} - \operatorname{tr}\left\{ \tilde{W}_{cl}^{jT} \tilde{W}_{cl}^{j} + \tilde{W}_{al}^{jT} \tilde{W}_{al}^{j} \right\} \\
&= \alpha_c \left\| \varepsilon_{cl}^{j}(k) \right\|^{2} \left( -2 + \alpha_c \left\| \sigma(Z_c(k)) \right\|^{2} \right) + \alpha_a \left\| \varepsilon_{al}^{j}(k) \right\|^{2} \left( -2 + \alpha_a \left\| \sigma(Z_a(k)) \right\|^{2} \right).
\end{aligned} \tag{40}$$

According to the definition of $\sigma(\cdot)$ in (28), $\|\sigma(Z_c(k))\|^{2}$ and $\|\sigma(Z_a(k))\|^{2}$ are both finite for all $Z_c(k)$ and $Z_a(k)$. Thus, if $\alpha_c$ and $\alpha_a$ are small enough to satisfy $\alpha_c < 2 / \|\sigma(Z_c(k))\|^{2}$ and $\alpha_a < 2 / \|\sigma(Z_a(k))\|^{2}$, then $\Delta L(\tilde{W}_{cl}^{j}, \tilde{W}_{al}^{j}) < 0$. The proof is completed.

REFERENCES

[1] W. Tushar et al., “Three-party energy management with distributed energy resources in smart grid,” IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2487–2498, Jun. 2014.
[2] H. Liu, H. Ning, Y. Zhang, and M. Guizani, “Battery status-aware authentication scheme for V2G networks in smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 99–110, Mar. 2013.
[3] N. U. Hassan, Y. Khalid, C. Yuen, and W. Tushar, “Customer engagement plans for peak load reduction in residential smart grids,” IEEE Trans. Smart Grid, to be published.
[4] S. Chen, N. B. Shroff, and P. Sinha, “Heterogeneous delay tolerant task scheduling and energy management in the smart grid with renewable energy,” IEEE J. Sel. Areas Commun., vol. 31, no. 7, pp. 1258–1267, Jul. 2013.
[5] Y. Liu et al., “Peak-to-average ratio constrained demand-side management with consumer’s preference in residential smart grid,” IEEE J. Sel. Topics Signal Process., vol. 8, no. 6, pp. 1084–1097, Dec. 2014.
[6] Y. Liu et al., “Electricity cost minimization for a microgrid with distributed energy resource under different information availability,” IEEE Trans. Ind. Electron., vol. 62, no. 4, pp. 2571–2583, Apr. 2015.
[7] I. Koutsopoulos and L. Tassiulas, “Control and optimization meet the smart power grid: Scheduling of power demands for optimal energy management,” in Proc. ACM 2nd Int. Conf. Energy-Efficient Comput. Netw. (e-Energy), New York, NY, USA, 2011, pp. 41–50.
[8] M. Tasdighi, H.
Ghasemi, and A. Rahimi-Kian, “Residential microgrid scheduling based on smart meters data and temperature dependent thermal load modelling,” IEEE Trans. Smart Grid, vol. 5, no. 1, pp. 349–357, Jan. 2014.
[9] A.-H. Mohsenian-Rad and A. Leon-Garcia, “Optimal residential load control with price prediction in real-time electricity pricing environments,” IEEE Trans. Smart Grid, vol. 1, no. 2, pp. 120–133, Sep. 2010.
[10] M. Neely, A. Tehrani, and A. Dimakis, “Efficient algorithms for renewable energy allocation to delay tolerant consumers,” in Proc. 1st IEEE Int. Conf. Smart Grid Commun. (SmartGridComm), Washington, DC, USA, Oct. 2010, pp. 549–554.
[11] Z. Zhu, J. Tang, S. Lambotharan, W. H. Chin, and Z. Fan, “An integer linear programming and game theory based optimization for demand-side management in smart grid,” in Proc. IEEE Globecom Workshops (GC Wkshps), Houston, TX, USA, 2011, pp. 1205–1210.
[12] Y. Guo, M. Pan, Y. Fang, and P. P. Khargonekar, “Decentralized coordination of energy utilization for residential households in the smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 3, pp. 1341–1350, Sep. 2013.
[13] N. Gatsis and G. B. Giannakis, “Residential load control: Distributed scheduling and convergence with lost AMI messages,” IEEE Trans. Smart Grid, vol. 3, no. 2, pp. 770–786, Sep. 2012.
[14] Z. Zhao, W. C. Lee, Y. Shin, and K. B. Song, “An optimal power scheduling method for demand response in home energy management system,” IEEE Trans. Smart Grid, vol. 4, no. 3, pp. 1391–1400, Sep. 2013.
[15] T. Maertens, J. Walraevens, and H. Bruneel, “Controlling delay differentiation with priority jumps: Analytical study,” Numer. Algebra Control Optim., vol. 1, no. 4, pp. 657–673, Dec. 2011.
[16] S. Barmada, A. Musolino, M. Raugi, R. Rizzo, and M. Tucci, “A wavelet based method for the analysis of impulsive noise due to switch commutations in power line communication (PLC) systems,” IEEE Trans. Smart Grid, vol. 2, no. 1, pp. 92–101, Mar. 2011.
[17] R. Deng et al., “Sensing-performance tradeoff in cognitive radio enabled smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 302–310, Mar. 2013.
[18] Y. Zhang et al., “Cognitive machine-to-machine communications: Visions and potentials for the smart grid,” IEEE Netw., vol. 26, no. 3, pp. 6–13, May/Jun. 2012.
[19] D. Liu and Q. Wei, “Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621–634, Mar. 2014.
[20] F.-Y. Wang, N. Jin, D. Liu, and Q. Wei, “Adaptive dynamic programming for finite-horizon optimal control of discrete-time nonlinear systems with ε-error bound,” IEEE Trans. Neural Netw., vol. 22, no. 1, pp. 24–36, Jan. 2011.
[21] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, U.K.: Cambridge Univ. Press, 2004.
[22] K. Schittkowski, “NLPQL: A FORTRAN-subroutine solving constrained nonlinear programming problems,” Ann. Oper. Res., vol. 5, pp. 485–500, 1985.
[23] D. P. Bertsekas, Nonlinear Programming, 2nd ed. Belmont, MA, USA: Athena Scientific, 2008.
[24] N. Hassan, M. A. Pasha, C. Yuen, S. Huang, and X. Wang, “Impact of scheduling flexibility on demand profile flatness and user inconvenience in residential smart grid,” Energies, vol. 6, no. 12, pp. 6608–6635, Dec. 2013.

Yi Liu received the Ph.D. degree in signal and information processing from the South China University of Technology, Guangzhou, China, in 2011. He joined the Singapore University of Technology and Design, Singapore, as a Postdoctorate. In 2014, he joined the Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou, where he is an Assistant Professor with the School of Automation.
His current research interests include cognitive radio networks, cooperative communications, smart grid, and intelligent signal processing.

Chau Yuen (SM’13) received the B.Eng. degree and the Ph.D. degree in electrical and electronic engineering from Nanyang Technological University, Singapore, in 2000 and 2004, respectively. He was a Postdoctoral Fellow with Lucent Technologies Bell Laboratories, Murray Hill, NY, USA, in 2005, and a Visiting Assistant Professor with Hong Kong Polytechnic University, Hong Kong, in 2008. From 2006 to 2010, he was with the Institute for Infocomm Research, Singapore, as a Senior Research Engineer. He joined the Singapore University of Technology and Design, Singapore, as an Assistant Professor in 2010. His current research interests include green communications, massive multiple input multiple output, Internet-of-things, machine-to-machine, network coding, and distributed storage. He has published over 150 research papers in international journals or conferences. Dr. Yuen serves as an Associate Editor for the IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY.

Rong Yu (S’05–M’08) received the Ph.D. degree in information and communication engineering from Tsinghua University, Beijing, China, in 2007. He was with the School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China. In 2010, he joined the Institute of Intelligent Information Processing, Guangdong University of Technology, where he is currently a Full Professor. His current research interests include wireless communications and networking, such as cognitive radio, wireless sensor networks, and home networking. He is the co-inventor of ten patents, and has authored/co-authored over 70 international journal and conference papers.

Yan Zhang (SM’10) received the Ph.D. degree in electrical and electronic engineering from Nanyang Technological University, Singapore.
He is with Simula Research Laboratory, Oslo, Norway, and an Adjunct Associate Professor with the University of Oslo, Oslo. His current research interests include resource, mobility, spectrum, energy, and data management in wireless communications and networking. Dr. Zhang serves as an Organizing Committee Chair for many international conferences. He is an Associate Editor/Guest Editor for a number of international journals.

Shengli Xie (M’01–SM’02) received the M.S. degree in mathematics from Central China Normal University, Wuhan, China, in 1992, and the Ph.D. degree in automatic control from the South China University of Technology, Guangzhou, China, in 1997. He was the Vice Dean with the School of Electronics and Information Engineering, South China University of Technology, from 2006 to 2010. He is currently the Director with the Institute of Intelligent Information Processing, Beijing, China, and the Guangdong Key Laboratory of Information Technology, Guangzhou, for the Internet-of-things, and a Professor with the School of Automation, Guangdong University of Technology, Guangzhou. His current research interests include statistical signal processing and wireless communications, with an emphasis on blind signal processing and Internet-of-things. He has authored/co-authored four monographs and over 100 scientific papers published in journals and conference proceedings, and holds 30 patents.