Data-driven Game-based Pricing for Sharing Rooftop Photovoltaic Generation and Energy Storage in the Residential Building Cluster under Uncertainties

Xu Xu, Member, IEEE, Yan Xu, Senior Member, IEEE, Ming-Hao Wang, Member, IEEE, Jiayong Li, Member, IEEE, Zhao Xu, Senior Member, IEEE, Songjian Chai, Yufei He, Student Member, IEEE

Abstract—In this paper, a novel machine learning based data-driven pricing method is proposed for sharing rooftop photovoltaic (PV) generation and energy storage (ES) in an electrically interconnected residential building cluster (RBC). The energy sharing process is modeled as a leader-followers Stackelberg game in which the owner of the rooftop PV system prices the self-generated PV energy and operates the ES devices, while the local electricity consumers in the RBC choose their energy consumption under the given internal electricity prices. To track the stochastic rooftop PV panel output, a long short-term memory (LSTM) network based rolling-horizon prediction function is developed to dynamically predict future trends of PV generation. Together with the system information, the predicted information is fed into a Q-learning based decision-making process to find near-optimal pricing strategies. Simulation results verify the effectiveness of the proposed approach in solving energy sharing problems with partial or uncertain information.

Index Terms—Pricing method, photovoltaic generation, energy storage, residential building cluster, energy sharing, Stackelberg game, long short-term memory network, Q-learning algorithm

This work is partially supported by the National Natural Science Foundation of China (Grant No. 71971183). The work of J. Li is supported by the National Natural Science Foundation of China (Grant No. 51907056). Y. Xu's work is supported by the Nanyang Assistant Professorship from Nanyang Technological University, Singapore. (Corresponding authors: Zhao Xu and Jiayong Li.)
X. Xu is with the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong Special Administrative Region, China (email: benxx.xu@connect.polyu.hk).
Y. Xu is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore (email: xuyan@ntu.edu.sg).
M.-H. Wang is with the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong Special Administrative Region, China (email: minghao.wang@polyu.edu.hk).
Z. Xu is with both the Shenzhen Research Institute and the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong Special Administrative Region, China (email: eezhaoxu@polyu.edu.hk).
J. Li is with the College of Electrical and Information Engineering, Hunan University, Changsha, China (email: j-y.li@connect.polyu.hk).
S. J. Chai and Y. F. He are both with the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong Special Administrative Region, China (emails: chaisongjian@gmail.com; daniel.v.he@connect.polyu.hk).

I. INTRODUCTION
In recent years, rooftop photovoltaic (PV) systems have been widely deployed in residential buildings [1], providing a clean energy supply during the daytime. However, for a residential building cluster (RBC) comprising electrically interconnected buildings (see Fig. 1), the management of PV energy sharing is a critical concern. The concept of energy sharing has been widely used in power systems and is well studied in existing papers such as Refs. [2]-[5]. Many research efforts have also been devoted to energy sharing management among end-users. Conventional energy sharing methods are based on optimization algorithms. In Ref. [6], based on Lyapunov optimization, an online energy sharing framework is presented to enhance the self-sufficiency and PV consumption of nano-grid clusters. Ref. [7] proposes a two-stage robust energy sharing approach for a prosumer microgrid with renewable energy integration, storage units, and load shifting. Ref. [8] employs a heuristic algorithm to establish a day-ahead energy management method integrating home appliance scheduling and energy sharing among smart houses. In Ref. [9], a peer-to-peer energy sharing strategy with distributed transactions is developed for an energy building cluster including different types of energy buildings. Ref. [10] develops a game theory based energy sharing management method for the microgrid as well as a billing mechanism based on PV energy and load consumption. In Ref. [11], an online optimization based algorithm is proposed for cost-aware energy sharing among electricity consumers in a cooperative community. Ref. [12] presents a hybrid energy sharing management framework to facilitate heat and PV energy sharing among smart buildings. However, these existing works have several deficiencies: (i) uncertain renewable generation is not well considered during the energy sharing process; (ii) the multiple electricity consumers living in the RBC have different living behaviors, which makes it difficult to reach an agreement on PV energy allocation; and (iii) conflicts of interest between the rooftop PV system owner and the local electricity consumers need to be addressed properly.
The existing optimization methods can be classified as model-based methods that rely on an accurate mathematical formulation of the energy sharing process. However, the energy sharing problem usually involves unknown or uncertain information in practice, so iterative solution algorithms are generally adopted. This poses two potential challenges: (i) certain assumptions and simplifications are required to ensure the convergence of some iterative algorithms; and (ii) an iterative algorithm may be impractical in the real world due to possible non-convergence issues. By comparison, as a model-free, adaptive, and concise machine
learning technique [13], reinforcement learning exhibits excellent performance in decision-making processes. Reinforcement learning algorithms have been widely employed to model power system operation problems, such as multi-microgrid energy management [14], voltage control [15], electric vehicle charging [16], and dynamic economic dispatch [17]. However, the application of reinforcement learning to energy sharing management is still at an early stage. In this regard, this paper proposes a fully data-driven method based on a deep neural network and a reinforcement learning algorithm for making game-theoretic dynamic pricing strategies to optimally share rooftop PV energy with the electricity consumers in an RBC. The main contributions of this paper can be summarized as follows:
1) The proposed dynamic data-driven game-based pricing decision-making process is described as a Markov Decision Process (MDP), which can be well addressed by the Q-learning algorithm. Compared with conventional optimization methods, the proposed method can be flexibly and easily applied through off-line training and on-line implementation, with no requirement for initial knowledge. Besides, the computation efficiency is substantially improved.
2) The long short-term memory (LSTM) network is duly integrated into the proposed pricing framework to capture future trends of rooftop PV generation with a rolling time window. The predicted information is fed into the reinforcement learning based decision-making process to help the Q-learning agent find near-optimal pricing strategies.
3) To express the preferences of local consumers regarding environmental awareness, the concept of willingness-to-pay (WTP) is introduced. The original complex game-based energy pricing optimization model is thereby transformed into an efficient discriminatory auction, in which near-optimal pricing strategies can be quickly determined by the rooftop PV system owner using the proposed pricing method.
The rest of this paper is organized as follows. Section II models the energy sharing in the RBC. Section III describes the decision-making process of the pricing strategy, including the LSTM network, the MDP formulation, and the Q-learning process. Numerical results are given in Section IV. Finally, we conclude this paper in Section V.

II. PROBLEM MODELING

Fig. 1 depicts the structure of the energy sharing in an RBC. As shown in this figure, the rooftop PV system comprises two kinds of devices, i.e., PV panels and energy storage (ES) devices. The rooftop PV system owner is the energy sharing executor responsible for the interoperability among the various components in Fig. 1. The owner is in charge of providing self-generated PV energy to all electricity consumers in the RBC and of operating the local ES devices. Moreover, the owner is responsible for guaranteeing the maximum utilization of local PV generation within the RBC. It is assumed that smart meters are installed in the RBC to gather system data and to receive instructions or information from the rooftop PV system owner.

Fig. 1. Structure of the energy sharing in an RBC.
A. Profit Model of Rooftop PV System Owner

In this paper, we assume that the rooftop PV system owner is an external company, characterized by a financial objective only, i.e., maximizing the revenue from sharing self-generated PV energy and operating the local ES devices. It is assumed that the ES is charged by rooftop PV generation only, so as to properly track the dispatch of local energy. The investment and operation costs of the rooftop PV system are omitted during the energy sharing process. Since the PV power output varies with solar intensity and ambient temperature [19], maximum power point tracking (MPPT) control [18] is usually applied to the PV panels to maximize the PV generation. The actual values of the PV panel output are $[\tilde{P}_h^{PV}, \tilde{P}_{h+1}^{PV}, \dots, \tilde{P}_H^{PV}]$, where $\mathcal{H} = \{h, h+1, \dots, H\}$ denotes the time slot set. At each hour, the rooftop PV system owner acts as the leader that sets the uniform price for local PV generation, so the hourly profit $Rev_h^O$ of the owner can be defined as follows:

$$Rev_h^O = \sum_{i \in N^C} \lambda_h^U \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) + \lambda^{FiT} \left( P_h^{PVgrid} + P_h^{ESgrid} \right) - \lambda_h^{TOU} \left[ \sum_{i} \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) - P_h^{PV} \right]^+ \quad (1)$$

In Eq. (1), the first term represents the profit of selling PV energy $P_{ih}^{PVuser}$ and electricity in the ES $P_{ih}^{ESuser}$ to the local electricity consumers at a uniform price $\lambda_h^U$, and the second term denotes the profit of selling PV energy $P_h^{PVgrid}$ and electricity in the ES $P_h^{ESgrid}$ to the utility grid at the feed-in tariff rate $\lambda^{FiT}$. $N^C$ is the set of electricity consumers in the RBC. The third term describes the compensation cost for the mismatch between the energy sold to the electricity consumers and the actual PV generation. This mismatch cost is caused by prediction errors, since the rooftop PV system owner makes the pricing strategy based on predicted information, which cannot be perfectly accurate. $[\cdot]^+$ represents the projection operator onto the non-negative orthant, i.e., $[x]^+ = \max(x, 0)$.
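For illustration, the hourly profit in Eq. (1) can be evaluated in a few lines of code. The following minimal Python sketch assumes the per-consumer energy quantities are held in NumPy arrays; the function and variable names are ours, not part of the paper's notation.

```python
import numpy as np

def hourly_revenue(lam_U, lam_FiT, lam_TOU, pv_user, es_user,
                   pv_grid, es_grid, pv_actual):
    """Hourly profit Rev_h^O of the rooftop PV system owner, Eq. (1).

    pv_user, es_user: arrays over consumers i in N^C (energy sold locally);
    pv_grid, es_grid: energy fed into the utility grid;
    pv_actual: realized PV output P_h^PV.
    """
    sold_local = np.sum(pv_user + es_user)        # sum over consumers i
    revenue = lam_U * sold_local                  # internal sales at the uniform price
    revenue += lam_FiT * (pv_grid + es_grid)      # feed-in tariff sales
    shortfall = max(sold_local - pv_actual, 0.0)  # [.]^+ = max(x, 0)
    revenue -= lam_TOU * shortfall                # compensation for prediction mismatch
    return revenue
```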
B. Utility Cost of Electricity Consumers

The electricity consumers in the RBC are followers who decide whether to purchase electricity from the rooftop PV system owner or from the utility grid according to the given price signals. The utility cost $U_{ih}^C$ of electricity consumer $i \in N^C$ is given as follows:

$$U_{ih}^C = \lambda_h^U \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) + \lambda_h^{TOU} P_{ih}^G + w_i^E \lambda^E P_{ih}^G \quad (2)$$

where the first and second terms denote the cost of purchasing electricity from the rooftop PV system owner ($P_{ih}^{PVuser}$, $P_{ih}^{ESuser}$) and from the utility grid ($P_{ih}^G$), respectively. The third term describes the greenhouse gas emission cost with coefficient $\lambda^E$. Specifically, the weight factor $w_i^E \in [0,1]$ is introduced to reflect the environmental awareness of electricity consumer $i$. In practice, $w_i^E$ can be adjusted according to the preferences of the electricity consumers on a case-by-case basis.

Note that the demand $P_{ih}^D$ of electricity consumer $i$ can be satisfied by $P_{ih}^{PVuser}$, $P_{ih}^{ESuser}$, and $P_{ih}^G$, i.e., $P_{ih}^D = P_{ih}^{PVuser} + P_{ih}^{ESuser} + P_{ih}^G$. Hence, the WTP $\lambda_{ih}^{WTP}$ of electricity consumer $i$ for the local energy can be derived by substituting $P_{ih}^D - P_{ih}^{PVuser} - P_{ih}^{ESuser}$ for $P_{ih}^G$ in (2):

$$\lambda_{ih}^{WTP} = \lambda_h^{TOU} + w_i^E \lambda^E \quad (3)$$

C. Stackelberg Game based PV Energy Sharing

In this subsection, a one-leader, N-follower Stackelberg game [20] is employed to formulate the PV energy sharing model. The basic idea of this game is that the leader acts first, and the followers then observe the leader's action and make their own decisions accordingly. Specifically, as the leader in this game, the owner of the rooftop PV system (including the rooftop PV panels and ES devices) sets the internal price $\lambda_h^U$ at which the local PV energy is sold to the local building users. The local PV energy can also be sold to the utility grid at the FiT rate $\lambda^{FiT}$. The goal of the leader is to maximize its daily revenue by pricing and selling the local PV energy. Meanwhile, the building users act as the followers in this game: they choose to buy the local PV energy at the internal price $\lambda_h^U$ and/or electricity from the utility grid at the TOU price $\lambda_h^{TOU}$. The goal of the followers is to minimize their daily electricity bills by choosing their energy consumption, i.e., from the local provider and/or the utility grid. In this regard, the Stackelberg game $G$ for this problem can be described as follows:

$$G = \left\{ (Owner \cup N^C);\ \{\lambda_h^U\}, \{P_h^{ESgrid}\}, \{P_h^{ESin}\};\ \{P_{ih}^{PVuser}\}, \{P_{ih}^{ESuser}\}, \{P_{ih}^G\};\ \{Rev_h^O\}, \{U_{ih}^C\} \right\}$$

where $(Owner \cup N^C)$ denotes the player set, in which the rooftop PV system owner acts as the game leader and the building consumers take the roles of game followers in response to the strategy of the leader; $\{\lambda_h^U\}$, $\{P_h^{ESgrid}\}$, and $\{P_h^{ESin}\}$ are the strategy sets of the game leader; $\{P_{ih}^{PVuser}\}$, $\{P_{ih}^{ESuser}\}$, and $\{P_{ih}^G\}$ are the strategy sets of the game followers; and $\{Rev_h^O\}$ and $\{U_{ih}^C\}$ are the profit (1) of the leader and the utility cost (2) of the followers, respectively. Thus, the bi-level energy sharing model is formulated as:

$$\max_{\{\lambda_h^U,\, P_h^{ESgrid},\, P_h^{ESin},\, P_{ih}^{PVuser},\, P_{ih}^{ESuser},\, P_{ih}^G\}} Rev^O = \sum_{h \in \mathcal{H}} \left\{ \sum_{i \in N^C} \lambda_h^U \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) + \lambda^{FiT} \left( P_h^{PVgrid} + P_h^{ESgrid} \right) - \lambda_h^{TOU} \left[ \sum_i \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) - P_h^{PV} \right]^+ \right\} \quad (4)$$

s.t.

$$P_h^{PV} = \sum_{i \in N^C} P_{ih}^{PVuser} + P_h^{PVgrid} + P_h^{ESin} \quad (5)$$

$$P_h^{ESsoc} = P_{h-1}^{ESsoc} + \eta^{ESin} P_h^{ESin} - \eta^{ESout} \left( \sum_{i \in N^C} P_{ih}^{ESuser} + P_h^{ESgrid} \right) \quad (6)$$

$$P_{h=h_1}^{ESsoc} = P^{ESinit} \quad (7)$$

$$\sum_{i \in N^C} P_{ih}^{ESuser} + P_h^{ESgrid} \le \bar{P}^{ES} \quad (8)$$

$$0 \le P_h^{ESin} \le \bar{P}^{ES} \quad (9)$$

$$0 \le P_h^{ESsoc} \le \bar{P}^{ESsoc} \quad (10)$$

$$\lambda_h^U,\ P_h^{PVgrid},\ P_h^{ESgrid},\ P_{ih}^{PVuser},\ P_{ih}^{ESuser} \in \mathbb{R}^+ \quad (11)$$

$$\max_{\{P_{ih}^{PVuser},\, P_{ih}^{ESuser},\, P_{ih}^G\}} \; -U_i^C = -\sum_{h \in \mathcal{H}} \left\{ \lambda_h^U \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) + \lambda_{ih}^{WTP} P_{ih}^G \right\} \quad (12)$$

s.t.

$$P_{ih}^{PVuser} + P_{ih}^{ESuser} + P_{ih}^G = P_{ih}^D \;:\; \mu_{ih}^D \quad (13)$$

$$P_{ih}^{PVuser},\ P_{ih}^{ESuser},\ P_{ih}^G \in \mathbb{R}^+ \;:\; \mu_{ih}^{PVuser},\ \mu_{ih}^{ESuser},\ \mu_{ih}^G \quad (14)$$

where the upper-level model (4)-(11) maximizes the profit of the PV system owner in the RBC; the objective function (4) is the daily revenue of the PV system owner.
Eq. (5) describes the dispatch of the PV energy, which can be sold to the local electricity consumers ($P_{ih}^{PVuser}$), fed into the utility grid ($P_h^{PVgrid}$), or stored in the ES ($P_h^{ESin}$). Constraints (6)-(10) give the operating limits of the ES devices, and (11) ensures that the upper-level variables are non-negative. The lower-level model (12)-(14) minimizes the electricity cost of the local electricity consumers: (13) balances the supply and demand of each electricity consumer, with dual variable $\mu_{ih}^D$, and (14) imposes non-negativity on the lower-level variables, with dual variables $\mu_{ih}^{PVuser}$, $\mu_{ih}^{ESuser}$, and $\mu_{ih}^G$.
The difficulty in solving the proposed bi-level energy sharing problem (4)-(14) is that it is a nonlinear and nonconvex problem. Conventionally, the Karush-Kuhn-Tucker (KKT) conditions can be employed to transform the original model into a Mathematical Program with Equilibrium Constraints (MPEC) model (see Appendix A, Appendix B, and Appendix C), which can be directly solved by commercial solvers, e.g., CPLEX [21] and Gurobi [22]. However, a large number of mixed-integer variables are involved in the KKT conditions, resulting in a heavy computation burden. Moreover, conventional optimization methods are not practical, since they rely on the assumption of a perfect prediction of the PV panel output. The optimization based pricing strategy is also not sufficiently reasonable, since the rooftop PV system owner then focuses only on the current profit and overlooks future rewards. In the following section, we therefore propose a novel pricing method based on a dynamic uncertainty prediction model and a model-free reinforcement learning method, which can easily find near-optimal pricing strategies.

III. PROPOSED DATA-DRIVEN PRICING STRATEGY

A. Mapping the Energy Sharing Model to a Discriminatory Auction

As studied in Ref. [10], the Stackelberg Equilibrium (SE) of a Stackelberg game is reached as long as all participants obtain their optimal solutions. Thus, our proposed bi-level energy sharing framework reaches the SE once the PV system owner (leader) finds the optimal pricing strategy for selling the self-generated PV energy and, meanwhile, all local consumers (followers) determine their electricity consumption, i.e., from the rooftop PV system and the utility grid. It is assumed that the load information of the electricity consumers in the RBC can be utilized by the PV system owner, since advanced non-intrusive load monitoring devices [23] can be installed in the residential buildings for long-term observation. To maximize the profit of the PV system owner, the self-generated PV energy is dispatched in descending order of the WTP values. Therefore, the optimal uniform price of PV energy equals one of the WTP values offered by the consumers; in other words, the uniform price (WTP) that brings the highest revenue to the PV system owner is returned, as the single-hour sketch below illustrates.
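A minimal Python sketch of this price search is given below for one hour, assuming known demands and WTP values; the names are illustrative, and the sketch ignores the inter-hour coupling introduced by the ES.

```python
def best_uniform_price(wtp, demand, pv_available):
    """Pick the uniform price among the consumers' WTP values that maximizes
    the owner's revenue for a single hour (Section III-A).

    wtp[i]: willingness-to-pay of consumer i, Eq. (3);
    demand[i]: demand of consumer i; pv_available: local energy on offer.
    """
    best_price, best_revenue = None, float("-inf")
    for price in sorted(set(wtp), reverse=True):   # candidate uniform prices
        supply, revenue = pv_available, 0.0
        # Consumers with WTP >= price buy locally, served in descending WTP order
        for i in sorted(range(len(wtp)), key=lambda j: -wtp[j]):
            if wtp[i] < price or supply <= 0:
                continue
            q = min(demand[i], supply)
            revenue += price * q
            supply -= q
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price, best_revenue
```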
In this way, the original complex bi-level PV energy sharing problem is recast as an efficient discriminatory auction for the local energy (i.e., rooftop PV generation and the electricity stored in the ES). According to Eq. (3), both the electricity price $\lambda_h^{TOU}$ and the greenhouse gas emission cost coefficient $\lambda^E$ are known, so the WTP value is mainly determined by the weight factor $w_i^E$; the weight factor $w_i^E$ is therefore the quantity to be selected when making the pricing strategy.

B. LSTM Network for Dynamic PV Generation Prediction

Considering the uncertain PV generation, a prediction function should be added to the rooftop PV system to facilitate the pricing decision-making process. In this subsection, an LSTM-based sequence-to-sequence model is formulated to predict the future rooftop PV panel output. This model includes three parts, the encoder, the encoder vector, and the decoder, and maps an input sequence to an output sequence whose lengths may differ. The LSTM network is a variant of the standard recurrent neural network (RNN) [24]. By substituting LSTM units for the basic hidden neurons in the RNN, the LSTM network can deal with the gradient vanishing and explosion issues of long-term dependencies [25]. As shown in Fig. 2, the LSTM unit includes three kinds of gate controllers, i.e., the input gate, the forget gate, and the output gate, which mainly determine what information should be remembered. These three gates are calculated as follows:

$$i_t = \sigma \left( W_{ix} x_t + W_{ih} h_{t-1} + b_i \right) \quad (15)$$

$$f_t = \sigma \left( W_{fx} x_t + W_{fh} h_{t-1} + b_f \right) \quad (16)$$

$$o_t = \sigma \left( W_{ox} x_t + W_{oh} h_{t-1} + b_o \right) \quad (17)$$

where $\sigma$ represents the sigmoid function, whose output lies in $[0,1]$ and describes how much information is let through. $W_{ix}$, $W_{ih}$, $W_{fx}$, $W_{fh}$, $W_{ox}$, and $W_{oh}$ denote the weight matrices of the input, forget, and output gates, and $b_i$, $b_f$, and $b_o$ are the corresponding bias vectors. It should be noted that temporal memory is implemented in the LSTM network by switching the different gates to prevent gradient vanishing. The external inputs of the LSTM unit are the previous cell state $C_{t-1}$, the previous hidden state $h_{t-1}$, and the current input vector $x_t$. An intermediate state $\tilde{C}_t$ is then generated as

$$\tilde{C}_t = \tanh \left( W_{cx} x_t + W_{ch} h_{t-1} + b_c \right) \quad (18)$$

Accordingly, the memory cell and the hidden state of the LSTM are updated as

$$C_t = f_t \otimes C_{t-1} + i_t \otimes \tilde{C}_t \quad (19)$$

$$h_t = o_t \otimes \tanh(C_t) \quad (20)$$

where $\tanh$ is the nonlinear activation function and the operator $\otimes$ denotes the pointwise multiplication of two vectors. In this work, historical PV generation data are collected and fed into the proposed encoder-decoder sequence-to-sequence model, where the LSTM network is used as the training algorithm. As the output of this prediction model, $y_t, y_{t+1}, \dots, y_{t+12}$ denotes the predicted future 12-hour PV generation. This predicted information is fed into the Q-learning process to make pricing strategies in a rolling-window manner.
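The gate and state updates (15)-(20) amount to a few matrix operations per time step. A minimal NumPy sketch of one LSTM unit update follows; the weight-dictionary layout is our own convention, and the parameters are assumed to be already trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM unit update following Eqs. (15)-(20).

    W and b hold the gate parameters, e.g. W['ix'], W['ih'], b['i'] for the
    input gate; all shapes are illustrative.
    """
    i_t = sigmoid(W['ix'] @ x_t + W['ih'] @ h_prev + b['i'])      # input gate, Eq. (15)
    f_t = sigmoid(W['fx'] @ x_t + W['fh'] @ h_prev + b['f'])      # forget gate, Eq. (16)
    o_t = sigmoid(W['ox'] @ x_t + W['oh'] @ h_prev + b['o'])      # output gate, Eq. (17)
    c_tilde = np.tanh(W['cx'] @ x_t + W['ch'] @ h_prev + b['c'])  # intermediate state, Eq. (18)
    c_t = f_t * c_prev + i_t * c_tilde                            # memory cell, Eq. (19)
    h_t = o_t * np.tanh(c_t)                                      # hidden state, Eq. (20)
    return h_t, c_t
```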
C. MDP Formulation

As described in Section III-A, the original complex bi-level PV energy sharing problem is formulated as an efficient discriminatory auction for rooftop PV energy, where the weight factor $w_i^E$ needs to be selected to make the pricing strategies. This pricing problem can be formulated as a finite MDP [26], in which the outcomes are partly controlled by the decision-maker (the rooftop PV system owner) and partly random. Under the Q-learning framework [27], the MDP is formulated as follows.

Fig. 2. Schematic of the proposed LSTM network and Q-learning based data-driven pricing method.

1) State Set $S_h$: The state $s_h \in S$ at hour $h$ includes the following information: the current TOU electricity price $\lambda_h^{TOU}$, the feed-in tariff rate $\lambda^{FiT}$, the current state of charge of the ES $P_h^{ESsoc}$, and the predicted future trend of the rooftop PV panel output $[P_h^{PV}, P_{h+1}^{PV}, \dots, P_H^{PV}]$.
2) Action Set $A$: As described in Section III-A, the action $a_h \in A$ for the current state $s_h$ represents the weight factor $w_i^E$.
3) Reward $r_h$: In this paper, the reward $r_h$ is the cumulative profit of the rooftop PV system owner from participating in the energy sharing from $h$ to $H$, as described by Eq. (4).
4) Action-value Function $Q^\pi(s, a)$: The cumulative reward is used as the action-value function to evaluate the quality of state-action pairs:

$$Q^\pi(s, a) = \mathbb{E}_\pi \left[ \sum_{k=0}^{K} \gamma^k \, r_{h+k+1} \,\middle|\, s_h = s,\ a_h = a \right] \quad (21)$$

where $k \in \{0, 1, \dots, K\}$ denotes the time step and $\pi$ represents the policy, which maps a state to an action. Note that $\gamma \in [0,1]$ is the discount rate indicating the relative importance of future rewards against the current reward. The primary goal of the proposed pricing problem is to maximize the action-value function by finding the optimal policy $\pi^*$, i.e., a sequence of optimal actions (weight factors $w_i^E$):

$$Q^*(s, a) = \max_\pi Q^\pi(s, a) \quad (22)$$

The Q-learning algorithm iteratively updates the action-value function via the Bellman equation [28]:

$$Q^{\pi^*}(s_h, a_h) = r(s_h, a_h) + \gamma \max_{a_{h+1}} Q(s_{h+1}, a_{h+1}) \quad (23)$$

The Q-value is then updated as

$$Q(s_h, a_h) \leftarrow (1 - \theta)\, Q(s_h, a_h) + \theta\, Q^{\pi^*}(s_h, a_h) \quad (24)$$

where $\theta \in [0,1]$ denotes the learning rate, indicating to what extent the new Q-value overrides the old one.

TABLE I
STATE SET, ACTION SET, AND REWARD FUNCTION FOR EACH HOUR

State set $S_h$:        $\{\lambda^{FiT},\ \lambda_h^{TOU},\ [P_h^{PV}, P_{h+1}^{PV}, \dots, P_H^{PV}],\ P_h^{ESsoc}\}$
Action set $A$:         $\{w_1^E, w_2^E, \dots, w_{N_E}^E\}$
Reward function $r_h$:  Eq. (4)
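Eqs. (23) and (24) together define a single tabular update. A minimal Python sketch, with the Q-function held in a dictionary (the names are illustrative):

```python
def q_update(Q, s, a, reward, s_next, actions, gamma=0.9, theta=0.1):
    """One tabular Q-learning update combining Eqs. (23) and (24).

    Q: dict mapping (state, action) -> value; actions: action set A
    (candidate weight factors w^E); gamma: discount rate; theta: learning rate.
    """
    # Bellman target, Eq. (23): current reward plus the discounted best next value
    target = reward + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
    # Soft update, Eq. (24): blend the old estimate with the new target
    Q[(s, a)] = (1 - theta) * Q.get((s, a), 0.0) + theta * target
```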
D. Q-learning Algorithm based Solution Method

Algorithm 1: Proposed Dynamic Pricing Method
1. Repeat for each hour $h$:
   PV panel output prediction
2.   Collect the PV panel output data
3.   Feed the collected data into the trained LSTM network to predict the future trend of the PV panel output
   Q-learning algorithm based decision-making process
4.   Input the action set $A$
5.   Initialize the state set $S_h$
6.   Initialize the Q-value $Q(s_h, a_h)$ arbitrarily
7.   Repeat for each episode:
8.     Repeat for each state $s_h$:
9.       Update the state set $S_h \leftarrow \{\lambda^{FiT}, \lambda_h^{TOU}, [P_h^{PV}, P_{h+1}^{PV}, \dots, P_H^{PV}], P_h^{ESsoc}\}$
10.      Choose an action $a_h$ from the current action set $A$
11.      Calculate the current reward $r_h(s_h, a_h)$
12.      Update the Q-value $Q(s_h, a_h)$
13.    Until $s_{h+1} = s_H$
14.  Until the maximum episode is reached
15.  Output the optimal policy $\pi^* = \arg\max_\pi Q$, i.e., $\{a_h^*, a_{h+1}^*, \dots, a_H^*\}$
16.  Execute the optimal action $a_h^*$ for the current hour $h$
17. Until $h = H$

Algorithm 1 describes the implementation of the proposed Q-learning based solution method for the formulated MDP pricing problem. In each hour, the proposed LSTM network based PV generation prediction function runs to output the future PV generation. These predicted values are then fed into the Q-learning process for making the optimal pricing strategy. Specifically, in each episode, an action is selected for the current state following the $\epsilon$-greedy policy ($\epsilon \in [0,1]$) [29]: the Q-learning agent either executes a random action from the set of available actions, with probability $\epsilon$, or selects an action whose current Q-value is maximal, with probability $1 - \epsilon$. After choosing an action, the current reward is calculated via Eq. (4) and the Q-value is updated via Eq. (24). At the end of each episode, the termination criterion is checked; if it is not satisfied, the agent moves to the next episode and repeats the above process. Finally, the agent obtains optimal actions for each coming hour. Note that only the optimal action for the current hour is executed, since the pricing strategy is updated every hour. The above procedure is repeated until the end hour, i.e., $h = H$. The flowchart of the proposed Q-learning algorithm based decision-making process is depicted in Fig. 3, and a compact sketch of the training loop follows below.

Fig. 3. The proposed Q-learning algorithm based decision-making process.
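The following Python sketch compacts the off-line training loop of Algorithm 1, reusing q_update from the sketch in Section III-C. The object `env` stands in for an assumed simulator of the hourly auction that evaluates the reward via Eq. (4); its reset()/step() interface is illustrative, not part of the paper.

```python
import random

def choose_action(Q, s, actions, eps=0.1):
    """epsilon-greedy policy used in Algorithm 1 [29]."""
    if random.random() < eps:                  # explore with probability eps
        return random.choice(actions)
    # exploit: pick the action with the highest current Q-value
    return max(actions, key=lambda a: Q.get((s, a), 0.0))

def train(env, actions, episodes=50000, gamma=0.9, theta=0.1, eps=0.1):
    """Off-line training loop sketched after Algorithm 1 (steps 7-14)."""
    Q = {}
    for _ in range(episodes):
        s = env.reset()                        # initial state set S_h
        done = False
        while not done:                        # repeat for each state s_h
            a = choose_action(Q, s, actions, eps)
            s_next, reward, done = env.step(a) # reward r_h evaluated via Eq. (4)
            q_update(Q, s, a, reward, s_next, actions, gamma, theta)
            s = s_next
    return Q
```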
IV. NUMERICAL RESULTS

A. Test Case Setup

Fig. 4. TOU prices in the summer of 2019. (Source: Alectra Utilities)

In this paper, we consider a test case with six apartment buildings, each containing 60 households. It is assumed that each apartment building has a 100 m² roof area and that the installed rooftop PV capacity is 16.6 kWp. The capacity of the ES devices within the rooftop PV system is 10 kVA. The TOU price data are collected from Alectra Utilities (see Fig. 4). The daily individual load data published by the National Renewable Energy Laboratory (NREL) [30] are used in the case study. It should be noted that, for simplicity, we randomly select 360 fractions in the range $[0,1]$ to represent the weight factors describing the environmental awareness of all electricity consumers in the RBC: a value of zero means weak environmental awareness, while a value of one means strong environmental awareness. In real-world scenarios, these weight factors can be obtained in several ways, such as a questionnaire survey or non-intrusive long-term observation of individual load consumption. For the Q-learning based decision-making process, the discount rate $\gamma$ is set to 0.9, so that the obtained pricing strategy is foresighted enough to avoid future risks. All simulations are implemented in MATLAB on an Intel Core i7 at 2.4 GHz with 8 GB of memory.

B. Performance of the LSTM Network based Prediction Function

TABLE II
SUMMARY OF TRAINING SETTINGS OF THE LSTM NETWORK

Network   Hyperparameter            Value/Function
Encoder   Encoder length            36
          Layers                    1
          Hidden states             200
          Kernel regularizer        0.001
          Activation function       ReLU [32]
Decoder   Decoder length            12
          Layers                    1
          Hidden states             200
          Activation function       ReLU
          Kernel regularizer        0.001
Others    MLP layers                1
          MLP activation function   Tanh [33]
          Epochs                    100
          Batch size                64
          Loss function             Mean squared error [34]
          Optimizer                 Nadam [35]

The PV dataset used for network training is collected from the Global Energy Forecasting Competition 2014 [31], which is publicly accessible online. The dataset covers 12 numerical weather prediction (NWP) variables and the hourly PV power output measured from 1 April 2012 to 1 July 2014 at three neighboring PV plants in Australia. In this case study, only the PV power output observed at site 1 is used for model construction; the integration of NWP information and the neighboring measurements is beyond the scope of this work, since the historical samples suffice to establish the forecasting model on a rolling basis. Before learning, the night-time measurements (7:00 pm - 7:00 am) are removed; the data from 1 April 2012 to 1 April 2014 are used for model training, and the rest is used for prediction. The settings of the adopted encoder-decoder LSTM network are listed in Table II (a plausible code realization is sketched at the end of this subsection). The predictive skill of the well-trained LSTM network for different look-ahead horizons is shown in Fig. 5. Based on this predicted information, a sequence of optimal actions can be selected by the Q-learning agent, considering the trade-off between the current reward and future rewards through the discount factor. However, only the action for the current hour is executed, so a relatively large prediction error has only a minor effect on our results.
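Since Ref. [35] covers Keras, one plausible Keras realization of the Table II settings is sketched below. The exact layer wiring is our reading of the encoder-decoder description rather than the authors' released code, and the tanh output head assumes the PV targets are normalized to the activation range.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_seq2seq(enc_len=36, dec_len=12, hidden=200):
    # Encoder: 36 past hourly PV readings -> fixed-length encoder vector (Table II)
    inp = keras.Input(shape=(enc_len, 1))
    enc = layers.LSTM(hidden, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-3))(inp)
    # Encoder vector repeated once per look-ahead hour of the decoder
    dec_in = layers.RepeatVector(dec_len)(enc)
    dec = layers.LSTM(hidden, activation="relu",
                      kernel_regularizer=regularizers.l2(1e-3),
                      return_sequences=True)(dec_in)
    # One-layer MLP head with tanh; one output per predicted hour (normalized PV)
    out = layers.TimeDistributed(layers.Dense(1, activation="tanh"))(dec)
    model = keras.Model(inp, out)
    model.compile(optimizer="nadam", loss="mse")  # Nadam [35], MSE loss [34]
    return model

# Training with the Table II settings would then be:
# model.fit(X_train, y_train, epochs=100, batch_size=64)
```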
It should be noted that taking the NWP information into account and making dynamic intraday adjustments to the day-ahead forecast could further improve the forecasting accuracy, which would in turn benefit the reinforcement learning based decision-making process in finding near-optimal solutions; this will be investigated in our future work.

Fig. 5. Prediction performance with different time steps.

C. Performance of the Q-learning Decision-making Process

Fig. 6. Q-learning process during 5×10⁴ episodes.

The Q-learning process over 50,000 episodes is shown in Fig. 6. During the first 10,000 episodes, the Q-value increases rapidly, as the Q-learning agent learns from trial and error after each episode at the initial learning stage. The increment of the Q-value then becomes small, and the Q-value finally stabilizes after sufficient training. Therefore, a near-optimal pricing strategy is obtained after approximately 30,000 episodes.

Fig. 7. Optimal internal uniform price for each hour.

The optimal action can be selected by using the Q-learning algorithm, as seen in Fig. 7, which depicts the near-optimal internal uniform prices as well as the TOU prices during the daytime. During the off-peak time slots, e.g., 7:00-10:00 and 19:00-20:00, the obtained internal uniform prices are higher than the TOU prices. The reason is that the rooftop PV generation is low in these time slots, so the PV system owner adopts a premium pricing strategy to maximize its profit. On the contrary, during the on-peak time slots, i.e., 11:00-18:00, the rooftop PV generation is high due to strong solar irradiance. The PV system owner therefore charges lower prices so that the self-generated PV electricity can be sold to more building users, leading to near-maximum revenue for the rooftop PV system owner.

Fig. 8. Number of building users involved in the energy sharing process in each hour.

Fig. 8 shows the number of building users who succeed in the bidding for local PV generation in each hour. Few building users can be supplied with PV energy during the off-peak hours, while more building users can utilize local PV electricity during the on-peak hours.

Fig. 9. Dispatch of the ES in each hour.

Fig. 9 illustrates the dispatch of the ES in each hour. As shown in this figure, the PV system owner tends to sell the self-generated PV energy to the building users or store it in the ES devices, rather than sell it to the utility grid, so as to maximize its economic benefit; the PV energy is thus stored in the ES during the daytime and dispatched to the local consumers at night.
Hence, our proposed energy sharing model and pricing strategy can facilitate the utilization of local PV generation, reducing the negative effects caused by intermittent PV energy integration.
In this subsection, a comparative case study is also conducted to demonstrate the effectiveness of the pricing strategy obtained from the proposed energy sharing model. Three different pricing strategies are considered:
(i) Strategy 1 (proposed internal uniform price): This pricing strategy is obtained by solving our formulated leader-follower energy sharing model (4)-(14).
(ii) Strategy 2 (TOU price): The price of rooftop PV electricity applied to the local consumers equals the TOU price. From the perspective of the consumers, the choice of energy supplier (the rooftop PV system owner or the utility grid) then results in the same electricity bill.
(iii) Strategy 3 (market clearing price): The price of rooftop PV electricity applied to the local consumers equals the market clearing price. Under this pricing strategy, the rooftop PV system owner earns the same income whether it sells the self-generated PV energy to the local consumers or to the utility grid.

TABLE III
DAILY PROFIT WITH DIFFERENT PRICING STRATEGIES

Pricing strategy                        Daily profit ($)
Strategy 1: Internal uniform price      41.99
Strategy 2: TOU price                   39.71
Strategy 3: Market clearing price       24.47

Fig. 10. Comparison of hourly revenue with and without reinforcement learning based on the same prediction information.

Fig. 10 depicts the hourly profit under these three pricing strategies based on the same prediction information provided by our proposed LSTM model. As seen in this figure, the profit under Strategy 1 (internal uniform price) is always higher than that under the other two strategies; accordingly, this pricing strategy leads to the highest daily profit (see Table III). Under Strategy 2, only a few consumers with strong environmental awareness (high WTP) purchase PV energy from the rooftop PV system owner, since the PV output is time-varying and hence not as stable as the electricity from the utility grid. Under Strategy 3, although the local consumers are more likely to buy the rooftop PV generation due to the relatively low price, the limited PV panel output cannot bring the rooftop PV system owner a high profit. Therefore, our proposed pricing model can duly address the conflict of interest between the rooftop PV system owner and the local consumers by subtly utilizing the environmental awareness of the consumers.

D. Comparison with a Conventional Optimization Method

Fig. 11. Convergence performance of the MILP based optimization method and the proposed Q-learning algorithm based RL method.

TABLE IV
COMPUTATION PERFORMANCE OF THE MILP BASED OPTIMIZATION METHOD AND THE Q-LEARNING ALGORITHM BASED RL METHOD

Solution method                     Profit ($)   Computation time (s)
Conventional optimization method    43.075       3400.42
Q-learning algorithm                41.994       15.339

Fig. 11 compares the revenues obtained by the MILP based optimization method (solved by CPLEX [21]) and our proposed Q-learning algorithm based RL method. As seen in this figure, the proposed solution method shows poor performance at the initial training stage, since it is still undergoing trial and error. However, after experiencing more episodes,
the agent adapts to the learning environment and adjusts its policy via the exploration and exploitation mechanism. Finally, it finds a near-optimal pricing strategy. Table IV lists the computation efficiency of the two solution methods. Our proposed method significantly reduces the computation time, which benefits the PV energy sharing process. Considering the adaptivity of model-free RL to the external environment, the proposed well-performing pricing method is therefore recommended for energy sharing management in the RBC.

V. CONCLUSION

This paper proposes a novel dynamic-prediction and reinforcement-learning based game-theoretic pricing model for sharing rooftop PV energy in the RBC. Specifically, Stackelberg game theory is used to model the energy sharing between the rooftop PV system owner in the RBC and the local electricity consumers. With the introduction of the WTP of each consumer, the original complex uniform auction for local PV energy is transformed into an efficient discriminatory auction, which can be formulated as an MDP. We then develop a Q-learning algorithm based solution method to find a near-optimal pricing strategy. Besides, an LSTM network based PV generation prediction model is built to dynamically update the action-state space by providing hourly predicted information about the future trends of the rooftop PV panel output. The numerical results verify the effectiveness of the proposed method in dealing with PV energy sharing management in an RBC comprising electrically interconnected apartment buildings.
For the implementation of the proposed dynamic pricing method, some major limitations should be noted: 1) the WTP value of each building consumer needs to be duly considered, and a user survey is suggested to obtain reasonable WTP values; 2) the privacy of building consumers may be violated, since they send their daily load requirement information to the rooftop PV system owner every hour; this issue can be addressed by using non-intrusive load monitoring devices for the long-term observation of individual load changes; 3) precise smart meters need to be installed in the apartments to measure the energy consumption, resulting in a high installation cost that may not be acceptable to individual users.

APPENDIX A
KKT CONDITIONS OF THE NONLINEAR MODEL (4)-(14)

The general formulation of the proposed bi-level energy sharing model (4)-(14) can be described as follows:

$$\min_{\{x, y, \lambda, \mu\}} f_1(x, y, \lambda, \mu) \quad (25)$$

$$\text{s.t. } h_1(x, y, \lambda, \mu) = 0 \quad (26)$$

$$g_1(x, y, \lambda, \mu) \ge 0 \quad (27)$$

$$\min_{\{y\}} f_2(x, y) \quad (28)$$

$$\text{s.t. } h_2(x, y) = 0 \;:\; \lambda \quad (29)$$

$$g_2(x, y) \ge 0 \;:\; \mu \quad (30)$$

The Karush-Kuhn-Tucker (KKT) conditions of the lower-level optimization problem (28)-(30) can be integrated into the upper-level optimization problem (25)-(27), given as follows:
$$\min_{\{x, y, \lambda, \mu\}} f_1(x, y, \lambda, \mu) \quad (31)$$

$$\text{s.t. } h_1(x, y, \lambda, \mu) = 0 \quad (32)$$

$$g_1(x, y, \lambda, \mu) \ge 0 \quad (33)$$

$$\nabla_y f_2(x, y) + \lambda \nabla_y h_2(x, y) + \mu \nabla_y g_2(x, y) = 0 \quad (34)$$

$$h_2(x, y) = 0 \quad (35)$$

$$g_2(x, y) \ge 0 \;\perp\; \mu \ge 0 \quad (36)$$

For the lower-level problem (12)-(14), the Lagrangian is introduced as follows:

$$L = -\lambda_h^U \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) - \lambda_h^{TOU} P_{ih}^G - w_i^E \lambda^E P_{ih}^G - \mu_{ih}^D \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} + P_{ih}^G - P_{ih}^D \right) - \mu_{ih}^{PVuser} P_{ih}^{PVuser} - \mu_{ih}^{ESuser} P_{ih}^{ESuser} - \mu_{ih}^G P_{ih}^G \quad (37)$$

Therefore, the lower-level problem can be replaced by its KKT conditions:

$$\frac{\partial L}{\partial \mu_{ih}^D} = P_{ih}^{PVuser} + P_{ih}^{ESuser} + P_{ih}^G - P_{ih}^D = 0 \quad (38)$$

$$\frac{\partial L}{\partial P_{ih}^{PVuser}} = -\lambda_h^U - \mu_{ih}^D - \mu_{ih}^{PVuser} = 0 \quad (39)$$

$$\frac{\partial L}{\partial P_{ih}^{ESuser}} = -\lambda_h^U - \mu_{ih}^D - \mu_{ih}^{ESuser} = 0 \quad (40)$$

$$\frac{\partial L}{\partial P_{ih}^G} = -\lambda_h^{TOU} - w_i^E \lambda^E - \mu_{ih}^D - \mu_{ih}^G = 0 \quad (41)$$

$$P_{ih}^{PVuser} \ge 0 \;\perp\; \mu_{ih}^{PVuser} \ge 0 \quad (42)$$

$$P_{ih}^{ESuser} \ge 0 \;\perp\; \mu_{ih}^{ESuser} \ge 0 \quad (43)$$

$$P_{ih}^G \ge 0 \;\perp\; \mu_{ih}^G \ge 0 \quad (44)$$

APPENDIX B
LINEARIZATION OF THE NONLINEAR MODEL (4)-(14)

There are two nonlinearities in the proposed bi-level optimization model (4)-(14): 1) the nonlinear term $\lambda_h^U (P_{ih}^{PVuser} + P_{ih}^{ESuser})$ in the objective function (4); and 2) the complementarity constraints (42)-(44). By the strong duality theorem, if a problem is convex, the objective functions of the primal and dual problems have the same value at the optimum [36]. To linearize $\lambda_h^U (P_{ih}^{PVuser} + P_{ih}^{ESuser})$, the strong duality condition is introduced here: the primal objective function (12) of the lower-level problem equals its dual objective function $\mu_{ih}^D P_{ih}^D$, i.e.,

$$\lambda_h^U \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) + \lambda_h^{TOU} P_{ih}^G + w_i^E \lambda^E P_{ih}^G = \mu_{ih}^D P_{ih}^D \quad (45)$$

Accordingly, the linear expression for $\lambda_h^U (P_{ih}^{PVuser} + P_{ih}^{ESuser})$ can be written as follows:

$$\lambda_h^U \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) = \mu_{ih}^D P_{ih}^D - \lambda_h^{TOU} P_{ih}^G - w_i^E \lambda^E P_{ih}^G = \mu_{ih}^D P_{ih}^D - \lambda_{ih}^{WTP} P_{ih}^G \quad (46)$$

As for the complementarity constraints (42)-(44), Ref. [37] provides linear expressions by introducing a large positive constant $M$ and binary variables $u_{ih}^{PVuser}$, $u_{ih}^{ESuser}$, and $u_{ih}^G$, described as follows:

$$P_{ih}^{PVuser},\ P_{ih}^{ESuser},\ P_{ih}^G \ge 0 \quad (47)$$

$$\mu_{ih}^{PVuser},\ \mu_{ih}^{ESuser},\ \mu_{ih}^G \ge 0 \quad (48)$$

$$P_{ih}^{PVuser} \le (1 - u_{ih}^{PVuser}) M \quad (49)$$

$$P_{ih}^{ESuser} \le (1 - u_{ih}^{ESuser}) M \quad (50)$$

$$P_{ih}^G \le (1 - u_{ih}^G) M \quad (51)$$

$$\mu_{ih}^{PVuser} \le u_{ih}^{PVuser} M \quad (52)$$

$$\mu_{ih}^{ESuser} \le u_{ih}^{ESuser} M \quad (53)$$

$$\mu_{ih}^G \le u_{ih}^G M \quad (54)$$

$$u_{ih}^{PVuser},\ u_{ih}^{ESuser},\ u_{ih}^G \in \{0, 1\} \quad (55)$$
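As an illustration of the Fortuny-Amat encoding (47)-(55), the sketch below builds one complementarity pair, Eq. (42), in gurobipy (the paper mentions Gurobi [22] as a suitable solver); the variable names and the value of M are illustrative.

```python
import gurobipy as gp
from gurobipy import GRB

M = 1e4  # large positive constant; must upper-bound the primal and dual variables [37]
m = gp.Model("bigM_complementarity")
p = m.addVar(lb=0.0, name="P_ih_PVuser")            # primal variable, Eq. (47)
mu = m.addVar(lb=0.0, name="mu_ih_PVuser")          # dual variable, Eq. (48)
u = m.addVar(vtype=GRB.BINARY, name="u_ih_PVuser")  # binary switch, Eq. (55)
m.addConstr(p <= (1 - u) * M)   # Eq. (49): u = 1 forces P_ih_PVuser = 0
m.addConstr(mu <= u * M)        # Eq. (52): u = 0 forces mu_ih_PVuser = 0
# Together these enforce: P_ih_PVuser >= 0 complementary to mu_ih_PVuser >= 0, Eq. (42)
```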
APPENDIX C
FINAL LINEARIZED BI-LEVEL ENERGY SHARING MODEL

By using the KKT conditions (Appendix A) and the linearization methods (Appendix B), the final linearized bi-level energy sharing model is formulated as follows:

$$\max_{\{\lambda_h^U,\, P_h^{ESgrid},\, P_h^{ESin},\, P_{ih}^{PVuser},\, P_{ih}^{ESuser},\, P_{ih}^G,\, \mu_{ih}^D,\, \mu_{ih}^{PVuser},\, \mu_{ih}^{ESuser},\, \mu_{ih}^G\}} Rev^O = \sum_{h \in \mathcal{H}} \left\{ \sum_{i \in N^C} \left( \mu_{ih}^D P_{ih}^D - \lambda_{ih}^{WTP} P_{ih}^G \right) + \lambda^{FiT} \left( P_h^{PVgrid} + P_h^{ESgrid} \right) - \lambda_h^{TOU} \left[ \sum_i \left( P_{ih}^{PVuser} + P_{ih}^{ESuser} \right) - P_h^{PV} \right]^+ \right\} \quad (56)$$

s.t. (5)-(11), (38)-(41), (47)-(55)

REFERENCES

[1] E. O'Shaughnessy, D. Cutler, K. Ardani, and R. Margolis, "Solar plus: Optimization of distributed solar PV through battery storage and dispatchable load in residential buildings," Applied Energy, vol. 213, pp. 11-21, 2018.
[2] W. Tushar, T. K. Saha, C. Yuen, D. Smith, and H. V. Poor, "Peer-to-peer trading in electricity networks: An overview," IEEE Transactions on Smart Grid, 2020.
[3] W. Tushar et al., "Three-party energy management with distributed energy resources in smart grid," IEEE Transactions on Industrial Electronics, vol. 62, no. 4, pp. 2487-2498, 2014.
[4] W. Tushar et al., "Energy storage sharing in smart grid: A modified auction-based approach," IEEE Transactions on Smart Grid, vol. 7, no. 3, pp. 1462-1475, 2016.
[5] X. Xu, J. Li, Y. Xu, Z. Xu, and C. S. Lai, "A two-stage game-theoretic method for residential PV panels planning considering energy sharing mechanism," IEEE Transactions on Power Systems, 2020.
[6] N. Liu et al., "Online energy sharing for nanogrid clusters: A Lyapunov optimization approach," IEEE Transactions on Smart Grid, vol. 9, no. 5, pp. 4624-4636, 2017.
[7] S. Cui, Y.-W. Wang, J.-W. Xiao, and N. Liu, "A two-stage robust energy sharing management for prosumer microgrid," IEEE Transactions on Industrial Informatics, vol. 15, no. 5, pp. 2741-2752, 2018.
[8] B. S. K. Patnam and N. M. Pindoriya, "Centralized stochastic energy management framework of an aggregator in active distribution network," IEEE Transactions on Industrial Informatics, vol. 15, no. 3, pp. 1350-1360, 2018.
[9] S. Cui, Y.-W. Wang, and J.-W. Xiao, "Peer-to-peer energy sharing among smart energy buildings by distributed transaction," IEEE Transactions on Smart Grid, 2019.
[10] N. Liu, X. Yu, C. Wang, and J. Wang, "Energy sharing management for microgrids with PV prosumers: A Stackelberg game approach," IEEE Transactions on Industrial Informatics, vol. 13, no. 3, pp. 1088-1098, 2017.
[11] G. Ye, G. Li, D. Wu, X. Chen, and Y. Zhou, "Towards cost minimization with renewable energy sharing in cooperative residential communities," IEEE Access, vol. 5, pp. 11688-11699, 2017.
[12] N. Liu, J. Wang, X. Yu, and L. Ma, "Hybrid energy sharing for smart building cluster with CHP system and PV prosumers: A coalitional game approach," IEEE Access, vol. 6, pp. 34098-34108, 2018.
[13] Z. Wan, H. Li, H. He, and D. Prokhorov, "Model-free real-time EV charging scheduling based on deep reinforcement learning," IEEE Transactions on Smart Grid, 2018.
[14] Y. Du and F. Li, "Intelligent multi-microgrid energy management based on deep neural network and model-free reinforcement learning," IEEE Transactions on Smart Grid, 2019.
[15] Q. Yang, G. Wang, A. Sadeghi, G. B. Giannakis, and J. Sun, "Real-time voltage control using deep reinforcement learning," arXiv preprint arXiv:.09374, 2019.
[16] N. Sadeghianpourhamami, J. Deleu, and C. Develder, "Definition and evaluation of model-free coordination of electrical vehicle charging with reinforcement learning," IEEE Transactions on Smart Grid, 2019.
[17] P. Dai, W. Yu, G. Wen, and S. Baldi, "Distributed reinforcement learning algorithm for dynamic economic dispatch with unknown generation cost functions," IEEE Transactions on Industrial Informatics, 2019.
[18] J. Ahmed and Z. Salam, "An improved method to predict the position of maximum power point during partial shading for PV arrays," IEEE Transactions on Industrial Informatics, vol. 11, no. 6, pp. 1378-1387, 2015.
[19] A. W. Azhari, K. Sopian, A. Zaharim, and M. Al Ghoul, "A new approach for predicting solar radiation in tropical environment using satellite images: Case study of Malaysia," Transactions on Environment and Development, vol. 4, no. 4, pp. 373-378, 2008.
[20] R. B. Myerson, Game Theory. Harvard University Press, 2013.
[21] IBM ILOG CPLEX, "V12.1: User's manual for CPLEX," International Business Machines Corporation, vol. 46, no. 53, p. 157, 2009.
[22] Gurobi Optimization, Inc., "Gurobi optimizer reference manual," 2015.
[23] S. R. Shaw, S. B. Leeb, L. K. Norford, and R. W. Cox, "Nonintrusive load monitoring and diagnostics in power systems," IEEE Transactions on Instrumentation and Measurement, vol. 57, no. 7, pp. 1445-1454, 2008.
[24] X. Shi, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c. Woo, "Convolutional LSTM network: A machine learning approach for precipitation nowcasting," in Advances in Neural Information Processing Systems, 2015, pp. 802-810.
[25] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition," arXiv preprint, 2014.
[26] T. M. Hansen, E. K. Chong, S. Suryanarayanan, A. A. Maciejewski, and H. J. Siegel, "A partially observable Markov decision process approach to residential home energy management," IEEE Transactions on Smart Grid, vol. 9, no. 2, pp. 1271-1281, 2016.
[27] C. J. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279-292, 1992.
[28] H. J. Kappen, "Optimal control theory and the linear Bellman equation," 2011.
[29] M. Tokic, "Adaptive ε-greedy exploration in reinforcement learning based on value differences," in Annual Conference on Artificial Intelligence, Springer, 2010, pp. 203-210.
[30] N. Blair et al., "System Advisor Model, SAM 2014.1.14: General description," National Renewable Energy Laboratory (NREL), Golden, CO, USA, 2014.
[31] G. I. Nagy, G. Barta, S. Kazi, G. Borbély, and G. Simon, "GEFCom2014: Probabilistic solar and wind power forecasting using a generalized additive tree ensemble approach," International Journal of Forecasting, vol. 32, no. 3, pp. 1087-1093, 2016.
[32] Y. Li and Y. Yuan, "Convergence analysis of two-layer neural networks with ReLU activation," in Advances in Neural Information Processing Systems, 2017, pp. 597-607.
[33] E. Fan, "Extended tanh-function method and its applications to nonlinear equations," Physics Letters A, vol. 277, no. 4-5, pp. 212-218, 2000.
[34] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new look at signal fidelity measures," IEEE Signal Processing Magazine, vol. 26, no. 1, pp. 98-117, 2009.
[35] A. Gulli and S. Pal, Deep Learning with Keras. Packt Publishing Ltd, 2017.
[36] G. Dantzig, Linear Programming and Extensions. Princeton University Press, 2016.
[37] J. Fortuny-Amat and B. J. McCarl, "A representation and economic interpretation of a two-level programming problem," Journal of the Operational Research Society, vol. 32, no. 9, pp. 783-792, 1981.

Xu Xu (S'18-M'19) received the M.E. and Ph.D. degrees from The Hong Kong Polytechnic University, Hong Kong SAR, in 2016 and 2019, respectively. Dr. Xu is with the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong SAR, China. His current research interests include power system planning and operation, renewable power integration, energy management, and artificial intelligence applications in power engineering.

Yan Xu (S'10-M'13-SM'19) received the B.E. and M.E. degrees from South China University of Technology, Guangzhou, China, in 2008 and 2011, respectively, and the Ph.D. degree from The University of Newcastle, Australia, in 2013. He is now Nanyang Assistant Professor at the School of Electrical and Electronic Engineering, Nanyang Technological University (NTU), and a Cluster Director at the Energy Research Institute @ NTU (ERI@N), Singapore. Previously, he held The University of Sydney Postdoctoral Fellowship in Australia. His research interests include power system stability and control, microgrids, and data analytics for smart grid applications. Dr. Xu is an Editor for IEEE Transactions on Smart Grid, IEEE Transactions on Power Systems, IEEE Power Engineering Letters, and CSEE Journal of Power and Energy Systems, and an Associate Editor for IET Generation, Transmission & Distribution.

Ming-Hao Wang (S'15-M'18) received the B.Eng. (Hons.) degree in electrical and electronic engineering from the Huazhong University of Science and Technology, Wuhan, China, and the University of Birmingham, Birmingham, U.K., in 2012, and the M.Sc. and Ph.D. degrees, both in electrical and electronic engineering, from The University of Hong Kong, Hong Kong, in 2013 and 2017, respectively. Since 2018, he has been with the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hong Kong, where he is currently a Research Assistant Professor. His research interests include power systems and power electronics.

Zhao Xu (M'06-SM'12) received the B.Eng., M.Eng., and Ph.D. degrees from Zhejiang University, the National University of Singapore, and The University of Queensland in 1996, 2002, and 2006, respectively.
From 2006 to 2009, he was an Assistant and later Associate Professor with the Centre for Electric Technology, Technical University of Denmark, Lyngby, Denmark. Since 2010, he has been with The Hong Kong Polytechnic University, where he is currently a Professor in the Department of Electrical Engineering and Leader of the Smart Grid Research Area. He is also a foreign Associate Staff member of the Centre for Electric Technology, Technical University of Denmark. His research interests include demand side management, grid integration of wind and solar power, electricity market planning and management, and AI applications. He is an Editor of Electric Power Components and Systems, the IEEE PES Power Engineering Letters, and the IEEE Transactions on Smart Grid. He is currently the Chairman of the IEEE PES/IES/PELS/IAS Joint Chapter in the Hong Kong Section.

Jiayong Li (S'16-M'19) received the B.Eng. degree from Zhejiang University, Hangzhou, China, in 2014, and the Ph.D. degree from The Hong Kong Polytechnic University, Hong Kong, in 2018. He is currently an Assistant Professor with the College of Electrical and Information Engineering, Hunan University, Changsha, China. He was a Postdoctoral Research Fellow with The Hong Kong Polytechnic University and a Visiting Scholar with Argonne National Laboratory, Argonne, IL, USA. His research interests include power economics, energy management, distribution system planning and operation, renewable energy integration, and demand-side energy management.

Songjian Chai received the Ph.D. degree from The Hong Kong Polytechnic University, Hong Kong SAR, in 2018. He is currently a Postdoctoral Research Fellow with The Hong Kong Polytechnic University. His research interests include variable renewable generation forecasting, electricity price forecasting, power system uncertainty analysis, and artificial intelligence applications in power engineering.

Yufei He (S'17) received the B.Eng. degree from Zhejiang University, China, in 2016. He is currently working toward the Ph.D. degree in the Department of Electrical Engineering, The Hong Kong Polytechnic University, Hong Kong. His research interests include power electronic control for grid integration of renewables.