This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2020.3016336, IEEE
Transactions on Industrial Informatics
Data-driven Game-based Pricing for Sharing Rooftop
Photovoltaic Generation and Energy Storage in the
Residential Building Cluster under Uncertainties
Xu Xu, Member, IEEE, Yan Xu, Senior Member, IEEE, Ming-Hao Wang, Member, IEEE, Jiayong Li, Member,
IEEE, Zhao Xu, Senior Member, IEEE, Songjian Chai, Yufei He, Student Member, IEEE
Abstract—In this paper, a novel machine learning based
data-driven pricing method is proposed for sharing rooftop
photovoltaic (PV) generation and energy storage (ES) in an
electrically interconnected residential building cluster (RBC). In
the studied problem, the energy sharing process is modeled by the
leader-follower Stackelberg game, where the owner of the rooftop
PV system is responsible for pricing self-generated PV energy and
operating ES devices. Meanwhile, local electricity consumers in
the RBC choose their energy consumption with the given internal
electricity prices. To track the stochastic rooftop PV panel
outputs, the long short-term memory (LSTM) network based
rolling-horizon prediction function is developed to dynamically
predict future trends of PV generation. With system information,
the predicted information is fed into a Q-learning based
decision-making process to find near-optimal pricing strategies.
The simulation results verify the effectiveness of the proposed
approach in solving energy sharing problems with partial or
uncertain information.
Index Terms—Pricing method, photovoltaic generation, energy
storage, residential building cluster, energy sharing, Stackelberg
game, long short-term memory network, Q-learning algorithm
I. INTRODUCTION
In recent years, rooftop photovoltaic (PV) systems have been
widely deployed in residential buildings [1], which can
provide clean energy supply during the daytime. However, for a
residential building cluster (RBC) comprising electrically
This work is partially supported by the National Natural Science Foundation
of China (Grant No. 71971183). The work of J. Li is supported by the National
Natural Science Foundation of China (Grant No. 51907056). Y. Xu’s work is
supported by Nanyang Assistant Professorship from Nanyang Technological
University, Singapore. (Corresponding authors: Zhao Xu and Jiayong Li).
X. Xu is with the Department of Electrical Engineering, The Hong Kong
Polytechnic University, Hung Hom, Hong Kong Special Administrative
Region, China. (email: benxx.xu@connect.polyu.hk).
Y. Xu is with the School of Electrical and Electronic Engineering, Nanyang
Technological University, Singapore. (email: xuyan@ntu.edu.sg).
M.-H. Wang is with the Department of Electrical Engineering, The Hong
Kong Polytechnic University, Hung Hom, Hong Kong Special Administrative
Region, China. (e-mail: minghao.wang@polyu.edu.hk).
Z. Xu is with both Shenzhen Research Institute and Department of Electrical
Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Special Administrative Region, China. (email: eezhaoxu@polyu.edu.hk).
J. Li is with the College of Electrical and Information Engineering, Hunan
University, Changsha, China. (email: j-y.li@connect.polyu.hk).
S. J. Chai and Y. F. He are both with the Department of Electrical
Engineering, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Special Administrative Region, China. (e-mails: chaisongjian@gmail.com;
daniel.v.he@connect.polyu.hk).
interconnected buildings (see Fig. 1), the PV energy sharing
management is a critical concern.
The concept of energy sharing has been widely used in
power systems, which has been well studied in existing papers,
such as Refs [2-5]. Besides, many research efforts have been
made to study the energy sharing management among
end-users in the literature. Conventional energy sharing
methods are based on optimization algorithms. In Ref. [6],
based on Lyapunov optimization, an online energy sharing
framework is presented to enhance the self-sufficiency and PV
consumption for nano-grid clusters. Ref. [7] proposes a
two-stage robust energy sharing approach for a prosumer
microgrid with renewable energy integration, storage units and
load shifting. Ref. [8] employs the heuristic algorithm to
establish a day-ahead energy management method integrated
with home appliance scheduling and energy sharing among
smart houses. In Ref. [9], a peer-to-peer energy sharing strategy
with the distributed transaction is developed for an energy
building cluster including different types of energy buildings.
Ref. [10] develops a game theory based energy sharing
management method for the microgrid as well as a billing
mechanism according to PV energy and load consumption. In
Ref. [11], an online optimization based algorithm is proposed
for cost-aware energy sharing among electricity consumers in a
cooperative community. Ref. [12] presents a novel hybrid
energy sharing management framework to facilitate heat and
PV energy sharing among smart buildings. However, there are
several deficiencies in these existing works: (i) Uncertain
renewable generation is not well considered during the energy
sharing process; (ii) Multiple electricity consumers with
different living behaviors live in the RBC, which makes it
difficult to reach an agreement on PV energy allocation;
(iii) Conflicts of interest between the rooftop
PV system owner and local electricity consumers need to be
addressed properly.
The existing optimization methods can be classified as
model-based methods that rely on an accurate mathematical
formulation to describe the energy sharing process. However,
the energy sharing problem usually involves unknown or
uncertain information in practice, so iterative solution
algorithms are generally adopted. This may pose two potential
challenges: (i) To ensure the convergence of some iterative
algorithms, certain assumptions and simplifications are
required; (ii) The iterative algorithm may be impractical in the
real world due to possible non-convergence issues.
By comparison, as a model-free, adaptive and concise machine
1551-3203 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Hong Kong Polytechnic University. Downloaded on December 28,2020 at 05:24:48 UTC from IEEE Xplore. Restrictions apply.
learning technique [13], reinforcement learning exhibits
excellent performance on the decision-making process.
Reinforcement learning algorithms have been widely employed
to model power system operation problems, such as
multi-microgrid energy management [14], voltage control [15],
electrical vehicle charging [16], dynamic economic dispatch
[17], etc. However, applying reinforcement learning in energy
sharing management is still in the early stage.
In this regard, this paper proposes a fully data-driven method
based on the deep neural network and the reinforcement
learning algorithm for making game-theoretic dynamic pricing
strategies to optimally share the rooftop PV energy with
electricity consumers in a RBC. The main contributions of this
paper can be summarized as follows,
1) The proposed dynamic data-driven game-based pricing
decision-making process is described as the Markov Decision
Process (MDP), which can be well addressed by the Q-learning
algorithm. Compared with conventional optimization methods,
our proposed method can be flexibly and easily applied by
off-line training and on-line implementation with no
requirement for initial knowledge. Besides, the computational
efficiency can be substantially improved.
2) The long short-term memory (LSTM) network is duly
integrated into the proposed pricing framework to capture the
future trends of rooftop PV generation with time-window
rolling. This predicted information is fed into the reinforcement
learning based decision-making process to help the Q-learning
agent to find the near-optimal pricing strategies.
3) To express the preferences of local consumers on
environmental awareness, the concept of willingness-to-pay
(WTP) is introduced in this paper. In this regard, the original
complex game-based energy pricing optimization model is
innovatively transformed into an efficient discriminatory
auction, where the near-optimal pricing strategies can be
quickly determined by the rooftop PV system owner by using
the proposed pricing method.
The rest of this paper is organized as follows. In Section II,
we model the energy sharing in the RBC. Section III describes
the decision-making process of the pricing strategy, including the
LSTM network, MDP formulation and Q-learning process.
Numerical results are given in Section IV. Finally, we conclude
this paper in Section V.
II. PROBLEM MODELING
Fig. 1 depicts the structure of the energy sharing in a RBC.
As shown in this figure, the rooftop PV system is comprised of
two kinds of devices, i.e., PV panels and energy storage (ES)
devices. The rooftop PV system owner is the energy sharing
executor who is responsible for the interoperability among
various components in Fig. 1. The rooftop PV system owner is
in charge of providing self-generated PV energy to all
electricity consumers in the RBC and operating local energy
storage devices. Moreover, this owner has the responsibility of
guaranteeing the maximum utilization of local PV generation
within the RBC. Besides, it is assumed that the smart meters are
installed in the RBC to gather the system data and receive
instructions or information from the rooftop PV system owner.
Fig. 1. Structure of the energy sharing in a RBC.
A. Profit Model of Rooftop PV System Owner
In this paper, we assume that the rooftop PV system owner is
an external company whose objective is purely financial, i.e.,
the maximization of revenue from sharing self-generated PV
energy and operating local energy storage devices. It is
assumed that the ES is charged by rooftop
PV generation only to properly track the dispatch of local
energy. Note that the investment and operation costs of the
rooftop PV system are omitted during the energy sharing
process. Usually, the maximum power point tracking (MPPT)
control [18] is applied to PV panel operation to maximize the
PV generation since the PV power output is time-varying with
solar intensity and environment temperature [19]. The actual
values of the PV panel output are $[\bar{P}_h^{PV}, \bar{P}_{h+1}^{PV}, \dots, \bar{P}_H^{PV}]$,
where $H := \{h, h+1, \dots, H\}$ denotes the time slot set.
At each hour, the rooftop PV system owner acts as the leader
which sets the uniform price for local PV generation, so the
hourly profit $Rev_h^O$ of the owner can be defined as follows,

$$Rev_h^O = \lambda_h^U \sum_{i \in N^C} \big( P_{ih}^{PVuser} + P_{ih}^{ESuser} \big) + \lambda^{FiT} \big( P_h^{PVgrid} + P_h^{ESgrid} \big) - \lambda_h^{TOU} \Big[ \sum_{i \in N^C} \big( P_{ih}^{PVuser} + P_{ih}^{ESuser} \big) - P_h^{PV} \Big]^+ \quad (1)$$
In Eq. (1), the first term represents the profit of selling PV
energy $P_{ih}^{PVuser}$ and electricity in the ES $P_{ih}^{ESuser}$ to the local
electricity consumers at a uniform price $\lambda_h^U$, and the second
term denotes the profit of selling PV energy $P_h^{PVgrid}$ and
electricity in the ES $P_h^{ESgrid}$ to the utility grid at the feed-in
tariff (FiT) rate $\lambda^{FiT}$. $N^C$ is the set of electricity consumers in
the RBC. The third term in (1) describes the compensation cost
regarding the mismatch between the energy sold to the electricity
consumers and the actual PV generation. Note that this mismatch
cost is caused by prediction errors, since the rooftop PV system
owner makes the pricing strategy based on predicted information,
which cannot be fully accurate. $[\cdot]^+$ represents the projection
operator onto the non-negative orthant, i.e., $[x]^+ = \max(x, 0)$.
B. Utility Cost of Electricity Consumers
The electricity consumers in the RBC are followers who
decide to purchase the electricity from the rooftop PV system
owner or the utility grid according to the given price signals.
The utility cost $U_{ih}^C$ of electricity consumer $i \in N^C$ can be
given as follows,
$$U_{ih}^C = \lambda_h^U \big( P_{ih}^{PVuser} + P_{ih}^{ESuser} \big) + \lambda_h^{TOU} P_{ih}^G + w_i^E \lambda^E P_{ih}^G \quad (2)$$
where the first and second terms denote the cost of purchasing
electricity from the rooftop PV system owner, $P_{ih}^{PVuser}$ and
$P_{ih}^{ESuser}$, and from the utility grid, $P_{ih}^G$, respectively. The third
term describes the greenhouse gas emission cost with the
coefficient ๐œ†๐ธ . Specifically, the weight factor ๐‘ค๐‘–๐ธ ∈ [0,1] is
introduced to reflect the environmental awareness of electricity
consumer ๐‘–. In practice, ๐‘ค๐‘–๐ธ can be adjusted depending on the
preferences of electricity consumers on a case by case basis.
Note that the demand $P_{ih}^D$ of electricity consumer $i$ can be
satisfied by $P_{ih}^{PVuser}$, $P_{ih}^{ESuser}$ and $P_{ih}^G$, i.e.,
$P_{ih}^D = P_{ih}^{PVuser} + P_{ih}^{ESuser} + P_{ih}^G$, so the WTP $\lambda_{ih}^{WTP}$ of electricity
consumer $i$ for the local energy can be derived by using
$P_{ih}^D - P_{ih}^{PVuser} - P_{ih}^{ESuser}$ to substitute for $P_{ih}^G$ in (2), given as follows,

$$\lambda_{ih}^{WTP} = \lambda_h^{TOU} + w_i^E \lambda^E \quad (3)$$
C. Stackelberg Game based PV Energy Sharing
In this subsection, a one-leader, N-follower Stackelberg
game [20] is employed to formulate the PV energy sharing
model. The basic idea of this game is that the leader acts first,
and the followers then observe the leader's action and make
their own decisions accordingly. Specifically, as the leader in
this game, the owner of the rooftop PV system (including
rooftop PV panels and ES devices) sets the internal price $\lambda_h^U$
at which the local PV energy is sold to the local building users.
Besides, the local PV energy can also be sold to the utility grid
at the FiT rate $\lambda^{FiT}$. The goal of the leader is to maximize the
daily revenue by pricing and selling local PV energy.
Meanwhile, the building users act as the followers in this game,
choosing to buy the local PV energy at the internal price $\lambda_h^U$
and/or electricity from the utility grid at the TOU price $\lambda_h^{TOU}$.
The goal of the followers is to minimize their daily electricity
bills by choosing their energy consumption, i.e., from the local
provider and/or the utility grid. In this regard, the Stackelberg
game $G$ for this problem can be described as follows,

$$G = \Big\{ (Owner \cup N^C);\ \{\lambda_h^U\}, \{P_h^{ESgrid}\}, \{P_h^{ESin}\};\ \{P_{ih}^{PVuser}\}, \{P_{ih}^{ESuser}\}, \{P_{ih}^G\};\ \{Rev_h^O\}, \{U_{ih}^C\} \Big\}$$

where $(Owner \cup N^C)$ denotes the player set; the rooftop PV
system owner acts as the game leader, and the building
consumers take the roles of game followers in response to the
strategy of the leader; $\{\lambda_h^U\}$, $\{P_h^{ESgrid}\}$ and $\{P_h^{ESin}\}$ are the
strategy sets of the game leader; $\{P_{ih}^{PVuser}\}$, $\{P_{ih}^{ESuser}\}$ and
$\{P_{ih}^G\}$ are the strategy sets of the game followers; $\{Rev_h^O\}$
and $\{U_{ih}^C\}$ are the profit (1) of the leader and the utility cost
(2) of the followers, respectively. Thus, the bi-level energy
sharing model is formulated as,

$$\max_{\{\lambda_h^U, P_h^{ESgrid}, P_h^{ESin}, P_{ih}^{PVuser}, P_{ih}^{ESuser}, P_{ih}^G\}} Rev^O = \sum_{h \in H} \Big\{ \sum_{i \in N^C} \lambda_h^U \big( P_{ih}^{PVuser} + P_{ih}^{ESuser} \big) + \lambda^{FiT} \big( P_h^{PVgrid} + P_h^{ESgrid} \big) - \lambda_h^{TOU} \Big[ \sum_{i \in N^C} \big( P_{ih}^{PVuser} + P_{ih}^{ESuser} \big) - P_h^{PV} \Big]^+ \Big\} \quad (4)$$

$$\text{s.t.}\quad P_h^{PV} = \sum_{i \in N^C} P_{ih}^{PVuser} + P_h^{PVgrid} + P_h^{ESin} \quad (5)$$

$$P_h^{ESsoc} = P_{h-1}^{ESsoc} + \eta^{ESin} P_h^{ESin} - \eta^{ESout} \Big( \sum_{i \in N^C} P_{ih}^{ESuser} + P_h^{ESgrid} \Big) \quad (6)$$

$$P_h^{ESsoc} = P^{ESinit}, \quad h = h_1 \quad (7)$$

$$0 \le \sum_{i \in N^C} P_{ih}^{ESuser} + P_h^{ESgrid} \le \bar{P}^{ES} \quad (8)$$

$$0 \le P_h^{ESin} \le \bar{P}^{ES} \quad (9)$$

$$P_h^{ESsoc} \le \bar{P}^{ESsoc} \quad (10)$$

$$\lambda_h^U, P_h^{PVgrid}, P_h^{ESgrid}, P_h^{ESsoc}, P_h^{ESin} \in \mathbb{R}^+ \quad (11)$$

$$\max_{\{P_{ih}^{PVuser}, P_{ih}^{ESuser}, P_{ih}^G\}} -U_i^C = -\sum_{h \in H} \big\{ \lambda_h^U \big( P_{ih}^{PVuser} + P_{ih}^{ESuser} \big) + \lambda_{ih}^{WTP} P_{ih}^G \big\} \quad (12)$$

$$\text{s.t.}\quad P_{ih}^D = P_{ih}^{PVuser} + P_{ih}^{ESuser} + P_{ih}^G \ :\ \mu_{ih}^D \quad (13)$$

$$P_{ih}^{PVuser}, P_{ih}^{ESuser}, P_{ih}^G \in \mathbb{R}^+ \ :\ \mu_{ih}^{PVuser}, \mu_{ih}^{ESuser}, \mu_{ih}^G \quad (14)$$
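The ES operating limits can be sketched as a simple hourly feasibility check. The efficiencies and capacity bounds below are hypothetical placeholders rather than the paper's parameters, and the SoC update follows the form of Eq. (6) as stated in the text.

```python
# Sketch of the ES operating limits as a one-hour feasibility check;
# eta_in, eta_out, p_es_max, and soc_max are hypothetical placeholders.

def es_feasible(soc_prev, p_in, p_es_user, p_es_grid,
                eta_in=0.95, eta_out=0.95, p_es_max=10.0, soc_max=10.0):
    """Return (feasible, new SoC) for one hour of ES operation."""
    discharge = sum(p_es_user) + p_es_grid
    soc = soc_prev + eta_in * p_in - eta_out * discharge  # SoC update, Eq. (6)
    ok = (0.0 <= discharge <= p_es_max                    # discharge limit, Eq. (8)
          and 0.0 <= p_in <= p_es_max                     # charging limit, Eq. (9)
          and 0.0 <= soc <= soc_max)                      # SoC bound, Eq. (10)
    return ok, soc

# Hypothetical hour: SoC of 4 kWh, 2 kWh charged, 1.5 kWh discharged locally
ok, soc = es_feasible(soc_prev=4.0, p_in=2.0, p_es_user=[1.0, 0.5], p_es_grid=0.0)
```

A check like this is only a constraint filter; the optimization itself chooses which feasible charge and discharge schedule maximizes the owner's revenue.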
where the upper-level model (4)-(11) maximizes the profit of
the PV system owner in the RBC. The objective function (4)
maximizes the daily revenue of the PV system owner. (5)
denotes the dispatch of the PV energy, which can be sold to
local electricity consumers $P_{ih}^{PVuser}$, fed into the utility grid
$P_h^{PVgrid}$, or stored in the ES $P_h^{ESin}$. (6)-(10) give the operating
limits of the ES devices. (11) ensures that the upper-level
variables are non-negative. The lower-level model (12)-(14)
minimizes the electricity cost of the local electricity consumers.
(13) balances the supply and demand of the electricity
consumers, with dual variable $\mu_{ih}^D$. (14) imposes non-negativity
on the variables in the lower-level model, with dual variables
$\mu_{ih}^{PVuser}$, $\mu_{ih}^{ESuser}$, $\mu_{ih}^G$.
The difficulty in solving the proposed bi-level energy
sharing problem (4)-(14) is that it is a nonlinear and nonconvex
problem. Conventionally, Karush-Kuhn-Tucker (KKT)
conditions can be employed to transform the original model
into a Mathematical Program with Equilibrium Constraints
(MPEC) model (see Appendix A, Appendix B and Appendix C), which
can be directly solved by commercial solvers, e.g.,
CPLEX [21] and Gurobi [22]. However, a large number of
mixed-integer variables are involved in the KKT conditions,
resulting in a heavy computational burden. Besides, conventional
optimization methods are not feasible in practice, since they
rely on the assumption of a perfect prediction of the PV panel
output. Moreover, the optimization-based pricing strategy is not
entirely reasonable either, since the rooftop PV system owner then
only focuses on the current profit and overlooks future rewards. In
this regard, in the following section, we propose a novel
pricing method based on a dynamic uncertainty prediction
model as well as a model-free reinforcement learning method,
which can be easily employed to find near-optimal pricing
strategies.
III. PROPOSED DATA-DRIVEN PRICING STRATEGY
A. Mapping the Energy Sharing Model to a Discriminatory Auction
As studied in Ref. [10], the Stackelberg Equilibrium (SE) in
a Stackelberg game is reached as long as all participants obtain
the optimal solutions. Thus, our proposed bi-level energy
sharing framework can reach the SE once the PV system owner
(leader) finds the optimal pricing strategy for selling the
self-generated PV energy and meanwhile all local consumers
(followers) determine their electricity consumption, i.e., from
the rooftop PV system and the utility grid. It is assumed that the
load information of electricity consumers in the RBC can be
utilized by the PV system owner since advanced non-intrusive
load monitoring devices [23] can be installed in the residential
buildings for long-term observation. To maximize the profits of
the PV system owner, the self-generated PV energy will be
dispatched in the descending order of the WTP values.
Therefore, the optimal uniform price of PV energy equals the
WTP offered by the consumers. In other words, the uniform
price (WTP) that brings about the highest revenue to the PV
system owner will be returned. Therefore, the original complex
bi-level PV energy sharing problem is formulated as an
efficient discriminatory auction for local energy (i.e., rooftop
PV generation and electricity stored in ES).
According to Eq. (3), both the TOU electricity price $\lambda_h^{TOU}$ and
the greenhouse gas emission cost coefficient $\lambda^E$ are known;
thus, the value of the WTP is mainly determined by the weight
factor $w_i^E$. In this regard, the original complex bi-level PV
energy sharing problem is formulated as an efficient
discriminatory auction for PV energy, where the weight factor
$w_i^E$ needs to be selected for making the pricing strategy.
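The allocation logic above (serve consumers in descending order of WTP, then keep the candidate uniform price that yields the highest revenue) can be sketched as follows. The demands, prices, and weight factors are hypothetical, and `best_uniform_price` is an illustrative helper, not the paper's algorithm.

```python
# Sketch of the WTP-based auction: WTP follows Eq. (3), consumers are
# served in descending WTP order, and the uniform price is the candidate
# WTP that maximizes the owner's revenue. All inputs are hypothetical.

def best_uniform_price(demands, weights, pv_energy, lam_tou, lam_e, lam_fit):
    """Pick the WTP value that maximizes revenue from selling pv_energy."""
    wtp = [lam_tou + w * lam_e for w in weights]            # Eq. (3)
    best_price, best_rev = lam_fit, pv_energy * lam_fit     # fallback: feed-in
    for price in set(wtp):
        # consumers whose WTP >= price buy local energy, served in
        # descending WTP order until the PV energy is exhausted
        buyers = sorted((p, d) for p, d in zip(wtp, demands) if p >= price)
        sold = 0.0
        for _, d in reversed(buyers):
            sold += min(d, pv_energy - sold)
        rev = price * sold + lam_fit * (pv_energy - sold)   # surplus to grid
        if rev > best_rev:
            best_price, best_rev = price, rev
    return best_price, best_rev

# Two hypothetical consumers: one environmentally aware (w=1), one not (w=0)
price, rev = best_uniform_price(demands=[2.0, 3.0], weights=[1.0, 0.0],
                                pv_energy=4.0, lam_tou=0.10,
                                lam_e=0.05, lam_fit=0.04)
```

In this toy instance, a lower price that attracts both consumers sells all of the PV energy and beats the higher price that only the environmentally aware consumer would accept.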
B. LSTM Network for Dynamic PV Generation Prediction
Considering the uncertain PV generation, a prediction
function should be added in the rooftop PV system to facilitate
the decision-making process of pricing. In this subsection, the
LSTM-based sequence-to-sequence model is formulated to
predict future rooftop PV panel output. This model includes
three parts, i.e., the encoder, the encoder vector and the decoder,
and maps an input sequence to an output sequence, where the
lengths of the input and output may differ. The LSTM network is
a variant of the standard recurrent neural network (RNN) [24].
4
By substituting LSTM units for the basic hidden neurons in the
RNN, the LSTM network can deal with the gradient vanishing
and explosion issues caused by long-term dependencies
[25]. As shown in Fig. 3, the LSTM unit includes three kinds of
gate controllers, i.e., input gate, forget gate and output gate,
which are mainly used to determine what information should be
remembered. These three gates can be calculated by the
following equations,
$$i_t = \sigma(W_{ix} x_t + W_{ih} h_{t-1} + b_i) \quad (15)$$
$$f_t = \sigma(W_{fx} x_t + W_{fh} h_{t-1} + b_f) \quad (16)$$
$$o_t = \sigma(W_{ox} x_t + W_{oh} h_{t-1} + b_o) \quad (17)$$
where ๐œŽ represents the sigmoid function, whose output is in the
range of [0,1], describing how much information should be let
through. $W_{ix}$, $W_{ih}$, $W_{fx}$, $W_{fh}$, $W_{ox}$ and $W_{oh}$ denote the weight
matrices of the input gate, forget gate and output gate. $b_i$, $b_f$
and ๐‘๐‘œ represent the vectors of biases for these gates. It should
be noted that temporal memory is implemented in the LSTM
network by switching different gates to prevent the gradient
vanishing. Therefore, the external inputs of the LSTM unit are
the previous cell state ๐‘๐‘ก−1 , the previous hidden state โ„Ž๐‘ก−1 and
the current input vector ๐‘ฅ๐‘ก .
Then, a candidate cell state $\tilde{C}_t$ is generated as,

$$\tilde{C}_t = \tanh(W_{cx} x_t + W_{ch} h_{t-1} + b_c) \quad (18)$$

Accordingly, the memory cell and hidden state of the LSTM
are updated as,

$$C_t = f_t \otimes C_{t-1} + i_t \otimes \tilde{C}_t \quad (19)$$
$$h_t = o_t \otimes \tanh(C_t) \quad (20)$$
where tanh is the nonlinear activation function and the operator
$\otimes$ denotes the pointwise multiplication of two vectors.
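A minimal NumPy sketch of one LSTM step implementing the gate equations above; the weight shapes and random initialization are illustrative only, not the paper's trained network.

```python
# NumPy sketch of a single LSTM step, following the gate equations in
# the text; weight shapes and initialization are illustrative only.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W[g] has shape (hidden, input+hidden) for g in i, f, o, c."""
    z = np.concatenate([x_t, h_prev])           # stack input and previous hidden state
    i_t = sigmoid(W["i"] @ z + b["i"])          # input gate, Eq. (15)
    f_t = sigmoid(W["f"] @ z + b["f"])          # forget gate, Eq. (16)
    o_t = sigmoid(W["o"] @ z + b["o"])          # output gate, Eq. (17)
    c_tilde = np.tanh(W["c"] @ z + b["c"])      # candidate state, Eq. (18)
    c_t = f_t * c_prev + i_t * c_tilde          # memory cell update, Eq. (19)
    h_t = o_t * np.tanh(c_t)                    # hidden state, Eq. (20)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_h = 3, 4
W = {g: rng.normal(size=(n_h, n_in + n_h)) * 0.1 for g in "ifoc"}
b = {g: np.zeros(n_h) for g in "ifoc"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_h), np.zeros(n_h), W, b)
```

Because the hidden state is a sigmoid-gated tanh, each component of `h` stays strictly inside (-1, 1), which is what keeps the recurrence numerically stable over long sequences.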
In this work, historical data of PV generation are collected
and fed into the proposed encoder-decoder sequence-to-sequence
model, where the LSTM network is used as the
training algorithm. As the output of this prediction model,
$y_t, y_{t+1}, \dots, y_{t+12}$ denotes the predicted future 12-hour PV
generation. This predicted information will be fed into the
Q-learning process to make pricing strategies in a
rolling-window manner.
C. MDP Formulation
As described in Section III-A, the original complex bi-level
PV energy sharing problem is formulated as an efficient
discriminatory auction for rooftop PV energy, where the weight
factor ๐‘ค๐‘–๐ธ needs to be selected for making pricing strategies.
This pricing problem can be formulated as a finite MDP [26],
where the outcomes are partly controlled by the decision-maker
(rooftop PV system owner) and partly random. Under the
Q-learning framework [27], the MDP is formulated as follows,
Fig. 2. Schematic of our proposed LSTM network and Q-learning based data-driven pricing method.
1) State Set $S_h$: The state $s_h \in S$ at hour $h$ includes three
kinds of information, i.e., the current TOU electricity price $\lambda_h^{TOU}$,
the feed-in tariff rate $\lambda^{FiT}$, the current state of charge of the ES
$P_h^{ESsoc}$, and the predicted future trend of the rooftop PV panel
output $[P_h^{PV}, P_{h+1}^{PV}, \dots, P_H^{PV}]$.
2) Action Set $A$: As described in Section III-A, the action
$a_h \in A$ for the current state $s_h$ represents the weight factor $w_i^E$.
3) Reward $r_h$: In this paper, the reward $r_h$ is the cumulative
profit of the rooftop PV system owner from participating in the
energy sharing from $h$ to $H$, as described by Eq. (4).
4) Action-value Function $Q^\pi(s, a)$: The cumulative reward
is used as the action-value function to evaluate the quality of
state-action pairs, described as follows,
$$Q^\pi(s, a) = \mathbb{E}_\pi \Big[ \sum_{k=0}^{K} \gamma^k \, r_{h+k+1} \,\Big|\, s_h = s, a_h = a \Big] \quad (21)$$
where $k := \{0, 1, \dots, K\}$ denotes the time step and $\pi$ represents
the policy, which maps from a state to an action. Note that $\gamma \in
[0,1]$ is the discount rate indicating the importance of
future rewards relative to the current reward.
The primary goal of our proposed pricing problem is to
maximize the action-value function by finding the optimal
policy $\pi^*$, i.e., a sequence of optimal actions (weight factors
$w_i^E$), given as follows,
$$Q^*(s, a) = \max_\pi Q^\pi(s, a) \quad (22)$$
The Q-learning algorithm is employed to iteratively update the
action-value function via the Bellman equation [28],

$$Q^{\pi^*}(s_h, a_h) = r(s_h, a_h) + \gamma \max_{a_{h+1}} Q(s_{h+1}, a_{h+1}) \quad (23)$$
Besides, the Q-value can be updated by the following
equation,
$$Q(s_h, a_h) \leftarrow (1 - \theta) Q(s_h, a_h) + \theta \, Q^{\pi^*}(s_h, a_h) \quad (24)$$
where $\theta \in [0,1]$ denotes the learning rate, indicating to what
extent the new Q-value can overwrite the old one.
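The update rules (23)-(24), together with the epsilon-greedy action choice used later in Algorithm 1, can be sketched in tabular form as follows. The state and action sets are small hypothetical index sets, not the paper's state space.

```python
# Sketch of the tabular Q-learning update in Eqs. (23)-(24) with an
# epsilon-greedy action choice; states/actions are hypothetical indices.
import random

def epsilon_greedy(Q, s, actions, eps):
    if random.random() < eps:                     # explore with probability eps
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])  # otherwise exploit best Q

def q_update(Q, s, a, r, s_next, actions, gamma=0.9, theta=0.1):
    """Blend the old Q-value with the Bellman target, Eqs. (23)-(24)."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)  # Eq. (23)
    Q[(s, a)] = (1 - theta) * Q[(s, a)] + theta * target       # Eq. (24)

actions = [0, 1]
Q = {(s, a): 0.0 for s in range(3) for a in actions}
q_update(Q, s=0, a=1, r=1.0, s_next=1, actions=actions)
```

With all Q-values initialized to zero, one update moves Q(0, 1) a fraction theta of the way toward the reward, so the greedy policy at state 0 immediately prefers action 1.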
TABLE I
STATE SET, ACTION SET AND REWARD FUNCTION FOR EACH HOUR

State set $S_h$: $\{\lambda^{FiT}, \lambda_h^{TOU}, [P_h^{PV}, P_{h+1}^{PV}, \dots, P_H^{PV}], P_h^{ESsoc}\}$
Action set $A$: $\{w_1^E, w_2^E, \dots, w_{N^C}^E\}$
Reward function $r_h$: Eq. (4)
D. Q-learning Algorithm based Solution Method
Algorithm 1 Proposed Dynamic Pricing Method
1. Repeat for each hour $h$
   PV panel output prediction
2.   Collect the data of PV panel output
3.   Feed the collected data into the trained LSTM network to predict future trends of PV panel output
   Q-learning algorithm based decision-making process
4.   Input action set $A_h$
5.   Initialize state set $S_h$
6.   Initialize Q-value $Q(s_h, a_h)$ arbitrarily
7.   Repeat for each episode
8.     Repeat for each state $s_h$
9.       Update state set $S_h \leftarrow \{\lambda^{FiT}, \lambda_h^{TOU}, [P_h^{PV}, P_{h+1}^{PV}, \dots, P_H^{PV}], P_h^{ESsoc}\}$
10.      Choose an action $a_h$ from the current action set $A_h$
11.      Calculate the current reward $r_h(s_h, a_h)$
12.      Update the Q-value $Q(s_h, a_h)$
13.    Until $s_{h+1} = s_H$
14.  Until maximum episode
15.  Output the optimal policy $\pi^*$: $\{a_h^*, a_{h+1}^*, \dots, a_H^*\} = \arg\max Q$
16.  Execute the optimal action $a_h^*$ for the current hour $h$
17. Until $h = H$
Algorithm 1 describes the implementation process of the
proposed Q-learning algorithm-based solution method for
solving our formulated MDP based pricing problem. As shown
in Algorithm 1, in each hour, the proposed LSTM
network-based PV generation prediction function runs to
output future PV generation. Then, these predicted values are
fed into the Q-learning process for making the optimal pricing
strategy. Specifically, in each episode, an action is selected for
the current state in terms of the ๐œ€-greedy policy (๐œ€โกฯตโก[0,1]) [29],
where the agent in Q-learning algorithm can either execute a
random action form the set of available actions with probability
๐œ€ or select an action whose current Q-value is maximum, with
probability 1 − ๐œ€. After choosing an action, the current reward
can be calculated via Eq. (4) and then the Q-value can be
updated via Eq. (24). At the end of each episode, the
termination criterion is checked. If this termination criterion is
not satisfied, the agent will move to the next episode and repeat
the above process. Finally, the agent will obtain an optimal action
for each coming hour. Note that only the optimal action for the
current hour is taken since the optimal pricing strategy will be
updated for each hour. The above procedure will be repeated
until the end hour, i.e., โ„Ž = ๐ป. Moreover, Fig. 3 is plotted to
depict the flowchart of the proposed Q-learning algorithm
based decision-making process.
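The hour-by-hour procedure in Algorithm 1 can be sketched as follows. This is a minimal, self-contained tabular Q-learning illustration with an ε-greedy policy; the state/action sets, reward function, transition function, and all names are illustrative assumptions rather than the paper's exact MDP.

```python
import random

def q_learning_pricing(states, actions, reward_fn, transition_fn,
                       episodes=50000, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy policy (illustrative sketch).

    states/actions: finite sets (e.g., discretised PV forecasts / price levels).
    reward_fn(s, a): immediate reward, cf. Eq. (18) in the paper.
    transition_fn(s, a): next state.
    The Q-value update follows the standard temporal-difference rule,
    cf. Eq. (22); the paper's exact formulation is not reproduced here.
    """
    Q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(episodes):
        s = random.choice(states)
        # epsilon-greedy selection: explore w.p. epsilon, otherwise exploit
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda a_: Q[(s, a_)])
        r = reward_fn(s, a)
        s_next = transition_fn(s, a)
        # temporal-difference update of the Q-value
        best_next = max(Q[(s_next, a_)] for a_ in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    # return the greedy (near-optimal) action for each state
    return {s: max(actions, key=lambda a_: Q[(s, a_)]) for s in states}
```

In a rolling-horizon deployment such as the one described above, only the greedy action for the current hour would be executed before the procedure is repeated with updated predictions.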
Fig. 3. The proposed Q-learning algorithm based decision-making process.

IV. NUMERICAL RESULTS

A. Test Case Setup

In this paper, we consider a test case with six apartment buildings, each containing 60 households. It is assumed that each apartment building has a 100 m² roof area and that the installed rooftop PV capacity is 16.6 kWp. The capacity of the ES devices within the rooftop PV system is 10 kVA. TOU price data are collected from Alectra Utilities (see Fig. 4). The daily individual load data published by the National Renewable Energy Laboratory (NREL) [30] are used in the case study. It should be noted that, for simplicity, we randomly select 360 fractions in the range [0,1] as the weight factors describing the environmental awareness of all electricity consumers in the RBC: zero denotes weak environmental awareness, while one denotes strong environmental awareness. In real-world scenarios, these weight factors could be obtained in other ways, such as a questionnaire survey or non-intrusive long-term observation of individual load consumption. For the Q-learning based decision-making process, the discount rate γ is set to 0.9 so that the obtained pricing strategy is foresighted and avoids future risks. All simulations are implemented in MATLAB on a platform with an Intel Core i7 (2.4 GHz) and 8 GB memory.

Fig. 4. TOU prices in the summer of 2019. (Source: Alectra Utilities)

B. Performance of LSTM Network based Prediction Function

TABLE II
SUMMARY OF TRAINING SETTINGS OF LSTM NETWORK

Network   Hyperparameter            Value/Function
Encoder   Encoder length            36
          Layers                    1
          Hidden states             200
          Kernel_regularizer        0.001
          Activation function       ReLU [32]
Decoder   Decoder length            12
          Layers                    1
          Hidden states             200
          Activation function       ReLU
          Kernel_regularizer        0.001
Others    MLP layers                1
          MLP activation function   Tanh [33]
          Epochs                    100
          Batch size                64
          Loss function             Mean squared error [34]
          Optimizer                 Nadam [35]

The PV dataset for network training is collected from the Global Energy Forecasting Competition 2014 [31], which is publicly accessible online. The dataset covers 12 numerical weather prediction (NWP) variables and the hourly PV power output measured from 1st Apr 2012 to 1st Jul 2014 at three neighboring PV plants in Australia. In this case, we only use the PV power output observed at site 1 for model construction; integrating the NWP information and the neighboring measurements is beyond the scope of this work, since the historical samples suffice to establish the forecasting model on a rolling basis. Before learning, the nighttime measurements (7:00 pm – 7:00 am) are removed; the data from 1st Apr 2012 to 1st Apr 2014 are used for model training, and the rest for prediction. The settings of the adopted encoder-decoder LSTM network are listed in Table II.
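For readers unfamiliar with the recurrence inside the encoder-decoder network of Table II, a minimal numpy sketch of one LSTM cell step and of an encoder pass over an input sequence is given below. The gate layout, weight shapes, and function names are illustrative assumptions, not the paper's Keras implementation, and a toy hidden size stands in for the 200 hidden states of Table II.

```python
import numpy as np

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    """One LSTM cell step: gates are sigmoid/tanh of affine maps of (x, h_prev).

    W: (4*hidden, n_in), U: (4*hidden, hidden), b: (4*hidden,)
    Rows are stacked as [input gate; forget gate; cell candidate; output gate].
    """
    hidden = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1.0 / (1.0 + np.exp(-z[:hidden]))            # input gate
    f = 1.0 / (1.0 + np.exp(-z[hidden:2*hidden]))    # forget gate
    g = np.tanh(z[2*hidden:3*hidden])                # candidate cell state
    o = 1.0 / (1.0 + np.exp(-z[3*hidden:]))          # output gate
    c = f * c_prev + i * g                           # new cell state
    h = o * np.tanh(c)                               # new hidden state
    return h, c

def encode(sequence, hidden, W, U, b):
    """Run the encoder over an input sequence (cf. encoder length 36 in
    Table II) and return the final (h, c) context passed to the decoder."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in sequence:
        h, c = lstm_cell_step(x, h, c, W, U, b)
    return h, c
```

The decoder would unroll the same recurrence for 12 steps (the decoder length in Table II), feeding its outputs through the single-layer MLP head.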
The predictive skill of the well-trained LSTM network for
different look-ahead horizons is shown in Fig. 5. Based on this
predicted information, a sequence of optimal actions can be
selected by the Q-learning agent with the consideration of the
trade-off between the current reward and future reward set by the
discount factor. However, only the action for the current
hour is executed; therefore, a relatively large prediction error
has only a minor effect on our results. It should be noted that
taking the NWP information into account and making
dynamic intraday adjustments to the day-ahead forecast could
further improve the forecasting accuracy, which would benefit
the reinforcement learning based decision-making process in
finding near-optimal solutions; this will be investigated in
our future work.
Fig. 5. Prediction performance with different time steps ((a)–(d): different look-ahead horizons).

C. Performance of Q-learning Decision-making Process

Fig. 6. Q-learning process during 5×10⁴ episodes.
Fig. 7. Optimal internal uniform price for each hour.
Fig. 8. The number of building users involved in the energy sharing process in each hour.
Fig. 9. Dispatch of ES in each hour.
The Q-learning process during 50,000 episodes is shown in
Fig. 6. We can observe from this figure that during the first
10,000 episodes the Q-value increases rapidly, as the Q-learning
agent learns through trial and error at the initial
learning stage. Then, the increase in the Q-value becomes small
and the Q-value finally stabilizes after sufficient training. Therefore, a
near-optimal pricing strategy can be obtained after
approximately 30,000 episodes.
The optimal action can be selected using the Q-learning
algorithm; Fig. 7 depicts the resulting near-optimal internal
uniform prices together with the TOU prices during the daytime.
As seen from this figure, during the off-peak time slots,
e.g. 7:00-10:00 and 19:00-20:00, the obtained internal uniform
prices are higher than the TOU prices. The reason is that the
rooftop PV generation is low in these time slots, so the PV
system owner adopts a premium pricing strategy to maximize
its profit. On the contrary, during the on-peak time slots, i.e.
11:00-18:00, the rooftop PV generation is high due to strong
solar irradiance. Therefore, the PV system owner would rather
charge lower prices so that the self-generated PV electricity can
be sold to more building users, leading to near-maximum
revenue for the rooftop PV system owner.
To clearly show the number of building users who succeed in
the bidding for local PV generation in each hour, Fig. 8 is
provided. It can be observed that few building users are
supplied by PV energy during the off-peak hours, while more
building users can utilize local PV electricity during the
on-peak hours.
Fig. 9 illustrates the dispatch of the ES in each hour. As shown in
this figure, the PV system owner tends to sell the self-generated
PV energy to building users or the ES devices rather than to the
utility grid in order to maximize its economic benefits; the PV
energy is thus stored in the ES in the daytime and dispatched to
the local consumers at night. Hence, our proposed energy sharing
model and pricing strategy can facilitate the utilization of
local PV generation, reducing the negative effects caused by
intermittent PV energy integration.
In this subsection, a comparative case study is conducted to
demonstrate the effectiveness of the pricing strategy obtained
from the proposed energy sharing model. Three different
pricing strategies are included in this case study, described as
follows:
(i) Strategy 1 (proposed internal uniform price): This pricing
strategy is obtained by solving our formulated
leader-follower energy sharing model (4)-(11).
(ii) Strategy 2 (TOU price): The price of rooftop PV
electricity applied to local consumers equals the TOU price. In
this regard, from the perspective of consumers, either choice of
energy supply (from the rooftop PV system owner or the
utility grid) results in the same electricity bill.
(iii) Strategy 3 (Market clearing price): The price of rooftop
PV electricity applied to local consumers equals the market
clearing price. Under this pricing strategy, the rooftop PV
system owner acquires the same income by selling
self-generated PV energy to the local consumers and/or the
utility grid.
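To illustrate why an optimised internal uniform price can outperform pricing at the TOU rate or at the clearing price, a deliberately stylised toy model is sketched below. The function `owner_profit`, the unit-demand assumption, and all numbers are hypothetical and far simpler than the paper's bi-level model; the example only demonstrates the qualitative ordering of the three strategies.

```python
import numpy as np

def owner_profit(price, wtp, pv_supply, clearing):
    """Hourly PV-owner profit for one internal price in a stylised market:
    consumers whose willingness-to-pay (wtp) is at least `price` each buy one
    unit of PV energy (up to pv_supply); leftover PV is sold to the grid at
    the market clearing price. All numbers are illustrative."""
    buyers = min(int(np.sum(wtp >= price)), int(pv_supply))
    leftover = pv_supply - buyers
    return buyers * price + leftover * clearing

# toy data: 10 consumers with heterogeneous environmental awareness (WTP)
wtp = np.linspace(0.05, 0.30, 10)         # $/kWh, illustrative
pv_supply, tou, clearing = 8, 0.15, 0.08  # units and $/kWh, illustrative

# Strategy 1: search for the internal uniform price maximising owner profit
candidates = np.round(np.arange(0.06, 0.30, 0.01), 2)
profit_s1 = max(owner_profit(p, wtp, pv_supply, clearing) for p in candidates)
profit_s2 = owner_profit(tou, wtp, pv_supply, clearing)       # TOU price
profit_s3 = owner_profit(clearing, wtp, pv_supply, clearing)  # clearing price
```

Even in this toy setting, the optimised uniform price exploits the spread of consumer WTP values, whereas the TOU price loses low-WTP buyers and the clearing price forgoes margin, mirroring the ordering in Table III.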
TABLE III
DAILY PROFIT WITH DIFFERENT PRICING STRATEGIES

Pricing strategy                     Daily profit ($)
Strategy 1: Internal uniform price   41.99
Strategy 2: TOU price                39.71
Strategy 3: Market clearing price    24.47
Fig. 10. Comparison of hourly revenue with and without reinforcement learning
based on the same prediction information.
Fig. 10 depicts the hourly profit under these three
pricing strategies based on the same prediction information
provided by our proposed LSTM model. As seen in this figure,
the profit under Strategy 1 (internal uniform price) is always
higher than that under the other two strategies. Accordingly, this
pricing strategy leads to the highest daily profit (see Table III).
The reason is that under Strategy 2, only a few consumers with
strong environmental awareness (high WTP) will purchase PV
energy from the rooftop PV system owner, since the PV energy
output is time-varying and thus not as stable as the electricity
from the utility grid. Besides, with Strategy 3, although local
consumers are more likely to buy the rooftop PV generation
due to the relatively low price, the limited PV panel output cannot
bring the rooftop PV system owner a high profit. Therefore, our
proposed pricing model can duly address the interest conflict
between the rooftop PV system owner and the local consumers
by subtly utilizing the environmental awareness of
consumers.
D. Comparison with Conventional Optimization Method
Fig. 11. Convergence performances by the MILP based optimization method
and the proposed Q-learning algorithm based RL method.
TABLE IV
PERFORMANCES OF COMPUTATION EFFICIENCY BY MILP BASED OPTIMIZATION
METHOD AND Q-LEARNING ALGORITHM BASED RL METHOD

Solution method                    Profit ($)   Computation time (s)
Conventional optimization method   43.075       3400.42
Q-learning algorithm               41.994       15.339
Fig. 11 compares the revenues obtained by the MILP based
optimization method (solved by CPLEX [21]) and our
proposed Q-learning algorithm based RL method. As seen in
this figure, the proposed solution method shows poor
performance at the initial training stage since it is undergoing
trial and error. However, after experiencing more episodes,
the agent adapts to the learning environment and adjusts its
policy via the exploration-exploitation mechanism. Finally, it
can find a near-optimal pricing strategy. Table IV lists the
computation efficiency of these two solution
methods. It can be observed that our proposed method
significantly reduces the computation time, which benefits
the PV energy sharing process. In this regard, considering the
adaptivity of model-free RL to the external environment, it is
suggested to adopt our well-performing pricing
method for energy sharing management in the RBC.
V. CONCLUSION
This paper proposes a novel dynamic prediction and
reinforcement learning based game-theoretic pricing model for
sharing rooftop PV energy in the RBC. Specifically,
Stackelberg game theory is used to model the energy sharing
between the rooftop PV system owner in the RBC and local
electricity consumers. With the introduction of the WTP of
each consumer, the original complex uniform auction for local
PV energy can be transformed into an efficient discriminatory
auction, which can be formulated as an MDP. Then, we
develop a Q-learning algorithm based solution method to find a
near-optimal pricing strategy. Besides, the LSTM network
based PV generation prediction model is built to dynamically
update the action-state space by providing hourly predicted
information about future trends of rooftop PV panel outputs.
The numerical results verify the effectiveness of our proposed
method in dealing with PV energy sharing
management in the RBC comprising electrically
interconnected apartment buildings.
For the implementation of the proposed dynamic pricing
method, some major limitations should be noted: 1) the WTP value
of each building consumer needs to be duly considered, and it is
suggested to conduct a user survey to obtain reasonable WTP
values; 2) the privacy of building consumers may be violated
since they send their daily load requirement information to the
rooftop PV system owner in each hour; this issue can, however,
be addressed by using non-intrusive load monitoring devices
for the long-term observation of individual load changes; 3)
precise smart meters need to be installed in the apartments to
measure energy consumption, resulting in a high
installation cost which may not be accepted by individual users.
APPENDIX A
KKT CONDITIONS OF NONLINEAR MODEL (4)-(14)

The general formulation of the proposed bi-level energy sharing model (4)-(14) can be described as follows:

$\min_{\{x,y,\lambda,\mu\}} f_1(x,y,\lambda,\mu)$  (25)
s.t. $h_1(x,y,\lambda,\mu)=0$  (26)
$g_1(x,y,\lambda,\mu)\geq 0$  (27)
$\min_{y} f_2(x,y)$  (28)
s.t. $h_2(x,y)=0 \;:\; \lambda$  (29)
$g_2(x,y)\geq 0 \;:\; \mu$  (30)

The Karush–Kuhn–Tucker (KKT) conditions of the lower-level optimization problem (28)-(30) can be integrated into the upper-level optimization problem (25)-(27), given as follows:

$\min_{\{x,y,\lambda,\mu\}} f_1(x,y,\lambda,\mu)$  (31)
s.t. $h_1(x,y,\lambda,\mu)=0$  (32)
$g_1(x,y,\lambda,\mu)\geq 0$  (33)
$\nabla_y f_2(x,y)+\lambda\nabla_y h_2(x,y)+\mu\nabla_y g_2(x,y)=0$  (34)
$h_2(x,y)=0$  (35)
$0 \leq g_2(x,y) \perp \mu \geq 0$  (36)

Then, the Lagrangian is introduced as follows:

$L = -\lambda_h^{U}(P_{ih}^{PVuser}+P_{ih}^{ESuser}) - \lambda_h^{TOU}P_{ih}^{G} - w^{E}\lambda^{E}P_{ih}^{G} - \mu_{ih}^{D}(P_{ih}^{PVuser}+P_{ih}^{ESuser}+P_{ih}^{G}-P_{ih}^{D}) - \mu_{ih}^{PVuser}P_{ih}^{PVuser} - \mu_{ih}^{ESuser}P_{ih}^{ESuser} - \mu_{ih}^{G}P_{ih}^{G}$  (37)

Therefore, the lower-level problem can be replaced by its KKT conditions, given as follows:

$\partial L/\partial \mu_{ih}^{D} = P_{ih}^{PVuser}+P_{ih}^{ESuser}+P_{ih}^{G}-P_{ih}^{D}=0$  (38)
$\partial L/\partial P_{ih}^{PVuser} = -\lambda_h^{U}-\mu_{ih}^{D}-\mu_{ih}^{PVuser}=0$  (39)
$\partial L/\partial P_{ih}^{ESuser} = -\lambda_h^{U}-\mu_{ih}^{D}-\mu_{ih}^{ESuser}=0$  (40)
$\partial L/\partial P_{ih}^{G} = -\lambda_h^{TOU}-w^{E}\lambda^{E}-\mu_{ih}^{D}-\mu_{ih}^{G}=0$  (41)
$0 \leq P_{ih}^{PVuser} \perp \mu_{ih}^{PVuser} \geq 0$  (42)
$0 \leq P_{ih}^{ESuser} \perp \mu_{ih}^{ESuser} \geq 0$  (43)
$0 \leq P_{ih}^{G} \perp \mu_{ih}^{G} \geq 0$  (44)
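The complementarity logic of (38)-(44) can be checked numerically on a toy version of the consumer's lower-level problem (meet a fixed demand at minimum cost from several sources). The sketch below uses standard LP sign conventions, which may differ from the sign convention chosen for the Lagrangian (37), and all function names are illustrative.

```python
def consumer_kkt(demand, prices):
    """Solve min sum(c_i * p_i) s.t. sum(p_i) = demand, p_i >= 0, and return
    the primal solution together with KKT multipliers (standard conventions;
    the paper's sign convention in (37)-(41) may differ)."""
    i_star = min(range(len(prices)), key=lambda i: prices[i])
    p = [0.0] * len(prices)
    p[i_star] = demand                # buy everything from the cheapest source
    mu_d = prices[i_star]             # dual of the power-balance constraint
    mu = [c - mu_d for c in prices]   # duals of p_i >= 0 (reduced costs)
    return p, mu_d, mu

def check_kkt(demand, prices, tol=1e-9):
    """Verify the analogues of (38)-(44) for the toy problem."""
    p, mu_d, mu = consumer_kkt(demand, prices)
    assert abs(sum(p) - demand) < tol                      # balance, cf. (38)
    assert all(m >= -tol for m in mu)                      # dual feasibility
    assert all(abs(c - mu_d - m) < tol
               for c, m in zip(prices, mu))                # stationarity, cf. (39)-(41)
    assert all(abs(pi * m) < tol
               for pi, m in zip(p, mu))                    # complementarity, cf. (42)-(44)
    return True
```

For instance, with demand 5 and prices [0.12, 0.18, 0.15], the consumer buys everything at 0.12, the balance dual equals that price, and every complementarity product is zero.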
APPENDIX B
LINEARIZATION OF NONLINEAR MODEL (4)-(14)

There are two nonlinearities in our proposed bi-level optimization model (4)-(14): 1) the nonlinear term $\lambda_h^{U}(P_{ih}^{PVuser}+P_{ih}^{ESuser})$ in the objective function (4); 2) the complementarity constraints (42)-(44).

As stated by the strong duality theorem, if a problem is convex, the objective functions of the primal and dual problems take the same value at the optimum [36]. To linearize $\lambda_h^{U}(P_{ih}^{PVuser}+P_{ih}^{ESuser})$, the strong duality condition is introduced here. In this regard, the primal objective function (12) of the lower-level problem equals its dual objective function $\mu_{ih}^{D}P_{ih}^{D}$, as follows:

$\lambda_h^{U}(P_{ih}^{PVuser}+P_{ih}^{ESuser}) + \lambda_h^{TOU}P_{ih}^{G} + w^{E}\lambda^{E}P_{ih}^{G} = \mu_{ih}^{D}P_{ih}^{D}$  (45)
๐‘ƒ๐‘‰๐‘ข๐‘ ๐‘’๐‘Ÿ
Accordingly, the linear expression for ๐œ†๐‘ˆโ„Ž (๐‘ƒ๐‘–โ„Ž
can be written as follows,
๏ฌhU ( PihPV
user
+ PihESuser ) = ๏ญihD PihD − ๏ฌhTOU PihG − wE ๏ฌ E PihG
= ๏ญihD PihD − ๏ฌiWTP
PihG
h
๐ธ๐‘†๐‘ข๐‘ ๐‘’๐‘Ÿ
+ ๐‘ƒ๐‘–โ„Ž
)
(46)
As for the complementarity constraints (42)-(44), Ref. [37]
provides the linear expressions by introducing the large
๐‘ƒ๐‘‰
๐ธ๐‘†
๐บ
positive constant ๐‘€ and binary variables ๐‘ข๐‘–โ„Ž ๐‘ข๐‘ ๐‘’๐‘Ÿ , ๐‘ข๐‘–โ„Ž ๐‘ข๐‘ ๐‘’๐‘Ÿ , ๐‘ข๐‘–โ„Ž
,
described as follows,
PihPVuser , PihESuser , PihG ๏‚ณ 0
(47)
๏ญ
(48)
PVuser
ih
,๏ญ
,๏ญ ๏‚ณ 0
ESuser
ih
G
ih
PVuser
ih
๏‚ฃ (1 − u
)M
(49)
ESuser
ih
๏‚ฃ (1 − u
)M
(50)
P
P
PVuser
ih
ESuser
ih
P ๏‚ฃ (1 − u )M
(51)
๏ญ
๏‚ฃu
M
(52)
๏ญ
๏‚ฃu
M
(53)
G
ih
PVuser
ih
ESuser
ih
G
ih
PVuser
ih
ESuser
ih
๏ญihG ๏‚ฃ uihG M
PVuser
ih
u
ESuser
ih
,u
(54)
, u ๏ƒŽ{0,1}
G
ih
(55)
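A quick way to see that the big-M constraints (47)-(55) encode the complementarity conditions is to check feasibility for a single pair $0 \leq P \perp \mu \geq 0$: whenever the binary $u$ makes the pair feasible, the product $P\mu$ is forced to zero. The helper below is an illustrative sketch of this Fortuny-Amat test [37], not code from the paper.

```python
def big_m_feasible(P, mu, u, M):
    """Check the big-M reformulation, cf. (47)-(55), for one complementarity
    pair 0 <= P (perp) mu >= 0 with binary switching variable u."""
    return (P >= 0 and mu >= 0        # nonnegativity, cf. (47)-(48)
            and P <= (1 - u) * M      # u = 1 forces P = 0, cf. (49)-(51)
            and mu <= u * M           # u = 0 forces mu = 0, cf. (52)-(54)
            and u in (0, 1))          # binary variable, cf. (55)

def complementary(P, mu, tol=1e-9):
    """The original nonlinear condition: the product P * mu must vanish."""
    return abs(P * mu) < tol
```

For any sufficiently large M, every (P, μ, u) accepted by `big_m_feasible` satisfies `complementary(P, mu)`, while points violating complementarity are rejected for both values of u.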
APPENDIX C
FINAL LINEARIZED BI-LEVEL ENERGY SHARING MODEL

By using the KKT conditions (see Appendix A) and the linearization methods (see Appendix B), the final linearized bi-level energy sharing model is formulated as follows:

$\max_{\{\lambda_h^{U},\, P_h^{ESgrid},\, P_h^{ESin},\, P_{ih}^{PVuser},\, P_{ih}^{ESuser},\, P_{ih}^{G},\, \mu_{ih}^{D},\, \mu_{ih}^{PVuser},\, \mu_{ih}^{ESuser},\, \mu_{ih}^{G}\}} Rev^{O} = \sum_{h\in H}\Big\{ \sum_{i\in N_C}\big(\mu_{ih}^{D}P_{ih}^{D} - \lambda_{ih}^{WTP}P_{ih}^{G}\big) + \lambda^{FiT}\big(P_h^{PVgrid}+P_h^{ESgrid}\big) - \lambda_h^{TOU}\big[\textstyle\sum_i\big(P_{ih}^{PVuser}+P_{ih}^{ESuser}\big)-P_h^{PV}\big] \Big\}$  (56)

s.t. (5)-(11), (38)-(41), (47)-(55)
REFERENCES
[1] E. O'Shaughnessy, D. Cutler, K. Ardani, and R. Margolis, "Solar plus:
Optimization of distributed solar PV through battery storage and
dispatchable load in residential buildings," Applied Energy, vol. 213, pp.
11-21, 2018.
[2] W. Tushar, T. K. Saha, C. Yuen, D. Smith, and H. V. Poor, "Peer-to-peer
trading in electricity networks: an overview," IEEE Transactions on
Smart Grid, 2020.
[3] W. Tushar et al., "Three-party energy management with distributed
energy resources in smart grid," IEEE Transactions on Industrial
Electronics, vol. 62, no. 4, pp. 2487-2498, 2014.
[4] W. Tushar et al., "Energy storage sharing in smart grid: A modified
auction-based approach," IEEE Transactions on Smart Grid, vol. 7, no. 3,
pp. 1462-1475, 2016.
[5] X. Xu, J. Li, Y. Xu, Z. Xu, and C. S. Lai, "A Two-stage Game-theoretic
Method for Residential PV Panels Planning Considering Energy Sharing
Mechanism," IEEE Transactions on Power Systems, 2020.
[6] N. Liu et al., "Online energy sharing for nanogrid clusters: A Lyapunov
optimization approach," IEEE Transactions on Smart Grid, vol. 9, no. 5,
pp. 4624-4636, 2017.
[7] S. Cui, Y.-W. Wang, J.-W. Xiao, and N. Liu, "A two-stage robust energy
sharing management for prosumer microgrid," IEEE Transactions on
Industrial Informatics, vol. 15, no. 5, pp. 2741-2752, 2018.
[8] B. S. K. Patnam and N. M. Pindoriya, "Centralized stochastic energy
management framework of an aggregator in active distribution network,"
IEEE Transactions on Industrial Informatics, vol. 15, no. 3, pp.
1350-1360, 2018.
[9] S. Cui, Y.-W. Wang, and J.-W. J. Xiao, "Peer-to-Peer Energy Sharing
among Smart Energy Buildings by Distributed Transaction," IEEE
Transactions on Smart Grid, 2019.
[10] N. Liu, X. Yu, C. Wang, and J. J. Wang, "Energy sharing management for
microgrids with PV prosumers: A Stackelberg game approach," IEEE
Transactions on Industrial Informatics, vol. 13, no. 3, pp. 1088-1098,
2017.
[11] G. Ye, G. Li, D. Wu, X. Chen, and Y. Zhou, "Towards cost minimization
with renewable energy sharing in cooperative residential communities,"
IEEE Access, vol. 5, pp. 11688-11699, 2017.
[12] N. Liu, J. Wang, X. Yu, and L. Ma, "Hybrid energy sharing for smart
building cluster with CHP system and PV prosumers: A coalitional game
approach," IEEE Access, vol. 6, pp. 34098-34108, 2018.
[13] Z. Wan, H. Li, H. He, and D. Prokhorov, "Model-Free Real-Time EV
Charging Scheduling Based on Deep Reinforcement Learning," IEEE
Transactions on Smart Grid, 2018.
[14] Y. Du and F. Li, "Intelligent Multi-microgrid Energy Management based
on Deep Neural Network and Model-free Reinforcement Learning," IEEE
Transactions on Smart Grid, 2019.
[15] Q. Yang, G. Wang, A. Sadeghi, G. B. Giannakis, and J. Sun, "Real-time
Voltage Control Using Deep Reinforcement Learning," arXiv preprint
arXiv:.09374, 2019.
[16] N. Sadeghianpourhamami, J. Deleu, and C. Develder, "Definition and
evaluation of model-free coordination of electrical vehicle charging with
reinforcement learning," IEEE Transactions on Smart Grid, 2019.
[17] P. Dai, W. Yu, G. Wen, and S. Baldi, "Distributed Reinforcement
Learning Algorithm for Dynamic Economic Dispatch with Unknown
Generation Cost Functions," IEEE Transactions on Industrial
Informatics, 2019.
[18] J. Ahmed and Z. Salam, "An improved method to predict the position of
maximum power point during partial shading for PV arrays," IEEE
Transactions on Industrial Informatics, vol. 11, no. 6, pp. 1378-1387,
2015.
[19] A. W. Azhari, K. Sopian, A. Zaharim, and M. Al Ghoul, "A new approach
for predicting solar radiation in tropical environment using satellite
images-case study of Malaysia," Transactions on Environment
Development, vol. 4, no. 4, pp. 373-378, 2008.
[20] R. B. Myerson, Game theory. Harvard university press, 2013.
[21] IBM ILOG CPLEX, "V12.1: User's Manual for CPLEX," International Business
Machines Corporation, vol. 46, no. 53, p. 157, 2009.
[22] Gurobi Optimization, Inc., "Gurobi optimizer reference manual," 2015.
[23] S. R. Shaw, S. B. Leeb, L. K. Norford, and R. W. Cox, "Nonintrusive load
monitoring and diagnostics in power systems," IEEE Transactions on
Instrumentation and Measurement, vol. 57, no. 7, pp. 1445-1454, 2008.
[24] S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-c.
Woo, "Convolutional LSTM network: A machine learning approach for
precipitation nowcasting," in Advances in neural information processing
systems, 2015, pp. 802-810.
[25] H. Sak, A. Senior, and F. Beaufays, "Long short-term memory based
recurrent neural network architectures for large vocabulary speech
recognition," arXiv preprint, 2014.
[26] T. M. Hansen, E. K. Chong, S. Suryanarayanan, A. A. Maciejewski, and
H. J. Siegel, "A partially observable markov decision process approach to
residential home energy management," IEEE Transactions on Smart
Grid, vol. 9, no. 2, pp. 1271-1281, 2016.
[27] C. J. Watkins and P. Dayan, "Q-learning," Machine learning, vol. 8, no.
3-4, pp. 279-292, 1992.
[28] H. J. Kappen, "Optimal control theory and the linear bellman equation,"
2011.
[29] M. Tokic, "Adaptive ε-greedy exploration in reinforcement learning based
on value differences," in Annual Conference on Artificial Intelligence,
2010, pp. 203-210: Springer.
[30] N. Blair et al., "System advisor model, sam 2014.1. 14: General
description," National Renewable Energy Lab.(NREL), Golden, CO
(United States)2014.
[31] G. I. Nagy, G. Barta, S. Kazi, G. Borbély, and G. Simon, "GEFCom2014:
Probabilistic solar and wind power forecasting using a generalized additive
tree ensemble approach," International Journal of Forecasting, vol. 32,
no. 3, pp. 1087-1093, 2016.
[32] Y. Li and Y. Yuan, "Convergence analysis of two-layer neural networks
with ReLU activation," in Advances in Neural Information Processing
Systems, 2017, pp. 597-607.
[33] E. Fan, "Extended tanh-function method and its applications to nonlinear
equations," Physics Letters A, vol. 277, no. 4-5, pp. 212-218, 2000.
[34] Z. Wang and A. C. Bovik, "Mean squared error: Love it or leave it? A new
look at signal fidelity measures," IEEE Signal Processing Magazine, vol.
26, no. 1, pp. 98-117, 2009.
[35] A. Gulli and S. Pal, Deep Learning with Keras. Packt Publishing Ltd,
2017.
[36] G. Dantzig, Linear Programming and Extensions. Princeton University
Press, 2016.
[37] J. Fortuny-Amat and B. J. McCarl, "A representation and economic
interpretation of a two-level programming problem," Journal of the
Operational Research Society, vol. 32, no. 9, pp. 783-792, 1981.
Xu Xu (S'18-M'19) received the M.E. and Ph.D. degrees
from The Hong Kong Polytechnic University, Hong
Kong SAR, in 2016 and 2019, respectively. Dr Xu is
with the Department of Electrical Engineering, The
Hong Kong Polytechnic University, Hung Hom, Hong
Kong SAR, China. His current research interests include
power system planning and operation, renewable power
integration, energy management, and artificial
intelligence application in power engineering.
Yan Xu (S’10-M’13-SM’19) received the B.E. and
M.E degrees from South China University of
Technology, Guangzhou, China in 2008 and 2011,
respectively, and the Ph.D. degree from The
University of Newcastle, Australia, in 2013. He is now
the Nanyang Assistant Professor at School of
Electrical and Electronic Engineering, Nanyang
Technological University (NTU), and a Cluster
Director at Energy Research Institute @ NTU
(ERI@N), Singapore. Previously, he held The
University of Sydney Postdoctoral Fellowship in Australia. His research
interests include power system stability and control, microgrid, and
data-analytics for smart grid applications. Dr Xu is an Editor for IEEE
Transactions on Smart Grid, IEEE Transactions on Power Systems, IEEE
Power Engineering Letters, CSEE Journal of Power and Energy Systems, and
an Associate Editor for IET Generation, Transmission & Distribution.
Ming-Hao Wang (S'15-M'18) received the B.Eng. (Hons.)
degree in electrical and electronic engineering from the
Huazhong University of Science and Technology, Wuhan,
China, and the University of Birmingham, Birmingham,
U.K., in 2012, and the M.Sc. and Ph.D. degrees, both in
electrical and electronic engineering, from The University
of Hong Kong, Hong Kong, in 2013 and 2017, respectively.
Since 2018, he has been with the Department of Electrical
Engineering, The Hong Kong Polytechnic University, Hong
Kong, where he is currently a Research Assistant Professor.
His research interests include power systems and power
electronics.
Zhao Xu (M'2016-SM'2012) received the B.Eng., M.Eng.,
and Ph.D. degrees from Zhejiang University, the National
University of Singapore, and The University of
Queensland in 1996, 2002, and 2006, respectively.
From 2006 to 2009, he was an Assistant and later
Associate Professor with the Centre for Electric
Technology, Technical University of Denmark,
Lyngby, Denmark. Since 2010, he has been with The
Hong Kong Polytechnic University, where he is
currently a Professor in the Department of Electrical
Engineering and Leader of Smart Grid Research Area. He is also a foreign
Associate Staff of Centre for Electric Technology, Technical University of
Denmark. His research interests include demand side, grid integration of wind
and solar power, electricity market planning and management, and AI
applications. He is an Editor of the Electric Power Components and Systems,
the IEEE PES Power Engineering Letter, and the IEEE Transactions on Smart
Grid. He is currently the Chairman of IEEE PES/IES/PELS/IAS Joint Chapter
in Hong Kong Section.
Jiayong Li (S’16–M’19) received the B.Eng. degree
from Zhejiang University, Hangzhou, China, in 2014,
and the Ph.D. degree from The Hong Kong Polytechnic
University, Hong Kong, in 2018. He is currently an
Assistant Professor with the College of Electrical and
Information Engineering, Hunan University, Changsha,
China. He was a Postdoctoral Research Fellow with The
Hong Kong Polytechnic University and a Visiting
Scholar with Argonne National Laboratory, Argonne,
IL, USA. His research interests include power
economics, energy management, distribution system planning and operation,
renewable energy integration, and demand-side energy management.
Songjian Chai received the Ph.D. degree from The Hong
Kong Polytechnic University, Hong Kong SAR, in 2018.
He is currently a Postdoctoral Research Fellow with The
Hong Kong Polytechnic University. His research
interests include variable renewable generation
forecasting, electricity price forecasting, power system
uncertainty analysis, and artificial intelligence
application in power engineering.
Yufei He (S’17) received the B.Eng. degree from
Zhejiang University, China, in 2016. He is currently
working toward the Ph.D. degree in the Department of
Electrical Engineering, The Hong Kong Polytechnic
University, Hong Kong. His research interests include
power electronic control for grid-integration of
renewables.