
Data Science for Genomics, 1st Edition, by Amit Kumar Tyagi, PhD, and Ajith Abraham, PhD

Contents
1. Genomics and neural networks in
electrical load forecasting with
computational intelligence
1. Introduction
2. Methodology
2.1 RNN
2.2 Long short-term memory
3. Experiment evaluation
3.1 Testing methods effectiveness for
PGVCL data
3.2 Testing methods effectiveness for
NYISO data
4. Conclusion
References
2. Application of ensemble
learning-based classifiers for
genetic expression data
classification
1. Introduction
2. Ensemble learning-based classifiers for
genetic data classification
2.1 Bagging
2.2 Boosting
2.3 Stacking
3. Stacked ensemble classifier for leukemia
classification
3.1 Proposed classification model
3.2 Deep-stacked ensemble classifier
3.3 SVM meta classifier
3.4 Gradient boosting meta classifier
4. Results and discussion
5. Conclusion
References
3. Machine learning in genomics:
identification and modeling of
anticancer peptides
1. Introduction
2. Materials and methods
2.1 Google Colaboratory
2.2 Data sets
2.3 Pfeature package
2.4 Feature extraction functions
2.5 Machine learning implementation
2.6 Conclusion
References
4. Genetic factor analysis for an early
diagnosis of autism through
machine learning
1. Introduction
2. Review of literature
3. Methodology
3.1 Using KNIME software
3.2 Data set analysis through ML
algorithms
3.3 Naive Bayes learner
3.4 Fuzzy rule learner
3.5 Decision tree learner
3.6 RProp MLP learner
3.7 Random forest learner
3.8 SVM learner
3.9 K-nearest neighbors learner
3.10 Gradient boosted trees learner
3.11 K-means clustering
4. Results
4.1 Graphs obtained
4.2 Inference
5. Conclusion
Appendix
References
5. Artificial intelligence and data
science in pharmacogenomics-based
drug discovery: future of medicines
1. Introduction
2. Artificial intelligence
3. Artificial intelligence in drug research
4. Drug discovery
4.1 Drug screening
4.2 Drug designing
4.3 Drug repurposing
4.4 ADME prediction
4.5 Dosage form and delivery system
4.6 PK/PD correlation
5. Pharmacogenomics
6. Pharmacogenomics and AI
7. Integration of pharmacogenomics and AI
8. Pharmacogenomic-based clinical
evaluation and AI
9. Discussion
10. Conclusion
Abbreviations
References
6. Recent challenges, opportunities,
and issues in various data analytics
1. Introduction
2. Big data
3. Data analytics
4. Challenges in data analytics
5. Various sectors in data analytics
6. Conclusion
References
7. In silico application of data science,
genomics, and bioinformatics in
screening drug candidates against
COVID-19
1. Introduction
1.1 A brief overview of SARS-CoV-2
1.2 Compounds reported with antiviral
activities
1.3 Herb extracts with antiviral property in
India
2. Materials and method
2.1 Target protein preparation
2.2 Ligand preparation
2.3 Binding site/catalytic site prediction
2.4 Structure minimization
2.5 Grid generation
2.6 Molecular docking of protein-ligand
using Autodock software
2.7 Hydrogen bond interaction using
LigPlot software
2.8 Screening of compounds for drug
likeness
2.9 Screening of compounds for
toxicity
3. Results and discussion
4. Conclusion
Declaration
Nomenclature
Acknowledgments
References
8. Toward automated machine
learning for genomics: evaluation
and comparison of state-of-the-art
AutoML approaches
1. Into the world of genomics
2. Need and purpose of analytics in
genomics
3. Literature review
4. Research design
4.1 Research design methodology
4.2 AutoML tools used: PyCaret and
AutoViML
5. AutoML
5.1 Why AutoML and why it should be
democratized
5.2 Architectural design of AutoML
5.3 Democratization of AutoML and
beyond
6. Research outcome
6.1 Exploratory data analysis
6.2 Analysis using PyCaret
6.3 Analysis using AutoViML
6.4 Model comparison: PyCaret and
AutoViML
7. Business implications
8. Conclusion
References
Further reading
9. Effective dimensionality reduction
model with machine learning
classification for microarray gene
expression data
1. Introduction
2. Related work
3. Materials and methods
3.1 Feature selection
3.2 Principal component
analysis
3.3 Logistic regression
3.4 Extremely randomized trees
classifier
3.5 Ridge classifier
3.6 Adaboost
3.7 Linear discriminant analysis
3.8 Random forest
3.9 Gradient boosting machine
3.10 K-nearest neighbors
3.11 Data set used for analysis
4. Results and discussion
4.1 Experimental analysis on 10-fold
cross-validation
4.2 Experimental analysis on
eightfold cross-validation
4.3 Comparison of our findings with
some earlier studies
5. Conclusion and future work
References
10. Analysis of the structural and electronic
properties and the effect of light on PIN
photodiode achievement through
SILVACO software: a case study
1. Introduction
1.1 Photodiode
1.2 Effect of light on the I-V characteristics of photodiodes
1.3 I-V characteristics of a photodiode
1.4 Types of photodiodes
1.5 Modes of operation of a photodiode
1.6 Effect of temperature on I-V characteristics of photodiodes
1.7 Signal-to-noise ratio in a photodiode
1.8 Responsivity of a photodiode
1.9 Responsivity versus wavelength
2. PIN photodiode
2.1 Operation of PIN photodiode
2.2 Key PIN diode characteristics
2.3 PIN diodes uses and advantages
2.4 PIN photodiode applications
3. Results and simulations
3.1 Effect of light on a PIN photodiode
3.2 Procedure to design and observe the
effect of light
3.3 V-I characteristic of a PIN photodiode
4. Conclusion
Appendix (Silvaco Code)
Effect of light on the characteristics of PIN
diode code
Effect of light on the characteristics of SDD
diode code
References
11. One step to enhance the performance
of XGBoost through GSK for predicting
ethanol, ethylene, ammonia, acetaldehyde,
acetone, and toluene
1. Introduction
2. Related work
3. Main tools
3.1 Internet of Things (IoTs)
3.2 Optimization techniques
3.3 Prediction techniques
4. Result of implementation
4.1 Description of dataset
4.2 Result of preprocessing
4.3 Checking missing values
5. Conclusions
References
12. A predictive model for classifying
colorectal cancer using principal
component analysis
1. Introduction
2. Related works
3. Methodology
3.1 Experimental dataset
3.2 Dimensionality reduction tool
3.3 Classification
3.4 Research tool
3.5 Performance evaluation metrics
4. Results and discussions
5. Conclusion
References
13. Genomic data science systems for
prediction and prevention of pneumonia
from chest X-ray images using a
two-channel dual-stream convolutional
neural network
1. Introduction
2. Review of literature
2.1 Introduction
2.2 Convolutional neural networks (CNNs)
3. Materials and methods
3.1 Dataset
3.2 The proposed architecture:
two-channel dual-stream CNN
(TCDSCNN) model
3.3 Performance matrix for classification
4. Result and discussion
4.1 Visualizing the intermediate layer
output of CNN
4.2 Model feature map
4.3 Model accuracy
5. Conclusion and future work
References
14. Predictive analytics of genetic
variation in the COVID-19 genome
sequence: a data science
perspective
1. Introduction
1.1 Objectives
2. Related work
3. The COVID-19 genomic sequence
3.1 The relevance of genome sequences
to disease analyses
3.2 Utilization of COVID-19 genome
sequencing for processing
4. Methodology
4.1 Implementation analysis
Lung epithelial similarity
5. Discussion
6. Conclusion
7. Future outlook
References
Further reading
15. Genomic privacy: performance
analysis, open issues, and future
research directions
1. Introduction
1.1 Genome data
1.2 Genomic data versus other types of
data
2. Related work
3. Motivation
4. Importance of genomic data/privacy in
real life
5. Techniques for protecting genetic
privacy
5.1 Controlled access
5.2 Differential privacy preservation
5.3 Cryptographic solutions
5.4 Other approaches
5.5 Some useful suggestions for protecting
genomic data
6. Genomic privacy: use case
7. Challenges in protecting genomic data
8. Opportunities in genomic data privacy
9. Arguments about genetic privacy with
several other privacy areas
10. Conclusion with future scope
Appendix A
Authors’ contributions
Acknowledgments
References
16. Automated and intelligent systems
for next-generation-based smart
applications
1. Introduction
2. Background work
3. Intelligent systems for smart applications
4. Automated systems for smart applications
5. Automated and intelligent systems for smart applications
6. Machine learning and AI technologies for smart applications
7. Analytics for advancements
8. Cloud strategies: hybrid, containerization, serverless, microservices
9. Edge intelligence
10. Data governance and quality for smart
applications
11. Digital Ops including DataOps, AIOps,
and CloudSecOps
12. AI in healthcare: from data to
intelligence
13. Big data analytics in IoT-based smart
applications
14. Big data applications in a smart city
15. Big data intelligence for cyber-physical
systems
16. Big data science solutions for real-life
applications
17. Big data analytics for cybersecurity and
privacy
18. Data analytics for privacy-by-design in
smart health
19. Case studies and innovative applications
19.1 Innovative bioceramics
20. Conclusion and future scope
Acknowledgments
References
Further reading
17. Machine learning applications for
COVID-19: a state-of-the-art review
1. Introduction
2. Forecasting
3. Medical diagnostics
4. Drug development
5. Contact tracing
6. Conclusion
References
Index
Chapter 1
Genomics and neural networks in
electrical load forecasting with
computational intelligence
1. Introduction
Load forecasting is the procedure of predicting future electricity demand from historical data so that electric utilities can manage generation and demand. In the present scenario, load forecasting is an essential task in a smart grid. The smart grid is an electrical grid that uses computers, digital technologies, and other advanced technologies for real-time monitoring, balancing generation and demand, and acting on particular information (such as the behavior of electric utilities or consumers) to improve efficiency, reliability, sustainability, and economics [1]. Load forecasting plays an important role in fulfilling the applications of a smart grid. A smart grid involves various modes of forecasting: load forecasting, price forecasting, solar-based electricity generation forecasting, and wind-based electricity generation forecasting. Load forecasting is classified into four categories [2-4]: (i) very short-term load forecasting, (ii) short-term load forecasting, (iii) mid-term load forecasting, and (iv) long-term load forecasting. The focus of this paper is on short-term load forecasting. As the demand for electricity increases, very short-term and short-term load forecasting help to provide additional security, reliability, and protection to smart grids. They are also useful for energy efficiency, electricity pricing, market design, demand-side management, matching generation and demand, and unit commitment [5]. Machine learning can accurately predict the electrical load to fulfill the needs of smart grids.
The well-defined long short-term memory (LSTM) and recurrent neural network (RNN) methods are used in many papers for load forecasting, and these methods are often hybridized to improve the predictions. A review of the RNN and LSTM methods used for load forecasting follows. In paper [6], the author applied an LSTM RNN for nonresidential energy consumption forecasting. The real-time energy consumption data is from South China and contains multiple sequences of 48 nonresidential consumers' energy consumption. The measured data is in kilowatts and was collected from advanced metering infrastructure (AMI) with a sampling interval of 15 min. Prediction accuracy was calculated using the mean absolute error (MAE), mean absolute percent error (MAPE), and root mean squared error (RMSE). In paper [7], the author applied an RNN-LSTM neural network for long-term load forecasting. The real-time ISO New England load data is used for 5-year load prediction, and MAPE is used to calculate the accuracy of the forecasted results. Year-wise and season-wise MAPE were calculated; the majority of MAPE values are below 5%, and none exceed 8%.
In paper [8], the author mentions that multiple-sequence LSTM has become an attractive approach for load prediction because of the increasing volume and variety of smart meters, automation systems, and other sources in smart grids. For energy load forecasting, the multisequence LSTM, LSTM-genetic algorithm (GA), LSTM-particle swarm optimization (PSO), random forest, support vector machines (SVM), artificial neural network (ANN), and extra-trees regressor methods are used, and a comparison is made between them using RMSE and MAE. The load data was obtained from Réseau de Transport d'Électricité (RTE), the French electricity transmission network. In paper [9], the author used LSTM for power demand forecasting, and the LSTM prediction is compared with gradient boosted trees (GBT) and support vector regression
(SVR). The LSTM gives better predictions than GBT and SVR, decreasing the MSE by 21.80% and 28.57%, respectively. Time-series features, weather features, and calendar features are considered for forecasting. The University of Massachusetts provided the power data for forecasting, and model accuracy is evaluated using MSE and MAPE.
In paper [10], electricity consumption prediction is carried out for residential and commercial buildings using a deep recurrent neural network (RNN) model. Electricity consumption data from residential buildings in Austin, Texas, is used for mid-term to long-term forecasting, and electricity consumption data from Salt Lake City, Utah, is used for the commercial-building predictions. For commercial buildings, the RNN performs better than a multilayered perceptron model. In paper [11], the author used the LSTM method for power load forecasting with the EUNITE real power load data. Next-hour and next-half-day predictions were made using a single-point forecasting model of LSTM and a multiple-point forecasting model of LSTM, with model accuracy calculated using MAPE. The single-point forecasting model of LSTM performs better than the multiple-point forecasting model.
In paper [12], the author applied an RNN for next-24-h load prediction, and the RNN result is compared with a back-propagation neural network. In paper [13], the author used deep RNN, DRNN-gated recurrent unit (GRU), DRNN-LSTM, multilayer perceptron (MLP), autoregressive integrated moving average (ARIMA), SVM, and MLR methods for load demand forecasting, using residential load data from Austin, Texas, USA; the methods were evaluated based on MAE, RMSE, and MAPE. In paper [14], the author used RNN, LSTM, and GBT for wind power forecasting, carried out on wind velocity data from Kolkata, India, with accuracy calculated using MAE, MAPE, MSE, and RMSE.
In paper [15], the author used LSTM for short-term load forecasting: 24-h, 48-h, 7-day, and 30-day-ahead predictions were made and compared with the actual load, and the LSTM accuracy was tested using RMSE and MAPE. In paper [16], the author made long-term energy consumption predictions using LSTM on real-time industrial data. The LSTM result was compared with the ARMA, ARFIMA, and BPNN prediction results, and the LSTM performed best; MAE, MAPE, MSE, and RMSE were used to evaluate the methods' accuracy.
The contribution of this paper is to accurately forecast the load using well-defined machine learning methods. Two different zones of real-time load data are used for prediction: the first load dataset is from Paschim Gujarat Vij Company Ltd. (PGVCL), India, and the second is from NYISO, USA. For both datasets, the well-defined machine learning methods RNN and LSTM are applied for load prediction. The accuracy of the forecasted load is calculated using the root mean squared error and the mean absolute percentage error. Further, the machine learning results are compared with the predictions of time series models, with the aim of achieving better predictions than the time series models; in most cases, machine learning works excellently. The time series model results are taken from Ref. [17], of which this paper is extended work.
The rest of the paper is organized as follows. Section 2 explains the applied machine learning methods, i.e., RNN and LSTM. Section 3 shows the prediction results of the applied machine learning methods for both load datasets. Section 4 concludes the paper.
2. Methodology
2.1 RNN
The concept of the RNN was introduced to process sequence data and to recognize patterns in sequences. The RNN was developed because a feed-forward network fails to predict the next value in a sequence, or predicts it poorly; the feed-forward network is mostly not used for sequence prediction because a new output has no relation to the previous output. Let us see how the RNN solves this problem. Fig. 1.1 illustrates the generalized way to represent the RNN, in which there is a loop through which information flows from the previous timestamp to the next. For a better understanding, Fig. 1.2 shows the unrolling of the generalized form of RNN in Fig. 1.1 [18].
From Fig. 1.2, the input at "t-1" is fed to the network, giving the output at "t-1." At the next time stamp "t," the input at time "t" is given to the network along with the information from the previous timestamp "t-1," which helps us get the output at "t." Similarly, for the output at "t+1," there are two inputs: one is the new input at "t+1" fed to the network, and the other is the information coming from the previous time stamp "t," giving the output at time "t+1." Likewise, it can go on [19]. Fig. 1.3 indicates the mathematical structure of the RNN, from which two generalized equations can be written as follows:
h_t = g_h(W_i x_t + W_R h_(t-1) + b_h)    (1.1)

y_t = g_y(W_y h_t + b_y)    (1.2)
FIGURE 1.1 Representation of RNN.
FIGURE 1.2 Unrolling of RNN.
FIGURE 1.3 Mathematical representation of RNN.
where W_i is the input weight matrix, W_y is the output weight matrix, W_R is the hidden layer weight matrix, g_h and g_y are activation functions, and b_h and b_y are the biases. Eqs. (1.1) and (1.2) are used to calculate the h_0, h_1, h_2, ... and y_0, y_1, y_2, ... values shown in Fig. 1.3.
For calculating h_0 and y_0, let us consider time "t" equal to zero (i.e., t = 0); at t = 0 the input is x_0. Substituting t = 0 and the input x_0 into Eqs. (1.1) and (1.2), we get

h_0 = g_h(W_i x_0 + W_R h_(-1) + b_h)    (1.3)

But in Eq. (1.3) the term W_R h_(-1) cannot be applied because time can never be negative, so Eq. (1.3) can be rewritten as

h_0 = g_h(W_i x_0 + b_h)    (1.4)

y_0 = g_y(W_y h_0 + b_y)    (1.5)
From Eqs. (1.4) and (1.5), we can calculate h_0 and y_0. Now, let us consider t = 1 and the input x_1 at t = 1 for calculating h_1 and y_1; putting t = 1 and the input into Eqs. (1.1) and (1.2), we get

h_1 = g_h(W_i x_1 + W_R h_0 + b_h)    (1.6)

y_1 = g_y(W_y h_1 + b_y)    (1.7)
From Eqs. (1.6) and (1.7), we can find h_1 and y_1. Similarly, for input x_2 at t = 2, we can calculate the values of h_2 and y_2. By substituting into Eqs. (1.1) and (1.2), we get

h_2 = g_h(W_i x_2 + W_R h_1 + b_h)    (1.8)

y_2 = g_y(W_y h_2 + b_y)    (1.9)
From Eqs. (1.8) and (1.9), we can calculate h_2 and y_2. Likewise, it can go on up to "n" periods of time. This is how the RNN works mathematically. This method is explained by referring to various sources [11,18,19].
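The recursion in Eqs. (1.1)-(1.9) can be sketched directly in code. The NumPy forward pass below is a minimal illustration, not the model used in the experiments: the weight shapes, the choice of tanh for g_h, and the identity activation for g_y are assumptions made for the example.

```python
import numpy as np

def rnn_forward(x_seq, W_i, W_R, W_y, b_h, b_y):
    """Forward pass of a simple RNN, following Eqs. (1.1)-(1.2).

    h_t = tanh(W_i x_t + W_R h_(t-1) + b_h)   # Eq. (1.1), g_h = tanh
    y_t = W_y h_t + b_y                       # Eq. (1.2), g_y = identity
    At t = 0 the previous hidden state is taken as a zero vector,
    which is equivalent to dropping the W_R h_(-1) term as in Eq. (1.4).
    """
    h = np.zeros(b_h.shape[0])  # stand-in for the nonexistent h_(-1)
    hs, ys = [], []
    for x_t in x_seq:
        h = np.tanh(W_i @ x_t + W_R @ h + b_h)
        ys.append(W_y @ h + b_y)
        hs.append(h)
    return np.array(hs), np.array(ys)

# Tiny example: 1-dimensional input, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
W_i = rng.normal(size=(3, 1))
W_R = rng.normal(size=(3, 3))
W_y = rng.normal(size=(1, 3))
b_h, b_y = np.zeros(3), np.zeros(1)
x_seq = [np.array([0.5]), np.array([0.1]), np.array([-0.3])]
hs, ys = rnn_forward(x_seq, W_i, W_R, W_y, b_h, b_y)
print(hs.shape, ys.shape)  # (3, 3) (3, 1)
```

Because the loop reuses h from the previous iteration, each output depends on all earlier inputs, which is exactly the property the feed-forward network lacks.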
2.2 Long short-term memory
The LSTM neural network is a time RNN, and it is a special case of RNN, which was proposed by Hochreiter and
Schmidhuber [20,21]. The LSTM can solve the various problems faced by RNN. As sequence length is increases the
problems faced by RNN are vanishing gradient, limited storage, limited memory, and short-term memory. In LSTM
structure, there are cell state and three different gates, which will effectively solve the RNN problem. The cell state will
carry the relevant information throughout the processing of a network, and cell state acts as “memory” of the network.
Because of cell state, the earlier time stamp values can be used in later time stamps, so the LSTM can reduce the effect of
short-term memory. The various gates in LSTM are responsible to add or remove the information in cell state, and during
training the network the gates can learn what information is necessary to keep or to forget. The gates can regulate the flow
of information in the network. Fig. 1.4 illustrates the single LSTM cell or internal layout of LSTM. The LSTM has a
similar chain-type layer to RNN, where the only difference is the internal structure and way of calculating a hidden state
FIGURE 1.4 LSTM cell.
(h_t). The hidden state is passed from one cell to the next in a chain. An internal RNN cell has only a tanh activation, but as Fig. 1.4 shows, the LSTM has a more complex internal cell; in Fig. 1.4, σ is the sigmoid activation.
For understanding the mathematics behind the LSTM and how a hidden state is calculated in it, the forget gate, input gate, cell state, and output gate are split into different parts, shown in Fig. 1.5A-D, respectively. Before going to the mathematical equations, let us see the function of the tanh and sigmoid activation layers. The values flowing through the LSTM network are regulated with the help of the tanh activation, which squishes (lessens) values to between -1 and 1. The sigmoid activation has a similar function; the difference is that the sigmoid activation squishes values to between 0 and 1. Values in the vector coming out of the sigmoid activation that are closer to 0 are completely forgotten, and values closer to 1 are kept in the network or cell state.
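The squashing behavior of the two activations is easy to verify numerically; the three sample inputs below are arbitrary.

```python
import numpy as np

z = np.array([-5.0, 0.0, 5.0])
sig = 1.0 / (1.0 + np.exp(-z))  # sigmoid squishes into (0, 1)
tah = np.tanh(z)                # tanh squishes into (-1, 1)
print(sig)  # ~0.007, 0.5, ~0.993
print(tah)  # ~-1.0, 0.0, ~1.0
```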
The forget gate is considered the first step in the LSTM. This gate decides which information should be kept in or removed from the cell state. From Fig. 1.5A, the mathematical representation of the forget gate is expressed as

f_t = σ(W_f [h_(t-1), x_t] + b_f)    (1.10)
In Eq. (1.10), σ is the sigmoid activation, W_f is the weight, h_(t-1) is the output from the previous time stamp, x_t is the new input, and b_f is the bias. In Fig. 1.5A and Eq. (1.10), to calculate f_t, the previous output (previous hidden state) h_(t-1) and the new input x_t are combined and multiplied by the weight; after adding the bias, the result is passed through the sigmoid activation. The sigmoid activation squishes the values to between 0 and 1: values nearer to 0 are discarded, and values nearer to 1 are kept.
The next step is the input gate, which updates the values of the cell state. To update the cell state, the previous output (h_(t-1)) and the present input are passed through the sigmoid activation, which converts the values to between 0 and 1; from this we know which values should be updated. The output that comes from the sigmoid activation is i_t. Further, the previous output and present input are passed through the tanh activation, which squishes values to between -1 and 1 to regulate the network [22]. The output that comes from the tanh activation is C̃_t. From Fig. 1.5B, the mathematical representation of the input gate is expressed as

i_t = σ(W_i [h_(t-1), x_t] + b_i)    (1.11)

C̃_t = tanh(W_C [h_(t-1), x_t] + b_C)    (1.12)
The next step is to update the old cell state c_(t-1) into the new cell state c_t. First, the old cell state is multiplied by f_t; the vector f_t has values between 0 and 1, so the old cell-state values that are multiplied by 0 become 0, i.e., are dropped. Then the sigmoid activation output (i_t) and the tanh activation output (C̃_t) are multiplied; here the sigmoid activation decides what to keep or remove, i.e., its vector values lie between 0 and 1. Then pointwise addition gives the new cell state, as shown in Fig. 1.5C. The mathematical equation is written as

c_t = f_t c_(t-1) + i_t C̃_t    (1.13)

FIGURE 1.5 The gates and cell state split out from the LSTM cell to show the mathematics behind each: (A) forget gate, (B) input gate, (C) cell state, and (D) output gate.
The last step is the output gate, in which the hidden state (h_t) is calculated; this calculated hidden state is passed forward to the next time stamp (the next cell). The hidden state is used for prediction and carries the information of previous inputs. To find the hidden state, first the previous hidden state (h_(t-1)) and the present input are passed through the sigmoid activation to get o_t. Then the new cell state (c_t) is passed through the tanh activation, and the tanh output and the sigmoid output o_t are multiplied to get the new hidden state h_t, as shown in Fig. 1.5D. The mathematical equations are written as

o_t = σ(W_o [h_(t-1), x_t] + b_o)    (1.14)

h_t = o_t tanh(c_t)    (1.15)
Further, the hidden state h_t and the new cell state c_t are carried over to the next time stamp. This method is explained by referring to various sources [13,23,24].
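Eqs. (1.10)-(1.15) map directly onto code. The NumPy sketch below performs one LSTM cell step; the dimensions, the tiny random weights, and the way each weight matrix acts on the concatenation [h_(t-1), x_t] are illustrative assumptions, not the configuration used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM cell step, following Eqs. (1.10)-(1.15)."""
    z = np.concatenate([h_prev, x_t])   # [h_(t-1), x_t]
    f_t = sigmoid(W_f @ z + b_f)        # forget gate,     Eq. (1.10)
    i_t = sigmoid(W_i @ z + b_i)        # input gate,      Eq. (1.11)
    c_tilde = np.tanh(W_c @ z + b_c)    # candidate state, Eq. (1.12)
    c_t = f_t * c_prev + i_t * c_tilde  # new cell state,  Eq. (1.13)
    o_t = sigmoid(W_o @ z + b_o)        # output gate,     Eq. (1.14)
    h_t = o_t * np.tanh(c_t)            # new hidden state, Eq. (1.15)
    return h_t, c_t

# Tiny example: 2-dimensional input, 3 hidden units.
rng = np.random.default_rng(1)
n_in, n_hid = 2, 3
shape = (n_hid, n_hid + n_in)
W_f, W_i, W_c, W_o = (rng.normal(size=shape) for _ in range(4))
b_f = b_i = b_c = b_o = np.zeros(n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(np.array([0.5, -0.2]), h, c,
                 W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o)
print(h.shape, c.shape)  # (3,) (3,)
```

Note that h_t is bounded in (-1, 1) by construction, since o_t lies in (0, 1) and tanh(c_t) in (-1, 1); the cell state c_t itself is unbounded and is what carries long-range information.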
3. Experiment evaluation
3.1 Testing methods effectiveness for PGVCL data
For the PGVCL load dataset, short-term load forecasting was carried out; i.e., day-ahead and week-ahead predictions were made using RNN and LSTM. The actual observed data provided by PGVCL runs from April 1, 2015 to March 31, 2019 (approximately 4 years) with an hourly time horizon; i.e., one point was observed at each hour of the day. Fig. 1.6 shows the real-time load observed by PGVCL [25].
For day-ahead prediction, the methods' effectiveness was checked for March 31, 2019 (24 h). Here, the load data from April 1, 2015 to March 30, 2019 (the historical data) forms the training set, and the March 31, 2019 data forms the testing set; using the training set, the prediction for March 31, 2019 is made. Likewise, for week-ahead prediction, the methods' effectiveness is checked for March 25, 2019 to March 31, 2019 (each hour in 1 week). Here, the load data from April 1, 2015 to March 24, 2019 forms the training set, and the March 25, 2019 to March 31, 2019 data forms the testing set; using the training set, the prediction for March 25, 2019 to March 31, 2019 is made.
Fig. 1.7 illustrates the comparison between the actual PGVCL load data and the load predicted by RNN and LSTM for day ahead.
Also, this predicted load by RNN and LSTM is further compared with the time series models' predictions, as shown in Table 1.1; the time series model prediction results are taken from Ref. [17]. In this paper, we tried to achieve a better prediction with RNN and LSTM and to test how well the machine learning methods work on PGVCL load data. Per Table 1.1, the AR (25) model gives a better day-ahead prediction than the machine learning methods (RNN and LSTM): the AR (25) model predicts with approximately 99% accuracy (1.92% MAPE) and 95.78 MW measured error, while the RNN predicts with approximately 97% accuracy (2.77% MAPE) and 148.83 MW measured error, and the LSTM predicts with approximately 97% accuracy (2.85% MAPE) and 153.38 MW measured error.
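The two accuracy measures used throughout, RMSE (in MW) and MAPE (in %), can be computed as below; the three actual/predicted load values are made up purely for illustration.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error, in the units of the data (here MW)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))

def mape(actual, predicted):
    """Mean absolute percentage error, in %."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

actual = np.array([5200.0, 5350.0, 5100.0])     # hypothetical loads in MW
predicted = np.array([5100.0, 5400.0, 5000.0])
print(round(rmse(actual, predicted), 2),
      round(mape(actual, predicted), 2))  # 86.6 1.61
```

A MAPE of 2.77% thus corresponds directly to the "approximately 97% accuracy" phrasing used in the text.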
Fig. 1.8 illustrates the comparison between the actual PGVCL load data and the load predicted by RNN and LSTM for week ahead. Also, this predicted load by RNN and LSTM is further compared with the time series models'
FIGURE 1.6 Observed PGVCL load data set from April 1, 2015 to March 31, 2019.
Genomics and neural networks in electrical load forecasting with computational intelligence Chapter | 1
FIGURE 1.7 Comparison of RNN and LSTM prediction results for 1 day with the actual PGVCL load.
TABLE 1.1 Testing of day-ahead prediction.

Models/methods              RMSE (MW)   MAPE (%)
RNN                         148.83      2.77
LSTM                        153.38      2.85
AR (25)                     95.784      1.92
ARMA (4,5)                  201.86      3.70
ARIMA (4,1,5)               191.67      3.72
SARIMA (2,0,1)(1,0,1,24)    105.83      1.99
FIGURE 1.8 Comparison of RNN and LSTM prediction results for 1 week with the actual PGVCL load.
predictions, as shown in Table 1.2. For the week-ahead case as well, we tried to achieve a better prediction with RNN and LSTM than with the time series models, and again examined how well the machine learning methods work on PGVCL load data for weekly prediction. Per Table 1.2, the RNN gives a better week-ahead prediction than the time series models, with approximately 97% accuracy (2.74% MAPE) and a measured error of 147.91 MW, and the LSTM also works well for week-ahead prediction, with approximately 97% accuracy (2.77% MAPE) and a measured error of 148.35 MW. Both RNN and LSTM predict better than the time series models for the week-ahead case.
TABLE 1.2 Testing of week-ahead prediction.

Models/methods              RMSE (MW)   MAPE (%)
RNN                         147.91      2.74
LSTM                        148.35      2.77
AR (95)                     191.89      3.577
ARMA (12,7)                 218.40      4.100
ARIMA (12,1,10)             180.57      3.325
SARIMA (2,0,1)(1,0,1,24)    280.81      5.53
3.2 Testing methods effectiveness for NYISO data
Here, real-time observed load data for Hudson Valley, NY, is taken from the publicly available NYISO data [26]. The observed load data run from October 1, 2018 to October 21, 2019 with an hourly time resolution, and the observed real-time load is measured in MW. Fig. 1.9 shows the load measured by NYISO. Using the NYISO load data, the week-ahead prediction is made by RNN and LSTM: the observed data from October 1, 2018 to October 14, 2019 form the training set, and the load data from October 15, 2019 to October 21, 2019 form the testing set. The predictions made by RNN and LSTM are then compared with the time series model predictions and with the NYISO prediction results [17,27]. As before, we tried to achieve a better prediction with RNN and LSTM and to examine how well the machine learning methods work on NYISO load data. Fig. 1.10 shows the comparison of the RNN, NYISO, and LSTM prediction results with the actual NYISO load. From Table 1.3, the RNN gives a good prediction result in terms of RMSE, with a measured error of 41.19 MW, while the NYISO prediction has a measured error of 44.74 MW. But in terms of MAPE the NYISO gives a better
FIGURE 1.9 Observed NYISO load data set from October 1, 2018 to October 21, 2019.
FIGURE 1.10 Comparison of RNN, NYISO, and LSTM prediction results with the actual NYISO load.
TABLE 1.3 Testing of week-ahead prediction.

Models/methods              RMSE (MW)   MAPE (%)
RNN                         41.19       3.67
LSTM                        41.56       3.80
NYISO                       44.741      3.4
AR (20)                     71.158      6.37
SARIMA (3,0,2)(2,0,2,24)    56.51       4.9
prediction result, with 3.4% MAPE (i.e., approximately 97% accurate), while the RNN and LSTM give 3.67% and 3.80% MAPE, respectively. Both RNN and LSTM are roughly as accurate as the NYISO prediction in terms of MAPE, but they outperform the NYISO prediction in terms of RMSE, as the prediction graph in Fig. 1.10 also shows.
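Before an RNN or LSTM can produce such hourly predictions, the load series must be reshaped into supervised (input window, next value) pairs. The sketch below shows one common way to do this; it is an assumption for illustration, not the chapter's code, and the 24-hour window length is simply a natural choice for hourly data.

```python
import numpy as np

def make_windows(series, window=24):
    """Turn a 1-D series into (n_samples, window) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    # Each row of X is `window` consecutive hours; y is the hour that follows.
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# Placeholder hourly load values, not PGVCL or NYISO data.
series = np.arange(100.0)
X, y = make_windows(series, window=24)
print(X.shape, y.shape)  # prints (76, 24) (76,)
```

Each `(X[i], y[i])` pair is one training sample; a recurrent model is then fit on the training portion of these pairs and evaluated on the held-out days.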
4. Conclusion
In this paper we have used two machine learning methods, RNN and LSTM, for electrical load forecasting. Both methods are explained in Section 2, drawing on various sources. The forecasts made by RNN and LSTM are further compared with the time series models' predictions. Overall, the machine learning methods perform better for long-sequence predictions. For day-ahead prediction on PGVCL load data the time series models perform better than RNN and LSTM, while for week-ahead prediction the machine learning methods outperform the time series models. For week-ahead prediction on NYISO load data the NYISO prediction is better than the machine learning methods in terms of MAPE, but at the same time the machine learning methods beat the NYISO prediction in terms of RMSE.
References
[1] C. Kuster, Y. Rezgui, M. Mourshed, Electrical load forecasting models: a critical systematic review, Sustainable Cities and Society 35 (August 2017) 257–270.
[2] T. Hong, S. Fan, Probabilistic electric load forecasting: a tutorial review, International Journal of Forecasting 32 (3) (July–September 2016) 914–938.
[3] R. Patel, M.R. Patel, R.V. Patel, A review: introduction and understanding of load forecasting, Journal of Applied Science and Computations (JASC) IV (IV) (June 2019) 1449–1457.
[4] M.R. Patel, R. Patel, D. Dabhi, J. Patel, Long term electrical load forecasting considering temperature effect using multi-layer perceptron neural network and k-nearest neighbor algorithms, International Journal of Research in Electronics and Computer Engineering (IJRECE) 7 (2) (April–June 2019) 823–827.
[5] K. Yan, W. Li, Z. Ji, M. Qi, Y. Du, A hybrid LSTM neural network for energy consumption forecasting of individual households, IEEE Access 7 (2019) 157633–157642.
[6] R. Jiao, T. Zhang, Y. Jiang, H. He, Short-term non-residential load forecasting based on multiple sequences LSTM recurrent neural network, IEEE Access 6 (2018) 59438–59448.
[7] R.K. Agrawal, F. Muchahary, M.M. Tripathi, Long term load forecasting with hourly predictions based on long-short-term-memory networks, in: 2018 IEEE Texas Power and Energy Conference (TPEC), College Station, TX, 2018, pp. 1–6.
[8] S. Bouktif, A. Fiaz, A. Ouni, M.A. Serhani, Multi-sequence LSTM-RNN deep learning and metaheuristics for electric load forecasting, Energies 13 (2020) 391.
[9] C. Yao, C. Xu, D. Mashima, V.L.L. Thing, Y. Wu, PowerLSTM: power demand forecasting using long short-term memory neural network, in: International Conference on Advanced Data Mining and Applications (ADMA), 2017, pp. 727–740.
[10] A. Rahman, V. Srikumar, A.D. Smith, Predicting electricity consumption for commercial and residential buildings using deep recurrent neural networks, Applied Energy 212 (February 2018) 372–385.
[11] D. Tang, C. Li, X. Ji, Z. Chen, F. Di, Power load forecasting using a refined LSTM, in: 11th International Conference on Machine Learning and Computing (ICMLC '19), NY, USA, 2019, pp. 104–108.
[12] V. Mansouri, M.E. Akbari, Efficient short-term electricity load forecasting using recurrent neural networks, Journal of Artificial Intelligence in Electrical Engineering 3 (9) (June 2014) 46–54.
[13] L. Wen, K. Zhou, S. Yang, Load demand forecasting of residential buildings using a deep learning model, Electric Power Systems Research 179 (February 2020) 106073.
[14] T. Srivastava, Vedanshu, M.M. Tripathi, Predictive analysis of RNN, GBM and LSTM network for short-term wind power forecasting, Journal of Statistics and Management Systems 23 (1) (February 2020) 33–47.
[15] S. Muzaffar, A. Afshari, Short-term load forecasts using LSTM networks, Energy Procedia 158 (February 2019) 2922–2927.
[16] J.Q. Wang, Y. Du, J. Wang, LSTM based long-term energy consumption prediction with periodicity, Energy 16 (February 2020) 117197.
[17] M.R. Patel, R.B. Patel, D.N.A. Patel, Electrical energy demand forecasting using time series approach, International Journal of Advanced Science and Technology 29 (3s) (March 2020) 594–604.
[18] S. Bouktif, A. Fiaz, A. Ouni, M.A. Serhani, Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: comparison with machine learning approaches, Energies 11 (7) (June 2018) 1636.
[19] T. Prasannavenkatesan, Forecasting hyponatremia in hospitalized patients using multilayer perceptron and multivariate linear regression techniques, Concurrency and Computation: Practice and Experience 33 (16) (2021) e6248.
[20] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (8) (November 1997) 1735–1780.
[21] M. Chai, F. Xia, S. Hao, D. Peng, C. Cui, W. Liu, PV power prediction based on LSTM with adaptive hyperparameter adjustment, IEEE Access 7 (2019) 115473–115486.
[22] P. Theerthagiri, I. Jeena Jacob, A. Usha Ruby, V. Yendapalli, Prediction of COVID-19 possibilities using K-nearest neighbour classification algorithm, International Journal of Current Research and Review 13 (06) (2021) 156.
[23] S. Motepe, A.N. Hasan, R. Stopforth, Improving load forecasting process for a power distribution network using hybrid AI and deep learning algorithms, IEEE Access 7 (2019) 82584–82598.
[24] Y. Ma, Q. Zhang, J. Ding, Q. Wang, J. Ma, Short term load forecasting based on iForest-LSTM, in: 2019 14th IEEE Conference on Industrial Electronics and Applications (ICIEA), Xi'an, China, 2019, pp. 2278–2282.
[25] http://www.pgvcl.com.
[26] T. Prasannavenkatesan, Probable forecasting of epidemic COVID-19 in using COCUDE model, EAI Endorsed Transactions on Pervasive Health and Technology 7 (26) (2021) e3.
[27] http://www.energyonline.com/Data/GenericData.aspx?DataId=14, NYISO ISO_Load_Forecast.