Uploaded by Jacob Ethan

Various Machine learning methods in predicting rainfall (1)

Various Machine
Learning Methods in
Predicting Rainfall
An Academic presentation by
Dr. Nancy Agnes, Head, Technical Operations, Tutors India
Group: www.tutorsindia.com
Email: info@tutorsindia.com
Today's Discussion
Boosted Decision Tree Regression (BDTR)
Decision Forest Regression (DFR)
Neural Network Regression (NNR)
Bayesian Linear Regression (BLR)
Performance evaluation metric for machine learning methods
Method 1: Forecasting rainfall using Autocorrelation Function (ACF)
Method 2: Forecasting rainfall using projected error
The term machine learning (ML) stands for "making
it easier for machines," i.e., reviewing data without
having to programme them explicitly.
The major aspect of the machine learning process is
performance evaluation.
Four commonly used machine learning algorithms
are Supervised, semi-supervised, unsupervised and
reinforcement learning methods.
The variation between supervised and unsupervised learning is
that supervised learning already has the expert knowledge to
developed the input/output.
On the other hand, unsupervised learning takes only the input and
uses it for data distribution or learn the hidden structure to
produce the output as a cluster or feature.
The purpose of machine learning is to allow computers to
forecast, cluster, extract association rules, or make judgments
based on a dataset.
The major aim of this blog is to study and compare various ML
models, which are used for the prediction of rainfall, namely,
DFR- Decision Forest Regression, BDTR- Boosted Decision
Tree Regression, NNR- Neural Network Regression and BLRBayesian Linear Regression.
Secondly, assist in discovering the most accurate and reliable
model by showing the evaluations conducted on various
scenarios and time horizons.
The major objective is to predict the effectiveness of these
algorithms in learning the sole input of rainfall patterns.
A BDTR is a classic method to create an ensemble of
regression trees where each tree is dependent on the
prior tree.
In ensemble learning methods, the second tree
rectifies the errors of the primary tree, the errors of
the primary and second trees are corrected by the
third tree, and so on.
Predictions are made using the entire set of trees
used to create the forecast.
The BDTR is particularly effective at dealing with tabular data.
The advantages of BDTR are it robust to missing data and
normally allocate feature significance scores.
BDTR usually outperforms DFR since it appears to be the
method of choice in Kaggle competition, with somewhat better
performance than DFR.
Unlike DFR, BDTR is more prone to overfitting because the
main purpose is to reduce bias and not variance.
BDTR takes longer to build Because there are more
hyperparameters to optimize, and trees are generated
A DFR is a collective of randomly trained decision trees.
It works by constructing many decision trees at training time
and produces a mean forecast (regression) or an individual tree
model of classes (classification) as the end of the product.
Each tree is assembled with a random subset of features and
an irregular subset of data that allows the trees to deviate by
appearing in different datasets.
It has two parameters: the number of trees and the number of
selected features at each node.
DFR is good in generates uneven data sets with
missing variables since it is generally robust to
It also has lower classification errors and better
scores than decision trees, but it does not easily
interpret the results.
Another disadvantage is the important feature may not
be vigorous to variety within the preparing dataset.
It has two parameters: the number of trees to be
selected at each node and the number of features to be
certain at each node.
Because it is generally resistant to overfitting, DFR generates
unequal data sets with missing variables.
It also has lower classification error and f-scores than decision
trees, but the findings are difficult to interpret.
Another significant disadvantage is that the feature may not
deal with the variability in the supplied dataset.
The distribution of DFR is depicted in Figure 2 below[1].
NNR is made up of a series of linear operations interspersed
with non-linear activation functions.
The network's settings are as follows: the foremost layer is the
input layer, the former layer is the output layer, and some
hidden layers are made up of the equal number of nodes to the
number of classes.
The structure of a neural network (NN) model gives a brief
description of the number of hidden layers, how the layers are
connected, the number of nodes in each hidden layer, which
activation function is used, and weights on the graph edges.
Although NN is widely known for using deep learning and
modelling complicated problems such as image recognition, it
can be easily adapted to regression problems.
If individuals employ adaptive weights and approximate the
non-linear functions of their inputs, any statistical model can
be classified as a NN.
As a result, NNR is well suited to problems where a typical
regression model cannot provide a solution. Figure 3 below
shows the architecture modeling.
NNR is a sequence of linear operations scattered with
various non-linear activation functions.
The network has these defaults; the major layer is the
input layer, the last layer is the output layer, and the
hidden layer consisting of several nodes equal to the
number of classes.
A neural network (NN) is defined by its structure,
including the number of hidden layers, each hidden
layer's number of nodes, how the layers are linked,
which activation function is used, and the weights.
NN's are widely known for use in deep learning and modelling complex problems
such as image recognition.
They are easily adapted to regression problems.
Thus, NNR is suited to situations where a more traditional regression model cannot
fit a solution.
Unlike linear regression, the Bayesian
technique employs Bayesian inference.
To obtain parameter estimates, prior parameter
information must be paired with a probability
The forecast distribution utilizes probabilities
by current belief about w given data to assess
the likelihood of a value y given x for a specific
w. (y, X).
Finally, add up all of the potential w [43] values.
BLR uses a natural mechanism to allow insufficient or
incorrectly dispersed data to survive.
The main benefit is that, unlike traditional regression,
Bayesian processing allows you to recover the complete
spectrum of inferential solutions rather than just a single
estimate and a confidence interval.
To summarize, the choice of the proposed methods to
implement the rainfall forecasting model is difficult for
mimicking the rainfall process utilizing traditional model
The rainfall behaviour is affected by stochastic and natural resources such as the rise
of temperatures when the air becomes warmer, more moisture evaporates from land
and water to the atmosphere, and weather change causes shifts in air and ocean
currents weather patterns.
Hire Tutors India experts to develop your algorithm and coding implementation for
your Computer Science dissertation Services.
The success of scoring (datasets) that has been by a
trained model to replicating the true values of the output
parameters listed as follows was measured using model
performance evaluation.
MAE stands for Mean absolute error, reflecting the
degree of absolute error between the actual and
forecasted data.
RMSE stands for Root Mean Square Error is
compared between the forecasted and the
actual data.
RAE stands for Relative absolute error, is the
relative absolute difference between the
forecasted and the actual data.
RSE stands for Relative squared error [1], which similarly
normalizes the entire squared error of the forecasted values.
Coefficient of determination, R 2 [1] shows the forecasting
methods' performance where zero refers to the random model
while one means there is a perfect fit.
In summary, predicting performance is better when
R 2 is close to 1, but it differs for RMSE and MAE
since the model's performance is better when the
value is close to 0.
It uses 4 various regression models: BDTR- Boosted Decision Tree Regression (),
DFR- Decision Forest Regression , BLR- Bayesian Linear Regression, NNR- Neural
Network Regression, and table 1 illustrate the best model to predict rainfall results
based on ACF.
Given that rainfall data is split into daily, weekly, 10-days, and monthly, each
regression gives a different scenario.
Since BDTR has the highest coefficient determination of R2, it is considered the best
regression developed of ACF.
Based on these outcomes, it can be concluded that BDTR can accurately predict
rainfall over various time horizons and that the model's accuracy improved when
more inputs were included.
Table 2 summarizes the top three best models for different scenarios using different
normalization and data partitioning metrics. In this context, different normalizations
such as ZScore, LogNormal, and MinMax and data partitioning (80% and 90%) are
investigated to obtain the optimal model with high accuracy.
Table 2: summarizes the top three best models for different scenarios using different
normalization and data partitioning metrics. [1]
Except for the 10-day prediction, overall model performance indicates that normalizing
using LogNormal produces good results for each category.
The comparison between the BDTR and DFR models is the
most acceptable result than NNR and BLR. The results show
the best model for daily error prediction with R equal to
0.737978 is BDTR, and for weekly rainfall error prediction
where R equal to 0.7921.
While for monthly rainfall error prediction, DFR outperformed
other models where R was equal to 0.7623.
However, for 10-days rainfall error prediction, the NNR
model with ZScore normalization outperformed other models
in predicting the value of 10-days with an acceptable level of
accuracy where R is equal to 0.61728.
It can be concluded that an acceptable level of accuracy could be
achieved by reducing the error in the dataset of the projected
rainfall with the expected observable rainfall by using BDTR
integrated with LogNormal and partitioning the data to 90% for
training 10% for testing.
Finally, Fig. 3 demonstrates the actual error and the predicted error.
The figure shows how well the proposed model can resemble the
observed and projected rainfall error during the testing phases.
The red line demonstrates the predicted value from the proposed ML algorithm, while
the blue line demonstrates the observed value of actual error.
It can be seen that the projected model has an acceptable level of accuracy for all four
different time horizons; however, the highest level of accuracy was obtained for the
weekly error.
Method 1: The results presented that for M1, the result improves
with cross-validation with BDTR and tuning its parameter.
The more input included in the model, the more accurate the model
can perform. BDTR is the best ACF regression because it has the
highest coefficient of determination, R2 (daily: 0.5525075,
0.8468193, 0.9739693; weekly: 0.8400668, 0.8825647, 0.989461;
10 days: 0.8038288, 0.8949389, 0.9607741, 0.9894429; and
monthly: 0.9174191, 0.6941756, 0.9939951, 0.9998085) meaning
the better rainfall prediction for the future.
Method 1 is the best prediction for the rainfall, mimicking the actual
values with the highest coefficient closer to 1.
The dependencies on ACF show that rainfall has almost a similar
pattern every year from November to January, and this shows a
correlation between the predicting input and output.
The current study's findings showed that standalone machinelearning algorithms can predict rainfall with an acceptable level of
accuracy; however, more accurate rainfall prediction might be
achieved by proposing hybrid machine learning algorithms and
with the inclusion of different climate change scenarios.
Tutorsindia assists numerous Uk Reputed universities
students and offers outstanding Machine Learning
dissertation and Assignment help.
Also offers full dissertation writing service across all
the subjects.
No doubt, we have subject-Matter Expertise to help
you in writing the complete thesis. Get Your PhD
Research from your Academic Tutor with Unlimited
Contact Us