
Design Alternative

Design Alternative 1: Reinforcement Learning with Dynamic Long Short-term Memory
Concept and principle
Reinforcement Learning is a machine learning technique in which an agent, through repeated trial
and error, learns interactively in a specified environment using feedback from its own actions and
experiences. Although both supervised and reinforcement learning map inputs to outputs, they differ in the
feedback given to the agent: in supervised learning the feedback is the correct set of actions for performing
a task, whereas reinforcement learning uses rewards and punishments as signals for positive and negative behavior.
Reinforcement learning also differs from unsupervised learning in its goal. While unsupervised learning
aims to find similarities and differences between data points, reinforcement learning
aims to find an action model that maximizes the agent's total cumulative reward. The figure below
illustrates the action-reward feedback loop of a generic RL model.
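The action-reward feedback loop can be sketched with a minimal tabular Q-learning agent. The environment, state/action sizes, and learning parameters below are toy placeholders chosen for illustration, not taken from this report: the agent walks a line of five states and receives a reward of 1 for reaching the rightmost state.

```python
import random

random.seed(0)  # reproducible sketch

# Toy environment: states 0..4; reaching state 4 yields reward 1.
n_states, n_actions = 5, 2          # actions: 0 = left, 1 = right
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # illustrative hyperparameters

def step(state, action):
    """Environment feedback: next state and reward signal."""
    nxt = max(0, min(n_states - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == n_states - 1 else 0.0)

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: explore occasionally, otherwise exploit Q
        if random.random() < epsilon:
            a = random.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda i: Q[s][i])
        s2, r = step(s, a)
        # temporal-difference update driven by the reward signal
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, the greedy policy learned from rewards alone moves right toward the rewarding state, illustrating how feedback shapes the agent's action model without any labeled "correct" actions.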
A recurrent neural network (RNN) is a deep learning architecture designed for a variety of complex computer
tasks such as object classification and speech detection. RNNs are designed to handle a sequence of events
that occur in succession, with the understanding of each event based on information from previous events.
However, while they sound promising, plain RNNs are rarely used in real-world scenarios because of the
vanishing gradient problem. In practice, the architecture of RNNs restricts their long-term memory
to only a few sequence steps at a time: adding too many time steps increases the chance of the
gradient vanishing and information being lost during backpropagation. LSTMs are designed to overcome
the vanishing gradient problem, allowing them to retain information for longer periods than traditional
RNNs. LSTMs can maintain a constant error, which lets them continue learning over numerous time steps and
backpropagate through time and layers. LSTMs use gated cells to store information outside the
regular flow of the RNN. With these cells, the network can manipulate the information in many ways, including
storing information in the cells and reading from them. Each cell can make decisions
regarding the information and execute them by opening or closing its gates.
Combining these two techniques makes it possible to create an algorithm that adapts effectively to its
environment: it learns through reinforcement and stores memory in the LSTM. This is well suited to dynamic
pricing, since the volatile price movement in the fresh-produce retail market is driven by multiple factors,
and such a model could produce good price suggestions for automation.
Design Methodology
For data processing, feature scaling of the data is needed. To do this, Standardization or Normalization is
used:

X_std = (X − µx) / σx

X_norm = (X − Xmin) / (Xmax − Xmin)
where µx represents the mean of vector X and σx represents the standard deviation of vector X. Furthermore,
Xmin and Xmax refer to the minimum and maximum values of vector X, respectively. In the case
of Normalization, the denominator is always at least as large as the numerator, so the output value is
always a number between 0 and 1. Given that our methodology is based on LSTMs, which include the sigmoid
function, we will use normalization rather than standardization as the feature scaling method. This
transformation can be done using the MinMaxScaler from the Scikit-Learn Python library.
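A minimal sketch of this normalization step with MinMaxScaler follows; the price values are made up for illustration and are not data from this report.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Illustrative price series (placeholder values, one feature column).
prices = np.array([[10.0], [12.5], [15.0], [11.0], [20.0]])

# MinMaxScaler applies X_norm = (X - Xmin) / (Xmax - Xmin),
# mapping every value into [0, 1] to suit the LSTM's sigmoid gates.
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(prices)

# inverse_transform recovers the original scale, which is needed
# when turning the model's scaled outputs back into real prices.
recovered = scaler.inverse_transform(scaled)
```

The scaler is fit once on the training data and reused, so that the same Xmin and Xmax are applied to any later inputs.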
RNN process
An RNN translates the provided inputs into machine-readable vectors. For example, if we
aim to use an RNN to classify a review consisting of multiple words as negative or positive, the RNN
will translate each word into a vector. The system then processes this sequence of vectors one by one,
moving from the very first vector to the next in sequential order. While processing, the system passes
information through the hidden state (memory) to the next step of the sequence.
RNN Information Flow
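The sequential hidden-state hand-off described above can be sketched as a single vanilla-RNN step applied in a loop. The sizes and weights below are random placeholders for illustration, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 8  # toy dimensions for the sketch
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_step(x, h_prev):
    """One sequential step: combine the current input vector with the
    hidden state (memory) carried over from the previous step."""
    return np.tanh(W_xh @ x + W_hh @ h_prev + b_h)

# Process a sequence of word vectors one by one, passing the hidden state
# forward so each step sees information from all previous steps.
sequence = [rng.normal(size=input_size) for _ in range(5)]
h = np.zeros(hidden_size)
for x in sequence:
    h = rnn_step(x, h)
# h now summarizes the whole sequence and could feed a classifier.
```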
LSTMs have the same information flow as ordinary RNNs, with data
propagating forward through sequential steps. The difference between the usual RNN and the LSTM is the set of
operations performed on the passed information and the input value at each step. This set
of operations gives the LSTM the ability to keep or forget parts of the information that flows into
a particular step. The information in an LSTM flows through its gates, which are the LSTM's main concepts: the forget
gate, the input gate, the cell state, and the output gate.
LSTM Information Flow
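One LSTM step with these components can be sketched as follows. The dimensions and weight matrices are random placeholders for illustration, not a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_in, n_hid = 3, 5  # toy sizes for the sketch
# One weight matrix per gate, acting on [input, previous hidden state].
Wf, Wi, Wc, Wo = (rng.normal(scale=0.1, size=(n_hid, n_in + n_hid))
                  for _ in range(4))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    f = sigmoid(Wf @ z)            # forget gate: what to discard from the cell
    i = sigmoid(Wi @ z)            # input gate: what new information to store
    c_tilde = np.tanh(Wc @ z)      # candidate values for the cell state
    c = f * c_prev + i * c_tilde   # cell state: the long-term memory track
    o = sigmoid(Wo @ z)            # output gate: what to expose as hidden state
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(4, n_in)):  # process a short sequence step by step
    h, c = lstm_step(x, h, c)
```

Because the sigmoid gates output values in [0, 1], they act as soft switches that open or close the flow of information, which is why inputs scaled to [0, 1] fit this architecture well.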