FPGA Based Implementation of Neural Network
2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI) | 978-1-6654-9529-5/22/$31.00 ©2022 IEEE | DOI: 10.1109/ACCAI53970.2022.9752656
Sainath Shravan Lingala, Swanand Bedekar, Piyush Tyagi, Purba Saha and Priti Shahane
Symbiosis Institute of Technology, Pune, India
E-mail: shravanls1015@gmail.com, swanandbedekar26@gmail.com, piyushtyagi99@gmail.com,
sweetpop.purba@gmail.com, pritis@sitpune.edu.in
Abstract- The objective of this paper is to implement a Neural Network (NN) for the detection and recognition of handwritten digits. The Modified National Institute of Standards and Technology (MNIST) dataset is used for training and testing the NN model. For this application, both software and hardware platforms are used to obtain efficient outcomes and to compare software and hardware performance on the basis of several parameters: accuracy, resource utilisation and operating frequency. The software implementation of the NN uses Python programming libraries such as Tensorflow and Zynet. However, studies of software-based implementations report various limitations in executing Convolutional Neural Networks (CNN) as well as NNs, since large-scale models are computation-intensive, memory-intensive and resource-intensive, posing several challenges. Hence, similar techniques are used to obtain hardware-based results on a Field Programmable Gate Array (FPGA), exploiting its proficient properties, such as parallelism and pipelining, for efficient execution. The hardware implementation is achieved through the Vivado High Level Synthesis (HLS) software using Verilog programming.
Index Terms—FPGA, Neural Network, MNIST, ReLU, Sigmoid, Tensorflow, Vivado HLS

I. INTRODUCTION

With the development of technology, NNs and their significance have grown over the years. Applications of NNs such as face recognition, text and digit recognition, and image classification are now widely accepted [1]. A NN is a set of algorithms that attempts to recognise underlying relationships in a body of data using a technique that mimics how the human brain functions. NNs have become the state of the art among AI algorithms because of their high accuracy. However, executing NN algorithms on hardware platforms is challenging because of their high computational complexity, memory bandwidth and power consumption. With larger NN models, the demands on the processor also increase [2]–[5]. This is where hardware accelerators such as the Graphics Processing Unit (GPU), FPGA and Application-Specific Integrated Circuit (ASIC) play a significant part and are suitable platforms for NN computations [6]. Lately, FPGAs have been considered an attractive platform for NN execution, as they are well-suited hardware accelerators owing to their adaptability and efficiency. Modern FPGAs contain various hardware blocks such as dedicated processors, DSP slices, adders, multiplexers and memory blocks. These embedded resources, along with customisable logic blocks, make the FPGA an excellent candidate for NN models [8]. Section II discusses the previous advances achieved by several authors in disciplines such as AI and FPGA. Section III discusses the methodology, which is further divided into software and hardware implementations. Finally, Section IV discusses the findings and results of the model achieved on hardware, followed by the conclusion in Section V.

II. PREVIOUS DEVELOPMENT

From the previous work it is observed that, with advances in technology, machine learning and deep learning algorithms and their applications are becoming more common. Implementing ML algorithms on an FPGA gives a more straightforward way to configure hardware specific to the algorithm, enabling parallel execution. Prior studies use the MNIST database for training and testing NNs on FPGAs, with the expectation of achieving a design that offers superior performance and increased accuracy without the expense of a CPU, compared with conventional software techniques [6], [7], [9].
III. METHODOLOGY

The process followed for the implementation of the NN includes designing a neuron, considering parameters for optimisation, generating weights and biases using “Tensorflow”, designing the layers, selecting activation functions such as ReLU and sigmoid, and verification.

Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SURATHKAL. Downloaded on March 25,2023 at 11:30:59 UTC from IEEE Xplore. Restrictions apply.
A. Software Implementation Methodology
The software implementation methodology of the NN comprises the generation of weights, biases and test data, along with the results, as described below.

Neural Network Development: This step defines a very simple NN in which the images are fed to the input layer and the output layer has 10 neurons, since the images are classified into 10 different classes. The activation function used here is the sigmoid function, a special case of the logistic function, generally denoted sigmoid(x). Fig. 1 shows the simple neural network.

Fig. 1. Simple Neural Network.

The fundamental reason for employing the sigmoid function is that its output lies between 0 and 1. As a result, it is particularly useful for models in which a probability must be predicted as the output; since probabilities exist only between 0 and 1, the sigmoid is a natural choice. An image is ultimately a 2-dimensional matrix in which each pixel is represented by a value between 0 and 255, where 255 is white and 0 is black, so the pixels can be held in a 2-dimensional array. This 2-dimensional array is then converted into a single-dimensional array by flattening it: the 28x28 grid is flattened into 784 input neurons. The NN is then created using “keras.Sequential”; Keras provides the API “keras.layers.Dense”, where “Dense” signifies that every neuron in one layer is connected to every neuron in the next layer.
The optimiser used in this case is the “adam” optimiser, to train the model efficiently. The loss function used is “sparse_categorical_crossentropy” and the metric used is “accuracy”. Four hidden layers containing 30, 30, 10 and 10 neurons respectively are later added for better accuracy.
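As a minimal illustration of the network just described (784 flattened inputs, sigmoid activations, and 10 output classes), the forward pass can be sketched in plain NumPy. The random weights below are placeholders for the trained Tensorflow parameters, so the predicted digit is not meaningful; the sketch only shows the data flow.

```python
import numpy as np

def sigmoid(x):
    # maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
sizes = [784, 30, 30, 10, 10]
# placeholder weights/biases; in the paper these come from Tensorflow training
params = [(0.1 * rng.standard_normal((n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def predict(image):
    # flatten the 28x28 image into 784 values and scale 0..255 to 0..1
    a = image.reshape(-1).astype(float) / 255.0
    for w, b in params:
        a = sigmoid(a @ w + b)   # dense layer: weighted sum + bias, then sigmoid
    return int(np.argmax(a))     # index of the most likely digit class

digit = predict(rng.integers(0, 256, size=(28, 28)))
assert 0 <= digit <= 9
```

With trained weights substituted in, the same loop is exactly what a `keras.Sequential` stack of `Dense` layers computes at inference time.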
Generating Weights, Biases and Test Data from Tensorflow: The MNIST dataset ships as part of Tensorflow itself. Tensorflow stores weights indexed by input, so if there are 784 inputs to a first layer containing 30 neurons, Tensorflow stores the weights as 784 lists of 30 weights each; the first weight is therefore the weight from the first input to the first neuron. For the hardware implementation, and for the export script to work, all the weights of a particular neuron must be stored as a single list, which is why a transpose is applied. Biases always belong to a particular neuron, so no such difficulty arises and the bias list is not transposed. The sigmoid function is used in the software implementation; training in this case is much faster than on hardware. After completion of this step, a text file is generated containing the weights and biases. This file is later used for the hardware implementation to generate the NN, producing mif-type files with the Zynet library. The accuracy achieved in the software implementation is around 96.09 percent.
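The transpose step described above can be sketched in a few lines; the array contents here are illustrative placeholders, not the actual trained values.

```python
import numpy as np

# TensorFlow layout for a 784-input, 30-neuron layer: shape (784, 30),
# i.e. 784 lists of 30 weights, indexed by input first.
w_tf = np.arange(784 * 30, dtype=float).reshape(784, 30)

# Hardware layout: one list of 784 weights per neuron -> transpose to (30, 784).
w_hw = w_tf.T

# Biases are already one value per neuron, so no transpose is needed.
b = np.zeros(30)

# Row 0 of the hardware layout now holds every weight feeding neuron 0.
assert w_hw.shape == (30, 784)
assert w_hw[0, 1] == w_tf[1, 0]
```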
Comparison of Results: The feature-extraction method used in this paper is raw image pixels; with an ANN as the classifier, an accuracy of 96 percent was achieved. The paper [13] used the Regular Histogram of Oriented Gradients (R-HOG) method for handwritten character recognition with a Support Vector Machine (SVM) as the classifier, achieving an accuracy of 95.64 percent. [10] used the convolutional co-occurrence Histogram of Oriented Gradients method for the same purpose with an SVM classifier, obtaining an accuracy of 81.64 percent. [11] used the convolutional co-occurrence Histogram of Oriented Gradients method for text recognition with an SVM classifier, achieving an accuracy of 83.60 percent. [12] worked on function object detection with the Histogram of Oriented Gradients for feature extraction and an ANN as the classifier, achieving an accuracy of 97.33 percent. [13] used the Chain Code Histogram along with an SVM classifier to achieve an accuracy of 98.48 percent. [14] used a hybrid classifier combining CNN and SVM, with a CNN as the feature-extraction method, achieving an accuracy of 94.40 percent.
Feature Extraction Method   Classifier   Accuracy (%)
R-HOG                       SVM          95.64
HOG                         SVM          81.00
HOG                         SVM          83.60
HOG                         ANN          97.33
CCH                         SVM          98.48
CNN                         CNN+SVM      94.40
Image Pixels                ANN          96.00
From this it can be concluded that ANN-based classification performs more effectively than SVM-based classification [15].
B. Hardware Implementation Methodology
FPGAs consist of integrated circuits including array
of
reprogrammable
logic
blocks.
These
developments are driven by their flexibility,
hardware-timed speed and reliability, and
parallelism. FPGAs are devices that exhibit
properties such as parallelism and pipe-lining in
nature, unlike processors so different processing
tasks do not have to work for same resources.
Independent processing tasks are assigned to a
different section of the chip and operate
independently of the other logic blocks. In the result,
adding extra processing has no impact on the
performance of one element of the application. 1)
Designing the Model: The implementation of the
NN over Vivado begins by understanding the
necessary parameters that are required to be taken
under consideration for the implementation of the
application of handwritten digit recognition over a
hardware platform. This paper emphasizes on the
implementation of NN for the recognition of the
handwritten digit Recognition over FPGA using
Vivado HLS Software. For this process, initially a
neuron file is designed by programming the required
parameters of the neurons such as depth of the
neuron, layer number, address width and data width
along with the weights, bias and weight memory file
and activation function files about which is
described in the preceding section. Then the layers
are designed in a feed forwards manner comprising 5
layers with two layers as the input and output layer
and three layers as the hidden layers. These layers
consist of 784, 30, 30, 10, 10 neurons in each
respective layer containing files of Neurons and
activation Functions.
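The per-neuron parameters listed above (depth, layer number, address width, data width, activation) can be pictured as a small configuration record. This is an illustrative Python sketch only; the actual design expresses these as Verilog parameters in Vivado HLS, and the names here are hypothetical.

```python
import math
from dataclasses import dataclass

@dataclass
class NeuronConfig:
    layer_num: int     # which layer this neuron belongs to
    depth: int         # one weight per input, so depth = number of inputs
    addr_width: int    # bits needed to address the weight memory
    data_width: int    # bit width of weights and activations
    activation: str    # "relu" or "sigmoid"

def make_layer(layer_num, n_inputs, n_neurons, data_width=16, activation="sigmoid"):
    # address width must cover every weight-memory location for this neuron
    addr_width = max(1, math.ceil(math.log2(n_inputs)))
    return [NeuronConfig(layer_num, n_inputs, addr_width, data_width, activation)
            for _ in range(n_neurons)]

# the 784-30-30-10-10 topology described in the text
sizes = [784, 30, 30, 10, 10]
layers = [make_layer(i + 1, n_in, n_out)
          for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:]))]
assert len(layers) == 4 and len(layers[0]) == 30
```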
2) Activation Functions (ReLU and Sigmoid): In a NN, an activation function defines how the weighted sum of the inputs is converted into an output from a node or nodes in a layer. The activation function computes the weighted total and then adds a bias to it to establish whether a neuron should be activated or not. The ReLU activation function is a piecewise-linear function that outputs the input directly if it is positive and outputs zero otherwise. Similarly, the sigmoid activation function can be used by changing the activation type to sigmoid in the program. The sigmoid function is an activation function used in machine learning to introduce non-linearity into a model; put another way, it decides which values to pass as output and which not to.
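The two activation functions can be sketched as follows. The lookup-table approximation of the sigmoid is an assumption here: it is a common FPGA technique for avoiding the expensive exponential, but the paper does not detail how its sigmoid hardware is realised.

```python
import math

def relu(x):
    # passes positive inputs through unchanged, clamps negatives to zero
    return x if x > 0 else 0.0

# Precomputed sigmoid lookup table over [-8, 8], as often done in hardware
LUT_SIZE, X_MIN, X_MAX = 256, -8.0, 8.0
STEP = (X_MAX - X_MIN) / (LUT_SIZE - 1)
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP)))
               for i in range(LUT_SIZE)]

def sigmoid_lut(x):
    # saturate outside the table range, otherwise index into the table
    if x <= X_MIN:
        return 0.0
    if x >= X_MAX:
        return 1.0
    return SIGMOID_LUT[int((x - X_MIN) / STEP)]

assert relu(-3.0) == 0.0 and relu(2.5) == 2.5
assert abs(sigmoid_lut(0.0) - 0.5) < 0.05
```

Table size trades accuracy against block-RAM usage; 256 entries keeps the worst-case step error small while fitting easily in a single memory block.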
3) Constraints File: A constraints file is required to tell the software which physical pins on the FPGA are used or connected, while the HDL code defines the FPGA's behaviour. The value 4.000 used in this paper is the clock time period, in this case representing an input clock period of 4 nanoseconds.
IV. RESULTS

Resource Utilisation: Every FPGA includes a fixed amount of resources, joined by programmable interconnects to build a reconfigurable digital circuit, and I/O blocks that allow the circuit to interface with the outside world. These often form the most essential FPGA specifications to consider when assessing and evaluating an FPGA for a given application. The figures below show the resources utilised during the execution of the NN in Vivado HLS. Fig. 2 shows the resource utilisation of the NN using ReLU as the activation function.
Similarly, Fig. 3 shows the resource utilisation using sigmoid for the execution of the NN. Comparing Figs. 2 and 3, the implementation of the NN with sigmoid as the activation function is more efficient than with ReLU, since the sigmoid activation function utilises fewer resources than the ReLU activation function for the same implementation.
Fig. 2. Utilisation of Resources using ReLU.
Fig. 3. Utilisation of Resources using Sigmoid.
Accuracy: Accuracy is interpreted as the percentage of correct predictions on the given test data. Depending on the data loaded or used for training, it helps in choosing the model best at recognising correlations and patterns between variables in a dataset. The NN model using sigmoid as the activation function provides a better-performing model, with an accuracy of 96 percent, compared with the model using ReLU as the activation function, which achieved an accuracy of only 33 percent.

Calculation of Maximum Frequency: The number of operational clock cycles per second is referred to as the frequency of the device. The maximum frequency indicates the time required by the device to execute a set of instructions of the program. It is calculated by taking the slack into consideration, where slack is the difference between the requested timing and the achieved timing. For the calculation, 4 ns is taken as the clock period, as declared in the constraints file shown in Section III.B.3. With ReLU, the worst negative slack obtained in Vivado is 0.035 ns. Summing these values gives the overall time period of the device, i.e. 4 + 0.035 = 4.035 ns, and the maximum frequency is the reciprocal of this value, 1/4.035 ns. Hence the maximum frequency obtained with ReLU as the activation function is Fmax = 247 MHz. The timing and frequency values for the implementation of the NN with sigmoid as the activation function are calculated in a similar manner: with a 4 ns clock period and a worst negative slack of 0.011 ns obtained in Vivado, the overall time period is 4 + 0.011 = 4.011 ns and the maximum frequency is 1/4.011 ns, giving Fmax = 249 MHz with sigmoid as the activation function.
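The frequency arithmetic above can be checked in a few lines; the clock period and slack values are those reported in this section.

```python
def fmax_mhz(clock_period_ns, worst_neg_slack_ns):
    # effective period = requested clock period + magnitude of the negative slack
    return 1000.0 / (clock_period_ns + worst_neg_slack_ns)

f_relu = fmax_mhz(4.0, 0.035)     # 1 / 4.035 ns -> about 247.8 MHz (paper: 247 MHz)
f_sigmoid = fmax_mhz(4.0, 0.011)  # 1 / 4.011 ns -> about 249.3 MHz (paper: 249 MHz)

assert 247.0 < f_relu < 248.0
assert 249.0 < f_sigmoid < 250.0
```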
V. CONCLUSION
Fig. 4. Accuracy using Sigmoid.

Fig. 4 shows the accuracy achieved while using sigmoid as the activation function for the implementation of the NN; comparing the two models confirms that sigmoid yields the better-performing model.
In the era of advancing technology and increasing transistor density, as described by Moore's law, the FPGA has proven to be an efficient platform. In comparison to the software model, hardware implementation on an FPGA provides reconfigurable logic along with the proficient features of pipelining and parallelism. In this paper, handwritten digit recognition using a NN was performed on both software and hardware platforms. In the software implementation of the NN, an accuracy of 96 percent was achieved using Zynet and Tensorflow by configuring the network in the Python programming language. For the hardware implementation on the FPGA, where Vivado HLS was used to reconfigure the hardware device, a similar accuracy of 96 percent was achieved along with a maximum operating frequency of 249 MHz. The hardware implementation also compared the results of several parameters of the device, such as resource utilisation, maximum operating frequency and accuracy, for the two kinds of activation functions, i.e. the ReLU activation function and the sigmoid activation function, which gives a better understanding of how NNs work on FPGAs and forms a basic framework for future work. Hence, in conclusion, the FPGA proves to be an effective as well as efficient platform for solving Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) challenges, where parameters such as accuracy, resource utilisation and operating frequency can be optimised with ease.
REFERENCES
[1] No, Thiet Ke Kien Truc Mang, Ronnta Nhan, and Dang Chu So. "Design of Artificial Neural Network Architecture for Handwritten Digit Recognition on FPGA." (2016).
[2] Park, Jinhwan, and Wonyong Sung. "FPGA based implementation of deep neural networks using on-chip memory only." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, (2016).
[3] Hagan, Martin T., Howard B. Demuth, and Mark Beale. Neural Network Design. PWS Publishing Co., (1997).
[4] Netzer, Yuval, et al. "Reading digits in natural images with unsupervised feature learning." (2011).
[5] Moradi, Marzieh, Mohammad Ali Pourmina, and Farbod Razzazi. "FPGA-based Farsi handwritten digit recognition system." International Journal of Simulation Systems, Science and Technology 11.2 (2010).
[6] Huynh, Thang Viet. "Design space exploration for a single-FPGA handwritten digit recognition system." 2014 IEEE Fifth International Conference on Communications and Electronics (ICCE). IEEE, (2014).
[7] Savich, Antony W., Medhat Moussa, and Shawki Areibi. "The impact of arithmetic representation on implementing MLP-BP on FPGAs: A study." IEEE Transactions on Neural Networks 18.1 (2007): 240-252.
[8] Nichols, Kristian R., Medhat A. Moussa, and Shawki M. Areibi. "Feasibility of floating-point arithmetic in FPGA based artificial neural networks." CAINE. (2002).
[9] Si, Jiong, and Sarah L. Harris. "Handwritten digit recognition system on an FPGA." 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, (2018).
[10] Su, Bolan, et al. "Character recognition in natural scenes using convolutional co-occurrence HOG." 2014 22nd International Conference on Pattern Recognition. IEEE, (2014).
[11] Tian, Shangxuan, et al. "Scene text recognition using co-occurrence of histogram of oriented gradients." 2013 12th International Conference on Document Analysis and Recognition. IEEE, (2013).
[12] Varagul, Jittima, and Toshio Ito. "Simulation of detecting function object for AGV using computer vision with neural network." Procedia Computer Science 96 (2016): 159-168.
[13] Kamble, Parshuram M., and Ravinda S. Hegadi. "Handwritten Marathi character recognition using R-HOG feature." Procedia Computer Science 45 (2015): 266-274.
[14] Niu, Xiao-Xiao, and Ching Y. Suen. "A novel hybrid CNN–SVM classifier for recognizing handwritten digits." Pattern Recognition 45.4 (2012): 1318-1325.
[15] Islam, Kh Tohidul, et al. "Handwritten digits recognition with artificial neural network." 2017 International Conference on Engineering Technology and Technopreneurship (ICE2T). IEEE, (2017).