FPGA Based Implementation of Neural Network
2022 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI) | 978-1-6654-9529-5/22/$31.00 ©2022 IEEE | DOI: 10.1109/ACCAI53970.2022.9752656
Sainath Shravan Lingala, Swanand Bedekar, Piyush Tyagi, Purba Saha and Priti Shahane
Symbiosis Institute of Technology, Pune, India
E-mail: shravanls1015@gmail.com, swanandbedekar26@gmail.com, piyushtyagi99@gmail.com, sweetpop.purba@gmail.com, pritis@sitpune.edu.in

Abstract- The objective of this paper is to implement a Neural Network (NN) for the detection and recognition of handwritten digit characters. The Modified National Institute of Standards and Technology (MNIST) dataset is used for training and testing the NN model. For this application, both software and hardware platforms have been used, so that a comparative analysis of software and hardware performance can be made on parameters including accuracy, resource utilisation and operating frequency. The software implementation of the NN is performed in Python using libraries such as Tensorflow and Zynet. However, studies of software-based implementations report several limitations in executing Convolutional Neural Networks (CNN) and NNs, whose computation-intensive, memory-intensive and resource-intensive character at large scale poses various challenges. Hence, the same techniques have been used to produce hardware-based results on a Field Programmable Gate Array (FPGA), exploiting its parallelism and pipelining for efficient execution. The hardware implementation is achieved through the Vivado High Level Synthesis (HLS) software using Verilog programming.
Index Terms—FPGA, Neural Network, MNIST, ReLU, Sigmoid, Tensorflow, Vivado HLS

I. INTRODUCTION

With the development of technology, NNs and their significance have grown over the years. Applications of NNs such as face recognition, text and digit recognition, and image classification are now widely accepted [1]. A NN is a set of algorithms that attempts to recognise underlying relationships in a body of data using a technique that emulates how the human brain functions. NNs have become the state of the art among AI algorithms because of their high accuracy. However, executing NN algorithms on hardware platforms is challenging because of their high computational complexity, memory bandwidth and power consumption. With larger NN models, the demands placed on the processor also increase [2]–[5]. This is where hardware accelerators such as the Graphics Processing Unit (GPU), FPGA and Application-Specific Integrated Circuit (ASIC) play a significant part and are suitable platforms on which to run NN computations [6]. Lately, FPGAs have been considered an attractive platform for NN execution, as they are well suited as hardware accelerators owing to their flexibility and efficiency. Modern FPGAs contain various hardware blocks such as dedicated processors, DSP slices, adders, multiplexers and memory blocks. These embedded resources, together with customizable logic blocks, make the FPGA an excellent candidate for NN models [8]. Section II discusses the previous advances achieved by several authors in disciplines such as AI and FPGA design. Section III discusses the methodology, which is further divided into software and hardware implementations. Section IV presents the findings and results of the model on hardware, followed by the conclusion in Section V.

II. PREVIOUS DEVELOPMENT

From the previous work it is observed that, with the development of technology, machine learning and deep learning algorithms and their applications are becoming increasingly common. Executing ML algorithms on an FPGA offers a straightforward way to configure hardware specific to the algorithm and enables parallel execution. Using the MNIST database for training and testing the NN on FPGA, the aim is to achieve a design that offers superior performance and increased accuracy at no CPU cost when compared with conventional software techniques [6], [7], [9].

III. METHODOLOGY

The process followed for the implementation of the NN includes designing a neuron, considering parameters for optimization, generating weights and biases using Tensorflow, designing the layers, choosing activation functions such as ReLU and sigmoid, and verification.

1 Authorized licensed use limited to: NATIONAL INSTITUTE OF TECHNOLOGY SURATHKAL. Downloaded on March 25, 2023 at 11:30:59 UTC from IEEE Xplore. Restrictions apply.

A. Software Implementation Methodology

The software implementation methodology of the NN includes the generation of weights, biases and test data, along with the results, as described below.

Neural Network Development: A very simple NN is defined in which the images are fed to the input layer and whose output layer has 10 neurons, since the images are classified into 10 different classes. The function used here is the sigmoid function, a special case of the logistic function, generally denoted sigmoid(x). Fig. 1 shows the simple neural network.

Fig. 1. Simple Neural Network.

The fundamental reason for employing the sigmoid function is that its output lies between 0 and 1. As a result, it is particularly useful for models in which a probability must be predicted as an output.
Since a probability exists only between 0 and 1, the sigmoid is an appropriate choice. An image is ultimately a two-dimensional matrix in which each pixel is represented by a value between 0 and 255, where 255 is white and 0 is black, so the pixels can be held in a two-dimensional array. This array is then converted into a one-dimensional array by flattening it: the 28x28 grid is flattened into 784 neurons. The NN is created using "keras.Sequential"; Keras provides the API "keras.layers.Dense", where "Dense" signifies that every neuron in one layer is connected to every neuron in the next layer. The optimizer used in this case is the 'adam' optimizer, to train the model efficiently. The loss function used is "sparse_categorical_crossentropy" and the metric used is "accuracy". Further layers are then added for better accuracy, containing 30, 30, 10 and 10 neurons respectively.

Generating Weights, Biases and Test Data from Tensorflow: The MNIST dataset is available as part of Tensorflow itself. In Tensorflow, weights are stored by input, so if there are 784 inputs to a first layer containing 30 neurons, Tensorflow stores the weights as 784 lists of 30 weights each; the first weight represents the weight from the first input to the first neuron. For the hardware implementation and for the export script to work, all the weights of a particular neuron must be stored as a single list, which is why the transpose is taken. Biases always belong to a particular neuron, so there is no such issue for biases and the bias list is not transposed. Training in software is much faster than in the hardware implementation. After this step, a text file is generated containing the weights and biases. This file is later used for the hardware implementation to generate the NN by producing mif-type files with the Zynet library.
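The flattening and weight-transpose steps described above can be sketched in plain Python; this is a minimal illustration with dummy values (the shapes match the paper's 784-input, 30-neuron first layer), not the authors' actual export script:

```python
# Toy 28x28 "image": each pixel is a grey level in 0..255.
image = [[0] * 28 for _ in range(28)]

# Flatten the 2-D grid into a single list of 784 input values.
flat = [pixel for row in image for pixel in row]

# Tensorflow stores first-layer weights input-major: 784 lists of
# 30 weights each (one weight per neuron for every input).
weights = [[0.1] * 30 for _ in range(784)]

# The hardware export needs all weights of one neuron in a single
# list, so the matrix is transposed to 30 lists of 784 weights.
weights_t = [list(column) for column in zip(*weights)]
```

The same transpose is done with `numpy.transpose` or `zip(*...)` in practice; the bias vector, having one entry per neuron already, needs no such rearrangement.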
Hence, the accuracy achieved in the software implementation is around 96.09 percent.

Comparison of Results: The method used in this paper for feature extraction is raw image pixels; with an ANN as the classifier, an accuracy of 96 percent was achieved. The paper [13] used the Regular Histogram of Oriented Gradients (R-HOG) method for the same purpose of handwritten character recognition with a Support Vector Machine (SVM) as the classifier, achieving an accuracy of 95.64 percent. [10] used the convolutional co-occurrence Histogram of Oriented Gradients method for the same purpose with an SVM classifier, obtaining an accuracy of 81.64 percent. [11] used the co-occurrence Histogram of Oriented Gradients method for text recognition with an SVM classifier, achieving an accuracy of 83.60 percent. [12] worked on object detection with Histogram of Oriented Gradients as feature extraction and an ANN as classifier, achieving an accuracy of 97.33 percent. Chain Code Histogram features with an SVM classifier have achieved an accuracy of 98.48 percent, and [14] used a hybrid classifier combining CNN and SVM, with the CNN as the feature extraction method, achieving an accuracy of 94.40 percent.

Feature Extraction Method   Classifier   Accuracy (%)
R-HOG                       SVM          95.64
HOG                         SVM          81.00
HOG                         SVM          83.60
HOG                         ANN          97.33
CCH                         SVM          98.48
CNN                         CNN+SVM      94.40
Image Pixels                ANN          96.00

From this it can be concluded that the ANN's performance is more effective than SVM-based classification [15].

B.
Hardware Implementation Methodology

FPGAs are integrated circuits consisting of arrays of reprogrammable logic blocks. Their adoption is driven by their flexibility, hardware-timed speed and reliability, and parallelism. Unlike processors, FPGAs are inherently parallel and pipelined, so different processing tasks do not have to compete for the same resources. Independent processing tasks are assigned to different sections of the chip and operate independently of the other logic blocks; as a result, adding extra processing has no impact on the performance of another part of the application.

1) Designing the Model: The implementation of the NN in Vivado begins by identifying the parameters that must be considered for implementing handwritten digit recognition on a hardware platform. This paper emphasizes the implementation of a NN for handwritten digit recognition on an FPGA using the Vivado HLS software. First, a neuron file is designed by programming the required parameters of the neuron, such as its depth, layer number, address width and data width, along with the weights, bias, weight memory file and activation function files described in the preceding section. The layers are then designed in a feed-forward manner, comprising 5 layers: an input layer, three hidden layers and an output layer, containing 784, 30, 30, 10 and 10 neurons respectively, each with its neuron and activation function files.

Activation Functions (ReLU and Sigmoid): In a NN, an activation function defines how the weighted sum of the inputs is converted into an output at a node or nodes in a layer. The activation function computes a weighted total and then adds the bias to it to establish whether a neuron should be activated or not.
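The neuron behaviour just described (a weighted sum of the inputs plus a bias, passed through an activation function) can be modelled in a few lines of Python; this is an illustrative software sketch with made-up values, not the Verilog neuron used on the FPGA:

```python
import math

def relu(x):
    # Passes positive inputs through unchanged; clamps the rest to zero.
    return x if x > 0.0 else 0.0

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs, plus bias, through the activation.
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(total)

out = neuron([0.5, -1.0, 2.0], [0.4, 0.3, 0.1], bias=0.05, activation=relu)
```

Swapping `activation=relu` for `activation=sigmoid` mirrors the paper's switch between the two activation types in the hardware design.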
The ReLU (Rectified Linear Unit) activation function is a piecewise linear function that outputs the input directly if it is positive and outputs zero otherwise. The sigmoid activation function can be implemented in the program simply by changing the activation type to sigmoid. The sigmoid function is an activation function used in machine learning to introduce nonlinearity into a model; in other words, it decides which values are passed on as output and which are not.

Constraints File: A constraints file tells the tool which physical pins on the FPGA are used or connected to, alongside the HDL code that defines the FPGA's behaviour. The value 4.000 used in this paper is the input clock time period of 4 nanoseconds.

IV. RESULTS

Resource Utilization: Every FPGA includes a fixed amount of resources, joined by programmable interconnects to build a reconfigurable digital circuit, together with I/O blocks that allow the circuit to interface with the outside world. These often form the most essential FPGA specifications to consider when assessing and evaluating an FPGA for a given application. The figures below show the resources utilized during the execution of the NN in Vivado HLS. Fig. 2 shows the resource utilization of the NN using ReLU as the activation function; similarly, Fig. 3 shows the resource utilization using Sigmoid. Comparing Fig. 2 and Fig. 3, the implementation using sigmoid as the activation function is more efficient than the one using ReLU, since the resources utilized by the sigmoid
activation function are fewer than those consumed by the ReLU activation function for the same implementation.

Fig. 2. Utilisation of Resources using ReLU.

Fig. 3. Utilisation of Resources using Sigmoid.

Accuracy: Accuracy is interpreted as the percentage of correct predictions on the given test data. Depending on the data used for training, it helps in selecting the model best at recognising correlations and patterns between the variables in a dataset.

Fig. 4. Accuracy using Sigmoid.

Fig. 4 shows the accuracy achieved while using Sigmoid as the activation function for the implementation of the NN. Comparing the two models, it can be deduced that the NN model using sigmoid as the activation function performs better, with an accuracy of 96 percent, than the model using ReLU as the activation function, which achieved an accuracy of only 33 percent.

Calculation of Maximum Frequency: The processor's operational clock cycles per second are referred to as the frequency of the device. The maximum frequency indicates the time the device needs to execute a set of program instructions, and it is calculated by taking the slack time into consideration: slack is the difference between the timing that was requested and the timing that was achieved. For this calculation, 4 ns is taken as the clock period, as declared in the constraints file in Section III.B. Then 0.035 ns is taken as the worst negative slack, obtained in Vivado using ReLU. Summing these values gives the overall time period of the device, i.e. 4 + 0.035 = 4.035 ns. The maximum frequency is the reciprocal of this value, i.e. 1/4.035 ns, so the maximum frequency using ReLU as the activation function of the NN is Fmax = 247 MHz. The timing and frequency values for the implementation using Sigmoid are calculated in a similar manner: 4 ns is taken as the clock period and 0.011 ns as the worst negative slack, obtained in Vivado using Sigmoid. Summing these values gives the overall time period, i.e. 4 + 0.011 = 4.011 ns, and the reciprocal, 1/4.011 ns, gives a maximum frequency using Sigmoid of Fmax = 249 MHz.

V. CONCLUSION

In the era of increasing technology and the growing density of transistors on devices, as described by Moore's law, the FPGA has proved to be an efficient platform. In comparison with the software model, hardware implementation on an FPGA provides reconfigurability of the logic gates along with the proficient features of pipelining and parallelism. In this paper, the application of recognizing handwritten digits using a NN was implemented on both software and hardware platforms. In the software implementation of the NN, an accuracy of 96 percent was achieved using Zynet and Tensorflow by configuring the network in the Python programming language. For the hardware implementation on the FPGA, where Vivado HLS was used to reconfigure the hardware device, a similar accuracy of 96 percent was achieved, along with a maximum operating frequency of 249 MHz.
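The maximum-frequency arithmetic from Section IV can be sanity-checked in a few lines; the helper name is illustrative, with the clock period and worst negative slack values as reported above:

```python
def fmax_mhz(clock_period_ns, worst_negative_slack_ns):
    # Fmax = 1 / (requested clock period + worst negative slack),
    # converted from 1/ns to MHz.
    return 1000.0 / (clock_period_ns + worst_negative_slack_ns)

relu_fmax = fmax_mhz(4.0, 0.035)     # ~247.8 MHz, reported as 247 MHz
sigmoid_fmax = fmax_mhz(4.0, 0.011)  # ~249.3 MHz, reported as 249 MHz
```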
In the hardware execution of the NN, this implementation also made it possible to compare various other parameters of the hardware device, such as resource utilisation, maximum operating frequency and accuracy, across the two kinds of activation function, i.e. the ReLU activation function and the Sigmoid activation function, which gives a better understanding of the working of NNs on FPGAs and forms a basic framework for future work. Hence, in conclusion, the FPGA proves to be an effective as well as efficient solution for Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) challenges, where parameters such as accuracy and resource utilization can be optimized with ease.

REFERENCES

[1] No, Thiet Ke Kien Truc Mang, Ronnta Nhan, and Dang Chu So. "Design of Artificial Neural Network Architecture for Handwritten Digit Recognition on FPGA." (2016).
[2] Park, Jinhwan, and Wonyong Sung. "FPGA based implementation of deep neural networks using on-chip memory only." 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, (2016).
[3] Hagan, Martin T., Howard B. Demuth, and Mark Beale. Neural Network Design. PWS Publishing Co., (1997).
[4] Netzer, Yuval, et al. "Reading digits in natural images with unsupervised feature learning." (2011).
[5] Moradi, Marzieh, Mohammad Ali Pourmina, and Farbod Razzazi. "FPGA-based farsi handwritten digit recognition system." International Journal of Simulation Systems, Science and Technology 11.2 (2010).
[6] Huynh, Thang Viet. "Design space exploration for a single-FPGA handwritten digit recognition system." 2014 IEEE Fifth International Conference on Communications and Electronics (ICCE). IEEE, (2014).
[7] Savich, Antony W., Medhat Moussa, and Shawki Areibi. "The impact of arithmetic representation on implementing MLP-BP on FPGAs: A study." IEEE Transactions on Neural Networks 18.1 (2007): 240-252.
[8] Nichols, Kristian R., Medhat A. Moussa, and Shawki M. Areibi. "Feasibility of floating-point arithmetic in FPGA based artificial neural networks." In CAINE. (2002).
[9] Si, Jiong, and Sarah L. Harris. "Handwritten digit recognition system on an FPGA." 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC). IEEE, (2018).
[10] Su, Bolan, et al. "Character recognition in natural scenes using convolutional co-occurrence HOG." 2014 22nd International Conference on Pattern Recognition. IEEE, (2014).
[11] Tian, Shangxuan, et al. "Scene text recognition using co-occurrence of histogram of oriented gradients." 2013 12th International Conference on Document Analysis and Recognition. IEEE, (2013).
[12] Varagul, Jittima, and Toshio Ito. "Simulation of detecting function object for AGV using computer vision with neural network." Procedia Computer Science 96 (2016): 159-168.
[13] Kamble, Parshuram M., and Ravinda S. Hegadi. "Handwritten Marathi character recognition using R-HOG feature." Procedia Computer Science 45 (2015): 266-274.
[14] Niu, Xiao-Xiao, and Ching Y. Suen. "A novel hybrid CNN–SVM classifier for recognizing handwritten digits." Pattern Recognition 45.4 (2012): 1318-1325.
[15] Islam, Kh Tohidul, et al. "Handwritten digits recognition with artificial neural network." 2017 International Conference on Engineering Technology and Technopreneurship (ICE2T). IEEE, (2017).