Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2021.DOI
An FPGA-based Hardware/Software
Design using Binarized Neural Networks
for Agricultural Applications: A Case
Study
CHUN-HSIAN HUANG (Member, IEEE), National Taitung University, Taiwan
Corresponding author: Chun-Hsian Huang (e-mail: huangch@nttu.edu.tw).
This work was supported in part by the Research Project MOST 107-2221-E-143-002-MY3, Ministry of Science and Technology, Taiwan.
ABSTRACT
This work presents an FPGA-based hardware/software design to help an agricultural robot intelligently decide whether biological agents need to be applied to the target crops. For target crop recognition, in our global positioning, the selective search is integrated with a thresholding scheme to reduce the number of regions of interest (ROIs) in a captured image. In our local recognition, a binarized neural network (BNN) architecture is presented to help recognize the target crop. Furthermore, an estimation method of pest and disease severity is also presented. Experiments show that integrating our presented BNN architecture needs only a few extra resources (less than 17% of the available FPGA resources on a Xilinx Zynq UltraScale+™ MPSoC ZU3EG A484), compared to an existing BNN architecture, while the top-1 and top-5 accuracy rates can be increased by 32.25% to 32.84% and by 14.99% to 15.17%, respectively. Furthermore, when the presented BNN architecture was also implemented on the ARM Cortex-A53 CPU and the NVIDIA GeForce RTX 2080 GPU, our BNN hardware module on the FPGA accelerated the frames per second (FPS) by factors of 3,690.18 and 1.07, respectively.
INDEX TERMS FPGA, binarized neural networks, object detection, agriculture
I. INTRODUCTION
In the early 1980s, agricultural development gradually began to integrate with computer science to support automatic management [1]. Currently, emerging techniques such as the Internet of Things (IoT) make agricultural management even more efficient. For example, different sensors can be applied to farms to monitor the growth environments of crops and their health statuses. By using the collected sensor data, farmers and companies can extract valuable information to improve crop productivity. Furthermore, with the popularity of robotics, the efficiency of agricultural management can be enhanced significantly.
For agricultural applications, protecting the target crops
from pests and diseases is a crucial task. In this work, dragon
fruits are adopted as our target crops, and we try to enable an
agricultural robot to intelligently decide if biological agents
need to be applied to the target crops. As a result, the main
motivation of this work is to implement an accurate and real-
time robot vision system that can help the agricultural robot
detect the target crop and analyze its health status.
The target crop recognition is a typical application of
object detection. Currently, the convolutional neural network
(CNN) [2] is the most representative model of deep learning.
A CNN consists of an input layer, multiple hidden layers
and an output layer. The hidden layers include a series of
convolutional layers that convolve with a multiplication or
dot product. As a result, CNNs are very computing-intensive, so they are usually implemented on powerful platforms such as cloud servers. This also means the captured images must be transferred to these platforms through a communication network for object detection. However, this usually leads to high data-transfer latencies over the Internet, so real-time detection cannot be achieved.
In recent years, fortunately, a new alternative called edge computing [3] has been proposed to bring computation and data
closer to the source when required. For an intelligent robot, the concept of edge computing enables the real-time requirement to be met. However, implementing computing-intensive CNNs always incurs a large amount of power consumption [4] and memory access [5]. This is a big challenge, especially for energy-constrained edge devices.
To realize edge artificial intelligence (AI), quantized neural networks (QNNs) [6] have been proposed to reduce the network size. Lower-precision parameters reduce the required storage space and the bit-width of processing elements. A binarized neural network (BNN) is a typical design that constrains weights and activations to 1-bit values [7]. Such refined computations can increase system throughput and performance, which also enables low-power deep learning applications [8]. However, implementing neural networks on computing architectures such as CPUs, GPUs, and application-specific integrated circuits (ASICs) usually incurs a vast amount of power consumption [4]. Recently, accelerating CNNs using FPGAs has become a new alternative, owing to their ability to maximize parallelism and energy efficiency [9]. Furthermore, by taking advantage of reconfiguration, different CNN parameters or topologies can be easily reconfigured in the FPGA for evaluation.
Based on the above discussion, to achieve the goal of
this work, that is, making the agricultural robot able to
intelligently decide if biological agents need to be applied
to the target crops, we thus propose an FPGA-based hardware/software design using BNNs. In this proposed design, a
BNN architecture is presented to recognize the target crops.
By considering the environment of the real scene and the
features of target crops, a thresholding scheme is used to
reduce the number of regions of interest (ROIs) in a captured
image, while an estimation method of the pest and disease
severity is also presented. Therefore, through the assistance
of the proposed design, the agricultural robot can detect the
target crops and decide if biological agents need to be applied
to the target crops to protect them from pests and diseases.
The rest of this paper is organized as follows. Section II
describes the preliminary work, while Section III introduces
our estimation method of pest and disease severity. The
FPGA-based hardware/software design is described in Section IV, and system evaluation is given in Section V. Finally,
Section VI concludes this work.
II. PRELIMINARY WORK
To enable the agricultural robot to intelligently decide whether biological agents need to be applied to the target crops, the target crops need to be detected accurately, and the pest and disease severity can then be estimated. In the following sections, we introduce the state-of-the-art work on crop detection and estimation of pest and disease severity, as well as the existing BNN architectures and applications.
A. CROP DETECTION AND ESTIMATION OF PEST AND
DISEASE SEVERITY
To increase agricultural productivity, crop pest and disease recognition is a crucial task. Traditionally, this task mainly relies upon agricultural experts to diagnose the health status of crops. Recently, with the rapid progress of technology, the use of image processing techniques for crop pest and disease recognition has also become a hot research topic.
To estimate the pest and disease severity, calculating the
size of a deformed or discoloured area relative to the whole
crop is a typical method. Zaw et al. [10] adopted k-means clustering to select the defective area in the segmentation phase; a support vector machine (SVM) was then used to estimate the leaf disease. Dhingra et al. [11] presented a segmentation technique based on neutrosophic logic, where feature subsets computed over the segmented regions were used to detect whether a plant leaf is diseased or not. Bierman et al. [12] trained both SVM and artificial neural network (ANN) classifiers on 18 color and texture features, and fused the results of the two classifiers. Their experiments showed a recognition accuracy of 100% for both downy and powdery mildew using the ensemble classifier. The above methods [10]–[12] can provide highly accurate crop pest and disease recognition; however, these recognition methods were not applied to a real growth environment.
To detect target objects, especially in more complex real-time image recognition tasks, CNN-based inference designs have recently become the mainstream solution [13]. Zhang et al. [14] presented a multi-task cascaded convolutional network for intelligent fruit detection, by using which an automated robot can work in real time with high accuracy. An image fusion procedure was also presented to improve the performance of the detector. Experiments were performed on a personal computer equipped with an NVIDIA GeForce GTX 1060 graphics card, and the results showed that the proposed detector performed well in terms of both accuracy and time cost. Yu et al. [15] also presented a fruit pose estimator called rotated YOLO (R-YOLO), which improved the localization precision of the picking points. Their design was implemented on an embedded control platform, the NVIDIA Jetson TX2, for inference. Experiments showed the method provided better performance in terms of real-time detection and localization accuracy of the picking points.
Based on the above research work [14], [15], to detect the
target crops in a real environment, a powerful embedded
computing architecture and a refined CNN model are thus
necessary.
B. BNN DESIGNS
Object detection with a neural network is a computing-intensive task, which is also a challenge for the underlying hardware architecture. Most existing CNNs incur a large amount of power consumption. Thus, QNNs such as BNNs [7] were proposed and applied to embedded and edge computing environments. In a BNN design, the weights and the activations are constrained to either +1 or −1, and the
activation function depicted in Equation 1 is applied to all BNN layers. Because of the reduction in memory and computational demands, algorithm efficiency can thus be improved.

$$\mathrm{Sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases} \qquad (1)$$
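To illustrate why this refinement helps, consider that once weights and activations are encoded as bit-words (1 for +1, 0 for −1), a dot product reduces to an XNOR followed by a popcount. The sketch below is a minimal illustration of this general technique, not code from the actual accelerator:

```python
import numpy as np

def sign(x):
    """Binarize to {+1, -1} as in Equation 1 (Sign(0) = +1)."""
    return np.where(np.asarray(x) >= 0, 1, -1)

def binary_dot(w, a):
    """Dot product of two {+1, -1} vectors via XNOR and popcount:
    sum(w_i * a_i) = 2 * popcount(XNOR(w_bits, a_bits)) - n,
    where bit 1 encodes +1 and bit 0 encodes -1."""
    n = len(w)
    w_bits = sum(1 << i for i, v in enumerate(w) if v > 0)
    a_bits = sum(1 << i for i, v in enumerate(a) if v > 0)
    matches = bin(~(w_bits ^ a_bits) & ((1 << n) - 1)).count("1")
    return 2 * matches - n

assert binary_dot([1, -1, 1], [1, 1, -1]) == -1  # equals the direct dot product
```

On an FPGA, such XNOR and popcount operations map directly onto the LUT fabric, which is why BNN layers can avoid full-precision multiply-accumulate units.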
Nurvitadhi et al. [16] compared the applicability of BNNs on different hardware computing architectures, namely CPU, GPU, ASIC, and FPGA. They implemented the BNN accelerator on an Arria 10 FPGA as well as a 14-nm ASIC, and compared them against optimized software on a Xeon server CPU, an NVIDIA Titan X server GPU, and an NVIDIA TX1 mobile GPU. Experiments showed that the FPGA provided superior efficiency over the CPU and GPU. Furthermore, the FPGA can provide orders-of-magnitude efficiency improvements over software, compared to a fixed ASIC solution. Zhao et al. [17] presented a BNN accelerator synthesized from C++ to an FPGA-targeted design. Experiments demonstrated the energy and resource efficiency of the FPGA-based CNN accelerator.
By using FINN [18], Fraser et al. [8] implemented a large BNN on an ADM-PCIE-8K5 FPGA platform. The experiments showed it could classify CIFAR-10 images at 88.7% accuracy. Moss et al. [19] also presented a BNN accelerator implemented on the Intel Xeon+FPGA platform, which specialized the FPGA architecture for the most computing-intensive parts whilst the other parts were handled by the Xeon CPU. Experiments showed the design could provide comparable performance and better energy efficiency than a Titan X GPU card.
To further apply BNNs to real applications, Jokic et al. [20] proposed an FPGA-based 20 kfps streaming camera system called BinaryEye. The system could classify regions of interest within a frame in real-time streaming mode. For maritime/sea border security and surveillance applications, Hashimoto et al. [21] also adopted BNNs for ship classification from Synthetic Aperture Radar (SAR) images. The experiments showed the proposed FPGA design could classify whether an object is a ship with accuracy equivalent to a GPU. Based on the above research work [8], [16], [17], [19]–[21], we can observe that integrating resource-efficient BNNs and energy-efficient FPGAs into a robot vision system would be an ideal solution. Therefore, in this work, we adopt a BNN and implement it on the FPGA device to recognize our target crops. However, due to binarization, the accuracies of existing BNNs are reduced compared to CNNs with full precision [22]. To enhance the recognition accuracy of target crops using the BNN in a real scene, a thresholding scheme is also presented in this work. Details are introduced in Section III.
III. ESTIMATION FLOW OF PEST AND DISEASE
SEVERITY
To intelligently decide if biological agents need to be applied
to the target crops, the estimation of pest and disease severity
is the core part of the robot vision system. The proposed flow contains two phases, namely target crop detection and estimation of pest and disease severity, as shown in Figure 1.

FIGURE 1. Estimation flow of pest and disease severity
A. TARGET CROP DETECTION
To accurately detect our target crop, the overall process
is further divided into the global positioning and the local
recognition.
1) Global Positioning
The size of the captured image is much greater than that
of images used for target crop recognition. This means the
captured image needs to be first split into many segmenting
images before it is transferred to the BNN. Instead of using
sliding windows, the presented global positioning consists of
the selective search [23] and a thresholding scheme.
The selective search is based on a hierarchical grouping algorithm [23], in which regions in an image are grouped according to five similarity measures, namely color similarity, texture similarity, size similarity, shape similarity/compatibility, and a final meta-similarity measure.
• Color similarity: All the values of the color channels for each region are calculated, and the histogram of each color channel is represented by 25 bins. Next, 75 bins (25 for each of the red, green, and blue channels) are combined into a vector. Color similarity measures the histogram intersection distance between two regions.
• Texture similarity: Eight Gaussian derivatives of an image are created, which are then used to extract a histogram with 10 bins for each color channel. Next, a 10 × 8 × 3 dimensional vector is generated for each region. To compute texture similarity between two regions, histogram intersection is also used.
• Size similarity: Smaller regions would be merged earlier
rather than later. This ensures that region proposals at all
scales are formed in all parts of the image, while it can
prevent a large number of clusters from swallowing up
all smaller regions.
• Shape similarity/compatibility: Two regions are compatible when they fit well into each other. When two
regions do not even touch each other, they would not be
merged.
• Final meta-similarity: A final meta-similarity between
two regions acts as a linear combination of aforementioned four similarities.
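As a concrete illustration of the first measure, the following minimal sketch computes color similarity as the histogram intersection of 25-bin per-channel histograms; the L1 normalization step is our assumption:

```python
import numpy as np

def color_similarity(region_a, region_b, bins=25):
    """Histogram intersection over concatenated 25-bin per-channel
    color histograms (75 bins in total), as in selective search."""
    def hist(region):                        # region: H x W x 3 uint8 array
        h = [np.histogram(region[..., c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
        h = np.concatenate(h).astype(float)  # 75-bin vector
        return h / h.sum()                   # normalize (our assumption)
    return np.minimum(hist(region_a), hist(region_b)).sum()
```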
Through the hierarchical grouping algorithm, the ROIs are
thus extracted. When the color features of target crops are
taken into consideration, it can be known that not all the
ROIs need to be transferred to the BNN for the target crop
recognition. A thresholding scheme, as given in Algorithm 1, is presented in this work to reduce the number of ROIs.

Algorithm 1 Global positioning - thresholding
1: for i ← 0 to m do
2:    #green_i ← Calc(I_i)
3:    ratio_i ← #green_i / #total_i
4:    if ratio_i > thres then
5:       Add(I_i, ROI_local)
6:    end if
7: end for

As depicted in Equation 2, the original ROIs of a captured image (ROI_ori) are represented as m + 1 segmenting images:

$$ROI_{ori} = \{I_0, I_1, \ldots, I_m\} \qquad (2)$$
For each segmenting image I_i in ROI_ori, its number of green pixels (#green_i) is calculated first. Here, based on the real crops, when the color of a pixel is within a specific color range, this pixel is defined as a green pixel. When the ratio (ratio_i) of the number of green pixels to the number of total pixels (#total_i) in a segmenting image I_i is greater than a predefined threshold (thres), this segmenting image I_i is included in ROI_local for the local recognition phase.
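A minimal sketch of the whole global positioning phase is given below, assuming OpenCV's selective search implementation (opencv-contrib); the HSV bounds for "green" and the threshold value are placeholders, since the exact color range and thres are tuned to the real crops:

```python
import cv2
import numpy as np

GREEN_LO = np.array([35, 40, 40])     # assumed HSV lower bound for "green"
GREEN_HI = np.array([85, 255, 255])   # assumed HSV upper bound

def global_positioning(image, thres=0.3):
    """Selective search followed by the green-ratio thresholding of
    Algorithm 1; thres=0.3 is a placeholder value."""
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()
    roi_local = []
    for (x, y, w, h) in ss.process():             # ROI_ori
        seg = image[y:y + h, x:x + w]             # segmenting image I_i
        hsv = cv2.cvtColor(seg, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, GREEN_LO, GREEN_HI)
        ratio = cv2.countNonZero(mask) / mask.size   # #green_i / #total_i
        if ratio > thres:
            roi_local.append(seg)
    return roi_local                              # ROI_local
```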
2) Local Recognition
To increase the recognition accuracy of target crops, in the local recognition phase, we also present a BNN architecture, as shown in Figure 2. Its input is a 64 × 64 8-bit RGB image, and its output is a 16-bit result; ten categories can be recognized. This BNN architecture contains seven convolutional operations, three maximum pooling ones, and three fully-connected ones. The kernel sizes of the convolutional operation and the maximum pooling operation are 3×3 and 2×2, respectively. Same padding is used in the first three convolutional operations, while valid padding is used in the last four. Through the convolutional and maximum pooling operations, the sizes of the feature maps are 32×32, 16×16, 14×14, 7×7, 5×5, 3×3, and 1×1, and the depths of the layers are 32, 64, 128, 128, 256, 256, 256, 512, 512, and 10.
When the global positioning phase finishes, each segmenting image in ROI_local is resized to a 64 × 64 8-bit RGB image and then transferred to the BNN for target crop recognition. If the target crop is recognized in a segmenting image, this segmenting image is used in the estimation of pest and disease severity.
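To make the topology concrete, the following full-precision PyTorch sketch reproduces the stated layer counts, channel depths, and feature-map sizes. The per-layer padding placement is inferred from the quoted feature-map sizes, the padding of the final convolution is our assumption, and the actual design additionally binarizes weights and activations:

```python
import torch
import torch.nn as nn

# Full-precision stand-in for the presented BNN topology; binarization of
# weights and activations (Equation 1) is omitted for brevity.
bnn_like = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.MaxPool2d(2),    # 64x64 -> 32x32
    nn.Conv2d(32, 64, 3, padding=1), nn.MaxPool2d(2),   # 32x32 -> 16x16
    nn.Conv2d(64, 128, 3), nn.MaxPool2d(2),             # 16x16 -> 14x14 -> 7x7
    nn.Conv2d(128, 128, 3),                             # 7x7 -> 5x5
    nn.Conv2d(128, 256, 3),                             # 5x5 -> 3x3
    nn.Conv2d(256, 256, 3),                             # 3x3 -> 1x1
    nn.Conv2d(256, 256, 3, padding=1),                  # 1x1 -> 1x1 (assumed)
    nn.Flatten(),
    nn.Linear(256, 512), nn.Linear(512, 512), nn.Linear(512, 10),
)

out = bnn_like(torch.randn(1, 3, 64, 64))  # -> shape (1, 10), one score per category
```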
B. ESTIMATION OF PEST AND DISEASE SEVERITY
This phase estimates the pest and disease severity of the target crop, so that it can be decided whether the biological agent should be applied to the target crop. By referring to the experience of agricultural experts, our target crops, that is, dragon fruits, can be divided into five levels of pest and disease severity, as shown in Figure 3. Here, a higher level indicates greater pest and disease severity.
As described in Section II-A, the estimation of five levels
of pest and disease severity is mainly based on the size of
deformed or discoloured area relative to the whole crop.
The method used in this work is given in Algorithm 2. Here, Img_extract represents a segmenting image in which the target crop can be recognized. To distinguish the area of pests and diseases from the target crop based on color characteristics, two masks, namely mask_crop and mask_severity, are used to remove the background area (Line 2). By performing edge detection (Line 3), the target crop and its area of pests and diseases are contoured. Next, the pixel counts of the target crop (pixel_crop) and of the area of pests and diseases (pixel_severity) are individually calculated (Line 4). The ratio of pixel_severity to pixel_crop is then used to determine the level of pest and disease severity (Level 0 ∼ Level 4).
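Assuming standard OpenCV primitives for the masking, edge detection, and contour steps, a minimal sketch of this method could look as follows; the HSV mask bounds and Canny thresholds are placeholders, not values from the paper:

```python
import cv2

def pixel_count(img, lo, hi):
    """PixelCount from Algorithm 2: mask by a color range, detect edges,
    and sum the contoured area; lo/hi are assumed HSV bounds."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
    masked = cv2.bitwise_and(img, img, mask=cv2.inRange(hsv, lo, hi))
    gray = cv2.cvtColor(masked, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)        # placeholder thresholds
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return sum(cv2.contourArea(c) for c in contours)

def severity_level(ratio):
    """Map ratio = pixel_severity / pixel_crop to Level 0-4 in 20% steps
    (Algorithm 2, lines 10-20)."""
    for level, bound in ((4, 0.80), (3, 0.60), (2, 0.40), (1, 0.20)):
        if ratio >= bound:
            return level
    return 0
```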
When the level of pest and disease severity is greater than
Level2, the biological agents are applied to the target crop.
Therefore, through the target crop detection as described in
Section III-A and the estimation of pest and disease severity
as described in Section III-B, the agricultural robot can
intelligently decide if biological agents need to be applied
to the target crops.
IV. FPGA-BASED HARDWARE/SOFTWARE DESIGN
The full system architecture design is shown in Figure 4. A Linux operating system (OS) runs on the microprocessor. A USB controller is used to connect a camera for capturing real-time images. Furthermore, a UART controller is used to connect to the motor control board for controlling the movement of the agricultural robot.
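On the software side, the camera and motor-control paths can be sketched as follows; the device paths, baud rate, and command string are assumptions, not values from the actual design:

```python
import cv2
import serial  # pyserial

cam = cv2.VideoCapture(0)                    # USB camera (device index assumed)
ok, frame = cam.read()                       # capture one real-time image

uart = serial.Serial("/dev/ttyPS0", 115200)  # UART to the motor control board
uart.write(b"FORWARD\n")                     # hypothetical movement command
```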
The design flow for implementing the FPGA-based system architecture is illustrated in Figure 5. By using the deep learning framework, the presented BNN model is trained on the target crop images (training data).
FIGURE 2. Proposed BNN architecture (three Conv.+MaxPool stages followed by four Conv. layers and three FC layers)
FIGURE 3. Five levels of pest and disease severity ((a) Level 0 to (e) Level 4)

Algorithm 2 Estimation of pest and disease severity
1: function PixelCount(Img, mask)
2:    Img_masking ← Masking(Img, mask)
3:    Img_edge ← EdgeDetect(Img_masking)
4:    count ← ContourArea(Img_edge)
5:    return count
6: end function
7: pixel_crop ← PixelCount(Img_extract, mask_crop)
8: pixel_severity ← PixelCount(Img_extract, mask_severity)
9: ratio ← pixel_severity / pixel_crop
10: if ratio ≥ 80% then
11:    result ← Level4
12: else if ratio ≥ 60% then
13:    result ← Level3
14: else if ratio ≥ 40% then
15:    result ← Level2
16: else if ratio ≥ 20% then
17:    result ← Level1
18: else
19:    result ← Level0
20: end if
When the training phase finishes, the BNN parameters and topology are extracted to generate the corresponding BNN hardware module. Instead of implementing the corresponding hardware description language (HDL) codes directly, a high-level synthesizer (HLS) is used in this work to translate high-level languages such as Python and C++ into HDL codes such as Verilog. This makes the BNN hardware module easily replaceable without rewriting HDL codes and allows the BNN architecture to be refined quickly. Finally, by using the FPGA design tool, the BNN hardware module can be generated and then configured in the programmable logic, while the application programs, including selective search, thresholding, detection and estimation, and the OS are executed on the microprocessor.
FIGURE 4. FPGA-based system architecture
FIGURE 5. Design flow for implementing the FPGA-based system architecture

FIGURE 6. Agricultural robot
V. SYSTEM EVALUATION
To evaluate the proposed method, the AVNET Ultra96-V2 platform containing a Xilinx Zynq UltraScale+™ MPSoC ZU3EG A484 device was adopted to implement the FPGA-based hardware/software design. Figure 6 shows the agricultural robot; the AVNET Ultra96-V2 platform was installed in the main control box. The agricultural robot was customized for this work, instead of using an existing product. The robot contained a robotic arm equipped with a camera to capture the images of target crops and a nozzle to apply the biological agents to the crops. Furthermore, a motor control board was also located in the main control box and was responsible for controlling the movement of the robotic body and the robotic arm. Note that this work focuses on the robot vision system, and Figure 6 mainly shows where the robot vision system is deployed. This work does not address the mechanical and electrical engineering of the robotic body and arm.
A deep learning framework called Theano [24] was used for training the presented BNN. It was executed on a server equipped with an Intel Core i7-8700 CPU, 64 GB RAM, and a graphics card containing an NVIDIA GeForce RTX 2080 GPU. Fig. 7 shows the real scene of our greenhouse, where one hundred target crops, i.e., dragon fruits, were planted. As shown in Fig. 8, images of different levels of pest and disease severity were captured by the cameras for training. Furthermore, the dragon fruits in the data collection platform were changed every day to ensure the diversity of the captured images.
In the proposed system, the thresholding scheme introduced in Algorithm 1 was used, based on the ratio of the number of green pixels to that of the total pixels in a target crop. In the real scene shown in Fig. 7, besides the target crops, most of the remaining objects would in fact be filtered out through the thresholding scheme. Therefore,
FIGURE 7. Greenhouse
the accuracy of target crop recognition in our real scene is close to 100%. To evaluate the presented BNN completely and objectively, the ImageNet dataset [25] was thus used in this experiment. Here, besides the target crops, the remaining images were obtained from the ImageNet dataset [25]. Our training data contained 5,000 images of the target crops and 45,000 images covering nine other categories of objects. Each category contained 5,000 images, and the nine categories consisted of acorn, banana, bell pepper, cauliflower, spider, ladybug, lemon, mushroom, and orange. Furthermore, the number of training epochs was set to 500, while batch normalization [26] was applied to the BNN training for acceleration.
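For reference, BNN training frameworks typically binarize with the straight-through estimator of [6], [7]; the following NumPy sketch illustrates the idea and is not the Theano code used here:

```python
import numpy as np

def binarize_forward(w):
    """Forward pass: deterministic binarization to {+1, -1}."""
    return np.where(w >= 0, 1.0, -1.0)

def binarize_backward(grad_out, w):
    """Backward pass (straight-through estimator): pass the gradient
    through unchanged, but cancel it where |w| > 1."""
    return grad_out * (np.abs(w) <= 1.0)
```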
We adopted the FINN framework [18] to generate the corresponding BNN hardware module. Here, the generated BNN module was based on a Xilinx Vivado HLS project. We extracted the compiled HDL results and integrated them into the system design. In the BNN hardware module, all the parameter and topology information obtained through the training phase was stored in a parameter memory bank and then loaded into the corresponding layers to customize the BNN architecture. According to our implementation, the proposed design can operate at up to 100.04 MHz.
FIGURE 8. Image capture of target crops

TABLE 2. Comparison on power consumptions (W)

                      Static   Dynamic   Total
BNN [8], [20], [21]   0.324    2.664     2.989
Ours                  0.326    2.856     3.182
TABLE 1. Comparison on BNN configurations

                      Input Image   #Conv.   #MaxPool   #FC   Params (Mbits)
BNN [8], [20], [21]   32 × 32       6        2          3     1.55
Ours                  64 × 64       7        3          3     2.12
A real case of target crop detection and estimation of pest and disease severity is shown in Figure 9. Through the global positioning, some ROIs were labelled in the image. Next, through the local recognition, the segmenting image containing the target crop was extracted for the calculation of the ratio of pixel_severity to pixel_crop. Finally, based on Algorithm 2, the estimation result was obtained.
To further analyze the proposed design, an existing BNN
architecture [8] used in Jokic et al. [20] and Hashimoto
et al. [21] was also implemented for comparison. In the
following sections, we will discuss the BNN configurations,
the recognition accuracy and the system performance.
A. BNN CONFIGURATIONS
The comparison between the configurations of the existing BNN architecture [8], [20], [21] and ours is given in Table 1. Compared to the existing BNN architecture [8], [20], [21], while the number of fully connected layers (#FC) is the same, our presented BNN architecture contains more convolutional layers (#Conv.) and more maximum pooling layers (#MaxPool). Furthermore, our input image is a larger 64 × 64 RGB image. As a result, compared to the existing BNN architecture [8], [20], [21], ours has 36.77% more parameters (Params).
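This figure follows directly from the parameter counts in Table 1:

```python
params_existing, params_ours = 1.55, 2.12  # Mbits, from Table 1
print(f"{(params_ours - params_existing) / params_existing:.2%}")  # 36.77%
```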
The comparison of resource usage, in terms of the available FPGA resources, between integrating the existing BNN architecture [8], [20], [21] and ours into the system is shown in Figure 10. The corresponding power consumptions are given in Table 2. Owing to its deeper and larger architecture, integrating the presented BNN architecture into the system can be expected to need more resources and power than the existing BNN architecture [8], [20], [21]. According to the experiments, integrating our BNN architecture uses an extra 7.66% of the available LUTs, 4.73% of the available Flip-Flops, and 16.66% of the available BRAMs,
while it results in a power increase of 6.46%. Compared to the increases in LUTs and Flip-Flops, the increase in BRAMs is more pronounced. This is because the presented BNN architecture contains more parameters, as shown in Table 1, so more BRAMs were required to implement the parameter memory bank of the BNN hardware module.
B. RECOGNITION ACCURACY
In this experiment, three sets of test images, containing 400, 600, and 800 images, were used to test the accuracies of the system designs integrating the existing BNN architecture [8], [20], [21] and ours. As described in Section V, besides the target crops, the remaining categories of objects were also used for testing. In each of the three test sets, every category contained the same number of images.
Two evaluation metrics, namely top-1 accuracy and top-5 accuracy, were used in the experiment; they are defined in Definition 1 and Definition 2, respectively.
Definition 1: Top-1 accuracy is the conventional accuracy; that is, the classification result having the highest probability must be exactly the expected result.
Definition 2: Top-5 accuracy indicates that the expected result must be one of the classification results having the 5 highest probabilities.
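For illustration, both metrics can be computed from the classifier outputs as follows (a sketch of the standard computation, not the evaluation script used here):

```python
import numpy as np

def topk_accuracy(probs, labels, k):
    """probs: N x C predicted class probabilities; labels: N true classes.
    A sample counts as correct if its label is among the k classes with the
    highest probabilities (k=1 and k=5 for Definitions 1 and 2)."""
    topk = np.argsort(probs, axis=1)[:, -k:]
    return float(np.mean([labels[i] in topk[i] for i in range(len(labels))]))
```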
For the existing BNN architecture [8], [20], [21] and ours,
the experimental results in terms of top-1 and top-5 accuracy
rates are shown in Figure 11. According to the experiments,
compared to the existing BNN architecture [8], [20], [21], the
top-1 accuracy rate using ours can be increased by 32.25% to
32.84%, while the top-5 one can be increased by 14.99% to
15.17%.
Compared to the existing BNN architecture [8], [20], [21], as shown in Table 1, our presented BNN architecture not only contains more layers but also receives a larger input image. As a result, in the training phase, more key features of the target crops could be extracted, which increases the recognition accuracy. Furthermore, compared to the existing BNN architecture [8], [20], [21], integrating our BNN hardware module into the system requires only a few extra resources (less than 17% of the available FPGA resources), as described in Section V-A, while the recognition accuracy is increased significantly.
C. SYSTEM PERFORMANCE
Based on the flow of target crop detection and estimation of pest and disease severity shown in Figure 1, given the time of T_select microseconds for selective search and thresholding, the time of T_recognize microseconds for target crop recognition using the BNN, and the time of T_estimate
FIGURE 9. A real case of the target crop detection and the estimation of pest and disease severity (global positioning, local recognition, calculation of ratio, and estimation)
FIGURE 10. Comparison on resource usage (percentage of available FPGA resources, BNN [8], [20], [21] vs. ours)

TABLE 3. System performance

Computing Architecture          Platform   Time (µs)     FPS
ARM Cortex-A53                  CPU        564,248,000   1.42
NVIDIA GeForce RTX 2080         GPU        163,337       4,897.85
BNN [8], [20], [21]             FPGA       88,011        9,089.77
Ours                            FPGA       152,670       5,240.06

FIGURE 11. Top-1 and top-5 accuracy rates (test sets of 400, 600, and 800 images)
microseconds for estimation of pest and disease severity, the total processing time T_total is as depicted in Equation 3.

$$T_{total} = T_{select} + T_{recognize} + T_{estimate} \qquad (3)$$
As shown in Figure 5, the most computing-intensive part, that is, the BNN computation, is implemented as a hardware module for acceleration. The remaining parts, including the selective search and thresholding (T_select) and the estimation of pest and disease severity (T_estimate), are executed on the microprocessor. In this experiment, we thus focus on comparing the recognition times (T_recognize) on different computing architectures, including the ARM Cortex-A53 CPU, the NVIDIA GeForce RTX 2080 GPU, the existing BNN architecture [8], [20], [21] on the FPGA, and ours on the FPGA. The set of 800 images, as introduced in Section V-B, was used for evaluation. The experimental results, including the required processing time and the frames per second (FPS), are given in Table 3. Here, besides the existing BNN architecture [8], [20], [21], the presented BNN architecture, as shown in Figure 2, was also implemented on the ARM Cortex-A53 CPU and the NVIDIA GeForce RTX 2080 GPU for comparison. Note that NVIDIA TensorRT was not used in the GPU implementation.
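Since the times in Table 3 are totals over the 800-image test set, the FPS column can be reproduced directly:

```python
# Reproduce the FPS column of Table 3 (800 test images).
times_us = {
    "ARM Cortex-A53 (CPU)":          564_248_000,
    "NVIDIA GeForce RTX 2080 (GPU)": 163_337,
    "BNN [8], [20], [21] (FPGA)":    88_011,
    "Ours (FPGA)":                   152_670,
}
for arch, t_us in times_us.items():
    print(f"{arch}: {800 / (t_us * 1e-6):,.2f} FPS")
# Ours vs. CPU: 5,240.06 / 1.42 ~= 3,690.18; ours vs. GPU: ~1.07
```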
According to the experiments, the FPS using the existing BNN architecture [8], [20], [21] is the best among all four computing architectures. However, as shown in Table 1, compared to our presented BNN architecture, the existing BNN architecture [8], [20], [21] receives smaller input images and contains fewer layers. We can therefore expect its processing time to be less, and its FPS to be greater, than those of the other three computing architectures for the same number of images. However, as shown in Figure 11, the top-1 accuracy rates using the existing BNN architecture [8], [20], [21] are all less than 10%. Such a low
accuracy cannot be accepted in this work. When the same BNN architecture with higher accuracy, as shown in Figure 2, was implemented on the ARM Cortex-A53 CPU and the NVIDIA GeForce RTX 2080 GPU, our BNN hardware module on the FPGA accelerated the FPS by factors of 3,690.18 and 1.07, respectively. Note that the NVIDIA GeForce RTX 2080 is a very powerful GPU mainly used in servers rather than in embedded computing systems. Therefore, this experiment demonstrates that, when recognition accuracy is taken into consideration, the proposed design can provide higher system performance to satisfy real-time requirements.
VI. CONCLUSIONS
To enable the agricultural robot to intelligently decide whether biological agents need to be applied to the target crops, this work proposes an FPGA-based hardware/software design using BNNs. In the target crop detection, a thresholding scheme is used to reduce the number of ROIs in a captured image, while a BNN architecture is presented to help recognize the target crop. Furthermore, an estimation method of pest and disease severity is also presented. Experiments show that, although integrating the presented BNN architecture needs a few extra FPGA resources, the recognition accuracy is increased significantly while a high FPS is still achieved. However, in a real environment, target crop detection is easily affected by lighting, capture angle, and background objects. In the future, besides RGB images, infrared and depth images will be used in our design to enhance recognition accuracy. Furthermore, different QNN architectures will also be discussed and tested.
REFERENCES
[1] S. C. Borgelt, J. D. Harrison, K. A. Sudduth, and S. J. Birrell, “Evaluation
of GPS for applications in precision agriculture,” Applied Engineering in
Agriculture, vol. 12, no. 6, pp. 633–638, 1996.
[2] Z. Zhao, P. Zheng, S. Xu, and X. Wu, “Object detection with deep learning:
A review,” IEEE Transactions on Neural Networks and Learning Systems,
vol. 30, no. 11, pp. 3212–3232, 2019.
[3] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu, “Edge computing: Vision and
challenges,” IEEE Internet of Things Journal, vol. 3, no. 5, pp. 637–646,
2016.
[4] S. Liang, S. Yin, L. Liu, W. Luk, and S. Wei, “FP-BNN: Binarized neural
network on FPGA,” Neurocomputing, vol. 275, pp. 1072 – 1086, 2018.
[5] J. Qiu, J. Wang, S. Yao, K. Guo, B. Li, E. Zhou, J. Yu, T. Tang, N. Xu, S. Song, Y. Wang, and H. Yang, “Going deeper with embedded FPGA platform for convolutional neural network,” in Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2016, pp. 26–35.
[6] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio,
“Quantized neural networks: Training neural networks with low precision
weights and activations,” The Journal of Machine Learning Research,
vol. 18, no. 1, p. 6869–6898, 2017.
[7] M. Courbariaux, Y. Bengio, and J.-P. David, “Binaryconnect: Training
deep neural networks with binary weights during propagations,” in Proceedings of the 28th International Conference on Neural Information
Processing Systems, vol. 2, Dec. 2015, p. 3123–3131.
[8] N. J. Fraser, Y. Umuroglu, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “Scaling binarized neural networks on reconfigurable logic,” in Proceedings of the 8th Workshop and the 6th Workshop on Parallel Programming and Run-Time Management Techniques for Many-Core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms, 2017, pp. 25–30.
[9] A. Shawahna, S. M. Sait, and A. El-Maleh, “FPGA-based accelerators of
deep learning networks for learning and classification: A review,” IEEE
Access, vol. 7, pp. 7823–7859, 2019.
[10] K. K. Zaw, Z. M. M. Myo, and D. T. H. Thoung, “Support vector machine
based classification of leaf diseases,” International Journal Science and
Engineering Applications, vol. 7, pp. 143–147, 2018.
[11] G. Dhingra, V. Kumar, and H. D. Joshi, “A novel computer vision based
neutrosophic approach for leaf disease identification and classification,”
Measurement, vol. 135, pp. 782–794, 2019.
[12] A. Bierman, T. LaPlumm, L. Cadle-Davidson, D. Gadoury, D. Martinez,
S. Sapkota, and M. Rea, “A high-throughput phenotyping system using
machine vision to quantify severity of grapevine powdery mildew,” Plant
Phenomics, vol. 2019, 2019, article ID 9209727.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, 2012, pp. 1097–1105.
[14] L. Zhang, G. Gui, A. M. Khattak, M. Wang, W. Gao, and J. Jia, “Multitask cascaded convolutional networks based intelligent fruit detection for
designing automated robot,” IEEE Access, vol. 7, pp. 56 028–56 038, 2019.
[15] Y. Yu, K. Zhang, H. Liu, L. Yang, and D. Zhang, “Real-time visual
localization of the picking points for a ridge-planting strawberry harvesting
robot,” IEEE Access, vol. 8, pp. 116 556–116 568, 2020.
[16] E. Nurvitadhi, D. Sheffield, Jaewoong Sim, A. Mishra, G. Venkatesh, and
D. Marr, “Accelerating binarized neural networks: Comparison of FPGA,
CPU, GPU, and ASIC,” in Proceedings of International Conference on
Field-Programmable Technology, 2016, pp. 77–84.
[17] R. Zhao, W. Song, W. Zhang, T. Xing, J.-H. Lin, M. Srivastava, R. Gupta,
and Z. Zhang, “Accelerating binarized convolutional neural networks
with software-programmable FPGAs,” in Proceedings of the ACM/SIGDA
International Symposium on Field-Programmable Gate Arrays, 2017, p.
15–24.
[18] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre,
and K. Vissers, “FINN: A framework for fast, scalable binarized neural
network inference,” in Proceedings of the 2017 ACM/SIGDA International
Symposium on Field-Programmable Gate Arrays. ACM, 2017, pp. 65–
74.
[19] D. J. M. Moss, E. Nurvitadhi, J. Sim, A. Mishra, D. Marr, S. Subhaschandra, and P. H. W. Leong, “High performance binary neural networks on
the Xeon+FPGAT M platform,” in Proceedings of the 27th International
Conference on Field Programmable Logic and Applications (FPL), 2017,
pp. 1–4.
[20] P. Jokic, S. Emery, and L. Benini, “BinaryEye: A 20 kfps streaming camera
system on FPGA with real-time on-device image recognition using binary
neural networks,” in Proceedings of IEEE 13th International Symposium
on Industrial Embedded Systems (SIES), 2018, pp. 1–17.
[21] S. Hashimoto, Y. Sugimoto, K. Hamamoto, and N. Ishihama, “Ship classification from SAR images based on deep learning,” in Intelligent Systems
and Applications. Springer International Publishing, 2019, pp. 18–34.
[22] T. Simons and D.-J. Lee, “A review of binarized neural networks,” Electronics, vol. 8, no. 6, 2019.
[23] P. Felzenszwalb and D. Huttenlocher, “Efficient graph-based image segmentation,” International Journal of Computer Vision, vol. 59, pp. 167–
181, 2004.
[24] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins,
J. Turian, D. Warde-farley, and Y. Bengio, “Theano: A CPU and GPU
math compiler in python,” in Proceedings of the 9th Python in Science
Conference, 2010, pp. 3–10.
[25] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang,
A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal
of Computer Vision (IJCV), vol. 115, no. 3, pp. 211–252, 2015.
[26] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network
training by reducing internal covariate shift,” CoRR, vol. abs/1502.03167,
2015, http://arxiv.org/abs/1502.03167.
CHUN-HSIAN HUANG (M’17) received his Ph.D. degree in Computer Science and Information Engineering from National Chung Cheng University, Taiwan, in January 2011. In July 2011, he became a postdoctoral scholar at the Intel-NTU Connected Context Computing Center, National Taiwan University. From August 2011 to February 2012, he was an assistant researcher at the Chung-Shan Institute of Science and Technology, Taiwan. In February 2012, he joined the faculty of the Department of Computer Science and Information Engineering, National Taitung University, where he is currently an Associate Professor. Dr. Huang’s research interests include embedded systems, reconfigurable computing, cyber-physical systems, and robotic applications. Details can be found at https://sites.google.com/site/chunhsianhuang/english-version