NOMS 2023-2023 IEEE/IFIP Network Operations and Management Symposium | 978-1-6654-7716-1/23/$31.00 ©2023 IEEE | DOI: 10.1109/NOMS56928.2023.10154300
6th International Workshop on Intelligent Transportation and Autonomous Vehicles Technologies (ITAVT 2023) - Workshop of NOMS 2023
Federated Learning Aided Deep Convolutional
Neural Network Solution for Smart Traffic
Management
Guanxiong Liu1 Student Member, IEEE, Nicholas Furth1 , Hang Shi1 , Abdallah Khreishah1 Senior Member, IEEE,
Jo Young Lee1 , Nirwan Ansari1 Fellow, IEEE, Chengjun Liu1 , and Yaser Jararweh2
1 New Jersey Institute of Technology, Newark, NJ, US
2 Jordan University of Science and Technology, Irbid, Jordan
{gl236, nf77, hs328, abdallah, jo.y.lee, nirwan.ansari, cliu}@njit.edu, yijararweh@just.edu.jo
Abstract—Machine learning models, especially neural network (NN) classifiers, have shown tremendous potential for use in complex tasks such as image classification, object detection and video analytics. However, several problems must still be answered before they can be adopted in real-world applications. One of these problems is that training machine learning models, especially NN models, requires a certain level of computation and data processing. Other problems are the limited bandwidth of the network and the risk of exposing users' privacy to attacks if the training data (especially video) is transferred through the network. To mitigate these problems, researchers recently proposed the concept of federated learning. In this paper, we build a video analytic application for traffic management and train it using federated learning. More specifically, each traffic surveillance camera combined with its co-located small PC is treated as a worker node in federated learning. In this way, the NN model in each node can be trained on data collected from all nodes without transmitting that data to, or sharing it with, a central server, which resolves all of the above-mentioned problems. The performance of the trained NN model is evaluated via experiments on different open-source datasets to demonstrate that the proposed work has the potential to enhance the detection accuracy (mAP) by over 40%.
Index Terms—Machine Learning, Neural Network, Traffic
Video Analytic
I. INTRODUCTION
Due to their surprisingly good representation power over complex distributions, neural network (NN) models have proven to be the most successful solutions for many complex tasks. For example, recent NN classifiers have outperformed other methods in image classification, object detection and face recognition tasks [1]. However, before they can be applied in real-world applications, there are still problems that need to be resolved. For example, training an NN classifier, especially one designed with a large number of layers for better performance, requires a certain level of computational power and data with enough diversity. However, in many real-world applications, such as traffic management systems, neither the computational power nor the data is centralized. In such applications, training an NN classifier requires a huge amount of training data to be transmitted through the network, which could saturate the network and make the transmitted data vulnerable to privacy attacks.

To solve these problems, federated learning has been proposed [2]. Unlike the centralized training architecture, federated learning trains the NN classifier at each node on only a small, local portion of the data, which reduces the computation consumption. Since this alone would lead to biased and sub-optimal models, federated learning further aggregates the trained models at a central node to mitigate the bias. Besides saving computational power, federated learning also resolves the data privacy issue by transmitting only the model parameters, which also significantly reduces the traffic load on the connections between nodes when videos are used as the training data. Federated learning is also well suited to being combined with the edge computing architecture that is widely adopted in today's Internet-of-Things (IoT) [3], [4]. In such a combination, the cloudlet in the edge computing paradigm can take the role of the worker node in federated learning. In this paper, we propose a proof-of-concept architecture that combines federated learning and edge computing for traffic video analytic applications.

The contributions of our work are as follows.
• We build a proof-of-concept architecture which combines federated learning with edge computing.
• Based on the proposed architecture, we implement an NN-aided traffic video analytic application that can identify cars, buses and pedestrians in video.
• Through extensive evaluation, we show that the federated model achieves much better overall performance. The average improvement in object detection accuracy (mAP) of the federated model over the single NN model is larger than 10% in all cases. Moreover, when a node has low-quality data (e.g., insufficient training data and low data diversity), the improvement can exceed 40%.
The rest of the work is organized as follows. Section II summarizes important background material. Section III details the design of our proposed architecture, federated training and NN model implementation. Section IV presents
the evaluation setting and results. Section V concludes the
paper.
II. BACKGROUND
In this section, we review some fundamental topics and
provide references for further understanding of the concepts
presented throughout this work.
A. Edge Computing
While cloud computing has been the most popular choice for services that require vast amounts of data processing power, limited network bandwidth remains the bottleneck of such a cloud-centralized design. To achieve better performance, both data processing power and network bandwidth have to be taken into consideration. As a result, the paradigm of edge computing has recently been introduced, which pushes some of the computing resources away from the centralized nodes to the edge of the network.
A considerable number of policies and algorithms have been proposed for the edge network architecture. For example, Chiang and Zhang [5] summarize the opportunities and challenges of edge computing in the networking context of IoT and indicate that the fog concept can fill the technology gaps in IoT. Moreover, Zhao et al. [6] propose a cluster content caching structure for cloud radio access networks (C-RANs) to tackle the problems of high power consumption and poor QoS for real-time services caused by the significant data exchange in both backhaul and fronthaul links.
Besides the research on edge computing architectures, there are also research works that focus on edge-computing-aided applications. As an example, Kiani et al. [4] study the problem of combining edge computing with traffic surveillance systems. In our work, we further enhance the edge-computing-based traffic surveillance system by utilizing federated learning, which allows us to perform more advanced tasks.
B. Deep Neural Network
Due to their surprisingly good representation power over complex distributions, deep neural networks (DNNs) have in recent years been widely used in many applications. One of the popular use cases is computer-vision-related tasks. For example, the RCNN and YOLO models are considered efficient DNNs that focus on object detection [7], [8].
Despite the tremendous success of DNNs, there are still problems and challenges that need to be resolved in many application scenarios, such as traffic surveillance systems. On one hand, recent research shows that DNNs are vulnerable to attacks such as adversarial examples [9]–[11]. On the other hand, the training and inference of DNNs have strict requirements on computational power and data diversity. Therefore, deploying DNNs in these applications requires further investigation to solve these problems. Based on our review, current research on federated learning sheds light on resolving some of these challenges [2].
C. Federated Learning
Federated learning is proposed to allow the training of neural networks in a distributed manner. With federated learning, we can train a machine learning model (e.g., a DNN) on multiple local datasets (limited data diversity) contained in local nodes (limited computational power) without exchanging individual data samples. The participants of federated learning include a central node and several worker nodes. The worker nodes own their training data and apply updates to the DNNs. The central node collects these updates from the worker nodes and aggregates them into the final update [12]. This process can be summarized in the following steps (Figure 1):
• Step-1: The central node initializes the DNN model. The architecture is defined and all weight parameters are properly initialized.
• Step-2: Copies of the initialized DNN model are sent to each worker node.
• Step-3: The worker nodes train the DNN model on their own data for a few epochs. The updates to the weight parameters (relative to the DNN model received from the central node) are calculated.
• Step-4: The worker nodes send the updates back to the central node. The central node aggregates these updates based on a predefined method. Then, the DNN model in the central node is updated.
In federated learning, the updated DNN model in the central node can be re-sent to the worker nodes to repeat Step-2 through Step-4 many times.
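To make these four steps concrete, the following minimal Python sketch runs federated rounds with a toy linear model in place of the DNN and simple averaging as the aggregation rule. All names are illustrative; this is not the implementation used in this work.

```python
import numpy as np

def train_locally(w, X, y, lr=0.01, epochs=5):
    # Step-3 stand-in: a few epochs of gradient descent on a linear
    # model (a toy proxy for the DNN training described in the paper).
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # MSE gradient
        w = w - lr * grad
    return w

def federated_round(w0, shards):
    # One round of Steps 2-4: broadcast, local training, aggregation.
    deltas = []
    for X, y in shards:                      # Step-2: each worker gets a copy of w0
        wi = train_locally(w0.copy(), X, y)  # Step-3: train on local data
        deltas.append(wi - w0)               # Step-3: compute the weight update
    dw = np.mean(deltas, axis=0)             # Step-4: aggregate (average)
    return w0 + dw                           # Step-4: update the central model

# Toy usage: two workers with different local datasets.
rng = np.random.default_rng(0)
shards = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
w = np.zeros(3)                              # Step-1: initialize the model
for _ in range(10):                          # repeat Steps 2-4
    w = federated_round(w, shards)
```

The same loop structure carries over unchanged when the weight vectors come from a deep model.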
The requirement for computational power is mitigated since the worker nodes only need to handle their own training data, which can be lightweight. To preserve data diversity and prevent overfitting, the central node in federated learning aggregates the updates before applying them to the DNN model. The central node can be the cloud server in the edge computing paradigm, which has enough computational power. In addition to the computational efficiency, federated learning also allows the worker nodes to collaborate in the training process without sharing data. In this way, the worker nodes that participate in federated learning can protect their data privacy [13].
In addition to the process shown in Figure 1, we enhance our algorithm with two key improvements. First, we remove any outlier gradients before aggregation; outliers are detected using the ℓ2 distance, also known as the Euclidean distance. Second, after aggregation has completed, we compare the loss of each local model on its own data under both the aggregated and the local weights, and select the better of the two before beginning a new training round.
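As a concrete illustration, the sketch below applies the two enhancements to flattened weight vectors. The z-score rule for declaring an update an outlier and the names loss_fn and local_ws are our illustrative assumptions; the paper specifies only that outliers are detected via the ℓ2 (Euclidean) distance.

```python
import numpy as np

def filter_outliers(deltas, z=2.0):
    # Improvement 1: drop updates whose l2 (Euclidean) distance from the
    # mean update is unusually large (> mean + z * std, an assumed rule).
    mean = np.mean(deltas, axis=0)
    dists = np.array([np.linalg.norm(d - mean) for d in deltas])
    keep = dists <= dists.mean() + z * dists.std()
    return [d for d, ok in zip(deltas, keep) if ok]

def aggregate_and_select(w0, deltas, local_ws, local_data, loss_fn):
    # Aggregate the surviving updates by averaging.
    w_agg = w0 + np.mean(filter_outliers(deltas), axis=0)
    # Improvement 2: each client keeps whichever of (aggregated weights,
    # its own local weights) gives the lower loss on its own data.
    starts = []
    for w_local, (X, y) in zip(local_ws, local_data):
        better = w_agg if loss_fn(w_agg, X, y) <= loss_fn(w_local, X, y) else w_local
        starts.append(better)   # starting point for the next training round
    return w_agg, starts
```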
D. Video Analysis on Edge
The works presented in [14]–[17] provide solutions and benchmarks for video analytics and computer vision applications on edge devices. The works presented in [14] and [17] provide frameworks for choosing the best pre-existing solution given hardware and latency constraints; however, their contributions do not include any new solutions to the underlying problems.
Fig. 1: Federated Learning
[15] provides a framework to evaluate video analytic solutions. In addition to measuring traditional metrics such as F1-score, that work also considers whether a given program depends on specific features within a video, making it less generalizable. The work presented in [16] provides a solution for predicting traffic speed and congestion while minimizing computational costs on edge devices. Additionally, other existing works such as [18] and [19] only consider classification of either cars or pedestrians, not both simultaneously. Moreover, the solutions presented in [18], [19] do not generalize well to real-time object detection, and they only consider a centralized training setting, which limits their real-world potential. The work presented in [20] provides an overview of existing techniques for traffic detection but fails to provide any new solutions.
The works presented in [21]–[23] each consider computer vision and video analysis in federated learning (FL) settings. While the works presented in [22], [23] do not deal with real-world applications like traffic analysis, they do present general solutions for implementing large-scale models in FL. The work presented in [21], unlike [22] and [23], focuses on traffic analysis as the practical scenario to demonstrate its solution. Specifically, [21] proposes a multi-layer distributed system design that combines edge computing and federated learning for traffic surveillance. However, the focus of that work is a system design that accelerates prediction, rather than an edge-computing- and federated-learning-based video analytic algorithm. In other words, [21] does not implement the proposed system and fails to provide any empirical evaluation results. The approach we present has several key differences from these aforementioned works. First, we propose a joint design of federated learning and a state-of-the-art video analytic method (i.e., YOLOv3). Second, in order to achieve cooperation between federated learning and the YOLOv3 model, we enhance the training method, which makes the trained model significantly outperform a YOLOv3 model trained on a single node. Last but not least, we extensively evaluate our proposed approach on a real-world application, traffic analysis, with real-world datasets, which demonstrates its strong applicability.
III. SYSTEM DESIGN
As mentioned before, we build a traffic surveillance application with both edge computing and federated learning with a DNN. Our system can be broken down into three components: (1) the edge computing based framework, (2) the implementation of federated learning, and (3) the DNN model details.

Fig. 2: Edge Computing based Framework
A. Edge Computing based Framework
In this work, we propose an edge computing based framework that consists of multiple cloudlets (with cameras) and a central cloud, as presented in Figure 2. Here, a cloudlet is formed by a mini computer with a co-located camera at the edge of the network. All cloudlets are connected to the cloud through a backhaul network. Within each cloudlet, an NN model is trained to solve the traffic-surveillance-related task (e.g., counting pedestrians and vehicles). These cloudlets are deployed in different locations, which means that the recorded videos can be captured under different conditions (e.g., lighting, environment, angle, traffic, etc.). Moreover, these recorded videos are labelled manually or by supporting devices (e.g., loop vehicle detectors). We assume the video data at each cloudlet is private, such that the cloudlets cannot share data among each other. All data transmissions happen between the cloudlets and the cloud. In the traditional edge computing framework, the raw video data is transferred from the cloudlet to the cloud for further processing, which often causes network congestion and privacy concerns. In our federated learning model, we only transfer the weights trained by the NN to the cloud, which can significantly reduce the data transmission pressure on the network and hide the private video data details.
B. Implementation of Federated Learning
On top of the edge computing framework, we train the NN model through cooperation among the distributed cloudlets and the central cloud using a federated learning based method.
First, we focus on one of the cloudlets. We assume that X_i = {x_i0, ..., x_in} is the set of training examples collected locally by the cloudlet's camera. Based on these examples, the weight parameters θ_i of the NN model in the cloudlet can be updated to minimize the corresponding objective function L_i(X_i, Y_i, θ^0), where Y_i is the label set for X_i.
When the updated NN model is ready, the cloudlet can calculate its overall change with respect to the weight parameters initialized by the cloud (θ^0) as ∆θ_i = θ_i − θ^0. Since the environmental conditions for any specific cloudlet are relatively stable, such a change in weight parameters (∆θ_i) could lead to an overfitted model.
To mitigate this issue, the overall changes to the weight parameters in the different cloudlets are transmitted to the cloud. It is worth mentioning that transmitting the overall change requires much less network bandwidth than transmitting the raw video (S(∆θ_i) ≪ S(X_i), where S(·) denotes the data size); as a rough point of reference, a full set of YOLOv3 weights occupies on the order of 240 MB, far less than the raw surveillance video accumulated at a cloudlet.
Once these changes arrive at the cloud, a pre-defined aggregation function f_agg is applied to calculate the final update ∆θ to the weight parameters:

∆θ = f_agg(∆θ_0, ..., ∆θ_k)    (1)

θ^0 ← θ^0 + ∆θ    (2)

In this work, the aggregation function calculates the average of all inputs. The above process is repeated multiple times, as summarized in Algorithm 1.
C. Object Detector Based on YOLOv3
You Only Look Once (YOLO) [24] is a widely known object detection method using deep neural networks. It can achieve state-of-the-art object detection accuracy in real time. The YOLO method partitions the input frame into multiple grid cells and predicts bounding boxes and confidence scores. By setting a threshold on the confidence score, we can detect the objects with the highest likelihood in the frame. YOLOv3 is an incremental update of YOLO [25] that achieves a higher detection accuracy. It predicts the object class at three different scales and uses independent logistic classifiers with the binary cross-entropy loss instead of softmax layers in prediction.
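As a simple illustration of this thresholding step (not the full YOLOv3 decoder; the boxes and logits below are made up), the per-box confidence can be combined with independent per-class sigmoid scores, so a single box may report several classes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def filter_detections(boxes, obj_logits, cls_logits, threshold=0.3):
    # YOLOv3-style scoring: objectness times independent per-class
    # sigmoid scores (logistic classifiers instead of a softmax).
    scores = sigmoid(obj_logits)[:, None] * sigmoid(cls_logits)
    kept_box, kept_cls = np.where(scores > threshold)
    return [(boxes[b], int(c), float(scores[b, c]))
            for b, c in zip(kept_box, kept_cls)]

# Toy usage: 3 candidate boxes, 2 classes (vehicle, human).
boxes = np.array([[10, 10, 50, 40], [5, 5, 20, 60], [0, 0, 8, 8]])
detections = filter_detections(boxes,
                               obj_logits=np.array([2.0, 0.1, -3.0]),
                               cls_logits=np.array([[1.5, -2.0],
                                                    [0.2, 0.4],
                                                    [0.0, 0.0]]))
```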
Therefore, we select the structure of YOLOv3 as the network model in our federated learning framework. The structure of the neural network we use is shown in Fig. 3. The first 52 layers of Darknet-53 are used for feature extraction. By connecting concatenation layers at different scales, the network can detect both small objects far away from the camera and large ones close to the camera.

Fig. 3: The network structure of YOLOv3.
Algorithm 1 Federated Learning
INPUT: The cloud weight parameters θ^0
OUTPUT: Final weight parameters θ*
1: for each federated learning epoch do
2:    Update each cloudlet with the cloud weight parameters θ^0
3:    for each cloudlet i in parallel do
4:       Update the NN model weight parameters based on training examples collected and labelled locally: θ_i = argmin_θ L_i(X_i, Y_i, θ^0)
5:       Calculate the overall change: ∆θ_i = θ_i − θ^0
6:       Transmit ∆θ_i to the cloud
7:    end for
8:    Aggregate the changes: ∆θ = f_agg(∆θ_0, ..., ∆θ_k)
9:    Update the cloud weight parameters: θ^0 ← θ^0 + ∆θ
10: end for
11: Obtain the final cloud weight parameters: θ* = θ^0
12: return θ*
IV. EXPERIMENTS
To illustrate the effectiveness of our proposed federated learning method, we select object detection in traffic videos as the evaluation task. The Voc dataset [26] and the Coco dataset [27] are two widely used datasets for object detection evaluation. We randomly sample 5 sub-datasets from each of these two datasets. We assume that each of these sub-datasets is the local data stored on an edge device, which cannot be shared or transmitted. Therefore, in total, we have 10 different clients in this federated learning system. For the single NN models, we utilize the entire Voc and Coco datasets, which makes the comparison more challenging for the federated learning model. To simplify the DNN model, we select the vehicle class and the human class, the most commonly seen objects in traffic videos, as the target object classes to train our DNN models. We trained three DNN models in our experiment, two local models and one federated model, based on the network structure introduced in Sec. III-C. The local models are trained with the data stored at one of the edges; specifically, one is trained using the Voc dataset and the other using the Coco dataset. The federated model is trained with our proposed federated learning method, which shares the trained weights of the two edges.
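A minimal sketch of this client construction follows, with synthetic sample identifiers standing in for the actual Voc/Coco annotation lists; disjoint shards are an assumption of the sketch, as the paper only says the sub-datasets are randomly sampled.

```python
import random

def make_client_shards(sample_ids, n_clients=5, seed=0):
    # Randomly split one dataset into n_clients sub-datasets; each shard
    # acts as the private local data of one edge device.
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    size = len(ids) // n_clients
    return [ids[i * size:(i + 1) * size] for i in range(n_clients)]

# Synthetic identifiers standing in for Voc/Coco annotation entries.
voc_clients = make_client_shards([f"voc_{i}" for i in range(5000)])
coco_clients = make_client_shards([f"coco_{i}" for i in range(5000)])
clients = voc_clients + coco_clients   # 10 federated clients in total
```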
We use a GPU server with 8 NVIDIA V100 GPUs to train and test our federated learning method. For the local models, we first train with frozen layers and a 10^-3 learning rate for 50 epochs. Then, we fine-tune with a 10^-4 learning rate for another 50 epochs. For the federated model, we follow the same training process. The difference is that we transfer the weight updates from the two local models to the cloud every epoch for aggregation, and the local models then re-initialize from the weights sent back by the cloud.
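In outline, the reported schedule can be captured as a small configuration sketch; the field names are ours, and in the federated runs the weight exchange of Algorithm 1 happens once per epoch.

```python
# Two-phase schedule used for every model in our experiments
# (field names are illustrative, values as reported above).
TRAINING_SCHEDULE = [
    {"phase": "frozen-layers", "learning_rate": 1e-3, "epochs": 50},
    {"phase": "fine-tune",     "learning_rate": 1e-4, "epochs": 50},
]
# Federated runs additionally synchronize with the cloud every epoch:
# each local model uploads its weight update, then re-initializes from
# the aggregated weights returned (Algorithm 1).
FEDERATED_SYNC_EVERY = 1   # epochs between aggregation rounds
```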
To test the performance of the models, we use a third dataset, the urban tracker dataset [28], as the testing dataset. The urban tracker dataset includes four video scenarios with over 7000 annotated video frames. It includes three urban traffic videos, which contain both vehicles and humans, and one indoor video with only humans. The three traffic videos are taken from different viewing angles, which makes them effective for testing the generalization ability of the models. We use average precision (AP), precision, recall, and F-score to measure the detection accuracy for each object class, and use the mean average precision (mAP) to measure the detection accuracy on average. The precision, recall, and F-score are calculated at a confidence threshold of 0.3.
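For reference, these metrics reduce to a few lines once detections have been matched to ground truth at the 0.3 confidence threshold (the matching itself is assumed to happen upstream; the counts below are per object class):

```python
def precision_recall_fscore(tp, fp, fn):
    # Precision, recall and F-score from per-class detection counts.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fscore = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
    return precision, recall, fscore

def mean_average_precision(per_class_ap):
    # mAP is the mean of the per-class average precision (AP) values.
    return sum(per_class_ap) / len(per_class_ap)

# Example: the Stmarc/Coco row of Table I averages vehicle and human AP:
# (32.97% + 29.46%) / 2 = 31.22% mAP.
assert abs(mean_average_precision([0.3297, 0.2946]) - 0.31215) < 1e-9
```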
By comparing the two local models, we can see that the Coco model always achieves a higher detection accuracy (mAP) than the Voc model. This reflects the unbalanced training data quality at the different edge sites. With the federated model, the training process integrates the features extracted at the two edge sites without exchanging the raw data. This training process, on one hand, reduces the data transmission pressure on the network; on the other hand, it resolves the data privacy issue. The rationale of the federated model is that we enhance object classification accuracy by enlarging the training sample set. We further analyze the effectiveness of the federated model below.
Table I shows the detection accuracy on the urban tracker datasets. The first row block in Table I shows the detection accuracy on the 'Stmarc' video. This video has a high viewing angle, as shown in the first row of Fig. 4. Considering the two single NN models, for both the vehicle and human classes we can see that the Voc model achieves higher precision but a lower recall rate compared with the Coco model, because of missed detections in the Voc model. In comparison, the Coco model reaches a higher AP and F-score, which means that the Coco model has better overall detection accuracy. For the vehicle class, we can see that both the AP and F-score of the federated model outperform those of the Coco model and the Voc model. This happens because the federated model achieves higher values in both precision and recall. For the human class, we can see that the federated model also achieves a higher AP and F-score compared with the Coco model and the Voc model. It is clear that the federated model is able to achieve a balanced performance between precision and recall compared with the Voc model. The mAP represents the average detection accuracy over these two classes, and we can see that the federated model achieves the highest mAP of the three models. The mAP of the federated model increases by more than 10% compared to the Coco model, and by around 30% compared to the Voc model. This improvement shows that the federated learning structure can indeed improve the detection accuracy in practice.
TABLE I: The comparative detection results of the Coco model, Voc model, and the federated model.

                             |              Vehicle               |               Human                |
Video                 Model  |  AP      Precision  Recall  F-score|  AP      Precision  Recall  F-score|  mAP
Stmarc                Coco   | 32.97%   35.22%     55.27%  43%    | 29.46%   55.54%     45.83%  50.22% | 31.22%
                      Voc    | 16.01%   56.09%     17.09%  26.20% |  9.09%   98.10%      7.01%  13.08% | 12.55%
                      Fed    | 45.69%   54.02%     61.62%  57.57% | 37.64%   60.37%     58.36%  59.35% | 41.66%
Rouen                 Coco   | 34.24%   49.32%     49.59%  49%    | 55.42%   85.41%     63.45%  72.81% | 44.83%
                      Voc    | 32.17%   50.43%     31.61%  38.86% | 18.06%   90.21%     19.16%  31.61% | 25.12%
                      Fed    | 72.49%   88.08%     72.48%  79.52% | 62.10%   95.08%     65.88%  77.84% | 67.29%
Sherbrooke            Coco   | 60.58%   86.03%     69.96%  77%    | 21.15%   77.56%     21.37%  33.51% | 40.87%
                      Voc    | 50.57%   87.73%     59.53%  70.93% | 17.93%   65.75%     28.90%  40.15% | 34.25%
                      Fed    | 67.43%   60.05%     81.40%  69.11% | 74.15%   54.08%     97.18%  69.49% | 70.79%
Atrium (human only)   Coco   |   NA       NA         NA      NA   | 81.81%   98.48%     89.08%  93.54% |   NA
                      Voc    |   NA       NA         NA      NA   | 81.78%   99.76%     86.62%  92.73% |   NA
                      Fed    |   NA       NA         NA      NA   | 90.58%   98.43%     93.65%  95.98% |   NA
The second row block in Table I shows the detection accuracy on the 'Rouen' video. This video has a lower camera mounting height than the 'Stmarc' video, as shown in the second row of Fig. 4. The vehicles and humans appear larger and include more detailed features in comparison. The precision and recall of the Coco model and the Voc model are similar to their performance on the 'Stmarc' video. The federated model, in comparison, still outperforms the two single NN models in terms of AP, precision, recall, and F-score on both the vehicle and human classes. For the overall measurement in mAP, we can see that the federated model achieves 67.29%, while the values of the single models are 44.83% (the Coco model) and 25.12% (the Voc model).
The third row block in Table I shows the detection accuracy on the 'Sherbrooke' video. This video has a lower camera viewing angle than the 'Stmarc' and 'Rouen' videos, as shown in the third row of Fig. 4. The different viewing angles result in different object features. At this viewing angle, detecting humans becomes hard for the single NN models, as the best single-model AP and F-score are much lower than in the previous two row blocks. However, the performance of the federated model is not affected, and we believe that the federated model indeed benefits from the diversity introduced when the model is jointly trained with multiple clients' updates.
We can see from these three testing videos that the Voc model always achieves worse detection accuracy than the Coco model, due to its poorer training data quality (less data). Therefore, the worker node with the Voc dataset always benefits significantly from the federated model. In addition, the worker node with the Coco dataset also obtains a large improvement by using the federated model. For example, on the 'Sherbrooke' video, the improvement in mAP over the Coco model is around 30%. By analyzing the detection results, we can conclude that the federated model can enhance the detection accuracy for most worker nodes. For poorly performing clients, our empirical results show that using the federated model can improve mAP by over 42%. Even for the best-performing clients, the federated model can still enhance mAP by 10%, which is significant in object detection.
The last row block in Table I shows the detection accuracy on the 'Atrium' video. This video is an indoor video that only contains humans, as shown in the last row of Fig. 4. The accuracy of all three models is at the same level, which is quite high for object detection. This result shows that adopting the federated learning approach does not degrade performance when the single NN model is already good enough.
V. CONCLUSION
In this work, we build an architecture that combines edge computing and federated learning to implement a traffic video analytic application with the NN-based YOLOv3 model. To evaluate this application, we compare its object detection performance (vehicle and human) on several different video scenarios. The results show that the federated model can significantly improve the object detection accuracy in almost all cases. The improvement in mAP for the best-performing client is as large as 10%, which is substantial in object detection. For poorly performing clients, federated learning enables knowledge sharing, which effectively mitigates insufficient training data or low data diversity. Our empirical results show that the enhancement in mAP can exceed 40%.
REFERENCES
[1] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning.
MIT press Cambridge, 2016, vol. 1.
[2] J. Konečnỳ, B. McMahan, and D. Ramage, “Federated optimization: Distributed optimization beyond the datacenter,” arXiv preprint
arXiv:1511.03575, 2015.
[3] F. Bonomi, "Connected vehicles, the internet of things, and fog computing," in The Eighth ACM International Workshop on Vehicular Inter-Networking (VANET), Las Vegas, USA, 2011, pp. 13–15.
[4] A. Kiani, G. Liu, H. Shi, A. Khreishah, N. Ansari, J. Y. Lee, and C. Liu,
“A two-tier edge computing based model for advanced traffic detection,”
in 2018 Fifth International Conference on Internet of Things: Systems,
Management and Security. IEEE, 2018, pp. 208–215.
[5] M. Chiang and T. Zhang, “Fog and IoT: An overview of research
opportunities,” IEEE Internet of Things Journal, vol. 3, no. 6, pp. 854–
864, 2016.
[6] Z. Zhao, M. Peng, Z. Ding, W. Wang, and H. V. Poor, “Cluster content
caching: An energy-efficient approach to improve quality of service
in cloud radio access networks,” IEEE Journal on Selected Areas in
Communications, vol. 34, no. 5, pp. 1207–1221, 2016.
[7] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature
hierarchies for accurate object detection and semantic segmentation,”
in Proceedings of the IEEE conference on computer vision and pattern
recognition, 2014, pp. 580–587.
[8] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look
once: Unified, real-time object detection,” in Proceedings of the IEEE
conference on computer vision and pattern recognition, 2016, pp. 779–
788.
[9] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing
adversarial examples,” International Conference on Learning Representations, 2015.
[10] G. Liu, I. Khalil, and A. Khreishah, “Zk-gandef: A gan based zero
knowledge adversarial training defense for neural networks,” in 2019
49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). IEEE, 2019, pp. 64–75.
[11] ——, “Gandef: A gan based adversarial training defense for neural
network classifier,” in IFIP International Conference on ICT Systems
Security and Privacy Protection. Springer, 2019, pp. 19–32.
[12] S. Banabilah, M. Aloqaily, E. Alsayed, N. Malik, and Y. Jararweh,
“Federated learning review: Fundamentals, enabling technologies, and
future applications,” Information processing & management, vol. 59,
no. 6, p. 103061, 2022.
[13] J. Posner, L. Tseng, M. Aloqaily, and Y. Jararweh, “Federated learning
in vehicular networks: Opportunities and solutions,” IEEE Network,
vol. 35, no. 2, pp. 152–159, 2021.
[14] U. I. Minhas, L. Mukhanov, G. Karakonstantis, H. Vandierendonck, and
R. Woods, “Leveraging transprecision computing for machine vision
applications at the edge,” in 2021 IEEE Workshop on Signal Processing
Systems (SiPS). IEEE, 2021, pp. 205–210.
[15] Z. Xiao, Z. Xia, H. Zheng, B. Y. Zhao, and J. Jiang, “Towards performance clarity of edge video analytics,” arXiv preprint arXiv:2105.08694,
2021.
[16] G. Liu, H. Shi, A. Kiani, A. Khreishah, J. Lee, N. Ansari, C. Liu, and
M. M. Yousef, “Smart traffic monitoring system using computer vision
and edge computing,” IEEE Transactions on Intelligent Transportation
Systems, 2021.
[17] X. Ran, H. Chen, X. Zhu, Z. Liu, and J. Chen, “Deepdecision: A mobile
deep learning framework for edge video analytics,” in IEEE INFOCOM
2018-IEEE Conference on Computer Communications. IEEE, 2018,
pp. 1421–1429.
[18] P. Sermanet, K. Kavukcuoglu, S. Chintala, and Y. LeCun, “Pedestrian
detection with unsupervised multi-stage feature learning,” in Proceedings of the IEEE conference on computer vision and pattern recognition,
2013, pp. 3626–3633.
Fig. 4: The comparative detection results of the Coco model, Voc model, and the federated model using the urban tracker
data set. The first column shows the detection results achieved from the Coco model. The second column shows the detection
results of the Voc model. The third column shows the detection results of the federated model. Each row represents a frame
from a video (Stmarc, Rouen, Sherbrooke, Atrium) respectively.
[19] M. Stojmenovic, “Real time machine learning based car detection in
images with fast training,” Machine Vision and Applications, vol. 17,
no. 3, pp. 163–172, 2006.
[20] N. Buch, S. A. Velastin, and J. Orwell, “A review of computer vision
techniques for the analysis of urban traffic,” IEEE Transactions on
intelligent transportation systems, vol. 12, no. 3, pp. 920–939, 2011.
[21] A. B. Sada, M. A. Bouras, J. Ma, H. Runhe, and H. Ning, “A distributed
video analytics architecture based on edge-computing and federated
learning,” in 2019 IEEE Intl Conf on Dependable, Autonomic and Secure
Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf
on Cloud and Big Data Computing, Intl Conf on Cyber Science and
Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). IEEE,
2019, pp. 215–220.
[22] C. He, A. D. Shah, Z. Tang, D. F. N. Sivashunmugam, K. Bhogaraju,
M. Shimpi, L. Shen, X. Chu, M. Soltanolkotabi, and S. Avestimehr,
“Fedcv: A federated learning framework for diverse computer vision
tasks,” arXiv preprint arXiv:2111.11066, 2021.
[23] Y. Liu, A. Huang, Y. Luo, H. Huang, Y. Liu, Y. Chen, L. Feng, T. Chen,
H. Yu, and Q. Yang, “Fedvision: An online visual object detection
platform powered by federated learning,” in Proceedings of the AAAI
Conference on Artificial Intelligence, vol. 34, no. 08, 2020, pp. 13172–13179.
[24] J. Redmon, S. K. Divvala, R. B. Girshick, and A. Farhadi, “You only
look once: Unified, real-time object detection,” in 2016 IEEE Conference
on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas,
NV, USA, June 27-30, 2016. IEEE Computer Society, 2016, pp. 779–
788.
[25] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,”
arXiv, 2018.
[26] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams,
J. Winn, and A. Zisserman, “The pascal visual object classes challenge:
A retrospective,” International Journal of Computer Vision, vol. 111,
no. 1, pp. 98–136, Jan. 2015.
[27] T. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan,
P. Dollár, and C. L. Zitnick, “Microsoft COCO: common objects in
context,” in Computer Vision - ECCV 2014 - 13th European Conference,
Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V, ser.
Lecture Notes in Computer Science, D. J. Fleet, T. Pajdla, B. Schiele,
and T. Tuytelaars, Eds., vol. 8693. Springer, 2014, pp. 740–755.
[28] J. Jodoin, G. Bilodeau, and N. Saunier, “Urban tracker: Multiple
object tracking in urban mixed traffic,” in IEEE Winter Conference on
Applications of Computer Vision, 2014, pp. 885–892.