IEEE Systems, Man, and Cybernetics Magazine
EDITOR-IN-CHIEF
Tingwen Huang
Texas A&M University at Qatar, Doha, Qatar
tingwen.huang@qatar.tamu.edu
ASSOCIATE EDITORS
Mali Abdollahian, Australia
Mohammad Abdullah-Al-Wadud, Saudi Arabia
Choon Ki Ahn, Korea
Bernadetta Kwintiana Ane, India
Krishna Busawon, UK
György Eigner, Hungary
Liping Fang, Canada
Hossam Gaber, Canada
Aurona Gerber, South Africa
Jason Gu, Canada
Abdollah Homaifar, USA
Okyay Kaynak, Turkey
Kevin Kelly, Ireland
Kazuo Kiguchi, Japan
Abbas Khosravi, Australia
Vladik Kreinovich, USA
Wei Lei, China
Kovács Levente, Hungary
Huaqing Li, China
Jing Li, China
Dongning Liu, China
Agostino Marcello Mangini, Italy
Darius Nahavandi, Australia
Chris Nemeth, USA
Vinod Prasad, Singapore
Hong Qiao, China
Ferat Sahin, USA
Mehrdad Saif, Canada
Claudio Savaglio, Italy
Bahram Shafai, USA
Yin Sheng, China
Jinshan Tang, USA
Liqiong Tang, New Zealand
Ying Tan, Australia
Jiacun Wang, USA
Yingxu Wang, Canada
Margot Weijnen, Netherlands
Peter Whitehead, USA
Zhao Xingming, China
Laurence T. Yang, Canada
SOCIETY BOARD OF
GOVERNORS
Executive Committee
Sam Kwong, President
Imre Rudas, Jr. Past President
Edward Tunstel, Sr. Past President
Enrique Herrera Viedma, Vice President,
Cybernetics
Saeid Nahavandi, Vice President,
Human–Machine Systems
Thomas I. Strasser, Vice President,
Systems Science and Engineering
Yo-Ping Huang, Vice President,
Conferences and Meetings
Karen Panetta, Vice President,
Membership and Student Activities
Okyay Kaynak, Vice President,
Organization and Planning
Shun-Feng Su, Vice President, Publications
Ying (Gina) Tang, Vice President, Finance
Vladik Kreinovich, Treasurer
Tom Gedeon, Secretary
Valeria Garai, Asst. Secretary
Editors
Peng Shi, EIC, IEEE Transactions
on Cybernetics
Robert Kozma, EIC, IEEE Transactions
on Systems, Man, and Cybernetics: Systems
Ljiljana Trajkovic, EIC, IEEE Transactions
on Human–Machine Systems
Bin Hu, EIC, IEEE Transactions
on Computational Social Systems
Tiago H. Falk, EIC, SMC E-Newsletter
Industrial Liaison Committee
Christopher Nemeth, Chair
Sunil Bharitkar
Michael Henshaw
Yo-Ping Huang
Azad Madni
Rodney Roberts
Organization and Planning Committee
Vladimir Marik, Chair
Enrique Herrera Viedma
Mengchu Zhou
Dimitar Filev
Robert Woon
Ferat Sahin
Edward Tunstel
Larry Hall
Jay Wang
Michael Smith
C.L. Philip Chen
Karen Panetta
Publications Ethics Committee
Shun-Feng Su, Chair
Imre Rudas
Edward Tunstel
Vladik Kreinovich
Peng Shi
Fei-Yue Wang
Robert Kozma
Ljiljana Trajkovic
Haibin Zhu
History Committee
Michael Smith
Membership and Student Activities Committee
Karen Panetta, Chair
György Eigner, Coordinator
Christopher Nemeth
Lance Fung
Robert Kozma
Roxanna Pakkar
Saeid Nahavandi
Okyay Kaynak
Tadahiko Murata
Ferial El-Hawary
Paolo Fiorini
Shun-Feng Su
Virgil Adumitroaie
Peng Shi
Ashitey Trebi-Ollennu
Hideyuki Takagi
Standards Committee
Loi Lei Lai, Chair (China)
Chun Sing Lai, Vice Chair (UK)
Wei-jen Lee (USA)
Thomas Strasser (Austria)
Dongxiao Wang (Australia)
Chaochai Zhang (China)
Haibin Zhu (Canada)
Nominations Committee
Imre Rudas, Chair
C.L. Philip Chen
Vladimir Marik
Ljiljana Trajkovic
Awards Committee
Dimitar Filev, Chair
Edward Tunstel
Laurence Hall
Ljiljana Trajkovic
Peng Shi
Michael H. Smith
Vladik Kreinovich
Fellows Evaluation Committee
Edward Tunstel, Chair
Mengchu Zhou, Vice Chair
Liping Fang
Maria Pia Fanti
Vladimir Marik
Germano Lambert-Torres
Karen Panetta
Ching-Chih Tsai
Electronic Communications
Subcommittee
Saeid Nahavandi, Chair
Syed Salaken, Web Editor
Darius Nahavandi, Social Media
Mariagrazia Dotoli
Patrick Chan
Haibin Zhu
Ying (Gina) Tang
Ferat Sahin
György Eigner
Chapter Coordinators Subcommittee
Lance Fung, Chair
Enrique Herrera-Viedma
Imre Rudas
Adrian Stoica
Maria Pia Fanti
Karen Panetta
Hideyuki Takagi
Ching-Chih Tsai
Student Activities Subcommittee
Roxanna Pakkar, Chair
Bryan Lara Tovar
Piril Nergis
JuanJuan Li
X. Wang
Young Professionals Subcommittee
György Eigner, Chair
Ronald Bock
Sonia Sharma
Xuan Chen
Raul Roman
Fernando Schramm
IEEE PERIODICALS
MAGAZINES DEPARTMENT
445 Hoes Lane, Piscataway, NJ 08854 USA
Peter Stavenick
Journals Production Manager
Katie Sullivan
Senior Manager, Journals Production
Janet Dudar
Senior Art Director
Gail A. Schnitzer
Associate Art Director
Theresa L. Smith
Production Coordinator
Mark David
Director, Business Development—
Media & Advertising
Felicia Spagnoli
Advertising Production Manager
Peter M. Tuohy
Production Director
Kevin Lisankie
Editorial Services Director
Dawn M. Melley
Staff Director, Publishing Operations
IEEE SYSTEMS, MAN, AND CYBERNETICS MAGAZINE (ISSN 2333-942X) is published quarterly by the Institute of Electrical and Electronics Engineers, Inc. Headquarters: 3 Park Avenue, 17th Floor, New York, NY 10016-5997 USA, Telephone: +1 212 419 7900. Responsibility for the
content rests upon the authors and not upon the IEEE, the Society or its members. IEEE Service Center (for orders, subscriptions, address
changes): 445 Hoes Lane, Piscataway, NJ 08855-1331 USA. Telephone: +1 732 981 0060. Subscription rates: Annual subscription rates included
in IEEE Systems, Man, and Cybernetics Society member dues. Subscription rates available on request. Copyright and reprint permission:
Abstracting is permitted with credit to the source. Libraries are permitted to photocopy beyond the limits of U.S. Copyright law for the private
use of patrons 1) those post-1977 articles that carry a code at the bottom of the first page, provided the per-copy fee indicated in the code is
paid through the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923 USA; 2) pre-1978 articles without a fee. For other copying, reprint, or republication permission, write Copyrights and Permissions Department, IEEE Service Center, 445 Hoes Lane, Piscataway,
NJ 08854. Copyright © 2023 by the Institute of Electrical and Electronics Engineers Inc. All rights reserved.
Digital Object Identifier 10.1109/MSMC.2023.3280352
IEEE prohibits discrimination,
harassment, and bullying.
For more information, visit http://www.ieee.org/web/aboutus/whatis/policies/p9-26.html.
Smart Solutions
for Technology
www.ieeesmc.org
Volume 9, Number 3 • July 2023
Features
UAVs-Enabled Maritime Communications
Opportunities and Challenges
By Muhammad Waseem Akhtar and Nasir Saeed
An ASD Classification Based on a Pseudo 4D ResNet
Utilizing Spatial and Temporal Convolution
By Shuaiqi Liu, Siqi Wang, Hong Zhang, Shui-Hua Wang, Jie Zhao, and Jingwen Yan
Tooth.AI
Intelligent Dental Disease Diagnosis and Treatment Support Using Semantic Network
By Hossam A. Gabbar, Abderrazak Chahid, Md. Jamiul Alam Khan, Oluwabukola Grace Adegboro, and Matthew Immanuel Samson
MDN-Enabled SO for Vehicle Proactive Guidance in Ride-Hailing Systems
Minimizing Travel Distance and Wait Time
By Xiaoming Li, Jie Gao, Chun Wang, Xiao Huang, and Yimin Nie
Edge Processing
A LoRa-Based LCDT System for Smart Building With Energy and Delay Constraints
By B Shilpa, Hari Prabhat Gupta, and Rajesh Kumar Jha
ABOUT THE COVER
Functional magnetic resonance imaging
display of the human brain.
©SHUTTERSTOCK/STEPAN KAPL
Departments & Columns
Conference Reports
Mission Statement
The mission of the IEEE Systems, Man, and Cybernetics Society is to serve the interests of its members
and the community at large by promoting the theory, practice, and interdisciplinary aspects of systems
science and engineering, human–machine systems, and cybernetics. It is accomplished through
conferences, publications, and other activities that contribute to the professional needs of its members.
Digital Object Identifier 10.1109/MSMC.2023.3273049
July 2023
IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE
UAVs-Enabled Maritime Communications:
Opportunities and Challenges
by Muhammad Waseem Akhtar
and Nasir Saeed
The next generation of wireless communication systems will integrate terrestrial
and nonterrestrial networks, targeting
the coverage of the undercovered regions,
especially those connected to marine
activities. Unmanned aerial vehicle (UAV)-based
connectivity solutions offer significant advances to
support conventional terrestrial networks. However,
the use of UAVs for maritime communication is
still an unexplored area of research. Therefore, this
article highlights different aspects of UAV-based
maritime communication, including the basic architecture, various channel characteristics, and use
cases. The article then discusses several open
research problems, such as mobility management,
trajectory optimization, interference management,
and beam forming.
Introduction
Seawater covers around 70% of planet Earth, and more than 90% of the world’s products are moved by a commercial fleet of approximately 46,000 ships [1], [2], [3]. The marine economy is growing steadily, with continuous development in conventional sectors, such as fisheries and transportation, and in newer maritime activities, such as tourism, oil and gas exploration, and weather monitoring. Most of these applications depend on a reliable and efficient maritime communication network.
Existing maritime networks mainly rely on very high frequency (VHF) radios, whose bandwidth is too low, or on satellite communication networks, whose cost is too high, to support the International Maritime Organization (IMO) eNavigation concept. However, emerging maritime networks need wideband, low-cost communication systems to achieve better security, surveillance, and coverage, as well as efficient working conditions for the onboard crew and passengers. Although wireless broadband access (WBA) can fulfill the IMO eNavigation requirement, the implementation of WBA technologies in maritime areas is questionable.
The typical marine networks comprise a mesh network of different entities in an integrated satellite–air–sea–ground network. A stand-alone satellite-based solution can cover a large area with high-speed data transmission. However, it suffers from unavoidable large propagation delays and expensive implementation costs. Alternatively, HF/VHF-based systems are simple to implement but have limited utilization, i.e., only in vessel identification, tracking/monitoring, and alerting. Therefore, developing high-speed maritime networks is of great importance to improve the onboard user experience. As a result, maritime communications have garnered substantial interest in the recent past, where the primary purpose is to enhance the broadband network coverage for terrestrial users with the aid of UAVs that can serve as aerial base stations (BSs) and relays [4].
In this context, UAVs can play a vital role in maritime communications either as relays or flying sensors, gathering information in cheaper, safer, and faster ways. They can successfully perform complex tasks with less human involvement and lower cost. UAVs in the maritime network have the potential to manage, control, and monitor maritime activities, including the identification of defects in ships to diagnose and resolve issues while keeping ships at sea, reducing maintenance costs and time. Moreover, UAVs can also be helpful for maritime natural resource exploration purposes, such as oil and gas exploration, especially in harsh and challenging environmental conditions. Furthermore, UAVs equipped with high-resolution cameras can also be used for security and surveillance purposes. A single drone can gather more information than cameras installed at different locations. Inspired by these trends, we present the key aspects of UAV-aided maritime communication networks. The goal is to identify the prospects and challenges of deploying UAVs in the maritime network. Our major contributions in this article are summarized as follows:
◆◆ First, we present a design architecture of a UAV-based maritime communication network.
◆◆ Then, we discuss the channel characteristics in maritime communication networks, such as air-to-sea and near-sea-surface channels. Also, we present the use cases of UAV-aided maritime communication (Table 1).
◆◆ Finally, we present the research challenges and future directions for UAV-based maritime communication networks.
Digital Object Identifier 10.1109/MSMC.2022.3231415
Date of current version: 17 July 2023
2333-942X/23©2023IEEE
©SHUTTERSTOCK.COM/I’M FRIDAY
UAV-Aided Maritime Communication
Network Architecture
The basic network architecture of a UAV-aided maritime
communication network is shown in Figure 1. In such a
network, UAVs are simultaneously connected with the
maritime control station (MCS), satellite, and sea vessels.
The communication links between UAVs and the MCS, satellites, and ships are primary, whereas the communication link between satellites and the MCS is secondary. In
the following, we discuss the MCS, control links, and data
links in detail.
MCS
An MCS is the brain of maritime networks positioned on
the ship, on UAVs, or underwater to facilitate the operators
of UAVs. The control station may be either stationary or
movable for command and control (C&C) transmission.
The control station equipment can be as simple as a laptop
with an antenna connected to it or as complex as a rat’s
nest, with wires, antennas, computers, electronics boxes,
joysticks, and monitors.
Control Links
The link used for communication from a BS on the ship or at the coast to users (UAV, satellite, or ship) in UAV-assisted maritime networks is called the control link. The control link is responsible for transmitting commands and controls from a BS to the users in the uplink. The control links from a maritime BS to the satellite may be utilized for the orbit selection, speed control of the satellite, and coverage control. Similarly, for the UAV and maritime vessels, control links are used for speed control, path selection, and transmission direction control.
Table 1. A depiction of UAV-based prospective integrated solutions for challenges in maritime applications.
Use Cases                                    Challenges                                  Prospective UAV-Based Integrated Solutions
Relaying                                     Mobility, beam forming, and handovers       Sonar, UAVs, and machine learning
IoT data harvesting                          Interference and path planning              Sonar, UAVs, and machine learning
Wireless power transfer                      3D handovers                                Sonar, UAVs, and machine learning
Computation offloading                       Complexity                                  UAVs and machine learning
Localization                                 Channel variations and 3D Doppler effect    Sonar, UAVs, satellite, and machine learning
Delivering goods                             Path/trajectory planning                    Sonar, UAVs, satellite, and machine learning
Security, safety, and fault identification   Cost and complexity                         Sonar, UAVs, satellite, and machine learning
IoT: Internet of Things.
[Figure 1 labels: satellite, UAVs, ships, underwater vessels, and the maritime control station, connected by primary and secondary control links and by satellite-to-UAV, satellite-to-ship, and UAV-to-ship links.]
Figure 1. A depiction of the basic network architecture for UAV-aided maritime communication.
Data Links
Information is exchanged in maritime networks using data links, where the communication technologies are responsible for data delivery between system elements and external units. The fundamental challenges of the maritime network are the security of C&C from a BS to the users and the cognitive control of the bandwidth, frequency, and data flow. The following are the different types of data links that exist in maritime networks.
UAV–Ship and Satellite–Ship Data Links
These links deliver information from the UAV or satellite to a sea-based reception device. They are responsible for data communication between UAVs and ships and between satellites and ships.
UAV–Satellite, UAV–UAV, and Satellite–Satellite Data Links
UAVs can cooperate with other space/airborne platforms,
such as satellites and other UAVs. These types of data links
demand that air-to-air communication be established
between the platforms. Establishing these links is more
challenging due to the relative movement of both transmitters and receivers [5].
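To make the effect of relative movement concrete, the following minimal sketch estimates the first-order Doppler shift on an air-to-air link from the positions and velocities of two platforms. It is an illustration only and not part of the article: the function names, the 2.4-GHz carrier, and the platform speeds are hypothetical assumptions.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def closing_speed_mps(pos_tx, vel_tx, pos_rx, vel_rx):
    """Rate at which the distance between two 3D platforms shrinks (m/s)."""
    rel_pos = [r - t for t, r in zip(pos_tx, pos_rx)]
    rel_vel = [r - t for t, r in zip(vel_tx, vel_rx)]
    dist = math.sqrt(sum(p * p for p in rel_pos))
    # d(dist)/dt = (rel_pos . rel_vel) / dist; the closing speed is its negative.
    return -sum(p * v for p, v in zip(rel_pos, rel_vel)) / dist


def doppler_shift_hz(carrier_hz, closing_mps):
    """First-order Doppler shift; positive when the platforms approach."""
    return carrier_hz * closing_mps / C


# Hypothetical example: two UAVs flying head-on at 20 m/s each, 1 km apart.
speed = closing_speed_mps((0, 0, 100), (20, 0, 0), (1000, 0, 100), (-20, 0, 0))
print(round(doppler_shift_hz(2.4e9, speed), 1), "Hz")  # about 320 Hz
```

Because both ends of the link move, the closing speed (and hence the shift) changes whenever either platform turns, which is one reason these links are harder to maintain than links to a fixed station.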
Channel Characteristics
It is important to comprehend and model the wireless channels to establish an efficient maritime communication network. As far as maritime communication is concerned, three major channel types are to be investigated. The first is an air-to-sea channel used to communicate between UAVs and ships. The second is a near-sea-surface channel that is used for ship-to-ship communication. Finally, an underwater communication channel is used to communicate between underwater vessels. Underwater communication channels can further be divided into near-sea-surface (i.e., up to 600 m below the sea surface) and deep-sea underwater (i.e., more than 600 m below the sea surface) wireless channels due to differences in their characteristics, such as the temperature, salinity, and pressure at different depths.
Maritime wireless channels differ from conventional terrestrial channels in many aspects, such as the ducting effect and heavy scattering over the sea surface, unpredictable sea wave proportions, water density, and temperature variations in the sea. All of these aspects result in significant complexity in the receiver design. Although the satellite-to-ship channels have been explored extensively in the past [6], the wireless channels in the terrestrial and nonterrestrial integrated networks (TaNTIN) [7] are less explored for the near-coast situation. Therefore, researchers have recently investigated maritime wireless channels and developed several models.
The two most essential and distinguishing properties of maritime wireless channels are sparsity and location dependence. Sparsity is extensively observed in the maritime environment, especially for the unpredictable scattering and distribution of maritime receivers. In contrast, the location dependency feature implies that there should be a completely different channel model for different locations of the maritime receiver. Figure 2 depicts the challenging maritime environment and channel variations observed at sea level due to the traveling sea waves, moving UAVs, and ships. Sea waves traveling in random directions and with dynamic wave amplitudes cause high fluctuations in the receiver’s signal-to-interference-plus-noise ratio (SINR) level. The mobility of ships, sea waves, and UAVs in random directions makes channel estimation challenging for the receiver. Similarly, the variable speed of sea waves, UAVs, and ships leads to an unpredictable Doppler effect. Consequently, these traits introduce new difficulties and dimensions in the design of UAVs in a maritime communication system.
In the following, we discuss different models for the air-to-sea, near-sea-surface, and underwater wireless channels.
[Figure 2 labels: UAVs, ships, underwater vessels, and sea waves, connected by UAV-to-UAV and UAV-to-ship links.]
Figure 2. The UAV as a use case of reliable maritime communication in dynamically changing environmental conditions.
Air-to-Sea Channel
Air-to-ground channels are widely studied in the literature [2]. However, the air-to-sea channel differs from the air-to-ground channel in many aspects, such as ducting, the sparsity effect, and instability in the maritime environment, which lead to remarkable differences in channel modeling. In many cases, the two-ray model is applied. The first component of the two-ray model is the line-of-sight (LoS) component, and the second is the surface-reflected ray component. When the transmission distance is very large, and the transmitter is located at some notable height, the curved-Earth two-ray model is used to account for the Earth curvature [8].
In some cases, the rays received from other weak scattered paths can also be considered, apart from the two strong paths. However, a dispersion around the maritime receiver is observed when the transmitter is located at a very high altitude [9]. Compared to the terrestrial (i.e., near the urban area) environment, a maritime receiver is expected to face more sparse scattering, which may lead to simplification in the air-to-sea channel modeling.
As discussed earlier, a standard two-ray or three-ray model can be used in an air-to-sea channel. However, due to long-distance transmission in the maritime environment, two main elements, i.e., the ducting effect and Earth curvature, must be considered. Also, the location of the transmitter (UAV or satellite) is usually above the ducting layer; therefore, a part of the radio energy could be absorbed in the ducting layer, especially when the grazing angle (the angle between the sea surface and the direct path) is less than a threshold. In this case, the ray-trapping action of the ducting layer can also increase the power of the received signal, resulting in reduced path loss [10].
Near-Sea-Surface Channel
As mentioned earlier, near-sea-surface (such as ship-to-ship, ship-to-land, and land-to-ship) channels are distance dependent. Different channel models can be used for different locations of transmitters and receivers. The standard two-ray model can be used for a modest distance between the transmitter and receiver. However, the LoS and the reflected ray components vanish due to Earth curvature with increased distance between the transmitter and receiver. Nevertheless, the receiver can still receive the transmitted signal due to the ducting effect, provided there is proper beam alignment between the transmitter and receiver. In summary, as the distance between the transmitter and receiver increases, the two- or three-ray channel model is replaced by a duct-only model. The ducting effect across the sea surface allows beyond LoS (BLoS) transmission in marine communications, which has gained much popularity in secure and long-distance maritime communication.
Figure 3 shows the path loss [10], [11] against the distance between the transmitter and receiver for different maritime channels with acoustic waves at 500-kHz frequency. The path loss varies with the level of water density in the wireless channel. For instance, the path loss in deep seawater is higher than that in free-space, near-sea-surface, and sea-surface channels. The reasons for this are the factors of temperature, shadowing, and density of the water. We also show the trend of path loss for radio-frequency (RF) waves in Figure 4, where RF waves face the highest path loss in deep-seawater channels compared to other maritime wireless channels. For the free-space channel, we do not consider shadowing caused by the sea waves; rather, we consider the LoS communication link between the UAV and ship at sea level. By comparing Figures 3 and 4, we can determine that acoustic waves are more suitable for maritime communication in the underwater environment. At the same time, RF is better suited for near-surface and free-space links above the seawater environment.
Use Cases of UAV-Aided Maritime Communication
This section covers various use cases of UAV-aided maritime communication, such as relaying, wireless power transfer, data offloading, and localization (Table 1). In the following, we discuss each of the use cases in detail.
UAV-Based Relaying
UAV-based communications have growing importance for many applications, particularly with the arrival of high-altitude, long-endurance platforms. These UAVs can enable BLoS communications in support of a range of maritime activities. The UAV-based airborne relay will enable range extension for maritime communication services. Also, with the flexible mobility and high possibilities of LoS air-to-sea links, UAV-enabled relays can display increasingly important advantages for maritime networks, as shown in Figure 1.
UAV-Aided Maritime Internet of Things Data Harvesting
Underwater sensor networks have attracted a lot of research attention in recent years. However, it is evident that major obstacles remain to be solved. Several telemetry activities for maritime monitoring, research, and exploration can be performed based on collecting data from marine buoys rapidly and in real time. Satellites, ships, and airplanes can all collect marine data, but satellite transmission is often expensive and bandwidth limited, while manned ships/aircraft have high manpower/mission costs and risks. Therefore, using UAVs that can resist strong winds over the sea surface as agile data collectors appears to be an exciting solution. UAVs can fly near the buoys and use a stable communication channel to wirelessly and quickly capture a significant amount of data because of their high mobility.
[Figure 3 plot: path loss (dB) versus distance (m), 0 to 1,000 m, for the free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels.]
Figure 3. A depiction of the path loss for free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels at 500-kHz acoustic waves.
[Figure 4 plot: path loss (dB) versus distance (m), 0 to 1,000 m, for the free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels.]
Figure 4. A depiction of the path loss for free-space (LoS), sea-surface, near-sea-surface, and deep-seawater channels at 500-kHz RF waves.
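As a rough companion to the path-loss discussion around Figures 3 and 4, the sketch below computes the free-space (LoS) path loss and a flat-surface two-ray path loss of the kind described for the air-to-sea channel (an LoS ray plus one surface-reflected ray). It is a simplified illustration under stated assumptions, not the model used to produce the figures: the sea surface is treated as flat and smooth, the reflection coefficient defaults to -1, and ducting, Earth curvature, and underwater absorption are ignored. The function names, the 2.4-GHz carrier, and the antenna heights are hypothetical.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space (LoS only) path loss in dB between isotropic antennas."""
    wavelength = C / freq_hz
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength)


def two_ray_path_loss_db(distance_m, freq_hz, h_tx_m, h_rx_m,
                         reflection_coeff=-1.0):
    """Two-ray path loss in dB over a flat surface (assumed, no curvature).

    distance_m is the horizontal separation; the direct ray and the
    surface-reflected ray are summed as complex fields.
    """
    wavelength = C / freq_hz
    k = 2.0 * math.pi / wavelength
    d_los = math.hypot(distance_m, h_tx_m - h_rx_m)  # direct path length
    d_ref = math.hypot(distance_m, h_tx_m + h_rx_m)  # reflected path length
    field = (cmath.exp(-1j * k * d_los) / d_los
             + reflection_coeff * cmath.exp(-1j * k * d_ref) / d_ref)
    gain = (wavelength / (4.0 * math.pi)) ** 2 * abs(field) ** 2
    return -10.0 * math.log10(gain)


# Hypothetical link: UAV at 100 m, ship antenna at 10 m, 2.4-GHz carrier.
for d in (200.0, 400.0, 600.0, 800.0, 1000.0):
    fs = free_space_path_loss_db(d, 2.4e9)
    tr = two_ray_path_loss_db(d, 2.4e9, 100.0, 10.0)
    print(f"{d:6.0f} m  free space {fs:6.1f} dB  two-ray {tr:6.1f} dB")
```

Setting `reflection_coeff` to 0 removes the reflected ray, so the two-ray result reduces to the free-space loss over the direct path, which is a convenient sanity check for the sketch.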
Maritime Wireless Power Transfer
Wireless charging has been acknowledged as a viable technology to provide an energy supply for battery-limited nodes, such as underwater Internet of Things (IoT) devices and sensors. UAV-based wireless charging can bring more flexibility in terms of mobility and accessing hard-to-reach areas [12]. Due to the LoS linkages between the UAV and sensors, the UAV-enabled wireless power transfer system may substantially improve energy transfer efficiency by deploying the UAV as a mobile energy transmitter.
Maritime Computation Offloading
Because of great sensitivity to time and energy consumption, many computation- and data-intensive jobs are challenging to accomplish on maritime energy-constrained devices. UAV-based mobile edge computing (MEC) is a promising solution to overcome this challenge, providing ubiquitous Internet services for emerging maritime applications, such as marine environmental monitoring, ocean resource exploration, disaster prevention, and navigation. As a result, UAV-based MEC has emerged as a new paradigm that receives great attention in both academic and industrial sectors. Increasing demand for large-scale connection and communications, ultralow information-processing latency, and high dependability in delay-sensitive marine applications poses problems for delivering reliable quality of service in a resource-constrained maritime network.
Maritime Localization
Localization plays a significant role in communication in the TaNTIN environment [1]. Maritime localization uses a ship’s measuring devices to determine the location of other nautical targets. Ocean surveillance satellites can take advantage of space and altitude to cover large ocean areas, monitor submarine operations in real time, and detect radar signals sent by ships. Nevertheless, the position precision based on satellites may not be satisfactory, especially in unforeseen situations that require high accuracy, such as ocean rescue and noncooperative (enemy) ship location. In this case, UAVs can be used to improve the localization accuracy of the targets where the UAVs can be controlled remotely [3]. Nevertheless, the self-positioning of UAV platforms and determining the location of unknown marine targets by UAVs are challenging.
Research Challenges and Directions
Although there has been great interest in UAV-aided maritime communication over the past few years, various open research issues should be targeted. In the following, we explore some promising upcoming research challenges for UAV-aided maritime communication networks.
UAV 3D Maritime Trajectory Design
Exploiting the high mobility of UAVs is projected to unlock the full potential of UAV-to-sea communications. Various trajectory optimization models exist in the literature that optimize air-to-sea communications under different UAV configurations. The problems of trajectory optimization are often nonconvex, and variants of the successive convex approximation (SCA) technique are used to solve them suboptimally. Nevertheless, these SCA-based approaches depend heavily on trajectory initialization and do not explicitly account for the wind effect. Furthermore, for fixed-wing UAVs that must sustain forward motion to stay in the air, the computational complexity and resulting trajectory complexity make it costly to collect a high volume of data. Therefore, designing an energy-efficient 3D maritime UAV trajectory is very important.
UAV-to-Sea and UAV-to-UAV Interference Management
For maritime applications, UAVs largely send data in the downlink. Nevertheless, the capacity of maritime-connected UAVs to establish LoS communication with several sea vessels might lead to severe mutual interference between them and the ships. To overcome this difficulty, additional advances in the architecture of future UAV-based maritime networks, such as enhanced receivers, 3D frequency reuse, and 3D beam forming, are needed. For instance, because of their capabilities of detecting and categorizing images, deep learning models can be implemented on each UAV to recognize numerous environmental elements, such as the location of UAVs and ships. Such a method will enable each UAV to change its beamwidth tilt angle to minimize the ships’ interference.
3D Mobility Management (3D Handoffs)
UAVs can be deployed as aerial BSs or aerial users in UAV-assisted maritime networks. In the case of their deployment as the aerial BSs, UAVs can be deployed far away from maritime users, such as a ship. This might degrade the signal strength at the receiver and cause poor mobility performance, such as radio connection loss and handover failure. In addition, loss of the C&C signal may result in dangerous events, such as the collision of UAVs with commercial aircraft, or may even cause UAVs to fall into the sea.
In the other case, UAVs are deployed as aerial users in maritime communication networks. However, they can still face many mobility management issues, especially when there is no LoS link between the maritime BS and the aerial users [13]. Although the sidelobes of the BS antenna can still serve aerial users, there may be a loss of connection and handover failure due to lower antenna gains in the sidelobes [1]. Hence, robust mobility management is essential for enabling reliable connections between UAVs and ships sailing on the sea.
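One common way to reason about handover stability is a hysteresis rule: the aerial user switches stations only when a candidate is stronger than the serving station by a margin, which limits repeated switching when the received signal fluctuates. The sketch below illustrates that idea only; it is an assumption for illustration and not a method proposed in the article. The function name, the station labels, the signal levels, and the 3-dB margin are all hypothetical.

```python
def choose_serving_station(serving, signal_dbm, hysteresis_db=3.0):
    """Pick the station to use next, given measured signal levels in dBm.

    signal_dbm maps station names to received signal strength. The user
    stays with the current station unless the best candidate exceeds it
    by more than hysteresis_db, or the current station is no longer heard.
    """
    best = max(signal_dbm, key=signal_dbm.get)
    if serving not in signal_dbm:
        return best  # connection lost: attach to the strongest station
    if signal_dbm[best] > signal_dbm[serving] + hysteresis_db:
        return best  # candidate is clearly better: hand over
    return serving   # difference is within the margin: stay


# Hypothetical measurements taken by a UAV acting as an aerial user.
print(choose_serving_station("ship_bs", {"ship_bs": -90.0, "coast_bs": -88.0}))
print(choose_serving_station("ship_bs", {"ship_bs": -95.0, "coast_bs": -88.0}))
```

In the first call the 2-dB advantage of the candidate is inside the margin, so the user stays; in the second call the 7-dB advantage triggers a handover. A larger margin reduces unnecessary handovers but delays necessary ones, which matters when sidelobe gains are low.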
Beam Forming for High-Mobility Ships and UAVs
In maritime communication networks, beam-forming and power control issues are more challenging due to the frequent switching of frequency-access points and collaborative operation. Joint power control and beam forming provide reliable coverage for UAV-assisted maritime networks, but a fixed beam-forming vector may lead to SINR variations due to variations in angle of departure (AoD) and angle of arrival (AoA). Empirical measurements with Doppler effects can be of substantial value for constructing more accurate statistical air-to-sea channel models, and modern technologies can improve beam forming and mobility management for ships and UAVs.
Conclusion
This article presents the possible architecture, important applications, challenges, and solutions for using UAVs in maritime networks. This article identifies various types of wireless maritime channel characteristics. Furthermore, several use cases of UAV-assisted maritime communications, such as monitoring and surveillance, relaying, IoT harvesting, computation offloading, localization, and the delivery of goods, are discussed. This article further tries to spur the interest of researchers in the future evolution of UAV-enabled maritime communication networks that will enable digital use cases for the future marine economy.
About the Authors
Muhammad Waseem Akhtar (muhammadwaseem.akhtar@miun.se) is a postdoctoral research fellow with the Information Systems and Technology department of Mid Sweden University, Sundsvall 851 70, Sweden. His research interests include the Internet of Things; cooperative communication; energy- and bandwidth-efficient network designing; massive multiple-input, multiple-output and device-to-device communication; artificial intelligence; machine learning and blockchain technologies; and maritime communication. He is a Member of IEEE.
Nasir Saeed (mr.nasir.saeed@ieee.org) earned his Ph.D. degree in electronics and communication engineering from Hanyang University, Seoul, South Korea, in 2015. He is currently an associate professor with the Department of Electrical and Communication Engineering at United Arab Emirates University, Al Ain 15551, United Arab Emirates. His research interests include nonconventional communication networks, heterogenous vertical networks, multidimensional signal processing, and localization.
References
[1] J.-B. Wang et al., “Unmanned surface vessel assisted maritime wireless communication toward 6G: Opportunities and challenges,” IEEE Wireless Commun., early access, 2022, doi: 10.1109/MWC.008.2100554.
[2] Y. Song et al., “Internet of maritime things platform for remote marine water quality monitoring,” IEEE Internet Things J., vol. 9, no. 16, pp. 14,355–14,365, Aug. 2022, doi: 10.1109/JIOT.2021.3079931.
[3] F. S. Alqurashi et al., “Maritime communications: A survey on enabling technologies, opportunities, and challenges,” IEEE Internet Things J., early access, 2022, doi: 10.1109/JIOT.2022.3219674.
[4] M. W. Akhtar et al., “The shift to 6G communications: Vision and requirements,” Human Centric Comput. Inf. Sci., vol. 10, no. 1, pp. 1–27, Dec. 2020, doi: 10.1186/s13673-020-00258-2.
[5] N. Saeed et al., “Point-to-point communication in integrated satellite-aerial 6G networks: State-of-the-art and future challenges,” IEEE Open J. Commun. Soc., vol. 2, pp. 1505–1525, Jun. 2021, doi: 10.1109/OJCOMS.2021.3093110.
[6] C. Azzarello, C. Gerbino, and R. Mehta, “Enhanced sensing methods for UAV-based disaster recovery,” Comput. Sci. Eng. Senior Theses, Santa Clara Univ., Dept. Comput. Sci. Eng., Santa Clara, CA, USA, 2021. [Online]. Available: https://scholarcommons.scu.edu/cseng_senior/194.
[7] M. W. Akhtar and S. A. Hassan, “TaNTIN: Terrestrial and non-terrestrial integrated networks-a collaborative technologies perspective for beyond 5G and 6G,” Internet Technol. Lett., early access, 2021, doi: 10.1002/itl2.274.
[8] A. Verma et al., “VaCoChain: Blockchain-based 5G-assisted UAV vaccine distribution scheme for future pandemics,” IEEE J. Biomed. Health Inform., vol. 26, no. 5, pp. 1997–2007, May 2022, doi: 10.1109/JBHI.2021.3103404.
[9] S. Bauk, “Performances of some autonomous assets in maritime missions,” TransNav, Int. J. Marine Navig. Safety Sea Transp., vol. 14, no. 4, pp. 875–881, Feb. 2021, doi: 10.12716/1001.14.04.12.
[10] J. Wang et al., “Wireless channel models for maritime communications,” IEEE Access, vol. 6, pp. 68,070–68,088, Nov. 2018, doi: 10.1109/ACCESS.2018.2879902.
[11] J. Wang and S. Wang, “Seawater short-range electromagnetic wave communication method based on OFDM subcarrier allocation,” J. Comput. Commun., vol. 7, no. 10, pp. 63–71, Jan. 2019, doi: 10.4236/jcc.2019.710006.
[12] E. Lvsouras and A. Gasteratos, “A new method to combine detection and tracking algorithms for fast and accurate human localization in UAV-based SAR operations,” in Proc. IEEE Int. Conf. Unmanned Aircraft Syst. (ICUAS), 2020, pp. 1688–1696, doi: 10.1109/ICUAS48674.2020.9213873.
[13] Z. Haider et al., “A novel cooperative relaying-based vertical handover technique for unmanned aerial vehicles,” Secure Commun. Netw., vol. 2022, Sep. 2022, Art. no. 5702529, doi: 10.1155/2022/5702529.
An ASD Classification Based on a Pseudo 4D ResNet
Utilizing Spatial and Temporal Convolution
by Shuaiqi Liu, Siqi Wang, Hong Zhang, Shui-Hua Wang, Jie Zhao, and Jingwen Yan
Digital Object Identifier 10.1109/MSMC.2022.3228381
Date of current version: 17 July 2023
2333-942X/23©2023IEEE
The psychiatric condition known as autism spectrum disorder (ASD) affects children and adults alike. As a medical imaging technology, functional magnetic resonance imaging (fMRI) is widely used to study the brains of persons with ASD. This study introduces a novel technique: a pseudo 4D ResNet (P4D ResNet) to simultaneously extract and classify the brain activity of ASD patients. A P4D ResNet, which mainly consists of two different residual blocks stacked together, can extract both temporal and spatial information from fMRI data. In a P4D ResNet, to reduce the computational cost and the number of parameters, each residual block combines a 3D spatial filter and a 1D temporal filter instead of a 4D spatiotemporal convolution, and it can perform parallel computation. Due to the high dimensionality of the complete data and the limited amount of data, in this article, each piece of fMRI data is sampled at equal intervals of a set length in the time dimension for data expansion. The experiments show that the proposed model achieves better ASD classification results than other existing models.
Introduction
ASD, usually known as autism, is a common neurodevelopmental cognitive condition in children that is primarily inherited. Neurodevelopmental conditions, including ASD, have recently drawn more attention. Patients usually have very slow language development and are unable to communicate properly. They are not interested in the activities around them and rarely initiate social interactions. Moreover, they often exhibit repetitive, stereotyped behaviors and are extremely resistant to change and transformation. The families of ASD patients suffer significant psychological and financial stress for a protracted period of time due to the lack of a specific prescription for ASD and the difficulties in finding a permanent cure. This causes losses and injury to individuals, families, and society at large. Traditional ASD diagnostic techniques are time consuming and prone to error because they are dependent on the Diagnostic and Statistical Manual of Mental Disorders. As a result, the creation of a fully automated diagnostic method for ASD is required.
Numerous functional neuroimaging techniques have been utilized in brain study since the advancement of medical imaging. One of the most widely used is fMRI [1], [2], [3], [4]. The high temporal and spatial resolution obtained by fMRI makes it possible to see both physiological and pathological functional brain activity [5], [6]. The blood-oxygenation-level-dependent signal serves as the foundation for the fundamental theory of fMRI, which in brain research can be separated into two modalities, namely, task and resting states. Resting-state fMRI (rs-fMRI) requires subjects to be fully relaxed to acquire images, and the images acquired have high spatial and temporal resolution. Because the acquisition method is quick and easy, it is widely applied in the classification of ASDs [7], [8], [9]. The rs-fMRI data used in this study were mainly dichotomized for ASD and typical controls (TCs).
In terms of model composition, the research on ASD classification can generally be categorized into two types: traditional machine learning and deep learning. Traditional machine learning methods provide effective models for ASD classification and recognition problems. Scholars from various countries have proposed different traditional machine learning-based methods for ASD classification, and the main steps include manual feature extraction and a classifier. Iidaka [10] input the correlation matrix calculated from rs-fMRI time-series data into a probabilistic neural network (PNN) for ASD classification. The PNN classifier consists of four fully interconnected layers: an input, a pattern, a summation, and an output layer. The proposed algorithm obtained approximately 90% accuracy in 312 subjects with ASD and 328 subjects with typical development. Bi et al. [11] proposed a random NN cluster consisting of multiple NNs to classify 50 ASD patients and 42 TCs to solve the problem of the low accuracy of a single NN in classifying ASD patients and TCs. They constructed five random NN clusters, namely, a random backpropagation NN cluster, a random probabilistic NN cluster, a random learning vector quantization NN cluster, a random competitive NN cluster, and a random Elman NN cluster. Among them, the accuracy of the random Elman NN cluster was greatly improved.
Mostafa et al. [12] proposed a brain network-based algorithm for ASD classification. This algorithm used a 264-region parcellation scheme of the brain fMRI to construct a brain network. Then, 264 original brain features were defined by the 264 eigenvalues of the Laplacian matrix of the brain network, and three additional features of the brain network were defined by the network centrality. Finally, this algorithm obtained 64 discriminative features through a feature-selection algorithm and obtained an accuracy of 77.7% in ASD classification. Liu et al. [13] proposed an ASD classification algorithm based on dynamic functional connectivity and multitask feature selection, which was validated by the fMRI data from ABIDE I with a classification accuracy of 76.8%. Zhao et al. [14] used the method of extracting central moments of data to extract time-invariant features in low- or high-order dynamic functional connectivity networks of fMRI data. By integrating the features extracted from conventional functionally connected, low-order dynamically connected, and high-order dynamically connected networks, an accuracy of up to 83% was obtained in 45 ASD patients and 47 TCs by using a linear, kernel-based support vector machine (SVM) classifier.
Deep learning algorithms have been well applied in various fields [15], [16], [17]. Deep learning-based ASD classification algorithms have also recently gained popularity due to the quick advancement of computers. One of the most widely used is convolutional NNs (CNNs). For example, Xiao et al. [18] decomposed the dataset of each subject into 30 independent components. Then, an array of 84 key features of all the subjects was reshaped into a 3,400 × 84-dimensional key-feature matrix and was input into a stacked autoencoder for classification. This study
obtained an average classification accuracy of 87.21% in 84 subjects. Jia et al. [19] extracted the functional connectivity correlation matrix of the brain from rs-fMRI data after preprocessing and then used a stacked autoencoder for ASD classification. ASD identification was obtained with an accuracy of 95.27% in 656 subjects. In 2019, Rathore et al. [20] obtained a classification accuracy of 69.2% in 1,035 subjects with a simple three-layer NN by using a functional correlation and its topological features. In the same year, Zhuang et al. [21] proposed an invertible network for ASD classification and biomarker selection. This invertible network has two invertible blocks that map the data from the input domain to the feature domain. Then, a fully connected (FC) layer was applied for classification, and a classification accuracy of 71% was achieved in 530 ASD patients and 505 subjects. In 2020, Tang et al. [22] proposed an end-to-end multimodal architecture based on deep NNs that can analyze the region-of-interest time-series activation maps by combining different deep learning networks. This method can analyze functional images more comprehensively and achieve 74% classification accuracy among 1,035 subjects. In 2021, Shao et al. [23] proposed an ASD classification algorithm by combining deep feature selection and graph convolutional networks (GCNs), which achieved better ASD classification results. In the same year, Yin et al. [24] constructed brain networks from brain fMRI images and then combined self-encoders and deep NNs for ASD classification, which achieved good classification results.
The quantity of data voxels is substantial because fMRI images are an arrangement of a series of 3D images acquired in a time series. The huge amount of spatiotemporal information within the fMRI 4D image data is ignored in most current methods, which inevitably leads to the loss of important information. Traditional models are unable to extract more effective features, and the classification accuracy is relatively low. In addition, the sample size has a significant impact on the classification results. There tends to be greater accuracy when using small-sample data. In the aforementioned studies, the classification accuracy of the small-sample studies can reach nearly 90%, while that of large-sample studies reaches only about 70%. However, the value of ASD classification studies lies precisely in their use for realistic medical judgment. If the variability of sites in the database is not taken into account and only a small sample is used for the study, the results do not generalize. To classify ASDs, a P4D-ResNet-based ASD classification method is created and employed in this research. This model puts spatial convolution and temporal convolution together into a residual block, thus realizing the simultaneous extraction of spatiotemporal features of fMRI data, and it can also perform parallel computations. The results of the experiments show the effectiveness of the proposed method.
The Proposed Algorithm
In this article, we propose a P4D-ResNet model based on different residual architectures. This model can extract both spatial and temporal features of fMRI data and fully exploit the spatiotemporal information, which achieves satisfactory classification results. Construction of the P4D-ResNet network model is described in this section. The P4D-ResNet model consists of a 4D maximum pooling layer, a P4D-convolution (P4DC) block, a mixed residual block (MRB), a Flatten layer, a dropout layer, and an FC layer. The network structure of the P4D-ResNet model is shown in Figure 1.
Figure 1. The network structure of the P4D-ResNet model. max: maximum. [Blocks, in order: fMRI data, 4D max pooling, P4DC, 4D max pooling, MRB, 4D max pooling, MRB, 4D max pooling, MRB, Flatten, dropout, FC, classification.]
The P4D-ResNet model first performs dimensionality reduction by using a 4D maximum pooling layer, followed by P4D convolution, i.e., a spatial and a temporal convolution, to obtain the spatial and temporal features of the fMRI data. The P4D-ResNet model feeds the extracted features into three connected 4D maximum pooling layers and MRB modules to downscale and further extract spatiotemporal features from the data. Finally, through the Flatten layer and the FC layers, the classification results are obtained by the Sigmoid function. The proposed model in this article can be expressed as
$$y = \mathrm{MRB}(\mathrm{MP}(\mathrm{MRB}(\mathrm{MP}(\mathrm{MRB}(\mathrm{MP}(\mathrm{P4DC}(\mathrm{MP}(x)))))))) \tag{1}$$
where x denotes the input 4D fMRI data, and y denotes the output of the last MRB function. MRB denotes the MRB function, P4DC denotes the P4D-convolutional block, and MP denotes the 4D maximum pooling function. The substructures in the model are described separately in the next section.
4D Convolution
4D CNNs are well suited for spatiotemporal feature learning of medical images. It is possible to better extract the data’s temporal and spatial information by performing 4D convolutional procedures over space and time. To gain more detailed temporal information, the spatial feature maps in the convolutional layer are connected to numerous nearby time points in the previous layer. The principle of 4D convolution is presented in Figure 2. The same color in the convolutional connection indicates weight sharing. As displayed in Figure 2, the 4D convolution operation applies the same 4D kernel to a continuous series of 3D images, extracting features over the entire time series by shifting the step size. Assuming that $k_{ij}^{x_0 y_0 z_0 t_0}$ is the value at the $(x_0, y_0, z_0, t_0)$ position of the j th feature map of the i th layer, that is,
$$k_{ij}^{x_0 y_0 z_0 t_0} = \sigma\left(b_{ij} + \sum_{c}\sum_{p=0}^{P_i-1}\sum_{q=0}^{Q_i-1}\sum_{r=0}^{R_i-1}\sum_{s=0}^{S_i-1} w_{ijc}^{pqrs}\, k_{(i-1)c}^{(x_0+p)(y_0+q)(z_0+r)(t_0+s)}\right) \tag{2}$$
where $\sigma$ is the activation function and $b_{ij}$ is the bias. $P_i$, $Q_i$, $R_i$, and $S_i$ denote the size of the kernel in each of the four directions. $w_{ijc}^{pqrs}$ is the weight value at position $(p, q, r, s)$, which connects the c th feature map of the (i − 1) th layer with the j th feature map of the i th layer.
With the expansion of convolutional layers from three to four dimensions, the skyrocketing number of parameters and computational effort may lead to overfitting. To solve this problem, we decompose the 4D spatiotemporal convolution into the combination of a 3D spatial and a 1D temporal convolution, that is, the original 3 × 3 × 3 × 3 convolution is split into a combination of a 3 × 3 × 3 × 1 spatial convolution and a 1 × 1 × 1 × 3 temporal convolution, which is the principle of the P4DC module.
Figure 2. The principle of 4D convolution.
The MRB
In this article, as shown in Figure 3, a 4D MRB is built to conduct the simultaneous extraction of spatiotemporal information. The residual block is composed of a P4D-serial residual block (P4D-SRB) and a P4D-parallel residual block (P4D-PRB).
Figure 3. The mixed residual block structure. P4D-SRB: pseudo-4D serial residual block; P4D-PRB: pseudo-4D parallel residual block.
The P4D-SRB and P4D-PRB constructed in this article are obtained by modifying the conventional 3D bottleneck residual block. The conventional residual structure is shown in Figure 4(a), and the residual blocks constructed in this article are depicted in Figure 4(b) and (c).
The kernel size of both the first and fourth convolutional layers in the P4D-SRB is set to 1 × 1 × 1 × 1, which can match the number of channels. The number of output channels in the P4D-SRB is four times the number of input channels. The P4D-SRB uses a spatial convolution followed by a temporal convolution for the spatiotemporal feature extraction of data, whereas the P4D-PRB extracts the spatiotemporal features of data by using spatial convolution and temporal convolution in parallel. In the P4D-SRB, the output of the spatial convolution is directly used as the input of the temporal convolution, which means that the extraction of spatial information has a direct impact on the temporal features. In contrast, in the P4D-PRB, spatial and temporal features are extracted separately and then directly summed as feature outputs, so the extraction of spatial information in the same residual block does not have a direct effect on temporal feature extraction. Cascading these two blocks to form MRBs is helpful: it improves ASD classification results by capturing the spatiotemporal features of fMRI data well.
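The saving obtained by replacing a full 3 × 3 × 3 × 3 kernel with a 3 × 3 × 3 × 1 spatial kernel plus a 1 × 1 × 1 × 3 temporal kernel can be checked with a short calculation. The sketch below is only an illustration: the helper name `conv_weights` and the eight-channel example are assumptions, and biases are ignored.

```python
from math import prod

def conv_weights(kernel, in_ch=1, out_ch=1):
    """Number of weights (biases excluded) in one convolution layer.

    `kernel` is the kernel size along each axis; `in_ch` and `out_ch`
    are hypothetical channel counts chosen for illustration.
    """
    return prod(kernel) * in_ch * out_ch

# Single-channel case: full 4D kernel versus spatial + temporal kernels.
full_4d = conv_weights((3, 3, 3, 3))
pseudo_4d = conv_weights((3, 3, 3, 1)) + conv_weights((1, 1, 1, 3))

# Same comparison for an assumed layer with 8 input and 8 output channels.
full_4d_8ch = conv_weights((3, 3, 3, 3), 8, 8)
pseudo_4d_8ch = conv_weights((3, 3, 3, 1), 8, 8) + conv_weights((1, 1, 1, 3), 8, 8)

print(full_4d, pseudo_4d)          # 81 30
print(full_4d_8ch, pseudo_4d_8ch)  # 5184 1920
```

Under these assumptions the decomposed form needs 30 weights per channel pair instead of 81, which is the kind of reduction the pseudo-4D design relies on.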
4D Maximum Pooling Layer
This study extends the 3D maximum pooling layer to the 4D maximum pooling layer. The numbers of channels used in this article for the 4D maximum pooling layer are 1, 16, 32, and 64, respectively. The 4D maximum pooling with a channel number of 1 proceeds as follows:
◆ Let the size of each dimension of the input data of the pooling layer be b, w, h, d, t, and l. b denotes the batch size of the data input. w, h, and d represent the width, height, and depth, respectively, of the input fMRI data. t represents the time dimension of the input data, and l denotes the number of channels.
◆ The input data are reshaped into a dimension size of b, w, h, d, and t by using the reshaping function.
◆ A 3D maximum pooling operation is performed on the reshaped input, and the output data have a dimension size of b, w/2, h/2, d/2, and t. “/” denotes a division operation with upward rounding.
◆ The current data are reshaped into a dimension size of b, w/2, h/2, d/2, t/2, and 2 by using the reshaping function again.
◆ Take the maximum value of the current data in the channel dimension and output the data with dimension sizes of b, w/2, h/2, d/2, t/2, and 1.
When the number of channels is eight, the data are first sliced into eight tensors, each with one channel, and the 4D maximum pooling operation with channel number one is invoked separately on each. When the number of channels is 16, the data are sliced into two tensors of channel number eight. Similarly, when the number of channels is 32 or 64, the data are processed the same way. So, the 4D maximum pooling layer can be computed in parallel.
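The pooling steps above can be sketched for a single sample and a single channel. This is not the authors' TensorFlow implementation: it uses plain nested Python lists and takes the maximum directly over each 2 × 2 × 2 × 2 window, which gives the same values as a 3D spatial pooling followed by a pairwise maximum over time because the maximum is separable. The function name `max_pool_4d` and the toy volume are assumptions made for illustration.

```python
from itertools import product
from math import ceil

def max_pool_4d(volume, k=2):
    """Non-overlapping 4D max pooling over nested lists indexed [w][h][d][t].

    Edge windows are truncated, so each output size is ceil(n / k),
    matching the "division with upward rounding" described in the text.
    """
    sizes = (len(volume), len(volume[0]),
             len(volume[0][0]), len(volume[0][0][0]))
    out_sizes = tuple(ceil(n / k) for n in sizes)

    def window(i, n):
        # Input indices covered by output index i along an axis of length n.
        return range(i * k, min((i + 1) * k, n))

    def pooled(i, j, m, n):
        # Maximum over the 4D window that maps to output position (i, j, m, n).
        return max(
            volume[a][b][c][e]
            for a, b, c, e in product(
                window(i, sizes[0]), window(j, sizes[1]),
                window(m, sizes[2]), window(n, sizes[3]),
            )
        )

    return [[[[pooled(i, j, m, n)
               for n in range(out_sizes[3])]
              for m in range(out_sizes[2])]
             for j in range(out_sizes[1])]
            for i in range(out_sizes[0])]

# Toy volume of shape (3, 2, 2, 4); each value encodes its own index.
toy = [[[[1000 * a + 100 * b + 10 * c + e for e in range(4)]
         for c in range(2)] for b in range(2)] for a in range(3)]
out = max_pool_4d(toy)
out_shape = (len(out), len(out[0]), len(out[0][0]), len(out[0][0][0]))
print(out_shape)  # (2, 1, 1, 2)
```

A tensor-library version would follow the reshape, pool, reshape, and reduce steps listed above; the loop form here is only meant to make the window arithmetic explicit.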
Figure 4. 3D residual block and P4D residual block structures. (a) A ResNet. (b) A P4D-SRB. (c) A P4D-PRB. ReLU: rectified linear unit. [Kernel labels in the panels: (a) 1×1×1, 3×3×3, 1×1×1; (b) and (c) 1×1×1×1, 3×3×3×1, 1×1×1×3, 1×1×1×1.]
Data Enhancement And Model Training
The dataset from the global, openly accessible Autism Brain Imaging Data Exchange (ABIDE) [25] provides the rs-fMRI data used in this study. The samples with poor brain coverage, excessive motion peaks, ghosting, and other scanner aberrations are eliminated to leave a final dataset of 871 participants, including 403 ASD patients and 468 TCs.
A 4D NN needs more data samples for training. Therefore, the data in this article are enhanced by drawing multiple samples from the original dataset in the temporal dimension. Specifically, 871 subjects are shuffled before
the experiment, and each subject’s 4D fMRI data are sampled in the temporal dimension in turn. Sixteen time slices are drawn at an interval of one per frame, and each subject’s data are expanded as far as possible. The data of every 69 subjects and the corresponding labels are encapsulated into one generated TFRecord file. The TFRecord file format stores the data efficiently: it uses the “Protocol Buffer” binary data encoding scheme internally, occupies only a single block of memory, and needs to load only one binary file at a time. It is simple and fast, especially for large training data. When the training data are large, they can be divided into multiple TFRecord files to improve processing efficiency. Fifteen TFRecord files are generated for training and testing. Among them, 12 TFRecord files are used for training and three TFRecord files are used for testing. In the data augmentation scheme used in this article, the data are first divided into a training set and a testing set at the level of the individual “person.” Then the data of each subject are expanded separately. Each person’s expanded data are therefore either in the training or the testing set, which prevents similar data from distorting the model’s measured classification effect and improves the generalization performance. The amount of data used in the actual experiment after data augmentation is listed in Table 1.
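The key property of this scheme, splitting by subject first and expanding each subject afterward, can be sketched as follows. The function names, the random seed, the example length of 100 frames, and the non-overlapping stride of 16 are all assumptions for illustration; the article does not state the exact window stride.

```python
import random

def split_subjects(subject_ids, train_fraction=0.8, seed=0):
    """Shuffle subject IDs and split them so no subject is in both sets."""
    ids = list(subject_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]

def temporal_windows(n_frames, window=16, stride=16):
    """(start, stop) frame indices of fixed-length windows on the time axis.

    stride=16 gives non-overlapping windows; this value is an assumption.
    """
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]

# 871 subjects, as in the filtered dataset described above.
train_ids, test_ids = split_subjects(range(871))

# Windows for one hypothetical subject with 100 time frames.
windows = temporal_windows(100)
print(len(train_ids), len(test_ids), len(windows))
```

Because the windows are generated only after the split, every window cut from a given subject stays on one side of the train/test boundary.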
Table 1. The dataset after data enhancement.
The datasets | ASD | TC | Total
The original dataset | 403 | 468 | 871
The expanded dataset | 2,901 | 3,051 | 5,952
Table 2. The performance of different kinds of residual block combinations.
Residual structures | Accuracy (%) | Specificity (%) | Sensitivity (%)
MSRB-2 | 66.8 | 62.68 | 70.18
MPRB-2 | 68.54 | 60.62 | 75.08
MMRB | 74.67 | 71.9 | 76.91
The experiments in this article are implemented on the TensorFlow 1.0 platform with an Ubuntu 18.04 operating system, 32 GB of random-access memory, an Intel(R) Xeon(R) E5-2667 central processing unit, and an Nvidia Tesla K40c GPU card. The experiments start with data enhancement of the preprocessed fMRI data, giving dimensional sizes of 61, 73, 61, and 16 for all the data. Second, to reduce the risk of model overfitting, a 4D maximum pooling layer with a step size of two and a kernel of 2 × 2 × 2 × 2 is used for dimensionality reduction. The low-level spatiotemporal features are then extracted by a layer of spatial convolution with a kernel size of 3 × 3 × 3 × 1 and a layer of temporal convolution with a kernel size of 1 × 1 × 1 × 3. Then, high-level spatiotemporal features of the data are extracted by the maximum pooling and MRB modules. In this article, three MRB modules are used. The first MRB module has eight channels. The second MRB module has 16 channels. The third MRB module has 32 channels. The output features of the last MRB module are flattened by using the Flatten layer. Finally, the flattened feature vector is fed into the FC layer after the dropout operation and classified by using the Sigmoid classifier. In this article, model optimization is accomplished by using the Adam optimization algorithm. Cross-entropy is the loss function. The experimental parameters include a batch size of four. The learning rate is 0.00001. Dropout is set to 0.5, and the dense layer’s L2 regularization parameter is set to 0.0005.
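As a rough check on how quickly the data shrink, the sizes after each of the four 4D maximum pooling layers can be traced from the stated input size of 61, 73, 61, and 16. The sketch assumes that the convolution and MRB stages leave the spatial and temporal sizes unchanged (stride 1 with same-size padding), which the article does not state; the helper names are hypothetical.

```python
from math import ceil

def pool_shape(shape, k=2):
    """Output size of stride-k pooling with upward rounding on every axis."""
    return tuple(ceil(n / k) for n in shape)

def trace_shapes(shape, n_pools=4):
    """Sizes before pooling and after each of n_pools pooling layers."""
    shapes = [shape]
    for _ in range(n_pools):
        shape = pool_shape(shape)
        shapes.append(shape)
    return shapes

# (width, height, depth, time) of one augmented sample.
shapes = trace_shapes((61, 73, 61, 16))
for s in shapes:
    print(s)
```

Under these assumptions the time axis is reduced from 16 to 1 by the fourth pooling layer, so the last MRB operates on a single temporal position.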
Experimental Results and Analysis
The data are split into a training set and a test set in a ratio of 8:2 to test the model’s efficacy while retaining as much training data as possible. The training set is used to train the model, whereas the test set is used to evaluate its classification performance.
Ablation Experiments
To better illustrate the effectiveness of the MRB module for ASD classification, two variants are tested. In the “mixed serial residual block (MSRB-2),” the P4D-PRB residual block in the MRB is replaced with a P4D-SRB residual block. In the “mixed parallel residual block (MPRB-2),” the P4D-SRB residual block in the MRB is replaced with a P4D-PRB residual block. The classification results are presented in Table 2. When MSRB-2 is used, the accuracy, specificity, and sensitivity of ASD classification are 66.8, 62.68, and 70.18%, respectively. When using MPRB-2, the accuracy, specificity, and sensitivity of ASD classification are 68.54, 60.62, and 75.08%, respectively. In contrast, when using the MRB module, the accuracy of ASD classification is improved by 7.87 and 6.13 percentage points, respectively, and the sensitivity and specificity are the highest. It can be seen that a more structured MRB can achieve better results, especially for sensitivity improvement, which validates the effectiveness of the MRB model.
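For reference, the three reported metrics follow from the confusion-matrix counts in the usual way. The sketch below assumes ASD is treated as the positive class, and the counts passed in are made-up numbers, not results from the article. The last two lines only recompute the accuracy gains quoted above from the Table 2 values.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity (in %) from confusion counts.

    tp/fn: ASD subjects classified correctly/incorrectly (assumed positive class).
    tn/fp: TC subjects classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }

# Hypothetical counts, chosen only to illustrate the formulas.
metrics = classification_metrics(tp=80, fn=20, tn=70, fp=30)

# Accuracy gains of the MRB model over MSRB-2 and MPRB-2 (Table 2 values).
gain_over_msrb = round(74.67 - 66.8, 2)
gain_over_mprb = round(74.67 - 68.54, 2)
print(metrics, gain_over_msrb, gain_over_mprb)
```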
For the aforementioned three residual structures, we plot their receiver operating characteristic (ROC) curves and calculate area-under-the-curve (AUC) values to evaluate the three algorithms. Figure 5 illustrates the ROC analysis results of the model using MSRB-2, MPRB-2, and the mixed many residual block (MMRB), respectively. Figure 5 shows that the model performs best and the AUC value is highest when the MRB is employed.
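An AUC value can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise definition; the labels and scores are invented for illustration and are not taken from the experiments.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = ASD, 0 = TC) and classifier scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of samples, which is fine for a sketch; a rank-based implementation would be preferable for large test sets.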
In this article, we also conduct experiments on the
effect of the number of MRB modules. And we use 1–4 MRB
modules, respectively, to further verify reliability of the
model’s design. As shown in Table 3, ASD classification
abbreviated as HFR by merging various functional connecaccuracy is merely 64.74% when only one MRB module is
tivity matrix creation techniques, brain segmentation defiused to extract spatiotemporal features, which indinitions, and feature-extraction techniques proposed by
cates that one MRB module cannot extract effective and
Graña and Silva [27]. 3) A CNN and
representative spatiotemporal
multilayer perceptron (CNN-MLP)information. As the number of
based ASD classification system
model layers increases, ASD classiIt can be seen that a
[28]. 4) A deep multimodal model
fication accuracy increases, but
ASD classification system based
when the number of stacked
more structured MRB
on joint representation learning,
groups reaches four, ASD classifican achieve better
namely, DiagNet, was proposed by
cation accuracy decreases and the
results, especially
Eslami et al. [29]. 5) A 4D CNNmodel appears to be overfitted. In
based ASD classification algorithm
summary, three MRB modules are
for sensitivity
proposed by Guo et al. [30]. 6) An
selected for model experiments in
improvement,
ASD classification system based on
this article.
4D CNNs, namely, UM_1, was proIn this article, the selection of
which validates the
posed by Guo et al. [30]. 7) An ASD
time frames for data sampling is
effectiveness of the
classification algorithm based on
discussed. The time frames are
USM sites and 4D CNNs was
selected and set to 8, 16, and 32 for
MRB model.
offered by Guo et al. [30]. 8) A CNN
training and testing, respectively. The classification effects are listed in Table 4.
Table 4 shows that ASD classification accuracy is low when the time dimension is set to 8. This is mostly because the time window is too short: the model extracts fewer features and has difficulty properly extracting the temporal signals in the fMRI data. When 32 is used for the temporal dimension, more parameters and computation are required for model training, which results in overfitting. As a result, the experiments in this article used data from 16 time points, which gave the best classification effect.

Table 3. The impact of the number of MRB modules on the classification effect.

| The number of MRB modules | Accuracy (%) | Specificity (%) | Sensitivity (%) |
|---|---|---|---|
| 1 | 64.74 | 62.34 | 66.72 |
| 2 | 69.54 | 66 | 72.47 |
| 3 | 74.67 | 71.9 | 76.91 |
| 4 | 70.36 | 67.45 | 72.77 |

Table 4. The classification effect of different time frames.

| Time Frames | Accuracy (%) | Specificity (%) | Sensitivity (%) |
|---|---|---|---|
| 8 | 70.86 | 67.88 | 73.22 |
| 16 | 74.67 | 71.9 | 76.91 |
| 32 | 71.27 | 70.38 | 72.01 |

Figure 5. The ROC curves of different residual block superposition experiments. [Axes: false-positive rate (x) versus true-positive rate (y). Curves: MSRB-2 (AUC = 0.75), MPRB-2 (AUC = 0.74), MMRB (AUC = 0.8).]

The Comparison With Existing Algorithms
We compare the proposed method with current ASD classification algorithms to test its performance. The compared algorithms are as follows. 1) An ASD classification algorithm based on functional connection networks and recursive cluster elimination SVMs (RCE-SVMs) was put forth by Chaitra et al. [26]. 2) A hybrid ASD classification algorithm [...] and gate-recursive unit-based ASD classification algorithm was reported by Jiang et al. [31]. 9) A GCN was used by Parisot et al. [32] to train ASD detection models in a semisupervised learning setting. The results of the comparison algorithms are taken from the test results provided by the authors in the corresponding references. The test dataset contains data from every site, allowing the calculation of the average accuracy. The test set results of the proposed algorithm and the comparison algorithms are listed in Table 5.
As shown in Table 5, the proposed algorithm achieves 74.67% accuracy in the experiments with 871 subjects. It improves by 7.37% compared to the RCE-SVM,
5.23% compared to the UM_1, and 3–5% compared to the HFR, CNN-MLP, DiagNet, 4D CNN, and USM. In addition, the proposed algorithm obtains 71.9% specificity and 76.91% sensitivity, as listed in Tables 3 and 4. Because the proposed algorithm is evaluated on a large sample, the results are more broadly representative. The subsequent single-site experiments also verify that the algorithm in this article obtains better classification results on the New York University (NYU) site.
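The improvements quoted above are simple differences in percentage points between the accuracy of the proposed model and each baseline in Table 5. The short sketch below recomputes them; the dictionary and function names are illustrative only and are not part of the authors' code.

```python
# Accuracies (%) copied from Table 5 of this article.
TABLE5_ACCURACY = {
    "RCE-SVM": 67.3, "HFR": 71.1, "CNN-MLP": 70.22, "DiagNet": 70.3,
    "4D CNN": 70.49, "UM_1": 69.44, "USM": 69.7, "CNNG": 72.46, "GCN": 69.5,
}
PROPOSED_ACCURACY = 74.67  # P4D-ResNet


def accuracy_gains(proposed, baselines):
    """Gain of the proposed model over each baseline, in percentage points."""
    return {name: round(proposed - acc, 2) for name, acc in baselines.items()}


gains = accuracy_gains(PROPOSED_ACCURACY, TABLE5_ACCURACY)
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:8s} +{gain:.2f} points")
```

Because the baseline figures come from the cited papers rather than from a shared rerun, these differences are indicative rather than a controlled comparison.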
The classification accuracy, sensitivity, and specificity of the proposed algorithm at each of the 17 sites are computed in this study to further explore the per-site classification performance of the model, as shown in Table 6. Table 6 makes it clear that the variability between sites has a significant impact on the final results. Although the Carnegie Mellon University (CMU), SBL, and UM sites have less than 70% classification accuracy, the Kennedy Krieger Institute (KKI), Leuven, MaxMun, and Trinity sites have more than 80% classification accuracy. The varying scanning apparatuses, subject counts, and time dimensions at each site also contribute to variation in the expanded (augmented) data. The noise introduced by this variation makes it more difficult to extract features from the fMRI data to classify disease states.
In addition, the confusion matrices of the 17 sites are given in Figure 6, which shows the proportions of ASD and TC samples that are correctly and incorrectly identified. Figure 6 shows that the percentages of correctly classified ASD and TC samples are both high at the KKI, MaxMun, and Trinity sites, while the ASD and TC recognition rates differ more at the CMU, SBL, San Diego State University (SDSU), and UM sites.
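The three per-site measures discussed above can all be derived from a 2 x 2 confusion matrix. The following minimal sketch uses the standard definitions and assumes, for illustration only, that ASD is treated as the positive class; the counts are made up and do not correspond to any site in Table 6.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, and specificity (all in %) from raw counts.

    tp/fn: positive-class samples classified correctly/incorrectly.
    tn/fp: negative-class samples classified correctly/incorrectly.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),  # true-positive rate
        "specificity": 100.0 * tn / (tn + fp),  # true-negative rate
    }


# Hypothetical counts for a single site (not taken from the article).
example = binary_metrics(tp=20, fn=5, tn=18, fp=7)
print(example)
```

A site can reach a moderate accuracy while sensitivity and specificity diverge widely, which is the pattern the text describes for sites such as CMU and SBL.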
Table 5. The performance of the P4D-ResNet model compared with other algorithms.

| Classification algorithm | Accuracy (%) |
|---|---|
| RCE-SVM | 67.3 |
| HFR | 71.1 |
| CNN-MLP | 70.22 |
| DiagNet | 70.3 |
| 4D CNN | 70.49 |
| UM_1 | 69.44 |
| USM | 69.7 |
| CNNG | 72.46 |
| GCN | 69.5 |
| P4D-ResNet | 74.67 |

Table 6. The classification effect of the algorithm in this article at 17 sites.

| Serial number | Site | Accuracy (%) | Specificity (%) | Sensitivity (%) |
|---|---|---|---|---|
| 1 | Caltech | 70.45 | 70 | 70.83 |
| 2 | CMU | 69.7 | 84.62 | 60 |
| 3 | KKI | 86.36 | 94.74 | 80 |
| 4 | Leuven | 80.36 | 77.5 | 87.5 |
| 5 | MaxMun | 83.33 | 83.33 | 83.33 |
| 6 | NYU | 72.59 | 75 | 70.67 |
| 7 | OHSU | 71.42 | 75 | 66.67 |
| 8 | Olin | 76.67 | 83.33 | 72.22 |
| 9 | Pitt | 78.72 | 82.35 | 76.67 |
| 10 | SBL | 69.44 | 73.33 | 50 |
| 11 | SDSU | 80 | 60 | 87.14 |
| 12 | Stanford | 72.46 | 70 | 73.47 |
| 13 | Trinity | 85.42 | 87.5 | 84.38 |
| 14 | UCLA | 70 | 77.78 | 63.63 |
| 15 | UM | 69.53 | 81.82 | 62.78 |
| 16 | USM | 75.89 | 75.71 | 76.19 |
| 17 | Yale | 80.95 | 76.19 | 85.71 |

Caltech: California Institute of Technology; CMU: Carnegie Mellon University; KKI: Kennedy Krieger Institute; Leuven: University of Leuven; MaxMun: University of Munich; NYU: New York University Langone Medical Center; OHSU: Oregon Health and Science University; Olin: Olin Institute of Living at Hartford Hospital; Pitt: University of Pittsburgh School of Medicine; SBL: Social Brain Lab; SDSU: San Diego State University; Stanford: Stanford University; Trinity: Trinity Centre for Health Sciences; UCLA: University of California, Los Angeles; UM: University of Michigan; USM: University of Utah School of Medicine; Yale: Yale Child Study Center.

Figure 6. The confusion matrices of 17 sites. (a) Caltech. (b) CMU. (c) KKI. (d) Leuven. (e) MaxMun. (f) NYU. (g) OHSU. (h) Olin. (i) Pitt. (j) SBL. (k) SDSU. (l) Stanford. (m) Trinity. (n) UCLA. (o) UM. (p) USM. (q) Yale. [Each panel is a 2 x 2 matrix over the ASD and TC classes, with entries in percent.]

Conclusion and Future Work
In this study, the P4D-ResNet deep learning model was proposed for the simultaneous extraction of spatiotemporal information. Instead of using 4D spatiotemporal convolution, we employed spatial and temporal convolutions and built a mixed residual model to extract richer spatiotemporal feature information. This study conducted an enhancement operation on fMRI data, taking into account the constraints of the current data volume and the sample size necessary for deep learning models. To evaluate the performance of the model, we conducted ablation experiments on the proposed algorithm. Additionally, by comparing the method in this article with current ASD classification algorithms, the proposed algorithm's efficacy was confirmed. We also calculated the ASD classification accuracy, sensitivity, and specificity at 17 sites and assessed the effect of site variability on the results. However, several issues remain to be considered in the future. First, we used only the functional imaging modality, without considering the structural imaging modalities related to brain states. In the future, we will integrate both functional and structural modalities to train our model for ASD identification. Second, we treated ASD diagnosis as a binary classification problem. However, ASD is divided into eight categories in the latest edition of ICD-11, published by the World Health Organization. Therefore, we will seek to build a multiclass classifier. In addition, the deep learning model is like a black box, and it is difficult to achieve physiological interpretation. We will continue to explore interpretation methods suitable for the model.
Acknowledgment
This work was supported in part by the National Natural Science Foundation of China under grant 62172139, the Natural
Science Foundation of Hebei Province under grant
F2022201055, and the Science Research Project of Hebei
Province under grant BJ2020030. The project was also funded by the China Postdoctoral under grant 2022M713361, the Natural Science Interdisciplinary Research Program of Hebei University under grant DXK202102, the Research Project of Hebei University Intelligent Financial Application Technology R&D Center under grant XGZJ2022022, the Open Project Program of the National Laboratory of Pattern Recognition under grant 202200007, and the Open Foundation of Guangdong Key Laboratory of Digital Signal and Image Processing Technology under grant 2020GDDSIPL-04. This work was also
supported by the High-Performance Computing Center of
Hebei University. Jingwen Yan is the corresponding author.
About the Authors
Shuaiqi Liu (shdkj-1918@163.com) earned his Ph.D.
degree from the Institute of Information Science, Beijing
Jiaotong University, in 2014. He is a professor at the College
of Electronic and Information Engineering, Hebei University, Baoding 071002, China. His research interests include
image processing and signal processing.
Siqi Wang (sqwang_hbu@163.com) earned her B.S.
degree from the College of Electronic and Information
Engineering, Hebei University, Baoding, China, in 2021.
She is currently pursuing her M.S. degree at the College
of Electronic and Information Engineering, Hebei University, 071002 Baoding, China. Her research interests
include computer vision and image processing.
Hong Zhang (hzhang_hbu@163.com) earned her
B.S. degree from the College of Information Engineering, Yanshan University, Qinhuangdao, China, in 2019.
She is currently pursuing her M.S. degree at the College
of Electronic and Information Engineering, Hebei University, 071002 Baoding, China. Her research interests
include computer vision and image processing.
Shui-Hua Wang (shuihuawang@ieee.org) earned her
Ph.D. degree in electrical engineering from Nanjing University in 2017. She was a professor in the School of Computer
Science and Technology, Henan Polytechnic University,
454000 Jiaozuo, China. She also served as a research
associate at Loughborough University from 2018 to 2019.
Her research interests include machine learning and biomedical image processing.
Jie Zhao (jzhao_hbu@163.com) earned his Ph.D.
degree in optics from the State Key Laboratory of
Applied Optics, Changchun Institute of Fine Mechanics
and Optics, Academia Sinica, Changchun, China, in
1997. He is a professor in the Department of Electronic
Engineering, University of Shantou, 515063 Shantou,
China. His current research interests include SAR
image processing, hyper-wavelet transforms, and compressed sensing.
Jingwen Yan (jwyan@stu.edu.cn) is with the School of
Engineering, Shantou University, 515063 Shantou, China.
References
[1] C. M. Michel, M. M. Murray, G. Lantz, S. Gonzalez, L. Spinelli, and R. Grave de Peralta, “EEG source imaging,” Clin. Neurophysiol., vol. 115, no. 10, pp. 2195–2222, Oct. 2004, doi: 10.1016/j.clinph.2004.06.001.
[2] S. Liu et al., “3DCANN: A spatio-temporal convolution attention neural network for EEG emotion recognition,” IEEE J. Biomed. Health Inform., vol. 26, no. 11, pp. 5321–5331, Nov. 2022, doi: 10.1109/JBHI.2021.3083525.
[3] E. Moradi, A. Pepe, C. Gaser, H. Huttunen, and J. Tohka, “Machine learning framework for early MRI-based Alzheimer’s conversion prediction in MCI subjects,” NeuroImage, vol. 104, pp. 398–412, Jan. 2015, doi: 10.1016/j.neuroimage.2014.10.002.
[4] S. Liu, C. Zhao, Y. An, P. Li, J. Zhao, and Y. Zhang, “Diffusion tensor imaging denoising based on Riemannian geometric framework and sparse Bayesian learning,” J. Med. Imag. Health Inform., vol. 9, no. 9, pp. 1993–2003, Dec. 2019, doi: 10.1166/jmihi.2019.2832.
[5] S. Liu, L. Zhao, J. Zhao, B. Li, and S.-H. Wang, “Attention deficit/hyperactivity disorder Classification based on deep spatio-temporal features of functional Magnetic Resonance Imaging,” Biomed. Signal Process. Control, vol. 71, Jan. 2022, Art. no. 103239, doi: 10.1016/j.bspc.2021.103239.
[6] A. Kastrup, G. Kruger, G. H. Glover, and M. E. Moseley, “Assessment of cerebral oxidative metabolism with breath holding and fMRI,” Magn. Reson. Med., vol. 42, no. 3, pp. 608–611, Sep. 1999, doi: 10.1002/(SICI)1522-2594(199909)42:3<608::AID-MRM26>3.0.CO;2-I.
[7] E. Kirino, S. Tanaka, Y. Nagai, A. Hattori, and S. Aoki, “S1-3 Functional connectivity in autism spectrum disorder evaluated using rs-fMRI and DKI,” Clin. Neurophysiol., vol. 131, no. 10, pp. e244–e245, Oct. 2020, doi: 10.1016/j.clinph.2020.04.062.
[8] J. F. Agastinose Ronicko, J. Thomas, P. Thangavel, V. Koneru, G. Langs, and J. Dauwels, “Diagnostic classification of autism using resting-state fMRI data improves with full correlation functional brain connectivity compared to partial correlation,” J. Neurosci. Methods, vol. 345, Nov. 2020, Art. no. 108884, doi: 10.1016/j.jneumeth.2020.108884.
[9] M. Wang, J. Huang, M. Liu, and D. Zhang, “Modeling dynamic characteristics of brain functional connectivity networks using resting-state functional MRI,” Med. Image Anal., vol. 71, Jul. 2021, Art. no. 102063, doi: 10.1016/j.media.2021.102063.
[10] T. Iidaka, “Resting state functional magnetic resonance imaging and neural network classified autism and control,” Cortex, vol. 63, pp. 55–67, Feb. 2015, doi: 10.1016/j.cortex.2014.08.011.
[11] X. Bi, Y. Liu, Q. Jiang, Q. Shu, Q. Sun, and J. Dai, “The diagnosis of autism spectrum disorder based on the random neural network cluster,” Frontiers Hum. Neurosci., vol. 12, Jun. 2018, Art. no. 257, doi: 10.3389/fnhum.2018.00257.
[12] S. Mostafa, L. Tang, and F. X. Wu, “Diagnosis of autism spectrum disorder based on eigenvalues of brain networks,” IEEE Access, vol. 7, pp. 128474–128486, Sep. 2019, doi: 10.1109/access.2019.2940198.
[13] J. Liu, Y. Sheng, W. Lan, R. Guo, Y. Wang, and J. Wang, “Improved ASD classification using dynamic functional connectivity and multi-task feature selection,” Pattern Recognit. Lett., vol. 138, pp. 82–87, Oct. 2020, doi: 10.1016/j.patrec.2020.07.005.
[14] F. Zhao, Z. Chen, I. Rekik, S.-W. Lee, and D. Shen, “Diagnosis of autism spectrum disorder using central-moment features from low- and high-order dynamic resting-state functional connectivity networks,” Frontiers Neurosci., vol. 14, Apr. 2020, Art. no. 258, doi: 10.3389/fnins.2020.00258.
[15] Y. Wu et al., “JCS: An explainable COVID-19 diagnosis system by joint classification and segmentation,” IEEE Trans. Image Process., vol. 30, pp. 3113–3126, Feb. 2021, doi: 10.1109/TIP.2021.3058783.
[16] K. Fu, D. Fan, G. Ji, Q. Zhao, J. Shen, and C. Zhu, “Siamese network for RGB-D salient object detection and beyond,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 9, pp. 5541–5559, Sep. 2022, doi: 10.1109/TPAMI.2021.3073689.
[17] Q. Hu, S. Hu, and S. Liu, “BANet: A balance attention network for anchor-free ship detection in SAR images,” IEEE Trans. Geosci. Remote Sens., vol. 60, pp. 1–12, Jan. 2022, doi: 10.1109/TGRS.2022.3146027.
[18] Z. Xiao, C. Wang, N. Jia, and J. Wu, “SAE-based classification of school-aged children with autism spectrum disorders using functional magnetic resonance imaging,” Multimedia Tools Appl., vol. 77, no. 17, pp. 22809–22820, Sep. 2018, doi: 10.1007/s11042-018-5625-1.
[19] N. Jia, J. Tan, Z. Xiao, Z. Qi, and J. Wu, “Classification of autism spectrum disorder based on brain functional connectivity and SAE,” J. Nanchang Univ. (Natural Sci.), vol. 42, no. 4, pp. 399–403, Aug. 2018, doi: 10.13764/j.cnki.ncdl.2018.04.017.
[20] A. Rathore, S. Palande, J. S. Anderson, B. A. Zielinski, P. T. Fletcher, and B. Wang, “Autism classification using topological features and deep learning: A cautionary tale,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2019, pp. 736–744, doi: 10.1007/978-3-030-32248-9_82.
[21] J. Zhuang, N. C. Dvornek, X. Li, P. Ventola, and J. S. Duncan, “Invertible network for classification and biomarker selection for ASD,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2019, pp. 700–708, doi: 10.1007/978-3-030-32248-9_78.
[22] M. Tang, P. Kumar, H. Chen, and A. Shrivastava, “Deep multimodal learning for the diagnosis of autism spectrum disorder,” J. Imag., vol. 6, no. 6, p. 47, Jun. 2020, doi: 10.3390/jimaging6060047.
[23] L. Shao, C. Fu, Y. You, and D. Fu, “Classification of ASD based on fMRI data with deep learning,” Cogn. Neurodynamics, vol. 15, no. 6, pp. 961–974, Dec. 2021, doi: 10.1007/s11571-021-09683-0.
[24] W. Yin, S. Mostafa, and F. Wu, “Diagnosis of autism spectrum disorder based on functional brain networks with deep learning,” J. Comput. Biol., vol. 28, no. 2, pp. 146–165, Feb. 2021, doi: 10.1089/cmb.2020.0252.
[25] B. Lullo. “Autism Brain Imaging Data Exchange I ABIDE I.” ABIDE. Accessed: Jun. 24, 2016. [Online]. Available: https://fcon_1000.projects.nitrc.org/indi/abide/abide_I.html
[26] N. Chaitra, P. A. Vijaya, and G. Deshpande, “Diagnostic prediction of autism spectrum disorder using complex network measures in a machine learning framework,” Biomed. Signal Process. Control, vol. 62, Sep. 2020, Art. no. 102099, doi: 10.1016/j.bspc.2020.102099.
[27] M. Graña and M. Silva, “Impact of machine learning pipeline choices in autism prediction from functional connectivity data,” Int. J. Neural Syst., vol. 31, no. 4, p. 2150009, Apr. 2021, doi: 10.1142/s012906572150009x.
[28] Z. Sherkatghanad, M. Akhondzadeh, S. Salari, M. Zomorodi, and V. Salari, “Automated detection of autism spectrum disorder using a convolutional neural network,” Frontiers Neurosci., vol. 13, Jan. 2020, Art. no. 1325, doi: 10.3389/fnins.2019.01325.
[29] T. Eslami, V. Mirjalili, A. Fong, A. R. Laird, and F. Saeed, “ASD-DiagNet: A hybrid learning approach for detection of autism spectrum disorder using fMRI data,” Frontiers Neuroinformatics, vol. 13, Nov. 2019, Art. no. 70, doi: 10.3389/fninf.2019.00070.
[30] L. Guo et al., “Classification of the functional magnetic resonance image of autism based on 4D convolutional neural network,” CAAI Trans. Intell. Syst., vol. 16, no. 6, pp. 1021–1029, Nov. 2021, doi: 10.11992/tis.202009022.
[31] W. Jiang et al., “CNNG: A convolutional neural networks with gated recurrent units for autism spectrum disorder classification,” Frontiers Aging Neurosci., vol. 14, Jul. 2022, Art. no. 948704, doi: 10.3389/fnagi.2022.948704.
[32] S. Parisot, S. I. Ktena, E. Ferrante, M. Lee, and D. Rueckert, “Spectral graph convolutions for population-based disease prediction,” in Proc. Int. Conf. Med. Image Comput. Comput. Assisted Intervention (MICCAI), Cham, Switzerland: Springer-Verlag, 2017, pp. 177–185, doi: 10.1007/978-3-319-66179-7_21.
Tooth.AI
Intelligent Dental Disease Diagnosis and Treatment
Support Using Semantic Network
by Hossam A. Gabbar, Abderrazak Chahid, Md. Jamiul Alam Khan, Oluwabukola Grace Adegboro, and Matthew Immanuel Samson
The emerging fourth industrial revolution (Industry 4.0) is leading the healthcare system toward more digitalization and smart management. For instance, recent digital healthcare solutions can help dentists/practitioners save time by managing their schedules as well as diagnosis and treatment. The proposed solution is a diagnostic module that can be integrated into existing dental software. This module is based on artificial intelligence (AI) that allows the diagnosis of X-ray images/volumes and helps in the early detection and diagnosis of oral health diseases. The solution presents a smart and automated assistive platform to aid dental practitioners in identifying underlying tooth diseases and to assist doctors with treatment suggestions.
Digital Object Identifier 10.1109/MSMC.2023.3245814
Date of current version: 17 July 2023
2333-942X/23©2023IEEE
Introduction
According to the Global Burden of Disease 2010 study, dental and oral diseases affect people worldwide: around 35% suffer from untreated decay (caries) of permanent teeth, 11% have severe periodontal (gum) disease, and 2% have tooth loss. Oral health diseases arise from different factors, such as a lack of resources and oral hygiene habits. Such diseases may cause the loss of all natural
teeth, which can lead to changes in eating patterns, nutrient deficiency, and involuntary weight loss, as well as speech difficulty (if left uncorrected). A report on the state of oral health in Canada found that the government’s main challenge is providing the required oral health care to the most vulnerable segments of its population (e.g., low-income groups, indigenous peoples, people with special needs, children, and new immigrants with refugee status) [1]. Figure 1 shows the percentage of Canadians, by age group, who consulted a dentist or orthodontist in 2012, based on the Canadian Community Health Survey.
Time lost to dental problems and treatment is estimated at over 40 million hours annually, and the associated economic loss was estimated at US$442 billion worldwide in 2010 (see Table 1 for more details). It is crucial to design preventive healthcare solutions to help improve the oral health system and reduce economic loss.
Figure 1. Percentage of Canadians aged 12 years and over who consulted with a dentist or orthodontist in 2012. Source: Statistics Canada, Canadian Community Health Survey (CCHS), 2012.
In addition, tooth-related diseases might result from skull/mouth geometry abnormalities. In some cases, surgical interventions are needed to correct the deformation and restore healthy teeth. Such skeletal abnormalities are usually diagnosed using cephalometric analysis, which checks the position of key locations, called landmarks, against their normal positions. Therefore, it is crucial to design preventive healthcare solutions that integrate skeletal and dental diagnosis to help improve the oral health system and reduce treatment expenses. Many studies demonstrate that preventive healthcare solutions are cost-effective, with substantial economic benefits in terms of reduced treatment costs and decreased productivity losses in the labor market.
Most existing dental and skeletal software provides independent diagnosis and/or treatment solutions with data management and appointment schedulers. The systems available on the market can be divided into two main categories. First, hardware-based solutions provide the dental and skeletal scanners used for data acquisition and for capturing the medical recordings used for diagnosis. The scanners use different imaging technologies, such as X-ray computed tomography (CT) and intraoral cameras using near-infrared imaging (NiRi). These solutions mainly provide doctors with raw and/or enhanced medical images for manual diagnosis, for example, iTero [2], Carestream Dental [3], and GO [4]. They allow fast scan times, with additional postprocessing phases to reduce the risk of motion blur, and limit exposure time to keep radiation minimal. Some of these solutions allow advanced postprocessing, such as automatic cephalometric tracing, superimposition, image reporting, and surgical simulation using a visual treatment objective.
Table 1. Potential productivity losses due to dental problems and treatment at the individual and societal level [1].

| Occupation Classification | Mean Hours Lost | Potential Individual Losses ($) | Potential Societal Losses ($) |
|---|---|---|---|
| Management | 2.9 | 108.16 | 104,287,872 |
| Business, finance, and administrative | 3.8 | 85.15 | 239,109,715 |
| Natural and applied sciences and related occupations | 2.9 | 95.17 | 103,278,484 |
| Health occupations | 3.6 | 97.44 | 97,790,784 |
| Occupations in social science, education, government service, and religion | 3.7 | 112.51 | 165,333,445 |
| Occupations in art, culture, recreation, and sport | 3.9 | 91.67 | 33,212,041 |
| Sales and service occupations | 3.1 | 58.12 | 220,857,664 |
| Trades, transport, and equipment operators and related occupations | 2.8 | 64.31 | 131,064,967 |
| Occupations unique to primary industry | 3.3 | 76.39 | 16,439,128 |
| Occupations unique to processing, manufacturing, and utilities | 2.2 | 42.96 | 32,232,888 |
These software solutions include but are not limited to Cephx [5], Planmeca [6], FACAD [7], OrisCeph Rx [8], AudaxCeph [9], Carestream Dental [3], and DolphinCeph Tracing [10]. Most of the advanced analysis is performed using deep learning-based models (classification, semantic segmentation, landmark detection, etc.) [11] or knowledge-based techniques (generative programming, pattern detection, etc.) [12]. However, the deep learning-based features provided by this software still depend on the initial data used for training. This limits their flexibility across varied patient profiles and makes it challenging to ensure the generalizability of the trained models. Moreover, some factors, such as age, gender, and existing medical conditions, are not considered in model training.
In this article, we propose a smart and automated solution that combines dental and skeletal diagnosis based on deep learning techniques. In addition, it assists doctors in treatment suggestions, taking into consideration the patient profile parametrized by their medical records and previous disease treatment history. The rest of this article is organized as follows. The “Proposed Solution” section describes the proposed Tooth.AI framework with the different diagnosis and treatment modules. The “Results and Discussion” section presents the obtained results of a case study. The “Knowledge Translation” section explores the knowledge transfer plan to take our solution to the next stage of public usage. The “Novelty and Anticipated Impact” section presents a summary of our contribution and novelty with concluding remarks.
Figure 2. The general framework of the proposed solution. DSN, dental semantic network. [Diagram blocks: input medical data (patient information: age, sex, gender, health condition, previous treatment; CT image/3D volume); annotated dataset (DSN) holding patient information and verified diagnosis/treatment; dental diagnosis (tooth structure segmentation, caries detection and characterization, prediction of future dental complications, justified diagnostic report); treatment database (previous successful disease treatments, patients with similar profile/disease); treatment suggestion (the most relevant successful treatment protocol); incremental learning through expert feedback and database updates.]
Proposed Solution
There has been a massive amount of X-ray data and
cumulative knowledge from dental radiologists and
experts over the last few decades. They have identified
thousands of pathological changes and traces of previous
dental treatment on X-rays worldwide. Our solution will
offer an integrated computer vision and knowledge-based
system to extract diagnostic information from input medical images/volumes collected from dental exams using CT
scanners. The proposed research solution is to design an
intelligent preventive system named Tooth.AI to detect
and diagnose skeletal and dental diseases. It aims to provide a real-time inspection of the teeth and skull geometry
and simulate the future development of the disease in the
case of no treatment and suggest suitable treatments for
the patient (Figure 2).
Dental Diagnosis
The integrated dental diagnosis component of this solution will support the detection and diagnosis of vertical root fractures, assessment of root morphologies, determination of the working length of the tooth, location of the apical foramen, retreatment predictions, and prediction of periapical pathologies. The medical images will be collected from open-access resources/data sets and from our collaborating dentists in Toronto and internationally. The developed deep learning segmentation techniques will identify the tooth’s structure (see Figure 3). In addition, they will classify its health condition (healthy tooth or tooth with caries).
The extracted knowledge will be accumulated with deterministic and probabilistic parameters in the dental semantic network (DSN), which will be dynamically updated using expert feedback. The collaboration with experts will allow our team to annotate the medical data, evaluate the performance of the developed algorithm on different patients, and validate the deployment of the proposed solution in clinical case studies. The case studies will include patients of different sex/gender from different communities and regions across Canada to give a good representation of the dental healthcare data with varying health conditions. All of these factors will be investigated in relation to the dental treatment process. The proposed computer vision-based approach will process 2D images/scans, which will be mapped into 3D digital form. Tooth.AI will mainly analyze cone-beam CT images/volumes because they provide a better understanding of the mouth/teeth morphology. In addition, we use 2D images collected from the nonradiative intraoral camera using NiRi technology.
Figure 3. Illustration of the tooth structure (source [13]). [Labeled parts: enamel, dentine, pulp, gum line, crown, neck, root, alveolar bone.]
Skeletal Diagnosis
The second integrated component of Tooth.AI will support dentists, orthodontists, and oral surgeons in cephalometric analysis and help them to understand the dental and skeletal relationships in the human skull. They will be able to plan the treatment correctly and accurately in less time. Tooth.AI will reduce the manual examination of X-ray images by automatically identifying landmarks using the preprocessed knowledge in the DSN. It will visualize an integrated view with landmarks and possible diagnoses or issues based on stored expertise, as shown in Figure 2. Tooth.AI will provide details about patient diagnoses of dental and skeletal abnormalities and propose a possible treatment plan. Tooth.AI will offer an automated process with a human in the loop, working fully automatically or with human intervention based on user preferences and configurations. The proposed techniques within Tooth.AI for cephalometric landmark detection are based on state-of-the-art methods, which fall into two main categories, as shown in Figure 4.
The latest techniques for cephalometric landmark detection and dental disease detection using the latest deep learning algorithms produce results comparable to human examiners [14]. For instance, very encouraging results were achieved in landmark detection, with point-to-point errors of less than 2 mm relative to ground truth positions [15], [16], [17], [18], [19], [20]. In addition, other types of methods are used to search for landmarks, such as shape models [21], resampling in conjunction with convolutional neural networks (CNNs) [22], CNNs for regression analysis of cephalometric coordinates [23], and various others.
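The 2-mm point-to-point criterion mentioned above is commonly summarized with two numbers: the mean radial error and the share of landmarks detected within a distance threshold. The sketch below illustrates that computation under stated assumptions: the function names, the `mm_per_pixel` scale factor, and the sample coordinates are hypothetical and are not taken from Tooth.AI or the cited works.

```python
import math


def radial_errors(predicted, ground_truth, mm_per_pixel=1.0):
    """Point-to-point distance (in mm) between each predicted and true landmark."""
    return [math.dist(p, g) * mm_per_pixel for p, g in zip(predicted, ground_truth)]


def mean_radial_error(errors):
    """Average point-to-point error over all landmarks."""
    return sum(errors) / len(errors)


def success_detection_rate(errors, threshold_mm=2.0):
    """Percentage of landmarks whose error is within the threshold."""
    return 100.0 * sum(e <= threshold_mm for e in errors) / len(errors)


# Hypothetical landmark coordinates, already expressed in millimeters.
predicted = [(10.0, 10.0), (20.0, 20.0), (33.0, 44.0)]
ground_truth = [(10.0, 11.0), (20.0, 20.0), (30.0, 40.0)]

errors = radial_errors(predicted, ground_truth)
print(mean_radial_error(errors), success_detection_rate(errors))
```

Reporting both values is useful because a low mean error can hide a few badly misplaced landmarks, which the threshold-based rate exposes.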
However, we need to go beyond landmark detection and suggest a suitable treatment based on the previous successful treatments of similar patient profiles. We propose developing a fully integrated toolbox for automatic analysis of X-ray images, detection of abnormalities or diseases, and help in treatment planning. The system would have a proper data management system to input patient data. The landmarks would then be identified with a trained deep learning model. In detecting landmarks, we propose investigating the effects of factors such as age, gender, and noisy data. The proposed system would analyze the landmark data, computing the angles and distances necessary for the diagnosis. By combining the computed results and the previous experts’ treatments, the system would suggest the presence of abnormalities or diseases and suggest treatment planning. The collaboration with dentists will enable the team to annotate the diagnosis images and link them to diseases. It will also provide detailed inputs to label images for skeletal analysis to support the planning of surgical modifications. The main toolbox will be directed to clinical use using X-ray CT images and 3D volumes. The proposed algorithms will be further validated on nonradiative data using our laboratory setup based on an intraoral camera [24]. The workflow of the proposed solution is shown in Figure 2.
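The angle and distance computations mentioned above reduce to elementary geometry once landmark coordinates are available. The following sketch shows one way to compute them; the landmark labels S, N, and A follow common cephalometric usage, but the coordinates, function names, and scale factor are illustrative assumptions rather than part of the proposed system.

```python
import math


def angle_deg(a, vertex, b):
    """Unsigned angle (degrees) at `vertex` between the rays toward a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.degrees(math.atan2(cross, dot)))


def distance_mm(p, q, mm_per_pixel=1.0):
    """Distance between two landmarks, scaled from pixels to millimeters."""
    return math.dist(p, q) * mm_per_pixel


# Hypothetical 2D landmark positions (in mm) on a lateral cephalogram.
landmarks = {"S": (0.0, 0.0), "N": (70.0, 0.0), "A": (65.0, 55.0)}

# Angle at N between S and A, and the S-to-N distance.
sna = angle_deg(landmarks["S"], landmarks["N"], landmarks["A"])
s_to_n = distance_mm(landmarks["S"], landmarks["N"])
print(f"angle S-N-A: {sna:.1f} degrees, S-N distance: {s_to_n:.1f} mm")
```

Using `atan2` on the cross and dot products avoids the loss of precision that `acos` suffers for nearly collinear landmarks.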
Enabled Incremental Learning
Deep learning-based diagnosis has shown a remarkable ability to achieve high accuracy, even comparable to that of expert practitioners. However, this cannot be guaranteed if the models are trained on a small data set or on data sets that have little variability and do not represent most samples. In addition, medical data have additional constraints related to privacy and ethics restrictions. Therefore, it becomes highly challenging to access a labeled data set of sufficient size and variability. Furthermore, data labeling presents a second challenge, as this type of labeling (dental and skeletal diagnosis) is subject to each expert’s experience and daily practices. Therefore, it is vital to design a system that can use the doctor’s diagnosis and treatment and convert them into a standardized annotated data set. This will boost the deep learning model and help build a comprehensive knowledge base that can be transferred to other doctors and healthcare systems. The incremental learning framework presents a solution to this problem as follows:
◆ Gradually build an annotated data set from the daily practice of doctors.
◆ Centralize the knowledge base from different experts and build generalizable models.
◆ Enable transfer learning by using these pretrained models for another similar disease while preserving patient data privacy.
DSN
During the medical treatment journey of a specific disease, the doctors create a treatment file describing the diagnosis procedure and record the prescribed treatment and the evaluation of its efficiency during the follow-up sessions. In this article, the different data collected during this treatment journey are structured into a semantic network database that includes the patient health condition, disease and treatment history, etc. All patients’ treatment journeys are therefore put together and grouped into different nodes: patients (denoted P), tooth diseases (denoted D-T), gum diseases (denoted D-G), and their corresponding treatments (T-T, T-G). These nodes are associated by their relationships; e.g., the patient (P1) is affected by the disease (D-T1), which is treated with the treatment (T-T2). The patient nodes are linked with a weighted edge
Figure 4. Categorization of landmark identification techniques (source [12]). [Knowledge-based techniques: edge and pattern detection, genetic programming, and models (active shape and active appearance models). AI-based techniques: machine learning (random forest, regression, support vector machines, decision tree, linear affine and linear principal component) and deep learning (convolutional neural network, pulse-coupled neural network, cellular neural network).]
Ju ly 2023
IEEE SYSTEMS, MAN, & CYBERNETICS MAGAZINE
23
defining their similarity. This similarity is computed as the covariance of patients, retrieved from the semantic network edge, considering their health conditions and disease/treatment history. Figure 5 shows an illustration of converting regular data into semantic network-based data.
Figure 5. Illustration of the treatment suggestion process. Node labels: P1: John and P2: Jamiul (patients); Gingivitis and Caries (diseases); ChlorHexidine and Root Canal Treatment (treatments). Edge labels: Affected By; Treated With (Success or Failure).
Results and Discussion
In this section, a case study is presented to show an example of the results obtained using the proposed framework. It explores a scenario of deploying the proposed Tooth.AI system for teeth diagnosis and skull landmark detection and shows how this diagnosis report can be used to update the semantic network and suggest a suitable treatment.
Teeth Diagnosis
The panoramic dental data set used consists of 1,000 radiography images, where the corresponding masks localize the different teeth [24]. These data are first used to train a segmentation model. The segmented teeth are then cropped and used to generate a second data set for classification. The classification model is deployed to distinguish three tooth classes: healthy, unhealthy, and treated (with filling). The cascaded models help in diagnosing each tooth separately, as shown in Figure 6. In training, the segmentation model achieves an intersection over union (IOU) of up to 0.79. Similarly, the classification model achieves an accuracy of 0.95. Figure 7 presents the training and validation performance of both models.
Figure 6. Example of teeth diagnosis: healthy teeth (green), unhealthy (red), treated (blue).
Figure 7. Teeth segmentation and classification performance: (a) teeth segmentation IOU (training and validation IOU versus epochs); (b) teeth classification accuracy (training and validation accuracy versus epochs).
Skeletal Landmark Detection
The cephalograms data set [25] consists of 400 lateral cephalogram images of 400 different subjects, whose ages are between 7 and 76. Each image of the data set is annotated with 19 landmarks, as presented in Figure 8. For landmark detection, a deep learning model based on CNN architecture is deployed. Figure 9 shows the predicted landmark points. The obtained results will be used to compute the different clinical measurements needed to characterize the skull shape and extract the anomalies.
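The clinical measurements mentioned above are typically angles and distances between landmarks. The following is a minimal sketch, not the authors' implementation, of how such measurements could be derived from predicted landmark coordinates. The pixel coordinates and the 0.1 mm-per-pixel scale are assumptions chosen for illustration; the SNA angle (the angle at nasion between sella and subspinale) is a standard cephalometric measure used here as an example.

```python
import math


def angle_at(vertex, p1, p2):
    """Return the angle in degrees at `vertex` between the rays to p1 and p2."""
    v1 = (p1[0] - vertex[0], p1[1] - vertex[1])
    v2 = (p2[0] - vertex[0], p2[1] - vertex[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("landmarks must differ from the vertex")
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_a = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_a))


def distance_mm(p1, p2, mm_per_pixel):
    """Return the Euclidean distance between two landmarks in millimeters."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) * mm_per_pixel


# Hypothetical predicted landmark positions in pixels (x, y).
landmarks = {
    "sella": (812.0, 1012.0),
    "nasion": (1350.0, 905.0),
    "subspinale": (1320.0, 1420.0),
}

# SNA: angle at nasion between sella and subspinale.
sna = angle_at(landmarks["nasion"], landmarks["sella"], landmarks["subspinale"])
# Sella-to-nasion length, assuming an image scale of 0.1 mm per pixel.
s_n = distance_mm(landmarks["sella"], landmarks["nasion"], mm_per_pixel=0.1)
print(f"SNA = {sna:.1f} degrees, S-N = {s_n:.1f} mm")
```

Comparing such values against reference ranges would be one way to flag anomalies; the ranges themselves would have to come from the clinical experts.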
Landmarks annotated in Figure 8:
L1 Sella
L2 Nasion
L3 Orbitale
L4 Porion
L5 Subspinale
L6 Supramentale
L7 Pogonion
L8 Menton
L9 Gnathion
L10 Gonion
L11 Lower Incisal Incision
L12 Upper Incisal Incision
L13 Upper Lip
L14 Lower Lip
L15 Subnasale
L16 Soft Tissue Pogonion
L17 Posterior Nasal Spine
L18 Anterior Nasal Spine
L19 Articulare
Figure 8. Cephalogram annotation example showing the 19 landmarks (source [17]).
Figure 9. Example of skeletal landmark locations detection: predicted landmarks (green) and ground-truth landmarks (red).
Treatment Suggestion Using the Semantic Network
The diagnostic reports generated by the dental and skeletal modules will be used to recognize the disease and suggest the appropriate treatment. In this work, a list of diseases and treatments is compiled from the medical literature to build the initial database needed for treatment suggestions [26], [27], [28]. The suggestion of treatment for a specific patient has three levels. First, the system suggests the most recent successful treatment if the patient was previously treated for the same disease. Second, if the patient was not affected by the disease before, the system will suggest the successful treatment of the most similar patient from the database. Third, if neither is available, the system will suggest the most commonly used treatment for the diagnosed disease.
The generated system diagnosis and the suggested treatment are then added to the semantic network, creating additional nodes if applicable. Figure 10 presents two examples of adding new nodes to the semantic network.
Knowledge Translation
The collaborating partner dentists from Canada and international clinics will provide sample images (with consent) and diagnosis and treatment data, which will support the research team in building training data and associated analysis. The interviews with expert dentists and dental data providers will offer expertise in the validation and analysis of images, diagnosis, and treatment details, which will be transferred to the research team. We expect to obtain medical data from 20 patients each year. In addition, we will conduct around 28 interview sessions with dentists and experts to annotate the collected data and get their opinion about the algorithms, approach, and integrated solutions. Thanks to the interaction with experts and practitioners, the proposed toolbox is equipped with an interactive user interface. Thus, the experts can correct the wrong predictions of the AI models to boost their performance. Moreover, we propose handling the lack of an annotated data set by developing an incremental model training framework that keeps updating the annotated data from recent interactions with the expert. All of these interactions between the toolbox and the expert will be saved to the database and used to further train the deep learning models. We will communicate with the Canadian Dental Association to get more views and expertise on our solution and potential implementation guidelines. Our team will communicate with the Canadian Dental Regulatory Authorities Federation to gain experience in the application of automation in view of the regulatory framework.
Figure 10. Example of the semantic networks after new patient–disease–treatment augmentation: (a) small DSN; (b) larger DSN. Legend in both panels: disease nodes, patient nodes, treatment nodes, patient ID, zoom area.
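The three-level treatment suggestion and the network update illustrated in Figure 10 can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the record layout, the similarity table, and the function names `suggest_treatment` and `add_record` are assumptions, and all records are invented placeholders that reuse the article's node labels (P, D-T, D-G, T-T, T-G).

```python
from collections import Counter

# Hypothetical records in chronological order:
# (patient, disease, treatment, outcome).
records = [
    ("P1", "D-T1", "T-T2", "success"),
    ("P2", "D-G1", "T-G1", "success"),
    ("P3", "D-G1", "T-G2", "success"),
    ("P2", "D-T2", "T-T1", "failure"),
    ("P3", "D-T2", "T-T1", "failure"),
    ("P1", "D-T2", "T-T3", "failure"),
]

# Hypothetical weighted edges between patient nodes (similarity scores).
similarity = {
    frozenset(("P1", "P2")): 0.6,
    frozenset(("P1", "P3")): 0.2,
    frozenset(("P2", "P3")): 0.4,
}


def suggest_treatment(patient, disease, records, similarity):
    """Return (treatment, level) following the three suggestion levels."""
    # Level 1: most recent successful treatment of this patient for this disease.
    for p, d, t, outcome in reversed(records):
        if p == patient and d == disease and outcome == "success":
            return t, 1
    # Level 2: successful treatment of the most similar patient.
    candidates = [
        (similarity.get(frozenset((patient, p)), 0.0), t)
        for p, d, t, outcome in records
        if p != patient and d == disease and outcome == "success"
    ]
    if candidates:
        score, treatment = max(candidates, key=lambda c: c[0])
        if score > 0.0:  # assumption: ignore patients with no known similarity
            return treatment, 2
    # Level 3: most commonly used treatment for the disease.
    counts = Counter(t for _, d, t, _ in records if d == disease)
    if counts:
        return counts.most_common(1)[0][0], 3
    return None, 0


def add_record(records, patient, disease, treatment, outcome):
    """Append a new patient-disease-treatment link to the network records."""
    records.append((patient, disease, treatment, outcome))


print(suggest_treatment("P1", "D-G1", records, similarity))
```

Storing outcomes with each record is what lets the first two levels prefer treatments that actually succeeded, while the third level falls back on usage frequency alone.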
Novelty and Anticipated Impact
The proposed system includes different deep learning-based techniques for dental and skeletal diseases and treatments, which will enhance the accuracy of dental treatments and reduce errors, with enhanced efficiency. The proposed novel incremental learning framework will allow for a gradual and improved understanding of dental and skeletal diseases and the transfer of this knowledge to an AI-based model through active interaction between the toolbox and the expert. It will preserve the doctor’s experiences in diagnosis and treatment and convert them into standardized annotated data sets that will be used to support young dentists with less experience in improved dental treatments. The proposed DSN and knowledge base would be useful for both dentists and the public to share and transfer expertise. The accumulation of expertise around dental diagnosis and treatment will preserve the expertise of doctors and will allow continuous expertise exchange and transfer between healthcare providers. The proposed solution will also support dental surgeries, which are expensive, reducing errors and increasing comfort and satisfaction through improved precision and accuracy to meet patient expectations. It will open the door for digital and smart dental healthcare systems. The solution will enable plug-and-play interfaces to different X-ray and camera technologies for national and international deployments.
Acknowledgment
Research reported in this publication has been supported
by New Vision Systems Canada Inc. and Mitacs.
About the Authors
Hossam A. Gabbar (hossam.gaber@ontariotechu.ca) is with the Faculty of Energy Systems and Nuclear Science and the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Abderrazak Chahid (abderrazak.chahid@ontariotechu.net) is with the Faculty of Energy Systems and Nuclear Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Md. Jamiul-Alam Khan (mdjamiul.khan@ontariotechu.net) is with the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Oluwabukola Grace-Adegboro (oluwabukola.adegboro@ontariotechu.net) is with the Faculty of Engineering and Applied Science, Ontario Tech University, Oshawa, ON L16 0C5, Canada.
Matthew Immanuel Samson (sunnyssj8@gmail.com) is with New Visions Systems Canada Inc., Scarborough, ON M1S 3L1, Canada.
References
[1] H. Amasya, D. Yildirim, T. Aydogan, N. Kemaloglu, and K. Orhan, “Cervical vertebral maturation assessment on lateral cephalometric radiographs using artificial intelligence: Comparison of machine learning classifier models,” Dentomaxillofacial Radiol., vol. 49, no. 5, Mar. 2020, Art. no. 49, doi: 10.1259/dmfr.20190441.
[2] “iTero element 5D - iTero intraoral scanner.” iTero. Accessed: Mar. 12, 2022. [Online]. Available: https://global.itero.com/en/products/itero_element_5d
[3] “Cephalometric imaging systems.” Carestream Dental. Accessed: Mar. 12, 2022. [Online]. Available: https://www.carestreamdental.com/en-us/csd-products/extraoral-imaging/cephalometric-imaging/
[4] “GO extraoral imaging,” Newtom. Accessed: Mar. 12, 2022. [Online]. Available: https://www.newtom.it/en/medicale/prodotti/go/
[5] “Cephalometric anlaysis archives - CephX - AI driven dental services.” CephX. Accessed: Mar. 12, 2022. [Online]. Available: https://cephx.com/it/tag/cephalometric-anlaysis-it/
[6] “Cephalometric anlaysis archives - CephX - AI driven dental services.” CephX. Accessed: Mar. 12, 2022. [Online]. Available: https://cephx.com/it/tag/cephalometric-anlaysis-it/
[7] “Facad ortho tracing software.” facad.com. Accessed: Mar. 12, 2022. [Online]. Available: https://www.facad.com/wp/
[8] “Software for cephalometric analysis OrisCeph Rx CE.” OrisLine. Accessed: Mar. 12, 2022. [Online]. Available: https://www.orisline.com/en/software-for-cephalometric-analysis/
[9] “AudaxCeph software.” audaxceph.com. Accessed: Mar. 12, 2022. [Online]. Available: https://www.audaxceph.com/
[10] “Content library - Aquarium - Orthodontic imaging and practice management software - Patient education - 1(818)435-1368 - Dolphin imaging and management solutions - Product.” Dolphin Imaging. Accessed: Mar. 12, 2022. [Online]. Available: https://www.dolphinimaging.com/product/Aquarium?Subcategory_OS_Safe_Name=Content_Library
[11] F. Schwendicke, T. Golla, M. Dreher, and J. Krois, “Convolutional neural networks for dental image diagnostics: A scoping review,” J. Dentistry, vol. 91, Dec. 2019, Art. no. 103226, doi: 10.1016/j.jdent.2019.103226.
[12] M. Juneja et al., “A review on cephalometric landmark detection techniques,” Biomed. Signal Process. Control, vol. 66, Apr. 2021, Art. no. 102486, doi: 10.1016/j.bspc.2021.102486.
[13] K. Watson and C. Frank. “How to brush your teeth properly.” Healthline. Accessed: Mar. 12, 2022. [Online]. Available: https://www.healthline.com/health/dental-and-oral-health/how-to-brush-your-teeth
[14] H. W. Hwang, J. H. Moon, M. G. Kim, R. E. Donatelli, and S. J. Lee, “Evaluation of automated cephalometric analysis based on the latest deep learning method,” Angle Orthodontist, vol. 91, no. 3, pp. 329–335, May 2021, doi: 10.2319/021220-100.1.
[15] C. W. Wang et al., “Evaluation and comparison of anatomical landmark detection methods for cephalometric X-ray images: A grand challenge,” IEEE Trans. Med. Imag., vol. 34, no. 9, pp. 1890–1900, Sep. 2015, doi: 10.1109/TMI.2015.2412951.
[16] H. Kim, E. Shim, J. Park, Y. Y. J. Kim, U. Lee, and Y. Y. J. Kim, “Web-based fully automated cephalometric analysis by deep learning,” Comput. Methods Programs Biomed., vol. 194, Oct. 2020, Art. no. 105513, doi: 10.1016/j.cmpb.2020.105513.
[17] “Fully automatic cephalometric evaluation using random forest regression-voting,” Univ. of Manchester, Manchester, U.K., 2015. [Online]. Available: https://www.research.manchester.ac.uk/portal/en/publications/fully-automatic-cephalometric-evaluation-using-random-forest-regressionvoting(b42c658f-0a66-4d1e-99c7-9cb67fb282a0).html
[18] “Grand challenges in dental X-ray image analysis 2014.” Accessed: Mar. 12, 2022. [Online]. Available: https://www.be.ntust.edu.tw/p/404-1009-44930.php?Lang=zh-tw
[19] Y. Song, X. Qiao, Y. Iwamoto, and Y. W. Chen, “Automatic cephalometric landmark detection on X-ray images using a deep-learning method,” Appl. Sci. (Switzerland), vol. 10, no. 7, Apr. 2020, Art. no. 2547, doi: 10.3390/app10072547.
[20] J. Kim et al., “Accuracy of automated identification of lateral cephalometric landmarks using cascade convolutional neural networks on lateral cephalograms from nationwide multi-centres,” Orthodontics Craniofacial Res., vol. 24, no. S2, pp. 59–67, Dec. 2021, doi: 10.1111/ocr.12493.
[21] J. Montúfar, M. Romero, and R. J. Scougall-Vilchis, “Automatic 3-dimensional cephalometric landmarking based on active shape models in related projections,” Amer. J. Orthodontics Dentofacial Orthopedics, vol. 153, no. 3, pp. 449–458, Mar. 2018, doi: 10.1016/j.ajodo.2017.06.028.
[22] S. H. Kang, K. Jeon, H. J. Kim, J. K. Seo, and S. H. Lee, “Automatic three-dimensional cephalometric annotation system using three-dimensional convolutional neural networks: A developmental trial,” Comput. Methods Biomechanics Biomed. Eng., Imag. Vis., vol. 8, no. 2, pp. 210–218, Mar. 2020, doi: 10.1080/21681163.2019.1674696.
[23] S. Nishimoto, Y. Sotsuka, K. Kawai, H. Ishise, and M. Kakibuchi, “Personal computer-based cephalometric landmark detection with deep learning, using cephalograms on the internet,” J. Craniofacial Surgery, vol. 30, no. 1, pp. 91–95, Jan. 2019, doi: 10.1097/SCS.0000000000004901.
[24] K. Panetta, R. Rajendran, A. Ramesh, S. Rao, and S. Agaian, “Tufts dental database: A multimodal panoramic X-ray dataset for benchmarking diagnostic systems,” IEEE J. Biomed. Health Inform., vol. 26, no. 4, pp. 1650–1659, Apr. 2022, doi: 10.1109/JBHI.2021.3117575.
[25] C. Lindner, C. W. Wang, C. T. Huang, C. H. Li, S. W. Chang, and T. F. Cootes, “Fully automatic system for accurate localisation and analysis of cephalometric landmarks in lateral cephalograms,” Scientific Rep., vol. 6, no. 1, pp. 1–10, Jun. 2021, doi: 10.1038/s41598-021-91681-7.
[26] “Gum problems: 6 types, causes, symptoms, treatment & oral cancer.” MedicineNet. Accessed: Mar. 12, 2022. [Online]. Available: https://www.medicinenet.com/gum_problems/article.htm
[27] “Fractured tooth (Cracked Tooth): What it is, symptoms & repair,” Cleveland Clinic, Cleveland, OH, USA, 2021. Accessed: Mar. 12, 2022. [Online]. Available: https://my.clevelandclinic.org/health/diseases/21628-fractured-tooth-cracked-tooth
[28] “Healthline: Medical information and health advice you can trust.” Healthline. Accessed: Mar. 12, 2022. [Online]. Available: https://www.healthline.com/
MDN-Enabled SO for Vehicle Proactive Guidance in Ride-Hailing Systems
Minimizing Travel Distance and Wait Time
by Xiaoming Li, Jie Gao, Chun Wang, Xiao Huang, and Yimin Nie
Digital Object Identifier 10.1109/MSMC.2022.3220315
Date of current version: 17 July 2023
2333-942X/23©2023IEEE
Vehicle proactive guidance strategies are used by ride-hailing platforms to mitigate supply–demand imbalance across regions by directing idle vehicles to high-demand regions before the demands are realized. This article presents a data-driven stochastic optimization framework for computing idle vehicle guidance strategies. The objective is to minimize drivers’ idle travel distance, riders’ wait time, and the oversupply costs (OSCs) and undersupply costs (USCs) of the platform. Specifically, we design a novel neural network that integrates gated recurrent units (GRUs) with mixture density networks (MDNs) to capture the spatial-temporal features of the rider demand distribution. The outcome of the neural network is fed into a stochastic optimization process to compute near-optimal idle vehicle guidance solutions. The performance of the proposed framework is validated through numeric experiments using New York yellow taxi trip record data. Our results show that the MDN-enabled stochastic optimization approach outperforms other machine learning-based vehicle guidance models that only utilize the point estimates of rider demands. In terms of managerial implications, it is clear from our experimental results that, by adopting data-driven stochastic optimization models in their vehicle guidance systems, ride-hailing platforms can improve rider and driver satisfaction and reduce their operating costs.
Introduction
The most important service provided by ride-hailing platforms, such as Lyft, Uber, and Didi, is to match drivers and riders efficiently. To ensure service quality and reduce wait times, the demand of riders needs to be promptly met by the supply of drivers. However, dynamic changes in the demands across the service regions often cause a supply–demand imbalance in the regions and make it challenging for the platforms to dispatch sufficient drivers to high-demand regions in a timely manner to ensure low wait times. Without a proactive guidance strategy, a ride-hailing platform has to react to the rider demands across regions when they are realized. This reactive strategy may prolong riders’ wait times since the needed idle vehicles may not be in riders’ immediate proximity.
Idle vehicle proactive guidance strategies have been proposed in recent literature to tackle this challenge [1], [2]. A proactive guidance strategy guides needed vehicles to regions where future demands are expected to outstrip supply. As a result, it can increase the rider serving rate (SR) and, at the same time, reduce driver idle driving distances and rider wait times [2]. Guo et al. [3] propose an online ride-hailing dispatch framework that is based on spatiotemporal thermos guidance to address the real-time service vehicle dispatching problem. A concept named spatiotemporal thermos is defined to represent the demand density of ride-hailing regions. In addition, the random forest regression machine-learning method is utilized for spatiotemporal thermos forecasting. A data-driven recommendation system that exploits the benefits of vehicular social networks for ride-hailing services is designed in [4], where long short-term memory is utilized to forecast the demands. Chen et al. [5] propose a hierarchical framework for vehicle dispatch in ride-sharing systems. The higher hierarchy optimizes idle mileage by rebalancing vehicles across regions toward current and predicted rider demands, while the lower hierarchy minimizes the total mileage delay and serves as many rider requests as possible. Miao et al. [6] develop a data-driven taxi dispatch framework under demand uncertainty that is spatial-temporally correlated using robust optimization modeling techniques. In this work, vacant vehicles are dispatched toward predicted rider demand that varies in an uncertain demand set constructed on spatial-temporally correlated data sets. The objective is to minimize the total idle travel distance under the worst case demand scenario while maintaining service fairness across the whole city. In addition to guidance strategies at the system level, the impact of guidance signals on individual drivers’ decisions is also studied. In [7], a sequential binary logistic regression model is proposed to determine the factors influencing the driver’s cruising decisions when receiving taxi-calling signals. The model is calibrated by survey data. Recently, machine learning [8] and deep reinforcement learning [9] approaches have been ubiquitously utilized in ride-hailing applications, which sheds light on a research trend of combining learning approaches with optimization modeling techniques.
The articles mentioned previously provide important insights into designing a proactive guidance mechanism in ride-hailing systems. However, their approaches do not incorporate uncertainties in their optimization process in the sense that they only predict scalar point estimates of the demands in regions, which does not allow stochastic optimization (SO) models. This simplified modeling of uncertainty often leads to a considerable decline in system performance [10]. As an exception, the approach proposed in [6] does involve the uncertainty sets of the demand. However, their robust optimization models focus on guaranteed performance in worst case scenarios, which is rather conservative for the purpose of proactive guidance.
In this article, we propose a data-driven SO framework
to compute near-optimal idle vehicle proactive guidance
strategies given the dynamic rider demand and driver supply across ride-hailing service regions. Instead of just predicting the demand in the form of a scalar, the framework
models the uncertainty of rider demand by estimating its
probability distribution using historical rider demand
data. The uncertainty model is then integrated into an SO
process to compute proactive guidance strategies. The
contribution of this article is twofold: 1) we extend MDNs
[11] by integrating GRUs [12], which enables the MDN to
capture various spatial-temporal features in estimating
rider demand distributions, and 2) we integrate the
extended MDN with an SO process to minimize the vehicle guidance-related costs, including USCs, OSCs,
and driver idle travel cost.
The MDN-SO Framework
In this section, we present the MDN-enabled SO (MDN-SO) framework, which consists of two modules: an extended MDN that is suitable for estimating demand distributions of time-series data and an SO process that computes near-optimal proactive guidance strategies.
The Extended MDN
MDN is a combination of a neural network and a Gaussian mixture model (GMM). Unlike the regular neural network that only predicts a single value as the output, MDN can capture the model’s stochastic behaviors by parameterizing a Gaussian mixture distribution using the outputs of a neural network. However, regular MDN models are not sufficient for our purpose as they do not possess the capability of capturing spatial-temporal features in rider demand data. Therefore, we propose an extended MDN to be integrated into our SO process, which requires the distribution of the rider demand as input.
The extended MDN (XMDN) is an integration of regular MDN with GRU. The GMM used by the XMDN is configured by the mixing coefficients (also known as weights), mean, and variance of each Gaussian kernel, as shown in (1):
$$p(y \mid X, \theta) = \sum_{i=1}^{K} \pi_i(X)\, \mathcal{N}\big(y \mid \mu_i(X), \sigma_i(X)\big) \qquad (1)$$
where $\theta = (\pi, \mu, \sigma)$, and $K$ is the number of Gaussian distributions (also known as components in the literature). Generally, GMM can be considered as a group of Gaussian distributions with different weights, where the $i$th Gaussian is determined by weight $\pi_i$, mean $\mu_i$, and covariance matrix $\Sigma_i$ (variance $\sigma_i$ for univariate Gaussian). Then the predicted probability distribution can be represented using GMM by adjusting the parameter $\theta$. Notice that the sum of Gaussian component weights must be equal to 1 because each weight is computed by the following softmax function, which is shown in (2):
$$\pi_i = \mathrm{softmax}(h)_i = \frac{e^{h_i^{\pi}}}{\sum_{k=1}^{K} e^{h_k^{\pi}}} \qquad (2)$$
where $h_i^{\pi}$ denotes the outputs of the hidden layer prior to the layer that stores the GMM components. Meanwhile, the corresponding $\mu_i$ and $\sigma_i^2$ are computed from (3) and (4), respectively:
$$\mu_i = h_i^{\mu} \qquad (3)$$
$$\sigma_i = \exp\big(h_i^{\sigma}\big). \qquad (4)$$
The probabilistic forecasting model is built on the XMDN where GRUs can encode useful information of the past in single or multiple layers. The input of each layer is the output of the previous layer concatenated with the network input. Then the outputs of the GRU hidden layer $h_t$ will be used to compute the parameters of GMM from (2)–(4). In addition, the concatenation of outputs of all layers is used to predict the network’s output, which is compared with the target $y$. Finally, we use the mixture density parameters to parameterize a Gaussian mixture distribution as the probabilistic forecasting outcome. The prediction process can be repeated in a loop to predict rider demand for multiple time steps.
Furthermore, one of the issues in MDNs, like the conventional deep neural network, is the overfitting problem [13]. In this work, besides the dropout operations in XMDN, we introduce the L2 regularization technique to avoid the overfitting issue. In this regard, we design the loss function of XMDN shown in (5):
$$E(w_{GRU}) = -\sum_{n=1}^{N} \ln\left\{ \sum_{k=1}^{K} \pi_k(X_n, w_{GRU})\, \mathcal{N}\big(t \mid \mu_k(X_n, w_{GRU}), \sigma_k^2(X_n, w_{GRU})\big) \right\} + \frac{1}{2}\lVert w_{GRU} \rVert^2 \qquad (5)$$
where the parameter $w_{GRU}$ denotes the set of weights and biases in the GRU deep neural networks.
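To make the mapping from hidden-layer outputs to mixture parameters concrete, here is a minimal pure-Python sketch of (2)–(4) and of a loss in the spirit of (5) for a univariate mixture. It is not the authors' TensorFlow implementation: the function names, the flat `weights` list standing in for the GRU parameters, and the `reg` coefficient on the L2 term are assumptions made for illustration.

```python
import math


def softmax(h):
    """Softmax as in (2); subtracting the max improves numerical stability."""
    m = max(h)
    exps = [math.exp(x - m) for x in h]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_params(h_pi, h_mu, h_sigma):
    """Map raw hidden outputs to GMM parameters following (2), (3), and (4)."""
    pi = softmax(h_pi)                      # weights sum to 1
    mu = list(h_mu)                         # means are used as-is
    sigma = [math.exp(x) for x in h_sigma]  # exp keeps the spread positive
    return pi, mu, sigma


def gaussian_pdf(y, mu, sigma):
    """Density of a univariate Gaussian with standard deviation sigma."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(y, pi, mu, sigma):
    """Mixture density as in (1)."""
    return sum(p * gaussian_pdf(y, m, s) for p, m, s in zip(pi, mu, sigma))


def mdn_loss(targets, head_outputs, weights, reg=1.0):
    """Negative log-likelihood plus an L2 penalty, in the spirit of (5).

    `head_outputs` holds one (h_pi, h_mu, h_sigma) triple per sample, and
    `weights` is a flat list standing in for the network parameters.
    """
    nll = 0.0
    for t, (h_pi, h_mu, h_sigma) in zip(targets, head_outputs):
        pi, mu, sigma = mixture_params(h_pi, h_mu, h_sigma)
        nll -= math.log(mixture_pdf(t, pi, mu, sigma))
    return nll + 0.5 * reg * sum(w * w for w in weights)


# Hypothetical hidden outputs for one sample with K = 2 components.
pi, mu, sigma = mixture_params([0.0, 1.0], [12.0, 30.0], [0.5, 1.0])
print(pi, mu, sigma)
```

In a real model the three groups of hidden outputs would come from the final GRU layer, and an automatic differentiation framework would minimize the loss.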
The Stochastic Optimization Process
We assume the ride-hailing platform operates during a day that is discretized into a group of batching windows (also known as time slots) with fixed size $\Delta T$ (e.g., 10 min). (We use “batching window” and “time slot” interchangeably in this article.) To facilitate the vehicle allocations, the ride-hailing service zone is divided into a group of disjoint ride-hailing service regions denoted as $M$. Let $V_t$ denote the set of idle vehicles in batching window $t$. The binary variable $x^t_{v,m} = 1$ if idle vehicle $v$ is guided to the point of interest (POI) in region $m$ at time $t$, and $x^t_{v,m} = 0$ otherwise.
At the beginning of each batching window, a certain number of idle vehicles are guided to the ride-hailing regions’ POIs with minimum guidance distance to meet the riders’ requests in the future. This proactive guidance operation incurs the idle vehicle guidance cost, which can be formulated in (6):
$$\alpha \sum_{v \in V_t} \sum_{m \in M} g_{v,m}\, x^t_{v,m} \qquad (6)$$
where $g_{v,m}$ denotes the distance between idle vehicle $v$’s GPS location and the GPS location of region POI $m$, and $\alpha$ is introduced to denote the idle travel cost per mile. In addition, OSCs are incurred when the number of guided vehicles exceeds the rider demand (including predicted rider demands for the current batching window and the unserved riders from the previous batching window). Likewise, the USCs are incurred when the number of guided vehicles is lower than the rider demand. The sum of OSC and the USC is defined in (7):
$$\sum_{m \in M} \mathbb{E}_{\hat{d}^{t,s}_m \sim P}\Big[\beta \cdot \max\Big\{0,\ \Big(\sum_{v \in V_t} x^t_{v,m} - d^{t-1}_m - \hat{d}^{t,s}_m\Big)\Big\} + \gamma \cdot \max\Big\{0,\ \Big(d^{t-1}_m + \hat{d}^{t,s}_m - \sum_{v \in V_t} x^t_{v,m}\Big)\Big\}\Big] \qquad (7)$$
where $\hat{d}^{t,s}_m$ and $d^{t-1}_m$ denote the predicted rider demand at region $m$ in time slot $t$ under scenario $s$ and the number of unserved riders at region $m$ in time slot $t - 1$, respectively. Notice that the stochastic programming model will degenerate to the deterministic model if only one scenario is involved. $\beta$ and $\gamma$ are introduced to denote the OSC per vehicle and USC per requested order, respectively. Since the stochastic programming model has a set of rider demand scenarios (drawn from rider demand distribution), the previous formula denotes the expected total cost (TC) over the rider demand distribution.
A group of constraints must be satisfied according to our problem settings. First, a certain level of supply–demand ratio ($\theta$), along with the supply–demand ratio gap ($\xi$) among ride-hailing regions, must be taken into consideration, which is captured by the following constraints:
$$(\theta - \xi)\big(\hat{d}^{t}_m + d^{t-1}_m\big) \le \sum_{v \in V_t} x^t_{v,m} \le \theta\big(\hat{d}^{t}_m + d^{t-1}_m\big), \quad \forall m \in M. \qquad (8)$$
In addition, each idle vehicle can be guided to at most one region’s POI, which is represented by
$$\sum_{m \in M} x^t_{v,m} \le 1, \quad \forall v \in V_t. \qquad (9)$$
Further, each idle vehicle, if guided, can only be guided to a region’s POI that the vehicle can reach within the length of the batching window. These time constraints are captured by
$$g_{v,m} / \lambda \le \Delta T + H\big(1 - x^t_{v,m}\big), \quad \forall v \in V_t,\ \forall m \in M \qquad (10)$$
where $H$ is a large positive number to linearize the “if” constraints [14], and $\lambda$ is the idle vehicle’s travel speed that is assumed to be a constant value during the guidance operation. Therefore, $g_{v,m} / \lambda$ is the guidance time between the GPS location of vehicle $v$ and the GPS location of region POI $m$.
Moreover, the total number of idle vehicles must be less than the fleet size under a certain supply–demand ratio, which leads to the following constraint:
$$\sum_{v \in V_t} \sum_{m \in M} x^t_{v,m} \le \theta C_t. \qquad (11)$$
Given the objective function and constraints, the holistic optimization model for the idle vehicle proactive guidance problem is summarized as follows:
$$\begin{aligned} \text{minimize} \quad & (6) + (7) \\ \text{subject to} \quad & (8),\ (9),\ (10),\ (11) \\ & x^t_{v,m} \in \{0, 1\}, \quad \forall v \in V_t,\ \forall m \in M,\ \forall t \in T. \end{aligned} \qquad (12)$$
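As a worked illustration of the objective, the sketch below evaluates the guidance cost in (6) and a scenario-averaged version of the oversupply and undersupply cost in (7) for one fixed assignment. It assumes equally likely scenarios, and every number, name, and data layout is hypothetical; it evaluates a given solution only and does not solve the optimization model.

```python
def guidance_cost(x, g, alpha):
    """Idle travel cost as in (6): alpha times guided distance.

    x[v][m] is 1 if vehicle v is guided to region m, else 0;
    g[v][m] is the distance from vehicle v to the POI of region m.
    """
    return alpha * sum(
        g[v][m] * x[v][m] for v in range(len(x)) for m in range(len(x[v]))
    )


def expected_supply_cost(x, unserved, scenarios, beta, gamma):
    """OSC plus USC as in (7), averaged over equally likely demand scenarios."""
    n_regions = len(unserved)
    supply = [sum(row[m] for row in x) for m in range(n_regions)]
    total = 0.0
    for demand in scenarios:
        for m in range(n_regions):
            need = unserved[m] + demand[m]
            total += beta * max(0, supply[m] - need)   # oversupply
            total += gamma * max(0, need - supply[m])  # undersupply
    return total / len(scenarios)


# Hypothetical instance: three idle vehicles, two regions, two scenarios.
x = [[1, 0], [1, 0], [0, 1]]
g = [[1.0, 2.0], [0.5, 3.0], [2.0, 1.5]]
unserved = [0, 1]             # riders left over from the previous window
scenarios = [[1, 1], [3, 0]]  # predicted demand per region in each scenario

travel = guidance_cost(x, g, alpha=1.0)
supply = expected_supply_cost(x, unserved, scenarios, beta=2.0, gamma=5.0)
print(travel, supply, travel + supply)
```

With a single scenario the average reduces to the deterministic cost, which mirrors the remark that the stochastic model degenerates to the deterministic one in that case.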
As discussed previously, the objective is to minimize
the overall ride-hailing system costs.
To solve the SO model, we first reformulate it to its corresponding deterministic counterpart with a large group
of scenarios by applying the sample average approximation (SAA) [15] technique. The resulting deterministic
model can then be solved by an off-the-shelf solver such as
Gurobi (https://www.gurobi.com/) or CPLEX (https://www.ibm.com/analytics/cplex-optimizer).
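SAA needs a finite set of demand scenarios drawn from the predicted distribution. A minimal sketch of that sampling step for one region and one time slot is given below, assuming a univariate Gaussian mixture. The function name, the rounding to whole riders, and the truncation at zero are illustrative assumptions rather than details taken from the article.

```python
import random


def sample_demand_scenarios(pi, mu, sigma, n_scenarios, seed=0):
    """Draw integer demand scenarios from a univariate Gaussian mixture.

    pi, mu, and sigma are the mixture weights, means, and standard
    deviations. Each draw first picks a component according to pi,
    then samples from that component's Gaussian.
    """
    rng = random.Random(seed)  # seeded so that the scenarios are reproducible
    scenarios = []
    for _ in range(n_scenarios):
        k = rng.choices(range(len(pi)), weights=pi, k=1)[0]
        draw = rng.gauss(mu[k], sigma[k])
        # Assumption: demand is a nonnegative whole number of riders.
        scenarios.append(max(0, round(draw)))
    return scenarios


# Hypothetical mixture with a low-demand and a high-demand mode.
scenarios = sample_demand_scenarios(
    pi=[0.7, 0.3], mu=[12.0, 30.0], sigma=[3.0, 5.0], n_scenarios=200
)
print(sum(scenarios) / len(scenarios))
```

Averaging the cost in (7) over such samples gives the deterministic counterpart that a solver can handle; more scenarios tighten the approximation at the price of a larger model.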
Numerical Experiment
In this section, we validate the performance of MDNSO through numerical experiments. We first describe
the numerical validation env ironment and performance metrics. Next, we discuss data processing and
feature engineering for XMDN and GRU. Finally, we
evaluate the proposed approach by comparing the performance with other machine learning-based vehicle
guidance models.
Experiment Setup
Both batching matching and historical averages are coded in Python 3.8, and the mathematical optimization models are solved by Gurobi 9.1 (https://www.gurobi.com/academia/academic-program-and-licenses/). The experiments are run on a PC with an Intel Core i7 CPU, 32 GB of RAM, and Windows 10. The deep learning models
(GRU and XMDN) are coded in Python 3.8 and TensorFlow 2.4 on a machine with an NVIDIA GeForce RTX 2080 GPU, 16 GB of RAM, and Ubuntu 18.04.
For GRU models, the training time of each epoch is
around 275 s, and the average training time of the GRU
model is approximately 3.5 h. For XMDN models, the
training time of each epoch is around 358 s, and the average training time of the XMDN model is approximately
4.7 h. After training, the deep learning models predict the rider demand (using GRU) and the rider demand distribution (using XMDN) from the time-series sequence data in the testing set; the computational time for prediction is only a few seconds. In addition, the optimization model can be solved by Gurobi within 2 min. Therefore, the overall time is far less than the batching window size, which indicates that our proposed framework can be applied to a dynamic ride-hailing platform.
Evaluation Metrics
We adopt the following three data-driven optimization models as the guidance approaches: 1) our proposed approach, MDN-SO; 2) the integration of GRU and a deterministic optimization model, labeled GRU-DM; and 3) the integration of the historical average (HA) and a deterministic optimization model, labeled HA-DM. In addition, a no-guidance mechanism is introduced as a baseline. We select the following metrics for the performance comparison.
◆◆ OSC, USC, and TC: The metric involves two types of
costs, namely, OSC, which can be computed by the
driver’s idle driving distance, and USC, which can be
computed by the profit of service orders. The results
can be computed from (7) by replacing the predicted
rider demand with the real demand.
◆◆ Rider’s SR: For the ride-hailing service region k, the metric is defined as the proportion of served (satisfied) riders. Namely, the rider’s SR at region k is

\[ \mathrm{SR}_k = \min\left\{1, \frac{s_k}{d_k}\right\} \tag{13} \]

where $s_k$ and $d_k$ denote the number of (guided) idle vehicles at region k and the number of requests (real rider demand) at region k, respectively.
◆◆ Rider’s waiting time (WT): WT is computed differently depending on the approach. Specifically, for the guidance approaches (i.e., MDN-SO, GRU-DM, and HA-DM), WT consists of three parts: 1) the time between the rider’s request time and the end of the current batching window (WT1), 2) the driver’s travel time from the POI (from the driver’s GPS location in the no-guidance scenario) to the rider’s pickup coordinate (WT2), and 3) an additional 10 min if the rider cannot be picked up in the current batching window (WT3):

\[ \mathrm{WT} = \mathrm{WT}_1 + \mathrm{WT}_2 + \mathrm{WT}_3. \tag{14} \]
(In this case, the riders must wait until the next batching window for service, and we assume the riders do not cancel their requests if they are not served in the current batching window. A similar assumption is discussed in [16].) In addition, we assume that riders are picked up using the first-come-first-serve (FCFS) protocol. For the no-guidance scenario, riders are picked up by their nearest drivers. (Since the FCFS protocol is adopted, rider A, whose request time is before rider B’s, will be picked up by a driver first even if that driver is closer to rider B than to rider A.)
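The two rider-side metrics can be written down directly. The sketch below is illustrative only: the function and argument names are hypothetical, all times are in minutes, and treating a region with zero demand as fully served is an added assumption that the article does not state.

```python
def service_rate(supply, demand):
    """SR_k = min(1, s_k / d_k) for one region.

    Assumption: a region with no demand counts as fully served.
    """
    if demand == 0:
        return 1.0
    return min(1.0, supply / demand)


def waiting_time(request_min, window_end_min, travel_min, served_now,
                 window_len_min=10.0):
    """WT = WT1 + WT2 + WT3, all in minutes."""
    wt1 = window_end_min - request_min             # wait until the batch closes
    wt2 = travel_min                               # driver travel time to pickup
    wt3 = 0.0 if served_now else window_len_min    # deferred to the next window
    return wt1 + wt2 + wt3
```

For example, a rider who requests a trip 7 min before the window closes and is 4 min away from the assigned driver waits 11 min if served in the current window and 21 min otherwise.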
Feature Engineering
We consider the following features, which are highly correlated with rider demand. The features extracted from the data set in this work include rider demand, region ID, day of the month, month, day of the week, hour of the day, and minute of the hour. The rider demand is used as the prediction target, while the remaining features are used to observe how they affect the target. We adopt XGBoost [17], whose importance metric is based on impurity, to determine the feature importance for the deep learning predictor. The result is illustrated in Figure 1. We observe that the region ID and hour of the day are the most important features for the selected data set, accounting for over 50% and 30% of the importance, respectively, which implies that these features significantly impact rider demand prediction.
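As an illustration of the calendar features listed above, the following sketch derives them from a request timestamp with the Python standard library. The function and key names are hypothetical, and the article does not describe how the authors implemented this step.

```python
from datetime import datetime


def calendar_features(region_id, timestamp):
    """Build the non-target features for one demand record."""
    return {
        "region_id": region_id,
        "day_of_month": timestamp.day,
        "month": timestamp.month,
        "day_of_week": timestamp.weekday(),  # Monday = 0, Sunday = 6
        "hour_of_day": timestamp.hour,
        "minute_of_hour": timestamp.minute,
    }


def is_weekend(timestamp):
    """True for Saturday and Sunday."""
    return timestamp.weekday() >= 5
```

The rider demand itself is the prediction target and is therefore not part of this feature dictionary.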
Performance Evaluation
In this section, we choose one week of trip records (2 March 2016–8 March 2016), which covers five weekdays and two weekend days, for the experimental validation. The experimental results are averaged over the five weekdays and the two weekend days for the weekday and weekend scenarios, respectively. The parameter settings of the optimization models are described in Table 1.
Since no idle driver information is available in the data sets, we assume that the coordinates of the idle vehicles are randomly generated across the eight ride-hailing service regions. In addition, the number of idle vehicles (fleet size) in the current time slot is determined by the real rider demand from the previous time slot. We set the supply–demand ratio parameter $\theta$ to 0.95, 1.0, and 1.05 to evaluate the experimental results under different fleet size levels. We are particularly interested in how much benefit the ride-hailing platform could obtain from proactive vehicle guidance. Since idle vehicles may be distributed across regions in any pattern, we consider the following three idle vehicle distribution scenarios.
◆◆ Positively correlated idle vehicle distribution: Given a set of region indices $K = \{1, 2, \ldots, k\}$, a set of idle vehicles distributed across regions $\{s_i\}_{i \in K}$, and a set of demands across regions $\{d_j\}_{j \in K}$, we formulate a tuple sequence as follows:

\[ \ldots, \langle s_i^-, d_j^- \rangle, \langle s_i, d_j \rangle, \langle s_i^+, d_j^+ \rangle, \ldots \tag{15} \]

such that

\[ \ldots, s_i^- \le s_i \le s_i^+, \ldots \]
\[ \ldots, d_j^- \le d_j \le d_j^+, \ldots. \]

We call this type of idle vehicle distribution positively correlated, labeled PC, if $\forall i, j \in K$, $i = j$.
Intuitively, PC describes a scenario in which the idle vehicles are “ideally” distributed across regions: more idle vehicles are cruising around the higher-demand regions and vice versa. In this sense, proactive vehicle guidance is unnecessary since the number of idle vehicles can meet the demand in each region. However, this ideal scenario seldom happens in realistic applications [18].
◆◆ Negatively correlated idle vehicle distribution: Using the same notation, we formulate such a tuple sequence as follows:

\[ \ldots, \langle s_i^-, d_j^- \rangle, \langle s_i, d_j \rangle, \langle s_i^+, d_j^+ \rangle, \ldots \tag{16} \]

such that

\[ \ldots, s_i^- \le s_i \le s_i^+, \ldots \]
\[ \ldots, d_j^- \ge d_j \ge d_j^+, \ldots. \]

We call this type of idle vehicle distribution negatively correlated, labeled NC, if $\forall i, j \in K$, $i = j$.
In contrast to PC, NC is introduced to describe such a “worst case” scenario that the idle vehicles are cruising around the “wrong” regions. In this case, vehicle proactive guidance operations are quite necessary to alleviate the imbalance of supply and demand.
◆◆ Uniform idle vehicle distribution: In this case, the idle vehicles are uniformly distributed across multiple ride-hailing regions. We call this type of idle vehicle distribution uniform, labeled U.

Figure 1. The pie plot of the feature importance, where DoM, MoY, DoW, HoD, and MoH denote the day of the month, the month of the year, the day of the week, the hour of the day, and the minute of the hour, respectively. The two largest shares are Region-ID (55.26%) and HoD (30.05%).

Table 1. Parameter settings in the optimization model.
Parameter    Value
$\alpha$     US$0.4–US$0.9
$\beta$      idle travel distance cost: $\alpha \cdot g_{v,m}$
$\gamma$     estimated from the real data set in the corresponding time slot
m            30 mi/h
$\theta$     {0.95, 1, 1.05}
p            0.1
$\Delta T$   10 min
$C_t$        set to the total rider demand in the previous time slot

We compare the validation results based on the previous idle vehicle distributions. First, as shown in Table 2, we observe that the OSC increases and the USC decreases as the fleet size grows ($\theta$ ranges from 0.95 to 1.05). This is because more rider requests will be satisfied as the number of idle vehicles increases, which leads to more OSC and less USC. In addition, MDN-SO outperforms the remaining data-driven competitors GRU-DM and HA-DM in terms of the TC, with the average TC reduction by 17.5% and 63.8% on weekdays, 21.4% and 62.1% on weekends under $\theta$ = 0.95; 17.2% and 70.5% on weekdays, 23.2% and 68.8% on weekends under $\theta$ = 1.0; 23.7% and 64.4% on weekdays, 31.9% and 63.7% on weekends under $\theta$ = 1.05.
Second, as shown in Table 2, MDN-SO is approximately 2% and 17% higher than GRU-DM and HA-DM in terms of average SR. In addition, without guidance operation, the SR under positively correlated distribution (labeled NG-PC) is much higher than the one under uniform (labeled NG-U) and negatively correlated distributions (labeled NG-NC). This is because NG-PC considers such an ideal scenario that the idle vehicles are cruising at their “right” regions. Therefore, all the regions can satisfy the rider’s requests. Notice that during some time slots (around 4 a.m. to 8 a.m. on weekdays, around 4 a.m. to 11 a.m. on weekends), HA-DM is inferior to NG-U in terms of average SR, implying that without accurate rider demand predictions, a guidance approach can be even worse than no guidance.
Further, MDN-SO is quite close to the NG-PC scenario in terms of average SR, which indicates that our proposed
approach is capable of guiding idle vehicles in a more reasonable manner. This is because the MDN-SO framework utilizes the uncertainty in the forecasting results, which covers all the potential rider demand possibilities, for decision-making. Moreover, among MDN-SO, GRU-DM, and HA-DM, our proposed data-driven approach comes closest to the NG-PC scenario, which implies that MDN-SO provides a fairly effective strategy for the idle vehicle proactive guidance operation.
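The PC, NC, and U idle vehicle distributions used as no-guidance baselines can be generated with a simple ranking rule. The sketch below is one possible reading of the definitions, not the authors’ procedure: it assumes that a given list of per-region vehicle counts is reassigned so that its ranking follows (PC) or reverses (NC) the ranking of the regional demands, and all names are hypothetical.

```python
def correlated_supply(demands, supplies, positive=True):
    """Assign supply counts to regions so that their ranking follows (PC)
    or reverses (NC) the ranking of the demands."""
    # Region indices ordered from lowest to highest demand.
    order = sorted(range(len(demands)), key=lambda k: demands[k])
    # Supply counts in ascending order for PC, descending order for NC.
    ranked = sorted(supplies, reverse=not positive)
    assigned = [0] * len(demands)
    for region, count in zip(order, ranked):
        assigned[region] = count
    return assigned


def uniform_supply(fleet_size, n_regions):
    """Spread the fleet as evenly as possible across regions (U)."""
    base, extra = divmod(fleet_size, n_regions)
    return [base + (1 if k < extra else 0) for k in range(n_regions)]
```

With demands `[30, 10, 20]` and supply counts `[5, 25, 15]`, the PC assignment gives the largest count to the first region, while the NC assignment gives it the smallest.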
Finally, the rider’s average waiting time is an essential
metric from the rider’s perspective. As shown in Figure 2,
the rider’s average waiting time drops as the fleet size
increases. This is because more rider requests will be satisfied when more idle vehicles are available. Therefore,
WT3 will be smaller. In addition, NG-PC outperforms NG-U and NG-NC regarding the rider’s average waiting time. This is straightforward: under the NG-PC scenario, the riders in each region can be served by the idle drivers in the same region, whereas under the NG-NC scenario, some riders are served by drivers from other regions, which increases the rider’s average waiting time.
Further, among the three data-driven guidance
approaches, GRU-DM and HA-DM are 2.1% and 11.5% higher than MDN-SO in terms of the rider’s average waiting
time. Also, MDN-SO can reduce the rider’s average waiting
time by 20% compared with the NG-U scenario without
guidance, which is closer to the realistic scenario. This is
because MDN-SO leverages not only the predicted demand
uncertainty in each ride-hailing region but also guidance
operations to achieve a better solution.
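The relative TC reductions quoted in this section follow directly from the TC values in Table 2. As a worked check, the sketch below recomputes the weekday figures for a supply–demand ratio of 0.95; the helper names are illustrative.

```python
def reduction_pct(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Average weekday total cost (TC) at a supply-demand ratio of 0.95, from Table 2.
TC_WEEKDAY_095 = {"HA-DM": 6815, "GRU-DM": 2988, "MDN-SO": 2466}


def tc_reductions(tc):
    """TC reduction of MDN-SO against every other approach, rounded to 0.1%."""
    return {name: round(reduction_pct(value, tc["MDN-SO"]), 1)
            for name, value in tc.items() if name != "MDN-SO"}


if __name__ == "__main__":
    print(tc_reductions(TC_WEEKDAY_095))
```

The same calculation applied to the other rows of Table 2 reproduces the remaining percentages.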
Table 2. The average OSC, under-supply cost (USC), TC, and SR using different data-driven guidance approaches (HA-DM, GRU-DM, and MDN-SO) and no guidance with different idle vehicle distributions (NG-PC, NG-U, and NG-NC).

                        Weekday                         Weekend
$\theta$  Approach   OSC    USC    TC     SR       OSC    USC    TC     SR
0.95      HA-DM      20     6,795  6,815  79.2%    31     6,500  6,531  80.7%
0.95      GRU-DM     47     2,941  2,988  90.9%    57     3,096  3,153  91.1%
0.95      MDN-SO     72     2,394  2,466  92.7%    79     2,399  2,478  93.2%
0.95      NG-PC      N/A*   N/A    N/A    96%      N/A    N/A    N/A    96.5%
0.95      NG-U       N/A    N/A    N/A    75.7%    N/A    N/A    N/A    77.8%
0.95      NG-NC      N/A    N/A    N/A    38.3%    N/A    N/A    N/A    39.3%
1         HA-DM      40     5,509  5,549  82.8%    58     5,133  5,193  84.1%
1         GRU-DM     111    1,865  1,976  93.7%    125    1,983  2,108  93.9%
1         MDN-SO     132    1,503  1,635  95.1%    145    1,473  1,618  95.6%
1         NG-PC      N/A    N/A    N/A    96.6%    N/A    N/A    N/A    97.4%
1         NG-U       N/A    N/A    N/A    78.1%    N/A    N/A    N/A    80.6%
1         NG-NC      N/A    N/A    N/A    38.3%    N/A    N/A    N/A    39.3%
1.05      HA-DM      140    3,867  4,007  87.8%    125    3,858  3,983  87.3%
1.05      GRU-DM     204    1,665  1,869  94.5%    213    1,910  2,123  94.2%
1.05      MDN-SO     268    1,158  1,426  96.1%    241    1,203  1,444  96.4%
1.05      NG-PC      N/A    N/A    N/A    97.7%    N/A    N/A    N/A    97.7%
1.05      NG-U       N/A    N/A    N/A    80.2%    N/A    N/A    N/A    83%
1.05      NG-NC      N/A    N/A    N/A    38.4%    N/A    N/A    N/A    39.4%
*OSC, USC, and TC are set to N/A under NG-PC, NG-U, and NG-NC since no idle vehicle guidance operation is involved.

Figure 2. The rider’s average waiting time under different supply–demand ratio scenarios: (a) weekday, $\theta$ = 0.95, (b) weekend, $\theta$ = 0.95, (c) weekday, $\theta$ = 1, (d) weekend, $\theta$ = 1, (e) weekday, $\theta$ = 1.05, and (f) weekend, $\theta$ = 1.05. Each panel plots the average waiting time (min) against the time of a day (0:00–23:00) for MDN-SO, GRU-DM, HA-DM, NG, NG-NC, and NG-PC.

Conclusions and Future Work
Effective idle vehicle guidance strategies provide ride-hailing platforms with competitive advantages in terms of improved matching rates, reduced rider wait times, and driver idle travel distances. More research work is needed in this area to ensure the sustainable growth of ride-hailing platforms in the long run. We propose an MDN-enabled SO framework by integrating an extended MDN with a stochastic optimization process. The proposed framework produces high service quality and low-cost vehicle guidance solutions. In future work, we plan to study the impacts of adopting such a vehicle guidance framework on the downstream matching/dispatching operations of a ride-hailing platform. In addition, as an enhancement to our
previous work on guidance and matching [2], we will
design integrated vehicle guidance and rider–driver matching systems that make use of the special characteristics of
the ride-hailing data and domain-specific constraints to
further improve the performance of the framework in
terms of system scalability and solution quality.
About the Authors
Xiaoming Li (xiaoming.li@mail.concordia.ca) earned his M.S. degree in computer software and theory from Northeastern University and his Ph.D. degree in information and systems engineering from Concordia University. He is a research associate at Concordia University, Montreal, QC H3G 1M8 Canada. His research interests include optimization under uncertainty, large-scale optimization, network optimization, machine learning with applications in intelligent transportation systems, and supply chain optimization.
Jie Gao (jie.gao@hec.ca) earned her MASc. degree in information systems and her Ph.D. degree in information systems engineering from Concordia University. She is a postdoctoral research fellow at HEC Montreal at the University of Montreal, Montreal, QC H3T 2A7 Canada. Her research interests include data-driven optimization, game theory, mechanism design, and machine learning with applications in intelligent transportation systems, smart cities, and community healthcare.
Chun Wang (chun.wang@concordia.ca) is a professor with the Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC H3G 1M8 Canada. His research interests include the interface between economic models, operations research, and artificial intelligence. He is actively conducting research in multiagent systems, data-driven optimization, and economic model-based resource allocation with applications to healthcare management, smart grid, and smart city environments. He is a Member of IEEE.
Xiao Huang (xiao.huang@concordia.ca) earned her B.E. degree in electronic engineering from Tsinghua University, her M.S. degree in mathematical finance from the University of Southern California, and her Ph.D. degree from the Marshall School of Business at the University of Southern California. She is a professor and the Concordia University Research Chair in Supply Chain Management in the John Molson School of Business at Concordia University, Montreal, QC H3G 1M8 Canada. Her research interests include competition and cooperation in supply chains, product and pricing strategies, and data-driven decision-making.
Yimin Nie (yimin.nie@ericsson.com) earned his B.S. and M.S. degrees in theoretical physics from Peking University and his Ph.D. degree in computational neuroscience from the Canadian Center of Behavior Neuroscience at the University of Calgary. He is currently a senior data scientist and artificial intelligence researcher at Global AI Accelerator (GAIA) at Ericsson Inc., Montreal, QC H4R 2A4 Canada. He worked as a senior data scientist in multiple business fields including e-commerce, finance, and telecommunication. His research interests include machine learning, computer vision, and natural language processing.
References
[1] H. Wang and H. Yang, “Ridesourcing systems: A framework and review,” Transp. Res. B, Methodol., vol. 129, pp. 122–155, Nov. 2019, doi: 10.1016/j.trb.2019.07.009.
[2] J. Gao, X. Li, C. Wang, and X. Huang, “BM-DDPG: An integrated dispatching framework for ride-hailing systems,” IEEE Trans. Intell. Transp. Syst., vol. 23, no. 8, pp. 11,666–11,676, Aug. 2022, doi: 10.1109/TITS.2021.3106243.
[3] Y. Guo, Y. Zhang, J. Yu, and X. Shen, “A spatiotemporal thermo guidance based real-time online ride-hailing dispatch framework,” IEEE Access, vol. 8, pp. 115,063–115,077, Jun. 2020, doi: 10.1109/ACCESS.2020.3003942.
[4] X. Wan, H. Ghazzai, and Y. Massoud, “A generic data-driven recommendation system for large-scale regular and ride-hailing taxi services,” Electronics, vol. 9, no. 4, p. 648, Apr. 2020, doi: 10.3390/electronics9040648.
[5] X. Chen, F. Miao, G. J. Pappas, and V. Preciado, “Hierarchical data-driven vehicle dispatch and ride-sharing,” in Proc. IEEE 56th Annu. Conf. Decis. Control (CDC), 2017, pp. 4458–4463, doi: 10.1109/CDC.2017.8264317.
[6] F. Miao et al., “Data-driven robust taxi dispatch under demand uncertainties,” IEEE Trans. Control Syst. Technol., vol. 27, no. 1, pp. 175–191, Jan. 2019, doi: 10.1109/TCST.2017.2766042.
[7] W. Szeto, R. Wong, and W. Yang, “Guiding vacant taxi drivers to demand locations by taxi-calling signals: A sequential binary logistic regression modeling approach and policy implications,” Transp. Policy, vol. 76, pp. 100–110, Apr. 2019, doi: 10.1016/j.tranpol.2018.06.009.
[8] Y. Liu, R. Jia, J. Ye, and X. Qu, “How machine learning informs ride-hailing services: A survey,” Commun. Transp. Res., vol. 2, 2022, Art. no. 100075, doi: 10.1016/j.commtr.2022.100075.
[9] Y. Liu, F. Wu, C. Lyu, S. Li, J. Ye, and X. Qu, “Deep dispatching: A deep reinforcement learning approach for vehicle dispatching on online ride-hailing platform,” Transp. Res. E, Logistics Transp. Rev., vol. 161, 2022, Art. no. 102694, doi: 10.1016/j.tre.2022.102694.
[10] E. Delage, S. Arroyo, and Y. Ye, “The value of stochastic modeling in two-stage stochastic programs with cost uncertainty,” Oper. Res., vol. 62, no. 6, pp. 1377–1393, Nov./Dec. 2014, doi: 10.1287/opre.2014.1318.
[11] C. M. Bishop, “Mixture density networks,” Aston University, Birmingham, U.K., Tech. Rep. NCRG/94/004, 1994.
[12] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” 2014, arXiv:1412.3555.
[13] D. Ormoneit and V. Tresp, “Improved Gaussian mixture density estimates using Bayesian penalty terms and network averaging,” in Proc. 8th Int. Conf. Neural Inf. Process. Syst., Nov. 1995, vol. 95, pp. 542–548.
[14] R. L. Rardin, Optimization in Operations Research, vol. 166. Upper Saddle River, NJ, USA: Prentice-Hall, 1998.
[15] S. Kim, R. Pasupathy, and S. G. Henderson, “A guide to sample average approximation,” in Handbook of Simulation Optimization, M. Fu, Ed. New York, NY, USA: Springer Science & Business Media, 2015, pp. 207–243.
[16] T. Oda and C. Joe-Wong, “MOVI: A model-free approach to dynamic fleet management,” in Proc. IEEE Conf. Comput. Commun. (INFOCOM), 2018, pp. 2708–2716, doi: 10.1109/INFOCOM.2018.8485988.
[17] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2016, pp. 785–794, doi: 10.1145/2939672.2939785.
[18] E. Brown, “The ride-hail utopia that got stuck in traffic,” Wall Street J., Feb. 2020. [Online]. Available: https://www.wsj.com/articles/the-ride-hail-utopia-that-got-stuck-in-traffic-11581742802
Edge Processing
A LoRa-Based LCDT System for Smart Building With Energy and Delay Constraints
by B Shilpa, Hari Prabhat Gupta, and Rajesh Kumar Jha
Digital Object Identifier 10.1109/MSMC.2022.3204848
Date of current version: 17 July 2023
2333-942X/23©2023IEEE
A smart building is an emerging technology that has the potential to be used in a variety of ubiquitous computing applications. The majority of existing work for smart building monitoring consumes a significant amount of energy to communicate the sensory data from the building to the end users (EUs). This work presents a low-cost data transmission (LCDT) system for a smart building in the context of a noisy environment. The system uses the long-range (LoRa) communication protocol to conserve energy and enable long-distance communication. The smart building sensors generate data in the form of a multivariate time series (MTS). The system compresses such an MTS before transmission by utilizing deep learning (DL) techniques. A channel to reduce the transmission noise of sensory data is also designed using the DL method. The system decompresses the received data at the receiver end and obtains the original MTS. Additionally, we also conducted experiments to demonstrate the utility of the system. The experimental results demonstrate that selecting a finite number of distinct edge device (ED) types aids in developing an LCDT system subject to energy and latency constraints.
Overview
The smart building consists of various types of sensor nodes (SNs) for gathering, processing, and communicating the surrounding environment information to the users [1], [2]. An SN has sensing, communication, processing, and power units. Examples of sensing units are temperature, light, humidity sensors, etc. The sensors in smart buildings generate huge data in the form of an MTS, which contains significant information that must be mined to enable timely responses and better decision making. The components of an MTS are the data of different sensors with a given sampling rate.
The communication unit of SNs in smart buildings commonly uses Zigbee, Bluetooth, Wi-Fi, and other 2.4-GHz technologies [3]. Such technologies support short-range communication and, therefore, have scalability issues. Communicating the information using such technologies increases the cost of multihop devices. The scalability issue motivates the use of promising wireless solutions capable of simultaneously supporting many nodes and long-range communication. Low-power wide-area networks (LPWANs) have evolved as the leading connection option for smart applications requiring extended range, high energy efficiency, and low cost.
An LPWAN protocol that is built on LoRa technology is specified by an open standard known as the Long-Range Wide-Area Network (LoRaWAN). The primary advantage of LoRa is its scalability because the gateway modules in LoRa support concurrent communication of multiple SNs [4]. Another advantage of LoRa is low energy usage during the transmission of the data to a large distance. LoRa also provides tradeoffs among power consumption, communication range, and data rate. Despite the aforementioned advantages of LoRa, communicating a significant volume of the sensory data of smart buildings to the EUs is difficult because LoRa supports a low data rate. The transmission of a large amount of sensory data takes a huge amount of time and consumes a lot of energy. The pervasive usage of unlicensed frequency bands by a large number of LoRa nodes creates the issue of LoRa interference.
LoRa is commonly used for long-distance applications, although it also performs well in indoor applications. There are just a few studies [5], [6], [7], [8], [9] that assess LoRa in dense indoor networks. The existing work [2], [10], [11] proposed various solutions to solve the LoRa interference issue. The authors in [10] used multiple gateways to handle LoRa interference. The scheduling of nodes also reduces the interference by transmitting the data in a given period [2]. The effective use of LoRa network parameters, such as spreading factors, also helps to reduce the interference [11]. However, the use of multiple gateways increases the network’s cost; scheduling nodes reduces gateway utility; and fixed spreading factors may consume high power. The employment of cutting-edge machine learning and DL algorithms is enhancing traditional communication systems. Several DL models for wireless communication systems were developed in the existing research works [12], [13], [14], [15], [16]. We intend to implement such principles into practice for LoRa communication. The research studies [1], [17], [18] related to smart building are mainly focused on energy-efficient systems. As they have not taken into account the system’s cost, the primary focus of this work is cost optimization.
Edge processing is a potential solution for communicating the smart building data with limited energy and delay. It minimizes the communication time and energy consumption for conveying sensory data by allowing tasks to be processed locally. The cost of such EDs may vary based on the specification of the devices. A dynamic compression ratio of sensory data for edge inference systems with strict deadlines was described in [19]. The authors in [20] proposed an adaptive data reduction method that uses compressive sampling to lower the bandwidth needed for sensory data transmission while minimizing the information loss.
In this work, we consider a smart building scenario, where several nodes generate the sensory data while sensing the environment and communicate those data to the EDs for further processing. The success of the scenario depends on the size of the data and the number of nodes. Large data size and multiple nodes give high accuracy with high energy consumption and communication delay. The smart building scenario works successfully for a long time if the acquired sensory data
are transferred in the given duration and the energy consumption of all of the deployed nodes is equal. The system cost of such a scenario may be reduced by using different types of EDs based on the requirement of the scenario. We address the following problem in this work: How does one design an LCDT system to transmit the huge size of sensory data of the smart building with given energy and delay constraints? To solve this problem, we present an LCDT system for smart buildings in a noisy environment. The solution uses DL techniques for the compression and effective transmission of sensory data. The system uses the LoRa communication protocol to transfer the compressed smart building data to the EUs. Along with this, the key contributions are as follows:
◆◆ We propose a compression–decompression approach called transmitter- and receiver-nets for lowering the amount of sensory data at the ED. The approach employs deep neural network (DNN) architectures for compressing and decompressing the sensory data. The DNN designed for compressing the data is lightweight and can successfully run on low-processing EDs.
◆◆ We employ a mixture density network architecture for the channel-net [21] to reduce the noise effects between EDs and the LoRa gateway (LG). The channel-net works on EDs after reducing the size of the sensory data by using the proposed compression DNN architecture.
◆◆ We present the analysis of the delay and energy required for sensory data compression and communication. The analysis considers the different types of devices with unequal processing, energy, and storage capabilities.
◆◆ An optimization problem is formulated to minimize the cost and energy consumption of the data transmission system of the smart building. We also present a low-time-complexity algorithm to solve the optimization problem.
◆◆ Finally, the experimental results are presented to illustrate the solution’s effectiveness. The experiment’s parameters are defined based on the analysis of existing hardware to make it practical.
The LCDT System
The LCDT system architecture consists of SNs, EDs attached with a LoRa node, an LG, a network server (NS), an application server (AS), and EUs, as shown in Figure 1. The SNs attached with the smart building collect the sensory data in the form of the MTS and forward it to the ED. The ED is responsible for compressing the received MTS and transmitting it to the LG. The LG receives the compressed MTS and forwards the same to the NS. The compressed MTS is retrieved to the original form at the NS and forwarded to the AS. The AS identifies the data and forwards them to the respective user based on the application. Finally, the EU receives the information collected by the SNs.
Figure 1. An illustration of the LCDT system components for smart building using LoRa. The transmitter-net and receiver-net are the mirror image of DNNs. Labeled components: smart building with sensor nodes, edge device (LoRa node), LoRa gateway, network server and application server, end users, transmitter-net, channel-net, and receiver-net; links are marked as LoRa or non-LoRa communication.
The system uses the DNN model for the compression of the MTS. The encoder and decoder of the DNN handle the compression and decompression of the MTS at the ED and NS, respectively. Since the ED is a very lightweight device compared to the NS, the delay and energy analysis of the system is considered only at the ED. The system uses the transmitter-net as an encoder for compressing the MTS and the channel-net for handling the noise between the ED and LG. Both the transmitter-net and channel-net are DNNs and work at the ED. The system consists of a set of N EDs with I different types, where $N = \{1, 2, \ldots, n\}$ and $I = \{1, 2, \ldots, k\}$. The costs of the i and j types of EDs are denoted as $C_i$ and $C_j$, where $\{i, j\} \in I$ and $C_i \neq C_j$. The total number of ith type EDs in the system is given by $X_i$. The parameters of an ED, such as energy $E_i$, processing speed $V_i$, and cost $C_i$, differ based on the ED type i.
The Transmitter-Net
The transmitter-net is a DNN that runs on an ED for mapping the input MTS data $I \in \mathbb{R}^{D \times Z}$ to a reduced dimensional MTS $I' \in \mathbb{R}^{D \times Z'}$, where $D$, $Z$, and $Z'$ are the number of components of the MTS, the original size of the MTS, and the reduced size of the MTS, respectively, and $Z' \le Z$. The DNN model consists of $L$ layers with $q$ neurons in each layer. Initially, $I$ is one-hot encoded, and the elements of the encoded vector are $\{I_1, I_2, \ldots, I_Z\}$. The one-hot-encoded vector $I_1$ is input to the first layer of the DNN. The neurons in the first layer receive input and perform simple computation with activation function h and forward output to the next layer. The neurons in the next layer receive weighted input from the previous layer, perform the computation, and forward the output to the next layer. Likewise, the outputs of the Lth and (L - 1)th layers are given by
$$h_L = \sum_{j=1}^{L} f(W_j h_{j-1}), \qquad h_{L-1} = \sum_{j=1}^{L-1} \sum_{i=1}^{q} h_{ij} W_{ij} \bigl(h_{ij-1}(W_i I_i + b_i)\bigr) \tag{1}$$
where $f$, $W_j$, $I$, and $b$ are the activation function, weight matrix of the jth layer, input, and bias, respectively. Due to hardware constraints, the output of the last layer is passed through a normalization step, which transforms the data to satisfy the average power constraint or amplitude constraint. Lastly, the compressed data $I' \in \mathbb{R}^{D \times Z'}$ are transmitted to the LG via the channel-net.
The Channel-Net
The channel-net learns resilient representations of the input data that can be retrieved with a low likelihood of errors despite the channel conditions encountered in transmission from sender to receiver [12]. The channel-net is designed as a mixture density network (MDN) with Gaussian components to simulate the conditional density of the channel output given its input [21]. The MDN is a concatenation of a DNN and a mixture model whose parameters $\phi(I')$ are a function of the channel input $I'$. The DNN model of the channel-net consists of $L$ dense layers followed by a sampling layer.
The channel has a maximum fixed speed, denoted as the channel rate $c$, to process the received data. The conditional probability density $P(I'' \mid I')$ of the mixture model is given to the sampling layer to obtain the output $I''$. The conditional channel density modeled by the MDN is given by
$$P(I'' \mid I') = \sum_{i=1}^{k} \pi_i(I')\, \phi(I'' \mid I') \tag{2}$$
where $k$ is the number of mixture components, $\pi_i(I') \in [0, 1]$ is the mixing coefficient of component $i$, and $\phi(I'' \mid I')$ is the function representing the conditional densities of $I''$. The output of the channel-net, i.e., $I''$, is forwarded to the NS through the LG. The receiver-net at the NS decompresses and retrieves the original data $\hat{I}$.
Estimation of Cost, Delay, and Energy Consumption of the System
The cost of the LCDT system is determined by the number of EDs of each type utilized for the smart building. Let $X_i$ be the total number of the ith type ED in the system, $i \in I$. The system cost is therefore
$$C_{\mathrm{sys}} = C_1 X_1 + C_2 X_2 + \cdots + C_k X_k. \tag{3}$$
The delay of the LCDT system depends on the time taken by the transmitter-net and channel-net. The delay of the transmitter-net is the estimated time to compress the MTS of SNs, i.e., the sum of the number of operations in the DNN of each ED. Let the SNs generate MTS with sampling rate $m$, which is processed by $k$ types of EDs. The delay of the nth type of ED with the transmitter-net of the L-layered DNN is given by
$$T_n^{\mathrm{comp}} = \sum_{i=1}^{k} \sum_{j=1}^{L} m\, q_j (2 I_a + 1)\, h_q V_i X_i. \tag{4}$$
The estimated delay in the channel-net is given by
$$T_n^{\mathrm{chan}} = \sum_{i=1}^{k} \sum_{j=1}^{L} c\, q_j (2 I'_a + 1)\, h_q V_i X_i. \tag{5}$$
The total delay of the system is the sum of the delays of the transmitter-net and channel-net, which is given by
$$T_n = T_n^{\mathrm{comp}} + T_n^{\mathrm{chan}}. \tag{6}$$
Let $E_i^o$ be the energy consumption per operation of the ith type ED; then, the total energy consumption of the system is given by
$$E_n = (T_n^{\mathrm{comp}} \times E_i^o) + (T_n^{\mathrm{chan}} \times E_i^o). \tag{7}$$
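As a small illustration of how (6) and (7) combine, the following Python sketch evaluates both for one ED type. The function names and the numeric values are assumptions made for this example only; they are not taken from the article.

```python
def total_delay(t_comp: float, t_chan: float) -> float:
    """Equation (6): transmitter-net delay plus channel-net delay."""
    return t_comp + t_chan


def total_energy(t_comp: float, t_chan: float, e_op: float) -> float:
    """Equation (7): each delay term scaled by the per-operation energy."""
    return (t_comp * e_op) + (t_chan * e_op)


# Illustrative, unit-free values (assumed, not reported in the article).
t_comp, t_chan, e_op = 120.0, 30.0, 4.0
print(total_delay(t_comp, t_chan))         # delay of this ED type
print(total_energy(t_comp, t_chan, e_op))  # energy of this ED type
```

Because both terms in (7) share the same per-operation energy, the energy is simply the total delay of (6) scaled by that factor.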
The Optimization Problem of the LCDT System
This work aims to design a low-cost system for the transmission of sensory data of the smart building with given
energy and delay constraints. The optimization problem of
the LCDT system is defined as
The LCDT Problem
$$\min \; C_{\mathrm{sys}} \tag{8a}$$
subject to
Constraint 1: $E_n \le E_{\mathrm{th}}$ (8b)
Constraint 2: $T_n \le T_{\mathrm{th}}$. (8c)
The objective of the LCDT problem is to determine the number of devices of each type that achieves the lowest system cost. Constraint 1 indicates that the energy consumption of the system should not exceed the threshold $E_{\mathrm{th}}$, which helps to prolong the life of the system. Constraint 2 ensures that the delay of the system for receiving the data at the NS does not exceed the threshold $T_{\mathrm{th}}$. The thresholds $E_{\mathrm{th}}$ and $T_{\mathrm{th}}$ are given by the user based on the application of the system.
To solve the LCDT problem, Algorithm 1 computes the required number of EDs of each type under the given energy and delay constraints. We start by fixing the maximum number of EDs of each type, $n_i$, to $n_{\max}$ for $1 \le i \le k$. We consider the scenario described in the “The LCDT System” section. Algorithm 1 takes $C_i$, $E_i^o$, $V_i$, $E_{\mathrm{th}}$, $T_{\mathrm{th}}$, and $n_{\max}$ as inputs, where $1 \le i \le k$. It then iteratively computes $E_n$ for the nth type of ED by using (7), where $1 \le n \le k$. If constraint 1 is satisfied, i.e., $E_n \le E_{\mathrm{th}}$, the algorithm checks constraint 2, i.e., $T_n \le T_{\mathrm{th}}$, by using (6). Algorithm 1 retains the numbers of EDs of each type that satisfy both constraints. Finally, Algorithm 1 calculates the cost of the system for each retained combination and returns the number of EDs of each type that gives the minimum cost.
The time complexity of the proposed algorithm is as follows: There are $1 + k$ for loops in the function Insert, resulting in a time complexity of $O(k \times n_{\max} \times f_t)$, where $f_t$ is the time complexity of the function Insert. The function Compute Cost has a time complexity of $O(q) \times c$, where $q$ and $c$ are the number of times and the complexity of computing the cost, respectively. Thus, the computational complexity of the algorithm is $O(k \times n_{\max} \times q \times (f_t + c))$, which is in polynomial time.
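To get a feel for the size of the search, the nested loops of Algorithm 1 can be mimicked by enumerating every tuple $(X_1, \ldots, X_k)$ with each entry between 1 and $n_{\max}$. The Python sketch below only counts those tuples; `count_candidates` is a hypothetical helper introduced for illustration and is not part of the article.

```python
from itertools import product


def count_candidates(k: int, n_max: int) -> int:
    """Count the (X_1, ..., X_k) tuples visited by k nested loops over 1..n_max."""
    return sum(1 for _ in product(range(1, n_max + 1), repeat=k))


# Two ED types with at most 10 devices each.
print(count_candidates(k=2, n_max=10))
```

A full enumeration of this kind visits `n_max ** k` tuples, so the number of candidate combinations grows quickly with the number of ED types.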
Example
Consider an LCDT system with the maximum number of devices $n_{\max} = 10$ and two different types of EDs, whose counts are $X_1$ and $X_2$. We fix the total number of instructions to be performed to 300. The cost, energy consumption, and processing speed of the two ED types are assumed to be in the ratios of 1:3, 1:4, and 10:1, respectively. The threshold values $E_{\mathrm{th}}$, $T_{\mathrm{th}}$, and $C_{\mathrm{th}}$ are set to 1,500, 300, and 5,000, respectively. Algorithm 1 is implemented to find the minimum cost of the system. Initially, the algorithm computes the energy consumption and delay for all combinations of EDs of the different types. Next, it finds the list of combinations that satisfy the system constraints. Finally, the system’s cost is calculated for each such combination, and the algorithm selects the combination that gives the minimum cost. For the maximum number of 10 devices, the optimal cost found by Algorithm 1 is 11 with $X_1 = 5$ and $X_2 = 2$.
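The search procedure described in this example can be sketched in Python as an exhaustive loop over device counts. This is a minimal sketch under stated assumptions, not the authors' implementation: `toy_metrics` is a hypothetical stand-in for the delay and energy estimates (it is not equations (4) to (7)), type 2 is assumed to be the faster device, and the delay threshold of 10 is chosen only to make the toy search non-trivial.

```python
from itertools import product


def toy_metrics(counts, speeds, energy_per_op, instructions):
    """Hypothetical delay/energy model, NOT the article's equations (4)-(7).

    Assumes the instructions are shared among devices in proportion to
    their speed, so delay = instructions / total capacity.
    """
    capacity = sum(v * x for v, x in zip(speeds, counts))
    delay = instructions / capacity
    weighted = sum(e * v * x for e, v, x in zip(energy_per_op, speeds, counts))
    energy = instructions * weighted / capacity
    return delay, energy


def solve_lcdt(costs, speeds, energy_per_op, instructions, e_th, t_th, n_max):
    """Exhaustive search: cheapest counts that meet both thresholds."""
    best_counts, best_cost = None, float("inf")
    for counts in product(range(1, n_max + 1), repeat=len(costs)):
        delay, energy = toy_metrics(counts, speeds, energy_per_op, instructions)
        if energy <= e_th and delay <= t_th:          # constraints (8b), (8c)
            cost = sum(c * x for c, x in zip(costs, counts))  # equation (3)
            if cost < best_cost:                      # keep the cheapest so far
                best_counts, best_cost = counts, cost
    return best_counts, best_cost


# Cost 1:3 and energy 1:4 as in the example; speeds and t_th are assumed.
print(solve_lcdt(costs=(1, 3), speeds=(1, 10), energy_per_op=(1, 4),
                 instructions=300, e_th=1500, t_th=10, n_max=10))
```

Because the delay and energy model here is a placeholder, the counts it returns are not expected to match the optimum reported above; the sketch is meant only to show the structure of the feasibility check followed by the minimum-cost selection.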
Discussion and Results
In this section, we illustrate the performance of the proposed system by using simulation results. The simulation considers two types of EDs, with counts $X_1$ and $X_2$, whose cost, energy consumption, and processing speed are in the ratios of 1:3, 1:4, and 10:1, respectively.
For example, the ratios can be selected through a market analysis that takes the type 1 ED to be an Arduino and the type 2 ED to be a Raspberry Pi. The cost of the Raspberry Pi is three times the cost of the Arduino, its energy consumption is four times higher, and its processing speed is 10 times that of the Arduino. The threshold values $E_{\mathrm{th}}$, $T_{\mathrm{th}}$, and $C_{\mathrm{th}}$ are unit free and set initially to 1,500, 300, and 5,000, respectively. These threshold values may be varied depending on the application scenario.
Algorithm 1: The Solution of the LCDT Problem
Input: $C_i$, $E_i^o$, $V_i$, $E_{\mathrm{th}}$, $T_{\mathrm{th}}$, $n_{\max}$
Output: $q_1, \ldots, q_k$
for $X_1 \leftarrow 1$ to $n_{\max}$ do
    ⋮
        for $X_k \leftarrow 1$ to $n_{\max}$ do
            if $E_n \le E_{\mathrm{th}}$ and $T_n \le T_{\mathrm{th}}$ then
                $\{q_1, q_2, \ldots, q_k\}$ = Insert($X_1, X_2, \ldots, X_k$)
return $q_1, q_2, \ldots, q_k$;
Function Insert($X_1, X_2, \ldots, X_k$)
begin
    Compute Cost = $C_1 X_1 + C_2 X_2 + \cdots + C_k X_k$;
    if Cost < $C_{\mathrm{sys}}$ then $q_1 = X_1, q_2 = X_2, \ldots, q_k = X_k$ and $C_{\mathrm{sys}}$ = Cost;
    return $q_1, q_2, \ldots, q_k$;
end

Figure 2(a) illustrates the impact of the number of instructions to be performed on the system cost. It shows that an increasing number of instructions increases the system cost. This is because the system uses a greater number of devices to perform an increased number of instructions within the given delay and energy thresholds. Figure 2(a) also shows the impact of delay on the system cost. We can minimize the system cost by increasing the delay threshold for an increased number of instructions. The delay threshold $T_{\mathrm{th}}$ is varied from 300 to 1,000. These values can be adjusted based on the requirements of the use case. The results show that the cost of the system depends on the delay threshold and on the number and cost of the different types of devices.
Figure 2(b) and (c) demonstrate the impact of the number of instructions on the number of devices. Figure 2(b) shows that the number of type 1 devices increases with respect to the number of instructions to be performed. It is also observed that by increasing the delay threshold, we can minimize the number of devices, which, in turn, minimizes the cost of the system. The impact of the number of instructions on type 2 devices is shown in Figure 2(c). Type 2 devices also increase in number with respect to the number of instructions, but the increase is very small compared to that of type 1 devices. This is because the cost of type 2 devices is higher than that of type 1 devices, so the system considers fewer type 2 devices to minimize the system cost.
[Figure 2: three plots against the number of instructions (100 to 1,000), each with curves for $T_{\mathrm{th}}$ = 300, 500, 700, and 1,000. Vertical axes: (a) cost of the system; (b) number of type 1 devices; (c) number of type 2 devices.]
Figure 2. An illustration of the effect of the number of instructions on the system cost and the required number of devices with different delay thresholds. (a) The cost of the system. (b) The required number of type 1 devices. (c) The required number of type 2 devices.
Conclusion and Future Work
In this article, an LCDT method for smart building data is proposed. Compression–decompression models based on DL estimate the energy and communication delay for sensory data. A novel approach to implementing a communication channel entails creating a DL model to minimize transmission error. The best combination of EDs needed to build an LCDT system is determined using the described algorithm. The experimental findings demonstrate that the system’s cost, which is constrained by energy and latency considerations, can be decreased by using a fixed number of distinct EDs. Future research directions include expanding the analysis to take into account various performance-enhancing characteristics. The development of a channel-net with higher performance is highlighted as a next step in reducing interference and transmission errors in LoRa communication.
About the Authors
B Shilpa (b.shilpa@ifheindia.org) is a research scholar with the Department of Electronics and Communication Engineering, Faculty of Science and Technology, IFHE, Hyderabad 501203, India. Her research interests include wireless communication, wireless sensor networks, and the Internet of Things.
Hari Prabhat Gupta (hariprabhat.cse@iitbhu.ac.in) is an assistant professor in the Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi 221005, India. His research interests include wireless sensor networks, distributed algorithms, and the Internet of Things.
Rajesh Kumar Jha (rajeshjha@ifheindia.org) is an assistant professor in the Department of Electronics and Communication Engineering, Faculty of Science and Technology, IFHE, Hyderabad 501203, India. His research interests include very large scale integration and the Internet of Things.
References
[1] B. Qolomany et al., “Leveraging machine learning and big data for smart buildings: A comprehensive survey,” IEEE Access, vol. 7, pp. 90,316–90,356, Jul. 2019, doi: 10.1109/ACCESS.2019.2926642.
[2] P. Kumari, H. P. Gupta, and T. Dutta, “A nodes scheduling approach for effective use of gateway in dense LoRa networks,” in Proc. ICC IEEE Int. Conf. Commun. (ICC), 2020, pp. 1–6, doi: 10.1109/ICC40277.2020.9149006.
[3] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, and M. Ayyash, “Internet of Things: A survey on enabling technologies, protocols, and applications,” IEEE Commun. Surveys Tuts., vol. 17, no. 4, pp. 2347–2376, 4th quarter 2015, doi: 10.1109/COMST.2015.2444095.
[4] J. C. Liando, A. Gamage, A. W. Tengourtius, and M. Li, “Known and unknown facts of LoRa: Experiences from a large-scale measurement study,” ACM Trans. Sens. Netw., vol. 15, no. 2, pp. 1–35, May 2019, doi: 10.1145/3293534.
[5] E. D. Ayele, C. Hakkenberg, J. P. Meijers, K. Zhang, N. Meratnia, and P. J. M. Havinga, “Performance analysis of LoRa radio for an indoor IoT applications,” in Proc. Int. Conf. Internet Things Global Commun. (IoTGC), 2017, pp. 1–8, doi: 10.1109/IoTGC.2017.8008973.
[6] W. Xu, J. Y. Kim, W. Huang, S. S. Kanhere, S. K. Jha, and W. Hu, “Measurement, characterization, and modeling of LoRa technology in multifloor buildings,” IEEE Internet Things J., vol. 7, no. 1, pp. 298–310, Jan. 2020, doi: 10.1109/JIOT.2019.2946900.
[7] J. Petäjäjärvi, K. Mikhaylov, M. Hämäläinen, and J. H. Iinatti, “Evaluation of LoRa LPWAN technology for remote health and wellbeing monitoring,” in Proc. 10th Int. Symp. Med. Inf. Commun. Technol. (ISMICT), 2016, pp. 1–5, doi: 10.1109/ISMICT.2016.7498898.
[8] J. Haxhibeqiri, A. Karaagac, F. V. D. Abeele, W. Joseph, I. Moerman, and J. Hoebeke, “LoRa indoor coverage and performance in an industrial environment: Case study,” in Proc. 22nd IEEE Int. Conf. Emerg. Technol. Factory Automat. (ETFA), 2017, pp. 1–8, doi: 10.1109/ETFA.2017.8247601.
[9] L. Gregora, L. Vojtech, and M. Neruda, “Indoor signal propagation of LoRa technology,” in Proc. 17th Int. Conf. Mechatronics - Mechatronika (ME), 2016, pp. 1–4.
[10] D. Croce, M. Gucciardo, S. Mangione, G. Santaromita, and I. Tinnirello, “LoRa technology demystified: From link behavior to cell-level performance,” IEEE Trans. Wireless Commun., vol. 19, no. 2, pp. 822–834, Feb. 2020, doi: 10.1109/TWC.2019.2948872.
[11] P. Kumari, H. P. Gupta, and T. Dutta, “Estimation of time duration for using the allocated LoRa spreading factor: A game-theory approach,” IEEE Trans. Veh. Technol., vol. 69, no. 10, pp. 11,090–11,098, Oct. 2020, doi: 10.1109/TVT.2020.3007566.
[12] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563–575, Dec. 2017, doi: 10.1109/TCCN.2017.2758370.
[13] T. J. O’Shea, K. Karra, and T. C. Clancy, “Learning to communicate: Channel auto-encoders, domain specific regularizers, and attention,” in Proc. IEEE Int. Symp. Signal Process. Inf. Technol. (ISSPIT), 2016, pp. 223–228, doi: 10.1109/ISSPIT.2016.7886039.
[14] H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANS as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, May 2020, doi: 10.1109/TWC.2020.2970707.
[15] S. Dörner, S. Cammerer, J. Hoydis, and S. t. Brink, “Deep learning based communication over the air,” IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 132–143, Feb. 2018, doi: 10.1109/JSTSP.2017.2784180.
[16] D. Wu, M. Nekovee, and Y. Wang, “Deep learning-based autoencoder for m-user wireless interference channel physical layer design,” IEEE Access, vol. 8, pp. 174,679–174,691, Sep. 2020, doi: 10.1109/ACCESS.2020.3025597.
[17] I. Sülo, S. R. Keskin, G. Dogan, and T. Brown, “Energy efficient smart buildings: LSTM neural networks for time series prediction,” in Proc. Int. Conf. Deep Learn. Mach. Learn. Emerg. Appl. (Deep-ML), 2019, pp. 18–22, doi: 10.1109/Deep-ML.2019.00012.
[18] I. Abdennadher, N. Khabou, I. B. Rodriguez, and M. Jmaiel, “Designing energy efficient smart buildings in ubiquitous environments,” in Proc. 15th Int. Conf. Intell. Syst. Design. Appl. (ISDA), 2015, pp. 122–127, doi: 10.1109/ISDA.2015.7489212.
[19] X. Huang and S. Zhou, “Dynamic compression ratio selection for edge inference systems with hard deadlines,” IEEE Internet Things J., vol. 7, no. 9, pp. 8800–8810, Sep. 2020, doi: 10.1109/JIOT.2020.2997128.
[20] S. Tripathi and S. De, “An efficient data characterization and reduction scheme for smart metering infrastructure,” IEEE Trans. Ind. Informat., vol. 14, no. 10, pp. 4300–4308, Oct. 2018, doi: 10.1109/TII.2018.2799855.
[21] D. García Martí, J. Palacios Beltrán, J. O. Lacruz, and J. Widmer, “A mixture density channel model for deep learning-based wireless physical layer design,” in Proc. 23rd Int. ACM Conf. Model., Anal. Simul. Wireless Mobile Syst. (MSWiM), 2020, pp. 53–62, doi: 10.1145/3416010.3423229.
Conference Reports
by Qi Kang and Shuaiyu Yao
The 19th IEEE International Conference on Networking, Sensing, and Control
Digital Object Identifier 10.1109/MSMC.2023.3273460
Date of current version: 17 July 2023
The 19th IEEE International Conference on Networking, Sensing, and Control (ICNSC 2022) was held between 15 and 18 December 2022 in Shanghai, China. ICNSC 2022 was hosted by the IEEE Systems, Man, and Cybernetics Society; Tongji University (China); Fudan University (China); and Shanghai Association for System Simulation (China). It was supported by the K.C. Wong Education Foundation, Hong Kong, China.
The theme of this conference was “autonomous intelligent systems,” focusing on intelligent control, machine learning, deep learning, network communication, multiagent systems, Internet of Things, and swarm intelligence. Following this theme, the conference provided a platform for both academic researchers and industrial practitioners involved in different but related domains to discuss key problems, exchange ideas, and tackle emerging challenges, while sharing innovative solutions and looking into future research prospects.
The conference was held in a hybrid format with online and in-person attendance. A total of 211 papers were submitted to the conference, out of which 144 were selected for oral presentation based on a rigorous single-blind peer review. This indicates a paper acceptance rate of approximately 68.2%. The accepted papers have been included in Proceedings of 2022 IEEE International Conference on Networking, Sensing, and Control, which has now been published in IEEE Xplore and indexed by Engineering Index COMPENDEX. Notably, the authors hailed from various countries, including China, the United States, Japan, Canada, France, Italy, and The Netherlands. ICNSC 2022 was successfully held as a multinational and multidisciplinary conference that provided scientists, engineers, and students with a platform to convene and discuss their shared interests (Figures 1 and 2), thanks to the collaborative efforts of the organizing, program, and steering committees; the authors who submitted exceptional papers; and the reviewers who examined the papers and provided many insightful comments.
Figure 1. Some attendees of ICNSC 2022.
The program agenda of the conference encompassed various technical activities, including a plenary session, four keynote speeches, a best paper award session, and 28 parallel panel sessions that featured eight special sessions. The plenary session kicked off with opening remarks delivered by Prof. Xiaohua Tong, vice president of Tongji University (Figure 3); Prof. Mengchu Zhou, chair of the ICNSC Steering Committee (Figure 4); and Prof. Qi Kang, general chair of ICNSC 2022 (Figure 5). The conference featured keynote speeches from renowned experts (shown in the following paragraphs), whose thought-provoking ideas set the tone for the event. These speakers captivated the audience with their visionary outlook and provided inspiring insights into the future of networked systems and control.
1) Prof. Peng Shi, editor-in-chief of IEEE Transactions on Cybernetics, who is from the University of Adelaide, Australia, gave a presentation titled “Consensus and Formation Control for Multi-agent Systems.” Prof. Shi’s presentation focused on multiagent systems (MASs) and highlighted their key features of communication, coordination, and collaboration for achieving common goals effectively and efficiently. The presentation covered three main topics: consensus, flocking/swarming, and formation control within MASs. Consensus, a fundamental problem in MASs, was explored as a requirement for cooperation among agents. Flocking, a self-organizing behavior inspired by lower-intelligence animals, enables the emergence of swarm intelligence to enhance system survivability and competitiveness. Additionally, formation control aims to drive agents toward desired scalable and adaptable formations. Prof. Shi’s presentation offered modeling analysis, design, simulations, and experimental examples to showcase the potential of distributed schemes in achieving consensus and formation control (Figure 6).
2) Prof. Ke Tang from Southern University of Science and Technology, China, introduced “Learn to Optimize” to the audience of the conference. Prof. Tang’s speech focused on the automation of algorithm design to address complex real-world optimization problems. Off-the-shelf algorithms and tools are inadequate for these problems, requiring extensive prior knowledge and manual algorithm design efforts. The concept of learn to optimize (L2O), a data-driven approach for automated algorithm and solver design, was introduced. The speech discussed the building blocks and recent advancements in L2O, along with successful case studies. Future directions in this field were also presented (Figure 7).
3) Prof. Zhi Wei from the New Jersey Institute of Technology, USA, provided a talk titled “Deep Autoencoders for Analysis of Single-Cell RNA Sequencing Data.” Prof. Wei’s talk focused on clustering analysis, specifically in the context of single-cell RNA sequencing (scRNA-seq) studies. Traditional clustering methods often overlook the unique characteristics of scRNA-seq data and fail to utilize prior information or filter out irrelevant genes during the clustering process. To overcome these limitations, Prof. Wei proposed the use of model-based deep autoencoders. These novel methods aim to address the identified issues and enhance clustering performance. Through extensive experiments on both simulated and real datasets, the proposed methods demonstrate a significant improvement in clustering performance, leading to the generation of biologically meaningful clusters (Figure 8).
4) Prof. Tadahiko Murata from Kansai University, Japan, delivered a presentation titled “Synthetic Societal Data (Synthetic Population + Basic Behavioral Data).” Prof. Murata’s presentation focused on real-scale social simulations for specific communities such as cities, towns, and villages. With the COVID-19 pandemic, researchers are developing social simulations for countermeasures against the virus. To develop such simulations, synthetic populations have been synthesized based on publicly released statistics without containing any privacy information. Prof. Murata’s research outcome enables the generation of synthetic societal data, which include household compositions and basic behavioral data, facilitating the development of real-scale social simulations for emergency and peaceful times (Figure 9).
Figure 2. The conference site of ICNSC 2022.
Figure 3. The opening remarks delivered by Prof. Xiaohua Tong, vice president of Tongji University.
Figure 4. The opening remarks delivered by Prof. Mengchu Zhou, chair of the ICNSC Steering Committee.
Figure 5. The opening remarks delivered by Prof. Qi Kang, general chair of ICNSC 2022.
Figure 6. The keynote speech provided by Prof. Peng Shi.
Figure 7. The keynote speech provided by Prof. Ke Tang.
Figure 8. The keynote speech provided by Prof. Zhi Wei.
Figure 9. The keynote speech provided by Prof. Tadahiko Murata.
The parallel sessions allowed researchers to delve into specific subtopics, fostering focused discussions on areas such as autonomous agents and multiagent systems, continual learning, cyberphysical systems, edge computing, heterogeneous wireless networks, Internet of Things, networked control systems, smart civil aviation and aerospace, swarm intelligence, and transfer learning. The researchers presented their latest discoveries and breakthroughs, sparking intense debates and encouraging the exchange of different perspectives (Figures 10 and 11).
Figure 10. The offline parallel sessions.
Figure 11. The online parallel sessions.
In addition, ICNSC 2022 was composed of eight special sessions that addressed a diverse range of topics, including
◆◆ Modeling, analysis, and control of resource allocation systems
◆◆ A connected and autonomous mobility system for energy and environmental sustainability
◆◆ Artificial intelligence for IT operations
◆◆ Deep learning and optimization for distributed industrial systems
◆◆ An evolutionary algorithm for big data applications
◆◆ Data-driven estimation in industrial scenarios
◆◆ Latent representation learning for incomplete big data
◆◆ Transfer perception and control in real robotic applications.
The discussions of these sessions brought together experts and attendees to tackle challenging issues and address the emerging trends in the field. These sessions facilitated dynamic conversations, where ideas were rigorously examined, and diverse viewpoints were respectfully debated. Attendees eagerly explored interesting topics, engaging in deep conversations, sharing feedback, and exploring potential collaborations. These interactions not only enriched the knowledge of the attendees but also nurtured a sense of community and camaraderie.
After a series of oral presentation competitions, a total of five papers were chosen from the pool of candidate papers to receive the prestigious accolades and best paper awards of ICNSC 2022. Specifically, these awards included two best conference paper awards, two best student paper awards, and one best emerging technology paper award. The winners of the best paper awards are listed as follows:
1) The winners of the best conference paper awards:
◆◆ “Heuristic scheduling method of flexible manufacturing based on Petri nets and artificial potential field” by Sijia Yi et al.
◆◆ “Open the black box of recurrent neural network by decoding the internal dynamics” by Jiacheng Tang et al.
2) The winners of the best student paper awards:
◆◆ “Design and implementation of autonomous mapping system for UGV based on lidar” by Xiaohong Xu et al.
◆◆ “Detection transformer: Ultrasonic echo signal inclusion detection with transformer” by Xiaoxin Fang et al.
3) The winner of the best emerging
technology paper award:
◆◆ “Design of resilient supervisory control for autonomous
connected vehicles approaching unsignalized intersection
in presence of communication
delays” by Carlo Motta et al.
The attendees of ICNSC 2022 experienced a vibrant and intellectually stimulating environment, fostering lively and in-depth exchanges and discussions. The conference attracted a diverse group of academicians, researchers, industry experts, and students from around the globe, and all of them were eager to share their knowledge and insights in the field of networking, sensing, and control. Throughout the conference, the participants actively engaged in various sessions and presentations, each offering unique perspectives and cutting-edge research findings. The atmosphere was characterized by a palpable enthusiasm and a genuine passion for advancing the field of networking, sensing, and control. In addition, the social events, including receptions, banquets, and networking breaks, offered valuable opportunities for participants to forge new connections, foster collaborations, and establish lasting friendships. In these informal settings, participants engaged in lively conversations, sharing their experiences, exchanging ideas, and exploring potential joint projects. For more information about ICNSC 2022, including details about the conference program, keynote speeches, and special sessions, please visit the official website of the conference: http://www.icnisc2022.com/. The upcoming conference, ICNSC 2023, will take place in the captivating city of Marseille, France, which is renowned for its rich cultural heritage, breathtaking landscapes, and vibrant atmosphere.
The 1st IEEE International Summer School on
E-CARGO and Applications
(Online)
July 16-21, 2023
http://www.e-cargoschool.com/
Sponsors:
• IEEE Systems, Man, and Cybernetics Society
Organizer:
• Technical Committee of Distributed Intelligent Systems
Co-Organizers:
• Technical Committee of Computer-Supported Cooperative Work in Design
• Guangdong Chapter
• Nipissing University, Canada
Acknowledgement:
• Jinling Institute of Technology, China
Goal:
The Environments-Classes, Agents, Roles, Groups, and Objects (E-CARGO) model is an abstract model for
complex systems. It has been successfully applied in different applications. It has numerous potentials to promote
investigations into academic and industry problems. It fits the SMCS requirements of initiatives.
Role Based Collaboration (RBC) and its E-CARGO model have been developed into a powerful tool for
investigating collaboration and complex systems. Related research has brought and will bring in exciting
improvements to the development, evaluation, and management of systems including collaboration, services, clouds,
productions, and administration systems.
E-CARGO assists scientists and engineers in formalizing abstract problems, which originally are taken as
complex problems, and finally points out solutions to such problems including programming. The E-CARGO model
possesses all the preferred properties of a computational model. It has been verified by formalizing and solving
significant problems in collaboration and complex systems, e.g., Group Role Assignment (GRA). With the help of
E-CARGO, the methodology of RBC can be applied to solve various real-world problems. E-CARGO itself can be
extended to formalize abstract problems as innovative investigations in research. On the other hand, the details of
each E-CARGO component are still open for renovations for specific fields to make the model easily applied. For
example, in programming, we need to specify the primitive elements for each component of E-CARGO. When these
primitive elements are well-specified, a new type of modeling or programming language can be developed and
applied to solve general problems with software design and implementations.
This summer school will extend the applications of E-CARGO and RBC, which promote problem solving for
complex systems in the areas covered by SMCS, such as Cybernetics, Systems Science and Engineering,
Human-Machine Systems, and Computational Social Systems.
Motivation:
In the field of Systems, Man, and Cybernetics (SMC), many researchers need solid tools to develop
methodologies or solutions for the problems in their specific areas. There are many traditional tools for
specific areas, such as object or agent models, deep learning, evolutionary computation, and evolutionary
optimization. However, these methodologies and models have their own limitations. Researchers are eager to have
high-level, abstract, yet expressive models and methodologies to guide them in understanding the requirements of
their specific problems, which are usually very complex. Without such guidance, it is very hard for them to grasp
the key elements needed to analyze their problems, specify the requirements, and design a feasible solution.
E-CARGO is a novel model that meets this need. With E-CARGO, researchers gain a tool that lets them
begin investigating a problem along an easy-to-follow route and then gradually delve into the details
of the system or problem that concerns them most. Such a tool helps them understand their problems or
systems in an adaptive and incremental way.
In the summer school, we will present, through lectures and labs, many success stories and case studies
for researchers to learn from, follow, and practice.
The SMC Society encourages interdisciplinary research and innovation and is a reputable technology
incubator. The SMC Society has enabled E-CARGO to develop, expand, and mature.
Digital Object Identifier 10.1109/MSMC.2023.3275041
Attendees:
This school is open to anyone with some familiarity with abstract mathematical structures who wants to
learn about the E-CARGO model and RBC theory. Our goal is to make the E-CARGO/RBC theory accessible to,
and inclusive of, everyone who is interested. We believe that E-CARGO is for everyone, and we are committed to
fostering a kind, inclusive environment. In our experience, fourth-year undergraduates, graduate students
(master's and PhD), and early-career researchers and practitioners in STEM fields are the best fit.
Registration:
Registration includes:
1) Five days (10 sessions) of online participation in the summer school program.
2) A certificate for registered attendees who attend at least 7 sessions.
3) An author-signed hardcopy book for the top 10 students, and a hardcopy book for the students ranked 11-50 in
performance (Value: $170 including shipping cost): H. Zhu, E-CARGO and Role-Based Collaboration: Modeling
and Solving Problems in the Complex World, Wiley-IEEE Press, NJ, USA, Dec. 2021.
Note: We will also send out more books (51-?) based on the budget. The criterion is the registration time, i.e., First
In, First Served (FIFS).
IEEE SMC student member: $50CAD
IEEE SMC member: $50CAD
IEEE student member: $85CAD
IEEE member: $120CAD
Non-IEEE student: $120CAD
Non-IEEE member: $190CAD
Organizing Committee:
General Chair:
Haibin Zhu, Nipissing University, Canada
Program Co-Chairs:
Dongning Liu, Guangdong University of Technology, China
Yin Sheng, Hohai University, China
Registration Co-Chairs:
Xianjun Zhu, Jinling Institute of Technology, China
Publicity Co-Chairs:
Hua Ma, Hunan Normal University, China
Libo Zhang, Southwest University, China
Instructors:
Haibin Zhu, Nipissing University, Canada
Dongning Liu, Guangdong University of Technology, China
Yin Sheng, Hohai University, China
Lab Instructor:
Qian Jiang, Macau University of Science and Technology, China
Secretary:
Chengyu Peng, Laurentian University, Canada
Contact: cpeng@laurentian.ca
Confirmed Panelists:
Sam Kwong, IEEE Fellow, Chair Professor, City University of Hong Kong, President, IEEE SMC Society
Mariagrazia Dotoli, Professor, Politecnico di Bari, Vice President – Membership & Student Activities, IEEE SMC Society
Ljiljana Trajkovic, IEEE Fellow, Professor, Simon Fraser University, EiC, IEEE Transactions on Human-Machine Systems
Peng Shi, IEEE Fellow, Professor, University of Adelaide, EiC, IEEE Transactions on Cybernetics
Robert Kozma, IEEE Fellow, Professor, University of Memphis, EiC, IEEE Transactions on Systems, Man, and Cybernetics: Systems
Weiming Shen, IEEE Fellow, Professor, Huazhong University of Science and Technology,