Machine Learning for 6G Wireless Networks
Carrying Forward Enhanced Bandwidth, Massive Access, and Ultrareliable/Low-Latency Service
Jun Du, Chunxiao Jiang, Jian Wang, Yong Ren, and Mérouane Debbah
Digital Object Identifier 10.1109/MVT.2020.3019650
Date of current version: 25 September 2020
To satisfy the expected plethora of demanding services, the future generation of wireless networks (6G) has been mandated as a revolutionary paradigm to carry forward the capacities of enhanced broadband, massive access, and ultrareliable and low-latency service in 5G wireless networks to a more powerful and intelligent level. Recently, the structure of 6G
networks has tended to be extremely heterogeneous,
densely deployed, and dynamic. Combined with tight
quality of service (QoS), such complex architecture will
result in the untenability of legacy network operation
routines. In response, artificial intelligence (AI), especially machine learning (ML), is emerging as a fundamental
solution to realize fully intelligent network orchestration
and management. By learning from uncertain and dynamic environments, AI-/ML-enabled channel estimation and
spectrum management will open up opportunities for
bringing the excellent performance of ultrabroadband
techniques, such as terahertz communications, into full
play. Additionally, challenges brought by ultramassive
access with respect to energy and security can be mitigated by applying AI-/ML-based approaches. Moreover,
intelligent mobility management and resource allocation
will guarantee the ultrareliability and low latency of services. Concerning these issues, this article introduces
and surveys some state-of-the-art techniques based on
AI/ML and their applications in 6G to support ultrabroadband, ultramassive access, and ultrareliable and low-latency services.
Motivation and Challenges
Recently, the 5G wireless network was developed to support enhanced mobile broadband (eMBB), massive
machine-type communications (mMTC), and ultrareliable and low-latency communications (uRLLC) [1],
according to the report of the International Telecommunication Union. Benefitting from such high performance,
5G has opened new doors of opportunity toward emerging applications, e.g., augmented reality (AR), virtual
reality (VR), tactile reality, mixed reality, and so on. However, the new media, such as holographic communications, will require much higher transmission speeds, up
to terabits per second, than AR and VR. Thus, 5G is far
from able to support the faster, more reliable, and larger-scale communication requirements of these services. In
response, the investigation of future generations of wireless networks (6G) has been triggered, which promises
more powerful capacities in terms of ultrabroadband,
super-massive access, ultrareliability, and low latency
than 5G does, as listed in Table 1 [1].
To provide ubiquitous and various services, 6G networks tend to be more comprehensive and multidimensional by integrating current terrestrial networks with
space-/air-based information networks and marine information networks; then, heterogeneous network resources, as well as different types of users and data, will
be also integrated, as depicted in Figure 1. According to
such architecture, 6G networks are conceived to be cell
free, which means that users will move from one network to another seamlessly and automatically to pursue
the most suitable and qualified communications without
manual management and configuration. In contrast, current 5G networking technologies still focus mainly on a macro- and small-cell-based heterogeneous architecture, which will be broken by the cell-free operation of 6G, and their performance will deteriorate when applied to the brand-new 6G architecture. In addition,
how to manage and control 6G networks to realize the
promising capacities of ultrabroadband, ultramassive
access, ultrareliability, and low latency also poses great
challenges, owing to increasingly ultradense, heterogeneous, and dynamic network characteristics. Specifically, different kinds of satellite Internet constellations, consisting of large numbers of satellites, have been proposed and implemented in recent years. For instance, the SpaceX project Starlink initially planned to build a constellation of 12,000
satellites in low-Earth orbit, which has been expanded
to 42,000 recently. In addition, mobile network operators
are accelerating the dense deployment of small-cell base
stations to reduce service latency by avoiding backhaul
transmission. Moreover, future large-scale Internet of
Things (IoT) systems in 6G will also bring challenges
of spectrum management and massive or super access
control. Furthermore, the integration of highly dynamic satellites, unmanned aerial vehicles (UAVs), and the
Internet of Vehicles (IoV) will result in more frequent
handovers, more uncertain user requirements, and
more unpredictable wireless communication environments than any previous generation of networks, which
makes it difficult to guarantee the ultrareliability and low
latency of services.
Therefore, 6G networks are developing into more multidimensional, heterogeneous, large-scale, and highly dynamic systems. All of these characteristics make it urgent
to explore new techniques that are adaptive, flexible, and
intelligent to bring a revolutionary leap of communications with ultrabroadband, ultramassive access support,
ultrareliability, and low latency. In addition, enormous
amounts of widely heterogeneous data generated from
6G networks will require advanced mathematical tools
to extract meaningful information from these data and
then make decisions, including resource management
and access control, pertaining to the proper functioning
of 6G, which can hardly be achieved by traditional network optimization techniques. In recent years, AI has been emerging as a fundamental paradigm to orchestrate communication and information systems from bottom to top. For the
foreseeable future, AI-enabled networks will open up new
opportunities for smart and intelligent 6G networking.
As a major branch of AI, ML can establish an intelligent system that operates in complicated environments.
Recently, ML has mainly developed into many branches,
such as classical ML, including supervised and unsupervised learning, deep learning (DL), and reinforcement
learning (RL). DL aims to understand the representations of data and can be modeled in supervised learning, unsupervised learning, and RL. Therefore, in some
surveys of ML, DL is not listed separately. As illustrated
in Figure 1, AI and ML techniques are expected to help 6G
networks make more optimized and adaptive data-driven decisions, alleviate communication challenges, and
meet requirements from emerging services. In this article, we focus on the scope of applying AI and ML to networking and resource management optimization, aiming
to bring about significant innovation of communications
on ultrabroadband, ultramassive access, ultrareliability,
and low latency.
Intelligent Ultrabroadband Transmission in 6G
In the bandwidth-hungry age, 5G networks have exploited the sub-GHz and 1–6-GHz spectrum bands as efficiently as possible and have further introduced the 24–100-GHz bands.
However, the current spectrum bands are still hardly enough to meet the increasing demands. For instance, some emerging applications, such as holography, may require a data rate of up to terabits per second [1], which is almost three orders of magnitude higher than that of typical 5G communications.
Table 1 A comparison of key performance indexes between 4G, 5G, and 6G.
Key performance index | 4G | 5G | 6G
Peak data rate | 1 Gb/s | 20 Gb/s | ≥ 1 Tb/s
User-experienced data rate | 10 Mb/s | 100 Mb/s | 1 Gb/s
Spectrum efficiency | 1× | 3× | 15–30×
Mobility | 350 km/h | 500 km/h | ≥ 1,000 km/h
Latency | 10 ms | 1 ms | ≤ 100 μs
Connection density (devices/km²) | 10^5 | 10^6 | 10^7
Network energy efficiency | 1× | 100× | 100–10,000×
Area traffic capacity | 0.1 Mb/s/m² | 10 Mb/s/m² | ≥ 1 Gb/s/m²
In response, terahertz communications, utilizing bands in the range of 0.1–10 THz, including the 140-, 220-, and 340-GHz windows, are expected to support data rates of up to terabits per second [2]. To achieve such capacity-approaching performance, accurate information about time-varying channels is especially important for optimizing terahertz bandwidth allocation and improving spectrum efficiency. In this section, we introduce some state-of-the-art AI/ML applications in terahertz channel estimation and spectrum management.
[Figure 1 graphic omitted in this text version: it depicts a cell-free 6G architecture integrating space, air, land, and ocean segments (satellite networks, high-altitude platforms, UAV networks, airborne Internet, maritime broadband, and underwater acoustic/optical communications) with cloud, fog/edge, and user layers coordinated through SDN/SDWN controllers and network functions virtualization; the targets shown are ultrabroadband (≥1 Tb/s), ultramassive access (10^7 devices/km²), ultrareliable mobility support (≥1,000 km/h), and ultralow-latency computing and communication (≤100 μs). The figure also maps representative AI/ML techniques (supervised, unsupervised/semisupervised, reinforcement, and deep learning) to physical-layer, network-layer, and application-layer functions such as channel estimation, intelligent routing, caching, and attack detection.]
Figure 1 An illustration of AI/ML applications in 6G to support ultrabroadband, ultramassive access, and ultrareliability/low latency.
AI-/ML-Enabled Terahertz Channel Modeling
and Estimation
At terahertz frequency bands, channels suffer from high atmospheric absorption caused by water vapor in the air, which significantly increases losses. In addition, free-space pathloss is physically unavoidable on top of this atmospheric attenuation.
Furthermore, terahertz channels are observed as nonstationary, especially for dynamic scenarios where both
users and objects might be moving. Therefore, traditional channel models based on assumptions of being stationary or quasi-stationary can no longer apply to
terahertz channels.
ML algorithms are capable of analyzing the communication data and predicting likely signal loss in a given or
unknown environment. Therefore, many different types
of AI or ML algorithms can be applied to the physical
layer (PHL) of 6G networks to deal with the difficulties
just described for terahertz channel modeling and estimation. For instance, to improve estimation accuracy
in dynamic scenarios, the RL-based Bayesian filter has
been introduced to the angle-of-arrival (AoA) estimation
in terahertz channels in current studies. Specifically,
the Bayesian filter implements the estimation of the current AoA from both current measurement and previous
estimates. In this procedure, the prior transition probabilities between system states are important to the estimation performance of the Bayesian filter. RL then can
be applied to optimize the state transition probabilities
from the feedback of previous estimates and, hence, improve the performance of the Bayesian filter. Some other
feasible algorithms and applications in channel modeling and estimation are summarized as follows.
■■ Supervised learning: Supervised learning can be introduced to pathloss/shadowing prediction, localization,
interference management, channel estimation, and so
on. The feasible algorithms and models include radial
basis function neural networks, feed-forward neural
networks, K-nearest neighbor (KNN), the multilayer perceptron, the relevance vector machine, and the support vector
machine (SVM).
■■ Unsupervised learning: Channel modeling and estimation problems, such as optimal modulation, interference mitigation, duplexing configuration, node
clustering, and multipath tracking, can be solved by
applying unsupervised learning algorithms, which
include K-means, clustering algorithms, fuzzy C-means,
and so on.
■■ DL: DL can be implemented for channel feature extraction, channel state information (CSI) estimation, signal
detection, and sparse signal recovery. Typical DL algorithms, such as convolutional neural networks, recurrent neural networks (RNNs), deep neural networks
(DNNs), deep belief networks, and deep Boltzmann
machines, can be expected as good candidates.
■■ RL: RL can be introduced to channel tracking, channel selection, modulation mode selection, radio identification, and so on. Feasible algorithms and models include fuzzy RL, Q-learning, WoLF-PHC (win-or-learn-fast policy hill climbing), the Markov decision process (MDP), and the partially observable MDP.
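To make the RL-aided Bayesian filtering idea described before this list more concrete, the following minimal Python sketch (our own illustration, with hypothetical grid, noise, and trajectory settings, not the implementation of the cited studies) tracks an AoA over a discretized angular grid; the transition spread sigma_t plays the role of the prior transition probabilities that an RL agent would tune from the feedback of previous estimates.

```python
import numpy as np

# Minimal sketch of a grid-based Bayesian filter for AoA tracking.
grid = np.linspace(-90.0, 90.0, 181)          # candidate AoAs in degrees

def transition_matrix(sigma_t):
    """Gaussian random-walk prior between AoA grid points."""
    d = grid[None, :] - grid[:, None]
    T = np.exp(-0.5 * (d / sigma_t) ** 2)
    return T / T.sum(axis=1, keepdims=True)   # rows sum to 1

def likelihood(measurement, sigma_m=3.0):
    """Likelihood of a noisy AoA measurement for every grid point."""
    return np.exp(-0.5 * ((grid - measurement) / sigma_m) ** 2)

def bayes_filter(measurements, sigma_t):
    belief = np.full(grid.size, 1.0 / grid.size)   # uniform prior
    T = transition_matrix(sigma_t)
    estimates = []
    for z in measurements:
        belief = T.T @ belief                      # predict with the prior transition
        belief *= likelihood(z)                    # update with the new measurement
        belief /= belief.sum()
        estimates.append(grid @ belief)            # posterior-mean AoA estimate
    return np.array(estimates)

# Toy usage: a user sweeping from -30 to +30 degrees with noisy measurements.
true_aoa = np.linspace(-30, 30, 50)
measurements = true_aoa + 3.0 * np.random.randn(50)
estimates = bayes_filter(measurements, sigma_t=2.0)
print("mean absolute error (degrees):", np.abs(estimates - true_aoa).mean())
```

An RL agent would close the loop by treating sigma_t (or the full transition matrix) as its action and the resulting estimation error as a negative reward.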
Deep RL-Based Terahertz Spectrum Management
At present, there exists no restriction on terahertz spectrum use. The spectra have been occupied already by
some other applications, such as satellite services, spectroscopy, and meteorology [3]. Recently, the Federal Communications Commission has been investigating the use of terahertz spectrum for mobile services and applications. Therefore, spectrum-sharing methods are
necessary for the coexistence of future terahertz communications and the other existing applications listed
previously. In addition, as discussed in the previous section, 6G networks tend to be multidimensional, ultradense, and heterogeneous. Thus, considering that the
propagation medium and channel characteristics of integrated 6G networks are significantly distinct from those of terrestrial 5G networks, more effort is required to optimize the spectrum management of terahertz communications in 6G.
RL has the potential to realize smart or intelligent
spectrum management to deal with these problems, especially when large amounts of data can be leveraged
to train and predict. These training and prediction results can be taken advantage of to make decisions concerning whether or not the spectrum band is occupied
and to take action, such as accessing or releasing the
spectrum band. In addition, through the interaction between users and the wireless environment, users can
optimize their strategies iteratively to maximize the
value of reward functions, which can be established
considering spectrum efficiency, network capacity,
consumed energy, interference, and so on. However, RL is not competent at learning an effective action–value policy when random noise or measurement errors corrupt the state observations, because the number of states in the presence of random noise is, in practice, infinite. To address this problem of random state measurements, deep RL (DRL) can be considered
a suitable tool to optimize the decisions on spectrum
management in 6G networks involving dynamic spectrum access, transmission power control, spectrum allocation, and so on.
Distributed Spectrum Access
Distributed algorithms for dynamic spectrum access
should be designed to adapt to general, complex, real-world settings effectively and efficiently. Meanwhile, the
expensive computational consumption resulting from
the large state space and partial observability of the system can also be mitigated. To achieve this goal, a long short-term memory (LSTM) layer that maintains an internal state and aggregates observations can be established to enable estimation of the true state from past partial observations [4]. In addition, a dueling deep Q-network (DQN) method can also be applied to improve the Q-value estimates resulting from bad states. After training, each user needs to update only its DQN weights by communicating with the central unit and then maps its local observations to spectrum access actions based on the trained DQN. Such a spectrum access framework is implemented according to the procedure presented in Figure 2.
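As a simplified, self-contained illustration of DQN-based spectrum access (a plain feed-forward network and a toy single-agent environment with hypothetical channel statistics, rather than the LSTM-aided dueling architecture and multiuser setting of [4]), the following PyTorch sketch learns to prefer the channel with the highest idle probability from one-hot action/ACK observations.

```python
import collections
import random

import torch
import torch.nn as nn

K = 4                                  # number of candidate channels
idle_prob = [0.2, 0.9, 0.4, 0.5]       # hypothetical idle probability of each channel

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(K + 1, 64), nn.ReLU(), nn.Linear(64, K))
    def forward(self, x):
        return self.net(x)

def make_state(last_action, last_ack):
    s = torch.zeros(K + 1)
    s[last_action] = 1.0               # one-hot of the previously selected channel
    s[K] = float(last_ack)             # ACK feedback from the last transmission
    return s

qnet, target = QNet(), QNet()
target.load_state_dict(qnet.state_dict())
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay = collections.deque(maxlen=5000)
state, eps, gamma = make_state(0, 0), 1.0, 0.9

for step in range(5000):
    # epsilon-greedy channel selection
    with torch.no_grad():
        action = random.randrange(K) if random.random() < eps else int(qnet(state).argmax())
    ack = 1 if random.random() < idle_prob[action] else 0       # simulated environment
    next_state = make_state(action, ack)
    replay.append((state, action, float(ack), next_state))
    state, eps = next_state, max(0.05, eps * 0.999)

    if len(replay) >= 64:                                        # experience replay update
        batch = random.sample(replay, 64)
        s = torch.stack([b[0] for b in batch])
        a = torch.tensor([b[1] for b in batch])
        r = torch.tensor([b[2] for b in batch])
        s2 = torch.stack([b[3] for b in batch])
        q = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            y = r + gamma * target(s2).max(1).values
        loss = nn.functional.mse_loss(q, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if step % 200 == 0:
        target.load_state_dict(qnet.state_dict())                # periodic target update

print("preferred channel:", int(qnet(make_state(0, 0)).argmax()))
```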
[Figure 2 graphic omitted in this text version: it shows agents interacting with the communication environment by observing states, selecting spectrum access, power control, and spectrum resource block actions through LSTM-aided dueling DQNs and actor–critic networks, receiving rewards, and uploading historical states, actions, and rewards over a delayed backhaul for centralized training with experience replay and periodic downloads of the updated network weights.]
Figure 2 The DRL-based spectrum access, power control, and spectrum allocation in 6G networks. ACK: acknowledgment.

Distributed Dynamic Power Control
Traditional power control techniques typically search for near-optimal power allocation strategies by solving challenging optimization problems. Such techniques can hardly adapt to large-scale networks because of their high computational complexity and their requirement for precise instantaneous CSI. Model-free RL has
been introduced to deal with these problems in large-scale and heterogeneous 6G networks and can also achieve near-optimal power allocation, promising maximum sum rate and scheduling fairness in real time. In this framework, each transmitter collects the QoS and CSI information of its neighbors, which is analyzed to estimate and track random variations and delays in the CSI using deep Q-learning [5]. Figure 2 illustrates this DRL-based power control mechanism.
Dynamic Spectrum Allocation
A distributed spectrum allocation framework also can
be formulated based on multiagent DRL, in which the
agent refers to each device occupying the spectrum
resource. In such a framework, the multiagent environment is established as a partially observable Markov
game model. To deal with the instability of the environment, a neighbor-agent actor-critic (NA AC) model,
which trains the information from neighbor nodes in a
centralized manner but with decentralized implementation, can be introduced to leverage the relationship
among devices sharing the spectrum resource to
improve system performance, such as sum-data rate
and spectrum efficiency. In such a framework, historical information is used to train the RL model but is not needed when making spectrum allocation decisions [6]. This NAAC-based framework for spectrum allocation is shown in Figure 2.
AI-/ML-Based Energy and Security Management
for Super IoT
5G cellular networks introduced a new usage scenario
oriented to support massive IoT, namely, mMTC.
Toward 6G, the concept of a “super IoT” has been proposed recently, which can be elaborated with symbiotic radio and satellite-assisted IoT communications to
support an astonishing number of connected IoT devices and extended coverage, respectively. Consequently, more efficient energy management mechanisms are expected to enable large-scale IoT systems to operate stably for long periods of time.
In addition, privacy and security issues will face more
severe challenges, especially for IoT systems collecting
individual or sensitive information. This section introduces AI-/ML-enabled energy and security management in super IoT systems.
Efficient Energy Management for Large-Scale
Energy-Harvesting Networks
In traditional IoT systems, low-power IoT devices are
typically limited by the energy stored in their batteries. Such energy shortcomings and limitations will
bring great challenges in energy management and
optimization when the scale of IoT systems grows
sharply. In response, energy-harvesting technologies
have been regarded as a promising approach to prolong the lifespan of super IoT systems by enabling IoT
devices to harvest energy from potential energy resources, e.g., solar and wind energy. However, such an energy-harvesting scheme is a random process resulting from the dynamic intensity of these energy resources, which
means that the amount of energy stored in the battery of
each IoT device cannot be known precisely in advance.
In addition, the controllable energy is constrained
according to the current stored energy, which is also
capped by the battery capacity. Therefore, these problems lead to variation in and uncertainty about the total controllable energy and make it difficult to solve
the optimization problem of energy management,
since these energy constraints are always changing
[7]. Feasible approaches for energy management in
energy-harvesting-enabled IoT systems can be divided into offline management and online management,
and the latter can be realized through centralized or
distributed methods. Some typical mechanisms
designed in recent studies are summarized in Table 2.
Here, we analyze the advantages and disadvantages of
these approaches.
Offline Management
Recently, many offline-based energy management
approaches were designed by optimizing power allocation, access control, and so on. However, offline control
is essentially based on the assumption that the perfect
information of energy and channel status can be
observed before the operation, which is hardly implementable in practice.
Centralized Online AI-/ML-Based Management
In contrast to offline management, online AI-/ML-based
energy management approaches only require the present and previous energy arrivals and channel status
when implemented to improve the communication performance in energy-harvesting-enabled super IoT systems. Based on the online AI/ML framework, power
allocation and access control problems can be established as stochastic control problems whose discretized energy and channel states are then modeled as
MDPs. However, this online AI/ML framework still
requires the perfect information of energy arrivals and
channel status, which is hard to observe in practice.
The RL- and Lyapunov optimization-based methods
emerged as a result, most of which search for approximate solutions in a centralized fashion. Nevertheless, this centralized approach is not applicable when the number of IoT devices is large, owing to the inevitable and significant feedback overhead. In addition, MDPs
always suffer the “curse of dimensionality,” which
results in heavy computational loads and makes MDPs
intractable.
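As a toy illustration of such an MDP formulation (our own sketch with hypothetical battery, harvesting, and channel models, not drawn from the cited works), the following value iteration computes the optimal value function over discretized battery and channel states; enlarging either grid quickly inflates the state space, which is the curse of dimensionality noted above.

```python
import numpy as np

B = 11                                   # battery levels 0..10 (energy units)
p_harvest = np.array([0.5, 0.3, 0.2])    # hypothetical probability of harvesting 0/1/2 units
gains = np.array([0.5, 1.0, 2.0])        # hypothetical i.i.d. channel gains (uniform)
gamma = 0.95

V = np.zeros((B, len(gains)))            # value of each (battery, channel) state
for _ in range(500):                     # value iteration
    V_new = np.zeros_like(V)
    for b in range(B):
        for g_idx, g in enumerate(gains):
            best = -np.inf
            for e in range(b + 1):                     # transmit energy cannot exceed storage
                rate = np.log2(1.0 + g * e)            # immediate throughput reward
                future = 0.0
                for h, ph in enumerate(p_harvest):     # expectation over harvesting
                    b_next = min(B - 1, b - e + h)     # battery capped at its capacity
                    future += ph * V[b_next].mean()    # channel is i.i.d. uniform
                best = max(best, rate + gamma * future)
            V_new[b, g_idx] = best
    if np.max(np.abs(V_new - V)) < 1e-6:
        break
    V = V_new

print("value of a full battery in the strongest channel:", round(float(V[-1, -1]), 3))
```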
Distributed Online AI-/ML-Based Management
Without any prior information about energy arrivals
and channel status, a fully distributed online energy
management scheme will not require any information
exchange among IoT devices. Such distributed online
energy management is not easy to realize, considering
that convergence cannot always be guaranteed by
applying such an approach, which results from the
nonstationary environment. Moreover, many distributed online energy management schemes were proposed
based on the assumption that the global system state
is available for each device. To overcome these problems, a mean-field, multiagent DRL-based framework
was proposed in [8] to learn the optimal power control
to maximize the throughput of energy-harvesting-enabled super IoT systems. In [8], the throughput maximization problem was modeled as a mean-field game
having a unique stationary solution, which ensures the
convergence of the problem. In addition, each IoT
device applies DRL individually to find the optimal
power control without any prior information about
energy arrivals and channel status. This distributed
approach can achieve throughput close to centralized
policies and can be implemented in large-scale IoT systems in practice.
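The following sketch gives a heavily simplified flavor of this distributed idea (independent tabular Q-learning per device coupled only through a mean transmit-power term, with hypothetical reward and harvesting models; the actual scheme in [8] uses a mean-field game formulation with deep RL): each device learns a power control policy from its own battery state without exchanging information with other devices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, B, A = 20, 6, 3                  # devices, battery levels 0..5, power levels 0/1/2
Q = np.zeros((N, B, A))             # one independent Q-table per device
alpha, gamma, eps = 0.1, 0.9, 0.1
battery = np.full(N, B - 1)

def reward(power, others_mean_power):
    # hypothetical throughput-style reward shaped by the other devices' mean power
    return np.log2(1.0 + power / (1.0 + others_mean_power))

for t in range(20000):
    acts = np.zeros(N, dtype=int)
    for n in range(N):                               # each device acts on local information only
        a = rng.integers(A) if rng.random() < eps else Q[n, battery[n]].argmax()
        acts[n] = min(int(a), int(battery[n]))       # cannot spend more than is stored
    for n in range(N):
        others = (acts.sum() - acts[n]) / (N - 1)    # mean-field-style coupling term
        r = reward(acts[n], others)
        harvest = rng.integers(0, 2)                 # 0 or 1 energy unit harvested
        b_next = int(min(B - 1, battery[n] - acts[n] + harvest))
        Q[n, battery[n], acts[n]] += alpha * (r + gamma * Q[n, b_next].max()
                                              - Q[n, battery[n], acts[n]])
        battery[n] = b_next

print("greedy power level of device 0 at a full battery:", int(Q[0, B - 1].argmax()))
```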
Privacy and Security Guarantee for Super IoT
The extremely large numbers of IoT devices and amounts of data bring great challenges to privacy preservation and security assurance. To protect super IoT systems from various kinds of threats and attacks, authentication, access control, and attack detection are of paramount importance; however, traditional privacy and security technologies are hardly applicable to the super IoT, owing to the heterogeneity of resources, the volume of the networks, the limited energy and storage of devices, and so on. By providing embedded intelligence in IoT devices and systems, AI-/ML-based security technologies can be leveraged to deal with
these security problems. Next, we discuss some existing
AI-/ML-based solutions and feasible research directions
for addressing authentication, access control, and attack
detection in super IoT systems. Some recent typical studies are summarized in Table 3.
AI-/ML-Based Authentication and Access Control
Authentication and access control can help IoT devices
distinguish identity-based attacks and prevent unauthorized devices from accessing authorized systems [9]. To
improve authentication accuracy, different AI-/ML-based
approaches can perform well based on different scenarios and assumptions. In the following, we investigate
Table 2 The typical energy management mechanisms in energy-harvesting-enabled large-scale IoT systems.
AI/ML technique | First introduced (see "Tables 2 and 3 References") | Category | Optimization objective | Applications
Water-filling | A. Arafa 2018 [S1] | Offline | Throughput maximization | Energy consumption optimization
Integer linear programming | H. Ayatollahi 2017 [S2] | Offline | Throughput maximization | Communication scheme selection
Directional water-filling | O. Ozel 2011 [S3] | Offline | Throughput maximization, delay minimization | Power control
DNN, MDP | M. K. Sharma 2019 [S4] | Centralized online | Time-averaged throughput maximization | Power control
RL, DQN | M. Chu 2019 [S5] | Centralized online | Uplink sum-rate maximization | Multiaccess control
Lyapunov optimization | H. Yu 2019 [S6] | Centralized online | Throughput maximization | Power control
RL | F. A. Aoudia 2018 [S7] | Centralized online | QoS maximization | Energy harvesting and …
RL | A. Ortiz 2018 [S8] | Centralized online | Throughput maximization | Power control
RL, MDP | K. Wu 2019 [S9] | Centralized online | Data importance value maximization | Communication link control
Bayesian RL | Y. Xiao 2015 [S10] | Centralized online | Long-term expected reward maximization | Power control, data transmission control
DNN, mean-field game (MFG) | M. K. Sharma 2019 [S11] | Distributed online | Time-averaged throughput maximization | Power control
MDP, MFG | D. Wang 2018 [S12] | Distributed online | Communication delay minimization | Power control
Stochastic game | V. Hakami 2017 [S13] | Distributed online | Communication delay minimization | Power control
Multi-agent RL, Markov game | A. Ortiz 2017 [S14] | Distributed online | Sum-rate maximization | Power control
some feasible AI-/ML-based solutions in authentication
and access control problems in super IoT systems.
■■ RL: Q-learning-based approaches can be applied in
PHL authentication, which is realized by comparing
the PHL feature of a message with the claimed transmitter. In this procedure, the authentication accuracy
depends on the test threshold in the comparisons.
Resulting from the uncertain channel environment and
unpredicted spoofing model, each IoT device needs to
estimate the false alarm and misdetection rate of
spoofing, the future states of which are independent of
the previous states and actions. Therefore, the problem of threshold selection can be modeled as an MDP
with finite states.
■■ Supervised learning: Different from the threshold decision
in Q-learning-based approaches just described, the CSI
can be exploited through supervised learning to learn
how the channel changes, and then the PHL
authentication problems can be formulated as binary classification problems, which are threshold free. Typical supervised learning algorithms, such as decision tree, SVM, KNN, and ensemble learning, can then be introduced to such classification problems to identify legitimate or illegitimate information according to the CSI (a minimal classification sketch follows this list).
■■ Unsupervised learning: Unsupervised learning, such as
nonparametric Bayesian methods, can be introduced
in proximity-based authentication and access control
to identify the IoT devices in the proximity without
leaking the localization and other privacy-sensitive
information of IoT devices.
■■ DL: According to the CSI in Wi-Fi or other radio signals
generated by IoT devices, human physiological and
behavioral characteristics can be learned by applying
multilayer DNN [10]. Based on activity recognition and
identification, authentication and access control
schemes then can be designed.
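As a minimal example of the threshold-free, supervised-learning view of PHL authentication (using synthetic CSI fingerprints and scikit-learn; the feature model is purely hypothetical and not taken from the cited studies), a binary classifier can be trained to separate the legitimate transmitter from a spoofer:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
n, d = 2000, 16                               # number of messages and CSI feature dimension

# Hypothetical CSI fingerprints: the legitimate transmitter's channel statistics
# differ slightly from a spoofer's (shifted mean profile plus larger variance).
legit = rng.normal(loc=0.0, scale=1.0, size=(n // 2, d))
spoof = rng.normal(loc=0.6, scale=1.2, size=(n // 2, d))
X = np.vstack([legit, spoof])
y = np.hstack([np.zeros(n // 2), np.ones(n // 2)])   # 0 = legitimate, 1 = spoofing

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)       # threshold-free binary classifier
print("spoofing detection accuracy:", clf.score(X_te, y_te))
```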
Table 3 Typical AI-/ML-based security mechanisms in large-scale IoT systems.
AI/ML techniques | Typical research (see "Tables 2 and 3 References") | Security problems | Attacks | Performance metrics
Neural network | J. M. McGinthy 2019 [S15] | Authentication | Spoofing | Classification accuracy and delay
DRL | A. Ferdowsi 2019 [S16] | Authentication | Man-in-the-middle and data injection | Extraction error rate, detection delay
SVM, LSTM, DL, RNN | J. Chauhan 2018 [S17] | Access control | Spoofing | Classification accuracy, feature extraction time, inference time
DNN, SVM | C. Shi 2017 [S18] | Authentication | Spoofing | False alarm rate
Q-learning, Dyna-Q | L. Xiao 2016 [S19] | Authentication | Spoofing | False alarm rate, average error rate, detection accuracy
Nash Q-learning | Y. Li 2017 [S20] | Access control | DoS attack | Root mean error
DRL | Y. Wang 2019 [S21] | Malware detection | Malware attack | Detection accuracy
RL | H. S. Anderson 2018 [S22] | Malware evasion | Malware attack | Success rate of evasion
Q-learning, Dyna-Q | L. Xiao 2017 [S23] | Malware detection | Malware attack | Detection accuracy and delay
RF, KNN, Bayesian net | F. A. Narudin 2016 [S24] | Malware detection, access control | Malware attack, intrusion | True positive rate, false positive rate, detection precision
SVM, DQN | M. P. Arthur 2019 [S25] | Attack detection | Jamming, spoofing, intrusion | Detection accuracy
DRL | N. Abuzainab 2019 [S26] | Attack detection, secure routing and transmission | Jamming, eavesdropping | Detection accuracy, system throughput
DL | A. A. Diro 2018 [S27] | Attack detection and mitigation | DoS, probe, R2L, U2R | Detection precision and delay
Semisupervised fuzzy C-means | S. Rathore 2018 [S28] | Attack detection and mitigation | DoS, probe, R2L, U2R | Detection precision, positive predictive value, sensitivity
Decision tree | E. Viegas 2018 [S29] | Anomaly intrusion detection | Intrusion | Detection accuracy
KNN, ANN, RF, decision tree | R. Doshi 2018 [S30] | Attack detection and mitigation | DDoS | Detection accuracy
DQN | G. Han 2017 [S31] | Secure channel selection | Jamming | SINR
R2L: remote to local; U2R: user to root; SINR: signal-to-interference-plus-noise ratio.
Tables 2 and 3 References
[S1] A. Arafa and S. Ulukus, “Mobile energy harvesting nodes:
Offline and online optimal policies,” IEEE Trans. Green Commun. Netw., vol. 2, no. 1, pp. 143–153, Mar. 2018. doi: 10.1109/
TGCN.2017.2777668.
[S2] H. Ayatollahi, C. Tapparello, and W. Heinzelman, “Reinforcement learning in MIMO wireless networks with energy harvesting,” in Proc. IEEE Int. Conf. Commun. (ICC), pp. 1–6, Paris,
France, May 21-25, 2017. doi: 10.1109/ICC.2017.7997229.
[S3] O. Ozel, K. Tutuncuoglu, J. Yang, S. Ulukus, and A. Yener,
“Transmission with energy harvesting nodes in fading wireless channels: Optimal policies,” IEEE J. Sel. Areas Commun., vol. 29, no. 8, pp. 1732–1743, Sept. 2011. doi: 10.1109/
JSAC.2011.110921.
[S4] M. K. Sharma, A. Zappone, M. Debbah, and M. Assaad, “Deep
learning based online power control for large energy harvesting networks,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Brighton, May 12–17, 2019, pp. 8429–
8433. doi: 10.1109/ICASSP.2019.8683468.
[S5] M. Chu, H. Li, X. Liao, and S. Cui, “Reinforcement learning-based multiaccess control and battery prediction with energy
harvesting in IoT systems,” IEEE Internet Things J., vol. 6, no.
2, pp. 2009–2020, Apr. 2019. doi: 10.1109/JIOT.2018.2872440.
[S6] H. Yu, Z. Zhou, C. Pan, X. Zhao, and S. Mumtaz, “Online resource allocation for energy harvesting based large-scale
multiple antenna systems,” in Proc. IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, Dec. 9–13, 2019, pp. 1–6.
doi: 10.1109/GCWkshps45667.2019.9024449.
[S7] F. Ait Aoudia, M. Gautier, and O. Berder, “RLMan: An energy
manager based on reinforcement learning for energy harvesting wireless sensor networks,” IEEE Trans. Green Commun. Netw., vol. 2, no. 2, pp. 408–417, June 2018. doi: 10.1109/
TGCN.2018.2801725.
[S8] A. Ortiz, T. Weber, and A. Klein, “A two-layer reinforcement
learning solution for energy harvesting data dissemination
scenarios,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal
Process. (ICASSP), Calgary, Canada, Apr. 15–20, 2018, pp.
6648–6652. doi: 10.1109/ICASSP.2018.8462056.
[S9] K. Wu, H. Jiang, and C. Tellambura, “Sensing, probing, and
transmitting policy for energy harvesting cognitive radio
with two-stage after-state reinforcement learning,” IEEE
Trans. Veh. Tech., vol. 68, no. 2, pp. 1616–1630, Feb. 2019. doi:
10.1109/TVT.2018.2888826.
[S10] Y. Xiao, D. Niyato, Z. Han, and L. A. DaSilva, “Dynamic energy trading for energy harvesting communication networks:
A stochastic energy trading game,” IEEE J. Sel. Areas Commun., vol. 33, no. 12, pp. 2718–2734, Dec. 2015. doi: 10.1109/
JSAC.2015.2481204.
[S11] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S.
Vassilaras, “Distributed power control for large energy harvesting networks: A multi-agent deep reinforcement learning
approach,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 4,
pp. 1140–1154, Dec. 2019. doi: 10.1109/TCCN.2019.2949589.
[S12] D. Wang, W. Wang, Z. Zhang, and A. Huang, “Delay-optimal
random access for large-scale energy harvesting networks,”
in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, May
20–24, 2018, pp. 1–6. doi: 10.1109/ICC.2018.8422272.
[S13] V. Hakami and M. Dehghan, “Distributed power control for
delay optimization in energy harvesting cooperative relay
networks,” IEEE Trans. Veh. Tech., vol. 66, no. 6, pp. 4742–
4755, June 2017. doi: 10.1109/TVT.2016.2610444.
[S14] A. Ortiz, H. Al-Shatri, X. Li, T. Weber, and A. Klein, “Reinforcement learning for energy harvesting decode-and-forward
two-hop communications,” IEEE Trans. Green Commun.
Netw., vol. 1, no. 3, pp. 309–319, Sept. 2017. doi: 10.1109/
TGCN.2017.2703855.
[S15] J. M. McGinthy, L. J. Wong, and A. J. Michaels, “Groundwork
for neural network-based specific emitter identification authentication for IoT,” IEEE Internet Things J., vol. 6, no. 4, pp.
6429–6440, Aug. 2019. doi: 10.1109/JIOT.2019.2908759.
[S16] A. Ferdowsi and W. Saad, “Deep learning for signal authentication and security in massive Internet-of-Things systems,”
IEEE Trans. Commun., vol. 67, no. 2, pp. 1371–1387, Feb. 2019.
doi: 10.1109/TCOMM.2018.2878025.
[S17] J. Chauhan, S. Seneviratne, Y. Hu, A. Misra, A. Seneviratne,
and Y. Lee, “Breathing-based authentication on resource-constrained IoT devices using recurrent neural networks,”
Computer, vol. 51, no. 5, pp. 60–67, May 2018. doi: 10.1109/
MC.2018.2381119.
[S18] C. Shi, J. Liu, H. Liu, and Y. Chen, “Smart user authentication through actuation of daily activities leveraging WiFi-enabled IoT,” in Proc. ACM Int. Symp. Mobile Ad Hoc
Netw. Comput., Chennai, India, July 2017, pp. 1–10. doi:
10.1145/3084041.3084061.
[S19] L. Xiao, Y. Li, G. Han, G. Liu, and W. Zhuang, “PHY-layer
spoofing detection with reinforcement learning in wireless
networks,” IEEE Trans. Veh. Tech., vol. 65, no. 12, pp. 10,037–
10,047, Dec. 2016. doi: 10.1109/TVT.2016.2524258.
[S20] Y. Li, D. E. Quevedo, S. Dey, and L. Shi, “SINR-based DoS attack on remote state estimation: A game-theoretic approach,”
IEEE Trans. Contr. Netw. Syst., vol. 4, no. 3, pp. 632–642, Sept.
2017. doi: 10.1109/TCNS.2016.2549640.
[S21] Y. Wang, J. W. Stokes, and M. Marinescu, “Neural malware
control with deep reinforcement learning,” in Proc. IEEE Military Commun. Conf. (MILCOM), Norfolk, VA, Nov. 12–14, 2019,
pp. 1–8. doi: 10.1109/MILCOM47813.2019.9020862.
[S22] H. S. Anderson, A. Kharkar, B. Filar, D. Evans, and P. Roth,
“Learning to evade static PE machine learning malware
models via reinforcement learning,” Jan. 30, 2018, arXiv:1801.08917.
[S23] L. Xiao, Y. Li, X. Huang, and X. Du, “Cloud-based malware detection game for mobile devices with offloading,” IEEE Trans.
Mobile Comput., vol. 16, no. 10, pp. 2742–2750, Oct. 2017. doi:
10.1109/TMC.2017.2687918.
[S24] F. A. Narudin, A. Feizollah, N. B. Anuar, and A. Gani, “Evaluation of machine learning classifiers for mobile malware detection,” Soft Comput., vol. 20, no. 1, pp. 343–357, Jan. 2016.
doi: 10.1007/s00500-014-1511-6.
[S25] M. P. Arthur, “Detecting signal spoofing and jamming attacks
in UAV networks using a lightweight IDS,” in Proc. Int. Conf.
Comput., Inform. Telecommun. Syst. (CITS), Beijing, China,
Aug. 28–31, 2019, pp. 1–5. doi: 10.1109/CITS.2019.8862148.
[S26] N. Abuzainab et al., “QoS and jamming-aware wireless networking using deep reinforcement learning,” in Proc.
IEEE Military Commun. Conf. (MILCOM), Norfolk, VA, Nov.
12–14, 2019, pp. 610–615. doi: 10.1109/MILCOM47813.2019.
9020985.
[S27] A. A. Diro and N. Chilamkurti, “Distributed attack detection
scheme using deep learning approach for Internet of Things,”
Future Gener. Comput. Syst., vol. 82, pp. 761–768, May 2018.
doi: 10.1016/j.future.2017.08.043.
[S28] S. Rathore and J. H. Park, “Semi-supervised learning based
distributed attack detection framework for IoT,” Appl.
Soft Comput., vol. 72, pp. 79–89, May 2018. doi: 10.1016/j.
asoc.2018.05.049.
[S29] E. Viegas, A. Santin, V. Abreu, and L. S. Oliveira, “Enabling
anomaly-based intrusion detection through model generalization,” in Proc. IEEE Symp. Comput. Commun. (ISCC),
Natal, Brazil, June 25–28, 2018, pp. 934–939. doi: 10.1109/
ISCC.2018.8538524.
[S30] R. Doshi, N. Apthorpe, and N. Feamster, “Machine learning
DDoS detection for consumer Internet of Things devices,” in
Proc. IEEE Secur. Privacy Workshop (SPW), San Francisco, CA,
May 24, 2018, pp. 29–35. doi: 10.1109/SPW.2018.00013.
[S31] G. Han, L. Xiao, and H. V. Poor, “Two-dimensional anti-jamming communication based on deep reinforcement learning,” in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process.
(ICASSP), New Orleans, LA, Mar. 5–9, 2017, pp. 2087–2091.
doi: 10.1109/ICASSP.2017.7952524.
AI-/ML-Based Attack Analysis and Detection
Similar to applications in authentication and access control, AI/ML technologies can also be applied to analyze and detect different kinds of attacks, such as spoofing, jamming, denial-of-service (DoS) or distributed DoS (DDoS) attacks, eavesdropping, malware attacks, and so on [9]. For instance, supervised learning, including SVM, KNN, random forest (RF), and DNN, can be introduced to detect these attacks by building classification and regression models. In addition, unsupervised learning can
investigate unlabeled data to divide them into different
groups; e.g., multivariate correlation analysis can help to
detect DoS and DDoS attacks. In some recent studies, RL
algorithms have been applied to help IoT devices make
decisions on the selection of security protocols against
attacks. The feasible algorithms include Q-learning, DQN,
Dyna-Q, and so on.
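A minimal sketch of unsupervised, anomaly-based attack detection on unlabeled traffic data (synthetic per-flow features and an isolation forest; the feature set and contamination level are our own assumptions, not those of the cited studies) is shown below:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)

# Hypothetical per-flow traffic features: [packets/s, mean packet size, distinct ports].
normal_flows = np.column_stack([rng.normal(50, 10, 950),
                                rng.normal(500, 80, 950),
                                rng.normal(3, 1, 950)])
ddos_flows = np.column_stack([rng.normal(5000, 500, 50),   # flooding traffic
                              rng.normal(60, 10, 50),
                              rng.normal(1, 0.5, 50)])
X = np.vstack([normal_flows, ddos_flows])

detector = IsolationForest(contamination=0.05, random_state=0).fit(X)
labels = detector.predict(X)                  # +1 = normal, -1 = anomalous
print("flagged flows:", int((labels == -1).sum()), "of", len(X))
```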
AI-/ML-Enabled Ultrareliable/Low-Latency Applications
Satellite, UAV, and IoV communications will be integrated
in 6G networks, for which the high dynamics of channel,
environment, and traffic, as well as increasingly delay-sensitive applications, require more reliable and low-latency transmission technologies to guarantee
communication connectivity and timeliness. In addition,
accompanying frequent resource allocation, network
reconfiguration, and service customization also depend
heavily on reliable, low-latency, and flexible network management. To satisfy such needs, mobility management
and offloading techniques are expected to support ultrareliable and low-latency communications and also are
confronted with challenges brought by high dynamics,
multidimensionality, and significant heterogeneity. In this
section, we discuss some AI-/ML-based solutions targeting
the improvement of the reliability and timeliness of communications in 6G.
Intelligent Mobility and Handover Management in 6G
High-speed mobility of elements in 6G, including satellites, UAVs, vehicles, and so on, will result in frequent
handovers, making the connections and communications unstable and unreliable. Moreover, the service
requirements of low latency and high transmission rate
will also make it more challenging to achieve efficient
mobility and handover management. Therefore, to support ultrareliable and low-latency applications in 6G,
DRL, DL, and RL will be capable and powerful tools to
endow the mobility management with intelligence and
adaptivity [11].
■■ DRL: In a UAV-enabled 6G network, UAVs can perform
as DRL agents. They can observe the environment
states, such as the movement velocity, current position, and link quality, and then make the best decisions
in terms of mobility and handover actions to maximize
their rewards, which can be defined considering the
link stability, channel quality, transmission latency and
capacity, and so on. By interacting with the dynamic
environment, UAVs will learn their strategies of mobility and handover management automatically and
robustly to minimize transmission latency and handover failure probability and then will achieve highly reliable wireless connections in the system.
■■ DL: It is necessary to achieve the precise estimation of
state for mobility and handover management of UAVs.
However, the inaccuracies associated with onboard
measurements, such as unpredictable drifts, biases,
and immense noise resulting from significant vibration of UAVs’ rotors, make it difficult to obtain accurate state estimates. A DL-based framework that can
apply artificial neural networks (ANNs), RNNs, and so on may help to improve
the accuracy of state estimation. To be specific, a
DNN can be trained to identify the associated measurement noise models and then filter them out of the
final estimation. To further reduce computation complexity, the dropout technique also can be adopted
when training this DNN. In addition, DL also can be
applied to predict trajectories of UAVs. By learning
the movement behaviors of UAVs according to the
measurement information, the positional relationships among UAVs can be analyzed. Based on such
information, mobility and handover mechanisms with
high success rates can be designed. Furthermore,
LSTM also can perform as a powerful tool to design
efficient mobility and handover management schemes
[12]. By training the previous and current mobility
contexts of UAVs, the sequence of future time-dependent mobility states and trajectories of UAVs can be
obtained, which can be considered to optimize handover parameters.
■■ RL: Cooperative Q-learning-based optimization of mobility-sensitive handover parameters can learn the parameter settings appropriate for specific velocity conditions in UAV-enabled 6G networks,
which can be adapted to the realistic environment,
where UAVs have time-varying velocities. To avoid frequent handovers and reduce handover and connectivity failures, dynamic fuzzy Q-learning can be utilized to
optimize the handover parameters and then guarantee
the reliability and efficiency of UAV-enabled connections and communications.
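To illustrate the Q-learning-based handover idea in the simplest possible form (a single-agent tabular sketch with a hypothetical signal-quality/velocity environment, rather than the cooperative or fuzzy Q-learning schemes discussed above), the following code learns when a fast-moving UAV should hand over rather than hold a degrading link:

```python
import numpy as np

rng = np.random.default_rng(3)
S_RSSI, S_SPEED, A = 5, 3, 2          # quantized signal levels, speed levels; 0 = stay, 1 = hand over
Q = np.zeros((S_RSSI, S_SPEED, A))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(rssi, speed, action):
    """Hypothetical environment: handing over restores the link but costs latency;
    staying lets quality drift downward faster at higher speeds."""
    if action == 1:                                   # hand over
        return 0.5, S_RSSI - 1                        # restored link minus handover cost
    drift = rng.integers(0, speed + 2)                # faster UAVs degrade sooner
    rssi_next = max(0, rssi - drift)
    reward = -2.0 if rssi_next == 0 else rssi_next / (S_RSSI - 1)   # dropped-link penalty
    return reward, rssi_next

for episode in range(3000):
    rssi, speed = S_RSSI - 1, int(rng.integers(S_SPEED))
    for t in range(50):
        a = int(rng.integers(A)) if rng.random() < eps else int(Q[rssi, speed].argmax())
        r, rssi_next = step(rssi, speed, a)
        Q[rssi, speed, a] += alpha * (r + gamma * Q[rssi_next, speed].max() - Q[rssi, speed, a])
        rssi = rssi_next

print("learned policy (rows: signal level, columns: speed level, 1 = hand over):")
print(Q.argmax(axis=2))
```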
Intelligent Communication and Computing
Resource Allocation
Driven by the exponentially growing demands of multimedia data traffic and computation-heavy applications,
6G networks are expected to achieve a high QoS with
ultrareliability and low latency. In response, resource
allocation has been considered an important factor
that can improve 6G performance directly by configuring heterogeneous resources effectively and efficiently.
In 6G, the allocated resource can be divided into communication resources, which include channels and bandwidth, and computing resources, such as memory and
processing power. In recent years, various traffic offloading, caching, and cloud/fog/edge computing mechanisms
designed to allocate these communication, storage,
and computing resources in heterogeneous networks,
respectively, have become promising solutions to handle
the increasing data and computational requirements
with low-latency and on-demand services. In addition,
different AI/ML techniques, such as RL, DRL, double DRL, and so on, have been introduced to these resource allocation techniques to deal with the sophisticated optimization of decision making resulting from the multidimensionality, random uncertainty, and dynamics of 6G. By applying AI/ML tools, valuable information can be extracted by training on observed data, and then different functions for prediction, optimization, and decision making in traffic offloading, caching, and cloud/fog/edge computing can be learned to support ultrareliable and low-latency services [13].
However, most current RL- or DRL-based resource allocation approaches were modeled in a discrete action space, which restricts the optimization of offloading decisions to a limited action space [14]. Such a model assumption is unreasonable in practice, where the action space of offloading decisions is often a continuous–discrete hybrid. To be specific, in a task-offloading-enabled 6G network, the strategies for determining which node should be selected to implement traffic/computation offloading or caching constitute a discrete action space. On the other hand, the possible resource volume, which will be provided by the selected node for offloading, is usually a continuous value. Such resource allocation problems with continuous–discrete hybrid decision spaces tend to be extremely complex, especially when time-varying tasks, energy harvesting, and security issues are also considered.
To provide low-latency computing services, we have carried out some preliminary work focusing on the hybrid decision of computation offloading in 6G networks based on DRL.
[Figure 3 graphic omitted in this text version: panel (a) shows each device observing its state (task load, battery level, harvested energy, channel gain, and computing capacity) and selecting a hybrid action (edge server selection, offloading ratio α, and local CPU frequency f) through actor and critic networks that are trained centrally from uploaded experience; panels (b) and (c) plot average rewards versus the requested task load and average consumed time versus the maximum harvested energy for the Hybrid-AC, DQLO, server execution, device execution, and exhaustive-search schemes.]
Figure 3 The framework and simulation results of a hybrid decision-controlled DRL-based dynamic computation offloading scheme in
large-scale IoT systems. (a) An illustration of a hybrid decision-controlled DRL-based dynamic computation offloading scheme. (b) Performance versus different task arrival rate. (c) Performance versus different maximum harvested energy. Hybrid-AC: hybrid action–critic;
DQLO: deep Q-learning-based offloading.
As demonstrated in Figure 3, energy-harvesting-enabled devices can offload their
computational tasks to edge computing servers. The
server selection problem is modeled in a discrete action
space; meanwhile, the decision spaces of offloading ratio and local computation capacity are continuous. In
the DRL framework, at each step after observing the
states of systems (such as task load, battery level,
harvested energy of each device, channel status, and
computation capacity of each device and server), the
possible computation offloading decisions, including
server selection, offloading ratio, and local computation capacity allocation, are contained in the sets
of possible actions. Each device then selects the
best actions from these sets to maximize its reward,
which is determined by latency, energy cost, reliability, and so on. The detailed modeling and implementation of this proposed mechanism are provided in
[15]. To validate the efficiency and superiority of our
proposed hybrid action–critic-based computation
offloading approach, we test the average rewards received and execution time compared with those of
deep Q-learning-based offloading, server execution,
and device execution. The latter two mechanisms
indicate executing all computational tasks at the selected server remotely and at the device locally, respectively. Simulation results in Figure 3(b) and (c)
indicate that, with different task arrival rates and allowed maximum harvested energy, the proposed approach can achieve the highest reward and smallest
time latency among the four schemes.
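The snippet below sketches only the hybrid action-selection step of such a scheme (the state encoding, network sizes, and selection rule are illustrative assumptions and do not reproduce the exact architecture or training procedure of [15]): an actor proposes continuous offloading parameters for every candidate server, and a critic scores each discrete server choice so that the best-scored server is selected.

```python
import torch
import torch.nn as nn

N_SERVERS, STATE_DIM = 3, 6     # candidate edge servers; state: task load, battery, channel gains, ...

class Actor(nn.Module):
    """Proposes a continuous (local ratio alpha, CPU-frequency fraction f) pair per server."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * N_SERVERS), nn.Sigmoid())
    def forward(self, s):
        return self.net(s).view(N_SERVERS, 2)        # rows: (alpha_i, f_i) in [0, 1]

class Critic(nn.Module):
    """Scores each discrete server choice given the state and its continuous parameters."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + 2, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, s, u):
        return self.net(torch.cat([s, u], dim=-1)).squeeze(-1)

actor, critic = Actor(), Critic()
state = torch.rand(STATE_DIM)                         # placeholder observation

with torch.no_grad():
    u = actor(state)                                  # continuous part, one row per server
    q = torch.stack([critic(state, u[i]) for i in range(N_SERVERS)])
    server = int(q.argmax())                          # discrete part: best-scored server
    alpha, freq = u[server].tolist()

print(f"offload {1 - alpha:.2f} of the task to server {server}, "
      f"process {alpha:.2f} locally at CPU-frequency fraction {freq:.2f}")
```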
Conclusions
To satisfy emerging services and applications, AI-/ML-enabled 6G networks have been considered fundamental enablers to carry forward the capacities of eMBB, mMTC, and uRLLC in 5G to a more powerful and intelligent level. In this article, we focused on solutions that apply AI and ML tools to 6G networking and resource management optimization. We illustrated intelligent terahertz techniques, such as AI-/ML-enabled terahertz channel estimation and spectrum management, which are considered revolutionary for achieving ultrabroadband transmission. In addition, we introduced AI/ML applications in energy management, especially for large-scale energy-harvesting networks. Moreover, AI-/ML-based security enhancement mechanisms, including authentication, access control, and attack detection, were discussed for super IoT systems. Such intelligentization of energy and security will help to achieve efficient and reliable ultramassive access. Furthermore, we introduced some efficient mobility and handover management approaches based on DRL, DL, and Q-learning to realize ultrareliable and stable transmission links and satisfy the high dynamics of 6G. Finally, intelligent resource allocation
technologies, including traffic, storage, and computing offloading mechanisms, were identified to meet the requirements of ultrareliability and low latency in 6G services. As investigated in this article, AI-/ML-enabled techniques may allow future 6G networks to learn from uncertain and dynamic environments, adapt to unpredictable changes in an intelligent and automated fashion, and then achieve significantly improved performance in terms of ultrabroadband, ultramassive access, ultrareliability, and low latency.
There are still many challenges to realizing comprehensive and mature applications of AI/ML techniques in 6G. Especially for current computing devices with limited power, memory, storage, and processing capacities, how to modify AI-/ML-based algorithms and mechanisms, which bring high complexity and huge amounts of computation, so that they come closer to practical implementation is worthy of further investigation. In addition, varied and emerging application scenarios and new AI/ML techniques may also bring challenges to the implementation of intelligent technologies in 6G.

Acknowledgments
This research was supported by the National Natural Science Foundation of China under project 61971257, the China Postdoctoral Science Foundation under special grant 2019T120091 and grant 2018M640130, and the project “The Verification Platform of Multi-tier Coverage Communication Network for Oceans (LZC0020).” The corresponding authors of this article are Chunxiao Jiang and Yong Ren.
Author Information
Jun Du (blgdujun@gmail.com) currently
holds a postdoctoral position with the
Department of Electrical Engineering,
Tsinghua University, China. Her research
interests are mainly in resource allocation and system security of heterogeneous networks and space-based information networks.
She is the recipient of the Best Paper Award from the
IEEE International Conference on Communications 2019
and the Best Paper Award from the International Conference on Wireless Communications and Mobile Computing in 2020. She is a Member of IEEE.
Chunxiao Jiang (jchx@tsinghua.edu.cn) is an associate professor at the
School of Information Science and Technology, Tsinghua University, China. His
research interests include the application
of game theory, optimization, and statistical theories to communication, networking, and
resource allocation problems, in particular, space networks and heterogeneous networks. He is a Senior Member of IEEE.
||| 13
Authorized licensed use limited to: Princeton University. Downloaded on November 16,2020 at 13:24:58 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Jian Wang (jian-wang@tsinghua.edu.cn) joined the faculty of Tsinghua University, China, in 2006, where he is currently
an associate professor with the Department of Electronic Engineering. His
research interests include the application
of statistical theories, optimization, and machine learning
to communication, networking, navigation, and resource
allocation problems, in particular, heterogeneous networks and intelligent collaborative systems. He is a
Senior Member of IEEE.
Yong Ren (reny@tsinghua.edu.cn) is a
professor with the Department of Electronics Engineering and director of the
Complexity Engineered Systems Lab at
Tsinghua University, China. His current
research interests include complex systems theory and its applications to the optimization and
information sharing of the Internet, the IoT and ubiquitous networks, cognitive networks, and cyber-physical
systems. He is a Senior Member of IEEE.
Mérouane Debbah (merouane.debbah@huawei.com) is vice president of
the Huawei France Research Center. He is
jointly director of the Mathematical and
Algorithmic Sciences Lab as well as the
Lagrange Mathematical and Computing
Research Center. He has managed eight European Union
projects and more than 24 national and international
projects. His research interests lie in fundamental mathematics, algorithms, statistics, information, and communication sciences research. He is a Fellow of IEEE.
References
[1] Z. Zhang et al., “6G wireless networks: Vision, requirements, architecture, and key technologies,” IEEE Veh. Technol. Mag., vol. 14, no.
3, pp. 28–41, Sept. 2019. doi: 10.1109/MVT.2019.2921208.
[2] I. F. Akyildiz, J. M. Jornet, and C. Han, “Teranets: Ultra-broadband communication networks in the terahertz band,” IEEE Wireless Commun.,
vol. 21, no. 4, pp. 130–135, Aug. 2014. doi: 10.1109/MWC.2014.6882305.
[3] R. Singh and D. Sicker, “Beyond 5G: THz spectrum futures and implications for wireless communication,” in Proc. 30th European Conf.
Int. Telecommunication Society (ITS), Helsinki, Finland, June 16–19,
2019. [Online]. Available: https://www.econstor.eu/bitstream/
10419/205213/1/Singh-Sicker.pdf
[4] O. Naparstek and K. Cohen, “Deep multi-user reinforcement learning for
distributed dynamic spectrum access,” IEEE Trans. Wireless Commun.,
vol. 18, no. 1, pp. 310–323, Jan. 2019. doi: 10.1109/TWC.2018.2879433.
[5] Y. S. Nasir and D. Guo, “Multi-agent deep reinforcement learning
for dynamic power allocation in wireless networks,” IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2239–2250, Oct. 2019. doi: 10.1109/
JSAC.2019.2933973.
[6] Z. Li and C. Guo, “Multi-agent deep reinforcement learning based
spectrum allocation for D2D underlay communications,” IEEE
Trans. Veh. Technol., vol. 69, no. 2, pp. 1828–1840, Dec. 2019. doi:
10.1109/TVT.2019.2961405.
[7] Y. Al-Eryani and E. Hossain, “The D-OMA method for massive multiple access in 6G: Performance, security, and challenges,” IEEE
Veh. Technol. Mag., vol. 14, no. 3, pp. 92–99, Sept. 2019. doi: 10.1109/
MVT.2019.2919279.
[8] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S. Vassilaras,
“Distributed power control for large energy harvesting networks:
A multi-agent deep reinforcement learning approach,” IEEE Trans.
Cogn. Commun. Netw., vol. 5, no. 4, pp. 1140–1154, Dec. 2019. doi:
10.1109/TCCN.2019.2949589.
[9] L. Xiao, X. Wan, X. Lu, Y. Zhang, and D. Wu, “IoT security techniques
based on machine learning: How do IoT devices use AI to enhance
security?” IEEE Signal Process. Mag., vol. 35, no. 5, pp. 41–49, Sept.
2018. doi: 10.1109/MSP.2018.2825478.
[10] A. Ferdowsi and W. Saad, “Deep learning for signal authentication and security in massive internet-of-things systems,” IEEE
Trans. Commun., vol. 67, no. 2, pp. 1371–1387, 2018. doi: 10.1109/
TCOMM.2018.2878025.
[11] A. Stamou, N. Dimitriou, K. Kontovasilis, and S. Papavassiliou, “Autonomic handover management for heterogeneous networks in
a future internet context: A survey,” IEEE Commun. Surveys Tuts.,
vol. 21, no. 4, pp. 3274–3297, Fourthquarter 2019. doi: 10.1109/
COMST.2019.2916188.
[12] H. Ye, L. Liang, G. Y. Li, J. Kim, L. Lu, and M. Wu, “Machine learning for vehicular networks: Recent advances and application examples,” IEEE Veh. Technol. Mag., vol. 13, no. 2, pp. 94–101, June 2018.
doi: 10.1109/MVT.2018.2811185.
[13] M. Min, L. Xiao, Y. Chen, P. Cheng, D. Wu, and W. Zhuang, “Learningbased computation offloading for IoT devices with energy harvesting,” IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1930–1941, Feb.
2019. doi: 10.1109/TVT.2018.2890685.
[14] Y. He, F. R. Yu, N. Zhao, V. C. Leung, and H. Yin, “Software-defined
networks with mobile edge computing and caching for smart cities: A
big data deep reinforcement learning approach,” IEEE Commun. Mag.,
vol. 55, no. 12, pp. 31–37, Dec. 2017. doi: 10.1109/MCOM.2017.1700246.
[15] J. Zhang, J. Du, Y. Shen, and J. Wang, “Dynamic computation offloading with energy harvesting devices: A hybrid decision based deep
reinforcement learning approach,” IEEE Internet Things J., early access, June 2020. doi: 10.1109/JIOT.2020.3000527.