MACHINE LEARNING FOR 6G WIRELESS NETWORKS
Carrying Forward Enhanced Bandwidth, Massive Access, and Ultrareliable/Low-Latency Service

Jun Du, Chunxiao Jiang, Jian Wang, Yong Ren, and Mérouane Debbah

To satisfy the expected plethora of demanding services, the future generation of wireless networks (6G) has been mandated as a revolutionary paradigm to carry forward the capacities of enhanced broadband, massive access, and ultrareliable and low-latency service in 5G wireless networks to a more powerful and intelligent level. Recently, the structure of 6G networks has tended to be extremely heterogeneous, densely deployed, and dynamic. Combined with tight quality of service (QoS) requirements, such a complex architecture will make legacy network operation routines untenable. In response, artificial intelligence (AI), especially machine learning (ML), is emerging as a fundamental solution to realize fully intelligent network orchestration and management. By learning from uncertain and dynamic environments, AI-/ML-enabled channel estimation and spectrum management will open up opportunities for bringing the excellent performance of ultrabroadband techniques, such as terahertz communications, into full play. Additionally, challenges brought by ultramassive access with respect to energy and security can be mitigated by applying AI-/ML-based approaches. Moreover, intelligent mobility management and resource allocation will guarantee the ultrareliability and low latency of services. Concerning these issues, this article introduces and surveys some state-of-the-art techniques based on AI/ML and their applications in 6G to support ultrabroadband, ultramassive access, and ultrareliable and low-latency services.

Motivation and Challenges
Recently, the 5G wireless network was developed to support enhanced mobile broadband (eMBB), massive machine-type communications (mMTC), and ultrareliable and low-latency communications (uRLLC) [1], according to the report of the International Telecommunication Union. Benefitting from such high performance, 5G has opened new doors of opportunity toward emerging applications, e.g., augmented reality (AR), virtual reality (VR), tactile reality, mixed reality, and so on. However, new media, such as holographic communications, will require much higher transmission speeds, up to terabits per second, than AR and VR. Thus, 5G is far from able to support the faster, more reliable, and larger-scale communication requirements of these services. In response, the investigation of future generations of wireless networks (6G) has been triggered, which promises more powerful capacities in terms of ultrabroadband, super-massive access, ultrareliability, and low latency than 5G does, as listed in Table 1 [1].
To provide ubiquitous and varied services, 6G networks tend to be more comprehensive and multidimensional by integrating current terrestrial networks with space-/air-based information networks and marine information networks; then, heterogeneous network resources, as well as different types of users and data, will also be integrated, as depicted in Figure 1. Under such an architecture, 6G networks are conceived to be cell free, which means that users will move from one network to another seamlessly and automatically to pursue the most suitable and qualified communications without manual management and configuration. In contrast, current 5G networking technologies still mainly focus on a macro- and small-cell-based heterogeneous architecture, which will be broken by the cell-free operation of 6G, and their performance will deteriorate when applied to 6G with its brand-new architecture. In addition, how to manage and control 6G networks to realize the promising capacities of ultrabroadband, ultramassive access, ultrareliability, and low latency also poses great challenges brought by increasingly ultradense, heterogeneous, and dynamic characteristics. Specifically, different kinds of satellite Internet, consisting of large numbers of satellites, have been proposed and implemented in recent years. For instance, the SpaceX project Starlink initially planned to build a constellation of 12,000 satellites in low-Earth orbit, which has recently been expanded to 42,000. In addition, mobile network operators are accelerating the dense deployment of small-cell base stations to reduce service latency by avoiding backhaul transmission. Moreover, future large-scale Internet of Things (IoT) systems in 6G will also bring challenges of spectrum management and massive or super access control. Furthermore, the integration of highly dynamic satellites, unmanned aerial vehicles (UAVs), and the Internet of Vehicles (IoV) will result in more frequent handovers, more uncertain user requirements, and more unpredictable wireless communication environments than any previous generation of networks, which makes it difficult to guarantee the ultrareliability and low latency of services.

Therefore, 6G networks are developing into more multidimensional, heterogeneous, large-scale, and highly dynamic systems. All of these characteristics make it urgent to explore new techniques that are adaptive, flexible, and intelligent to bring a revolutionary leap in communications with ultrabroadband, ultramassive access support, ultrareliability, and low latency. In addition, the enormous amounts of widely heterogeneous data generated by 6G networks will require advanced mathematical tools to extract meaningful information and then make decisions, including resource management and access control, pertaining to the proper functioning of 6G, which can hardly be achieved by traditional network optimization techniques. In recent years, AI has been emerging as a fundamental paradigm to orchestrate communication and information systems from bottom to top. For the foreseeable future, AI-enabled networks will open up new opportunities for smart and intelligent 6G networking. As a major branch of AI, ML can establish an intelligent system that operates in complicated environments. ML has developed into several main branches, such as classical ML, including supervised and unsupervised learning, deep learning (DL), and reinforcement learning (RL).
DL aims to understand the representations of data and can be modeled in supervised learning, unsupervised learning, and RL. Therefore, in some surveys of ML, DL is not listed separately. As illustrated in Figure 1, AI and ML techniques are expected to help 6G networks make more optimized and adaptive data-driven decisions, alleviate communication challenges, and meet the requirements of emerging services. In this article, we focus on applying AI and ML to networking and resource management optimization, aiming to bring about significant innovation in communications with respect to ultrabroadband, ultramassive access, ultrareliability, and low latency.

Figure 1 An illustration of AI/ML applications in 6G to support ultrabroadband, ultramassive access, and ultrareliability/low latency.

Table 1 A comparison of key performance indexes among 4G, 5G, and 6G.

Performance index                | 4G          | 5G         | 6G
Peak data rate                   | 1 Gb/s      | 20 Gb/s    | ≥ 1 Tb/s
User-experienced data rate       | 10 Mb/s     | 100 Mb/s   | 1 Gb/s
Spectrum efficiency              | 1×          | 3×         | 15–30×
Mobility                         | 350 km/h    | 500 km/h   | ≥ 1,000 km/h
Latency                          | 10 ms       | 1 ms       | ≤ 100 μs
Connection density (devices/km²) | 10⁵         | 10⁶        | 10⁷
Network energy efficiency        | 1×          | 100×       | 100–10,000×
Area traffic capacity            | 0.1 Mb/s/m² | 10 Mb/s/m² | ≥ 1 Gb/s/m²

Intelligent Ultrabroadband Transmission in 6G
In the bandwidth-hungry age, 5G networks have exploited the sub-GHz and 1–6-GHz spectrum bands as efficiently as possible and have further introduced the 24–100-GHz bands. However, the current spectrum bands are still hardly enough to meet the increasing demands. For instance, some emerging applications, such as holography, may require a data rate of up to terabits per second [1], which is almost three orders of magnitude higher than typical 5G communications. In response, terahertz communications, utilizing bands in the range of 0.1–10 THz as well as 140-, 220-, and 340-GHz frequencies, are expected to support a data rate of up to terabits per second [2]. To achieve such capacity-approaching performance, accurate information of time-varying channels is especially important to optimize terahertz bandwidth allocation and improve spectrum efficiency. In this section, we introduce some state-of-the-art AI/ML applications in terahertz channel estimation and spectrum management.
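To put the bandwidth argument in perspective, the short Shannon-capacity sketch below (added here purely for illustration; the carrier bandwidths and the 20-dB SNR are assumed values, not figures from the article) shows why terabit-per-second targets are hard to reach without the tens of gigahertz of contiguous spectrum that only the terahertz bands offer.

```python
import numpy as np

def shannon_capacity_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Single-link Shannon capacity C = B * log2(1 + SNR)."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * np.log2(1 + snr_linear)

# Illustrative numbers (assumptions, not measurements from the article):
# a 400-MHz 5G mmWave carrier versus a 10-GHz slice of terahertz spectrum.
for label, bw in [("400 MHz (mmWave carrier)", 400e6), ("10 GHz (terahertz slice)", 10e9)]:
    c = shannon_capacity_bps(bw, snr_db=20)
    print(f"{label:26s} @ 20 dB SNR -> {c / 1e9:7.1f} Gb/s")

# Even 10 GHz at 20 dB SNR gives roughly 67 Gb/s per link, so approaching
# 1 Tb/s requires tens of GHz of spectrum and/or massive spatial multiplexing,
# which is only plausible in the terahertz bands.
```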
AI-/ML-Enabled Terahertz Channel Modeling and Estimation
At the terahertz frequency bands, the channels suffer from high atmospheric absorption resulting from the water vapor in the air, which influences losses significantly. In addition, free-space pathloss is physically unavoidable on top of this atmospheric attenuation. Furthermore, terahertz channels are observed to be nonstationary, especially in dynamic scenarios where both users and objects might be moving. Therefore, traditional channel models based on stationary or quasi-stationary assumptions can no longer apply to terahertz channels.

ML algorithms are capable of analyzing communication data and predicting likely signal loss in a given or unknown environment. Therefore, many different types of AI or ML algorithms can be applied to the physical layer (PHL) of 6G networks to deal with the difficulties just described for terahertz channel modeling and estimation. For instance, to improve estimation accuracy in dynamic scenarios, the RL-based Bayesian filter has been introduced to angle-of-arrival (AoA) estimation in terahertz channels in current studies. Specifically, the Bayesian filter estimates the current AoA from both the current measurement and previous estimates. In this procedure, the prior transition probabilities between system states are important to the estimation performance of the Bayesian filter. RL then can be applied to optimize the state transition probabilities from the feedback of previous estimates and, hence, improve the performance of the Bayesian filter.
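The following minimal sketch illustrates the Bayesian-filter recursion described above on a discretized AoA grid, with a simple feedback-driven adjustment of the transition probabilities standing in for the RL-based optimization; the grid resolution, noise levels, and learning rate are illustrative assumptions rather than values from the cited studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discretize the AoA into a grid of candidate angles (assumption: 1-degree bins).
angles = np.arange(-60, 61, 1.0)
n = len(angles)

# Transition model P[i, j] = Pr(next AoA = angles[j] | current AoA = angles[i]),
# initialized to favor small angular changes between slots.
P = np.exp(-0.5 * ((angles[:, None] - angles[None, :]) / 3.0) ** 2)
P /= P.sum(axis=1, keepdims=True)

belief = np.full(n, 1.0 / n)                # uniform prior over AoAs
meas_std = 4.0                              # measurement noise in degrees (assumed)
lr = 0.05                                   # learning rate for transition updates (assumed)

true_aoa, prev_idx, est_idx = 0.0, None, 0
for t in range(200):
    true_aoa += 0.25                        # slow drift of the true AoA
    z = true_aoa + rng.normal(0, meas_std)  # noisy AoA measurement

    belief = belief @ P                     # predict: propagate belief through transitions
    belief *= np.exp(-0.5 * ((z - angles) / meas_std) ** 2)   # update: Gaussian likelihood
    belief /= belief.sum()

    est_idx = int(np.argmax(belief))
    # Feedback-driven refinement (simplified stand-in for RL): reinforce the
    # transition that links the previous estimate to the current one.
    if prev_idx is not None:
        P[prev_idx, est_idx] += lr
        P[prev_idx] /= P[prev_idx].sum()
    prev_idx = est_idx

print(f"true AoA ≈ {true_aoa:.1f}°, estimated AoA ≈ {angles[est_idx]:.1f}°")
```

In a practical terahertz link, the measurement likelihood would come from beam-training feedback, and the ad hoc transition update above would be replaced by the RL-based optimization of the transition probabilities discussed in the text.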
Some other feasible algorithms and applications in channel modeling and estimation are summarized as follows.
■■ Supervised learning: Supervised learning can be introduced to pathloss/shadowing prediction, localization, interference management, channel estimation, and so on. The feasible algorithms and models include radial basis function neural networks, feed-forward neural networks, K-nearest neighbor (KNN), the multilayer perceptron, the relevance vector machine, and the support vector machine (SVM).
■■ Unsupervised learning: Channel modeling and estimation problems, such as optimal modulation, interference mitigation, duplexing configuration, node clustering, and multipath tracking, can be solved by applying unsupervised learning algorithms, which include K-means clustering, fuzzy C-means, and other clustering algorithms.
■■ DL: DL can be implemented for channel feature extraction, channel state information (CSI) estimation, signal detection, and sparse signal recovery. Typical DL algorithms, such as convolutional neural networks, recurrent neural networks (RNNs), deep neural networks (DNNs), deep belief networks, and deep Boltzmann machines, can be expected to be good candidates.
■■ RL: RL can be introduced to channel tracking, channel selection, modulation mode selection, radio identification, and so on. Feasible algorithms and models include fuzzy RL, Q-learning, WoLF-PHC (Win-or-Learn-Fast Policy Hill-Climbing), Markov decision processes (MDPs), and partially observable MDPs.

Deep RL-Based Terahertz Spectrum Management
At present, there exists no restriction on terahertz spectrum use. The spectrum has already been occupied by some other applications, such as satellite services, spectroscopy, and meteorology [3]. Recently, the Federal Communications Commission has been investigating utilizing terahertz spectrum for mobile services and applications. Therefore, spectrum-sharing methods are necessary for the coexistence of future terahertz communications and the other existing applications listed previously. In addition, as discussed in the previous section, 6G networks tend to be multidimensional, ultradense, and heterogeneous. Thus, considering that the propagation media and channel characteristics in integrated 6G networks differ significantly from those of terrestrial 5G networks, more effort is required to optimize the spectrum management of terahertz communications in 6G.

RL has the potential to realize smart or intelligent spectrum management to deal with these problems, especially when large amounts of data can be leveraged to train and predict. These training and prediction results can be leveraged to decide whether or not a spectrum band is occupied and to take actions accordingly, such as accessing or releasing the band. In addition, through the interaction between users and the wireless environment, users can optimize their strategies iteratively to maximize the value of reward functions, which can be established considering spectrum efficiency, network capacity, consumed energy, interference, and so on. However, RL is not competent for learning an effective action–value policy when random noise or measurement errors corrupt the state observations, because the number of states in the presence of random noise is practically infinite. Addressing this problem of random state measurements, deep RL (DRL) can be considered a suitable tool to optimize the decisions on spectrum management in 6G networks, involving dynamic spectrum access, transmission power control, spectrum allocation, and so on.
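As a toy illustration of the reward-driven formulation above, the tabular Q-learning sketch below lets a single user learn which of a few channels to access; the occupancy probabilities, rates, and interference penalty are assumptions chosen only for illustration, and the following subsections discuss the deep networks needed once states become noisy and high dimensional.

```python
import numpy as np

rng = np.random.default_rng(1)

n_channels = 4
p_busy = np.array([0.8, 0.5, 0.3, 0.6])   # assumed per-channel occupancy probabilities
rate = np.array([1.0, 0.8, 0.6, 0.9])     # assumed normalized rate when a channel is free

# State: channel chosen in the previous slot; action: channel to access now.
Q = np.zeros((n_channels, n_channels))
alpha, gamma, eps = 0.1, 0.9, 0.1

state = 0
for t in range(20_000):
    # epsilon-greedy channel selection
    action = rng.integers(n_channels) if rng.random() < eps else int(np.argmax(Q[state]))
    busy = rng.random() < p_busy[action]
    # Reward: achieved rate if the channel is free, interference penalty if occupied.
    reward = -1.0 if busy else rate[action]
    next_state = action
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

# The learned policy tends to prefer channel 2, which has the lowest occupancy.
print("Preferred channel per state:", np.argmax(Q, axis=1))
```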
Distributed Spectrum Access
Distributed algorithms for dynamic spectrum access should be designed to adapt to general, complex, real-world settings effectively and efficiently. Meanwhile, the expensive computational consumption resulting from the large state space and partial observability in the system can also be mitigated. To achieve this goal, a long short-term memory (LSTM) layer maintaining an internal state and aggregating observations can be established to ensure the ability to estimate the true state using past partial observations [4]. In addition, a dueling deep Q-network (DQN) method also can be applied to improve the Q-values estimated from bad states. After training, each user needs to update only its DQN weights by communicating with the central unit and then maps its local observations to spectrum access actions based on the trained DQN. Such a spectrum access framework is implemented according to the procedure presented in Figure 2.
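A minimal PyTorch sketch of the network shape just described is given below: an LSTM aggregates a history of partial observations, and dueling value and advantage heads produce per-channel Q-values. The observation layout, layer sizes, and channel count are assumptions, and the training machinery (experience replay, target network, and centralized weight updates) is omitted.

```python
import torch
import torch.nn as nn

class DuelingLSTMQNetwork(nn.Module):
    """Dueling DQN with an LSTM layer for partially observable spectrum access."""

    def __init__(self, obs_dim: int = 8, hidden_dim: int = 64, n_channels: int = 4):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.value_head = nn.Sequential(nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.adv_head = nn.Sequential(nn.Linear(hidden_dim, 64), nn.ReLU(), nn.Linear(64, n_channels))

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) history of local observations
        # (e.g., previous channel choice, per-channel capacities, ACK flag).
        out, _ = self.lstm(obs_seq)
        h = out[:, -1]                        # internal state after the latest observation
        value = self.value_head(h)            # V(s): average Q-value of the state
        adv = self.adv_head(h)                # A(s, a): advantage of each channel
        return value + adv - adv.mean(dim=1, keepdim=True)   # dueling combination

# Example: map a history of 10 partial observations to per-channel Q-values.
net = DuelingLSTMQNetwork()
q_values = net(torch.randn(1, 10, 8))
action = int(q_values.argmax(dim=1))          # channel to access at this slot
print(q_values.shape, action)
```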
Distributed Dynamic Power Control
Traditional power control techniques typically search for near-optimal power allocation strategies by solving challenging optimization problems. Such techniques can hardly adapt to large-scale networks because of their high computational complexity and their requirement of precise instantaneous CSI. Model-free RL has been introduced to deal with these problems in large-scale and heterogeneous 6G networks and also can achieve near-optimal power allocation, promising maximum sum rate and scheduling fairness in real time. In this framework, each transmitter collects QoS and CSI information of its neighbors, which is analyzed to estimate or extract random variations and delay in the CSI using deep Q-learning [5]. Figure 2 illustrates this DRL-based power control mechanism.

Dynamic Spectrum Allocation
A distributed spectrum allocation framework also can be formulated based on multiagent DRL, in which the agent refers to each device occupying the spectrum resource. In such a framework, the multiagent environment is established as a partially observable Markov game model. To deal with the instability of the environment, a neighbor-agent actor–critic (NAAC) model, which trains on the information from neighbor nodes in a centralized manner but with decentralized implementation, can be introduced to leverage the relationship among devices sharing the spectrum resource to improve system performance, such as the sum data rate and spectrum efficiency. In such a framework, the historical information is used for training the RL model but not for the online spectrum allocation decisions [6]. This NAAC-based framework for spectrum allocation is shown in Figure 2.

Figure 2 The DRL-based spectrum access, power control, and spectrum allocation in 6G networks. ACK: acknowledgment.

AI-/ML-Based Energy and Security Management for Super IoT
5G cellular networks introduced a new usage scenario oriented to support massive IoT, namely, mMTC. Toward 6G, the concept of a "super IoT" has been proposed recently, which can be elaborated with symbiotic radio and satellite-assisted IoT communications to support an astonishing number of connected IoT devices and extended coverage, respectively. Consequently, more efficient energy management mechanisms are expected to support large-scale IoT systems so that they operate stably for long periods of time. In addition, privacy and security issues will face more severe challenges, especially for IoT systems collecting individual or sensitive information. This section introduces AI-/ML-enabled energy and security management in super IoT systems.

Efficient Energy Management for Large-Scale Energy-Harvesting Networks
In traditional IoT systems, low-power IoT devices are typically limited by the energy stored in their batteries. Such energy shortcomings and limitations will bring great challenges in energy management and optimization when the scale of IoT systems grows sharply. In response, energy-harvesting technologies have been regarded as a promising approach to prolong the lifespan of super IoT systems by enabling IoT devices to harvest energy from potential energy resources, e.g., solar and wind energy. However, such an energy-harvesting scheme is a random process resulting from the intensity dynamics of energy resources, which means that the amount of energy stored in the battery of each IoT device cannot be known precisely in advance. In addition, the controllable energy is constrained by the currently stored energy, which is also capped by the battery capacity. Therefore, these problems lead to variation and uncertainty in the total controllable energy and make it difficult to solve the optimization problem of energy management, since these energy constraints are always changing [7]. Feasible approaches for energy management in energy-harvesting-enabled IoT systems can be divided into offline management and online management, and the latter can be realized through centralized or distributed methods. Some typical mechanisms designed in recent studies are summarized in Table 2. Here, we analyze the advantages and disadvantages of these approaches.

Offline Management
Recently, many offline energy management approaches were designed by optimizing power allocation, access control, and so on. However, offline control is essentially based on the assumption that perfect information of the energy and channel status can be observed before the operation, which is hardly implementable in practice.
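For concreteness, the classic water-filling allocation below illustrates the offline setting just criticized: it assumes every channel gain and the total energy budget are known in advance. It is a generic sketch, not the specific schemes of [S1] or [S3], and the gains and budget are made-up values.

```python
import numpy as np

def water_filling(gains: np.ndarray, total_power: float) -> np.ndarray:
    """Classic water-filling: maximize sum log2(1 + g_i * p_i) s.t. sum(p_i) = total_power."""
    # Bisection on the water level mu, with p_i = max(mu - 1/g_i, 0).
    lo, hi = 0.0, total_power + 1.0 / gains.min()
    for _ in range(100):
        mu = 0.5 * (lo + hi)
        p = np.maximum(mu - 1.0 / gains, 0.0)
        if p.sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(lo - 1.0 / gains, 0.0)

# Assumed per-slot channel gains known in advance (the offline assumption).
gains = np.array([2.0, 0.5, 1.0, 4.0])
power = water_filling(gains, total_power=4.0)
print("allocation:", np.round(power, 3), "sum:", round(float(power.sum()), 3))
# Better slots (higher gain) receive more power; very poor slots may get none.
```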
Centralized Online AI-/ML-Based Management
In contrast to offline management, online AI-/ML-based energy management approaches require only the present and previous energy arrivals and channel status when implemented to improve the communication performance in energy-harvesting-enabled super IoT systems. Based on the online AI/ML framework, power allocation and access control problems can be established as stochastic control problems whose discretized energy and channel states are then modeled as MDPs. However, this online AI/ML framework still requires perfect information of energy arrivals and channel status, which is hard to observe in practice. The RL- and Lyapunov optimization-based methods emerged as a result, most of which search for approximate solutions in a centralized fashion. Nevertheless, this centralized approach is not applicable when the number of IoT devices is large, because of the inevitable and significant feedback overhead. In addition, MDPs always suffer from the "curse of dimensionality," which results in heavy computational loads and makes them intractable.

Distributed Online AI-/ML-Based Management
Without any prior information about energy arrivals and channel status, a fully distributed online energy management scheme does not require any information exchange among IoT devices. Such distributed online energy management is not easy to realize, considering that convergence cannot always be guaranteed by applying such an approach, which results from the nonstationary environment. Moreover, many distributed online energy management schemes were proposed based on the assumption that the global system state is available to each device. To overcome these problems, a mean-field, multiagent DRL-based framework was proposed in [8] to learn the optimal power control to maximize the throughput of energy-harvesting-enabled super IoT systems. In [8], the throughput maximization problem was modeled as a mean-field game having a unique stationary solution, which ensures the convergence of the problem. In addition, each IoT device applies DRL individually to find the optimal power control without any prior information about energy arrivals and channel status. This distributed approach can achieve throughput close to that of centralized policies and can be implemented in large-scale IoT systems in practice.
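The sketch below makes the MDP formulation mentioned under centralized online management concrete for a single device: battery and harvesting levels are discretized, and value iteration yields a transmit-energy policy per battery level. The battery capacity, harvesting distribution, and logarithmic rate reward are illustrative assumptions; the schemes in Table 2 additionally handle channel states, multiple devices, and unknown statistics.

```python
import numpy as np

B = 10                                      # battery capacity in discrete energy units (assumed)
harvest_probs = {0: 0.5, 1: 0.3, 2: 0.2}    # assumed distribution of harvested units per slot
gamma = 0.95

def reward(tx_units: int) -> float:
    return float(np.log1p(tx_units))        # diminishing-returns proxy for achievable rate

V = np.zeros(B + 1)                         # value of each battery level
for _ in range(500):                        # value iteration
    V_new = np.zeros_like(V)
    for b in range(B + 1):
        best = -np.inf
        for a in range(b + 1):                      # spend a <= b units on transmission
            exp_next = 0.0
            for e, p in harvest_probs.items():      # expectation over random energy arrivals
                b_next = min(b - a + e, B)          # battery dynamics with capacity clipping
                exp_next += p * V[b_next]
            best = max(best, reward(a) + gamma * exp_next)
        V_new[b] = best
    V = V_new

# Greedy policy induced by V: energy units to transmit at each battery level.
policy = []
for b in range(B + 1):
    q = [reward(a) + gamma * sum(p * V[min(b - a + e, B)] for e, p in harvest_probs.items())
         for a in range(b + 1)]
    policy.append(int(np.argmax(q)))
print("battery level -> transmit units:", policy)
```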
Table 2 Typical energy management mechanisms in energy-harvesting-enabled large-scale IoT systems.

AI/ML technique | First introduced (see "Tables 2 and 3 References") | Category | Optimization objective | Applications
Water-filling | A. Arafa 2018 [S1] | Offline | Throughput maximization | Energy consumption optimization
Integer linear programming | H. Ayatollahi 2017 [S2] | Offline | Throughput maximization | Communication scheme selection
Directional water-filling | O. Ozel 2011 [S3] | Offline | Throughput maximization, delay minimization | Power control
DNN, MDP | M. K. Sharma 2019 [S4] | Centralized online | Time-averaged throughput maximization | Power control
RL, DQN | M. Chu 2019 [S5] | Centralized online | Uplink sum-rate maximization | Multiaccess control
Lyapunov optimization | H. Yu 2019 [S6] | Centralized online | Throughput maximization | Power control
RL | F. A. Aoudia 2018 [S7] | Centralized online | QoS maximization | Energy harvesting
RL | A. Ortiz 2018 [S8] | Centralized online | Throughput maximization | Power control
RL, MDP | K. Wu 2019 [S9] | Centralized online | Data importance value maximization | Communication link control
Bayesian RL | Y. Xiao 2015 [S10] | Centralized online | Long-term expected reward maximization | Power control, data transmission control
DNN, mean-field game (MFG) | M. K. Sharma 2019 [S11] | Distributed online | Time-averaged throughput maximization | Power control
MDP, MFG | D. Wang 2018 [S12] | Distributed online | Communication delay minimization | Power control
Stochastic game | V. Hakami 2017 [S13] | Distributed online | Communication delay minimization | Power control
Multiagent RL, Markov game | A. Ortiz 2017 [S14] | Distributed online | Sum-rate maximization | Power control

Privacy and Security Guarantee for Super IoT
The extremely large numbers of IoT devices and the enormous volumes of data bring great challenges to privacy preservation and security. To protect super IoT systems from various kinds of threats and attacks, authentication, access control, and attack detection are of paramount importance; however, traditional privacy and security technologies are hardly applicable to super IoT, owing to the heterogeneity of resources, the volume of networks, the limited energy and storage of devices, and so on. By providing embedded intelligence in IoT devices and systems, AI-/ML-based security technologies are leveraged to deal with these security problems. Next, we discuss some existing AI-/ML-based solutions and feasible research directions for addressing authentication, access control, and attack detection in super IoT systems. Some recent typical studies are summarized in Table 3.

AI-/ML-Based Authentication and Access Control
Authentication and access control can help IoT devices distinguish identity-based attacks and prevent unauthorized devices from accessing authorized systems [9]. To improve authentication accuracy, different AI-/ML-based approaches can perform well under different scenarios and assumptions. In the following, we investigate some feasible AI-/ML-based solutions to authentication and access control problems in super IoT systems.
■■ RL: Q-learning-based approaches can be applied to PHL authentication, which is realized by comparing the PHL features of a message with those of the claimed transmitter. In this procedure, the authentication accuracy depends on the test threshold used in the comparisons. Owing to the uncertain channel environment and unpredicted spoofing model, each IoT device needs to estimate the false alarm and misdetection rates of spoofing, the future states of which are independent of the previous states and actions. Therefore, the problem of threshold selection can be modeled as an MDP with finite states.
■■ Supervised learning: Different from the threshold decision in the Q-learning-based approaches just described, the CSI can be exploited through supervised learning to learn how the channel changes, and then the PHL authentication problems can be formulated as binary classification problems, which are threshold free.
Typical supervised learning algorithms, such as decision trees, SVM, KNN, and ensemble learning, then can be introduced to such classification problems to identify legitimate or illegitimate information according to the CSI.
■■ Unsupervised learning: Unsupervised learning, such as nonparametric Bayesian methods, can be introduced in proximity-based authentication and access control to identify the IoT devices in proximity without leaking the localization and other privacy-sensitive information of IoT devices.
■■ DL: According to the CSI in Wi-Fi or other radio signals generated by IoT devices, human physiological and behavioral characteristics can be learned by applying multilayer DNNs [10]. Based on such activity recognition and identification, authentication and access control schemes then can be designed.
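To make the threshold-free formulation concrete, the scikit-learn sketch below trains an SVM to separate legitimate from spoofed transmissions using synthetic CSI magnitude features; the feature model and the class separation are assumptions standing in for real channel measurements.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Synthetic CSI magnitude vectors over 16 subcarriers (assumption: the legitimate
# transmitter and the spoofer experience slightly different multipath profiles).
n_per_class, n_subcarriers = 500, 16
legit_profile = rng.uniform(0.5, 1.5, n_subcarriers)
spoof_profile = legit_profile + rng.normal(0, 0.3, n_subcarriers)

legit = legit_profile + rng.normal(0, 0.1, (n_per_class, n_subcarriers))
spoof = spoof_profile + rng.normal(0, 0.1, (n_per_class, n_subcarriers))

X = np.vstack([legit, spoof])
y = np.hstack([np.zeros(n_per_class), np.ones(n_per_class)])   # 0 = legitimate, 1 = spoofed

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)                   # threshold-free binary classifier
print(f"authentication accuracy: {clf.score(X_test, y_test):.3f}")
```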
Table 3 Typical AI-/ML-based security mechanisms in large-scale IoT systems.

AI/ML techniques | Typical research (see "Tables 2 and 3 References") | Security problems | Attacks | Performance optimization
Neural network | J. M. McGinthy 2019 [S15] | Authentication | Spoofing | Classification accuracy and delay
DRL | A. Ferdowsi 2019 [S16] | Authentication | Man-in-the-middle and data injection | Extraction error rate, detection delay
SVM, LSTM, DL, RNN | J. Chauhan 2018 [S17] | Access control | Spoofing | Classification accuracy, feature extraction time, inference time
DNN, SVM | C. Shi 2017 [S18] | Authentication | Spoofing | False alarm rate
Q-learning, Dyna-Q | L. Xiao 2016 [S19] | Authentication | Spoofing | False alarm rate, average error rate, detection accuracy
Nash Q-learning | Y. Li 2017 [S20] | Access control | DoS attack | Root mean error
DRL | Y. Wang 2019 [S21] | Malware detection | Malware attack | Detection accuracy
RL | H. S. Anderson 2018 [S22] | Malware evasion | Malware attack | Successful rate of evasion
Q-learning, Dyna-Q | L. Xiao 2017 [S23] | Malware detection | Malware attack | Detection accuracy and delay
RF, KNN, Bayesian net | F. A. Narudin 2016 [S24] | Malware detection, access control | Malware attack, intrusion | True positive rate, false positive rate, detection precision
SVM, DQN | M. P. Arthur 2019 [S25] | Attack detection | Jamming, spoofing, intrusion | Detection accuracy
DRL | N. Abuzainab 2019 [S26] | Attack detection, secure routing and transmission | Jamming, eavesdropping | Detection accuracy, system throughput
DL | A. A. Diro 2018 [S27] | Attack detection and mitigation | DoS, probe, R2L, U2R | Detection precision and delay
Semisupervised fuzzy C-means | S. Rathore 2018 [S28] | Attack detection and mitigation | DoS, probe, R2L, U2R | Detection precision, positive predictive value, sensitivity
Decision tree | E. Viegas 2018 [S29] | Anomaly intrusion detection | Intrusion | Detection accuracy
KNN, ANN, RF, decision tree | R. Doshi 2018 [S30] | Attack detection and mitigation | DDoS | Detection accuracy
DQN | G. Han 2017 [S31] | Secure channel selection | Jamming | SINR

R2L: remote to local; U2R: user to root; SINR: signal-to-interference-plus-noise ratio.

Tables 2 and 3 References
[S1] A. Arafa and S. Ulukus, "Mobile energy harvesting nodes: Offline and online optimal policies," IEEE Trans. Green Commun. Netw., vol. 2, no. 1, pp. 143–153, Mar. 2018. doi: 10.1109/TGCN.2017.2777668.
[S2] H. Ayatollahi, C. Tapparello, and W. Heinzelman, "Reinforcement learning in MIMO wireless networks with energy harvesting," in Proc. IEEE Int. Conf. Commun. (ICC), Paris, France, May 21–25, 2017, pp. 1–6. doi: 10.1109/ICC.2017.7997229.
[S3] O. Ozel, K. Tutuncuoglu, J. Yang, S. Ulukus, and A. Yener, "Transmission with energy harvesting nodes in fading wireless channels: Optimal policies," IEEE J. Sel. Areas Commun., vol. 29, no. 8, pp. 1732–1743, Sept. 2011. doi: 10.1109/JSAC.2011.110921.
[S4] M. K. Sharma, A. Zappone, M. Debbah, and M. Assaad, "Deep learning based online power control for large energy harvesting networks," in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Brighton, May 12–17, 2019, pp. 8429–8433. doi: 10.1109/ICASSP.2019.8683468.
[S5] M. Chu, H. Li, X. Liao, and S. Cui, "Reinforcement learning-based multiaccess control and battery prediction with energy harvesting in IoT systems," IEEE Internet Things J., vol. 6, no. 2, pp. 2009–2020, Apr. 2019. doi: 10.1109/JIOT.2018.2872440.
[S6] H. Yu, Z. Zhou, C. Pan, X. Zhao, and S. Mumtaz, "Online resource allocation for energy harvesting based large-scale multiple antenna systems," in Proc. IEEE Globecom Workshops (GC Wkshps), Waikoloa, HI, Dec. 9–13, 2019, pp. 1–6. doi: 10.1109/GCWkshps45667.2019.9024449.
[S7] F. Ait Aoudia, M. Gautier, and O. Berder, "RLMan: An energy manager based on reinforcement learning for energy harvesting wireless sensor networks," IEEE Trans. Green Commun. Netw., vol. 2, no. 2, pp. 408–417, June 2018. doi: 10.1109/TGCN.2018.2801725.
[S8] A. Ortiz, T. Weber, and A. Klein, "A two-layer reinforcement learning solution for energy harvesting data dissemination scenarios," in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), Calgary, Canada, Apr. 15–20, 2018, pp. 6648–6652. doi: 10.1109/ICASSP.2018.8462056.
[S9] K. Wu, H. Jiang, and C. Tellambura, "Sensing, probing, and transmitting policy for energy harvesting cognitive radio with two-stage after-state reinforcement learning," IEEE Trans. Veh. Tech., vol. 68, no. 2, pp. 1616–1630, Feb. 2019. doi: 10.1109/TVT.2018.2888826.
[S10] Y. Xiao, D. Niyato, Z. Han, and L. A. DaSilva, "Dynamic energy trading for energy harvesting communication networks: A stochastic energy trading game," IEEE J. Sel. Areas Commun., vol. 33, no. 12, pp. 2718–2734, Dec. 2015. doi: 10.1109/JSAC.2015.2481204.
[S11] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S. Vassilaras, "Distributed power control for large energy harvesting networks: A multi-agent deep reinforcement learning approach," IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 4, pp. 1140–1154, Dec. 2019. doi: 10.1109/TCCN.2019.2949589.
[S12] D. Wang, W. Wang, Z. Zhang, and A. Huang, "Delay-optimal random access for large-scale energy harvesting networks," in Proc. IEEE Int. Conf. Commun. (ICC), Kansas City, MO, May 20–24, 2018, pp. 1–6. doi: 10.1109/ICC.2018.8422272.
[S13] V. Hakami and M. Dehghan, "Distributed power control for delay optimization in energy harvesting cooperative relay networks," IEEE Trans. Veh. Tech., vol. 66, no. 6, pp. 4742–4755, June 2017. doi: 10.1109/TVT.2016.2610444.
[S14] A. Ortiz, H. Al-Shatri, X. Li, T. Weber, and A. Klein, "Reinforcement learning for energy harvesting decode-and-forward two-hop communications," IEEE Trans. Green Commun. Netw., vol. 1, no. 3, pp. 309–319, Sept. 2017. doi: 10.1109/TGCN.2017.2703855.
[S15] J. M. McGinthy, L. J. Wong, and A. J. Michaels, "Groundwork for neural network-based specific emitter identification authentication for IoT," IEEE Internet Things J., vol. 6, no. 4, pp. 6429–6440, Aug. 2019. doi: 10.1109/JIOT.2019.2908759.
[S16] A. Ferdowsi and W. Saad, "Deep learning for signal authentication and security in massive Internet-of-Things systems," IEEE Trans. Commun., vol. 67, no. 2, pp. 1371–1387, Feb. 2019. doi: 10.1109/TCOMM.2018.2878025.
[S17] J. Chauhan, S. Seneviratne, Y. Hu, A. Misra, A. Seneviratne, and Y. Lee, "Breathing-based authentication on resource-constrained IoT devices using recurrent neural networks," Computer, vol. 51, no. 5, pp. 60–67, May 2018. doi: 10.1109/MC.2018.2381119.
[S18] C. Shi, J. Liu, H. Liu, and Y. Chen, "Smart user authentication through actuation of daily activities leveraging WiFi-enabled IoT," in Proc. ACM Int. Symp. Mobile Ad Hoc Netw. Comput., Chennai, India, July 2017, pp. 1–10. doi: 10.1145/3084041.3084061.
[S19] L. Xiao, Y. Li, G. Han, G. Liu, and W. Zhuang, "PHY-layer spoofing detection with reinforcement learning in wireless networks," IEEE Trans. Veh. Tech., vol. 65, no. 12, pp. 10,037–10,047, Dec. 2016. doi: 10.1109/TVT.2016.2524258.
[S20] Y. Li, D. E. Quevedo, S. Dey, and L. Shi, "SINR-based DoS attack on remote state estimation: A game-theoretic approach," IEEE Trans. Contr. Netw. Syst., vol. 4, no. 3, pp. 632–642, Sept. 2017. doi: 10.1109/TCNS.2016.2549640.
[S21] Y. Wang, J. W. Stokes, and M. Marinescu, "Neural malware control with deep reinforcement learning," in Proc. IEEE Military Commun. Conf. (MILCOM), Norfolk, VA, Nov. 12–14, 2019, pp. 1–8. doi: 10.1109/MILCOM47813.2019.9020862.
[S22] H. S. Anderson, A. Kharkar, B. Filar, D. Evans, and P. Roth, "Learning to evade static PE machine learning malware models via reinforcement learning," Jan. 30, 2018, arXiv:1801.08917.
[S23] L. Xiao, Y. Li, X. Huang, and X. Du, "Cloud-based malware detection game for mobile devices with offloading," IEEE Trans. Mobile Comput., vol. 16, no. 10, pp. 2742–2750, Oct. 2017. doi: 10.1109/TMC.2017.2687918.
[S24] F. A. Narudin, A. Feizollah, N. B. Anuar, and A. Gani, "Evaluation of machine learning classifiers for mobile malware detection," Soft Comput., vol. 20, no. 1, pp. 343–357, Jan. 2016. doi: 10.1007/s00500-014-1511-6.
[S25] M. P. Arthur, "Detecting signal spoofing and jamming attacks in UAV networks using a lightweight IDS," in Proc. Int. Conf. Comput., Inform. Telecommun. Syst. (CITS), Beijing, China, Aug. 28–31, 2019, pp. 1–5. doi: 10.1109/CITS.2019.8862148.
[S26] N. Abuzainab et al., "QoS and jamming-aware wireless networking using deep reinforcement learning," in Proc. IEEE Military Commun. Conf. (MILCOM), Norfolk, VA, Nov. 12–14, 2019, pp. 610–615. doi: 10.1109/MILCOM47813.2019.9020985.
[S27] D. A. Abeshu and C. Naveen, "Distributed attack detection scheme using deep learning approach for Internet of Things," Future Gener. Comput. Syst., vol. 82, pp. 761–768, May 2018. doi: 10.1016/j.future.2017.08.043.
[S28] S. Rathore and J. H. Park, "Semi-supervised learning based distributed attack detection framework for IoT," Appl. Soft Comput., vol. 72, pp. 79–89, May 2018. doi: 10.1016/j.asoc.2018.05.049.
[S29] E. Viegas, A. Santin, V. Abreu, and L. S. Oliveira, "Enabling anomaly-based intrusion detection through model generalization," in Proc. IEEE Symp. Comput. Commun. (ISCC), Natal, Brazil, June 25–28, 2018, pp. 934–939. doi: 10.1109/ISCC.2018.8538524.
[S30] R. Doshi, N. Apthorpe, and N. Feamster, "Machine learning DDoS detection for consumer Internet of Things devices," in Proc. IEEE Secur. Privacy Workshop (SPW), San Francisco, CA, May 24, 2018, pp. 29–35. doi: 10.1109/SPW.2018.00013.
[S31] G. Han, L. Xiao, and H. V. Poor, "Two-dimensional anti-jamming communication based on deep reinforcement learning," in Proc. IEEE Int. Conf. Acoustics, Speech Signal Process. (ICASSP), New Orleans, LA, Mar. 5–9, 2017, pp. 2087–2091. doi: 10.1109/ICASSP.2017.7952524.

AI-/ML-Based Attack Analysis and Detection
Similar to the applications in authentication and access control, AI/ML technologies also can be applied to analyze and detect different kinds of attacks, such as spoofing, jamming, denial-of-service (DoS) or distributed DoS (DDoS) attacks, eavesdropping, malware attacks, and so on [9]. For instance, supervised learning, including SVM, KNN, random forest (RF), and DNN, can be introduced to detect these attacks by building classification and regression models. In addition, unsupervised learning can investigate unlabeled data to divide them into different groups; e.g., multivariate correlation analysis can help to detect DoS and DDoS attacks. In some recent studies, RL algorithms have been applied to help IoT devices make decisions on the selection of security protocols against attacks. The feasible algorithms include Q-learning, DQN, Dyna-Q, and so on.
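As a small illustration of the supervised route mentioned above, the following scikit-learn sketch trains a random forest to flag DDoS-like flows from per-flow features; the feature set and traffic statistics are synthetic assumptions standing in for real network telemetry.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 1000

# Synthetic per-flow features: [packets/s, mean packet size (bytes), distinct destination ports].
benign = np.column_stack([rng.normal(50, 15, n), rng.normal(800, 200, n), rng.integers(1, 5, n)])
ddos   = np.column_stack([rng.normal(900, 200, n), rng.normal(120, 40, n), rng.integers(1, 3, n)])

X = np.vstack([benign, ddos])
y = np.hstack([np.zeros(n), np.ones(n)])     # 0 = benign flow, 1 = DDoS-like flow

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f"detection accuracy: {clf.score(X_te, y_te):.3f}")
```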
AI-/ML-Enabled Ultrareliable/Low-Latency Applications
Satellite, UAV, and IoV communications will be integrated in 6G networks, for which the high dynamics of channels, environments, and traffic, as well as increasingly delay-sensitive applications, require more reliable and low-latency transmission technologies to guarantee communication connectivity and timeliness. In addition, the accompanying frequent resource allocation, network reconfiguration, and service customization also depend heavily on reliable, low-latency, and flexible network management. To satisfy such needs, mobility management and offloading techniques are expected to support ultrareliable and low-latency communications and also are confronted with challenges brought by high dynamics, multidimensionality, and significant heterogeneity. In this section, we discuss some AI-/ML-based solutions targeting the improvement of the reliability and timeliness of communications in 6G.

Intelligent Mobility and Handover Management in 6G
The high-speed mobility of elements in 6G, including satellites, UAVs, vehicles, and so on, will result in frequent handovers, making connections and communications unstable and unreliable. Moreover, the service requirements of low latency and high transmission rates will also make it more challenging to achieve efficient mobility and handover management. Therefore, to support ultrareliable and low-latency applications in 6G, DRL, DL, and RL will be capable and powerful tools to endow mobility management with intelligence and adaptivity [11].
■■ DRL: In a UAV-enabled 6G network, UAVs can perform as DRL agents. They can observe the environment states, such as the movement velocity, current position, and link quality, and then make the best decisions in terms of mobility and handover actions to maximize their rewards, which can be defined considering the link stability, channel quality, transmission latency and capacity, and so on. By interacting with the dynamic environment, UAVs will learn their mobility and handover management strategies automatically and robustly to minimize transmission latency and handover failure probability and then will achieve highly reliable wireless connections in the system.
■■ DL: It is necessary to achieve precise state estimation for the mobility and handover management of UAVs. However, the inaccuracies associated with onboard measurements, such as unpredictable drifts, biases, and immense noise resulting from the significant vibration of UAVs' rotors, make it difficult to obtain accurate state estimates. A DL-based framework that can apply ANNs, RNNs, and so on may help to improve the accuracy of state estimation. To be specific, a DNN can be trained to identify the associated measurement noise models and then filter them out of the final estimation. To further reduce computational complexity, the dropout technique also can be adopted when training this DNN. In addition, DL also can be applied to predict the trajectories of UAVs. By learning the movement behaviors of UAVs according to the measurement information, the positional relationships among UAVs can be analyzed. Based on such information, mobility and handover mechanisms with high success rates can be designed. Furthermore, LSTM also can perform as a powerful tool to design efficient mobility and handover management schemes [12]; a minimal sketch of such an LSTM-based trajectory predictor is given after this list. By training on the previous and current mobility contexts of UAVs, the sequence of future time-dependent mobility states and trajectories of UAVs can be obtained, which can be used to optimize handover parameters.
■■ RL: Cooperative Q-learning-based optimization of mobility-sensitive handover parameters can learn the parameter settings appropriate for specific velocity conditions in UAV-enabled 6G networks, which can be adapted to realistic environments where UAVs have time-varying velocities. To avoid frequent handovers and reduce handover and connectivity failures, dynamic fuzzy Q-learning can be utilized to optimize the handover parameters and then guarantee the reliability and efficiency of UAV-enabled connections and communications.
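The sketch referred to in the DL item above is given here: a small PyTorch LSTM that predicts a UAV's next 2D position from a window of past positions, trained on a synthetic trajectory. The window length, layer sizes, and mobility pattern are assumptions for illustration; a handover manager could feed such predictions into the parameter optimization discussed above.

```python
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    """Predict the next (x, y) position of a UAV from a window of past positions."""

    def __init__(self, hidden_dim: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 2)

    def forward(self, past_xy: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(past_xy)           # past_xy: (batch, window, 2)
        return self.head(out[:, -1])          # next position: (batch, 2)

# Synthetic training data: a noisy circular trajectory (an assumed mobility pattern).
torch.manual_seed(0)
t = torch.linspace(0, 6.28, 200)
track = torch.stack([torch.cos(t), torch.sin(t)], dim=1) + 0.01 * torch.randn(200, 2)
window = 10
X = torch.stack([track[i:i + window] for i in range(len(track) - window)])
Y = torch.stack([track[i + window] for i in range(len(track) - window)])

model = TrajectoryLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()
for epoch in range(200):                      # full-batch training on the toy trajectory
    opt.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    opt.step()

# One-step-ahead position estimate from the most recent window of measurements.
print("next position estimate:", model(track[-window:].unsqueeze(0)).detach().numpy())
```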
Intelligent Communication and Computing Resource Allocation
Driven by the exponentially growing demands of multimedia data traffic and computation-heavy applications, 6G networks are expected to achieve a high QoS with ultrareliability and low latency. In response, resource allocation has been considered an important factor that can improve 6G performance directly by configuring heterogeneous resources effectively and efficiently. In 6G, the allocated resources can be divided into communication resources, which include channels and bandwidth, and computing resources, such as memory and processing power. In recent years, various traffic offloading, caching, and cloud/fog/edge computing mechanisms designed to allocate these communication, storage, and computing resources in heterogeneous networks, respectively, have become promising solutions to handle the increasing data and computational requirements with low-latency and on-demand services. In addition, different AI/ML techniques, such as RL, DRL, double DRL, and so on, have been introduced to these resource allocation techniques to deal with the sophisticated optimization of decision making resulting from the multidimensionality, random uncertainty, and dynamics of 6G. By applying AI/ML tools, valuable information can be extracted through training on observed data, and then different functions for prediction, optimization, and decision making in traffic offloading, caching, and cloud/fog/edge computing can be learned to support ultrareliable and low-latency services [13].

However, most current RL- or DRL-based resource allocation approaches were modeled in a discrete action space, which restricts the optimization of offloading decisions to a limited action space [14]. Such a model assumption is unreasonable in practice, where the action space of offloading decisions is often a continuous–discrete hybrid. To be specific, in a task offloading-enabled 6G network, the strategies for determining which node should be selected to implement traffic/computation offloading or caching constitute a discrete action space. On the other hand, the possible resource volume, which will be provided by the selected node for offloading, is usually a continuous value. Such resource allocation problems with continuous–discrete hybrid decision spaces tend to be extremely complex, especially when time-varying tasks, energy harvesting, and security issues are also considered.

Figure 3 The framework and simulation results of a hybrid decision-controlled DRL-based dynamic computation offloading scheme in large-scale IoT systems: (a) an illustration of the hybrid decision-controlled DRL-based dynamic computation offloading scheme, (b) performance versus different task arrival rates, and (c) performance versus different maximum harvested energy. Hybrid-AC: hybrid action–critic; DQLO: deep Q-learning-based offloading.

To provide low-latency computing services, we have carried out some preliminary work focusing on the hybrid decision of computation offloading in 6G networks based on DRL. As demonstrated in Figure 3,
energy-harvesting-enabled devices can offload their computational tasks to edge computing servers. The server selection problem is modeled in a discrete action space; meanwhile, the decision spaces of the offloading ratio and local computation capacity are continuous. In the DRL framework, at each step, after observing the system states (such as the task load, battery level, harvested energy of each device, channel status, and computation capacity of each device and server), the possible computation offloading decisions, including server selection, offloading ratio, and local computation capacity allocation, are contained in the sets of possible actions. Each device then selects the best actions from these sets to maximize its reward, which is determined by latency, energy cost, reliability, and so on. The detailed modeling and implementation of this proposed mechanism are provided in [15].

To validate the efficiency and superiority of our proposed hybrid action–critic-based computation offloading approach, we test the average rewards received and the execution time compared with those of deep Q-learning-based offloading, server execution, and device execution. The latter two mechanisms indicate executing all computational tasks at the selected server remotely and at the device locally, respectively. Simulation results in Figure 3(b) and (c) indicate that, with different task arrival rates and allowed maximum harvested energy, the proposed approach can achieve the highest reward and smallest time latency among the four schemes.
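The sketch below illustrates one way to parameterize the hybrid action described above, with a discrete head over candidate edge servers and a bounded continuous head for the offloading ratio and local CPU frequency. It is a simplified illustration of the hybrid action structure, not the hybrid action–critic design of [15]; the state layout, network sizes, and frequency scale are assumptions, and the critic and training loop are omitted.

```python
import torch
import torch.nn as nn

class HybridOffloadingActor(nn.Module):
    """Actor producing a hybrid action: server index (discrete) plus
    offloading ratio and local CPU frequency (continuous)."""

    def __init__(self, state_dim: int = 6, n_servers: int = 3, f_max: float = 2.0e9):
        super().__init__()
        self.f_max = f_max                                  # max local CPU frequency in Hz (assumed)
        self.encoder = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.server_head = nn.Linear(64, n_servers)         # logits over candidate edge servers
        self.cont_head = nn.Linear(64, 2)                   # (offloading ratio, CPU frequency)

    def forward(self, state: torch.Tensor):
        h = self.encoder(state)
        server = int(torch.argmax(self.server_head(h), dim=-1))        # discrete part
        ratio, freq = torch.sigmoid(self.cont_head(h)).unbind(dim=-1)  # continuous part in (0, 1)
        return server, float(ratio), float(freq) * self.f_max

# Assumed state layout: task load, battery level, harvested energy,
# channel gain, device CPU capacity, server CPU capacity.
state = torch.tensor([0.8, 0.5, 0.2, 0.9, 0.4, 1.0])
actor = HybridOffloadingActor()
server, alpha, f_local = actor(state)
print(f"offload a (1 - {alpha:.2f}) share of the task to server {server}; "
      f"process the rest locally at {f_local / 1e9:.2f} GHz")
```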
Conclusions
To satisfy emerging services and applications, AI-/ML-enabled 6G networks have been considered fundamental enablers to carry forward the capacities of eMBB, mMTC, and uRLLC in 5G to a more powerful and intelligent level. In this article, we focused on some solutions of applying AI and ML tools to 6G networking and resource management optimization. We illustrated intelligent terahertz techniques, such as AI-/ML-enabled terahertz channel estimation and spectrum management, which are considered revolutionary, to achieve ultrabroadband transmission. In addition, we introduced AI/ML applications in energy management, especially for large-scale energy-harvesting networks. Moreover, AI-/ML-based security enhancement mechanisms, including authentication, access control, and attack detection, were discussed for super IoT systems. Such intelligentization of energy and security will help to achieve efficient and reliable ultramassive access. Furthermore, we introduced some efficient mobility and handover management approaches based on DRL, DL, and Q-learning to realize ultrareliable and stable transmission links and satisfy the high dynamics in 6G. Finally, intelligent resource allocation technologies, including traffic, storage, and computing offloading mechanisms, were identified to meet the requirements of ultrareliability and low latency in 6G services.

As investigated in this article, AI-/ML-enabled techniques may allow future 6G networks to learn from uncertain and dynamic environments, adapt to unpredictable changes in an intelligent and automated fashion, and then achieve significantly improved performance in aspects of ultrabroadband, ultramassive access, ultrareliability, and low latency. There are still many challenges to realizing comprehensive and mature applications of AI/ML techniques in 6G. Especially for current computing devices with limited power, memory, storage, and processing capacities, how to modify AI-/ML-based algorithms and mechanisms, which bring high complexity and huge amounts of computation, to get closer to practical implementation is worthy of further investigation. In addition, varied and emerging application scenarios and new AI/ML techniques may also bring challenges to the implementation of intelligent technologies in 6G.

Acknowledgments
This research was supported by the National Natural Science Foundation of China under project 61971257, the China Postdoctoral Science Foundation under special grant 2019T120091 and grant 2018M640130, and the project "The Verification Platform of Multi-tier Coverage Communication Network for Oceans (LZC0020)." The corresponding authors of this article are Chunxiao Jiang and Yong Ren.

Author Information
Jun Du (blgdujun@gmail.com) currently holds a postdoctoral position with the Department of Electrical Engineering, Tsinghua University, China. Her research interests are mainly in resource allocation and system security of heterogeneous networks and space-based information networks. She is the recipient of the Best Paper Award from the IEEE International Conference on Communications 2019 and the Best Paper Award from the International Conference on Wireless Communications and Mobile Computing in 2020. She is a Member of IEEE.

Chunxiao Jiang (jchx@tsinghua.edu.cn) is an associate professor at the School of Information Science and Technology, Tsinghua University, China. His research interests include the application of game theory, optimization, and statistical theories to communication, networking, and resource allocation problems, in particular, space networks and heterogeneous networks. He is a Senior Member of IEEE.

Jian Wang (jian-wang@tsinghua.edu.cn) joined the faculty of Tsinghua University, China, in 2006, where he is currently an associate professor with the Department of Electronic Engineering. His research interests include the application of statistical theories, optimization, and machine learning to communication, networking, navigation, and resource allocation problems, in particular, heterogeneous networks and intelligent collaborative systems. He is a Senior Member of IEEE.

Yong Ren (reny@tsinghua.edu.cn) is a professor with the Department of Electronics Engineering and director of the Complexity Engineered Systems Lab at Tsinghua University, China. His current research interests include complex systems theory and its applications to the optimization and information sharing of the Internet, the IoT and ubiquitous networks, cognitive networks, and cyber-physical systems. He is a Senior Member of IEEE.

Mérouane Debbah (merouane.debbah@huawei.com) is vice president of the Huawei France Research Center. He is jointly director of the Mathematical and Algorithmic Sciences Lab as well as the Lagrange Mathematical and Computing Research Center. He has managed eight European Union projects and more than 24 national and international projects. His research interests lie in fundamental mathematics, algorithms, statistics, information, and communication sciences research.
He is a Fellow of IEEE.

References
[1] Z. Zhang et al., "6G wireless networks: Vision, requirements, architecture, and key technologies," IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 28–41, Sept. 2019. doi: 10.1109/MVT.2019.2921208.
[2] I. F. Akyildiz, J. M. Jornet, and C. Han, "Teranets: Ultra-broadband communication networks in the terahertz band," IEEE Wireless Commun., vol. 21, no. 4, pp. 130–135, Aug. 2014. doi: 10.1109/MWC.2014.6882305.
[3] R. Singh and D. Sicker, "Beyond 5G: THz spectrum futures and implications for wireless communication," in Proc. 30th European Conf. Int. Telecommunication Society (ITS), Helsinki, Finland, June 16–19, 2019. [Online]. Available: https://www.econstor.eu/bitstream/10419/205213/1/Singh-Sicker.pdf
[4] O. Naparstek and K. Cohen, "Deep multi-user reinforcement learning for distributed dynamic spectrum access," IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 310–323, Jan. 2019. doi: 10.1109/TWC.2018.2879433.
[5] Y. S. Nasir and D. Guo, "Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks," IEEE J. Sel. Areas Commun., vol. 37, no. 10, pp. 2239–2250, Oct. 2019. doi: 10.1109/JSAC.2019.2933973.
[6] Z. Li and C. Guo, "Multi-agent deep reinforcement learning based spectrum allocation for D2D underlay communications," IEEE Trans. Veh. Technol., vol. 69, no. 2, pp. 1828–1840, Dec. 2019. doi: 10.1109/TVT.2019.2961405.
[7] Y. Al-Eryani and E. Hossain, "The D-OMA method for massive multiple access in 6G: Performance, security, and challenges," IEEE Veh. Technol. Mag., vol. 14, no. 3, pp. 92–99, Sept. 2019. doi: 10.1109/MVT.2019.2919279.
[8] M. K. Sharma, A. Zappone, M. Assaad, M. Debbah, and S. Vassilaras, "Distributed power control for large energy harvesting networks: A multi-agent deep reinforcement learning approach," IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 4, pp. 1140–1154, Dec. 2019. doi: 10.1109/TCCN.2019.2949589.
[9] L. Xiao, X. Wan, X. Lu, Y. Zhang, and D. Wu, "IoT security techniques based on machine learning: How do IoT devices use AI to enhance security?" IEEE Signal Process. Mag., vol. 35, no. 5, pp. 41–49, Sept. 2018. doi: 10.1109/MSP.2018.2825478.
[10] A. Ferdowsi and W. Saad, "Deep learning for signal authentication and security in massive internet-of-things systems," IEEE Trans. Commun., vol. 67, no. 2, pp. 1371–1387, 2018. doi: 10.1109/TCOMM.2018.2878025.
[11] A. Stamou, N. Dimitriou, K. Kontovasilis, and S. Papavassiliou, "Autonomic handover management for heterogeneous networks in a future internet context: A survey," IEEE Commun. Surveys Tuts., vol. 21, no. 4, pp. 3274–3297, Fourth quarter 2019. doi: 10.1109/COMST.2019.2916188.
[12] H. Ye, L. Liang, G. Y. Li, J. Kim, L. Lu, and M. Wu, "Machine learning for vehicular networks: Recent advances and application examples," IEEE Veh. Technol. Mag., vol. 13, no. 2, pp. 94–101, June 2018. doi: 10.1109/MVT.2018.2811185.
[13] M. Min, L. Xiao, Y. Chen, P. Cheng, D. Wu, and W. Zhuang, "Learning-based computation offloading for IoT devices with energy harvesting," IEEE Trans. Veh. Technol., vol. 68, no. 2, pp. 1930–1941, Feb. 2019. doi: 10.1109/TVT.2018.2890685.
[14] Y. He, F. R. Yu, N. Zhao, V. C. Leung, and H. Yin, "Software-defined networks with mobile edge computing and caching for smart cities: A big data deep reinforcement learning approach," IEEE Commun. Mag., vol. 55, no. 12, pp. 31–37, Dec. 2017. doi: 10.1109/MCOM.2017.1700246.
[15] J. Zhang, J. Du, Y. Shen, and J.
Wang, "Dynamic computation offloading with energy harvesting devices: A hybrid decision based deep reinforcement learning approach," IEEE Internet Things J., early access, June 2020. doi: 10.1109/JIOT.2020.3000527.