This article has been accepted for inclusion in a future issue of this magazine. Content is final as presented, with the exception of pagination. ACCEPTED FROM OPEN CALL Design Guidelines for Machine Learning-based Cybersecurity in Internet of Things Azzedine Boukerche and Rodolfo W. L. Coutinho Abstract Cybersecurity is one of the building blocks in need of increasing attention in Internet of things (IoT) applications. IoT has become a popular target for attackers seeking sensitive and personal user data, computing infrastructure for massive attacks, or aimed at compromising critical applications. Worryingly, the industrial race toward the forefront of IoT software and device development has led to increased market penetration of vulnerable IoT devices and applications. Nevertheless, traditional cybersecurity solutions designed for personal computers often rely on heavy computation and high communication overhead, and therefore are prohibitive for IoT, given the explosive number of IoT devices, their resource-constrained nature, and their heterogeneity. Hence, innovative solutions must be designed for securing IoT applications, while considering the peculiar characteristics of IoT devices and networks. In this article, we discuss the motivations and challenges of using machine learning (ML) models for the design of cybersecurity solutions for IoT. More specifically, we tackle the challenge of designing ML-based solutions and provide guidelines for ML-based physical layer solutions aimed at securing IoT. We propose a device-oriented and network-oriented classification and investigate recent works that designed ML-based solutions, considering IoT physical layer features, to secure IoT applications. The proposed classification helps engineers and practitioners starting in this area to better identify and understand the challenges, requirements, and up-to-date common design principles for securing IoT devices and networks considering physical layer features. 
Finally, we shed light on some future research directions that need further investigation.

Digital Object Identifier: 10.1109/MNET.011.2000396

Introduction

In recent years, significant advances have been made in embedded devices, sensing and actuation hardware, wireless networking technologies, edge computing, and data-centric networking, which have contributed to the development and market penetration of the Internet of things (IoT). IoT has emerged as a network of seamlessly interconnected devices (e.g., sensors and actuators), which cooperate to attain common objectives [1]. Moreover, IoT has gained increased attention thanks to its potential to change the way people live and work by creating efficient, comfortable, green, and enjoyable environments through smart applications over different domains, such as education, health-care, transportation, manufacturing, and surveillance.

IoT has unlocked sensing and actuation-based applications in several domains. A traditional IoT application relies on various heterogeneous devices to sense the environment and act based on the observed conditions or received commands. The IoT devices gather a large amount of multimedia data through heterogeneous sensors, share collected data whenever needed through machine-to-machine (M2M) communication, and offload it to edge or cloud infrastructures.

Current advancements in key technologies are supporting the ever-growing expansion and popularization of IoT. The evolving Long Term Evolution (LTE) and 5G networks are expected to provide IoT applications with massive connectivity, high bandwidth, and ultra-reliable and low-latency communication. Edge computing will expand data processing capabilities closer to IoT devices, which helps improve energy efficiency and reduce network congestion, since devices will no longer need to offload all collected data to cloud servers.
Information-centric networking architectures will improve communication interoperability and data delivery in IoT, by employing data-centric request and response, and in-network content caching, respectively.

Nevertheless, cybersecurity is a fundamental building block in need of increased attention in IoT. IoT systems are being targeted with an unprecedented number of cyberattacks. F-Secure reports that attack traffic on IoT devices more than tripled in the first half of 2019 compared with the previous period, reaching a total of over 2.9 billion events (see Attack Landscape H1 2019, available at https://tinyurl.com/sxaeq4c). Moreover, malicious users rely on spoofing attacks, intrusions, jamming, eavesdropping, and malware to: leak sensitive IoT data; turn IoT systems into botnets for massive distributed denial-of-service (DDoS) attacks, spam, phishing, and click-fraud; and make critical IoT applications unavailable (e.g., health-care systems, surveillance, smart transportation, smart grids, and industrial applications).

The industrial race toward the forefront of the development of IoT devices has led to increased market penetration of vulnerable devices. For instance, the security researcher Billy Rios showed that the LifecarePCA drug infusion system, as well as five other Hospira automated drug delivery machines, is vulnerable to attacks that can change the drug dosage to be delivered (https://tinyurl.com/t8xyr4h). Nevertheless, traditional cybersecurity solutions designed for protecting personal computers connected to the Internet will not be feasible for IoT because of the explosive number of IoT devices, their resource-constrained nature, and their heterogeneity. Thus, Restuccia et al. [2] have advocated for a secure-by-design approach, in which IoT systems should be built as free of vulnerabilities as possible. However, security-by-design is hard to achieve in IoT, as devices are composed of several hardware parts manufactured by different vendors and software developed by different companies. Therefore, the design of new solutions to protect IoT from cyberattacks has received increased attention in the scientific and industrial communities. In particular, machine learning (ML)-based solutions have emerged for IoT cybersecurity.

Azzedine Boukerche is with the University of Ottawa; R. W. L. Coutinho is with Concordia University.
0890-8044/20/$25.00 © 2020 IEEE
IEEE Network • Accepted for Publication. Authorized licensed use limited to: Cornell University Library. Downloaded on September 06,2020 at 09:44:35 UTC from IEEE Xplore. Restrictions apply.

FIGURE 1. Common security threats for IoT applications.

Traditional cybersecurity solutions are prohibitive for IoT, as they rely on heavy computation [3] and would overload the network with the traffic needed for autonomously changing default passwords on millions of IoT devices, for two-factor device authentication, and for applying security patches and updates to IoT devices (https://tinyurl.com/yblo7yq6). Moreover, they were not designed considering the severe constraints of the devices in terms of computation, memory, radio bandwidth, and battery resources, and do not encompass the entire security spectrum of devices, edge computing, and wireless networking. In contrast, ML-based cybersecurity solutions for IoT have gained increased momentum, and several works have been proposed in the literature (see [1–4] and references therein). ML models can be used, for instance, to create traffic profiles, detect threats through traffic exchange that does not fall within the established normal behavior, detect IoT hardware vulnerabilities through observed physical layer characteristics, and authenticate legitimate devices based on their characteristics and behavior. Xiao et al.
[3] analyzed learning-based solutions designed for IoT device authentication, access control, malware detection, and secure offloading. Li et al. [5] evaluated the feasibility and suitability of statistical learning models for detecting anomalous behavior of IoT devices by considering system statistics (e.g., CPU usage cycles and disk usage).

In this work, we tackle the challenge of designing ML-based solutions for physical layer IoT security. Related works either addressed one particular security problem (e.g., intrusion detection), or focused primarily on the discussion of the ML models while presenting proposed solutions to secure IoT. In contrast, we discuss ML-based solutions for cybersecurity in IoT applications by addressing IoT from two distinct points of view: the device and the network. This approach helps engineers and practitioners starting in the area to better understand the challenges and design principles of ML-based cybersecurity solutions intended to protect IoT devices individually, as well as the IoT network infrastructure. More specifically, the contributions of this work include:
• A thorough discussion of the motivation for the design of novel solutions to secure IoT systems, ML-based cybersecurity, and requirements and current daunting challenges.
• A proposed classification that categorizes recent works on ML-based approaches for IoT cybersecurity into device-based and network-based solutions. The proposed classification, by considering two distinct points of view of IoT systems, helps to better identify and understand the challenges, requirements, and up-to-date common principles for the design of security solutions for IoT devices and networks considering physical layer features.
• A thorough discussion of open issues and future research directions toward the design of efficient cybersecurity solutions for IoT.
Fundamentals

Cybersecurity for IoT

Figure 1 illustrates classic attacks that IoT infrastructures can experience. Cybersecurity solutions must be designed to protect IoT data and prevent IoT devices from being compromised. Cisco estimates that data produced by IoT applications will reach nearly 850 ZB by 2021 (https://tinyurl.com/ybez862s). Beyond this impressive volume, it is worth highlighting that IoT data will mostly be sensitive and may reveal private aspects of users and their interactions with the application. For instance, a smart health-care application will produce data regarding users’ health conditions and historical health records. A smart home application will produce data regarding rooms and environmental states and conditions (e.g., temperature, lighting, humidity, and noise), as well as users’ interactions with them. In both examples, IoT data leakage can reveal users’ sensitive information and behavior in their most private spaces. An attacker in possession of such data can infer when a user is at home (or if the house is vacant), as well as the user’s routines and preferences while at home. Therefore, cybersecurity solutions for IoT must deal with eavesdropping attacks efficiently, preventing information leakage, ensuring data will not be globally accessible, and limiting data lifetime to the minimum extent required.

TABLE 1. IoT characteristics and challenges for cybersecurity.
IoT characteristic | Challenges for cybersecurity in IoT applications
Massive deployment | • Data is distributed among multiple devices. • Individual protection of devices. • Network overhead.
Heterogeneity | • Devices with heterogeneous capabilities. • Need for different solutions to secure different devices.
Dynamic network topologies | • IoT network topology changes frequently due to controllable and uncontrollable factors. • Topology changes will affect the communication pattern of IoT devices. • Fingerprinting-based cybersecurity solutions should consider communication traffic pattern changes.
Low-power and low-cost communication | • IoT devices have severe energy constraints. • Networking protocols do not implement robust mechanisms for reliable communication. • Distributed cybersecurity solutions should consider low-reliability communication in IoT applications.
Low latency communication | • IoT applications might have time constraints. • Complex cybersecurity solutions will incur additional delays.

Moreover, IoT devices have been targeted by cyber-attackers aiming to take control of them. In contrast to traditional computing systems, each IoT device performs a well-defined task. However, such a task might be critical; for instance, an IoT medical device can be used for insulin delivery in a health-care system. In this regard, compromised IoT devices can lead to fatal consequences, as they can pump lethal doses of the administered drug in health-care applications (https://tinyurl.com/y8tsb7fu).

Besides, compromised IoT systems can be used to create botnets, which will be exploited to attack and damage other computing infrastructures. Although each device individually lacks computing capabilities, their sheer number compensates for this. An infected IoT device can be instructed to download malware and wait for commands to begin an attack. Despite the constrained resources of each device, orchestrated DDoS attacks from IoT are undeniably destructive because of the very large number of devices involved.
IoT botnets (e.g., the Mirai botnet) have served as infrastructure for powerful DDoS attacks, such as those in October 2016, which took down hundreds of websites (e.g., Twitter, Netflix, Reddit, and GitHub) for several hours [6]. The critical fact is that traditional cybersecurity approaches might not prove suitable for IoT applications, given the unique characteristics of IoT devices and networks, as summarized in Table 1.

ML-Based Cybersecurity

Machine learning has gained increased attention in the design of cybersecurity solutions for IoT. One of the reasons for such increased attention is the potential for using ML models to protect IoT data and control access to IoT resources. In traditional personal computer-based systems (e.g., client/server computing applications), data is located in a well-defined place and is requested by users through a unique data identifier or the address of the host storing it. In contrast, IoT data might be spread out among devices and processing units; that is, IoT data will not reside in a single place, and its location will not be well-defined. Therefore, a naive solution for securing IoT data would be to implement protective measures on every single device in an IoT application. However, such a naive approach will be unfeasible, given the heterogeneous and resource-constrained nature of IoT devices, and the heavy computation and high communication load of traditional cybersecurity techniques.

Moreover, cybersecurity solutions must guarantee that access to IoT resources is controlled. IoT devices might perform vital tasks, such as in health-care applications. Hence, cybersecurity solutions must make sure that the access to update a device configuration or working mode is granted only to a legitimate entity. Such access control is needed to prevent, for instance, a malicious user from changing the dosage a device must deliver to a patient in a smart health-care application.
In this regard, ML-based solutions can observe different variables in an IoT system and make decisions to secure it. In an IoT application, each device will have a well-defined task to perform. Moreover, the interaction between users and a set of IoT devices, or a machine-to-machine interaction in a given IoT application, tends to follow a pattern; that is, it is not a random interaction. Machine learning algorithms can therefore be trained to learn such an interaction pattern, as well as the characteristics of the networking traffic generated from such interactions. As a result, an ML-based solution will be able to authenticate users, control data access, and identify DDoS attacks, compromised IoT devices, or unauthorized attempts to access IoT data or resources.

Requirements and Fundamental Challenges

Cybersecurity techniques for IoT must be lightweight, resilient, fault-tolerant, and robust. Moreover, they should tackle the heterogeneous capabilities of IoT devices and wireless networking technologies. In addition, cybersecurity techniques should protect IoT data by considering different data sensitivity levels, and should guarantee that data is accessed only by users and system components that have the right permission to access it. Furthermore, cybersecurity solutions for IoT should detect unusual IoT traffic, block attack attempts, and mitigate damage when a device or component is compromised. Nonetheless, solutions to secure IoT must not incur significant overhead for the system and network, which would diminish the performance of an IoT application.
In this regard, supervised machine learning techniques (e.g., SVM, naive Bayes, K-nearest neighbor, deep neural networks, and random forests) have been used for detecting network intrusion and malware, DDoS, and spoofing attacks [3]. Supervised ML techniques require labeled data, with a set of inputs and their corresponding outputs, to train the model initially. The working principle of such an approach generally includes the centralized training of the model and its later execution in selected IoT devices. This might require a vast amount of raw data for training the models. In addition, the data needed for training might be sensitive and private, and thus not easy to acquire. Furthermore, supervised models must be resilient to maliciously introduced data; that is, they must reject compromised training data sets that might negatively impact the result. Biased data from user interactions must also be properly treated when training supervised models for securing IoT. The challenges mentioned above will also emerge whenever a supervised ML model in use must be re-trained and updated.

In contrast, unsupervised machine learning techniques (e.g., k-means, hierarchical clustering, and k-NN) have gained increased attention for IoT networks [7]. Unsupervised learning can be used for detecting data modification attacks, classifying statistical data tuples as benign or malicious, identifying abnormal flows, and detecting malicious relays. The main advantage of unsupervised ML is that it does not require labeled data for training, which contributes to reducing complexity and required resources. However, efficient unsupervised ML-based solutions will require the proper selection of the features to be considered, and the removal of features possessing no discriminating power, in order to cope with the curse of dimensionality.
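The two unsupervised steps mentioned above, removing features without discriminating power and then clustering tuples into two groups, can be sketched as follows. This is a minimal illustration, not a method taken from any of the cited works: the feature columns, the sample values, the variance threshold, and the helper names `drop_low_variance` and `kmeans` are all assumptions made for the example, and a real deployment would also normalize features before clustering.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold=1e-3):
    """Remove feature columns whose variance is at or below `threshold`."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if pvariance(col) > threshold]
    return [[row[i] for i in keep] for row in rows], keep


def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k=2, iters=20):
    """Plain k-means with a deterministic farthest-first initialization."""
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(dist2(r, c) for c in centers)))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: each row joins its nearest center.
        labels = [min(range(k), key=lambda j: dist2(r, centers[j])) for r in rows]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical per-node tuples: [packets per second, fraction of unmodified
# packets, protocol version]. The last column is constant, so it carries no
# discriminating power and is dropped before clustering.
samples = [
    [10.0, 0.90, 1.0],
    [12.0, 0.80, 1.0],
    [11.0, 0.95, 1.0],
    [9.0, 0.85, 1.0],
    [200.0, 0.10, 1.0],
    [220.0, 0.20, 1.0],
]
reduced, kept = drop_low_variance(samples)
labels = kmeans(reduced, k=2)
print("kept feature columns:", kept)
print("cluster labels:", labels)
```

The clusters carry no meaning by themselves; deciding which cluster is the malicious one still needs a rule or a small amount of side information, which is part of the feature selection effort discussed above.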
Finally, it is worth mentioning that some machine learning models, such as deep learning, are well known for the difficulty of understanding the reasoning behind their decisions. Thus, ML-based cybersecurity solutions might fall short in forensic capabilities, as the decisions taken might not be traceable. It will be challenging to develop ML-based solutions to secure IoT that are capable of providing transparency and accountability for the actions taken. It might not be possible to prove that the actions taken were correct, which would make the system hard to defend in a court of law whenever needed.

ML-Based Cybersecurity for IoT

The first step toward the design of efficient ML-based solutions for IoT applications is to understand IoT characteristics, security requirements, and design challenges. To facilitate this process, we propose a novel classification to categorize current ML-based designs to secure IoT applications. Based on the primary goal, we categorize the solutions into IoT device security and IoT network security, as summarized in Table 2. The proposed classification contributes to the study of the challenges and requirements of IoT systems from a device and network point of view. Hence, for each category, we highlight the design principles and main challenges to be overcome, and shed light on some recent works in the literature. The discussed works are summarized in Table 3.

TABLE 2. Classification of IoT cybersecurity approaches.
Approach | Description
Device security | Solutions aimed at tackling vulnerabilities and attacks targeting IoT devices (e.g., hardware trojans, cloning, and battery draining), and at securing them to avoid privacy leakage, DDoS, and jamming.
Network security | Solutions aimed at securing IoT communication infrastructure (e.g., edge nodes, access points, routers, and cache systems) against adversaries.

IoT Device Security

One of the daunting challenges in IoT applications is how to secure the devices.
IoT devices might present vulnerabilities, such as open telnet ports, outdated firmware, and unencrypted transmission of sensitive data. Hence, they are susceptible to many kinds of attacks, which include hardware trojans, non-network side-channel attacks, DDoS, and tampering attacks [15]. Moreover, IoT devices generally have severe limitations in terms of power supply, which lead them to work in a duty-cycled manner to conserve energy. This makes them also susceptible to sleep deprivation and battery draining attacks. In this regard, ML-based cybersecurity approaches can be explored to ensure IoT devices are working correctly, that is, to detect when they are compromised, receiving unusual requests for sensitive data, or targeted by DDoS attempts. Moreover, ML-based cybersecurity can improve authentication mechanisms and access control to data and networks for new devices added to the system.

Machine learning has been used in proposed solutions to authenticate IoT devices through fingerprinting. Figure 2 depicts the general working principle of such approaches. IoT devices have unique radio signal signatures, which arise from the transmitter’s hardware imperfections or the effects of signal propagation (e.g., fading, Doppler effect, noise, and distortion). Recent studies [9–11] designed ML-based solutions to extract unique features from received signals and determine if a device that is trying to authenticate in the network is legitimate or adversarial. Das et al. [9] proposed a Long Short Term Memory (LSTM)-based classifier to learn unique hardware imperfections of legitimate IoT devices. Such unique imperfections are used to distinguish legitimate devices from adversaries that try to emulate them. To do so, samples of the wireless signals of transmitted preambles, composed of multiple symbols, are considered.
For a given input, the LSTM classifier’s output will be the imperfection characteristics of the transmitter hardware, in terms of frequency offset, phase offset, filters, timing offset, and multipath.

TABLE 3. Summary of discussed works.
Proposal | Category | ML technique | Goal | Description
Xiao et al. [8] | Network security | DQN | Secure mobile edge caching devices | Determine the edge node the IoT device should use, the task offloading rate/time, and the transmission power to be used in the communication. Those parameters are selected from the observed users’ density, devices’ battery level, jamming strength, and radio channel bandwidth.
Liu et al. [7] | Network security | k-means | Detect malicious devices within IoT multihop paths | Use probe packets to discover multi-hop paths from source nodes to the sink. The sink node determines the fraction of unmodified packets of each path, from received probes. Hence, k-means is used to cluster nodes into benign and malicious, based on the reputation of the paths they are members of and their contribution to each path.
Das et al. [9] | Device security | LSTM | Device authentication | Use the unique hardware imperfections of IoT devices to authenticate them.
Chatterjee et al. [10] | Device security | ANN | Device authentication | Authenticate IoT devices from physical unclonable functions.
Ferdosi and Saad [11] | Device security | LSTM | Device authentication | Gateway nodes authenticate devices of massive IoT scenarios through received watermarked signals.
Chen et al. [12] | Network security | DBN | Detect jamming attacks in the mobile edge computing infrastructure | Deep belief network is used to learn features of eavesdropping and jamming attacks to mobile edge computing systems.
Miettinen et al. [13] | Network security | Random Forest | Detect devices with unpatched vulnerabilities | Use devices’ fingerprint to identify if they have any unpatched vulnerability. Hence, protective measures are taken to limit the operation of a vulnerable device in the IoT network.
Alli et al. [14] | Network security | PSO and Neuro-Fuzzy | Prevent malicious IoT devices from offloading invalid data aimed at network congestion and exhaustion of fog and cloud computing resources | Surrogate entities at fog nodes collect and store information regarding IoT devices within the network. PSO is used at the fog nodes to select the optimal node, aimed at reducing delay, for handling offloaded tasks. Neuro-Fuzzy is used at gateways to evaluate data coming from IoT devices and identify malicious task offloading.
Vashist et al. [4] | Device security | ANN, SVM, kNN and decision tree classifiers | Detect burst errors on multiple consecutive flits of a packet in a WiNoC | Implements a set of machine learning classifiers to detect jamming attacks aimed at denial-of-service on wireless Network-on-Chip. The classifiers are used to distinguish burst errors occasioned during normal operation from errors that happen when an internal or external attacker is interfering in the communication.

Chatterjee et al. [10] proposed the RF-PUF for IoT device authentication through physical unclonable functions (PUF). In the RF-PUF, device identification is performed at the receiver node, from features (frequency, in-phase (I) and quadrature (Q) components, and channel features) extracted from received wireless signals. The proposed solution implements a three-layer Artificial Neural Network (ANN) that will determine the unique identifier of the transmitter based on the output (normalized geometric means of feature values) and PUF properties.
The main disadvantage of the above work is the high demand at the gateway node, which might fail in simultaneously authenticating IoT devices in massive IoT systems. In this regard, Ferdosi and Saad [11] proposed an LSTM-based watermarking algorithm for assisting dynamic massive IoT device authentication. In the proposed solution, the LSTM model is used to extract fingerprints from the characteristics of device signals (spectral flatness, mean, variance, skewness, and kurtosis). The output is a bitstream used to watermark the original signal using a key. At the gateway, a proposed dynamic watermarking LSTM (DW-LSTM) model is used to extract the bitstream and the features of a received watermarked signal. Those outputs are compared, and in the event of dissimilarities between the two sequences, an attack alarm is triggered.

In contrast to the works mentioned above, Vashist et al. [4] addressed jamming attacks aimed at DoS on wireless Network-on-Chip (WiNoC). The authors used a burst error correction code to monitor the rate of burst errors received over the wireless medium, and ML classifiers (ANN, SVM, kNN, and decision tree) to detect a persistent jamming attack. In the considered attack model, an external or internal attacker will interfere with legitimate transmissions, which will cause high burst error rates on multiple consecutive flits of a packet. Hence, ML classifiers were employed to distinguish random burst errors occasioned by power source fluctuations, ground bounce, or crosstalk from burst errors due to jamming attacks. The authors created a simulation-based dataset with different bit error rates (BER) to model normal operation and burst errors from jamming attacks. The number of transmitted and received flits, as well as the number of errors, are used together with the operating mode (i.e., normal or attacked) for training the classifiers.
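To make the last step more concrete, the following is a small sketch of a kNN classifier trained on tuples of transmitted flits, received flits, and error counts labeled with the operating mode. It is only an illustration of the general idea and not the classifier of [4]: the two derived features, the training values, and the function names `features` and `knn_predict` are assumptions made for the example.

```python
from collections import Counter
from math import dist


def features(sample):
    """Map (transmitted flits, received flits, errors) to two ratios.

    Assumes at least one flit was transmitted (tx > 0).
    """
    tx, rx, errors = sample
    return ((tx - rx) / tx, errors / tx)


def knn_predict(train, query, k=3):
    """Return the majority operating mode among the k nearest training samples."""
    q = features(query)
    nearest = sorted(train, key=lambda item: dist(features(item[0]), q))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled observations: ((tx flits, rx flits, errors), mode).
train = [
    ((1000, 1000, 2), "normal"),
    ((1000, 998, 5), "normal"),
    ((1000, 1000, 1), "normal"),
    ((1000, 900, 300), "attacked"),
    ((1000, 850, 400), "attacked"),
    ((1000, 920, 250), "attacked"),
]

print(knn_predict(train, (1000, 990, 8)))
print(knn_predict(train, (1000, 880, 350)))
```

With such well-separated synthetic data any of the classifiers mentioned above would do; the difficulty in practice lies in borderline error rates, where random burst errors and weak jamming look alike.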
Despite the advancements, many challenges should be addressed during the design of AI-based cybersecurity solutions to protect devices. First, the solutions must be lightweight, as devices have limited resources in terms of computing, storage, and energy. Second, the solutions will need to deal with the lack of reliable data sets to be used for training and validation. Simulated data were considered to evaluate the proposed solutions in [9, 10], for instance. Third, it might be required to re-train and update the parameters of an ML-based cybersecurity solution. Hence, the data exchange for such tasks should be done in a way that will not congest the network.

FIGURE 2. ML-based IoT device fingerprinting.

IoT Network Security

Network-based IoT security aims to create barriers to protect the IoT network, rather than addressing security in a per-device manner. This includes, for instance, identifying malicious devices, filtering traffic as it traverses the network, identifying unusual requests without congesting the network or increasing latency, protecting links between IoT devices and edge/cloud servers, and identifying and registering new devices when they connect to the network. ML-based solutions to secure IoT networks can also be deployed at edge and cloud infrastructure to monitor incoming and outgoing traffic of devices within the network, profile them, and determine when the network is under attack by distinguishing normal from unusual behavior of the entities.

Jamming is one of the attacks in IoT networks aimed at disrupting communication between devices and edge servers.
In order to tackle jamming attacks, physical-layer security methods have been proposed for IoT, as alternative solutions to encryption/decryption-based methods, which are costly in terms of computing resources. Chen et al. [12] proposed a deep learning framework for jamming attack detection in a mobile edge computing infrastructure supporting IoT-based cyber-physical transportation. The proposed framework uses a deep belief network to analyze attack behaviors from required permissions, sensitive application programming interfaces (APIs), and dynamic behaviors.

Liu et al. [7] used k-means clustering to identify malicious nodes involved in data routing in IoT multi-hop applications. Accordingly, probe packets are transmitted from source nodes toward the destination (sink). The destination calculates the fraction of unmodified packets along each of the multiple paths from the source node by checking the integrity of each received probe packet. Hence, k-means is used to cluster nodes into two groups (benign or malicious nodes) based on the reputation attributes of the paths they are part of, and their contributions to the paths.

Another approach to secure IoT networks is to detect the presence of devices with unpatched vulnerabilities and apply the necessary protection measures to secure the other devices in the same network. The IoT SENTINEL [13] implements a software-defined networking (SDN)-based Security Gateway to monitor and classify the devices, as well as to send device fingerprints to the proposed IoT Security. The Random Forest algorithm is used to create classifiers for devices with known fingerprints. Hence, upon the connection of new IoT devices to the network, 23 features extracted from each packet of a set collected during the devices’ initialization are used as input for each classifier, which provides a binary decision as to whether the input fingerprint matches the device type.

In addition, IoT networks can suffer from DoS of edge computing resources.
Malicious nodes can attack the edge computing infrastructure by offloading tasks solely to occupy its processing, storage, and communication resources. As a result, tasks offloaded by legitimate devices will not find available resources at the edge and will need to be handled locally, which will exhaust resource-constrained IoT devices and impair application performance. Alli et al. [14] proposed SecOFF-FCIoT, an ML-based approach for secure task offloading to fog and cloud servers. The proposed solution uses Particle Swarm Optimization (PSO) at the IoT device level to select the best fog node to handle offloaded tasks. In addition, a neuro-fuzzy model is used at gateway nodes to evaluate data coming from IoT devices and isolate malicious devices that send invalid data with the purpose of congesting the network.

In contrast, Xiao et al. [8] investigated the use of a reinforcement learning-based procedure for securing mobile edge caching (MEC) devices. In IoT applications, a MEC infrastructure will be a target of attackers seeking either to leak data cached at the MEC devices or to cause denial of service by impairing the performance of MEC systems. Hence, the authors investigated the use of a deep Q-network (DQN) to secure MEC. The DQN model observes user density, battery levels, jamming strength, and radio channel bandwidth, and selects the edge device to which the task is offloaded, the offloading rate/time, and the transmission power the IoT device uses for offloading the task to the MEC device.

Collaborative solutions need to be explored to improve the security of large-scale and massive IoT networks. IoT devices can select edge servers based on the security level of the communication and of the server. However, periodic communication among IoT devices to exchange the security levels of the edge devices they have used will congest the network and incur additional costs, such as energy consumption.
Hence, this requires the development of collaborative machine learning approaches in which the models' parameters, rather than the data used for training, are shared among the devices.

Future Research Directions

While important progress has been achieved, several directions require further exploration in the design of solutions to secure IoT applications.

First, there is a lack of machine learning-based solutions that combine different types of information for device profiling. Current ML-based cybersecurity solutions for IoT authentication and access control profile devices in terms of their hardware imperfections. However, additional information can be considered to improve device profiling and, in turn, the performance of IoT cybersecurity solutions. For instance, combining IoT infrastructure usage information, such as CPU, memory, and network traffic intensity and patterns, rather than considering a single aspect (as is done in the current literature), and using high-level information, such as social interactions with other devices, can improve the performance of ML-based cybersecurity solutions in IoT applications.

Battery draining and sleep deprivation attacks are common and can be catastrophic for IoT devices. As mentioned in [15], some works in the literature have already investigated the energy usage patterns of IoT devices with the aim of detecting energy depletion and DDoS attacks. However, more research efforts in this area are needed.
For instance, IoT devices will work in a duty-cycled manner, sleeping (i.e., with the transceiver turned off) most of the time to reduce energy consumption. Such features should be explored: supervised learning can be used to correlate devices with similar functionalities and detect when a device is operating with an abnormal active and sleep cycle.

Furthermore, there is a lack of investigation of collaborative and distributed machine learning-based solutions. IoT will demand ML-based solutions that run on distributed and heterogeneous devices. Such solutions must be collaborative and must not rely on centralized training. In this regard, federated learning could be used as a starting point for such approaches.

In addition, classic challenges of machine learning, such as the availability of data sets for training and validation, must be tackled. There is a lack of IoT data sets covering incoming/outgoing network traffic, device operations, and user interactions. Moreover, there is a lack of data sets related to attacks on and threats to IoT applications.

Conclusion

This article presented a detailed discussion of the advantages and challenges of machine learning (ML)-based solutions to secure the Internet of things (IoT). We described the fundamental design requirements and challenges of cybersecurity solutions for IoT. We then discussed how ML-based solutions could be advantageous in tackling the vulnerabilities of IoT. We classified ML-based cybersecurity solutions as device-based and network-based, according to the main security goal they are intended to address in IoT applications. The proposed classification helps in understanding the requirements and challenges faced when designing new ML-based cybersecurity solutions for IoT applications. For each category of the proposed classification, we shed light on the main goal and the fundamental challenges to be tackled, and discussed representative works in the literature.
Finally, we presented some future research directions that need further investigation.

Acknowledgment

This work is partially supported by the NSERC DISCOVERY, NSERC CREATE TRANSIT and Canada Research Chairs Programs.

References

[1] J. Jagannath et al., “Machine Learning for Wireless Communications in the Internet of Things: A Comprehensive Survey,” Ad Hoc Networks, vol. 93, 2019, p. 101913–59.
[2] F. Restuccia et al., “Securing the Internet of Things in the Age of Machine Learning and Software-Defined Networking,” IEEE Internet of Things J., vol. 5, no. 6, Dec. 2018, pp. 4829–42.
[3] L. Xiao et al., “IoT Security Techniques Based on Machine Learning: How do IoT Devices Use AI to Enhance Security?” IEEE Signal Processing Mag., vol. 35, no. 5, Sep. 2018, pp. 41–49.
[4] A. Vashist et al., “Securing a Wireless Network-on-Chip Against Jamming Based Denial-of-Service Attacks,” Proc. IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 2019, pp. 320–25.
[5] F. Li et al., “System Statistics Learning-Based IoT Security: Feasibility and Suitability,” IEEE Internet of Things J., vol. 6, no. 4, Aug. 2019, pp. 6396–403.
[6] C. Kolias et al., “DDoS in the IoT: Mirai and Other Botnets,” Computer, vol. 50, no. 7, July 2017, pp. 80–84.
[7] X. Liu et al., “Identifying Malicious Nodes in Multihop IoT Networks Using Diversity and Unsupervised Learning,” Proc. IEEE Int’l Conference on Communications (ICC), May 2018, pp. 1–6.
[8] L. Xiao et al., “Security in Mobile Edge Caching with Reinforcement Learning,” IEEE Wireless Commun., vol. 25, no. 3, June 2018, pp. 116–22.
[9] R. Das et al., “A Deep Learning Approach to IoT Authentication,” Proc. IEEE Int’l Conference on Communications (ICC), May 2018, pp. 1–6.
[10] B. Chatterjee et al., “RF-PUF: Enhancing IoT Security Through Authentication of Wireless Nodes Using in-situ Machine Learning,” IEEE Internet of Things J., vol. 6, no. 1, Feb. 2019, pp. 388–98.
[11] A. Ferdowsi and W. Saad, “Deep Learning for Signal Authentication and Security in Massive Internet-of-Things Systems,” IEEE Trans. Commun., vol. 67, no. 2, Feb. 2019, pp. 1371–87.
[12] Y. Chen et al., “Deep Learning for Secure Mobile Edge Computing in Cyber-Physical Transportation Systems,” IEEE Network, vol. 33, no. 4, July 2019, pp. 36–41.
[13] M. Miettinen et al., “IoT SENTINEL: Automated Device-type Identification for Security Enforcement in IoT,” Proc. IEEE 37th Int’l Conf. on Distributed Computing Systems (ICDCS), June 2017, pp. 2177–84.
[14] A. Alli and M. Alam, “SecOFF-FCIoT: Machine Learning Based Secure Offloading in Fog-Cloud of Things for Smart City Applications,” Internet of Things, vol. 7, 2019, pp. 70–89.
[15] A. Mosenia and N. Jha, “A Comprehensive Study of Security of Internet-of-Things,” IEEE Trans. on Emerging Topics in Computing, vol. 5, no. 4, Oct. 2017, pp. 586–602.

Biographies

Azzedine Boukerche [FIEEE, FEiC, FCAE, FAAAS] is a Distinguished University Professor and Canada Research Chair Tier-1 at the University of Ottawa. He has received the C. Gotlieb Computer Medal Award, Ontario Distinguished Researcher Award, Premier of Ontario Research Excellence Award, G. S. Glinski Award for Excellence in Research, IEEE Computer Society Golden Core Award, IEEE CS-Meritorious Award, IEEE TCPP Leaderships Award, IEEE ComSoc ASHN Leaderships and Contribution Award, and the University of Ottawa Award for Excellence in Research. His research interests include wireless ad hoc and sensor networks, wireless networking and mobile computing.

Rodolfo W. L. Coutinho (rodolfo.coutinho@concordia.ca) is an assistant professor at Concordia University, Canada. He received the ACM MSWiM’19 Rising Star Award and the 2018 Pierre Laberge Prize at the University of Ottawa. He also received the Best Thesis Awards from the CAPES, Brazilian Computer Society and the Brazilian Computer Networks and Distributed Systems Interest Group.
He has served as TPC Co-Chair for ACM and IEEE conferences. His research interests include Internet of Things, underwater networks, information-centric networking, and mobile computing.

IEEE Network • Accepted for Publication