PCI Conflict and RSI Collision Detection in LTE Networks Using Supervised Learning Techniques

Rodrigo Miguel Martins Diz Miranda Veríssimo

Thesis to obtain the Master of Science Degree in Electrical and Computer Engineering

Supervisors: Doctor António José Castelo Branco Rodrigues, Doctor Maria Paula dos Santos Queluz Rodrigues, Doctor Pedro Manuel de Almeida Carvalho Vieira

Examination Committee
Chairperson: Doctor José Eduardo Charters Ribeiro da Cunha Sanguino
Supervisor: Doctor Pedro Manuel de Almeida Carvalho Vieira
Member of the Committee: Doctor Pedro Joaquim Amaro Sebastião

November 2017

Acknowledgments

First of all, I would like to thank my supervisor, Professor António Rodrigues, and my co-supervisors, Professor Pedro Vieira and Professor Maria Paula Queluz, for all the support and insights given throughout the Thesis. I would also like to thank CELFINET for the unique opportunity to work in a great environment while doing this project, especially Eng. João Ferraz, who helped me understand the discussed network conflicts and the database structure. Additionally, I would like to express my gratitude to Eng. Luzia Carias for helping me in the data gathering process, and also to Eng. Marco Sousa for discussing ideas related to Data Science and Machine Learning. I would like to thank the instructors from the Lisbon Data Science Starters Academy for their discussions and guidance related to this Thesis and Data Science in general, namely Eng. Pedro Fonseca, Eng. Sam Hopkins, Eng. Hugo Lopes and João Ascensão. To all my friends and colleagues that helped me through these last 5 years in Técnico, by studying and collaborating in course projects, or by just being great people to be with: André Rabaça, Bernardo Gomes, Diogo Arreda, Diogo Marques, Eric Herji, Filipe Fernandes, Francisco Franco, Francisco Lopes, Gonçalo Vilela, João Escusa, João Ramos, Jorge Atabão, José Dias, Luís Fonseca, Miguel Santos, Nuno Mendes, Paul Schydlo, Rúben Borralho, Rúben Tadeia, Rodrigo Zenha and Tomás Alves.

Abstract

Nowadays, mobile networks are rapidly changing, which makes it difficult to maintain good and clean Physical Cell Identity (PCI) and Root Sequence Index (RSI) plans. These are essential for the Quality of Service (QoS) and mobility of Long Term Evolution (LTE) mobile networks, since bad PCI and RSI plans can introduce wireless network problems such as failed handovers, service drops, and failed service establishments and re-establishments. It is therefore possible, in theory, to identify PCI and RSI conflicting cells through the analysis of Key Performance Indicators (KPI) relevant to both problems. To do so, each cell must be labeled in accordance with the configured cell relations. Machine Learning (ML) classification can then be applied under these conditions. This thesis presents ML approaches to classify time series data from mobile network KPIs, detects the most relevant KPIs for PCI and RSI conflicts, constructs ML models to classify PCI and RSI conflicting cells with a minimum False Positive (FP) rate and near real time performance, and reports their test results. To achieve these goals, three hypotheses were tested in order to obtain the best performing ML models. Furthermore, bias was reduced by testing five different classification algorithms, namely Adaptive Boosting (AB), Gradient Boost (GB), Extremely Randomized Trees (ERT), Random Forest (RF) and Support Vector Machines (SVM).
The obtained models were evaluated according to their average Precision and peak Precision metrics. Lastly, the used data was obtained from a real LTE network. The best performing models were obtained by using each KPI measurement as an individual feature. The highest average Precision obtained for PCI confusion detection was 31% and 26% for the 800 MHz and 1800 MHz frequency bands, respectively. No conclusions were drawn concerning PCI collision detection, due to the very low number of PCI collisions (six) in the dataset. The highest average Precision obtained for RSI collision detection was 61% and 60% for the 800 MHz and 1800 MHz frequency bands, respectively.

Keywords: Wireless Communications, LTE, Machine Learning, Classification, PCI Conflict, RSI Collision.

Resumo

Atualmente, as redes móveis estão a ser modificadas rapidamente, o que dificulta a manutenção de bons planos de Physical Cell Identity (PCI) e de Root Sequence Index (RSI). Estes dois parâmetros são essenciais para uma boa Qualidade de Serviço (QoS) e mobilidade de redes móveis Long Term Evolution (LTE), pois maus planos de PCI e de RSI poderão levar a problemas de redes móveis, tais como falhas de handovers, de estabelecimento e de restabelecimento de serviços, e quedas de serviços. Como tal, é possível, em teoria, identificar conflitos de PCI e colisões de RSI através da análise de Key Performance Indicators (KPI) relevantes a cada problema. Para tal, cada célula LTE necessita de ser identificada como conflituosa ou não conflituosa de acordo com as relações de vizinhança. Nestas condições, é possível aplicar algoritmos de classificação de Aprendizagem Automática (ML). Esta Tese pretende apresentar abordagens de ML para classificação de séries temporais provenientes de KPIs de redes móveis, obter os KPIs mais relevantes para a deteção de conflitos de PCI e de RSI, e construir modelos de ML com um número mínimo de Falsos Positivos (FP) e desempenho em quase tempo real. Para alcançar estes objetivos, foram testadas três hipóteses de modo a obter os modelos de ML com melhor desempenho. Foram testados cinco algoritmos de classificação distintos, nomeadamente Adaptive Boosting (AB), Gradient Boost (GB), Extremely Randomized Trees (ERT), Random Forest (RF) e Support Vector Machines (SVM). Os modelos obtidos foram avaliados de acordo com as Precisões médias e picos de Precisão. Por último, os dados foram obtidos de uma rede LTE real. Os melhores modelos foram obtidos ao utilizar cada medição de KPI como uma variável individual. A maior Precisão média obtida para confusões de PCI foi de 31% e de 26% para as bandas de 800 MHz e de 1800 MHz, respetivamente. Devido ao número bastante baixo de seis colisões de PCI presentes nos dados obtidos, não foi possível retirar nenhuma conclusão relativamente à sua deteção. A maior Precisão média obtida para colisões de RSI foi de 61% e de 60% para as bandas de 800 MHz e de 1800 MHz, respetivamente.

Palavras Chave: Comunicações Móveis, LTE, Aprendizagem Automática, Classificação, Conflito de PCI, Colisão de RSI.

Contents

Acknowledgments
Abstract
Resumo
List of Figures
List of Tables
List of Symbols
Acronyms
1 Introduction
1.1 Motivation
1.2 Objectives
1.3 Structure
1.4 Publications
2 LTE Background
2.1 Introduction to LTE
2.2 LTE Architecture
2.2.1 Core Network Architecture
2.2.2 Radio Access Network Architecture
2.3 Multiple Access Techniques Overview
2.3.1 OFDMA Basics
2.3.2 SC-FDMA Basics
2.3.3 MIMO Basics
2.4 Physical Layer Design
2.4.1 Transport Channels
2.4.2 Modulation
2.4.3 Downlink User Data Transmission
2.4.4 Uplink User Data Transmission
2.5 Mobility
2.5.1 Idle Mode Mobility
2.5.2 Intra-LTE Handovers
2.5.3 Inter-system Handovers
2.6 Performance Data Collection
2.6.1 Performance Management
2.6.2 Key Performance Indicators
2.6.3 Configuration Management
3 Machine Learning Background
3.1 Machine Learning Overview
3.2 Machine Learning Components
3.3 Generalization
3.4 Underfitting and Overfitting
3.5 Dimensionality
3.6 Feature Engineering
3.7 More Data and Cleverer Algorithms
3.8 Classification in Multivariate Time Series
3.9 Proposed Classification Algorithms
3.9.1 Adaptive Boosting
3.9.2 Gradient Boost
3.9.3 Extremely Randomized Trees
3.9.4 Random Forest
3.9.5 Support Vector Machines
3.10 Classification Model Evaluation
4 Physical Cell Identity Conflict Detection
4.1 Introduction
4.2 Key Performance Indicator (KPI) Selection
4.3 Network Vendor Feature Based Detection
4.4 Global Cell Neighbor Relations Based Detection
4.4.1 Data Cleaning Considerations
4.4.2 Classification Based on Peak Traffic Data
4.4.3 Classification Based on Feature Extraction
4.4.4 Classification Based on Raw Cell Data
4.5 Preliminary Conclusions
5 Root Sequence Index Collision Detection
5.1 Introduction
5.2 Key Performance Indicator Selection
5.3 Global Cell Neighbor Relations Based Detection
5.3.1 Data Cleaning Considerations
5.3.2 Peak Traffic Data Based Classification
5.3.3 Feature Extraction Based Classification
5.3.4 Raw Cell Data Based Classification
5.4 Preliminary Conclusions
6 Conclusions
6.1 Summary
6.2 Future Work
A PCI and RSI Conflict Detection
Bibliography

List of Figures

2.1 The EPS network elements (adapted from [6]).
2.2 Overall E-UTRAN architecture (adapted from [6]).
2.3 Frequency-domain view of the LTE multiple-access technologies (adapted from [6]).
2.4 MIMO principle with two-by-two antenna configuration (adapted from [4]).
2.5 Preserving orthogonality between sub-carriers (adapted from [5]).
2.6 OFDMA transmitter and receiver (adapted from [4]).
2.7 SC-FDMA transmitter and receiver with frequency domain signal generation (adapted from [4]).
2.8 OFDMA reference symbols to support two eNB transmit antennas (adapted from [4]).
2.9 LTE modulation constellations (adapted from [4]).
2.10 Downlink resource allocation at eNB (adapted from [4]).
2.11 Uplink resource allocation controlled by eNB scheduler (adapted from [4]).
2.12 Data rate between TTIs in the uplink direction (adapted from [4]).
2.13 Intra-frequency handover procedure (adapted from [4]).
2.14 Automatic intra-frequency neighbor identification (adapted from [4]).
2.15 Overview of the inter-RAT handover from E-UTRAN to UTRAN/GERAN (adapted from [4]).
3.1 Procedure of three-fold cross-validation (adapted from [32]).
3.2 Bias and variance in dart-throwing (adapted from [18]).
3.3 Bias and variance contributing to total error.
3.4 A learning curve showing the model accuracy on test examples as a function of the number of training examples.
3.5 Example of a Decision Tree to decide whether a football match should be played based on the weather (adapted from [45]).
3.6 Left: The training and test percent error rates using boosting on an Optical Character Recognition dataset that do not show any signs of overfitting [25]. Right: The training and test percent error rates on a heart-disease dataset that after five iterations reveal overfitting [25].
3.7 A general tree ensemble algorithm classification procedure.
3.8 Data mapping from the input space (left) to a high-dimensional feature space (right) to obtain a linear separation (adapted from [21]).
3.9 The hyperplane constructed by SVMs that maximizes the margin (adapted from [21]).
4.1 PCI Confusion (left) and PCI Collision (right).
4.2 Time series analysis of KPI values regarding 4200 LTE cells over a single day.
4.3 Boxplots of total null value count for each cell per day for three KPIs.
4.4 Absolute Pearson correlation heatmap of peak traffic KPI values and the PCI conflict detection label.
4.5 Smoothed Precision-Recall curves for peak traffic PCI confusion detection.
4.6 Learning curves for peak traffic PCI confusion detection.
4.7 The CPVE for PCI confusion detection.
4.8 Smoothed Precision-Recall curves for statistical data based PCI confusion detection.
4.9 Learning curves for statistical data based PCI confusion detection.
4.10 The CPVE for PCI collision detection.
4.11 The CPVE for PCI confusion detection.
4.12 Smoothed Precision-Recall curves for raw cell data based PCI confusion detection.
4.13 Learning curves for raw cell data PCI confusion detection.
4.14 Precision-Recall curves for raw cell data PCI collision detection.
5.1 Time series analysis of KPI values regarding 23500 LTE cells over a single day.
5.2 Boxplots of total null value count for each cell per day for two KPIs.
5.3 Absolute Pearson correlation heatmap of peak traffic KPI values and the RSI collision detection label.
5.4 Smoothed Precision-Recall curves for peak traffic RSI collision detection.
5.5 Learning curves for peak traffic RSI collision detection.
5.6 The CPVE for RSI collision detection.
5.7 Smoothed Precision-Recall curves for statistical data based RSI collision detection.
5.8 Learning curves for statistical data based RSI collision detection.
5.9 The CPVE for RSI collision detection.
5.10 Smoothed Precision-Recall curves for raw cell data RSI collision detection.
5.11 Learning curves for raw cell data RSI collision detection.
A.1 PCI and RSI Conflict Detection Flowchart.

List of Tables

2.1 Downlink peak data rates [5].
2.2 Uplink peak data rates [4].
2.3 Differences between both mobility modes.
2.4 Description of the KPI categories and KPI examples.
2.5 Netherlands P3 KPI analysis done in 2016 [16].
3.1 The three components of learning algorithms (adapted from [18]).
3.2 Confusion Matrix (adapted from [31]).
4.1 Chosen Accessibility and Integrity KPIs.
4.2 Chosen Mobility, Quality and Retainability KPIs.
4.3 The obtained cumulative Confusion Matrix.
4.4 The obtained Model Evaluation metrics.
4.5 Resulting dataset composition subsequent to data cleaning.
4.6 Average importance given to each KPI by each Decision Tree based classifier.
4.7 Peak traffic PCI Confusion classification results.
4.8 PCI Confusion classification training and testing times in seconds.
4.9 Statistical data based PCI confusion classification results.
4.10 Statistical data based PCI confusion classification training and testing times in seconds.
4.11 Raw cell data PCI confusion classification results.
4.12 Raw cell data PCI confusion classification training and testing times in seconds.
5.1 Chosen Accessibility and Mobility KPIs.
5.2 Chosen Quality and Retainability KPIs.
5.3 Average importance given to each KPI by each Decision Tree based classifier.
5.4 Peak traffic RSI collision classification results.
5.5 RSI collision classification training and testing times in seconds.
5.6 Statistical data based RSI collision classification results.
5.7 RSI collision classification training and testing times in seconds.
5.8 Raw cell data RSI collision classification results.
5.9 RSI collision classification training and testing times in seconds.

List of Symbols

S_rxlevel  Rx level value of a cell.
Q_rxlevelmeas  Reference Signal Received Power from a cell.
Q_rxlevmin  Minimum required level for cell camping.
Q_rxlevelminoffset  Offset used when searching for a Public Land Mobile Network of preferred network operators.
S_ServingCell  Rx value of the serving cell.
S_intrasearch  Rx level threshold for the User Equipment to start making intra-frequency measurements.
S_nonintrasearch  Rx level threshold for the User Equipment to start making inter-system measurements.
Q_meas  Reference Signal Received Power measurement for cell re-selection.
Q_hyst  Power domain hysteresis used to avoid the ping-pong phenomenon between cells.
Q_offset  Offset control parameter to deal with different frequencies and cell characteristics.
T_reselection  Time limit to perform cell re-selection.
Thresh_high  Higher threshold for a User Equipment to camp on a higher priority layer.
Thresh_low  Lower threshold for a User Equipment to camp on a low priority layer.
x  Input vector for a Machine Learning model.
y  Output vector that a Machine Learning model aims to predict.
ŷ  Output vector that a Machine Learning model predicts.
σ²_ab  Covariance matrix of variable vectors a and b.
λ  Eigenvalue of a Principal Component.
W_t  Weight array at iteration t.
θ_t  Parameters of a classification algorithm at iteration t.
α_t  Weight of a hypothesis at iteration t.
Z_t  Normalization factor at iteration t.
H  Machine Learning model.
f  Functional dependence between input and output vectors.
f̂  Estimated functional dependence.
ψ  Loss function.
g_t  Negative gradient of a loss function at iteration t.
E_y  Expected prediction loss.
ρ_t  Gradient step size at iteration t.
K  Number of randomly selected features.
n_min  Minimum sample size for splitting a Decision Tree node.
M  Total number of Decision Trees to grow in an ensemble.
S  Data subset.
f_max^S  Maximal value of a variable vector in a data subset S.
f_min^S  Minimal value of a variable vector in a data subset S.
f_c  Random cut-point of a variable vector.
Optimization problem for Support Vector Machines.
C  Positive regularization constant for Support Vector Machines.
ξ  Slack variable that states whether a data sample is on the correct side of a hyperplane.
α  Lagrange multiplier.
#SV  Number of Support Vectors.
K(·, ·)  Support Vector Machines kernel function.
σ  Free parameter.
γ  Positive regularization constant for Support Vector Machines.
β  Weight constant defining the relative importance of the Precision and Recall metrics.
Q1  First quartile.
Q3  Third quartile.
N_rows  Number of sequences needed to generate the 64 Random Access Channel preambles.
Acronyms

1NN One Nearest Neighbor
3GPP Third Generation Partnership Project
4G Fourth Generation
AB Adaptive Boosting
AuC Authentication Centre
BCH Broadcast Channel
BPSK Binary Phase Shift Keying
CM Configuration Management
CNN Convolutional Neural Network
CQI Channel Quality Indicator
CPVE Cumulative Proportion of Variance Explained
CRC Cyclic Redundancy Check
CS Circuit-Switched
DFT Discrete Fourier Transform
DL-SCH Downlink Shared Channel
EDGE Enhanced Data for Global Evolution
eNB Evolved Node B
EPC Evolved Packet Core
EPS Evolved Packet System
E-SMLC Evolved Serving Mobile Location Centre
ERT Extremely Randomized Tree
E-UTRA Evolved UMTS Terrestrial Radio Access
E-UTRAN Evolved UMTS Terrestrial Radio Access Network
FDMA Frequency Division Multiple Access
FFT Fast Fourier Transform
FN False Negative
FP False Positive
FTP File Transfer Protocol
GB Gradient Boost
GERAN GSM EDGE Radio Access Network
GMLC Gateway Mobile Location Centre
GPRS General Packet Radio Service
GSM Global System for Mobile Communications
GTP GPRS Tunneling Protocol
GW Gateway
HARQ Hybrid Automatic Repeat Request
HSPA High Speed Packet Access
HSDPA High Speed Downlink Packet Access
HSS Home Subscriber Server
HSUPA High Speed Uplink Packet Access
ID Identity
IDFT Inverse Discrete Fourier Transform
IEEE Institute of Electrical and Electronics Engineers
IFFT Inverse Fast Fourier Transform
IP Internet Protocol
IQR Interquartile Range
ITU International Telecommunication Union
kNN k-Nearest Neighbor
KPI Key Performance Indicators
LCS LoCation Services
LSTM Long Short Term Memory
LTE Long Term Evolution
MAC Medium Access Control
MCH Multicast Channel
ME Mobile Equipment
MIB Master Information Block
MIMO Multiple-Input Multiple-Output
ML Machine Learning
MME Mobility Management Entity
MNO Mobile Network Operators
MT Mobile Termination
NaN Not a Number
NE Network Element
NR Network Resource
OAM Operations, Administration and Management
OFDM Orthogonal Frequency Division Multiplexing
OFDMA Orthogonal Frequency Division Multiple Access
OS Operations System
PAPR Peak-to-Average Power Ratio
PAR Peak-to-Average Ratio
PBCH Physical Broadcast Channel
PC Principal Component
PCA Principal Component Analysis
PCCC Parallel Concatenated Convolution Coding
PCH Paging Channel
PCI Physical Cell Identity
PCRF Policy Control and Charging Rules Function
PDCCH Physical Downlink Control Channel
PDN Packet Data Network
PDSCH Physical Downlink Shared Channel
PLMN Public Land Mobile Network
PM Performance Management
PMCH Physical Multicast Channel
PRACH Physical Random Access Channel
PRB Physical Resource Block
P-GW Packet Data Network Gateway
PR Precision-Recall
PS Packet-Switched
PS HO Packet-Switched Handover
PUCCH Physical Uplink Control Channel
PUSCH Physical Uplink Shared Channel
QAM Quadrature Amplitude Modulation
QoS Quality of Service
QPSK Quadrature Phase Shift Keying
RACH Random Access Channel
RAT Radio Access Technology
RBF Radial Basis Function
RBS Radio Base Station
RF Random Forest
RLC Radio Link Control
ROC Receiver Operator Characteristic
RRC Radio Resource Control
RSI Root Sequence Index
RSRP Reference Signal Received Power
RSRQ Reference Signal Received Quality
RSSI Received Signal Strength Indicator
SAE System Architecture Evolution
SAE GW SAE Gateway
SC-FDMA Single-Carrier Frequency Division Multiple Access
SDU Service Data Unit
S-GW Serving Gateway
SIB System Information Block
SIM Subscriber Identity Module
SNMP Simple Network Management Protocol
SNR Signal-to-Noise Ratio
SON Self-Organizing Network
SQL Structured Query Language
SVM Support Vector Machines
TDMA Time Division Multiple Access
TE Terminal Equipment
TMN Telecommunication Management Network
TN True Negative
TP True Positive
TTI Transmission Time Interval
UE User Equipment
UICC Universal Integrated Circuit Card
UL-SCH Uplink Shared Channel
UMTS Universal Mobile Telecommunications System
URSI International Union of Radio Science
USIM Universal Subscriber Identity Module
UTRAN UMTS Terrestrial Radio Access Network
V-MIMO Virtual Multiple-Input Multiple-Output
VoIP Voice over IP
WCDMA Wideband Code Division Multiple Access
WCNC Wireless Communications and Networking Conference

Chapter 1
Introduction

This chapter delivers an overview of the presented work. It includes the context and motivation that led to the development of this work, as well as its objectives and overall structure.

1.1 Motivation

Two of the major concerns of Mobile Network Operators (MNO) are to optimize and to maintain network performance. However, maintaining performance has proved to be challenging, mainly for large and complex networks. In the long term, changes made in the networks may increase the number of internal conflicts and inconsistencies. These modifications include changing the antenna tilt or the cell's transmission power, and there are also changes that cannot be controlled by the MNOs, such as user mobility and radio channel fading. In order to assess the network performance, quantifiable performance metrics, known as Key Performance Indicators (KPI), are typically used. KPIs report network performance measures, such as the handover success rate and the average channel interference of each cell, and are calculated periodically, resulting in time series. In order to automatically detect network fault causes, some work has been done using KPI measurements with unsupervised techniques, as in [1]. This thesis focuses on applying supervised techniques to two known Long Term Evolution (LTE) network conflicts, namely Physical Cell Identity (PCI) conflicts and Root Sequence Index (RSI) collisions.

1.2 Objectives

This thesis aims to create Machine Learning (ML) models that can correctly classify PCI conflicts and RSI collisions with a minimum False Positive (FP) rate and with near real time performance. To achieve this goal, three hypotheses to obtain the best models were tested:
1. PCI conflicts and/or RSI collisions are better detected by using KPI measurements at the daily peak traffic instant of each cell;
2. PCI conflicts and/or RSI collisions are better detected by extracting statistical measures from each KPI daily time series and using them as features;
3. PCI conflicts and/or RSI collisions are better detected by using each cell's KPI measurements in each day as individual features.

These three hypotheses were tested by taking into account the average Precisions and the peak Precisions obtained from testing the models, as well as their training and testing durations. In order to reduce the bias of this study, five different classification algorithms were used, namely Adaptive Boosting (AB), Gradient Boost (GB), Extremely Randomized Trees (ERT), Random Forest (RF) and Support Vector Machines (SVM). The aim of the classifiers was to classify cells as either non-conflicting or conflicting, depending on the detection use case. The used classification algorithm implementations were obtained from the Python Scikit-Learn library [2].
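As a purely illustrative sketch of how these five algorithms can be instantiated from Scikit-Learn [2], consider the snippet below; the hyperparameter values are generic placeholders, not the tuned settings used in this work.

```python
# Hypothetical sketch: instantiating the five compared classifiers.
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.svm import SVC

classifiers = {
    "AB":  AdaBoostClassifier(n_estimators=100),
    "GB":  GradientBoostingClassifier(n_estimators=100),
    "ERT": ExtraTreesClassifier(n_estimators=100),
    "RF":  RandomForestClassifier(n_estimators=100),
    # probability=True exposes predict_proba, needed for Precision-Recall curves
    "SVM": SVC(probability=True),
}

# X holds one row of KPI-derived features per cell; y holds the labels
# (1 = conflicting, 0 = non-conflicting), derived from the cell relations.
# for name, clf in classifiers.items():
#     clf.fit(X_train, y_train)
#     scores = clf.predict_proba(X_test)[:, 1]
```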
1.3 Structure

This work is divided into four main chapters. Chapter 2 presents a technical background of LTE, and Chapter 3 addresses ML concepts as well as more specific ones, such as how time series can be classified to reach the thesis' objectives, together with a technical overview of the proposed classification algorithms. These two chapters deliver the necessary background to understand the work in Chapters 4 and 5. Chapter 4 introduces the LTE PCI network parameter, explains how PCI conflicts can occur, performs hypothesis testing and presents the respective results. Additionally, it includes sections focused on data cleaning, KPI selection and preliminary conclusions. Chapter 5 has the same structure as Chapter 4, but focuses on RSI collisions. Finally, in Chapter 6, conclusions are drawn and future work is suggested.

1.4 Publications

Two scientific papers were written in the context of this Thesis, namely:
• "PCI and RSI Conflict Detection in a Real LTE Network Using Supervised Techniques", written by R. Veríssimo, P. Vieira, M. P. Queluz and A. Rodrigues. This paper was submitted to the 2018 Institute of Electrical and Electronics Engineers (IEEE) Wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 15th-18th April 2018.
• "Deteção de Conflitos de PCI e de RSI Numa Rede Real LTE Utilizando Aprendizagem Automática", written by R. Veríssimo, P. Vieira, M. P. Queluz and A. Rodrigues. This paper was submitted to the 11th International Union of Radio Science (URSI) Congress, Lisbon, Portugal, 24th November 2017.

Chapter 2
LTE Background

This chapter provides an overview of the LTE standard [3], aiming for a better understanding of the work developed under the Thesis scope. Section 2.1 presents a brief introduction to LTE and Section 2.2 delivers an architectural overview of this system. Section 2.3 presents a succinct overview of the multiple access techniques used in LTE. The physical layer design is introduced in Section 2.4. Section 2.5 addresses how mobility is handled in LTE. Finally, Section 2.6 describes how data originated from telecommunication networks is typically collected and evaluated. The content of this chapter is mainly based on the following references: [4, 5] in Section 2.1; [6, 7] in Section 2.2; [6, 4, 5] in Section 2.3; [4, 5] in Section 2.4; [4] in Section 2.5; [8, 9] in Section 2.6.

2.1 Introduction to LTE

LTE is a Fourth Generation (4G) wireless communication standard developed by the Third Generation Partnership Project (3GPP); it resulted from the development of a packet-only wideband radio system with flat architecture, and was specified for the first time in the 3GPP Release 8 document series. The downlink in LTE uses Orthogonal Frequency Division Multiple Access (OFDMA) as its multiple access scheme, and the uplink uses Single-Carrier Frequency Division Multiple Access (SC-FDMA). Both of these solutions result in orthogonality between the users, diminishing the interference and enhancing the network capacity. The resource allocation in both uplink and downlink is done in the frequency domain, with a resolution of 180 kHz, consisting of twelve sub-carriers of 15 kHz each. The high capacity of LTE is due to its packet scheduling being carried out in the frequency domain. The main difference between the resource allocation on the uplink and on the downlink is that the former is contiguous, in order to enable single carrier transmission, whereas the latter can freely use resource blocks from different parts of the spectrum.
Resource blocks are frequency and time resources that occupy 12 sub-carriers of 15 kHz each and one time slot of 0.5 ms. By adopting the uplink single carrier solution, LTE enables efficient terminal power amplifier design, which is essential for the terminal battery life. Depending on the available spectrum, LTE allows spectrum flexibility that can range from 1.4 MHz up to 20 MHz. In ideal conditions, the 20 MHz bandwidth can provide up to 172.8 Mbps of downlink user data rate with 2x2 Multiple-Input Multiple-Output (MIMO), and 340 Mbps with 4x4 MIMO; the uplink peak data rate is 86.4 Mbps.

2.2 LTE Architecture

In contrast to the Circuit-Switched (CS) model of previous cellular systems, LTE is designed to only support Packet-Switched (PS) services, aiming to provide seamless Internet Protocol (IP) connectivity between the User Equipment (UE) and the Packet Data Network (PDN), without disrupting the end users' applications during mobility. LTE corresponds to the evolution of radio access through the Evolved UMTS Terrestrial Radio Access Network (E-UTRAN), alongside an evolution of the non-radio aspects, named System Architecture Evolution (SAE), which includes the Evolved Packet Core (EPC) network. The combination of LTE and SAE forms the Evolved Packet System (EPS), which provides the user with IP connectivity to a PDN for accessing the Internet, as well as for running different services simultaneously, such as File Transfer Protocol (FTP) and Voice over IP (VoIP). The features offered by LTE are supported through several EPS network elements with different roles. Figure 2.1 shows the global network architecture that encompasses both the network elements and the standardized interfaces. The network comprises the core network (i.e. EPC) and the access network (i.e. E-UTRAN). The access network consists of one node, the Evolved Node B (eNB), which connects to the UEs. The network elements are inter-connected through interfaces that are standardized in order to allow multi-vendor interoperability.

Figure 2.1: The EPS network elements (adapted from [6]).

The UE is the interface through which the subscriber is able to communicate with the E-UTRAN; it is composed of the Mobile Equipment (ME) and the Universal Integrated Circuit Card (UICC). The ME is essentially the radio equipment that is used to communicate; it can be further divided into the Mobile Termination (MT), which conducts all the communication functions, and the Terminal Equipment (TE), which terminates the streams of data. The UICC is a smart card, informally known as the Subscriber Identity Module (SIM) card; it runs the Universal Subscriber Identity Module (USIM), an application that stores user-specific data (e.g. phone number and home network identity). Additionally, it also employs security procedures through the security keys that are stored in the UICC.
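To make the element and interface relationships of Figure 2.1 easier to trace, the following sketch (a reading aid only, not part of the detection system developed in this thesis) encodes them as triples:

```python
# The EPS network elements and standardized interfaces of Figure 2.1,
# encoded as (element, interface, element) triples.
EPS_INTERFACES = [
    ("UE",   "LTE-Uu", "eNB"),
    ("eNB",  "S1-MME", "MME"),
    ("eNB",  "S1-U",   "S-GW"),
    ("MME",  "S11",    "S-GW"),
    ("MME",  "S6a",    "HSS"),
    ("MME",  "SLs",    "E-SMLC"),
    ("MME",  "SLg",    "GMLC"),
    ("S-GW", "S5/S8",  "P-GW"),
    ("P-GW", "Gx",     "PCRF"),
    ("P-GW", "SGi",    "Operator IP services"),
    ("PCRF", "Rx",     "Operator IP services"),
]

def peers(element):
    """List the (interface, peer) pairs of a given network element."""
    return [(i, b) for a, i, b in EPS_INTERFACES if a == element] + \
           [(i, a) for a, i, b in EPS_INTERFACES if b == element]

print(peers("MME"))  # the MME touches the eNB, S-GW, HSS, E-SMLC and GMLC
```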
2.2.1 Core Network Architecture

The EPC corresponds to the core network and its role is to control the UE and to establish the bearers, the paths that user traffic uses when crossing an LTE transport network. The EPC has as its main logical nodes the Mobility Management Entity (MME), the Packet Data Network Gateway (P-GW), the Serving Gateway (S-GW) and the Evolved Serving Mobile Location Centre (E-SMLC). Furthermore, there are other logical nodes that also belong to the EPC, such as the Home Subscriber Server (HSS), the Gateway Mobile Location Centre (GMLC) and the Policy Control and Charging Rules Function (PCRF). These logical nodes are described in the following points:
• MME is the main control node in the EPC. It manages user mobility in the corresponding service area through tracking, and also manages the user subscription profile and service connectivity by cooperating with the HSS. Moreover, it is solely responsible for the security and authentication of users in the network.
• P-GW is the node that interconnects the EPS with the PDNs. It acts as an IP attachment point and allocates the IP addresses for the UE. Yet, this allocation can also be performed by a PDN, in which case the P-GW tunnels traffic between the UE and the PDN. Moreover, it handles the traffic gating and filtering functions required for the services being used.
• S-GW is a network element that not only links user plane traffic between the eNB and the P-GW, but also retains information about the bearers when the UE is in idle state.
• E-SMLC has the responsibility of managing the scheduling and coordinating the resources necessary to locate the UE. Furthermore, it estimates the UE speed and the accuracy of the final location that it assesses.
• HSS is a central database that holds information regarding all the network operator's subscribers, such as their Quality of Service (QoS) profile and any access restrictions for roaming. It not only holds information about the PDNs to which the user is able to connect, but also stores dynamic information (e.g. the identity of the MME to which the user is currently attached or registered). Additionally, the HSS may also integrate the Authentication Centre (AuC), which is responsible for generating the vectors used for both authentication and security keys.
• GMLC incorporates the fundamental functionalities to support LoCation Services (LCS). After being authorized, it sends positioning requests to the MME and collects the final location estimates.
• PCRF is responsible for managing the users' QoS and data charges. The PCRF is connected to the P-GW and sends information to it for enforcement.

2.2.2 Radio Access Network Architecture

The E-UTRAN represents the radio component of the architecture. It is responsible for connecting the UEs to the EPC, and subsequently connects UEs between themselves and also to PDNs (e.g. the Internet). Composed solely of eNBs, the E-UTRAN is a mesh of eNBs interconnected through X2 interfaces (which can be either physical or logical links). These nodes are intelligent radio base stations that cover one or more cells and are also capable of handling all the radio related protocols (e.g. handover). Unlike in the Universal Mobile Telecommunications System (UMTS), there is no centralized controller in E-UTRAN for normal user traffic, and hence its architecture is flat, as can be observed in Figure 2.2.

Figure 2.2: Overall E-UTRAN architecture (adapted from [6]).

The eNB has two main responsibilities: firstly, it sends radio transmissions to all its mobile devices on the downlink and receives transmissions from them on the uplink; secondly, it controls the low-level operation of all its mobile devices through signalling messages (e.g. handover commands) that are related to those same radio transmissions. The eNBs are normally connected with each other through an interface called X2, and to the EPC through the S1 interface.
Additionally, the eNBs are connected to the MME by means of the S1-MME interface, and to the S-GW through the S1-U interface. The key functions of E-UTRAN can be summarized as:
• managing the radio link's resources and controlling the radio bearers;
• compressing the IP headers;
• encrypting all data sent over the radio interface;
• routing user traffic towards the S-GW and delivering user traffic from the S-GW to the UE;
• providing the required measurements and additional data to the E-SMLC in order to find the UE position;
• handling handovers between connected eNBs through X2 interfaces;
• signalling towards the MME and also establishing the bearer path towards the S-GW.

The eNBs are responsible for all these functions on the network side, and one single eNB can manage multiple cells. One key differentiation factor from previous generations is that LTE assigns the radio controller function to the eNB. This strategy reduces latency and improves the efficiency of the network, due to the closer interaction between the radio protocols and the radio access network. There is no need for a centralized data-combining function in the network, as LTE does not support soft-handovers. The removal of the centralized controller requires that, as the UE moves, the network transfers all information related to the UE towards another eNB. The S1 interface has an important feature that allows for a flexible link between the access network and the core network (i.e. S1-flex). This means that multiple core network nodes can serve a common geographical area, being connected by a mesh network to the set of eNBs in that area. Thus, an eNB can be served by multiple MME/S-GWs, as happens for eNB#2 in Figure 2.2. This allows UEs in the network to be shared between multiple core network nodes through an eNB, hence eliminating single points of failure for the core network nodes and also allowing for load sharing.

2.3 Multiple Access Techniques Overview

In order to fulfil all the requirements defined for LTE, advances were made to the underlying mobile radio technology, more specifically to both the multicarrier and the multiple-antenna technologies. The first major design choice in LTE was to adopt a multicarrier approach. Regarding the downlink, the nominated schemes were OFDMA and Multiple Wideband Code Division Multiple Access (WCDMA), with OFDMA being selected. Concerning the uplink, the suggested schemes were SC-FDMA, OFDMA and Multiple WCDMA, resulting in the selection of SC-FDMA. Both of the selected schemes presented the frequency domain as a new dimension of flexibility, introducing a potent new way not only to improve the system's spectral efficiency, but also to minimize both fading problems and inter-symbol interference. These two selected schemes are represented in Figure 2.3.

Figure 2.3: Frequency-domain view of the LTE multiple-access technologies (adapted from [6]).

Before delving into the basics of both OFDMA and SC-FDMA, it is important to present some basic concepts first:
• for single carrier transmission in LTE, a single carrier is modulated in phase and/or amplitude. The spectrum wave form is a filtered single carrier spectrum that is centered on the carrier frequency;
• in a digital system, the higher the data rate, the higher the symbol rate, and thereupon the larger the bandwidth required for the same modulation. In order to carry the desired number of bits per symbol, the modulation can be changed by the transmitter;
• in a Frequency Division Multiple Access (FDMA) system, different users can access the system simultaneously through the use of different carriers and sub-carriers. In such a system, it is crucial to avoid excessive interference between carriers without adopting long guard bands between users;
• in the search for even better spectral efficiencies, multiple antenna technologies were considered as a way to exploit another new dimension, the spatial domain. As such, the first LTE Release led to the introduction of the MIMO operation, which includes spatial multiplexing as well as pre-coding and transmit diversity. The basic principle of MIMO is presented in Figure 2.4, where different streams of data are fed to the pre-coding operation and forwarded to signal mapping and OFDMA signal generation.

Figure 2.4: MIMO principle with two-by-two antenna configuration (adapted from [4]).

2.3.1 OFDMA Basics

OFDMA consists of narrow and mutually orthogonal sub-carriers, typically separated by 15 kHz from adjacent sub-carriers, regardless of the total transmission bandwidth. Orthogonality is preserved because, at the sampling instant of a specific sub-carrier, all other sub-carriers have a zero value, as can be observed in Figure 2.5.

Figure 2.5: Preserving orthogonality between sub-carriers (adapted from [5]).

As stated in the beginning of Section 2.3, OFDMA was selected over Multiple WCDMA. The key characteristics that led to that decision [7, 10, 11] are:
• low-complexity receivers, even with severe channel conditions;
• robustness to time-dispersive radio channels;
• immunity to selective fading;
• resilience to narrow-band co-channel interference and to both inter-symbol and inter-frame interference;
• high spectral efficiency;
• efficient implementation with the Fast Fourier Transform (FFT).

Meanwhile, OFDMA also presents some challenges, such as [7, 10, 11]:
• higher sensitivity to carrier frequency offset caused by leakage of the Discrete Fourier Transform (DFT), relative to single carrier systems;
• high Peak-to-Average Power Ratio (PAPR) of the transmitted signal, which requires high linearity in the transmitter, resulting in poor power efficiency;
• sensitivity to Doppler shift, which was mitigated in LTE by choosing a sub-carrier spacing of 15 kHz, providing a relatively large tolerance;
• sensitivity to frequency synchronization problems.

The OFDMA implementation is based on the use of both the DFT and the Inverse Discrete Fourier Transform (IDFT), in order to move between the time and frequency domain representations. In practice, the implementation uses the FFT, which moves the signal from the time to the frequency domain representation; the opposite operation is done through the Inverse Fast Fourier Transform (IFFT). The transmitter used by an OFDMA system contains an IFFT block that converts the sub-carrier symbols from the frequency domain into a time domain signal. The input of this block results from the serial-to-parallel conversion of the data source. Finally, a cyclic extension is added to the output signal of the IFFT block, which aims to avoid inter-symbol interference. Conversely, the inverse operations are implemented in the receiver, with the addition of an equalisation block between the FFT and the demodulation blocks. The architecture of the OFDMA transmitter and receiver is presented in Figure 2.6.
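The transmitter and receiver chain just described can be condensed into a few lines of numpy. The sketch below uses illustrative sizes (a 64-point IFFT carrying one 12-sub-carrier resource block and a 16-sample cyclic prefix) and assumes an ideal channel, so the equalisation block of Figure 2.6 is omitted.

```python
import numpy as np

N = 64      # IFFT size (illustrative; a 20 MHz LTE carrier uses up to 2048)
USED = 12   # active sub-carriers, i.e. one 180 kHz resource block
CP = 16     # cyclic prefix length in samples (illustrative)

# Transmitter: map QPSK symbols onto sub-carriers, IFFT to time domain, add CP
bits = np.random.randint(0, 2, 2 * USED)
qpsk = ((1 - 2.0 * bits[0::2]) + 1j * (1 - 2.0 * bits[1::2])) / np.sqrt(2)
grid = np.zeros(N, dtype=complex)
grid[:USED] = qpsk                    # serial-to-parallel sub-carrier mapping
tx = np.fft.ifft(grid) * np.sqrt(N)   # frequency -> time domain
tx = np.concatenate([tx[-CP:], tx])   # cyclic extension: copy the tail to the front

# Receiver: drop the cyclic prefix, FFT back to the frequency domain
rx_grid = np.fft.fft(tx[CP:]) / np.sqrt(N)
assert np.allclose(rx_grid[:USED], qpsk)  # ideal channel: symbols recovered
```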
Figure 2.6: OFDMA transmitter and receiver (adapted from [4]).

The cyclic extension is performed by copying the final part of the symbol to its beginning. This method is preferable to adding a guard interval because it makes the Orthogonal Frequency Division Multiplexing (OFDM) signal periodic. When the symbol is periodic, the impact of the channel corresponds to a multiplication by a scalar, assuming that the cyclic extension is long enough. Moreover, this periodicity of the signal allows for a discrete Fourier spectrum, enabling the use of the DFT and IDFT in the receiver and transmitter, respectively. An important advantage of the use of OFDMA in a base station transmitter is that it can allocate any of its sub-carriers to users in the frequency domain, allowing the scheduler to benefit from frequency diversity. Yet, the signalling overhead that such a fine resolution would cause prevents the allocation of a single sub-carrier, forcing the use of a Physical Resource Block (PRB) consisting of 12 sub-carriers. As such, the minimum bandwidth that can be allocated is 180 kHz. This allocation corresponds to 1 ms in the time domain, also known as the Transmission Time Interval (TTI), although each PRB only lasts for 0.5 ms. In LTE, each PRB can be modulated either through Quadrature Phase Shift Keying (QPSK) or Quadrature Amplitude Modulation (QAM), namely 16-QAM and 64-QAM.

2.3.2 SC-FDMA Basics

Although OFDMA works well on the LTE downlink, it has one drawback: the transmitted signal power is subject to large variations. This results in a high PAPR, which in turn can cause problems for the transmitter's power amplifier. In the downlink, the base station transmitters are large and expensive devices that can use expensive power amplifiers. The same does not happen in the uplink, where the mobile transmitter has to be cheap. This makes OFDMA unsuitable for the LTE uplink; hence, it was decided to use SC-FDMA for multiple access. In its basic form, SC-FDMA can be perceived as similar to plain QAM modulation, where each symbol is sent one at a time, similarly to Time Division Multiple Access (TDMA) systems such as the Global System for Mobile Communications (GSM). The frequency domain generation of the signal, which can be observed in Figure 2.7, adds the OFDMA property of a good spectral waveform. This eliminates the need for guard bands between different users, similarly to the OFDMA downlink. A cyclic extension is also added periodically to the signal, as happens in OFDMA, except that it is not added after each symbol; this is because the symbol rate is faster than in OFDMA. The added cyclic extension prevents inter-symbol interference between blocks of symbols and also simplifies the receiver design. The remaining inter-symbol interference is handled by running the equalizer in the receiver over a block of symbols, until the cyclic prefix is reached. While the transmission occupies the whole spectrum allocated to the user in the frequency domain, the system has a 1 ms allocation resolution. For instance, when the resource allocation is doubled, so is the data rate, assuming the same level of overhead; the individual transmission then becomes shorter in the time domain, but wider in the frequency domain. The allocations need to have frequency domain continuity, taking a contiguous set of frequency domain resources.
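The envelope difference between the two schemes can be illustrated numerically: DFT-spreading the modulation symbols before the sub-carrier mapping of Figure 2.7 restores a single-carrier-like waveform with a visibly lower PAPR. A rough sketch, with illustrative parameters:

```python
import numpy as np

def papr_db(x):
    """Peak-to-Average Power Ratio of a complex baseband signal, in dB."""
    p = np.abs(x) ** 2
    return 10 * np.log10(p.max() / p.mean())

rng = np.random.default_rng(0)
N, USED = 512, 300                  # IFFT size and occupied sub-carriers
bits = rng.integers(0, 2, 2 * USED)
qpsk = ((1 - 2.0 * bits[0::2]) + 1j * (1 - 2.0 * bits[1::2])) / np.sqrt(2)

def to_time(freq_symbols):
    grid = np.zeros(N, dtype=complex)
    grid[:USED] = freq_symbols      # contiguous (localized) sub-carrier mapping
    return np.fft.ifft(grid)

ofdma = to_time(qpsk)               # plain OFDMA: symbols straight to the IFFT
scfdma = to_time(np.fft.fft(qpsk))  # SC-FDMA: DFT-spread the symbols first

print(f"OFDMA   PAPR ~ {papr_db(ofdma):.1f} dB")   # commonly around 10 dB
print(f"SC-FDMA PAPR ~ {papr_db(scfdma):.1f} dB")  # several dB lower
```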
The allocations do not need to have frequency domain continuity, but can take any set of continuous allocation of frequency domain 10 Transmitter Bits Modulator Receiver Remove Cyclic Extension Sub-carrier Mapping DFT . .. Cyclic Extension IFFT Total Radio Bandwidth (eg. 20 MHz) MMSE Equaliser FFT IDFT Demodulator Bits Figure 2.7: SC-FDMA transmitter and receiver with frequency domain signal generation (adapted from [4]). resources. The allowed amount of 180 kHz resource blocks – the minimum resource allocation based on the 15 kHz sub-carrier spacing of OFDMA downlink – that can be allocated are defined by the practical signaling constraints. The maximum allocated bandwidth can go up to 20 MHz, but tends to be smaller as it is required to have a guard band towards the neighboring operator. As the transmission is only done in the time domain, the system retains its good envelope properties and the waveform characteristics are highly dependent of the applied modulation method. Thus, SC-FDMA is able to reach a very low signal Peak-to-Average Ratio (PAR). Moreover, it facilitates efficient power amplifiers in the devices, saving battery life. Regarding the base station receiver for SC-FDMA, it is slightly more complex than the OFDMA receiver. This is even more complex if it needs equalizers that are able to perform as well as OFDMA receivers. Yet, this disadvantage is far outweighed by the benefits of the uplink range and device battery life that can be reached with SC-FDMA. Furthermore, by having a dynamic resource usage with a 1 ms resolution means that there is no base-band receiver per UE on standby and those who do have data to transmit use the base station in a dynamic fashion. Lastly, the most resource consuming process in both uplink and downlink receiver chains is the channel decoding with increased data rates. 2.3.3 MIMO Basics The MIMO operation is one of the fundamental technologies that the first LTE release brought, despite being included earlier in WCDMA specifications [5]. However, in WCDMA, the MIMO operates differently from LTE, where a spreading operation is applied. In the first LTE release, MIMO includes spatial diversity, pre-coding and transmit diversity. Spatial multiplexing consists in the signal transmission from two or more different antennas with different data streams, with further separation through signal processing in the receiver. Thus, in theory, a 2-by-2 antenna configuration doubles the peak data rates, or quadruples it if applied with a 4-by-4 antenna configuration. Pre-coding handles the weighting of the signals transmitted from different antennas, in order to maximize the received Signal-to-Noise Ratio (SNR). Lastly, transmit diversity is used to exploit 11 the gains from independent fading between different antennas through the transmission of the same signal from various antennas with some coding. Figure 2.8: OFDMA reference symbols to support two eNB transmit antennas (adapted from [4]). In order to allow the separation, at the receiver, of the MIMO streams transmitted by different antennas, reference symbols are assigned to each antenna. This eliminates the possibility of existing corruption in the channel estimation from another antenna, because each stream sent by each antenna is unique. This principle can be observed in Figure 2.8 and can be applied by two or more antennas, having the first LTE Release specified up to four antennas. 
Furthermore, as the number of antennas increases, so do the required SNR, the complexity of the transmitters and receivers, and the reference symbol overhead. MIMO can also be used in the LTE uplink, despite it not being possible to increase the single user data rate for mobile devices that only have a single antenna. Yet, the cell level maximum data rate can be doubled through the allocation of two devices with orthogonal reference signals, i.e. Virtual Multiple-Input Multiple-Output (V-MIMO). Accordingly, the base station handles this transmission as a MIMO transmission, separating the data streams by means of the MIMO receiver. This operation does not bring any major implementation complexity from the device perspective, as only the reference signal sequence is altered. On the other hand, additional processing is required on the network side in order to separate the different users. Lastly, it is also important to mention that SC-FDMA is well suited to MIMO, as the users are orthogonal between themselves inside the same cell, and the local SNR may be very high for the users close to the base station.

2.4 Physical Layer Design

After covering the OFDMA and SC-FDMA principles, it is now possible to describe the physical layer of LTE. This layer is characterized by the design principle of resource usage based solely on dynamically allocated shared resources, instead of having dedicated resources reserved for a single user. Furthermore, it has a key role in defining the resulting capacity, and thus allows for a comparison between different systems in terms of expected performance. This section will introduce the transport channels and how they are mapped to the physical channels, the available modulation methods for both data and control channels, and the uplink/downlink data transmission.

2.4.1 Transport Channels

As there is no reservation of dedicated resources for single users, LTE contains only common transport channels; these channels have the role of connecting the Medium Access Control (MAC) layer to the physical layer. The physical channels carry the transport channels, and it is the processing applied to those physical channels that characterizes each transport channel. Moreover, the physical layer needs to provide dynamic resource assignment, both for data rate variation and for resource division between users. The transport channels are described in the following points, and condensed into a short sketch after this list:
• Broadcast Channel (BCH) is a downlink broadcast channel that is used to broadcast the system parameters required to enable devices to access the system.
• Downlink Shared Channel (DL-SCH) carries the user data for point-to-point connections in the downlink direction. All the information transported in the DL-SCH is intended only for a single user or UE in the RRC CONNECTED state.
• Paging Channel (PCH) transports the paging information in the downlink direction, aimed at the device, in order to move it from the RRC IDLE to the RRC CONNECTED state.
• Multicast Channel (MCH) is used in the downlink direction to carry multicast service content to the UE.
• Uplink Shared Channel (UL-SCH) transfers both the user data and the control information from the device in the uplink direction, in the RRC CONNECTED state.
• Random Access Channel (RACH) acts in the uplink direction to answer paging messages, as well as to initiate the move from or towards the RRC CONNECTED state, according to the UE data transmission needs.
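The list above can be condensed into a small lookup structure (directions and roles only; the mapping onto the physical channels is detailed next). This is purely a summarizing sketch:

```python
# Summary of the LTE transport channels described above: direction and role.
TRANSPORT_CHANNELS = {
    "BCH":    ("downlink", "broadcasts the system parameters needed to access the system"),
    "DL-SCH": ("downlink", "point-to-point user data for a UE in RRC CONNECTED"),
    "PCH":    ("downlink", "paging, to move a UE from RRC IDLE to RRC CONNECTED"),
    "MCH":    ("downlink", "multicast service content"),
    "UL-SCH": ("uplink",   "user data and control information in RRC CONNECTED"),
    "RACH":   ("uplink",   "paging responses and RRC state transitions"),
}
```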
2.4.2 Modulation

Both the uplink and downlink directions use QAM modulation, namely 4-QAM (also known as QPSK), 16-QAM and 64-QAM, whose symbol constellations can be observed in Figure 2.9. The first two are available in all devices, while the support for 64-QAM in the uplink direction depends on the UE class. QPSK modulation is used when operating at full transmission power, as it allows for good transmitter power efficiency. For the 16-QAM and 64-QAM modulations, the devices use a lower maximum transmitter power.

Figure 2.9: LTE modulation constellations – QPSK (2 bits/symbol), 16-QAM (4 bits/symbol) and 64-QAM (6 bits/symbol) (adapted from [4]).

Binary Phase Shift Keying (BPSK) has been specified for control channels, which can opt between BPSK and QPSK for control information transmission. Additionally, uplink control data is multiplexed along with the user data, with both types of data using the same modulation (i.e. QPSK, 16-QAM or 64-QAM).

2.4.3 Downlink User Data Transmission

The user data is carried on the PDSCH in the downlink direction with a 1 ms resource allocation. Moreover, the sub-carriers are allocated in resource units of 12 sub-carriers, totalling 180 kHz allocation units. Thus, the user data rate depends on the number of allocated sub-carriers; this allocation of resources is managed by the eNB and is based on the Channel Quality Indicator (CQI) obtained from the terminal. Similarly to what happens in the uplink, the resources are allocated in both the time and frequency domain, as can be observed in Figure 2.10. The bandwidth can be allocated between 0 and 20 MHz in contiguous steps of 180 kHz.

Figure 2.10: Downlink resource allocation at the eNB (adapted from [4]).

The Physical Downlink Control Channel (PDCCH) notifies the device about which resources are allocated to it, in a dynamic fashion and with a 1 ms allocation granularity. PDSCH data can occupy between 3 and 6 symbols per 0.5 ms slot, depending on both the PDCCH and the cyclic prefix length (i.e. short or extended). In the 1 ms subframe, the control symbols (for the PDCCH) are carried at the start of the first 0.5 ms slot, while the second 0.5 ms slot is used solely for data symbols (for the PDSCH); this second slot can fit 7 symbols if a short cyclic prefix is used. Not only are the available resources for user data reduced by the control symbols, but they also have to be shared with broadcast data and with reference and synchronization signals. The reference symbols are distributed evenly in the time and frequency domains in order to reduce the required overhead. This distribution of reference symbols requires rules to be defined so that both the receiver and the transmitter understand the mapping. The common channels, such as the BCH, also need to be taken into account in the total resource allocation space.
The channel coding chosen for LTE user data was turbo coding, which uses the same Parallel Concatenated Convolutional Coding (PCCC) turbo encoder as used in WCDMA/High Speed Packet Access (HSPA) [5]. The turbo interleaver of WCDMA was modified to better fit the LTE properties and slot structures, as well as to allow higher flexibility for implementing parallel signal processing with increasing data rates. The channel coding consists in 1/3-rate turbo coding for user data in both the uplink and downlink directions. To reduce the processing load, the maximum block size for turbo coding is limited to 6144 bits, and higher allocations are segmented into multiple encoding blocks. In the downlink there is no multiplexing with the PDCCH onto the same physical layer resources, as each has its own separate resources during the 1 ms subframe. LTE uses physical layer retransmission combining, also commonly referred to as Hybrid Automatic Repeat reQuest (HARQ). In such an operation, the receiver stores packets with failed Cyclic Redundancy Check (CRC) and combines the stored packet with the new one when a retransmission is received. After the data is encoded, it is scrambled and then modulated. The scrambling is done in order to avoid cases where a device decodes data that is aimed at another device with the same resource allocation. The modulation mapper applies the intended modulation (i.e. QPSK, 16-QAM or 64-QAM) and the resulting symbols are fed to layer mapping and pre-coding. For multiple transmit antennas, the data is divided into two or four data streams (depending on whether two or four antennas are used) and then mapped to the resource elements available for the PDSCH, followed by the OFDM signal generation. For a single antenna transmission, the layer mapping and pre-coding functionalities are not used. Thus, the resulting instantaneous downlink data rate depends on the:

• modulation method applied, with 2, 4 or 6 bits per modulated symbol for QPSK, 16-QAM and 64-QAM, respectively;

• allocated amount of sub-carriers;

• channel encoding rate;

• number of transmit antennas with independent streams and MIMO operation.

Assuming that all the resources are allocated to a single user and counting only the available physical layer resources, the instantaneous peak data rate for the downlink ranges between 0.9 and 86.4 Mbps with a single stream, rising up to 172.8 Mbps with 2 x 2 MIMO. With 4 x 4 MIMO it can reach a theoretical instantaneous peak data rate of 340 Mbps. The single stream and 2 x 2 MIMO rates can be observed in Table 2.1; a rough sketch of how these figures arise is given below the table.

Table 2.1: Downlink peak data rates [5].

Peak bit rate [Mbps] per sub-carrier/bandwidth combination [MHz]

                             72/1.4   180/3.0   300/5.0   600/10   1200/20
QPSK 1/2     Single stream     0.9       2.2       3.6      7.2      14.4
16-QAM 1/2   Single stream     1.7       4.3       7.2     14.4      28.8
16-QAM 3/4   Single stream     2.6       6.5      10.8     21.6      43.2
64-QAM 3/4   Single stream     3.9       9.7      16.2     32.4      64.8
64-QAM 4/4   Single stream     5.2      13.0      21.6     43.2      86.4
64-QAM 3/4   2 x 2 MIMO        7.8      19.4      32.4     64.8     129.6
64-QAM 4/4   2 x 2 MIMO       10.4      25.9      43.2     86.4     172.8
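The figures in Table 2.1 can be approximated from the factors listed above. The following minimal Python sketch assumes 12 data symbols per 1 ms subframe – an illustrative assumption that matches the tabulated values, with the remaining symbol time accounting for control and reference signal overhead – and the function name is invented for illustration:

# Illustrative reconstruction of the instantaneous peak rates of Table 2.1.
# Assumption: 12 data symbols per 1 ms subframe are available for user data.
DATA_SYMBOLS_PER_SECOND = 12 * 1000

def peak_rate_mbps(subcarriers, bits_per_symbol, code_rate, streams=1):
    """Instantaneous peak physical-layer data rate in Mbps."""
    return (subcarriers * bits_per_symbol * code_rate
            * DATA_SYMBOLS_PER_SECOND * streams) / 1e6

# 64-QAM (6 bits/symbol), rate 4/4, 20 MHz (1200 sub-carriers), 2 x 2 MIMO:
print(peak_rate_mbps(1200, 6, 1.0, streams=2))  # -> 172.8, as in Table 2.1
# QPSK, rate 1/2, 1.4 MHz (72 sub-carriers), single stream:
print(peak_rate_mbps(72, 2, 0.5))               # -> 0.864, rounded to 0.9

The same sketch, with streams = 1, also reproduces the uplink figures of Table 2.2 in the next subsection.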
2.4.4 Uplink User Data Transmission

The user data in the uplink direction is carried on the PUSCH, which has a 10 ms frame structure and is based on the allocation of time and frequency domain resources with 1 ms and 180 kHz resolution, respectively. The scheduler that handles this allocation of resources is located in the eNB, as can be observed in Figure 2.11. Only random access resources can be used without prior signalling from the eNB, and there are no fixed resources for the devices. Accordingly, the device needs to provide the uplink scheduler with information on its transmission requirements as well as its available transmission power resources. The frame structure uses a 0.5 ms slot and an allocation period of two 0.5 ms slots (i.e. a subframe). Similarly to what was discussed in the previous subsection concerning the downlink direction, user data has to share the data space with reference symbols and signalling. The bandwidth can be allocated between 0 and 20 MHz in contiguous steps of 180 kHz, similarly to the downlink transmission. The slot bandwidth adjustment between consecutive TTIs can be observed in Figure 2.12, in which doubling the data rate results in also doubling the bandwidth being used. It should be noted that the reference signals always occupy the same space in the time domain and, consequently, a higher data rate also corresponds to a higher data rate for the reference symbols. The cyclic prefix used in the uplink can also be either short or extended, where the short cyclic prefix allows for a bigger data payload. The extended prefix is not frequently used, as the benefit of having seven data symbols is greater than the possible degradation that can result from inter-symbol interference caused by a channel delay spread longer than the cyclic prefix. The channel coding for user data in the uplink direction is also 1/3-rate turbo coding, the same as in the downlink direction. Besides the turbo coding, the uplink also has the physical layer HARQ with the same combining methods as in the downlink direction.

Figure 2.11: Uplink resource allocation controlled by the eNB scheduler (adapted from [4]).

Figure 2.12: Data rate between TTIs in the uplink direction (adapted from [4]).

Thus, the resulting instantaneous uplink data rate depends on the:

• modulation method applied, with the same methods available as in the downlink direction;

• bandwidth applied;

• channel coding rate;

• time domain resource allocation.

Similarly to the previous subsection, assuming that all the resources are allocated to a single user and counting only the available physical layer resources, the instantaneous peak data rate for the uplink ranges between 900 kbps and 86.4 Mbps, as shown in Table 2.2. As discussed in Subsection 2.3.3, the cell or sector specific maximum total data throughput can be increased with V-MIMO.

Table 2.2: Uplink peak data rates [4].

Peak bit rate [Mbps] per sub-carrier/bandwidth combination [MHz]

                             72/1.4   180/3.0   300/5.0   600/10   1200/20
QPSK 1/2     Single stream     0.9       2.2       3.6      7.2      14.4
16-QAM 1/2   Single stream     1.7       4.3       7.2     14.4      28.8
16-QAM 3/4   Single stream     2.6       6.5      10.8     21.6      43.2
16-QAM 4/4   Single stream     3.5       8.6      14.4     28.8      57.6
64-QAM 3/4   Single stream     3.9       9.7      16.2     32.4      64.8
64-QAM 4/4   Single stream     5.2      13.0      21.6     43.2      86.4

2.5 Mobility

This section presents an overview of how LTE mobility is managed in Idle and Connected modes, as mobility is crucial in any telecommunications system; mobility has many clear benefits, such as maintaining low delay services (e.g. voice or real time video connections) while moving in high speed transportation, and switching connections to the best serving cell in areas between cells. However, this comes with an increased network complexity. That being said, the LTE radio network aims to provide seamless mobility while minimizing network complexity.

Table 2.3: Differences between both mobility modes.
RRC IDLE:
• Cell reselections done automatically by the UE
• Based on UE measurements
• Controlled by broadcasted parameters
• Different priorities can be assigned to frequency layers

RRC CONNECTED:
• Network controlled handovers
• Based on UE measurements

There are two procedures in which mobility can be divided: idle and connected mode mobility. The former is based on the UE being active and autonomously reselecting cells in accordance with parameters sent by the network, without being connected to it; in the latter, the UE is connected to the network (i.e. transmitting data) and the E-UTRAN decides whether or not to trigger a handover according to the reports sent by the UE. These two states correspond respectively to the RRC IDLE and RRC CONNECTED modes, whose differences are summarized in Table 2.3. It is also important to mention the measurements that are performed by the UE for mobility in LTE:

• Reference Signal Received Power (RSRP), which is the averaged power measured in a cell, across receiver branches, of the resource elements that contain reference signals specific to the cell;

• Reference Signal Received Quality (RSRQ), which is the ratio of the RSRP and the Evolved UMTS Terrestrial Radio Access (E-UTRA) Received Signal Strength Indicator (RSSI) for the reference signals;

• RSSI, which is the total received wideband power on a specific frequency, including noise originating from interfering cells and other sources. It is not individually measured by the UE, yet it is used in calculating the RSRQ value inside the UE.

2.5.1 Idle Mode Mobility

In Idle mode, the UE chooses a suitable cell based on radio measurements (i.e. cell selection). Whenever a UE selects a cell, it is said to be camped on that cell. The cell is required to have good radio quality and not be blacklisted. Specifically, it must fulfil the S-criterion:

$S_{rxlevel} > 0$, (2.1)

where

$S_{rxlevel} = Q_{rxlevelmeas} - (Q_{rxlevmin} - Q_{rxlevelminoffset})$, (2.2)

and $S_{rxlevel}$ corresponds to the Rx level value of the cell, $Q_{rxlevelmeas}$ is the RSRP, $Q_{rxlevmin}$ is the minimum required level for cell camping and $Q_{rxlevelminoffset}$ is an offset used when searching for a higher priority Public Land Mobile Network (PLMN), corresponding to preferred network operators. The aforementioned offset is used because LTE allows setting priority levels for PLMNs in order to specify preferred network operators in cases such as roaming. While the UE stays camped on a cell, it will continuously try to find better cells as candidates for reselection, in accordance with the reselection criteria. Furthermore, the network can also block the UE from considering specific cells for reselection (i.e. cell blacklisting). To reduce the amount of measurements, it was defined that if the Rx level value of the serving cell (i.e. $S_{ServingCell}$) is high enough, the UE does not need to make any intra-frequency, inter-frequency or inter-system measurements. The measurements for intra-frequency and inter-frequency start once $S_{ServingCell} \leq S_{intrasearch}$ and $S_{ServingCell} \leq S_{nonintrasearch}$, respectively, where $S_{intrasearch}$ and $S_{nonintrasearch}$ refer to the serving cell's Rx level thresholds for the UE to start making intra-frequency and inter-frequency (or inter-system) measurements. A minimal sketch of the S-criterion check follows.
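The sketch below directly follows Equations (2.1) and (2.2); the function and parameter names are illustrative and all quantities are assumed to be in dBm/dB:

# A minimal sketch of the cell selection S-criterion of Equations (2.1)-(2.2).
def s_criterion_fulfilled(q_rxlevmeas: float, q_rxlevmin: float,
                          q_rxlevminoffset: float = 0.0) -> bool:
    """Return True if the cell fulfils the S-criterion."""
    s_rxlevel = q_rxlevmeas - (q_rxlevmin - q_rxlevminoffset)  # Equation (2.2)
    return s_rxlevel > 0                                       # Equation (2.1)

# Example: measured RSRP of -115 dBm against a -120 dBm minimum camping level.
print(s_criterion_fulfilled(-115.0, -120.0))  # True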
For intra-frequency and equal priority E-UTRAN frequency cell reselection, a cell ranking is made based on the $R_s$ criterion for the serving cell and the $R_n$ criterion for the neighboring cells:

$R_s = Q_{meas,s} + Q_{hyst}$, (2.3)

$R_n = Q_{meas,n} + Q_{offset}$, (2.4)

where $Q_{meas}$ is the RSRP measurement for cell re-selection, $Q_{hyst}$ is the power domain hysteresis used to avoid the ping-pong phenomenon between cells, and $Q_{offset}$ is an offset control parameter to deal with different frequencies and/or cell specific characteristics (e.g. propagation properties and hierarchical cell structures). Reselection occurs to the highest ranking neighbor cell that is better ranked than the serving cell for longer than $T_{reselection}$, in order to avoid overly frequent reselections. Through the hysteresis provided by $Q_{hyst}$, a neighboring cell needs to be better than the serving cell by a configurable amount in order for reselection to be performed. Lastly, $Q_{offset}$ allows biasing the reselection towards particular cells and/or frequencies. Regarding both inter-frequency and inter-system reselection in LTE, they are based on a method labeled as layers. Layers were designed to allow the operators to control how the UE prioritizes camping on different Radio Access Technologies (RAT) or frequencies. This method is known as absolute priority based reselection, where each layer is assigned a specific priority and the UE attempts to camp on the highest priority layer that can provide a decent service. The UE will camp on a higher priority layer if it is above a threshold $Thresh_{high}$ — defined by the network — for longer than the $T_{reselection}$ period. Furthermore, the UE will camp on a layer with lower priority only if the higher priority layer drops below the aforementioned threshold and the lower priority layer overcomes the threshold $Thresh_{low}$. A minimal sketch of the intra-frequency ranking of Equations (2.3) and (2.4) follows.
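The sketch below is illustrative; for brevity, the $T_{reselection}$ timer is left out and the function name is invented:

# Illustrative sketch of the R-criterion ranking of Equations (2.3)-(2.4):
# the serving cell is ranked with hysteresis, neighbors with an offset.
def rank_cells(q_meas_serving: float, q_hyst: float,
               neighbors: dict, q_offset: float):
    """Return (best_cell, rank) among the serving cell and its neighbors."""
    ranks = {"serving": q_meas_serving + q_hyst}    # Equation (2.3)
    for cell, q_meas_n in neighbors.items():
        ranks[cell] = q_meas_n + q_offset           # Equation (2.4)
    return max(ranks.items(), key=lambda item: item[1])

# Reselection would only be considered if the winner is a neighbor and stays
# better ranked than the serving cell for longer than Treselection.
print(rank_cells(-100.0, 3.0, {"cellA": -98.0, "cellB": -104.0}, 0.0))
# ('serving', -97.0): the hysteresis keeps the serving cell ranked highest.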
2.5.2 Intra-LTE Handovers

As mentioned previously, UE mobility is only controlled through handovers when the Radio Resource Control (RRC) connection is established. The handovers are based on UE measurements and are controlled by the E-UTRAN, which decides when to perform the handover and what the target cell will be. In order to perform lossless handovers, packet forwarding is used between the source and the target eNB. In addition, the S1 connection in the core network is only updated once the radio handover is completed (i.e. Late path switch), and the core network has no control over the handovers.

Figure 2.13: Intra-frequency handover procedure (adapted from [4]).

The intra-frequency handover operation can be observed in Figure 2.13. In the beginning, the UE has a user plane connection to the source eNB and also to the SAE Gateway (SAE GW). Besides that, there is an S1 signalling connection between the MME and the eNB. Once the target cell fulfills the measurement threshold, the UE sends the measurement report to the source eNB, which will establish a signaling connection and a GPRS Tunneling Protocol (GTP) tunnel towards the target cell. When the target eNB has the required resources available, the source eNB sends a handover command towards the UE. Once that is done, the UE can switch from the source to the target eNB, resulting in a successful update of the core network connection. Before the Late path switching is completed, there is a brief moment when the downlink user plane packets are forwarded from the source eNB towards the target eNB through the X2 interface. In the uplink, the eNB forwards all successfully received uplink Radio Link Control (RLC) Service Data Units (SDU) to the packet core and, furthermore, the UE re-transmits the RLC SDUs left unacknowledged by the source eNB. Regarding the handover measurements, the UE must identify the target cell through its synchronization signals before it can send the measurement report. Once the reporting threshold is fulfilled, the UE sends the handover measurements to the source eNB.

Figure 2.14: Automatic intra-frequency neighbor identification (adapted from [4]).

The UE in E-UTRAN can detect intra-frequency neighbors automatically, which results in both simpler network management and better network quality. The correct use of this functionality is important, as call drops due to missing neighbors are common. This can be observed in Figure 2.14, where the UE approaches a new cell and receives its PCI through the synchronization signals. The UE then sends a measurement report to the eNB once the handover report threshold has been reached. On the other hand, the eNB does not have an X2 connection to that cell, and the physical cell Identity (ID) is not enough to uniquely identify it, as the maximum number of physical cell IDs is only 504 and large networks can extend to tens of thousands of cells. Thereupon, the serving eNB requests the UE to decode the global cell ID from the broadcast channel of the target cell, as it uniquely identifies that same cell. Through the global cell ID, together with the information sent by the MME, the serving eNB can now find the transport layer address and, thus, set up a new X2 connection, allowing the eNB to proceed with the handover. The generation of the intra-frequency neighbor list is simpler than creating inter-frequency or inter-RAT neighbors, as the UE can easily identify all the cells within the same frequency. For inter-frequency and inter-RAT neighbor creation, the eNB must not only ask the UE to make specific measurements, but also schedule gaps in the signal to allow the UE to carry out those measurements.

2.5.3 Inter-system Handovers

LTE allows for inter-system handovers, also called inter-RAT handovers, between the E-UTRAN and the GSM EDGE Radio Access Network (GERAN), the UMTS Terrestrial Radio Access Network (UTRAN) or cdma2000. The inter-RAT handover is controlled by the source access system, which starts the measurements and decides whether or not to perform the handover. This handover is prepared backwards, as in a normal handover, with the resources being reserved in the target system before the handover command is sent to the UE. The GERAN system is the exception, as it does not support Packet-Switched Handover (PS HO) and its resources are not reserved before the handover. The core network is responsible for the signalling, because there are no direct interfaces between these different radio systems. The inter-RAT handover is similar to an intra-LTE handover where the packet core node is changed. The information from the target system is transported to the UE in a transparent fashion through the source system. To avoid the loss of user data, the user data can be forwarded from the source to the target system. The UE does not perform any signalling to the core network, which speeds up the execution of the handover. Furthermore, the security and QoS context is transferred from the source to the target system. Additionally, the Serving Gateway (GW) can be used as the mobility anchor for inter-RAT handovers. An overview of the inter-system handover is represented in Figure 2.15.

Figure 2.15: Overview of the inter-RAT handover from E-UTRAN to UTRAN/GERAN (adapted from [4]).
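Closing this mobility overview, the following hypothetical Python sketch illustrates the kind of measurement-report trigger described in Subsection 2.5.2: the UE reports a target cell once its RSRP exceeds the serving cell's RSRP by a configurable margin for several consecutive samples. The margin, sample count and function names are illustrative choices, not 3GPP-defined parameters:

# Hypothetical sketch of a handover measurement-report trigger: report when
# the target's RSRP beats the serving RSRP by threshold_db for the last
# samples_to_trigger consecutive measurements.
def should_send_measurement_report(serving_rsrp: list, target_rsrp: list,
                                   threshold_db: float = 3.0,
                                   samples_to_trigger: int = 3) -> bool:
    recent = list(zip(serving_rsrp, target_rsrp))[-samples_to_trigger:]
    return (len(recent) == samples_to_trigger and
            all(t > s + threshold_db for s, t in recent))

serving = [-95.0, -97.0, -99.0, -101.0]
target  = [-99.0, -96.0, -95.0, -94.0]
print(should_send_measurement_report(serving, target))
# False: the margin has only held for the last two of the three samples.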
2.6 Performance Data Collection

As telecommunication networks become more and more complex, new monitoring and management operations need to be developed. There is now a set of methods that allows for the collection of data originating from the networks. These methods not only enable better planning and optimization of the networks, but also allow knowing whether they are delivering the required quality to the users.

2.6.1 Performance Management

Performance Management (PM) consists of evaluating and reporting both the behaviour and the effectiveness of the network elements by gathering statistical information, maintaining and examining historical logs, determining system performance and modifying the system modes of operation [12]. It was one of the concepts added to the Telecommunication Management Network (TMN) framework defined by the International Telecommunication Union (ITU) to manage telecommunication networks and services, in order to handle the growing complexity of the networks. The other concepts consist of security, fault, accounting and configuration. PM involves the following:

• configuring data-collection methods and network testing;

• collecting performance data;

• optimizing network service and response time;

• proactive management and reporting;

• managing the consistency and quality of network services.

PM is the measurement of both network and application traffic in order to deliver a consistent and predictable level of service at a given instance and across a defined period of time. PM enables vendors and operators to detect deteriorating trends in advance and thus address potential threats, preventing faults [13]. The architecture of a PM system consists of four layers:

• Data Collection and Parsing Layer – where data is collected from Network Elements (NE) using a network specific protocol (e.g. FTP and Simple Network Management Protocol (SNMP));

• Data Storage and Management Layer – consisting of a data warehouse that stores the parsed data;

• Application Layer – which processes the collected and stored data;

• Presentation Layer – which aims to provide a web-based user interface by presenting the generated PM results in the form of dashboards and real-time graphs and charts.

It is challenging to perform an efficient administration of PM, which consists of collecting, sorting, processing and aggregating massive volumes of performance measurement data over time. Another challenge is that performance measurements do not have a unified structure, as each NE manufacturer has proprietary protocols and data structures to gauge performance in their devices.

2.6.2 Key Performance Indicators

A KPI is a quantifiable metric of the performance of essential operations and/or processes in an organization. In other words, it consists of network performance measurements. KPIs result from statistical calculations based on counters installed on NEs, which can register several indicators (e.g. failed handovers, handover types and number of voice calls). They assist in identifying the strategic value drivers in PM analysis and also in verifying whether all elements across several levels of the network are using consistent strategies to achieve the shared goals.
With a careful analysis, this allows precisely identifying where an action must be taken in order to improve the network's performance [14]. While defining KPIs, it is crucial to understand the metrics that are going to be measured, as well as their measurement frequency, complexity and benchmark. According to [15], KPIs can be divided into three types:

• MEAN – KPIs produced to reflect a mean measurement based on a number of sample results;

• RATIO – KPIs produced to reflect the percentage of a specific case occurrence relative to all the cases;

• CUM – KPIs produced to reflect a cumulative measurement which is always increasing.

Table 2.4: Description of the KPI categories and KPI examples.

Accessibility – KPIs that show the probability to provide a service to an end-user at request.
  Examples: Call Setup Success Rate; Random Access Success Rate.
Retainability – KPIs that show how often an end-user abnormally loses a service connection across its duration.
  Examples: ERAB-Retainability; VoIP ERAB-Retainability.
Integrity – KPIs that show how much the services are impaired once established.
  Examples: Downlink Traffic [MBytes]; Uplink Traffic [MBytes].
Availability – KPIs that show the percentage of time that the cells are available.
  Example: Availability.
Mobility – KPIs that show how well handovers are being performed.
  Examples: LTE Intra Mobility Success Rate; Single Radio Voice Call Continuity.
Quality – KPIs that show how well the services are being delivered to the end-user.
  Examples: Average Uplink Power Resource Blocks; % Of Uplink Power Resource Blocks.

KPIs specific to telecommunication networks can be classified into five categories – Accessibility, Retainability, Integrity, Availability and Mobility – in order to divide the measurements from distinct sectors [15]. Vendors can also have an additional category, Quality, according to vendor documentation. These categories are summarized in Table 2.4.

The KPI values must be within defined thresholds, depending on the environment (i.e. urban, suburban or rural), in order to fulfil the network performance requirements as well as service level agreements. Take for instance the Call Setup Success Rate KPI, which indicates the percentage of call setups that were successful; a hedged sketch of how such a RATIO KPI is computed from counters is given at the end of this section. Depending on the environment, the absolute number of successful call setups will vary – it will be lower in rural environments and higher in urban environments – slightly affecting the ratio. Thus, the KPI thresholds should not be the same for different environments. There is an annual benchmark report comparing different operators in distinct countries, produced by a widely respected consultancy firm – P3. The benchmarks are produced through the use of KPIs from different environments. The firm did a study in the Netherlands for 2016, named The Mobile Network Test in the Netherlands [16], covering four mobile operators – T-Mobile, KPN, Vodafone and Tele2. Table 2.5 shows a comparison between those four operators, revealing a slight KPI difference between the two environments for all the operators' networks that were tested.

Table 2.5: Netherlands P3 KPI analysis done in 2016 [16].

Voice KPIs – Drive Test        T-Mobile    KPN   Vodafone   Tele2
Big Cities
  Call Success Ratio (%)           99.7   99.3       99.2    98.7
  Call Setup Time (s)               3.7    5.1        4.9     5.4
  Speech Quality (MOS-LQ0)          3.7    3.5        2.8     3.5
Small Cities
  Call Success Ratio (%)           99.2   98.8       99.1    99.2
  Call Setup Time (s)               3.8    5.3        5.1     4.9
  Speech Quality (MOS-LQ0)          3.6    3.6        3.6     2.8
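As an illustration of a RATIO-type KPI, the following hedged Python sketch computes the Call Setup Success Rate from two raw counters; the counter names and figures are hypothetical, as real counters and their structures are vendor-specific:

# Hedged sketch of a RATIO-type KPI computed from hypothetical raw counters.
def call_setup_success_rate(successful_setups: int, attempted_setups: int) -> float:
    """Call Setup Success Rate [%]: successful call setups over attempts."""
    if attempted_setups == 0:
        return 0.0
    return 100.0 * successful_setups / attempted_setups

# Hourly counter values collected from a cell (invented figures):
print(call_setup_success_rate(successful_setups=4962, attempted_setups=5000))  # 99.24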
2.6.3 Configuration Management

Configuration Management (CM) provides the operator with the ability to assure the correct and effective operation of the network as it evolves. CM actions aim to both control and monitor the active configuration on the NEs and Network Resources (NR). These actions can be initiated by the operator or by functions in the Operations Systems (OS) or NEs. CM actions can be taken as part of an implementation programme (e.g. additions and deletions), an optimisation programme (e.g. modifications), and to maintain the overall QoS. These actions can either target a single NE or several NEs, as part of a complex procedure [17].

CM Service Components

Whenever a network is first installed and activated, it is subsequently enhanced and adapted to fulfill short and long term requirements and also to satisfy customer needs. In order to cover these aspects, CM provides the operator with a set of capabilities, such as initial system installation, system operation to adapt the system to short term requirements, system update to overcome software bugs or equipment faults, and system upgrade to enhance or extend the network with features or equipment, respectively. These capabilities are provided by the management system through its service components – system modification and system monitoring. The former is used whenever it is necessary to adapt the system data to a new requirement due to optimisation or new network configurations, while the latter allows the operator to receive reports on the configuration of the entire network, or parts of it, whenever there is an autonomous change of its states or values.

CM Functions

The requirements of CM led to system modification functions, such as the creation, deletion and conditioning of NEs and NRs. All these functions observe the following requirements:

• minimal network disturbance, by only taking the affected resources out of service if needed;

• physical modifications independent from related logical modifications;

• all the required actions should be finished before the resources are brought back to service;

• data consistency checks should be taken.

Chapter 3

Machine Learning Background

This chapter provides an overview of ML concepts and algorithms which will allow a better understanding of the work developed throughout the Thesis. Section 3.1 gives a brief introduction to ML, and its three main components – representation, evaluation and optimization – are presented and explained in Section 3.2. Section 3.3 focuses on the main goal of ML – generalization. Sections 3.4 and 3.5 present the three biggest challenges of ML – underfitting, overfitting and dimensionality – and how it is possible to mitigate them. Section 3.6 explains the concept of feature engineering and its importance for ML problems. Section 3.7 compares the benefits of having more data versus more intelligent algorithms. Section 3.8 addresses the classification problem and how it can be applied to time series using ML. Section 3.9 presents the proposed classification algorithms. Lastly, Section 3.10 explains how ML classification models are evaluated. The content of this chapter is mainly based on the following references: [18, 19] in Section 3.1; [18] in Sections 3.2 and 3.3; [18, 20] in Sections 3.4 and 3.5; [18] in Section 3.6; [21, 22, 23, 24] in Section 3.8; [25, 26, 27, 28, 29, 30, 21] in Section 3.9; [31] in Section 3.10.

3.1 Machine Learning Overview

To solve a problem on a computer, an algorithm is typically needed.
However, it is not possible to build an algorithm for some tasks, such as differentiating spam emails from legitimate emails. In this case, both the input – email documents, consisting of files with characters – and the output – a yes or no indicating whether the message is spam – are known; what is not known is how to transform the input into the output. It is believed that there is a process that explains the observed data, but it is not possible to identify it completely. What is possible is to make a good and useful approximation, which may not explain everything, but can at least explain some part of the data. This is done through the detection of certain patterns or regularities which can help to understand the process or to make predictions. These predictions are made under the assumption that the near future will not be much different from the past in which the sample data was collected, so that predictions about it can also be right. ML uses the theory of statistics to build mathematical models, making inferences from data samples. The procedure starts with an initial model with some pre-defined parameters, which are optimized through a learning algorithm using a set of training data. The model may be:

• supervised – to make predictions in the future;

• unsupervised – to gain knowledge from data;

• semi-supervised – to gain knowledge from data in order to perform predictions.

It is important to mention two aspects of using ML: first, in training, efficient algorithms are needed to solve the optimization problem, as well as to process the massive amount of data that is generally available; second, once a model is learned, its representation and algorithmic solution for inference need to be efficient. In specific applications, the efficiency of the learning or inference algorithms (i.e. their memory space and time complexity) may be as important as their predictive accuracy. The previous example of email spam differentiation is a mature type of ML, called classification, and it is a case of supervised learning. A classifier is a system that typically accepts a vector of discrete and/or continuous feature values and outputs a single discrete value – the class; a feature is an individual measurable property of an observed phenomenon. The spam filter classifies the email messages into "spam" or "not spam" and its input might be a Boolean vector $x = (x_1, ..., x_j, ..., x_d)$, where $x_j = 1$ if the $j$-th word in the dictionary appears in the email and $x_j = 0$ otherwise. A classifier learns from a training set of examples $(x_i, y_i)$, where $x_i = (x_{i,1}, ..., x_{i,d})$ is an observed input corresponding to an email's word dictionary Boolean vector and $y_i$ is the corresponding output stating whether that email is spam, resulting in a model as output. The classifier is then tested on whether the model produces the correct output $y_t$ for future examples $x_t$. For the spam filter, this means testing whether it correctly classifies unseen emails as "spam" or "not spam". A minimal sketch of this example follows.
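The sketch below assumes scikit-learn as the example library; the tiny dictionary, emails and labels are invented for illustration:

# Minimal sketch of the spam example: Boolean bag-of-words vectors x_i with
# labels y_i, a classifier fitted on them, and a test on an unseen email x_t.
from sklearn.tree import DecisionTreeClassifier

dictionary = ["free", "winner", "meeting", "report"]

def to_boolean_vector(email: str) -> list:
    """x_j = 1 if the j-th dictionary word appears in the email, else 0."""
    words = email.lower().split()
    return [1 if word in words else 0 for word in dictionary]

train_emails = ["free winner prize", "meeting report attached",
                "free offer winner", "quarterly report meeting"]
train_labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

model = DecisionTreeClassifier().fit(
    [to_boolean_vector(e) for e in train_emails], train_labels)

# Predicted output y_t for an unseen email x_t:
print(model.predict([to_boolean_vector("claim your free prize winner")]))  # [1]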
3.2 Machine Learning Components

The first problem faced when a possible application for ML is found is the large variety of available learning algorithms. Learning algorithms consist of combinations of three components:

• representation – a classifier must be represented in a formal language that is recognizable and manageable by a computer. Choosing a representation for a classifier is equivalent to choosing the set of classifiers that it can learn. This set is called the hypothesis space of the classifier and, if a model is not in it, then it cannot be learned.

• evaluation – an evaluation function is needed to score different models. The evaluation function used internally by the algorithm may differ from the external one, which the classifier tries to optimize, for ease of implementation.

• optimization – the method to search for the highest-scoring model amongst other models. The choice of an optimization technique is crucial to the efficiency of the classifier, and also allows verifying whether the evaluation function in the produced model has more than one optimum solution.

Table 3.1: The three components of learning algorithms (adapted from [18]).

Representation: Instances (k-nearest neighbor, Support vector machines); Hyperplanes (Naive Bayes, Logistic regression); Decision trees; Sets of rules (Propositional rules, Logic programs); Neural networks; Graphical models (Bayesian networks, Conditional random fields).
Evaluation: Accuracy/Error rate; Precision and recall; Squared error; Likelihood; Posterior probability; Information gain; K-L divergence; Cost/Utility; Margin.
Optimization: Combinatorial optimization (Greedy search, Beam search, Branch-and-bound); Continuous optimization – Unconstrained (Gradient descent, Conjugate gradient, Quasi-Newton methods) and Constrained (Linear programming, Quadratic programming).

Table 3.1 shows common examples of each of the three aforementioned components. For instance, k-Nearest Neighbor (kNN) classifies a test example by finding the k most similar training examples and predicting the majority class among them. Hyperplane-based methods form a linear combination of the features per class and predict the class with the highest-valued combination. Decision trees test one feature at each internal node, with one branch for each feature value, and have class predictions at the leaves. It is also important to add that not all combinations of one component from each column of Table 3.1 make equal sense, as discrete representations tend to go with combinatorial optimization and continuous ones with continuous optimization.

3.3 Generalization

ML mainly aims to generalize beyond the examples in the training set, as it is unlikely that those exact examples will be found again in testing sets. A common mistake made when beginning to study ML is to test on the training data and have the illusion of success. In fact, it is easy to have good results on training sets, since the classifier just has to memorize the examples; if tested on new data, the results are sometimes no better than random guessing. In order to generalize an ML model beyond the examples in the training set, a common and simple procedure is to separate all the available data into two non-overlapping sets – the training and testing sets – with a size ratio of, for instance, 4:1 or 9:1. These ratios depend on the amount of data available, since increasing the fraction corresponding to the training set allows the creation of a more generalized model at the cost of having a smaller test set with which to validate the resulting model. However, the test data can still influence the classifier indirectly, for example when classifier parameters are tuned through the analysis of test data results (classifier parameter tuning is a fundamental step in developing successful models). This leads to the need for a holdout set for classifier parameter tuning, at the cost of reducing the amount of data available for training. Thankfully, this penalty can be mitigated by doing k-fold cross-validation. Through this method, the training data is randomly divided into k equally sized subsets, holding out each one while training on the remaining k − 1 subsets and testing each learned classifier on the subset not used for training. After iterating k times, the k-fold cross-validation algorithm averages the results to evaluate the classifier parameter settings. For instance, three-fold cross-validation is represented in Figure 3.1. This method can lead to even more reliable results by running k-fold cross-validation multiple times and averaging the results at the end – Repeated k-fold cross-validation. In this last method, the data is reshuffled and divided into k new subsets for each k-fold cross-validation run [32].

Figure 3.1: Procedure of three-fold cross-validation (adapted from [32]).
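A minimal sketch of (repeated) k-fold cross-validation, assuming scikit-learn as the example library and an illustrative synthetic dataset, is shown below:

# Minimal sketch of repeated k-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 3 folds (as in Figure 3.1), reshuffled and repeated 5 times.
cv = RepeatedKFold(n_splits=3, n_repeats=5, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print(scores.mean(), scores.std())  # accuracy averaged over all 15 folds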
3.4 Underfitting and Overfitting

When the data and associated knowledge lead to a model with a lower than expected test accuracy score – the ratio of correct predictions to all predictions made – it is possible to be erroneously led into creating a model that is not grounded in reality. For instance, continuous parameter adjustments may lead to a 100% accuracy score on the training data, which seems to indicate a great model; however, when tested on test data, the model may reveal a worse accuracy score than it had in the beginning. This problem is called overfitting, and it starts to happen when the model is tuned to the point where its test accuracy score starts to decrease and it begins to learn random regularities contained in the set of training patterns. The opposite can also happen, when the model can still be further fine-tuned to reach an even better test accuracy score than before. This last case is called underfitting and occurs when the model is incapable of capturing the variability of the data. Overfitting is a known problem and its many forms are not immediately obvious. Therefore, it is easier to decompose generalization error into bias and variance [33]. Bias can be defined as a classifier's tendency to consistently learn the same wrong thing. Variance is the tendency to learn random things irrespective of the real signal. Figure 3.2 illustrates bias and variance through an analogy with throwing darts at a board.

Figure 3.2: Bias and variance in dart-throwing (adapted from [18]).

For instance, a linear model has high bias, because when the frontier between two classes is not a hyperplane the model is unable to induce it. However, decision trees do not have this problem, as they are able to represent any Boolean function, at the cost of high variance – decision trees learned from different training sets generated by the same phenomenon are often very different, when they should be the same. Similar reasoning can be applied to optimization methods – beam search has lower bias, but higher variance, compared to greedy search, as it tries more hypotheses. Cross-validation can also be used to counter overfitting. To give an illustration, it can be used to choose the best size of decision tree to be learned, preventing it from becoming overly complex; hence, it generalizes better. However, too many parameter choices might themselves lead to overfitting [34]. There are more methods to counter overfitting besides cross-validation. The most popular is adding a regularization term to the evaluation function, which penalizes models with more structure, favoring smaller ones with less room to overfit. Ridge and Lasso regressions are two examples of popular regularization methods [35].
An alternative is to perform a statistical significance test, such as chi-square, before adding new structure, to see how much the class distribution changes with and without that new structure. That being said, skepticism is needed towards claims of techniques that solve overfitting, because it is easy to avoid overfitting (high variance) by going towards the opposite problem of underfitting (high bias).

Figure 3.3: Bias and variance contributing to total error.

The ideal case of null variance and null bias is not achievable in practice, as there is a tradeoff between them. With this tradeoff, the optimum model complexity is attained at the minimum of the combined bias and variance. This tradeoff is represented in Figure 3.3. From the optimum model complexity, the model starts to overfit if the complexity increases and starts to underfit if the complexity decreases.

3.5 Dimensionality

The biggest problem in ML after overfitting and underfitting is the "curse of dimensionality" (i.e. the number of used features). This expression was coined by Bellman in 1961 to express the fact that many algorithms that work fine in low dimensions become unmanageable when the input is high-dimensional [36]. In ML this problem is even bigger, because generalizing correctly becomes exponentially harder as the dimensionality of the examples grows; a fixed-size training set covers a very small fraction of the input space. Thankfully, there is an effect that partially counteracts this problem – the non-uniformity of the data. In most applications, examples are not spread uniformly throughout the instance space, but are concentrated on or near a lower-dimensional manifold. Learners can then implicitly take advantage of this lower effective dimension. There are also algorithms for explicitly reducing the dimensionality of the data [37]. In order to reduce dimensionality without losing much information, analysis techniques such as Principal Component Analysis (PCA) were developed. PCA provides a roadmap for how to reduce a complex data set to a lower dimension, revealing the sometimes hidden, simplified dynamics that often underlie it [38]. The data noise is measured by the SNR, which is determined by calculating data variances. The covariance between variables allows quantifying their redundancy by measuring the spread between them – the higher the spread, the lower the redundancy. With $a = [a_1\ a_2\ ...\ a_n]$ and $b = [b_1\ b_2\ ...\ b_n]$ as two variable vectors, the covariance between the two, $\sigma_{ab}^2$, is represented by the following dot product:

$\sigma_{ab}^2 = \frac{1}{n-1}\, ab^T$, (3.1)

where the leading term is a normalization constant. By building a covariance matrix between all variables, the sum of its diagonal values yields the overall variability. PCA then replaces the original variables with new ones, called Principal Components. Principal Components are orthogonal and have variances (called eigenvalues) in decreasing order, while maintaining the overall variability from the covariance matrix. Thus, it is possible to explain all the variance in the data by keeping all the eigenvalues. In order to reduce the data's dimensionality and choose the most relevant Principal Components, one can keep the Principal Components whose cumulative sum of eigenvalues satisfies a defined threshold. This cumulative sum of eigenvalues is a function called Cumulative Proportion of Variance Explained (CPVE). The CPVE of the first k Principal Components in a dataset with n variables is given as follows:

$CPVE_k = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$, (3.2)

where $\lambda_i$ is the eigenvalue of the i-th Principal Component and $k \leq n$. By defining a threshold of, say, 98%, it is possible to choose the Principal Components that reduce the data dimensionality while losing only a small fraction of the original variance. A short sketch of this procedure follows.
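The sketch below illustrates the CPVE-based selection of Principal Components of Equation (3.2), using NumPy and scikit-learn on invented synthetic data:

# Minimal sketch: choose the number of Principal Components via a CPVE threshold.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 8))                              # 200 samples, 8 variables
data[:, 4:] = data[:, :4] + 0.1 * rng.normal(size=(200, 4))  # make half of them redundant

pca = PCA().fit(data)
cpve = np.cumsum(pca.explained_variance_ratio_)  # CPVE_k of Equation (3.2), k = 1..n

threshold = 0.98
k = int(np.searchsorted(cpve, threshold) + 1)    # smallest k with CPVE_k >= 98%
print(k, cpve[k - 1])                            # e.g. k close to 4 on this data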
3.6 Feature Engineering

Feature engineering is the process of using domain knowledge of the data to create features that make ML algorithms work. The most important factor determining whether an ML project succeeds is the features used. If many independent variables that correlate well with the class are present, learning is easy. However, learning may not be possible if the class is a very complex function of the features. Most of the effort in ML projects goes into constructing features from raw data, because data in its raw form usually does not allow learning to happen. In order to have a successful ML project, the data should be relevant, well processed and plentiful. That is why most of the work and time invested in this kind of project goes into gathering, integrating, cleaning and pre-processing data, in addition to the trial and error that goes into feature design – evaluating models with different features and feature combinations. ML is an iterative process of running the classifier, analysing the results and modifying the data and/or the classifier. Learning is often the quickest part, while feature engineering is more difficult because it is domain-specific.

3.7 More Data and Cleverer Algorithms

If the best set of features has been obtained but the models are still not as accurate as intended, there are two ways to build better models – conceiving a better learning algorithm or gathering more data. ML researchers strive for the former; however, the quickest path to better classifiers is often to simply get more data. Pragmatically, a simple algorithm with enormous amounts of data can beat an intelligent one with a more modest amount of data. This brings another problem – scalability. In the 1980s, the main bottleneck was data, while today it is time. There are now enormous amounts of data available, but there is not enough time to process them within the intended requirements, so part of the data goes unused. In order to use more data in a shorter time window, faster ways to learn complex classifiers are being conceived [39]. Smarter algorithms have a small payoff because, to a first approximation, they all do the same. All classifiers essentially work by grouping nearby examples into the same class; the key difference is their meaning of "nearby". With non-uniformly distributed data, classifiers can produce very different class-separating frontiers, while still making approximately the same predictions (if enough training examples are given). As a rule of thumb, it is better to try the simplest classifiers first (e.g. naive Bayes before logistic regression, and kNN before SVM). More sophisticated classifiers are seductive, but usually harder to use, as they have more parameters to tune in order to get good results. Classifiers can be divided into two types: those whose representation has a fixed size, such as linear classifiers, and those whose representation can grow with the data, like decision trees. Fixed-size classifiers can only take advantage of the amount of data up to a certain point (with more and more data, their accuracy asymptotes to a certain value).
Variable-size classifiers can in principle learn any function given sufficient data, but not in practice, due to the algorithms' limitations (e.g. greedy search falls into local optimum solutions, not returning the global optimum solution) and to computational cost. Moreover, there still is the curse of dimensionality, for which no existing amount of data may be enough. Thus, clever algorithms can have a higher payoff if well designed, which is why ML projects have a significant component of classifier design [40]. A good approach to test how much the obtained models would improve if more data were added to the training set is the observation of learning curves. A learning curve shows a measure of predictive performance on a given domain as a function of some measure of varying amounts of learning effort [41]. Learning curves are most often presented with the predictive accuracy on the test samples as a function of the number of training examples, as in Figure 3.4.

Figure 3.4: A learning curve showing the model accuracy on test examples as a function of the number of training examples.

3.8 Classification in Multivariate Time Series

In ML, classification is the problem of identifying to which set of categories or classes a new observation belongs, on the basis of a training set that consists of previous observations with known classes [19]. In other words, a classification task involves separating data into training and testing sets, where each observation has a target value, known as the class label, and one or more attributes, known as features. The classification is done through an algorithm called a classifier, whose role is to produce a model, based on the training set, that predicts the classes of the test data given only the features of the testing set. A time series is a set of observations measured sequentially through time [42]. Time series analysis has turned into one of the most popular branches of statistics. Recent developments in computing have provided the basic infrastructure for fast access to vast amounts of data, which has allowed the analysis of time series in various sectors, such as the telecommunications, medical and financial sectors. Time series can be either univariate or multivariate. The former refers to a time series of single observations recorded sequentially through time, while the latter consists of sequences of values of several contemporaneous variables changing with time [43]. In order to perform time series classification, one has to choose between two options that depend on the type of the data:

• apply the One Nearest Neighbor (1NN) classifier with a distance metric, such as the Euclidean Distance or Dynamic Time Warping, in order to classify a time series as the most similar one in the training set;

• extract features from each time series and classify these features with an adequate classifier, such as SVM.

While Dynamic Time Warping is known to be one of the best performing univariate time series classification techniques, it is also very slow, as each comparison between two time series of length N has O(N²) complexity [44]. For multivariate time series composed of M individual series, with K multivariate time series in the testing set and a big training set with T multivariate time series, this results in a complexity of O(MTKN²), and it will thus be very slow, as each time series is compared to the entire training set.
Due to this limitation, Dynamic Time Warping is normally used together with methods that reduce the number of time series to compare against, at the cost of a slight reduction in accuracy. The second option can also deliver good accuracy scores and is usually much faster than the first option. The most common approach is to apply statistical calculations in order to extract data, such as the mean and the standard deviation, that can best characterize each time series with respect to its class. This extracted data is used as the features of each time series and is subsequently learned and classified by a chosen algorithm. One of the best algorithms for that process is SVM. As this work will involve working with large testing and training sets under time constraints, the second option will be the focus, as it is the fastest one; a minimal sketch of this approach follows.
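The sketch below assumes scikit-learn's SVM implementation and uses invented synthetic series; it summarizes each time series by its mean and standard deviation and classifies the resulting feature vectors:

# Minimal sketch of the feature-extraction option: statistical features + SVM.
import numpy as np
from sklearn.svm import SVC

def extract_features(series_batch: np.ndarray) -> np.ndarray:
    """series_batch: (n_series, n_timesteps) -> (n_series, 2) feature matrix."""
    return np.column_stack([series_batch.mean(axis=1), series_batch.std(axis=1)])

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(50, 96))   # e.g. 96 KPI samples per series
faulty = rng.normal(2.0, 1.5, size=(50, 96))   # shifted, noisier series
X = extract_features(np.vstack([normal, faulty]))
y = np.array([0] * 50 + [1] * 50)

model = SVC().fit(X, y)
print(model.predict(extract_features(rng.normal(2.0, 1.5, size=(1, 96)))))  # [1]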
3.9 Proposed Classification Algorithms

This section provides an in-depth explanation of each proposed classification algorithm that will be used in this Thesis. The first four classifiers belong to the family of ensemble classifiers, more specifically tree ensemble classifiers, and the last one is a conventional classifier.

3.9.1 Adaptive Boosting

Ensemble methods are an ML approach based on the concept of creating a highly accurate classifier by combining several weak and inaccurate classifiers. One of the ensemble techniques is boosting, which consists of a two-step approach. Firstly, it uses subsets of the original data to produce weakly performing models (high bias, low variance) and then boosts their performance by combining them together based on a chosen cost function. Namely, AB or AdaBoost was the first practical boosting algorithm and remains one of the most used and studied classifiers [26]. The AB algorithm pseudocode is shown in Algorithm 1.

Algorithm 1 The boosting algorithm AdaBoost (adapted from [25]).
Input: training data (x, y)
Output: the resulting model H
 1: function AdaptiveBoosting((x, y))
 2:   H ← Ø
 3:   for i = 1, ..., m do
 4:     W₁(i) ← 1/m
 5:   for t = 1, ..., T do
 6:     randomly choose a data subset Dₜ from the training data set according to the samples' weights
 7:     fit a weak learner hₜ(x, θₜ) from data subset Dₜ
 8:     measure the performance of hₜ(x, θₜ) by its weighted error εₜ
 9:     calculate the hypothesis weight αₜ
10:     for i = 1, ..., m do
11:       update Wₜ₊₁(i)
12:     H ← H ∪ {αₜ hₜ(x, θₜ)}
13:   return H

There are m labeled training samples $(x_1, y_1), ..., (x_m, y_m)$, where $x_i$ belongs to some domain $X$ and $y_i \in \{-1, +1\}$. A weight array $W_t(i)$ is initialized over the m training samples so that all samples have the same starting weight. A data subset $D_t$ is randomly obtained from the training set considering the weights of each sample (samples with higher weights are more likely to be chosen) for each iteration $t = 1, ..., T$. A weak classifier $h_t(x, \theta_t): X \to \{-1, +1\}$ is then fitted, with $\theta_t$ being its parameters; the aim of the weak classifier is to find a weak hypothesis with low weighted error $\varepsilon_t$ relative to $W_t(i)$, where:

$\varepsilon_t = \Pr_{i \sim W_t}[h_t(x_i, \theta_t) \neq y_i] = \sum_{i:\, h_t(x_i, \theta_t) \neq y_i} W_t(i)$. (3.3)

Afterwards, a weight $\alpha_t$ is assigned to the hypothesis resulting from $h_t(x, \theta_t)$:

$\alpha_t = \frac{1}{2} \ln\!\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)$. (3.4)

At the end of each iteration, the weight of each sample $W_t(i)$ is updated proportionally to its weighted classification error:

$W_{t+1}(i) = \frac{W_t(i)\, \exp(-\alpha_t y_i h_t(x_i, \theta_t))}{Z_t}$, (3.5)

where $Z_t$ is a normalization factor. The final hypothesis obtained from model H computes the sign of a weighted combination of weak hypotheses, working as a majority vote of the weak hypotheses $h_t(x, \theta_t)$ with weights $\alpha_t$:

$H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x, \theta_t)\right)$. (3.6)

The AB algorithm is often used with Decision Trees as weak learners. A Decision Tree classifies a sample according to which leaf node it reaches, after passing through several conditions at each branch split or decision point. These conditions are based on comparisons of the features of the sample and determine the path taken by the sample across the tree. A simple example of a Decision Tree is shown in Figure 3.5, where the decision of whether a football match should be played depends on the weather conditions. Decision Trees are used because they are easily interpretable; they allow for nonlinear data classification; they give different importances to features, performing feature selection; and they are fast at classifying data. However, Decision Trees have a key disadvantage, which shows up whenever a Decision Tree does not have a growth limit – it easily overfits to the training data. Boosting a Decision Tree increases its resistance to overfitting, provided the weak learners' accuracy is higher than random guessing. Two situations, of resistance to overfitting and of occurrence of overfitting in boosting, are shown in Figure 3.6.

Figure 3.5: Example of a Decision Tree to decide whether a football match should be played based on the weather (adapted from [45]).

If using Decision Trees, AB has as regularization parameters: the maximum depth limit of the tree, the minimum number of samples required to create a leaf node and the minimum number of samples required to split an internal node. AB can also use another regularization parameter, the learning rate, with the aim of shrinking the contribution of each weakly trained model to the ensemble model. This regularization technique is known as shrinkage, and it has been shown to dramatically increase test set accuracy, because it leads to smaller steps that allow the loss function minimum to be approached more precisely. A minimal usage sketch follows Figure 3.6.

Figure 3.6: Left: the training and test percent error rates using boosting on an Optical Character Recognition dataset, which do not show any signs of overfitting [25]. Right: the training and test percent error rates on a heart-disease dataset, which after five iterations reveal overfitting [25].
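The sketch below shows AB with Decision Trees as weak learners, including the regularization parameters named above; it assumes scikit-learn 1.2 or later (older releases pass the weak learner as base_estimator instead of estimator), and the dataset and parameter values are illustrative:

# Minimal usage sketch of AdaBoost with regularized Decision Tree weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

weak_learner = DecisionTreeClassifier(max_depth=2,         # tree depth limit
                                      min_samples_leaf=5,
                                      min_samples_split=10)
model = AdaBoostClassifier(estimator=weak_learner,
                           n_estimators=100,   # T boosting iterations
                           learning_rate=0.5,  # shrinkage
                           random_state=0).fit(X, y)
print(model.score(X, y))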
3.9.2 Gradient Boost

GB is another popular boosting algorithm for creating collections of classifiers. To make a quick distinction between GB and AB: the latter varies each classifier's training set towards the samples with higher weighted error, in order to minimize the overall classification error; GB calculates the negative gradient of a loss function (the direction of quickest improvement) and picks the weak learner that is closest to that gradient to add to the model [28].

Algorithm 2 The boosting algorithm Gradient Boost (adapted from [27]).
Input: training data (x, y)
Output: the function estimate f̂
1: function GRADIENTBOOST((x, y))
2:   f̂_0 ← Ø
3:   for t = 1, ..., T do
4:     compute the negative gradient g_t(x)
5:     fit a new weak learner h(x, θ_t)
6:     find the best gradient descent step-size ρ_t
7:     update the function estimate f̂_t
8:   return f̂

The GB algorithm pseudocode is shown in Algorithm 2. Let f be the unknown functional dependence f : x → y, where x is an input variable and y its respective label. The goal is to obtain an estimate f̂ (i.e. a model) such that f̂(x) = y, by minimizing a loss function ψ(y, f):

\hat{f}(x) = \underset{f(x)}{\mathrm{argmin}}\ \psi(y, f(x)). \quad (3.7)

In order to minimize the loss function ψ(y, f), GB chooses a weak learner h(x, θ_t) that is closest to the negative gradient g_t(x_i) along the training data in each iteration t = 1, ..., T:

g_t(x) = E_y\!\left[\frac{\partial \psi(y, f(x))}{\partial f(x)} \,\middle|\, x\right]_{f(x) = \hat{f}_{t-1}(x)}, \quad (3.8)

where E_y is the expectation over y. Instead of searching for the general solution for the boost increment in the function space, one can choose the new function increment to be the most correlated with −g_t(x). This allows the replacement of a potentially complex optimization task with the simple and classic least-squares minimization task:

(\rho_t, \theta_t) = \underset{\rho, \theta}{\mathrm{argmin}} \sum_{i=1}^{M} \left[-g_t(x_i) + \rho\, h(x_i, \theta)\right]^2, \quad (3.9)

and

\rho_t = \underset{\rho}{\mathrm{argmin}} \sum_{i=1}^{M} \psi\!\left[y_i, \hat{f}_{t-1}(x_i) + \rho\, h(x_i, \theta_t)\right], \quad (3.10)

where M is the total number of samples in the training set and ρ_t is the gradient step size of iteration t. At the end of each iteration t, the function estimate f̂_t is updated as follows:

\hat{f}_t = \hat{f}_{t-1} + \rho_t\, h(x, \theta_t). \quad (3.11)

The performance of GB depends heavily on the chosen loss function ψ(y, f(x)) and weak learner h(x, θ). The weak learner chosen for this study will be Decision Trees, for the reasons explained in the previous section. A common loss function is the mean squared error:

\psi(y, f(x)) = \sum_{i=1}^{M} \left[y_i - \hat{y}_i\right]^2, \quad (3.12)

where ŷ_i is the predicted output. Another popular loss function is the logistic loss for logistic regression:

\psi(y, f(x)) = \sum_{i=1}^{M} \left[y_i \ln(1 + \exp(-\hat{y}_i)) + (1 - y_i) \ln(1 + \exp(\hat{y}_i))\right]. \quad (3.13)

By using Decision Trees as weak learners, GB can avoid overfitting by having as regularization parameters the maximum depth limit of the tree, the minimum number of samples required to create a leaf node and the minimum number of samples required to split an internal node. Similarly to AB, GB also allows for shrinkage, and its learning rate can be changed to reduce overfitting to the data.
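As an illustration only, the sketch below shows how such a GB ensemble could be configured in Scikit-learn; the loss name follows recent library versions, and the dataset and hyperparameter values are placeholders rather than the settings used in this Thesis.

# A brief Gradient Boost sketch; each tree fits the negative gradient g_t(x).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

model = GradientBoostingClassifier(
    loss="log_loss",     # logistic loss, as in (3.13)
    learning_rate=0.1,   # shrinkage of each tree's contribution
    n_estimators=100,    # number of boosting iterations T
    max_depth=3,         # tree depth limit (regularization)
    min_samples_leaf=5,  # minimum number of samples per leaf node
)
model.fit(X, y)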
3.9.3 Extremely Randomized Trees

Inside the family of ensemble methods there is another type of technique besides boosting: bagging inspired algorithms. In general, these algorithms aim to control the generalization error through perturbation and averaging of weak learners (e.g. Decision Trees). One of those algorithms is ERT, which belongs to the family of tree ensemble algorithms and stands out by strongly randomizing both the feature and the cut-point choice while splitting a tree node. In the extreme case, it builds fully randomized and fully grown trees from the whole training set (low bias, high variance), whose structures are independent of the output values of the learning sample. The general classification procedure of a tree ensemble algorithm is shown in Figure 3.7, where a prediction y is obtained by a majority vote of all the generated Decision Trees with sample x as input.

Figure 3.7: A general tree ensemble algorithm classification procedure.

The ERT classifier builds an ensemble of unpruned Decision Trees according to the classical top-down procedure. ERT differs from other tree-based ensemble methods in that it splits nodes by choosing cut-points fully or partially at random, and it uses the whole training dataset to grow the trees.

Algorithm 3 The Extremely Randomized Trees splitting algorithm (adapted from [29]).
Input: the local learning subset S corresponding to the node we want to split
Output: a split [f < f_c] or nothing
1: function SPLITANODE(S)
2:   if STOPSPLIT(S) is TRUE then return NULL
3:   else
4:     select K features f_1, ..., f_K among all non constant (in S) candidate features
5:     draw K splits s_1, ..., s_K, where s_i = PICKARANDOMSPLIT(S, f_i), ∀i = 1, ..., K
6:     return a split s* such that Score(s*, S) = max_{i=1,...,K} Score(s_i, S)

Inputs: a subset S and a feature f
Output: a split
1: function PICKARANDOMSPLIT(S, f)
2:   let f_max^S and f_min^S denote the maximal and minimal value of f in S
3:   draw a random cut-point f_c uniformly in [f_min^S, f_max^S]
4:   return the split [f < f_c]

Inputs: a subset S
Output: a boolean
1: function STOPSPLIT(S)
2:   if |S| < n_min then return TRUE
3:   else if all attributes are constant in S then return TRUE
4:   else if the output is constant in S then return TRUE
5:   else return FALSE

The ERT splitting algorithm pseudocode is shown in Algorithm 3. Its main parameters are: K, the number of randomly selected features at each node; n_min, the minimum sample size for splitting a node; and M, the total number of trees to grow in the ensemble. Each tree is grown using the full training set to generate the ensemble model. The K parameter determines the strength of the feature selection process, n_min the strength of output noise averaging, and M the effectiveness of the variance reduction achieved by aggregating the ensemble model. In the end, the predictions of all trees are aggregated through a majority vote to return the final prediction.

The ERT classifier aims to strongly reduce variance through the full randomization of the cut-point and feature choice, combined with ensemble averaging, as opposed to the weaker randomization schemes used by other methods. By training each weak learner with the full training set instead of data subsets, ERT also minimizes bias. Regarding computational performance, the tree growing complexity is similar to that of a simple Decision Tree. However, as each node splitting procedure is totally random, ERT is expected to be faster than other tree ensemble methods, which locally optimize cut-points. Being based on Decision Trees as weak learners, ERT can avoid overfitting by having as regularization parameters the maximum depth limit of the tree, the minimum number of samples required to create a leaf node and the minimum number of samples required to split an internal node.
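For illustration, the sketch below maps the parameters K, n_min and M onto Scikit-learn's ExtraTreesClassifier; the values shown are arbitrary examples, not the ones tuned in this Thesis.

# An ERT sketch; bootstrap=False grows each tree on the whole training set.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

model = ExtraTreesClassifier(
    n_estimators=100,      # M, the number of trees in the ensemble
    max_features=3,        # K, features drawn at random at each node
    min_samples_split=10,  # n_min, minimum sample size for splitting a node
    bootstrap=False,       # use the full training set for every tree
)
model.fit(X, y)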
3.9.4 Random Forest

RF is another bagging inspired algorithm in the family of tree ensemble algorithms. Similarly to ERT, the basic premise of RF is that building a small Decision Tree with few features is a computationally cheap process. Furthermore, several small and weak trees can be grown in parallel, and this set of Decision Trees then results in a strong classifier by averaging or by majority vote, which can be observed once more in Figure 3.7.

Algorithm 4 The Random Forest algorithm.
Input: the training set S, features F and number of trees in forest B
Output: the resulting model H
1: function RANDOMFOREST(S, F, B)
2:   H ← Ø
3:   for i ∈ 1, ..., B do
4:     S_i ← a data subset from S
5:     h_i ← RANDOMIZEDTREELEARN(S_i, F)
6:     H ← H ∪ {h_i}
7:   return H

Inputs: a subset S_i and features F
Output: a learned tree h_i
1: function RANDOMIZEDTREELEARN(S_i, F)
2:   h_i ← Ø
3:   for each generated node n in tree h_i do
4:     f ← a very small subset of F
5:     split on the best feature in f
6:   return the learned tree h_i

The RF pseudocode is shown in Algorithm 4. RF is similar to ERT with the exception of two steps: first, it uses data subsets for growing its trees (whereas ERT uses the whole training dataset); second, it chooses the splitting feature from a very small subset of the features (whereas ERT chooses random features among all features). The resulting RF model is first initialized; then, for each tree grown in the ensemble, a data subset S_i for the i-th tree is drawn from the training set S. Each Decision Tree is grown using a modified learning algorithm that only uses a small subset f ⊂ F of all features to perform a node split, where F is the total set of features. By limiting the split to a small subset of features, RF allows for drastically faster learning compared to standard Decision Trees, because the split search is the most computationally expensive step in Decision Tree growing. Additionally, by using small feature subsets f, RF increases the chance of growing uncorrelated weak learners, because ensembles of standard Decision Trees tend to produce splits made by the same features, resulting in more correlated outcomes. The more uncorrelated the weak learners, the better the ensemble algorithm is at predicting outcomes. By using Decision Trees as weak learners, RF can also increase its resistance to overfitting by having as regularization parameters the maximum depth limit of the tree, the minimum number of samples required to create a leaf node and the minimum number of samples required to split an internal node.
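The following sketch highlights, again with illustrative values only, the two points where RF departs from ERT in Scikit-learn: bootstrap data subsets and a very small feature subset per split.

# A RF sketch; each tree sees a bootstrap subset S_i and splits on a small
# random feature subset f, which decorrelates the weak learners.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

model = RandomForestClassifier(
    n_estimators=100,     # B, the number of trees in the forest
    max_features="sqrt",  # size of the feature subset f used at each split
    bootstrap=True,       # draw a data subset S_i from S for each tree
)
model.fit(X, y)
print(model.predict(X[:5]))  # majority vote over the B trees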
3.9.5 Support Vector Machines

SVMs aim to separate data points of different classes through hyperplanes that define decision boundaries. They are capable of handling linear and non-linear classification tasks. The main idea behind SVMs is to map the original observations from the input space into a high-dimensional feature space, such that the classification problem becomes simpler. The mapping is performed by a suitable choice of kernel function and is represented in Figure 3.8.

Figure 3.8: Data mapping from the input space to a high-dimensional feature space to obtain a linear separation (adapted from [21]).

Consider a training data set {x_i, y_i}_{i=1}^{N}, with x_i ∈ R^d being the input vectors and y_i ∈ {−1, +1} the class labels. SVMs map the d-dimensional input vector x from the input space to the d_h-dimensional feature space using a linear or nonlinear function φ(·) : R^d → R^{d_h}. The hyperplane that separates the classes in the feature space is defined as w^T φ(x) + b = 0, with b ∈ R and w an unknown vector with the same dimension as φ(x). An observation x is assigned to the first class if f(x) = sign(w^T φ(x) + b) equals +1, or to the second class if f(x) equals −1.

SVMs are based on the maximum margin principle and aim at constructing the hyperplane with maximum distance between the two classes, as can be seen in Figure 3.9.

Figure 3.9: The hyperplane constructed by SVMs that maximizes the margin (adapted from [21]).

However, in most real life applications the data of both classes overlap, which makes a perfect linear separation impossible. Thus, a certain number of misclassifications around the margin must be tolerated. The resulting optimization problem for SVMs, where the violation of the constraints is penalized, is written as

\min_{w, \xi, b}\ \mathcal{J}(w, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{N} \xi_i, \quad (3.14)

such that

y_i (w^T \varphi(x_i) + b) \geq 1 - \xi_i, \quad i = 1, ..., N, \quad (3.15)

and

\xi_i \geq 0, \quad i = 1, ..., N, \quad (3.16)

where C is a positive regularization constant and ξ_i is a slack variable that states whether a sample x_i is between the margin and the correct side of the hyperplane or not. The regularization constant C in the cost function defines the trade-off between a large margin and the misclassification error. A low C results in a smooth decision boundary, whilst a high C aims at classifying all training examples correctly. SVMs respect the principle of structural risk minimization, balancing model complexity (i.e. the first term in (3.14)) and empirical error (i.e. the second term in (3.14)) through regularization. Regarding the distance of x_i to the decision boundary:

• ξ_i ≥ 1 : y_i(w^T φ(x_i) + b) < 0 implies that the decision function and the target have different signs, meaning that x_i is misclassified;
• 0 < ξ_i < 1 : x_i is correctly classified, but is located inside the margin;
• ξ_i = 0 : x_i is correctly classified and is located either outside the margin or on the margin boundary.

The optimization problem in (3.14) to (3.16) is typically referred to as the primal optimization problem. It can be rewritten in the dual space using the Lagrange multipliers α_i ≥ 0 associated with the first set of constraints (3.15). The solution for the Lagrange multipliers is obtained by solving a quadratic programming problem, which leads to the SVM classifier taking the form

f(x) = \mathrm{sign}\!\left(\sum_{i=1}^{\#SV} \alpha_i\, y_i\, K(x, x_i) + b\right), \quad (3.17)

where #SV represents the number of support vectors and the kernel function K(·, ·) is positive definite and satisfies Mercer's condition, i.e. K(x, x_i) = φ(x)^T φ(x_i). While solving the optimization problem, only K(·, ·) is used, never φ(·) explicitly. Accordingly, this allows SVMs to work in a high-dimensional feature space without performing calculations in it. One can choose one of several types of kernels, such as:

• Linear SVM: K(x, z) = x^T z;
• Polynomial SVM of degree d: K(x, z) = (τ + x^T z)^d, τ ≥ 0;
• Radial Basis Function (RBF): K(x, z) = exp(−‖x − z‖² / (2σ²)),

where K(·, ·) is positive definite for all σ values in the RBF kernel case and for all τ ≥ 0 values in the polynomial case. For the RBF case, ‖x − z‖² is the squared Euclidean distance between two feature vectors and σ is a free parameter. The RBF kernel also admits a simpler definition with γ = 1/(2σ²), which defines how much weight a single training example has on the decision boundary. The aforementioned kernels result in global and unique solutions for (3.14) to (3.16).

The SVM classifier has a notable property called sparseness, meaning that a number of the resulting Lagrange multipliers α_i equal zero. Therefore, the sum in (3.17) only runs over the nonzero α_i values (i.e. the support values) instead of over all data points. The corresponding vectors x_i are referred to as support vectors; these data points are located close to the decision boundary and aid in the construction of the separating hyperplane.
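As a minimal sketch, assuming Scikit-learn's SVC implementation, the RBF-kernel classifier described above could be set up as follows; the C and γ values are illustrative, and the standardization step anticipates the scaling requirement discussed in Chapter 4.

# A SVM sketch with the RBF kernel K(x, z) = exp(-gamma * ||x - z||^2).
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

model = make_pipeline(
    StandardScaler(),  # SVMs are sensitive to feature scales
    SVC(kernel="rbf",
        C=100,         # trade-off between margin width and slack penalty
        gamma=1e-4),   # gamma = 1 / (2 * sigma^2)
)
model.fit(X, y)        # only the support vectors define the boundary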
Succinctly, the main strengths of SVMs lie in their scalability to high dimensional data, their regularization parameters (such as C and the RBF kernel's γ), which help avoid overfitting, and their ease of training (absence of local optima), while their main weakness lies in the dependence on a suitable kernel to function properly [46].

3.10 Classification Model Evaluation

Current research in ML has moved away from simply presenting accuracy results when performing an empirical validation of new algorithms. Accuracy scores can simply be obtained by dividing the number of correct predictions of a classifier by the total number of examples in the test set – the closer to 1, the better. It has been argued that accuracy scores can be misleading, it being recommended to use Receiver Operator Characteristic (ROC) curves for binary decision problems [47]. ROC curves show how the number of correctly classified positive examples varies with the number of incorrectly classified negative examples. However, ROC curves can give an overly optimistic view of an algorithm's performance if there is a large skew in the class distribution. Precision-Recall (PR) curves, often used in Information Retrieval [48, 49], have been cited as an alternative to ROC curves for tasks with a large skew in the class distribution [50, 51, 52, 53, 54, 55].

In a binary decision problem, a classifier labels examples as either positive or negative, and its decisions can be represented in a structure known as the confusion matrix. The confusion matrix has four categories:

• True Positive (TP) – positive examples correctly labeled as positive;
• False Positive (FP) – negative examples incorrectly labeled as positive;
• True Negative (TN) – negative examples correctly labeled as negative;
• False Negative (FN) – positive examples incorrectly labeled as negative.

The confusion matrix is shown in Table 3.2 and is useful to construct a point in PR space.

Table 3.2: Confusion Matrix (adapted from [31]).

                    Predicted positive   Predicted negative   Total
Actual positive     TP                   FN                   TP + FN
Actual negative     FP                   TN                   FP + TN

The Recall, Precision and Accuracy metrics are defined as:

Recall = \frac{TP}{TP + FN}, \quad (3.18)

Precision = \frac{TP}{TP + FP}, \quad (3.19)

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}, \quad (3.20)

where Recall measures the fraction of positive examples that are correctly labeled, Precision measures the fraction of examples classified as positive that are truly positive, and Accuracy measures the fraction of correctly classified examples. Precision can be thought of as a measure of a classifier's exactness – a low Precision indicates a large number of FPs – while Recall can be thought of as a measure of a classifier's completeness – a low Recall indicates many FNs. The Precision and Recall metrics are often combined through their harmonic mean, known as the F-measure [56], which can be formulated as follows:

F = \frac{(1 + \beta^2) \times Recall \times Precision}{(\beta^2 \times Precision) + Recall}, \quad (3.21)

where β allows weighting either Precision or Recall more heavily, with both being balanced when β = 1 [57]. ML projects that want to minimize the number of FPs, at the cost of potentially more FNs, should use (3.21) with β < 1, weighting the Precision metric more heavily. Conversely, ML projects that want to minimize the number of FNs, at the cost of potentially more FPs, should use β > 1 instead, weighting the Recall metric more heavily.
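A short sketch of how these metrics can be computed with Scikit-learn is given below; the labels are fabricated for illustration, and beta = 0.5 exemplifies the FP-averse setting (β < 1) favoured in this Thesis.

# Computing the Section 3.10 metrics from a toy set of predictions.
from sklearn.metrics import (confusion_matrix, precision_score,
                             recall_score, fbeta_score)

y_true = [1, 1, 0, 0, 1, 0, 0, 1]  # illustrative labels (1 = positive)
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]  # illustrative classifier decisions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # Table 3.2 entries
print(precision_score(y_true, y_pred))        # TP / (TP + FP), as in (3.19)
print(recall_score(y_true, y_pred))           # TP / (TP + FN), as in (3.18)
print(fbeta_score(y_true, y_pred, beta=0.5))  # (3.21) weighting Precision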
Chapter 4

Physical Cell Identity Conflict Detection

4.1 Introduction

This chapter introduces the PCI conflict problem that can occur in LTE radio networks, as well as its subcategories – confusions and collisions. Furthermore, the steps taken towards achieving the best approach to detect PCI conflicts, by using ML models to analyse daily KPI measurements, are presented.

Each LTE cell has two identifiers with different purposes – the Global Cell ID and the PCI. The Global Cell ID is used to identify the cell from an Operations, Administration and Management (OAM) perspective. The PCI has a value in the range of 0 to 503, and is used to scramble the data in order to allow mobile phones to separate information from different eNBs. Since an LTE network may contain a much larger number of cells than the 504 available PCI values, the same PCI must be reused by different cells. However, a UE, which is any device used directly by an end-user to communicate, cannot distinguish between two cells if both have the same PCI and frequency band; this phenomenon is called a PCI conflict.

PCI conflicts can be divided into two situations – PCI confusions and PCI collisions. A PCI confusion occurs whenever an E-UTRAN cell has two different neighbor E-UTRAN cells with equal PCI and frequency. A PCI collision happens whenever an E-UTRAN cell has a neighbor E-UTRAN cell with identical PCI and frequency. These two events are represented in Figure 4.1.

Figure 4.1: PCI Confusion (left) and PCI Collision (right).

A good PCI plan can avoid most PCI conflicts. However, it can be difficult to devise such a plan without any PCI conflicts in a dense network. Moreover, network changes, namely increased power of a cell and radio channel fading, can lead to PCI conflicts; these changes might result in a mobile phone detecting a cell different from the one in the PCI plan. PCI conflicts can lead to an increase in dropped calls due to failed handovers, as well as to increased channel interference.

This chapter is organised in five sections. After this introduction, Section 4.2 presents the chosen KPIs that are relevant for the PCI conflict classification task. Section 4.3 showcases a PCI conflict classification task based on a network vendor equipment feature for PCI conflict reporting. Section 4.4 presents a new PCI conflict classification approach based on configured global cell relations, testing the following three hypotheses:

1. PCI conflicts are better detected by using KPI measurements at the daily peak traffic instant of each cell.
2. PCI conflicts are better detected by extracting statistical calculations from each KPI daily time series and using them as features.
3. PCI conflicts are better detected by using each cell's KPI measurements in each day as individual features.

Lastly, Section 4.5 presents the preliminary conclusions of this chapter. The overall PCI conflict detection procedure using the configured global cell relations can be observed in Figure A.1.

4.2 Key Performance Indicator (KPI) Selection

The first step towards reaching the objective of this investigation was to gather a list of all the available network vendor LTE KPIs and their respective documentation. In accordance with the theory behind LTE and how PCIs are used, a new list containing the most relevant KPIs for PCI conflict detection was obtained. These KPIs are represented in Tables 4.1 and 4.2.
A brief time series analysis of these KPIs, regarding 4200 cells over a single day, is also represented in Figure 4.2.

Table 4.1: Chosen Accessibility and Integrity KPIs.

Accessibility            Integrity
RandomAcc Succ Rate      DL Latency ms
                         DL Avg Cell Throughput Mbps
                         DL Avg UE Throughput Mbps

Table 4.2: Chosen Mobility, Quality and Retainability KPIs.

Mobility                       Quality                      Retainability
IntraFreq Prep HO Succ Rate    Average CQI                  Service Drop Rate
IntraFreq Exec HO Succ Rate    UL PUCCH Interference Avg    Service Establish
ReEst during HO Succ Rate      UL PUSCH Interference Avg

Regarding Accessibility, RandomAcc Succ Rate refers to the success rate of random access procedures made through the PRACH, and it is relevant for detecting PCI conflicts, since PCIs are used for signal synchronization and random access procedures. Thus, PCI conflicts can lead to the corruption of the PRACH, reducing the success rate of random access procedures [58].

In Integrity, DL Latency ms measures the average time it takes for a small IP packet to travel from the UE to the Internet server and back. DL Latency ms is relevant for detecting PCI conflicts, as processed handovers to unexpected PCI conflicting cells that are far away from the UE report higher downlink latency, due to a higher than normal distance to the target cell. The last two KPIs measure the average cell and UE downlink throughput, respectively. They were chosen because PCI values are related to the positioning of the reference signals, and PCI conflicts may result in reference signal collisions; these reference signal collisions result in lower average downlink throughput for both cells and UEs [59].

In Mobility, IntraFreq Prep HO Succ Rate measures the success rate of the handover preparation between cells in the same frequency band, and IntraFreq Exec HO Succ Rate refers to the success rate of processed handovers between cells in the same frequency band. These KPIs are relevant for detecting PCI conflicts, as UEs may initiate handovers to the wrong cell, one with the same PCI and frequency band other than the one intended, resulting in more frequently failed handovers [58]. ReEst during HO Succ Rate measures the success rate of handover re-establishment to the target cell. It is relevant for detecting PCI conflicts, as processed handovers to unexpected cells may not be re-established, due to low coverage by the target cell, reducing the handover re-establishment success rate.

In Quality, UL PUCCH Interference Avg and UL PUSCH Interference Avg measure the average noise and interference power on the Physical Uplink Control Channel (PUCCH) and on the PUSCH, respectively. These KPIs are relevant for PCI conflict detection because PCI conflicting cells share the same frequency band and might exhibit higher noise and interference. Average CQI measures the average CQI, which is relevant for identifying PCI conflicting cells because they share the same frequency band and reference signals, resulting in a situation where the channel quality might be lower than normal.

Regarding Retainability, Service Drop Rate measures the drop rate of all services in a cell. It is relevant for detecting PCI conflicts, as service drops can happen whenever a UE attempts to perform a handover to a PCI conflicting cell and fails both the handover and the handover re-establishment. The last KPI, Service Establish, measures the total number of established services during a period, and was chosen to differentiate cells with different amounts of traffic.
Regarding Figure 4.2, it represents the distribution of each KPI's values for 4200 LTE cells over a single day. The Interquartile Range (IQR) represents the range where 50% of the data is distributed. The IQR is obtained as IQR = Q3 − Q1, where Q3 and Q1 are the third and first quartiles of each KPI measure across a single day. The Upper and Lower Fences correspond to Q3 + 1.5 × IQR and Q1 − 1.5 × IQR, respectively; these limits are used to check for outlier values, which are all the values outside of the fences. The represented outlier KPI values refer to the minimum and/or maximum KPI values, registered for all cells, that lie outside the Upper and Lower Fences.

Figure 4.2: Time series analysis of KPI values regarding 4200 LTE cells over a single day (median KPI values, outlier KPI values, Lower and Upper Fences, and Interquartile Range).

As expected, Service Establish reflects the regular traffic of mobile networks across a single working day, with its maximum peaks occurring at lunch and late afternoon periods, and minimum traffic during the night. The traffic has a visible effect on the remaining KPIs, with high traffic leading, for instance, to a lower CQI and a lower random access procedure success rate. It can be easily observed that, with the exception of the Average CQI and RandomAcc Succ Rate KPIs, the outlier KPI values can go well beyond the Upper and Lower Fences. It can also be noted that the IntraFreq Prep HO Succ Rate and ReEst during HO Succ Rate median values are very close to 1, but with outlier values that can go as low as 0, which can be explained by cell malfunctioning problems. Furthermore, the Service Drop Rate median values are very close to 0, but outlier values sit around a ratio of 0.2 and can go up to a ratio of 1. This reveals the highly variable nature of all mobile network KPI values.
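The fence computation behind this outlier analysis can be sketched as follows, using pandas; the series kpi is a hypothetical stand-in for one day of measurements of a single KPI across all cells.

# Computing the IQR and the fences used in Figure 4.2 to flag outliers.
import pandas as pd

kpi = pd.Series([0.91, 0.95, 0.97, 0.99, 1.0, 1.0, 0.4, 1.0])  # illustrative

q1, q3 = kpi.quantile(0.25), kpi.quantile(0.75)
iqr = q3 - q1           # Interquartile Range, IQR = Q3 - Q1
lower = q1 - 1.5 * iqr  # Lower Fence
upper = q3 + 1.5 * iqr  # Upper Fence
print(kpi[(kpi < lower) | (kpi > upper)])  # values outside the fences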
4.3 Network Vendor Feature Based Detection

The next step was to collect the selected KPI data from a real MNO's LTE network, measured by equipment of a network vendor. The data gathered from the mobile network operator had an average of 7% of missing values for KPI data and 45% of missing values for the PCI conflict CM parameter. The missing KPI values can be due to failed measurements done by the cells, while the missing PCI conflict CM parameter values may mean that the PCI Conflict Detection feature was inactive or unavailable in 45% of the cells. The PCI Conflict Detection feature was used to label each cell as conflicting or nonconflicting.

It was decided to use the Service Establish KPI to find the 15 minute period of each cell with the highest number of established services and, thus, the highest traffic; additionally, the two previous and the two following 15 minute measurements were also recorded. This decision was taken because, when LTE cells have conflicts, these are usually more noticeable through the evaluation of the cell's KPIs at peak traffic instants. This resulted in a total of 5 measurements of 15 minute periods each, totalling a period of 1 hour and 15 minutes for each cell, with the most demanding 15 minute period in the middle.

The tsfresh feature extraction package was then used to apply statistical calculations to all the KPI time series of all the cells from two consecutive days, and to retrieve the most important results through hypothesis testing [60]. Tsfresh applies hypothesis testing to each statistical calculation obtained from each KPI of each cell, based on the respective cell class, and selects the most relevant ones. Tsfresh selected 87 different statistical features with the highest contribution to the classification problem; these were obtained from the Service ReEst Succ Rate and ReEst during HO Succ Rate KPIs.

The AB, GB, ERT, RF and SVM classifiers were applied to the 87 selected features, due to their known high classification performance [61, 62]. The highest Precision was obtained with the SVM classifier. The best performing hyperparameters obtained from a 10-fold cross-validation were C = 100, γ = 10^−4 and a stopping criterion tolerance of tol = 10^−3 (the difference between an observation's distance to the previous iteration's margin and to the current one). The best performing kernel for the SVM classifier was the RBF kernel. Furthermore, the data needed to be standardized, as the SVM classifier expects the values to range in either [−1, 1] or [0, 1].

The evaluation results were obtained after applying 100 iterations of k-fold cross-validation with k = 10, followed by a reshuffling of the data, in order to maximize the generalization of the results. The data consisted of 2919 cells, of which 1842 were nonconflicting cells and 1077 were PCI conflicting cells. The resulting confusion matrix, with the sum of all FPs, FNs, TPs and TNs obtained over the iterations, as well as the obtained model evaluation metrics, are presented in Tables 4.3 and 4.4, respectively.

Table 4.3: The obtained cumulative Confusion Matrix.

                              Actual PCI Conflicting    Actual Nonconflicting
Predicted PCI Conflicting     1222                      689
Predicted Nonconflicting      106478                    183511
Total                         107700                    184200

Table 4.4: The obtained Model Evaluation metrics.

Accuracy                     63.28%
Precision                    63.95%
Recall                       1.13%
F-measure                    2.23%
Training Duration Average    0.020 ms
Testing Duration Average     987 ms

The Precision score was quite low, even with the very small Recall score; since the number of FPs is required to be as low as possible, the Precision score needs to be as high as possible. The Accuracy score was marginally above the Accuracy of a majority classifier, in which all cells are classified as nonconflicting:

\frac{689 + 183511}{689 + 183511 + 1222 + 106478} \times 100\% = 63.1\%. \quad (4.1)

The F-measure was not very informative with these results, as both the Recall and Precision scores were low. Regarding the Training and Testing Duration Averages, they were quite low, which was a positive point.
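For reference, a rough sketch of the extraction and selection steps described above is shown below, assuming tsfresh's documented interface; kpi_df and labels are hypothetical placeholders for the gathered KPI time series and the cell labels, not the actual data layout used.

# Sketch of tsfresh feature extraction followed by hypothesis-test selection.
from tsfresh import extract_features, select_features
from tsfresh.utilities.dataframe_functions import impute

# kpi_df: long-format DataFrame with one row per 15 minute KPI measurement,
# holding a cell identifier, a timestamp and the KPI value columns.
features = extract_features(kpi_df, column_id="cell_id",
                            column_sort="timestamp")
impute(features)  # replace NaN/inf values produced by some calculators

# labels: Series with one conflict label per cell, indexed like `features`;
# select_features keeps only the statistically relevant features.
relevant = select_features(features, labels)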
A possible reason for the low model performance could have been the selection of KPIs and data time periods, as well as the used feature extraction methods, which could have led to suboptimal results. Nevertheless, it was the best performing approach among those tried, as using the total daily data or extracting simple statistical measurements, such as the mean and standard deviation, led to worse outcomes.

4.4 Global Cell Neighbor Relations Based Detection

In light of the results obtained in the previous section, the best approach was to go back to the most fundamental part of building ML models – checking the quality of the data. The documentation of the network vendor's PCI Conflict Detection feature, which was used for labeling, was confusing even for experienced engineers who work with the network vendor equipment. This fact raised doubts concerning the quality of the detection made by the feature, and resulted in an investigation to verify the quality of the labeling done by the network vendor feature.

Thanks to a product developed by CELFINET, it is possible to know the PCIs and frequency bands of all the configured neighbor cells of each cell, even across equipment from different vendors. Otherwise, it would be very difficult, if not impossible, to verify the quality of the network vendor feature detection. Two Structured Query Language (SQL) scripts were developed: one to detect PCI confusions – checking for configured neighbor cells with equal PCI and frequency bands – and one to detect PCI collisions – checking for configured neighbor cells with the same PCI and frequency band as the source cell. It was found that the detection offered by the network vendor feature was very different from what was obtained with those scripts: cells where one or more cases of PCI confusion were detected through the scripts were in fact labeled as nonconflicting by the network vendor feature, and the same was observed for PCI collisions. In light of these results, it was decided not to use the network vendor feature and to use instead the written scripts based on the global cell neighbor relations, since their results were more reliable. These scripts also allowed detecting PCI collisions and confusions separately, whereas the network vendor feature was not able to distinguish these two types of PCI conflict. The wrong labeling done by the network vendor feature also explains the almost random results obtained in the previous section, as the labeling is crucial for a well functioning ML model.

This new procedure to label cells required the collection of new data, as the global cell relations are updated in the database once per week. This had another consequence: a higher difficulty in collecting large amounts of data, as it was only possible to collect data once per week. Due to time constraints, it was thus only possible to gather data for three days.

By using only one classification algorithm, namely SVM as in the last section, the reported results could be biased. Thus, it was decided to use the five different classification algorithms introduced in Section 3.9, namely ERT, RF, SVM, AB and GB. These classifiers were used from the Scikit-learn library [2]. By registering the results of these five classification algorithms, it was possible to choose the best classification algorithm for each frequency band and PCI conflict type (i.e. collision and confusion).
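A simplified Python analogue of those SQL checks is sketched below; the relations table, with one row per configured neighbor relation, is a hypothetical layout chosen for illustration.

# Detecting PCI collisions and confusions from configured neighbor relations.
# relations: DataFrame with columns source_cell, source_pci, source_freq,
#            neighbor_cell, neighbor_pci and neighbor_freq (hypothetical).

# PCI collision: a neighbor shares the source cell's PCI and frequency band.
collisions = relations[(relations.source_pci == relations.neighbor_pci) &
                       (relations.source_freq == relations.neighbor_freq)]

# PCI confusion: the same source cell has two different neighbors with
# equal PCI and frequency band.
groups = relations.groupby(["source_cell", "neighbor_pci", "neighbor_freq"])
confusions = groups.filter(lambda g: g.neighbor_cell.nunique() > 1)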
The general procedure to test the hypotheses in this section consisted of the following steps:

1. visualising important aspects, if applicable – observe, for instance, the distribution of daily KPI null values of all cells, to gain insight before proceeding to data cleaning;
2. cleaning the data with the Python Data Analysis Library – observe the occurrence of null values in each KPI and of data artifacts, such as infinite values and numeric strings, and either correct or discard cells with those values;
3. hyperparameter tuning – search for the optimal hyperparameters of each classification algorithm, for each frequency band and type of PCI conflict, through tools from the Scikit-learn library;
4. evaluating the obtained models – test each classification algorithm on test data and register the obtained results.

4.4.1 Data Cleaning Considerations

The process of data cleaning consisted of the following five steps:

1. data visualization – observe the daily distribution of KPI null values of all cells and discard cells with outlier total KPI null value counts;
2. data imputation – linearly interpolate missing values in each KPI of each cell;
3. data artifact correction – check and correct any data artifacts, such as strings in a continuous variable;
4. data separation – separate the data into groups of cells with the same frequency band;
5. dataset split – split each resulting data set into training and test sets, to be used by the classification algorithms.

It was considered that each cell in each day was independent of itself in different days, as the used data consisted of three days in different weeks. The initial set of raw data had a total of 32750 nonconflicting cells, 3176 cells with PCI confusion and 6 cells with PCI collision. As the data consisted of time series measured by several sensors, there was a high chance that the data could contain null values and other artifacts, such as string values (e.g. errors, infinite values). Furthermore, in order to successfully perform classification, each time series was required to have few or even zero null values and zero data artifacts. To reach this goal, data cleaning is required. The Python Data Analysis Library, known as pandas, was used for this task [63].

Figure 4.3: Boxplots of total null value count for each cell per day for three KPIs.

The first step was to check the daily distribution of null values of each KPI in all cells. It was found that only three KPIs had a third quartile of null value counts higher than zero; these are illustrated in Figure 4.3 by boxplots with the null count distribution in the background. It was noticeable that ReEst during HO Succ Rate had high occurrences of high null value counts compared to the remaining KPIs, with a median count of 66 null values per cell. The remaining two KPIs were not as degraded as the aforementioned one, with a median of zero and a third quartile of 5 null value counts per cell. Either one of two things could have been done:

• remove the ReEst during HO Succ Rate KPI and delete all data from cells whose sum of null values was higher than 13, which is the upper outer fence of the remaining two highest null count KPIs, thus eliminating only outliers and keeping most of the data;
• keep the ReEst during HO Succ Rate KPI and delete all data with more total null counts than its first quartile (i.e. 42 null value counts).
It was clear that the best choice would have been the former but, in order to study the importance of all KPIs for detecting PCI conflicts, it was chosen to perform the latter. In the next subsection, the correlations between the KPIs at peak traffic instants will be obtained, including the feature importances given by the decision tree ensemble classifiers. Thus, the next subsection will give more insight into whether or not the ReEst during HO Succ Rate KPI should be discarded. After deleting all data with a sum of null values higher than 42, the data was greatly reduced, to 7124 nonconflicting cells, 1511 cells with PCI confusion and 6 cells with PCI collision.

The second step was to linearly interpolate missing values, as the data consisted of time series, followed by deleting any cell data that still had null values. The reason for a fraction of the cell data still having null values after interpolating was that those null values occurred in the first daily measurements, which cannot be interpolated. This step further reduced the data set to 5214 nonconflicting cells, 1176 cells with PCI confusion and 6 cells with PCI collision. No more data was deleted from this point onwards. This big reduction from the initial data set was necessary to test the considered hypotheses in a more confident manner.

The third step was to replace any existing data artifacts, such as unexpected strings. It was verified that both DL Avg Cell Throughput Mbps and DL Avg UE Throughput Mbps had a few occurrences of infinite throughputs. These values were replaced by the maximum value registered by the respective KPI of each cell in that same day. No more data artifacts were present in the data. No outlier values were deleted because, as the data consisted of time series, removing outlier values also meant removing the respective cell data, which was already greatly reduced. Furthermore, since the great majority of the classification algorithms are decision tree based, the outlier values will not affect their performance, as decision trees are robust to outliers.

The fourth step involved separating the LTE cell data into its different frequency bands, namely 800, 1800, 2100 and 2600 MHz. Afterwards, it was decided to only analyse the 800 and 1800 MHz bands, as they represented about 91% of all the cell data; furthermore, after the data cleaning, the 2100 and 2600 MHz frequency bands had no reported PCI conflicts. The choice of separating the cells by frequency band was made in order to create more specific models, as the bands are used for different purposes. Low frequency bands, such as 800 MHz, cover bigger areas and are more used by an eNB when its higher frequency bands already carry high amounts of traffic. High frequency bands, such as 1800 MHz, provide higher traffic capacity and are used in more populated environments. The resulting dataset is represented in Table 4.5.

Table 4.5: Resulting dataset composition subsequent to data cleaning.

Cell              800 MHz Band    1800 MHz Band
Nonconflicting    3402            1737
PCI confusion     856             320
PCI collision     6               0

Interestingly, there were no PCI collisions in the 1800, 2100 and 2600 MHz frequency bands, either in the raw or in the cleaned data sets. This could be due to the fact that cells operating in low frequency bands, such as 800 MHz, cover bigger areas than higher frequency ones and are located in low traffic environments, which hinders the detection of these conflicts by the mobile network operators.
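Steps 2 and 3 above can be condensed into a short pandas sketch, shown below; cell_kpis is a hypothetical DataFrame holding one daily time series column per KPI for a single cell.

# Sketch of the interpolation and artifact correction steps for one cell.
import numpy as np

def clean_cell(cell_kpis):
    # step 2: linearly interpolate missing values inside each KPI series
    cell_kpis = cell_kpis.interpolate(method="linear")
    if cell_kpis.isna().any().any():
        return None  # leading nulls cannot be interpolated: discard the cell
    # step 3: replace infinite throughputs by the cell's daily maximum
    for kpi in ["DL_Avg_Cell_Throughput_Mbps", "DL_Avg_UE_Throughput_Mbps"]:
        finite_max = cell_kpis.loc[np.isfinite(cell_kpis[kpi]), kpi].max()
        cell_kpis[kpi] = cell_kpis[kpi].replace([np.inf, -np.inf], finite_max)
    return cell_kpis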
The fifth and last step consisted of splitting the entire data set into training and test sets. It was decided to assign 80% of the total data set to the training set and 20% to the test set. Due to the minimal number of PCI collisions, it was decided to use 3 cells with PCI collision in each of the training and test sets, even if the results would not have any statistical significance.

4.4.2 Classification Based on Peak Traffic Data

This subsection tests the hypothesis that PCI conflicts can be detected by analysing KPI values only at the instant of highest traffic of each individual cell. This hypothesis was proposed because radio network conflicts are most noticeable through KPI observation in busy traffic periods. Furthermore, analysing only one daily measurement per KPI per cell considerably reduces the complexity and processing power needed to detect PCI conflicts, as the number of data rows per cell is greatly reduced.

Figure 4.4: Absolute Pearson correlation heatmap of peak traffic KPI values and the PCI conflict detection label.

As the data in this subsection does not consist of time series, each KPI was considered as a feature. Therefore, it was interesting to explore the relationships between KPIs and observe whether there were highly correlated KPIs, since removing highly correlated features can reduce potential overfitting issues. It was decided to remove features that would cause correlations of absolute value over 0.8. In order to observe the correlations between KPIs, a Pearson correlation heatmap of absolute values was created, which can be observed in Figure 4.4. After analysing the heatmap, it was clear that the highest correlation occurred between the UL PUSCH Interference Avg and UL PUCCH Interference Avg KPIs, which was expected, as the average interference powers on the PUCCH and on the PUSCH are rather close and behave similarly. As this correlation, which was the highest one, was marginally lower than 0.8, all features were kept. It was also interesting to observe that the second highest correlation was between Average CQI and DL Avg UE Throughput Mbps, which was also expected, as high throughputs are related to higher channel quality.

Figure 4.4 also shows the correlation values between each KPI and the PCI conflict label (named pciconflict), which identified each cell as either nonconflicting, with PCI confusions or with PCI collisions. Knowing that the performance of classification algorithms is better when the features are highly correlated with the identification label, the three best KPIs in the dataset were Average CQI, DL Avg UE Throughput Mbps and RandomAcc Succ Rate, even though their correlations were very small. The most interesting insight taken from this analysis was that the KPIs related to mobility were not the most correlated with the labeling, but were instead among the least correlated, which was unexpected. However, this could be due to the analysis not taking into account the whole daily KPI measurements. It was also noted that the KPI with the highest total count of null values, ReEst during HO Succ Rate, had the third lowest correlation, which strengthened the option of removing that KPI and repeating the data cleaning process.
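The correlation screening described above can be sketched as follows; peak_df is a hypothetical DataFrame with one row per cell, holding the peak traffic value of each KPI (the label column would be excluded from the feature screening).

# Absolute Pearson correlations and removal of features correlated above 0.8.
corr = peak_df.corr(method="pearson").abs()  # matrix behind Figure 4.4

to_drop = [col for i, col in enumerate(corr.columns)
           if (corr.iloc[:i][col] > 0.8).any()]  # upper-triangle check
reduced = peak_df.drop(columns=to_drop)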
After drawing conclusions from the correlation heatmap, the next step was to transform the data through standardization, which mainly benefits SVM, helping it converge faster and deliver better predictions. Afterwards, 10-fold cross-validation was applied on the training set to test several combinations of hyperparameters and find those that maximized Precision for each classifier. This process is known as grid search, and it was applied by resorting to the Scikit-learn library [2]. After a total of 3 hours of grid searching for all classifiers in parallel, the best hyperparameters were obtained.

Table 4.6: Average importance given to each KPI by each Decision Tree based classifier.

KPI                            ERT      RF       AB       GB
Average CQI                    0.118    0.120    0.170    0.121
UL PUCCH Interference Avg      0.090    0.100    0.110    0.096
UL PUSCH Interference Avg      0.086    0.094    0.105    0.100
Service Establish              0.098    0.105    0.125    0.115
Service Drop Rate              0.060    0.051    0.025    0.054
DL Avg Cell Throughput Mbps    0.086    0.090    0.095    0.090
DL Avg UE Throughput Mbps      0.112    0.110    0.100    0.105
DL Latency ms                  0.080    0.094    0.065    0.101
RandomAcc Succ Rate            0.122    0.116    0.125    0.111
IntraFreq Exec HO Succ Rate    0.080    0.089    0.035    0.076
IntraFreq Prep HO Succ Rate    0.018    0.005    0.035    0.009
ReEst during HO Succ Rate      0.050    0.026    0.010    0.022

Afterwards, each classification algorithm was trained with the obtained hyperparameters on the training sets containing cells of the different frequency bands (800 MHz and 1800 MHz). In order to further reduce the data complexity, it was decided that features given less than 5% importance by each tree based classifier should be removed. The average feature importances given by each decision tree based classification algorithm were registered and are represented in Table 4.6.

The obtained feature importances allowed further exploring the KPI contributions to the classification. One of the most interesting insights retrieved from the aforementioned table was that, with the exception of the Service Establish KPI, the three KPIs with the highest importance were the ones with the highest correlation with the PCI conflict label. The high importance of the Service Establish KPI can be explained by the fact that the number of established services measures the amount of traffic, which impacts the remaining KPIs. The importance given by all classifiers to the Mobility KPIs was average for the execution of handovers, but very small (below 5%) for IntraFreq Prep HO Succ Rate. Additionally, as the latter was also one of the KPIs with the highest null value counts, it was discarded from the data set. As ReEst during HO Succ Rate was assigned the second lowest importance by all classifiers, with less than 5% of given importance, it was also discarded from the data set. Consequently, the data set was reduced from 12 KPIs to 10 KPIs.

The data cleaning was then repeated, but removing all cell data with a total sum of null values higher than 13 (i.e. the upper fence of the two KPIs with the highest null value counts). This new approach resulted in a data set for the 800 MHz frequency band with 8666 nonconflicting cells, 1551 cells with PCI confusion and 6 cells with PCI collision. The 1800 MHz frequency band data set changed to a total of 16675 nonconflicting cells, 1294 cells with PCI confusion and zero cells with PCI collision. Each data set was divided once again, with 80% for training and 20% for testing. Once more, for the 800 MHz frequency band, it was decided to use 3 cells with PCI collision in each of the training and test sets.
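The grid search step can be sketched as follows; the grid values are illustrative, not the grids actually searched, and X_train/y_train stand for one band's training split.

# Grid search with 10-fold cross-validation, maximizing Precision.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 200, 400],
    "learning_rate": [0.05, 0.1, 0.2],
    "max_depth": [2, 3, 4],
}
search = GridSearchCV(GradientBoostingClassifier(), param_grid,
                      scoring="precision", cv=10, n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_)  # hyperparameters that maximized Precision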
With the new and cleaned data sets, grid search with 10-fold cross-validation was repeated for each classifier. After 3 hours, the new hyperparameters that maximized Precision were obtained. Afterwards, each classification algorithm was trained on the training data set and tested on the test set. As each resulting model outputs probabilities and classifies a sample as class A or class B through a specified probability threshold, a default threshold of 50% was set. The classification results for detecting PCI confusions are showcased in Table 4.7. It should be added that, when a classifier did not produce any TPs or FPs, the Precision is represented as Not a Number (NaN), as it results from a division by zero.

Table 4.7: Peak traffic PCI Confusion classification results.

                  800 MHz Band                     1800 MHz Band
Model    Accuracy   Precision   Recall    Accuracy   Precision   Recall
ERT      84.94%     NaN         00.00%    92.43%     NaN         00.00%
RF       84.94%     NaN         00.00%    92.43%     NaN         00.00%
SVM      84.94%     NaN         00.00%    92.43%     NaN         00.00%
AB       84.94%     NaN         00.00%    92.43%     NaN         00.00%
GB       84.01%     29.41%      04.42%    92.13%     03.33%      02.87%

It was clear that GB was the best performing classifier, as it was the only one that classified data samples with a certainty above 50%, albeit with low Precision and low Recall for both frequency bands. Nevertheless, the best Precision and Recall were delivered on the 800 MHz frequency band. The remaining models were unable to return any TPs or FPs, which may indicate that the data did not have enough information for this classification task.

Figure 4.5: Smoothed Precision-Recall curves for peak traffic PCI confusion detection.

As Table 4.7 did not present much information concerning the majority of the used classifiers, it was decided to calculate and plot their Precision-Recall curves on the test sets of both frequency bands. The resulting plots were smoothed through a moving average with a window of size 20 and are illustrated in Figure 4.5. The area under each classifier's curve is its average Precision. Through a close analysis of the plot for the 800 MHz frequency band, it was clear that GB was the best performing classifier, with Precision peaking at 35% until reaching 25% Recall. Thenceforth, RF and ERT show higher Precision, with ERT having the highest average Precision, of 0.24. Regarding the 1800 MHz frequency band, ERT was the best performing classifier, with the highest average Precision and with Precision as high as 80% until reaching 20% Recall; from that point onwards, its performance was approximately tied with RF and AB. For both bands, SVM was clearly the worst performing classifier with this data. The lower average Precision, compared to the 800 MHz frequency band, could be due to the 1800 MHz frequency band having a different cell class balance and also being commonly used over different environments, with different amounts of traffic, which can hinder the classification process.

After analysing the classification results, the next step was to evaluate how much time each classifier took to train and to test for each frequency band, which is showcased in Table 4.8.

Table 4.8: PCI Confusion classification training and testing times in seconds.

                    800 MHz Band                       1800 MHz Band
Model    Training time [s]   Testing time [s]   Training time [s]   Testing time [s]
ERT      18.7                5.2                17.8                6
RF       27.6                4.7                59                  7.7
SVM      1.5                 0.1                8.3                 0.4
AB       10.9                0.2                18.6                0.3
GB       11.8                0.1                31                  0.2

In general, the classifier with the fastest training was SVM, which was also the worst performing one. Additionally, the classifier with the fastest testing performance was GB, with near real-time testing times of 0.1 and 0.2 seconds on test sets with thousands of data samples. It should also be pointed out that the presented time durations are highly influenced by the chosen number of iterations or estimators of each classifier.
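The Precision-Recall analysis above can be reproduced, in outline, with the sketch below; model, X_test and y_test are placeholders for a fitted classifier and one band's test split.

# Precision-Recall curve and average Precision (area under the PR curve).
from sklearn.metrics import average_precision_score, precision_recall_curve

probs = model.predict_proba(X_test)[:, 1]  # P(PCI confusion) for each cell
precision, recall, thresholds = precision_recall_curve(y_test, probs)
print(average_precision_score(y_test, probs))  # area under the PR curve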
Given these results, the GB classifier could be chosen for the 800 MHz frequency band, due to its fast training and testing times as well as its good Precision scores at low Recall. For the 1800 MHz band it is harder to choose the best performing classifier, but ERT could be chosen, as it was the one with the highest average Precision.

In order to verify whether or not enough data was used for the classification task, it was decided to build and plot learning curves for both frequency bands. The learning curves applied 5-fold cross-validation for 5 different training set sizes and measured the Precision-Recall area, also known as the average Precision score. The resulting learning curves are illustrated in Figure 4.6. The main insight taken from the learning curves is that the average Precision scores were already approximately stabilized for the last two training set sizes of both frequency bands. This indicates that the results would not be significantly better if more data were added. These results, while not practical for mobile network operators, show that it might be possible to classify PCI confusions through KPI analysis.

Figure 4.6: Learning curves for peak traffic PCI confusion detection.

Regarding PCI collision classification, as there were only 3 cases of it in each of the training and test sets, the classification results could not be significant. Nevertheless, grid search with 10-fold cross-validation was performed for each classifier, and the optimal hyperparameters were obtained after 3 hours. The classifiers were then trained on the training set and tested on the test set. No classifier could classify a sample as a PCI collision with more than 50% certainty, so it was decided not to show the table with the results. The Precision-Recall curves were obtained and plotted; it was chosen not to add the resulting plot to this work, because the maximum Precision obtained was 6%, at 33% Recall, for SVM, not adding much visually. Furthermore, the average Precision was 1% for ERT and AB. Due to the marginally low number of PCI collisions in the training set, it was not possible to obtain and plot the learning curves. Even with these non-significant results, the SVM classifier could be the best classification algorithm for PCI collision classification. These results, while not significant due to the marginally low number of PCI collisions, suggest that it is not possible to classify PCI collisions by only analysing the KPI measurements at daily peak traffic instants.
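For completeness, the learning curve computation described above is sketched below, under the same placeholder assumptions as the previous snippets (model, X and y are stand-ins).

# Learning curves: 5-fold cross-validation over 5 training set sizes,
# scored with the average Precision (the Precision-Recall area).
import numpy as np
from sklearn.model_selection import learning_curve

sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
    scoring="average_precision",
)
print(sizes, val_scores.mean(axis=1))  # stabilized scores suggest enough data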
4.4.3 Classification Based on Feature Extraction

This subsection tests the hypothesis that PCI conflicts can be detected by extracting statistical measurements from each KPI daily time series and using those measurements as features. This hypothesis was proposed as it is one of the main approaches used to classify time series. To extract statistical data from all the KPI time series, it was decided to use tsfresh, a popular Python tool for this task [60]. It was first intended to extract those statistical measurements from the busiest 1 hour and 15 minute period of each cell, centered on the daily traffic peak. Unfortunately, that extraction had not finished even after 48 hours of running tsfresh. Thus, it was chosen to run tsfresh on the full daily KPI time series, since it was faster. More specifically, it took 5 hours to extract statistical data from the data relative to the 800 MHz and 1800 MHz frequency bands, in order to detect PCI confusions, and another 5 hours for the data relative to the 800 MHz frequency band, in order to detect PCI collisions. It should be mentioned that tsfresh did not find any statistical feature that was relevant for detecting PCI collisions; thus, all the resulting statistical measurements were used as features for PCI collision detection, even if they were not statistically significant.

Regarding PCI confusions, 798 and 909 features were extracted for the 800 MHz and 1800 MHz frequency bands, respectively. Concerning PCI collisions, 2200 features were extracted for the 800 MHz frequency band which, as mentioned above, were not selected through hypothesis testing. Due to the high number of resulting features, this new data set brings dimensionality problems. Fortunately, decision tree based classifiers are resistant to these problems. The same cannot be said of SVM, for which they can result in hours of model training and possibly overfitting problems. Hence, concerning the use of SVM, it was decided to reduce the data's dimensionality through the application of PCA. It was defined that the number of Principal Components (PC) chosen should result in a CPVE of 98%. This decision reduces the dimensionality while retaining most of the information contained in the data. The data was first standardized, then PCA was applied, retrieving each PC's eigenvalue. Each eigenvalue was divided by the sum of all eigenvalues, resulting in the cumulative proportion of variance functions. The resulting functions for the 800 MHz and 1800 MHz frequency bands are illustrated in Figure 4.7. It was inferred that the data relative to the 800 MHz frequency band could be reduced to 273 PCs, and the data relative to the 1800 MHz frequency band to 284 PCs. The number of PCs differed between the two frequency bands because their data had different numbers of features. These PCs were used as the new features for the SVM classifier, resulting in a dimensionality reduction of around 30% with only a 2% variance loss.

Figure 4.7: The CPVE for PCI confusion detection (273 PCs for the 800 MHz band and 284 PCs for the 1800 MHz band at the 98% threshold).
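A sketch of this CPVE-based reduction is given below; X_std stands for the standardized feature matrix of one band. Note that recent Scikit-learn versions also accept PCA(n_components=0.98) as a shortcut for the same threshold rule.

# Choosing the number of PCs that reaches a 98% CPVE, then projecting.
import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X_std)
cpve = np.cumsum(pca.explained_variance_ratio_)  # the CPVE function
n_pcs = int(np.searchsorted(cpve, 0.98)) + 1     # PCs needed to reach 98%
X_reduced = PCA(n_components=n_pcs).fit_transform(X_std)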
                        800 MHz Band                          1800 MHz Band
  Model    Accuracy   Precision   Recall         Accuracy   Precision   Recall
  ERT      85.24%     NaN         00.00%         93.27%     NaN         00.00%
  RF       85.24%     NaN         00.00%         93.27%     NaN         00.00%
  SVM      85.24%     NaN         00.00%         93.27%     NaN         00.00%
  AB       85.24%     50.00%      02.83%         93.27%     NaN         00.00%
  GB       85.18%     46.00%      02.43%         93.27%     NaN         00.00%

As Table 4.9 did not give many insights, it was decided once again to calculate and plot the resulting Precision-Recall curves for each created model. The resulting plots were smoothed through a moving average with a window of size 20 and are illustrated in Figure 4.8. The average Precision of all models, for both frequency bands, was overall slightly better than in the previous hypothesis, especially for the 800 MHz frequency band. Regarding the 800 MHz frequency band, ERT, RF and GB had overall the best performance, which was similar among the three; however, AB performed better for Recall lower than 3%. Concerning the 1800 MHz frequency band, GB performed the best until reaching 17% Recall, where it starts performing similarly to ERT. Once again, SVM was the worst performing.

Figure 4.8: Smoothed Precision-Recall curves for statistical data based PCI confusion detection.

Afterwards, it was decided to evaluate the training and testing times of each classifier that led to the presented results. All of those times are presented in Table 4.10. ERT and RF presented the lowest training times for the 800 MHz and 1800 MHz frequency bands, respectively. GB had the lowest testing times for both frequency bands, and was also the one with the best overall performance for PCI confusion detection. Once more, the presented times are highly influenced by the chosen number of iterations or estimators for each classifier.

Table 4.10: Statistical data PCI confusion classification training and testing times in seconds.

                    800 MHz Band                          1800 MHz Band
  Model    Training time [s]  Testing time [s]   Training time [s]  Testing time [s]
  ERT            15.7               1.9                12.9                2.2
  RF             51.3               1.6                 5.7                0.1
  SVM           820                 7.2              1511                 16.9
  AB             18.9               0.1                13.1                0.1
  GB             26.4               0.1                11.4                0.2

In order to verify whether or not enough data was used for the classification task, learning curves were built and plotted for both frequency bands. The resulting learning curves are illustrated in Figure 4.9. The main insight taken from the resulting plot was that the classification performance stabilized for the 800 MHz frequency band, while it was still slightly increasing for the 1800 MHz frequency band. However, the overall performance would not significantly increase with more data for either frequency band. These results show an improvement over the previous hypothesis regarding PCI confusion detection. Furthermore, the results could be improved further if, instead of analysing daily measurements of each cell, periods of 48 or more hours were analysed. This approach could retrieve more significant statistical features and thus result in higher classification performance.

Figure 4.9: Learning curves for statistical data based PCI confusion detection.

Regarding PCI collision classification, and similarly to what was done for PCI confusion classification, PCA was applied to the data to be used by SVM, as it consisted of 2200 initial features. The chosen CPVE threshold was once again 98% and the resulting function is illustrated in Figure 4.10. It was found that the data could be reduced to 619 PCs, resulting in a dimensionality reduction of approximately 30% with only a variance loss of 2%.
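A minimal sketch of this CPVE-based truncation with scikit-learn is given below; the variable names are hypothetical, and the 0.98 threshold follows the text.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_std = StandardScaler().fit_transform(X)        # standardize first, as in the text

pca = PCA().fit(X_std)                           # full decomposition
cpve = np.cumsum(pca.explained_variance_ratio_)  # cumulative proportion of variance
n_pcs = int(np.searchsorted(cpve, 0.98) + 1)     # smallest number of PCs reaching 98%

X_reduced = PCA(n_components=n_pcs).fit_transform(X_std)
```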
Figure 4.10: The CPVE for PCI collision detection.

Afterwards, grid search was performed once again in order to obtain the optimal hyperparameters used to train and test each model, which took another 11 hours. Similarly to the previous subsection, no classifier was able to classify a PCI collision with more than 50% certainty, so it was decided not to show the table with the results. The Precision-Recall curves were obtained and plotted; they showed a maximum Precision peak of 23% with 100% Recall for RF, while Precision was approximately zero for the remaining classifiers. The plot is not shown, since the sample of PCI collisions in the dataset was not statistically significant. These results show an improvement relative to the last subsection and are of interest to MNOs, as PCI collisions are very rare. For instance, an analysis for PCI collisions over 15000 cells could be reduced to 15 cells, 3 of which represented the entirety of the PCI collisions. With this data and these results, the RF classifier was the best suited for the PCI collision classification task.

4.4.4 Classification Based on Raw Cell Data

This subsection tests the hypothesis that PCI conflicts can be detected by using each KPI measurement, in each day, as an individual feature. This hypothesis was proposed in order to compare a more computationally intensive, but simpler, approach with the previous ones. Consequently, as there are 96 daily measurements per KPI in each cell, and a total of 10 KPIs, there is a total of 96 measurements × 10 KPIs = 960 features. This approach used the same cells as the previous two subsections. The data was standardized and, in order to minimize the noise in each KPI (due to several factors, such as radio channel fading and user mobility), it was also smoothed with a simple moving average filter with a window of size 20. This window size was chosen as it was the one that yielded the best results.

Similarly to the last subsection, due to the high number of features, it was decided to apply PCA once again in order to reduce the data dimensionality when using SVM. The CPVE threshold was defined as 98%, as previously. After applying PCA, the CPVE function was obtained for the 800 MHz and 1800 MHz frequency bands and is illustrated in Figure 4.11. The number of PCs at the defined threshold was the same for both frequency bands: 634. The CPVE was very similar for both frequency bands. This step resulted in a 34% dimensionality reduction with a variance loss of 2%.

Figure 4.11: The CPVE for PCI confusion detection.
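The smoothing step mentioned above might look like the following pandas sketch; the frame layout is an assumption, and the window of 20 samples follows the text.

```python
import pandas as pd

# kpi_wide: rows indexed by timestamp (96 samples per day), one column per KPI
# of a given cell (hypothetical layout)
smoothed = (
    kpi_wide.rolling(window=20, min_periods=1)  # simple moving average, window 20
            .mean()
)
```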
After reducing the data dimensionality for the SVM classifier, the next step was to apply grid search with 10-fold cross validation on the training set. The goal was to obtain, once again, the optimal hyperparameters that maximized Precision and Recall for detecting PCI confusions. The hyperparameters were obtained after approximately 13 hours of grid searching. With these hyperparameters, each classification algorithm was trained and tested; the results are presented in Table 4.11. With this approach, the results were notably better, as more classifiers successfully predicted PCI confusions. More specifically, regarding the 800 MHz frequency band, the GB classifier was the best performing one, with 75% Precision and 1.07% Recall. Additionally, GB presented slightly higher Accuracy than the RF, SVM and AB classifiers, which behaved as majority class classifiers. Concerning the 1800 MHz frequency band, the RF classifier was the best performing one in terms of Precision, with a Precision score of 100% and a Recall score of 0.9%. The GB classifier presented slightly different results, with higher Accuracy and Recall, but lower Precision.

Table 4.11: Raw cell data PCI confusion classification results.

                        800 MHz Band                          1800 MHz Band
  Model    Accuracy   Precision   Recall         Accuracy   Precision   Recall
  ERT      85.37%     22.22%      00.71%         93.57%     100%        00.45%
  RF       85.63%     NaN         00.00%         93.60%     100%        00.90%
  SVM      85.63%     NaN         00.00%         93.54%     NaN         00.00%
  AB       85.63%     NaN         00.00%         93.54%     NaN         00.00%
  GB       85.73%     75.00%      01.07%         93.63%     80.00%      01.80%

In order to obtain more insights regarding the performance of each model, it was decided once again to obtain and plot the Precision-Recall curves for each model. The resulting plots were smoothed with a window size of 20 and are illustrated in Figure 4.12. The increase in average Precision relative to the last two subsections was notable for both frequency bands. Regarding the 800 MHz frequency band, the best performing classifier was GB, with a smoothed Precision peak of 63% at 3% Recall. Both the GB and ERT classifiers registered a notable performance increase compared to the two previous subsections. Concerning the 1800 MHz frequency band, ERT was the best performing one overall, with an average Precision of 26%. Again, SVM had the worst performance, behaving closely to a majority class classifier, and sometimes even worse (see Recall from around 12% onwards).

Figure 4.12: Smoothed Precision-Recall curves for raw cell data based PCI confusion detection.

Afterwards, in order to evaluate the models more deeply, the training and testing times were also registered and are presented in Table 4.12. GB was the fastest classifier to train and test on the 800 MHz frequency band data, with a real-time testing time of 0.2 seconds. For the 1800 MHz frequency band, SVM was the fastest to train, but the worst performing, while the fastest testing classifier was, once again, GB. With these results, one can say that GB was the best classification algorithm to apply to the 800 MHz frequency band data. The ERT classifier was the most suited for the 1800 MHz frequency band data, as its training and testing times are not much higher than those of SVM or GB, and also due to its best classification performance.

Table 4.12: Raw cell data PCI confusion classification training and testing times in seconds.

                    800 MHz Band                          1800 MHz Band
  Model    Training time [s]  Testing time [s]   Training time [s]  Testing time [s]
  ERT            18.7               1.5                40.3                1.4
  RF             61                 2.3               133                  1.5
  SVM            74                 2.5                40.1                0.6
  AB            503                 0.5              1286                  1.6
  GB             14.3               0.2               136                  0.2

The next step was to investigate whether or not the classification results could be improved if more data was added. In order to do so, the learning curves for all models and frequency bands were obtained and plotted, and are illustrated in Figure 4.13. It can easily be observed that the average Precision scores for both frequency bands did not stabilize. More specifically, the 1800 MHz frequency band data registered a higher increase of average Precision compared to the 800 MHz frequency band data. The obtained insight is that the results could be further improved with more data.
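The learning curves used throughout this chapter can be reproduced with scikit-learn's learning_curve utility; the sketch below is an illustration under assumed names (clf, X, y), using 5 folds, 5 training set sizes and average Precision as the score, as described in the text.

```python
import numpy as np
from sklearn.model_selection import learning_curve

train_sizes, train_scores, val_scores = learning_curve(
    clf, X, y,
    cv=5,                                  # 5-fold cross validation
    train_sizes=np.linspace(0.1, 1.0, 5),  # 5 increasing training set sizes
    scoring="average_precision",           # area under the Precision-Recall curve
)
mean_val = val_scores.mean(axis=1)         # curve plotted against train_sizes
```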
Figure 4.13: Learning curves for raw cell data PCI confusion detection.

Regarding PCI collision classification, the SVM classifier used 634 PCs as features, because the 800 MHz frequency band data set was the same as for PCI confusion detection. Grid search was performed, taking 13 hours once again, and the resulting optimal hyperparameters were used to train and test each classifier. Similarly to the last two subsections, no classifier was able to predict a PCI collision over the 50% probability threshold. This could be due to the class imbalance, as only 3 data samples in the training and test sets consist of PCI collisions. The Precision-Recall curves were obtained and plotted, and are shown in Figure 4.14. This approach allowed the AB classifier to correctly classify one PCI collision out of three with no FPs. However, at 100% Recall, the resulting models only achieved around 6% average Precision. With these results, the AB classifier was the best classifier for PCI collision classification.

Figure 4.14: Precision-Recall curves for raw cell data PCI collision detection.

With the obtained results for all hypotheses, it is possible to assert that the hypothesis proposed in this subsection was not only the simplest one, but also the one that led to the best results for both PCI confusions and collisions. This assertion is based on the fact that, even with low Recall, it was able to identify PCI conflicts with Precision scores close to 100%. The obtained models' training was done in near real time, in the order of minutes, and the predictions were made in real time, in less than a second.

4.5 Preliminary Conclusions

The goal of this chapter was to study how the PCI is used in LTE networks, in order to develop a supervised methodology to detect PCI conflicts with near real time performance. In Section 4.2, the chosen KPIs were presented by stating their meaning and their relevance for PCI conflict detection. A brief daily time series analysis of the KPI values was presented as well, which allowed a better comprehension of their daily behaviours. In Section 4.3, the network vendor PCI Conflict Detection feature was used to label each cell as either conflicting or nonconflicting. The Python tsfresh library was used to extract significant statistical measurements from the busiest 1 hour and 15 minutes of each individual cell, which were used as features for an SVM classifier. The obtained model presented a Precision of 63.95% for a Recall of 1.13%. The low model performance could be due to the PCI Conflict Detection feature, the selected KPIs or the time period selection. In Section 4.4, a new cell labeling approach was applied, using global cell neighbor configurations to detect PCI conflicts. This new labeling approach delivered better labeling control, allowing a distinction between PCI confusions and collisions, and showed that the network vendor PCI Conflict Detection feature was not consistent with the newly obtained labels. The data was further analysed, which led to the removal of two KPIs that had high null count averages and low feature importances. The three presented hypotheses were tested using five different classification algorithms, namely SVM, AB, GB, ERT and RF. All the results from all hypotheses delivered near real time performance, with training and testing times rarely going beyond 150 and 10 seconds, respectively. The hypothesis that led to the best results was using each KPI measurement in each day as an individual feature.
Regarding the 800 MHz frequency band, the best model was obtained by GB, which led to an average Precision of 31% with a Precision peak of 60% for 3% Recall. Regarding the 1800 MHz frequency band, the best model was obtained by ERT, which delivered an average Precision of 26% with a Precision peak of 80% for 1% Recall. Additionally, the obtained learning curves showed that the results would significantly improve if more data was added to create the models. However, no more data was obtained, since getting one more day of data required waiting one more week. Furthermore, due to new security policies, it became more difficult to obtain access to new data. The fact that the third hypothesis delivered better results than the second hypothesis, which applied and extracted statistical calculations from the KPIs, could be due to the latter losing information by extracting statistical measurements over full daily periods. More clearly, network problems are better detected by KPIs at peak traffic instants, and that information was lost by compressing it into statistical calculations over full daily KPIs. As the third hypothesis used all the information in its raw form, the models were able to perform more effective classifications.

Chapter 5

Root Sequence Index Collision Detection

5.1 Introduction

This chapter introduces the RSI collision problem that can occur in LTE radio networks; furthermore, it describes the steps taken towards achieving the best approach to detect RSI collisions, by using ML models to analyse daily KPI measurements.

Whenever a UE is turned on, it starts scanning the radio network for frequencies corresponding to the respective network operator. After the UE is synchronized to a frequency, it checks whether it is connected to the right PLMN by reading the Master Information Block (MIB) as well as the System Information Blocks (SIB) 1 and 2. Namely, SIB 2 contains the RSI, which indicates the index of the logical root sequence used to derive the PRACH preamble sequences that start the random access procedure. The random access procedure is used for service connection establishment and re-establishment, intra-system handovers and UE synchronization for uplink and downlink data transfers. The LTE random access procedure can be performed in two different ways: non-contention based or contention based access. PRACH preambles aim to differentiate requests coming from different UEs through the PRACH. Each LTE cell uses 64 preambles, where 24 are reserved to be chosen by the eNB for non-contention based access and the remaining 40 are randomly selected by the UEs for contention based access. Whenever two or more neighbor cells operate in the same frequency band and have the same RSI, there is a higher occurrence of preamble collisions amongst the requests coming from different UEs. This problem is called RSI collision, and it can lead to an increase in failed service establishments and re-establishments, as well as an increase in failed handovers.

In LTE, there are 838 root sequences available for preambles, each with a length of 839 symbols. A UE can generate several preambles from one root sequence and a cyclic shift; the smaller the cyclic shift, the more preambles can be generated from a root sequence. Knowing that the total number of PRACH preambles available in each LTE cell is 64, the number of root sequences (rows) needed to generate the 64 preambles in a given cell is:

    N_rows = ⌈64 / ⌊sequence length / cyclic shift⌋⌉ .    (5.1)

For instance, with an RSI of 200 and a cyclic shift of 110, the required number of rows to generate the 64 preambles is:

    N_rows = ⌈64 / ⌊839 / 110⌋⌉ = ⌈64 / 7⌉ = 10 .    (5.2)

Thus, for a correct RSI plan, if a cell A has an RSI of 200, then the neighbor cells B and C must have 210 and 220 as RSI. This avoids neighbor cells using the same preambles and, thus, avoids RSI collisions.
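The spacing rule in (5.1) is straightforward to compute; the sketch below (function name hypothetical) reproduces the example above.

```python
import math

def rsi_step(cyclic_shift: int, n_preambles: int = 64, seq_length: int = 839) -> int:
    """Number of consecutive root sequences one cell consumes, per Equation (5.1)."""
    preambles_per_root = seq_length // cyclic_shift   # floor division
    return math.ceil(n_preambles / preambles_per_root)

# With a cyclic shift of 110, each cell consumes 10 root sequences, so a cell
# with RSI 200 forces its neighbors onto RSI 210, 220, ... to avoid collisions.
assert rsi_step(110) == 10
```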
This chapter is organised in four sections. After this introduction, Section 5.2 presents the chosen KPIs that are relevant for the RSI collision classification task. Section 5.3 proposes an RSI collision classification task based on configured global cell relations and tests the following three hypotheses:

1. RSI collisions are best detected by using the KPI measurements at the daily peak traffic instant of each cell.

2. RSI collisions are best detected by extracting statistical calculations from each KPI's daily time series and using them as features.

3. RSI collisions are best detected by using each cell's KPI measurements in each day as individual features.

Lastly, Section 5.4 presents the preliminary conclusions of the work done in this chapter. The overall RSI collision detection procedure can be observed in Figure A.1.

5.2 Key Performance Indicator Selection

In accordance with the theory behind LTE and how RSIs are used, a new list containing the most relevant KPIs for RSI collision detection was obtained. These KPIs are presented in Tables 5.1 and 5.2. A brief time series analysis of these KPIs, regarding 23500 cells over a single day, is also presented in Figure 5.1.

Table 5.1: Chosen Accessibility and Mobility KPIs.

  Accessibility             Mobility
  RandomAcc Succ Rate       IntraFreq Exec HO Succ Rate
                            IntraFreq Prep HO Succ Rate

Regarding Accessibility, RandomAcc Succ Rate refers to the success rate of random access procedures made through the PRACH. This KPI is expected to be the most relevant one for detecting RSI collisions, as collisions strongly decrease the success rate of random access procedures.

In Mobility, IntraFreq Prep HO Succ Rate measures the success rate of handover preparation between cells in the same frequency band, and IntraFreq Exec HO Succ Rate refers to the success rate of executed handovers between cells in the same frequency band. These KPIs are relevant for detecting RSI collisions, as handovers require performing random access procedures and may use contention based access. With contention based access, there is a higher occurrence of two or more UEs simultaneously sending the same PRACH preamble if there is an RSI collision, hence resulting in more frequent handover failures.

Table 5.2: Chosen Quality and Retainability KPIs.

  Quality                        Retainability
  UL PUCCH Interference Avg      Service Establish
  UL PUSCH Interference Avg      Service ReEst Succ Rate

In Quality, UL PUCCH Interference Avg and UL PUSCH Interference Avg measure the average noise and interference power on the PUCCH and PUSCH, respectively. These KPIs are relevant for RSI collision detection because cells with RSI collisions operate in the same frequency band and might be located in high density traffic areas, thus having increased interference.
Regarding Retainability, Service Establish measures the total number of established services during a period, and was chosen to differentiate cells with different amounts of traffic. Service ReEst Succ Rate refers to the success rate of service re-establishment in a given cell. This KPI is relevant to detect RSI collisions because, when a UE suffers a service drop, it performs a service re-establishment request through a random access procedure. If there is an RSI collision, there will be more occurrences of failed service re-establishments due to failed random access procedures.

Figure 5.1 represents the distribution of each KPI's values over 23500 LTE cells during a single day. The IQR represents the interval where 50% of the data is distributed, and can be obtained through IQR = Q3 − Q1, where Q3 and Q1 refer to the third and first quartiles of each KPI measure across a single day. The Upper and Lower Fences correspond to Q3 + 1.5 × IQR and Q1 − 1.5 × IQR, respectively, and are used to check for outlier values, which are all the values outside the fences. The represented outlier KPI values refer to the minimum and/or maximum KPI values registered, over all cells, that fall outside the Upper and Lower Fences.

As expected, Service Establish reflects the regular traffic of mobile networks across a single working day, with its maximum peaks at lunch and late afternoon periods, and minimum traffic during the night. The traffic has a visible effect on the remaining KPIs; for instance, high traffic leads to lower service re-establishment and random access procedure success rates. It can be easily observed that, with the exception of the Service ReEst Succ Rate and RandomAcc Succ Rate KPIs, the outlier KPI values can go well beyond the Upper and Lower Fences. It can also be noted that the median values of the IntraFreq Prep HO Succ Rate and IntraFreq Exec HO Succ Rate KPIs are very close to 1, but with outlier values that can go as low as 0, which can be explained by cell malfunction problems. These observations reveal the highly variable nature of the KPI values of mobile networks across whole countries.

Figure 5.1: Time series analysis of KPI values regarding 23500 LTE cells over a single day.
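As an illustration, the fence computation used in this boxplot analysis can be written as the short sketch below (function name hypothetical).

```python
import numpy as np

def tukey_fences(values: np.ndarray) -> tuple[float, float]:
    """Lower and upper outlier fences, as used in the Figure 5.1 analysis."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1                          # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr
```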
5.3 Global Cell Neighbor Relations Based Detection

An SQL script was made to detect RSI collisions, i.e., to check for configured neighbor cells with equal RSIs and frequency bands. Similarly to what was done in the previous chapter, it was decided to use a total of five different classification algorithms, introduced in Section 3.9: ERT, RF, SVM, AB and GB. These classifiers were used from the Scikit-learn library [2]. By registering the results from these five classification algorithms, it is possible to choose the best RSI collision classification algorithm for each frequency band. The general procedure to test the hypotheses in this section consisted of the following steps:

1. visualising important aspects, if applicable – observe, for instance, the distribution of daily KPI null values of all cells, in order to gain insight before data cleaning;

2. cleaning the data with the Python Data Analysis Library – observe the occurrence of null values in each KPI and of data artifacts, such as infinite values and numeric strings, and either correct or discard cells with those values;

3. hyperparameter tuning – search for the optimal hyperparameters of each classification algorithm, for each frequency band, through tools from the Scikit-learn library;

4. evaluating the obtained models – test each classification algorithm on test data and register the obtained results.

5.3.1 Data Cleaning Considerations

The process of data cleaning consisted of the following five steps:

1. data visualization – observe the daily distribution of KPI null values of all cells and discard cells with outlier total KPI null counts;

2. data imputation – linearly interpolate missing values in each KPI of each cell;

3. data artifact correction – check and correct any data artifacts, such as strings in a continuous variable;

4. data separation – separate the data into groups of cells with the same frequency band;

5. dataset split – split each resulting data set into training and test sets to be used by the classification algorithms.

It was considered that each cell in each day was independent from the same cell in different days, as the data used consisted of three days in different weeks. The initial set of raw data consisted of a total of 26596 nonconflicting cells and 14527 cells with RSI collisions. As the data consisted of time series measured by several sensors, there was a high chance that it contained null values and other artifacts, such as string values (e.g. errors, infinite values). Furthermore, in order to successfully perform classification, each time series must have few or even zero null values and zero data artifacts. To reach this goal, data cleaning is required; the Python Data Analysis Library, known as pandas, was used for this task [63].

The first step was to check the distribution of null values for each KPI, in all cells, in each day. It was found that only three KPIs had a third quartile of null value counts higher than zero; these are illustrated by boxplots, with the null count distribution in the background, in Figure 5.2. It was notable that Service ReEst Succ Rate had high occurrences of high null value counts compared to the remaining KPIs, with a median count of 30 null values per cell. The remaining two KPIs were not as degraded, with a median of zero and a third quartile of 5 null value counts per cell. Either one of two things could have been done:

• remove the Service ReEst Succ Rate KPI and delete all data from cells with a total null count higher than 13, which was the upper outer fence of the remaining two highest null count KPIs, thus only eliminating outliers and keeping most of the data;

• keep the Service ReEst Succ Rate KPI and delete all data with more total null counts than its first quartile (i.e. 13 null value counts).
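A pandas sketch of the first option (the one ultimately chosen, as discussed next) is given below; the frame layout and column names are assumptions.

```python
import pandas as pd

# df: one row per (cell_id, timestamp) with one column per KPI (hypothetical)
df = df.drop(columns=["Service_ReEst_Succ_Rate"])

# total null count per cell over the remaining KPIs
nulls_per_cell = (
    df.drop(columns=["timestamp"])
      .set_index("cell_id")
      .isna()
      .groupby(level=0).sum()   # null count per KPI per cell
      .sum(axis=1)              # total nulls per cell
)

# keep only cells at or below the 13-null fence
keep = nulls_per_cell[nulls_per_cell <= 13].index
clean = df[df["cell_id"].isin(keep)].copy()

# remaining gaps are filled per cell by linear interpolation (step two)
kpi_cols = clean.columns.difference(["cell_id", "timestamp"])
clean[kpi_cols] = clean.groupby("cell_id")[kpi_cols].transform(
    lambda s: s.interpolate(method="linear"))
```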
Figure 5.2: Boxplots of total null value count for each cell per day for two KPIs.

It was clear that the best option was the first one. For the same threshold of allowed null values, the first option removed only a small portion of the data, while the second removed 75% of it. Furthermore, as seen in the previous chapter, the degradation of Service ReEst Succ Rate was very similar to that of ReEst during HO Succ Rate, which had been removed from the data. After removing the Service ReEst Succ Rate KPI and deleting all data with a total null count higher than 13, the data was reduced to 17940 nonconflicting cells and 11131 cells with RSI collisions.

The second step was to linearly interpolate missing values, as the data consisted of time series, followed by deleting any cell data that still had null values. The reason a fraction of the cell data still had null values after interpolation was that those null values were in the first daily measurements, making it impossible to interpolate them. This step further reduced the data set to 17906 nonconflicting cells and 11105 cells with RSI collisions. No more data was deleted at this point. This big reduction from the initial data set was necessary in order to test the following hypotheses in a more confident manner. No data artifacts were present in the data and no outlier values were deleted because, as the data consisted of time series, removing an outlier value also meant removing the respective cell data, which had already been greatly reduced. Furthermore, as the great majority of the classification algorithms are decision tree based, outlier values will not affect their performance, since decision trees are robust to outliers.

The fourth step involved separating the LTE cell data into its different frequency bands, namely 800, 1800, 2100 and 2600 MHz. It was then decided to only analyse the 800 and 1800 MHz bands, as they represented about 91% of all the cell data. Furthermore, after the data cleaning, the 2100 and 2600 MHz frequency bands had only a total of 38 reported RSI collisions. The choice of separating the cells by frequency band was made in order to create more specific models, as the bands are used for different purposes. Low frequency bands, such as 800 MHz, cover bigger areas and are more used by an eNB when its higher frequency bands already carry high amounts of traffic. High frequency bands, such as 1800 MHz, provide higher traffic capacity and are used in more populated environments. The 800 MHz frequency band data set thus consisted of 6302 nonconflicting cells and 4230 cells with RSI collisions; the 1800 MHz frequency band data set consisted of 10866 nonconflicting cells and 6837 cells with RSI collisions.

The fifth and last step consisted of splitting the entire data set into training and test sets. It was decided to assign 80% of the total data set to the training set and 20% to the test set.

5.3.2 Peak Traffic Data Based Classification

This subsection tests the hypothesis that RSI collisions can be detected by only analysing KPI values at the instant of highest traffic of each individual cell. Analysing only one daily measurement per KPI in each cell considerably reduces the complexity and processing power needed to detect RSI collisions, as the number of data rows per cell is highly reduced.
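A sketch of this per-cell peak extraction, under the same hypothetical frame layout as before, could be:

```python
# index of each cell's busiest measurement, using Service_Establish as the
# traffic proxy (hypothetical column names)
peak_idx = clean.groupby("cell_id")["Service_Establish"].idxmax()

# one feature row per cell: all KPI values at that cell's traffic peak
peak_features = clean.loc[peak_idx].set_index("cell_id")
```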
Figure 5.3: Absolute Pearson correlation heatmap of peak traffic KPI values and the RSI collision detection label.

Similarly to PCI conflict detection, each KPI was considered as a feature, in order to explore the relationships between KPIs. It was decided to remove features that would cause correlations with absolute values over 0.8. In order to observe the correlations between KPIs, a Pearson correlation heatmap of absolute values was created, which can be observed in Figure 5.3. After analysing the heatmap, it was clear once again that the highest correlation occurs between the UL PUSCH Interference Avg and UL PUCCH Interference Avg KPIs, which was expected, as already explained in the last chapter. As this correlation, the highest one, was marginally lower than 0.8, all features were kept. Figure 5.3 also presents the correlation values between each KPI and the RSI collision label (named collision), which identifies each cell as either nonconflicting or with RSI collisions. Knowing that the performance of classification algorithms is stronger with variables that are highly correlated with the identification label, the three best KPIs would be RandomAcc Succ Rate, UL PUCCH Interference Avg and UL PUSCH Interference Avg, even if their correlations were very small. The most interesting insight taken from this analysis was that the KPIs related to mobility were the least correlated with the labeling. This could be due to the random access procedure being non-contention based for handovers, where the eNB chooses a reserved preamble for the UE to use.

After taking conclusions from the correlation heatmap, the next step was to transform the data through standardization, which mainly benefits SVM, helping it converge faster and deliver better predictions. Afterwards, grid search was applied to obtain the optimal hyperparameters to create the models. After a total of 3 hours of grid searching for all classifiers in parallel, the best hyperparameters were obtained.

Table 5.3: Average importance given to each KPI by each Decision Tree based classifier.

  KPI                            ERT     RF      AB      GB
  RandomAcc Succ Rate            0.352   0.230   0.350   0.215
  UL PUCCH Interference Avg      0.202   0.213   0.130   0.158
  UL PUSCH Interference Avg      0.150   0.176   0.160   0.159
  Service Establish              0.144   0.141   0.050   0.182
  IntraFreq Exec HO Succ Rate    0.115   0.178   0.310   0.175
  IntraFreq Prep HO Succ Rate    0.037   0.062   0       0.111

Afterwards, each classification algorithm was trained with the obtained hyperparameters on both frequency bands' training sets. In order to further reduce the data complexity, it was decided that features given less than 5% importance by each tree based classifier would be removed. The average feature importances given by each decision tree based classification algorithm were registered and are presented in Table 5.3. The obtained feature importances allowed further exploration of the KPI contributions to classification. One of the most interesting insights retrieved from the table was that a mobility KPI, namely IntraFreq Exec HO Succ Rate, was the second most important feature. This was the opposite of what was concluded when analysing the correlation matrix; thus, there are some cells that do perform contention based access for handovers. As expected, RandomAcc Succ Rate was the most important feature for all obtained models. However, the overall importance given to IntraFreq Prep HO Succ Rate was very small, and the KPI was not even used by the AB model.
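The per-KPI importances in Table 5.3 can be read directly off fitted scikit-learn tree ensembles; a sketch under assumed names (models, feature_names) follows.

```python
import pandas as pd

# models: dict of fitted tree-based classifiers; feature_names: list of KPI names
# (both hypothetical)
importances = pd.DataFrame(
    {name: model.feature_importances_ for name, model in models.items()},
    index=feature_names,
)
print(importances.round(3))  # one column per classifier, as in Table 5.3
```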
As the mean of the importances given to IntraFreq Prep HO Succ Rate was slightly higher than 5%, this KPI was not dropped from the dataset. Thus, no KPIs were dropped in this step. As the dataset was not changed, the trained models were tested on the test set. As each resulting model outputs probabilities and classifies a sample as class A or class B through a specified probability threshold, the default threshold of 50% was chosen again. The classification results for detecting RSI collisions are showcased in Table 5.4.

Table 5.4: Peak traffic RSI collision classification results.

                        800 MHz Band                          1800 MHz Band
  Model    Accuracy   Precision   Recall         Accuracy   Precision   Recall
  ERT      62.04%     73.91%      02.79%         61.35%     75.00%      00.27%
  RF       61.66%     58.06%      02.95%         61.63%     81.25%      01.19%
  SVM      62.42%     54.75%      16.07%         62.19%     57.22%      09.41%
  AB       62.67%     54.79%      19.67%         61.95%     66.67%      03.47%
  GB       62.48%     52.22%      34.75%         62.09%     57.42%      08.14%

At first glance, no major conclusions could be taken from the results, as the highest metrics were almost evenly distributed across the models. However, the highest Precisions were obtained by the ERT and RF models, at the cost of delivering the lowest Recall scores. This may indicate that the data did not have enough information for this classification task.

As Table 5.4 did not present much information concerning the majority of the used classifiers, it was decided to calculate and plot their Precision-Recall curves on both frequency bands' test sets. The obtained Precision-Recall curves are illustrated in Figure 5.4; the area under each classifier's curve is its average Precision. Regarding the 800 MHz frequency band, it is clear that the curve relative to AB has a strange behaviour, which was due to the AB model assigning the same probability values to several cells. For both frequency bands, there was no clear best performing model, as all behaved similarly. Additionally, in both cases, SVM was clearly the worst performing classifier with this data.

Figure 5.4: Smoothed Precision-Recall curves for peak traffic RSI collision detection (average Precision: 800 MHz band – ERT 0.50, RF 0.50, SVM 0.48, AB 0.53, GB 0.50; 1800 MHz band – ERT 0.49, RF 0.51, SVM 0.48, AB 0.51, GB 0.51).

After analysing the classification results, the next step was to evaluate how much time each classifier took to train and test on each frequency band; these times are showcased in Table 5.5.

Table 5.5: RSI collision classification training and testing times in seconds.

                    800 MHz Band                          1800 MHz Band
  Model    Training time [s]  Testing time [s]   Training time [s]  Testing time [s]
  ERT             0.4               0.1                 0.4                0.1
  RF              1.8               0.5                 3.7                0.9
  SVM             5.7               0.1                34.4                0.8
  AB              0.7               0.1                 0.5                0.1
  GB              0.7               0.1                 0.1                0.1

The fastest obtained model was GB, with training and testing times going as low as 0.1 seconds; furthermore, it was one of the best performing models. Again, it should be pointed out that the presented times are highly influenced by the chosen number of iterations or estimators of each classifier. With these results, ERT could be chosen for the 800 MHz frequency band due to its fast training and testing times, as well as its high Precision peak of 75% for low Recall. For the 1800 MHz band, it is harder to choose the best performing classifier, as the results were very similar; however, GB could be chosen, as it was the fastest model.
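These Precision-Recall curves and average Precision values can be computed with scikit-learn; the sketch below is an illustration under assumed names (model, X_test, y_test).

```python
from sklearn.metrics import average_precision_score, precision_recall_curve

probs = model.predict_proba(X_test)[:, 1]                 # P(collision) per cell
precision, recall, thresholds = precision_recall_curve(y_test, probs)
ap = average_precision_score(y_test, probs)               # area under the PR curve
```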
Figure 5.5: Learning curves for peak traffic RSI collision detection.

In order to verify whether or not enough data was used for the classification task, the learning curves for both frequency bands were obtained and plotted. The resulting learning curves are illustrated in Figure 5.5. The main insight taken from them is that the average Precision scores had already approximately stabilized for the two last training set sizes on both frequency bands; thus, the results would not be significantly better if more data was added. These results, while not practical for mobile network operators, show that it is possible to classify RSI collisions through KPI analysis.

5.3.3 Feature Extraction Based Classification

This subsection tests the hypothesis that RSI collisions can be detected by extracting statistical measurements from each KPI's daily time series and using those measurements as features. In order to extract statistical data from all KPI time series, it was decided to apply tsfresh once again [60]. Similarly to PCI confusion detection, it was chosen to run tsfresh on the full daily KPI time series, as it ran faster. More specifically, it took 5 hours to extract statistical data from the data relative to the 800 MHz and 1800 MHz frequency bands in order to detect RSI collisions; 732 and 951 features were extracted for the 800 MHz and 1800 MHz frequency bands, respectively.

Figure 5.6: The CPVE for RSI collision detection.

Due to the high number of resulting features, this new data set brings dimensionality problems. Thus, the data was first standardized and then PCA was applied. The resulting CPVE functions for the 800 MHz and 1800 MHz frequency bands are illustrated in Figure 5.6. It was inferred that the data could be reduced to 273 and 284 PCs for the 800 MHz and 1800 MHz frequency bands, respectively. The number of PCs differed between the two frequency bands because their data had different numbers of features. This operation led to a dimensionality reduction of around 35% with only a 2% variance loss.

Table 5.6: Statistical data based RSI collision classification results.

                        800 MHz Band                          1800 MHz Band
  Model    Accuracy   Precision   Recall         Accuracy   Precision   Recall
  ERT      60.32%     100%        00.48%         62.27%     72.97%      02.00%
  RF       64.93%     61.30%      32.62%         64.13%     66.94%      12.12%
  SVM      60.94%     54.80%      11.55%         61.79%     NaN         00.00%
  AB       64.02%     56.79%      40.83%         66.37%     59.88%      36.29%
  GB       66.87%     61.60%      44.88%         69.39%     63.97%      45.53%

As soon as the data sets were ready with the new features, grid search with 10-fold cross validation was performed for each classifier. After 11 hours, the new hyperparameters were obtained and used to train and test each model. The obtained results for detecting RSI collisions are showcased in Table 5.6. The ERT model delivered the highest Precision for both frequency bands; however, GB had the highest Accuracy and Recall overall.
Figure 5.7: Smoothed Precision-Recall curves for statistical data based RSI collision detection (average Precision: 800 MHz band – ERT 0.54, RF 0.58, SVM 0.48, AB 0.60, GB 0.61; 1800 MHz band – ERT 0.54, RF 0.55, SVM 0.43, AB 0.61, GB 0.61).

As Table 5.6 did not deliver many insights, it was decided once again to calculate and plot the resulting Precision-Recall curves for each created model. The resulting plots are illustrated in Figure 5.7. The GB model performed the best for both frequency bands, having a Precision peak of around 85% and an average Precision of 61%. The abnormal curve behaviour of the AB model was due to it assigning the same probability values to several cells. Once again, SVM was the worst performing.

Table 5.7: RSI collision classification training and testing times in seconds.

                    800 MHz Band                          1800 MHz Band
  Model    Training time [s]  Testing time [s]   Training time [s]  Testing time [s]
  ERT             3                 0.9                 9                  3.1
  RF             13                 3.7                 8                  1.5
  SVM           119                 2.9                87                  2.4
  AB             14.2               0.1                22.9                0.1
  GB             28.4               1                 246                  0.5

The training and testing times of each created model that led to the presented results were also collected; all of them are presented in Table 5.7. The GB model showed testing times lower than one second, but it had one of the highest training times: more specifically, 28.4 and 246 seconds for the 800 MHz and 1800 MHz frequency bands, respectively. Nonetheless, the GB model presented performance superior to the other obtained models while keeping near real time performance, thus being the best model overall.

Figure 5.8: Learning curves for statistical data based RSI collision detection.

In order to verify whether or not enough data was used for the classification task, learning curves were built and plotted for both frequency bands. The resulting learning curves are illustrated in Figure 5.8. They showed that the collected results would not significantly improve if more data was added to the dataset.

5.3.4 Raw Cell Data Based Classification

This subsection tests the hypothesis that RSI collisions can be detected by using each KPI measurement in each day as an individual feature. As there are 96 daily measurements per KPI in each cell, and a total of 6 KPIs, there is a total of 96 measurements × 6 KPIs = 576 features. The data was standardized and, in order to minimize the noise in each KPI, it was also smoothed with a simple moving average with a window of size 20; this window size was chosen as it was the one that yielded the best results. Similarly to the last subsection, due to the high number of features, PCA was applied once again in order to reduce the dimensionality of the data used as input to SVM. The CPVE threshold was defined as 98%, as previously. After applying PCA, the CPVE function was obtained for the 800 MHz and 1800 MHz frequency bands and is illustrated in Figure 5.9. The number of PCs at the defined threshold was the same for both frequency bands: 332. The obtained CPVE was very similar for both frequency bands. This step resulted in a 58% dimensionality reduction with a variance loss of 2%.
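A sketch of how the 96 × 6 raw measurements can be flattened into one 576-feature row per cell is shown below; the frame layout is the same hypothetical one used earlier.

```python
# clean: one row per (cell_id, timestamp) with 6 KPI columns (hypothetical)
raw_features = (
    clean.set_index(["cell_id", "timestamp"])
         .unstack("timestamp")   # columns become (KPI, timestamp) pairs
)
# raw_features now has one row per cell and 6 KPIs x 96 timestamps = 576 columns
```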
Figure 5.9: The CPVE for RSI collision detection.

After reducing the data dimensionality for the SVM classifier, the next step was to apply grid search with 10-fold cross validation on the training set. The goal was to obtain, once again, the optimal hyperparameters that maximized Precision first and Recall second for detecting RSI collisions. The hyperparameters were obtained after approximately 10 hours of grid searching. With these hyperparameters, each classification algorithm was trained and tested; the results are showcased in Table 5.8. Once again, the GB model had the highest Accuracy for both frequency bands. The RF and ERT models had the highest Precision for the 800 MHz and 1800 MHz frequency bands, respectively.

Table 5.8: Raw cell data RSI collision classification results.

                        800 MHz Band                          1800 MHz Band
  Model    Accuracy   Precision   Recall         Accuracy   Precision   Recall
  ERT      59.49%     50.00%      00.83%         59.83%     75.00%      00.22%
  RF       61.70%     62.64%      13.52%         65.55%     63.86%      33.07%
  SVM      60.07%     52.24%      16.61%         59.25%     46.67%      09.14%
  AB       64.73%     60.38%      37.60%         64.99%     59.59%      40.32%
  GB       66.41%     60.84%      47.92%         66.22%     62.72%      39.52%

In order to obtain more insights regarding the performance of each model, the Precision-Recall curves were obtained and plotted for each obtained model. The resulting plots are illustrated in Figure 5.10. The GB model presented the highest average Precision, while the RF and ERT models showed slightly worse average Precision.

Figure 5.10: Smoothed Precision-Recall curves for raw cell data RSI collision detection (average Precision: 800 MHz band – ERT 0.54, RF 0.56, SVM 0.45, AB 0.60, GB 0.61; 1800 MHz band – ERT 0.54, RF 0.58, SVM 0.44, AB 0.56, GB 0.60).

Afterwards, in order to evaluate the models even further, the training and testing times were also registered and are showcased in Table 5.9. The GB model showed testing times lower than one second and the third highest training times for both frequency bands: more specifically, it took 12.8 and 24.4 seconds to train on the 800 MHz and 1800 MHz frequency bands, respectively. Nevertheless, the GB model's performance was in near real time, and it was thus the best performing model overall.

Table 5.9: RSI collision classification training and testing times in seconds.

                    800 MHz Band                          1800 MHz Band
  Model    Training time [s]  Testing time [s]   Training time [s]  Testing time [s]
  ERT             3.1               0.9                 0.4                0.1
  RF              1.9               0.5                 0.6                0.2
  SVM           189                 5.2               395                 17.9
  AB             17.9               0.1                54.1                0.1
  GB             12.8               0.1                24.4                0.2

The final step was to investigate whether or not the classification results could be improved if more data was added. The obtained learning curves are illustrated in Figure 5.11. They showed that the results would improve, especially for the GB model. The learning curves relative to SVM showed a big downward trend, which meant that the model was overfitting to the data, while the remaining models were not.

5.4 Preliminary Conclusions

The goal of this chapter was to study how the RSI is used in LTE networks, in order to develop a supervised methodology to detect RSI collisions with near real time performance. In Section 5.2, the chosen KPIs were presented by stating their meaning and how they were relevant for detecting RSI collisions.
A brief daily KPI time series analysis was presented as well, which allowed a better understanding of their daily behaviours. In Section 5.3, a cell labeling approach similar to the one presented in the last chapter was used once again, specifically by using cell neighbor configurations to detect RSI collisions. The data was further analysed, which led to the removal of one KPI that had high null count averages. The three presented hypotheses were tested using five different classification algorithms, namely SVM, AB, GB, ERT and RF.

Figure 5.11: Learning curves for raw cell data RSI collision detection.

Similarly to PCI conflict detection, all the results from all hypotheses delivered near real time performance, with training and testing times rarely going beyond 250 and 5 seconds, respectively. The hypothesis that led to the best results was using each KPI measurement in each day as an individual feature, as was the case for PCI conflict detection. Regarding the 800 MHz frequency band, the best model was obtained by GB, which led to an average Precision of 61% with a Precision peak of about 85% for 3% Recall. Regarding the 1800 MHz frequency band, the best model was again obtained by GB, which delivered an average Precision of 60% with a Precision peak of about 85% for 1% Recall. The chosen hypothesis had results similar to the second hypothesis, which applied and extracted statistical calculations from the KPIs. This was not the case for PCI conflict detection, which had a distinct class imbalance: specifically, PCI conflicts represented 10% of the total data, in contrast to RSI collisions, which represented around 40%. Additionally, RSI collisions are more easily identifiable than PCI conflicts, as their main symptom is the success rate of the random access procedure, which is easily measured. Lastly, the obtained learning curves showed that the results would improve if more data was added to create the models. This was the criterion that resulted in the third hypothesis being chosen as the best one out of the three presented hypotheses.

Chapter 6

Conclusions

6.1 Summary

This Thesis aimed to create and test ML models able to classify PCI conflicts and RSI collisions with a minimum FP rate and near real time performance. To achieve this goal, three different hypotheses were proposed and tested.

Chapter 2 presented a general technical background of LTE radio technology, since it was important to understand how an LTE system operates and collects performance data. Chapter 3 addressed general ML concepts as well as more specific ones, such as how time series can be classified to reach the Thesis' objectives. Furthermore, it presented a technical overview of the five applied classification algorithms, used in order to reduce the results' bias, namely AB, GB, ERT, RF and SVM.

Chapter 4 introduced the LTE PCI network parameter and how PCI conflicts can occur, tested three hypotheses for PCI conflict detection and presented the hypotheses' results. It was shown that the PCI is used to scramble data in order to help mobile phones separate information coming from different eNBs, and that it has a limited range of 0 to 503 values.
PCI conflicts happen when two or more neighbor cells operate in the same frequency band and share the same PCI, which can lead to service drops and failed handovers. The 12 KPIs proposed for PCI conflict detection were presented, along with the reasons for their relevance. A first approach, using the network vendor PCI Conflict Detection feature to label the cells as either nonconflicting or conflicting, was applied. Extracting statistical calculations from daily KPIs and using them as features for an SVM classifier yielded poor results, with Precision and Recall scores of 63.28% and 1.13%, respectively. This revealed that the used data should be revised. A new labelling approach was then applied, thanks to a CELFINET product that allows obtaining configured cell relations, labelling both PCI collisions and confusions. This new labelling proved to be superior and more auditable than the network vendor PCI Conflict Detection feature. The data cleaning procedure was presented and explained, as well as the reason for splitting the dataset into the 800 MHz and 1800 MHz frequency bands.

Upon testing all three hypotheses, all of them yielded near real time performance, with training and testing durations rarely going beyond 150 and 10 seconds, respectively. Furthermore, the third hypothesis, using each daily KPI measurement as an individual feature, yielded the best results. The best model for the 800 MHz frequency band was obtained by GB, having reached an average Precision of about 31% with a Precision peak of 80% for 3% Recall. Additionally, for the 1800 MHz frequency band, the ERT model achieved the best results, with an average Precision of 26% and a Precision peak of 80% for 1% Recall.

Chapter 5 introduced the LTE RSI network parameter and how RSI collisions can occur, tested three hypotheses for RSI collision detection and presented their respective results. It was shown that the RSI indicates the index of the logical root sequence used to derive the PRACH preamble sequences that start the random access procedure. RSI collisions happen whenever two or more neighbor cells operate in the same frequency band and share the same RSI, leading to an increase in failed service establishments and re-establishments, as well as an increase in failed handovers. The approach taken to detect RSI collisions was the same as for PCI conflicts, but using only 7 KPIs. After testing all three hypotheses, all of them yielded near real time performance, with training and testing durations rarely going beyond 250 and 5 seconds, respectively. Furthermore, the third hypothesis yielded the best results once again. The best model was obtained by GB, having reached, for the 800 MHz frequency band, an average Precision of about 61% with a Precision peak of 85% for 3% Recall. Additionally, for the 1800 MHz frequency band, the GB model achieved an average Precision of 60% with a Precision peak of 85% for 1% Recall.

The fact that the third hypothesis delivered better results than the second one, which applied and extracted statistical calculations from the KPIs, could be due to the loss of information caused by extracting statistical calculations over full daily periods. Specifically, network problems are best detected by KPIs at peak traffic instants, and that information could have been lost by compressing it into statistical calculations over full daily KPIs.
As the third hypothesis used all the information in its raw form, the models may have been able to perform more effective classifications.

The obtained results showed that, after testing a model on the data of a specific day, it is possible to select a probability threshold in order to vary the Recall and obtain the cells that have the highest chances of having PCI conflicts or RSI collisions. This is because, the lower the Recall, the higher the Precision, as seen in the Precision-Recall curves. A point that should be stressed is that, due to several factors (i.e. distances between cells, cell radius and radio environment), there is not a clear distinction between a nonconflicting and a conflicting cell. A cell can have a conflict with a cell that is far away, but the KPIs may not show a problem due to the large distance between the two cells. Thus, a new approach taking distances into account should be developed.

Several obstacles were encountered during this work, namely in the data gathering process and in processing power, which took a considerable time investment. For instance, data gathering for a single day took between 1 and 2 hours to finish. As the databases containing the configured relations were only updated once per week, six weeks were required to gather data from 6 days (3 for PCI conflict detection and the other 3 for RSI collision detection). Additionally, the results regarding the third hypothesis showed that they would improve with more data. Regarding processing power, the limitations were most noticeable in the optimal hyperparameter search: overall, around 10 hours were required to obtain the optimal hyperparameters for each classification algorithm and for each hypothesis.

6.2 Future Work

There is much to be explored for both PCI conflict and RSI collision detection. For instance, taking into account the distances between conflicting cells should be a priority, as conflicts between distant cells will not have a noticeable impact on the KPIs. The obtained distances should be studied to find an optimal distance threshold to relabel cells initially reported as conflicting as either conflicting or nonconflicting. This optimal distance threshold will be the distance at which the KPIs stop being significantly affected by conflicts. To obtain such a distance, an algorithm could be developed that takes into account the power emitted by the cell, the antenna tilt and its azimuth. Furthermore, more data should be added to the dataset.

As a single KPI measurement is not independent from the previous ones, nor from other KPI measurements at the same instant, deep learning can be applied. A popular deep learning network, the Long Short Term Memory (LSTM) network, explores the time dependency in time series by remembering values over arbitrary intervals. It is possible to apply an LSTM network to raw multivariate time series, such as daily KPIs, in order to classify sequences as either conflicting or nonconflicting. Furthermore, there is another deep learning network, the Convolutional Neural Network (CNN), that takes into account the interactions between simultaneous features. Thus, it is also possible to apply a CNN to raw multivariate time series, such as daily KPIs, in order to explore the interactions between simultaneous KPIs.

Appendix A

PCI and RSI Conflict Detection

Figure A.1: PCI and RSI Conflict Detection Flowchart.

Bibliography