SQL Injection Detection Using Deep Learning
A Project Report

Nanang Cahyadi
23219324, Information Security Engineering and Management
School of Electrical Engineering and Informatics
Institut Teknologi Bandung
Bandung
en.nanangcahyadi@gmail.com

Abstract—SQL injection is a major threat to web application security and a recurring concern in secure operating procedures. Various countermeasures have been proposed, both by building detection systems and by protecting the vulnerable parts of the code. A system that can detect SQL injection is needed, and the growth of artificial intelligence makes such detectors increasingly practical. The remaining problem is how quickly such a solution can be implemented. This technical report therefore discusses the implementation and results of an SQL injection detector based on artificial intelligence. The proposed approach builds the detector using transfer learning. The resulting system classifies strings as SQL injection with an accuracy of 99.77%.

Keywords—Cybersecurity, SQL Injection, Transfer Learning

I. INTRODUCTION

With the increasing use of the web as a medium of information, the potential for attacks on web applications increases as well. The potential for attack, and for the resulting damage, is higher still because security technology is not standardized. The main problem in preventing SQL injection is the limitation of information technology and secure coding practices. Several solutions have therefore been proposed that use artificial intelligence to counter SQL injection.

Every few years, OWASP [1] publishes a list of the top 10 security threats facing web applications. In the OWASP Top 10 of both 2013 and 2017, injection was ranked first, which shows its importance and danger. SQL injection is an exploit that occurs at the application layer.
This refers to an attacker who bypasses web application access control to manipulate the database directly, which threatens the confidentiality, integrity, and availability of the application system. Based on the research of Jemaal [2], SQL injection attacks can be classified by the parameters listed in Table I.

TABLE I. CLASSIFICATION PARAMETERS DISCOVERED BY JEMAAL [2]

Parameter for Classification | Categories
Attack Sources | User input; Cookies; Server variables; Second order injection
Attack Goals | Database fingerprinting; Analyzing schema; Extracting data; Amending data; Executing DoS; Evading detection; Bypassing authentication; Remote control; Privilege escalation
Attack Types | Tautology; Illegal/logically incorrect queries; Union query; Piggyback query; Stored procedure; Inference; Alternate encoding

From Table I it can be concluded that SQL injection attacks can originate from user input or cookies, for example by using the tautology attack type or illegal/logically incorrect queries. Intuitively, then, an SQL injection attempt can be detected by inspecting the input at the source of the attack and determining whether the submitted query is a normal SQL query or an attack attempt.

II. RELATED WORK

Several researchers have analyzed the measures that can be taken to prevent SQL injection. Katole [3] demonstrated that combining several methodologies can improve SQL injection countermeasures. The methods combined include AMNESIA, a firewall method, and string analysis that removes the parameter values from SQL queries.

Other researchers have surveyed the SQL injection detection proposals of previous studies, many of which build detectors with a machine learning approach. Table II compares the proposed solutions that use machine learning.

TABLE II. SUMMARY COMPARISON OF RESEARCH RESULTS BY JEMAAL [2]

Techniques | Classifier(s) | Performance | Dataset
HIPS | Naive Bayesian, Multinomial Bayesian | Accuracy = 97.6% | SQL injections data collection framework
SQLI-IDS | Backpropagation Network | Overall accuracy = 96.8% | 13,000 URL addresses including 500 benign URLs and 12,500 malicious URLs
Sheykhkanloo et al. | Neural Network-Based Model | Accuracy = 95% | 25,000 URL addresses including 12,250 benign URLs and 12,250 malicious URLs
Verbruggen et al. | Decision Tree, SVM, Random Tree, Jribber, Neural Network, Random Forest | Recognition rate = 0.986 | 6 different datasets containing 1,876 malicious packets and 11,444 normal packets
Wang et al. | Stacked AutoEncoder | Precision = 100% | 0.3 million TCP internal network
Ingre et al. | Decision Tree | Accuracy = 83.7% | NSL-KDD
Moosa et al. | Feed-Forward Neural Network | Accuracy = 66.67% | 300 SQL injection signatures and around 200 XSS signatures collected from different websites
Joshi et al. | Naive Bayes | Accuracy = 93.9% | 178 codes including 101 normal codes and 77 malicious codes
SQLiGOT | SVM | Accuracy = 96.23% | 4,610 injected sequences and 4,884 genuine sequences
Modsecurity with ML | K-NN (K=3), SVM, Random Forest | Precision = 97% | CSIC-2010, DRUPAL, and PKDD2007
TbD-NNbR | Neural Network | Accuracy = 99.23% | 1,655 queries tested: 451 malicious queries and 1,204 legal queries

Table II shows that SQL injection detectors built with a machine learning approach already achieve very good accuracy. The lowest reported accuracy is 66.67% and the highest is 99.23%. Despite these results, research on detectors continues with a deep learning approach. Table III summarizes SQL injection detection proposals from previous studies that use neural networks.

TABLE III. SUMMARY COMPARISON OF RESEARCH RESULTS FROM SQL INJECTION DETECTION USING DEEP LEARNING

Research | Description | Result
Sangeeta et al. [4] | Proposed an SQL injection detector built on a residual network. | 99.69% accuracy with 99.59% F1 score for detection.
Abaimov et al. [5] | Proposed a detector for injection attacks in general, both SQL and XSS, using a CNN learning approach. | 99% accuracy with 99% precision and 93% recall.
Luo et al. [6] | Proposed an SQL injection detector using a CNN. | 99.50% accuracy with 99.49% F1 score.

As Table III shows, the highest accuracy achieved is 99.69%, with an F1 score of 99.59%. These results are therefore used as the benchmark to be reached in this implementation.

III. DESIGN

This section discusses the design of the proposed method and how the SQL injection detector works using the deep learning and transfer learning approach.

A. Working Diagrams

Fig. 1. SQL Injection Detector Working Principle (diagram labels: User Input, Cookies, SQL Generic Query)

Figure 1 shows the working principle of the detector.
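As a rough illustration of the working principle in Fig. 1, the following Python sketch treats the detector as a predicate applied to every string that arrives from user input or cookies. The names `placeholder_score`, `filter_inputs`, and `SUSPICIOUS_FRAGMENTS` are hypothetical and are not taken from the project's source code. The scoring function is only a stand-in that matches a few well-known injection fragments; in the actual system its place would be taken by the trained model.

```python
from typing import Callable, Iterable, List, Tuple

Item = Tuple[str, str]  # (source, value), e.g. ("cookie", "session=abc123")

# Hypothetical stand-in for the trained detector. It only looks for a few
# well-known injection fragments and is NOT the model described in this report.
SUSPICIOUS_FRAGMENTS = ("' or '1'='1", " or 1=1", "union select", "--")


def placeholder_score(text: str) -> float:
    """Return 1.0 if the text contains a known injection fragment, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(frag in lowered for frag in SUSPICIOUS_FRAGMENTS) else 0.0


def filter_inputs(
    items: Iterable[Item],
    score: Callable[[str], float] = placeholder_score,
    threshold: float = 0.5,
) -> Tuple[List[Item], List[Item]]:
    """Split incoming (source, value) pairs into accepted and rejected lists."""
    accepted: List[Item] = []
    rejected: List[Item] = []
    for source, value in items:
        # A score at or above the threshold is treated as an attack attempt.
        if score(value) >= threshold:
            rejected.append((source, value))
        else:
            accepted.append((source, value))
    return accepted, rejected


requests = [
    ("user_input", "alice"),
    ("user_input", "' OR '1'='1"),
    ("cookie", "session=abc123"),
    ("cookie", "id=1 UNION SELECT password FROM users"),
]
accepted, rejected = filter_inputs(requests)
print("accepted:", accepted)
print("rejected:", rejected)
```

Passing the scoring function as a parameter keeps the filtering logic independent of the model, so a learned classifier that returns a probability could be substituted without changing `filter_inputs`.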
Strictly speaking, the SQL injection detector acts as a filter that inspects and sorts every query it receives, distinguishing normal queries from queries that carry an attack. The strings to be checked can come from user input as well as from available cookies. The method used to build the SQL injection detector with a deep learning (transfer learning) approach is shown in Figure 2.

Fig. 2. SQL Injection Detector Build Method (diagram labels: Data Preparation with SQL Generic Queries and SQL Injection Queries; Word Embedding Neural Network; Swivel Model; Evaluation)

Development begins with preparing the dataset. Because the model functions as a filter, the data should be balanced between normal SQL queries and SQL injection queries. The dataset is then arranged as needed so that it can be fed into the model to be trained. The model used in this study is SWIVEL, a text embedding model suited to identifying input strings. The word embedding neural network used is a TensorFlow text embedding model called gnews-swivel-20dim-with-oov [7]. Next, hyperparameters are configured, for example by adding batch normalization or regularizers to the model to be trained. Training is then run for 200 epochs, saving only the best result.

B. Dataset

This project was inspired by Shreekan (GitHub: https://github.com/shreekanthsenthil/SQL-InjectionDetection#readme ), and the source code and dataset were taken from his work. This study differs in using a different version of the gnews swivel model and in slightly changing the final parameters of the training process. The dataset consists of four groups of samples: 19 normal SQL queries, 10,852 SQL injection queries, a password list containing 1,000 words, and a username list containing 486 words.

IV. RESULT

Training shows that the model reaches an accuracy of 99.77% with a loss below 0.0218. The learning curves in Figures 3 and 4 show no visible overfitting, although the dataset statistics make it clear that the data in each group is unbalanced. Although the result obtained in this implementation exceeds the state-of-the-art SQL injection detectors from previous studies, it should be noted that, statistically, overfitting would be expected given the unbalanced dataset used.

The main challenge in training the SQL injection detector is preparing the dataset. The variety of formats in circulation (although still limited) and the different training data structures used by different researchers mean that SQL injection detector development still needs further work over an appropriate time frame. Nevertheless, as a starting point for research on SQL injection detectors, the results obtained are sufficient: the final results can serve as an initial expectation for this potential research topic.

V. CONCLUSION

This research underlines the effort to implement an SQL injection detector with a deep learning approach. The system is implemented by training with the SWIVEL architecture, one of the neural network text embedding models released by TensorFlow. In general, implementing an SQL injection detector is neither difficult nor expensive, but it takes extra effort to find a dataset that suits the needs of the system being developed. Even so, the available dataset is enough to obtain expectations and a basic picture of the performance of the resulting model. Future work should therefore look for more and better datasets that can be used to evaluate the model being built.
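To make the evaluation concern concrete, the sample counts reported in the Dataset subsection can be turned into a majority-class baseline. The sketch below is plain Python and rests on one assumption that the report does not state explicitly: the normal SQL queries, passwords, and usernames are all treated as benign, and only the SQL injection queries as malicious. The helper name `precision_recall_f1` is hypothetical.

```python
# Sample counts as reported in the Dataset subsection.
counts = {
    "normal_sql": 19,
    "sql_injection": 10852,
    "password": 1000,
    "username": 486,
}

# Assumption: every group except "sql_injection" is labeled benign.
malicious = counts["sql_injection"]
benign = sum(n for name, n in counts.items() if name != "sql_injection")
total = malicious + benign

# Accuracy of a trivial detector that always predicts the majority class.
majority_baseline = max(malicious, benign) / total


def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Confusion matrix of the trivial "everything is an injection" detector.
tp, fp, fn, tn = malicious, benign, 0, 0
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
benign_recall = tn / (tn + fp)  # share of benign strings that are let through

print(f"benign={benign}, malicious={malicious}, total={total}")
print(f"majority-class accuracy: {majority_baseline:.4f}")
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
print(f"benign recall: {benign_recall:.4f}")
```

Under this assumption a detector that flags every string already reaches about 87.8% accuracy while rejecting every benign string, which is why per-class precision and recall are worth reporting alongside accuracy when the classes are this unbalanced.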
The limitations of the available dataset make it difficult to evaluate the model that has been produced.

Fig. 3. Loss Graph Over Training Steps

Fig. 4. Accuracy Graph Over Training Steps

REFERENCES

[1] V. Dehalwar, A. Kalam, M. L. Kolhe, and A. Zayegh, “Review of web-based information security threats in smart grid,” 2017 7th Int. Conf. Power Syst. ICPS 2017, pp. 849–853, 2018, doi: 10.1109/ICPES.2017.8387407.
[2] D. Chen, Q. Yan, C. Wu, and J. Zhao, “SQL Injection Attack Detection and Prevention Techniques Using Deep Learning,” J. Phys. Conf. Ser., vol. 1757, no. 1, 2021, doi: 10.1088/1742-6596/1757/1/012055.
[3] R. A. Katole, “Detection of SQL Injection Attacks by Removing the Parameter Values of SQL Query,” 2018 2nd Int. Conf. Inven. Syst. Control, no. Icisc, pp. 736–741, 2018.
[4] Sangeeta, S. Nagasundari, and P. B. Honnavali, “SQL Injection Attack Detection using ResNet,” 2019 10th Int. Conf. Comput. Commun. Netw. Technol. ICCCNT 2019, no. July 2019, pp. 1–7, 2019, doi: 10.1109/ICCCNT45670.2019.8944874.
[5] S. Abaimov and G. Bianchi, “CODDLE: Code-Injection Detection with Deep Learning,” IEEE Access, vol. 7, pp. 128617–128627, 2019, doi: 10.1109/ACCESS.2019.2939870.
[6] A. Luo, W. Huang, and W. Fan, “A CNN-based Approach to the Detection of SQL Injection Attacks,” Proc. - 18th IEEE/ACIS Int. Conf. Comput. Inf. Sci. ICIS 2019, pp. 320–324, 2019, doi: 10.1109/ICIS46139.2019.8940196.
[7] N. Shazeer, R. Doherty, C. Evans, and C. Waterson, “Swivel: Improving Embeddings by Noticing What’s Missing,” 2016, [Online]. Available: http://arxiv.org/abs/1602.02215.