SQL injection Detection Using Deep Learning
A Project Report
Nanang Cahyadi
23219324, Information Security Engineering and Management
School of Electrical Engineering and Informatics, Institut Teknologi Bandung
Bandung
en.nanangcahyadi@gmail.com
Abstract—SQL injection is a major threat to web application security and has become a concern for safe operating procedures. Various solutions to this attack have been developed, both by building detection systems and by protecting vulnerable code. With the growth of artificial intelligence, a system that can detect SQL injection is needed; the remaining problem, however, is how quickly such solutions can be implemented. This technical report therefore discusses the implementation and results of an SQL injection detector based on artificial intelligence. The proposed approach builds the detector using transfer learning. The resulting system classifies strings as SQL injection with an accuracy of 99.77%.
Keywords—Cybersecurity, SQL Injection, Transfer Learning
I. INTRODUCTION
With the increasing use of the web as an information medium, the potential for attacks on web applications increases. Both the likelihood of an attack and the damage it can cause are heightened because web security technology is not standardized. The main obstacle to preventing SQL injection is limited knowledge of information technology and secure coding. Several solutions have therefore been proposed that use artificial intelligence to counter SQL injection.
OWASP [1] publishes a list of the top 10 security threats facing web applications every few years. In the OWASP Top 10 of 2013 and 2017, injection attacks were ranked first, which shows their importance and danger. SQL injection is an exploit that occurs at the application layer: an attacker bypasses the web application's access control to manipulate the database directly, threatening the confidentiality, integrity, and availability of the application system.
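To make the mechanism concrete, here is a minimal sketch using Python's standard-library `sqlite3` module and an in-memory database; the `users` table, its columns, and the credentials are hypothetical. It shows how a tautology payload bypasses a login check when user input is concatenated into the query text, and how a parameterized query keeps the same input from changing the query's structure.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory toy database with a single hypothetical user.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")
    return conn


def login_vulnerable(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # UNSAFE: user input is pasted into the SQL text, so quotes inside
    # the input can change the structure of the query.
    query = ("SELECT COUNT(*) FROM users WHERE name = '" + name +
             "' AND password = '" + password + "'")
    return conn.execute(query).fetchone()[0] > 0


def login_safe(conn: sqlite3.Connection, name: str, password: str) -> bool:
    # SAFE: placeholders pass the input as data, never as SQL syntax.
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


if __name__ == "__main__":
    conn = make_db()
    payload = "' OR '1'='1"  # classic tautology payload
    print(login_vulnerable(conn, "alice", payload))  # True: check bypassed
    print(login_safe(conn, "alice", payload))        # False
```

With the payload, the vulnerable WHERE clause becomes `name = 'alice' AND password = '' OR '1'='1'`, which is true for every row because AND binds more tightly than OR.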
Based on the research of Jemaal [2], SQL injection attacks can be classified as shown in Table I.
TABLE I.
CLASSIFICATION PARAMETERS DISCOVERED BY JEMAAL [2]
Parameter for Classification | Categories
Attack sources | User input; cookies; server variables; second-order injection
Attack goals | Database fingerprinting; analyzing schema; extracting data; amending data; executing DoS; evading detection; bypassing authentication; remote control; privilege escalation
Attack types | Tautology; illegal/logically incorrect queries; union query; piggyback query; stored procedure; inference; alternate encoding
Based on Table I, SQL injection attacks can originate from user input or cookies and can take the form of, for example, tautologies or logically incorrect queries. Intuitively, then, an SQL injection attempt can be detected by inspecting the input at its source and determining whether the resulting query is a normal SQL query or an attack attempt.
II. RELATED WORK
Several researchers have analyzed measures that can be taken to prevent SQL injection. Katole [3] demonstrated that combining several methodologies can improve SQL injection countermeasures; the methods combined include AMNESIA, a firewall, and string analysis that removes parameter values from SQL queries.
Other researchers have surveyed the SQL injection detection approaches proposed in previous studies, many of which build detectors with machine learning. Table II compares these machine learning solutions.
TABLE II.
SUMMARY COMPARISON OF RESEARCH RESULTS BY JEMAAL [2]
Technique | Classifier(s) | Performance | Dataset
HIPS | Naive Bayes, multinomial Bayes | Accuracy = 97.6% | SQL injection data collection framework
SQLI-IDS | Backpropagation network | Overall accuracy = 96.8% | 13,000 URL addresses, including 500 benign and 12,500 malicious URLs
Sheykhkanloo et al. | Neural-network-based model | Accuracy = 95% | 25,000 URL addresses, including 12,250 benign and 12,250 malicious URLs
Verbruggen et al. | Decision tree, SVM, random tree, Jribber, neural network, random forest | Recognition rate = 0.986 | 6 different datasets containing 1,876 malicious packets and 11,444 normal packets
Wang et al. | Stacked autoencoder | Precision = 100% | 0.3 million TCP, internal network
Ingre et al. | Decision tree | Accuracy = 83.7% | NSL-KDD [50]
Moosa et al. | Neural network (feed-forward) | Accuracy = 66.67% | 300 SQL injection signatures and around 200 XSS signatures collected from different websites
Joshi et al. | Naive Bayes | Accuracy = 93.9% | 178 codes, including 101 normal and 77 malicious codes
SQLiGOT | SVM | Accuracy = 96.23% | 4,610 injected sequences and 4,884 genuine sequences
ModSecurity with ML | K-NN (K = 3), SVM, random forest | Precision = 97% | CSIC-2010 [56], DRUPAL [57], and PKDD2007 [58]
TbD-NNbR | Neural network | Accuracy = 99.23% | 1,655 queries tested: 451 malicious and 1,204 legal queries
Based on Table II, efforts to build an SQL injection detector with machine learning have already achieved impressive results: the reported accuracy ranges from 66.67% to 99.23%. Even so, research on detectors continues with deep learning approaches. Table III summarizes SQL injection detectors from previous studies that use neural networks.
TABLE III.
SUMMARY COMPARISON OF RESEARCH RESULTS FROM SQL INJECTION DETECTION USING DEEP LEARNING
Research | Description | Result
Sangeeta et al. [4] | Proposes an SQL injection detector using a residual network | 99.69% accuracy with a 99.59% F1 score
Abaimov et al. [5] | Proposes an injection detector covering both SQL and XSS, using a CNN | 99% accuracy, with 99% precision and 93% recall
Luo et al. [6] | Proposes an SQL injection detector using a CNN | 99.50% accuracy with a 99.49% F1 score
As Table III shows, the highest reported accuracy is 99.69%, with an F1 score of 99.59%. These results are therefore used as the target for the implementation in this study.
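The F1 score used as a target here is the harmonic mean of precision and recall. As a small illustration (a sketch, not part of the original study), the helpers below compute it from precision and recall or from raw counts; applied to the precision and recall reported for Abaimov et al. [5] (99% and 93%), it gives about 0.959.

```python
from typing import Tuple


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in [0, 1])."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


def precision_recall_from_counts(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    print(round(f1_score(0.99, 0.93), 3))  # 0.959
```

Because it is a harmonic mean, F1 is pulled toward the weaker of the two numbers, which is why a 93% recall keeps the F1 score well below the 99% precision.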
III. DESIGN
This section describes the design of the proposed SQL injection detector and how it works, using deep learning and transfer learning.
A. Working Diagrams
Fig. 1. SQL Injection Detector Working Principle (diagram elements: user input, cookies, SQL generic query)
Fig. 1 shows the working principle of the detector. The SQL injection detector acts as a filter: it inspects every query it receives and separates normal queries from those carrying attacks. The checked input can come from user input as well as from cookies.
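The filtering role described above can be sketched as follows. This is a minimal illustration under stated assumptions: `score_injection` is a hypothetical stand-in for the trained model (here a crude set of regular expressions loosely keyed to attack types from Table I, such as tautologies, UNION queries, and piggybacked queries), and the 0.5 threshold only mirrors how a model's probability output would be cut off.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for the trained model: crude regular expressions
# loosely keyed to attack types from Table I. Easy to evade; illustration only.
PATTERNS = [
    re.compile(r"'\s*or\s+'?\d+'?\s*=\s*'?\d+", re.IGNORECASE),     # tautology
    re.compile(r"\bunion\s+(all\s+)?select\b", re.IGNORECASE),        # union query
    re.compile(r";\s*(drop|delete|insert|update)\b", re.IGNORECASE),  # piggybacked query
    re.compile(r"--|/\*"),                                            # comment truncation
]


def score_injection(text: str) -> float:
    """Return 1.0 if any pattern matches, else 0.0 (a model would return a probability)."""
    return 1.0 if any(p.search(text) for p in PATTERNS) else 0.0


def filter_request(user_input: Dict[str, str], cookies: Dict[str, str],
                   threshold: float = 0.5) -> List[str]:
    """Check every user-input field and cookie; return the names of flagged ones."""
    flagged: List[str] = []
    for source, fields in (("input", user_input), ("cookie", cookies)):
        for name, value in fields.items():
            if score_injection(value) >= threshold:
                flagged.append(f"{source}:{name}")
    return flagged


if __name__ == "__main__":
    print(filter_request({"user": "alice", "pass": "' OR '1'='1"},
                         {"session": "abc123"}))  # ['input:pass']
```

Keeping the scorer behind a single function means the rule-based placeholder can be swapped for the trained deep learning model without changing the filtering logic.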
To build the detector with a deep learning approach, specifically transfer learning, the method shown in Fig. 2 is used.
Fig. 2. SQL Injection Detector Build Method (diagram stages: Data Preparation [SQL generic queries, SQL injection queries]; Word Embedding Neural Network [Swivel]; Model Evaluation)
The development process begins with preparing the dataset. Because the model acts as a filter, the data must be balanced between normal SQL queries and injection SQL queries. The dataset is then processed into the form needed for training. The model used in this study is based on Swivel, a text embedding model that is used here to represent each input string as a vector. The word embedding neural network is a pretrained text embedding model from TensorFlow called gnews-swivel-20dim-with-oov [7].
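The role of the pretrained embedding can be illustrated without TensorFlow. The sketch below is not the TensorFlow Hub module: it uses a tiny, made-up vocabulary with deterministic pseudo-random 20-dimensional vectors in place of the trained Swivel vectors, splits the input on spaces, hashes out-of-vocabulary tokens into extra buckets (the bucket count and the lowercasing are assumptions), and combines token vectors with a sqrtn combiner (sum divided by the square root of the token count), which is, to our understanding, how the hub module's documentation describes building a sentence embedding.

```python
import hashlib
import math
import random
from typing import Dict, List

DIM = 20          # matches the 20-dim size of the gnews Swivel module
OOV_BUCKETS = 1   # assumption: number of hash buckets for unknown tokens


def _vector(seed: str) -> List[float]:
    # Deterministic pseudo-random vector standing in for a trained embedding.
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


# Hypothetical toy vocabulary; the real module ships its own trained table.
VOCAB = ["select", "from", "where", "union", "or", "and", "users", "password"]
TABLE: Dict[str, List[float]] = {tok: _vector(tok) for tok in VOCAB}
OOV_TABLE = [_vector(f"<oov{i}>") for i in range(OOV_BUCKETS)]


def _lookup(token: str) -> List[float]:
    if token in TABLE:
        return TABLE[token]
    # A stable hash (unlike Python's salted hash()) picks the OOV bucket.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return OOV_TABLE[int.from_bytes(digest[:4], "big") % OOV_BUCKETS]


def embed(sentence: str) -> List[float]:
    """Split on spaces, look up each token, combine with sqrtn."""
    tokens = sentence.lower().split()  # lowercasing is an assumption of this toy
    if not tokens:
        return [0.0] * DIM
    total = [0.0] * DIM
    for tok in tokens:
        for i, x in enumerate(_lookup(tok)):
            total[i] += x
    scale = 1.0 / math.sqrt(len(tokens))
    return [x * scale for x in total]


if __name__ == "__main__":
    print(len(embed("SELECT name FROM users WHERE id = 1 OR 1=1")))  # 20
```

Because every input, however long, becomes one 20-dimensional vector, the layers trained on top of the embedding can stay small, which is what makes transfer learning cheap here.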
Next, hyperparameters are configured, for example by adding batch normalization or regularizers to the model to be trained. Training is then run for 200 epochs, saving only the best result.
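Saving only the best result amounts to tracking the best validation score across epochs. In Keras this is commonly done with `tf.keras.callbacks.ModelCheckpoint(..., save_best_only=True)`; the framework-free sketch below shows the same logic, with `train_one_epoch` and `validation_loss` as hypothetical callables supplied by the caller.

```python
import copy
from typing import Any, Callable, Tuple


def train_keep_best(state: Any,
                    train_one_epoch: Callable[[Any], Any],
                    validation_loss: Callable[[Any], float],
                    epochs: int = 200) -> Tuple[Any, int, float]:
    """Run `epochs` epochs and return (best_state, best_epoch, best_loss).

    Only a state that improves the validation loss is kept, mirroring
    "save best only" checkpointing.
    """
    best_state, best_epoch, best_loss = None, -1, float("inf")
    for epoch in range(epochs):
        state = train_one_epoch(state)
        loss = validation_loss(state)
        if loss < best_loss:
            # Deep copy so later epochs cannot mutate the saved checkpoint.
            best_state, best_epoch, best_loss = copy.deepcopy(state), epoch, loss
    return best_state, best_epoch, best_loss


if __name__ == "__main__":
    # Toy run: validation loss falls until state 50, then rises (overfitting).
    print(train_keep_best(0, lambda s: s + 1,
                          lambda s: (s - 50) ** 2 / 100 + 0.02))  # (50, 49, 0.02)
```

Keeping the best checkpoint rather than the last one means that, if validation loss starts rising late in the 200 epochs, the saved model is the one from before that happened.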
B. Dataset
This project was inspired by the work of Shreekan (GitHub: https://github.com/shreekanthsenthil/SQL-InjectionDetection#readme), from which the source code and dataset were taken. This study differs in that it uses a different version of the gnews Swivel model and slightly adjusts the final training parameters. The dataset consists of four parts: 19 normal SQL queries, 10,852 SQL injection queries, a password list of 1,000 words, and a username list of 486 words.
IV. RESULT
The training results show that the model reaches an accuracy of 99.77% with a loss below 0.0218.
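The report does not state which loss function was used. Assuming the binary cross-entropy that is usual for a two-class (normal vs. injection) detector, both reported quantities can be computed from predicted probabilities as in this sketch; the function name and the 0.5 decision threshold are illustrative choices.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_bce(labels: Sequence[int], probs: Sequence[float],
                     threshold: float = 0.5, eps: float = 1e-7) -> Tuple[float, float]:
    """Return (accuracy, mean binary cross-entropy).

    labels: 1 = SQL injection, 0 = normal; probs: predicted P(injection).
    """
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    correct = 0
    total_loss = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        correct += int((p >= threshold) == bool(y))
        total_loss += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return correct / len(labels), total_loss / len(labels)
```

Accuracy only counts which side of the threshold a prediction falls on, while the loss also penalizes confident mistakes, so the two figures can move somewhat independently during training.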
The learning curves in Figs. 3 and 4 show no visible overfitting, even though the dataset statistics make clear that the data in each part are unbalanced. Although these results exceed the state-of-the-art SQL injection detectors from previous studies, it should be noted that, statistically, overfitting would be expected given the unbalanced dataset.
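One common mitigation for such imbalance, not used in the original experiment, is to weight each class inversely to its frequency during training. The sketch below uses the "balanced" heuristic n_samples / (n_classes × count). As an illustration it treats the 19 normal queries and 10,852 injection queries as a binary problem and leaves out the username and password lists, since the report does not say how those lists were labeled.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


if __name__ == "__main__":
    labels = [0] * 19 + [1] * 10852  # 0 = normal query, 1 = SQL injection
    weights = balanced_class_weights(labels)
    print({k: round(v, 3) for k, v in weights.items()})  # {0: 286.079, 1: 0.501}
```

With these weights, the 19 normal queries contribute as much total weight to the loss as the 10,852 injection queries. In Keras, such a dictionary can be passed as the `class_weight` argument of `Model.fit`.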
The main challenge in training the SQL injection detector is preparing the dataset. Published datasets come in a variety of formats (and remain limited), and training data are structured differently from one researcher to another, so SQL injection detectors need further development within an appropriate time frame. Nevertheless, as a starting point for research on SQL injection detectors, the results are sufficient: they can serve as an initial expectation for this research topic.
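One way to cope with differently structured datasets, sketched here as an assumption rather than what this project did, is to normalize every source into the same (query, label) pairs before training. The column-name aliases below are hypothetical; real datasets will need their own mapping.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical column-name aliases; real datasets may use other names.
TEXT_COLUMNS: Tuple[str, ...] = ("query", "sentence", "payload", "text")
LABEL_COLUMNS: Tuple[str, ...] = ("label", "class", "is_injection")


def _pick(header: List[str], candidates: Tuple[str, ...]) -> str:
    """Return the first header name matching a candidate (case-insensitive)."""
    by_lower = {h.strip().lower(): h for h in header}
    for name in candidates:
        if name in by_lower:
            return by_lower[name]
    raise KeyError(f"none of {candidates} found in header {header}")


def load_labeled_csv(text: str) -> List[Tuple[str, int]]:
    """Normalize one CSV document into (query, label) pairs; label 1 = injection."""
    reader = csv.DictReader(io.StringIO(text))
    header = list(reader.fieldnames or [])
    text_col = _pick(header, TEXT_COLUMNS)
    label_col = _pick(header, LABEL_COLUMNS)
    return [(row[text_col], int(row[label_col])) for row in reader]
```

Once every source is reduced to this shape, datasets from different researchers can be merged for training or kept apart as independent evaluation sets.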
V. CONCLUSION
This study implements an SQL injection detector with a deep learning approach. The system is trained using the Swivel architecture, one of the neural network models for text embedding released by TensorFlow.
In general, implementing an SQL injection detector is neither difficult nor expensive, but extra effort is needed to find a dataset that suits the system's development needs. Even so, the available dataset gives a baseline expectation and a basic picture of the resulting model's performance. Future work should therefore look for more and better datasets that can be used to evaluate the model being built, since the limited dataset currently available makes the model difficult to evaluate.
Fig. 3. Loss Graph Over Training Steps
Fig. 4. Accuracy Graph Over Training Steps
REFERENCES
[1] V. Dehalwar, A. Kalam, M. L. Kolhe, and A. Zayegh, “Review of web-based information security threats in smart grid,” 2017 7th Int. Conf. Power Syst. ICPS 2017, pp. 849–853, 2018, doi: 10.1109/ICPES.2017.8387407.
[2] D. Chen, Q. Yan, C. Wu, and J. Zhao, “SQL Injection Attack Detection and Prevention Techniques Using Deep Learning,” J. Phys. Conf. Ser., vol. 1757, no. 1, 2021, doi: 10.1088/1742-6596/1757/1/012055.
[3] R. A. Katole, “Detection of SQL Injection Attacks by Removing the Parameter Values of SQL Query,” 2018 2nd Int. Conf. Inven. Syst. Control (ICISC), pp. 736–741, 2018.
[4] Sangeeta, S. Nagasundari, and P. B. Honnavali, “SQL Injection Attack Detection using ResNet,” 2019 10th Int. Conf. Comput. Commun. Netw. Technol. ICCCNT 2019, pp. 1–7, 2019, doi: 10.1109/ICCCNT45670.2019.8944874.
[5] S. Abaimov and G. Bianchi, “CODDLE: Code-Injection Detection with Deep Learning,” IEEE Access, vol. 7, pp. 128617–128627, 2019, doi: 10.1109/ACCESS.2019.2939870.
[6] A. Luo, W. Huang, and W. Fan, “A CNN-based Approach to the Detection of SQL Injection Attacks,” Proc. 18th IEEE/ACIS Int. Conf. Comput. Inf. Sci. ICIS 2019, pp. 320–324, 2019, doi: 10.1109/ICIS46139.2019.8940196.
[7] N. Shazeer, R. Doherty, C. Evans, and C. Waterson, “Swivel: Improving Embeddings by Noticing What’s Missing,” 2016. [Online]. Available: http://arxiv.org/abs/1602.02215.