A project report submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Computer Science (Information Security)
Centre for Advanced Software Engineering (CASE)
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia
APRIL 2009
To my beloved mother and father
First and foremost, I offer my sincerest appreciation to my supervisor, Assoc.
Prof. Dr. Zailani Mohamed Sidek, who has supported me throughout my project with
his patience and knowledge. I attribute the level of my Master's degree to his
encouragement and effort; without him this thesis would not have been
completed or written. One simply could not wish for a better or friendlier supervisor.
I also gratefully acknowledge all my colleagues for their advice, supervision, and
crucial contributions throughout my project.
In today's business world, information is the most valuable asset of
organizations and thus requires appropriate management and protection. Amongst
all types of data repositories, the database is said to play the role of the heart in the body
of the IT infrastructure. At the same time, a growing number of efforts have
concentrated on handling the vast variety of security attacks. The character of
such handling depends on when we want it to occur and how we intend
to deal with attack attempts. Generally, there are two ways to handle subversion
attempts. One way is to equip our systems with security controls. In reality, however,
this is not fully feasible for many reasons. Hence, we are interested in detecting
security attacks.
Amongst the different types of intrusion detection systems (such as
network-based, host-based and application-based IDS), database intrusion detection
systems, which are considered a type of application-based IDS, have become a
matter of increasing concern.
In this paper we propose an architecture for a hybrid database intrusion
detection system (DB-IDS). The architecture consists of several components and
sub-components. Its Detector component encompasses the Anomaly Detector and Misuse Detector sub-components. The Anomaly Detector works from
the profiles constructed by the Profiler.
Suspicious sequences of events that are
considered potential attacks are detected by the Misuse Detector. The Data Collector
component is responsible for capturing the data needed for profiling. Moreover, the
Transformer component converts the raw log files into a
format that the Profiler can process. Finally, the Anomaly Detector and Misuse Detector
components send an alert to the Responder component when they detect any suspicious
activity.
In today's daily affairs, information is the most important asset
of an organization. It therefore requires efficient management and
protection. Among the types of data storage, the database is the central part
of an information technology infrastructure. Accordingly,
various measures have been taken to curb the widespread problem of security
attacks. The way this problem is handled depends on
how we want the handling to be carried out and how we will deal with
the attack attempts. In general, there are two methods
for handling subversion attempts. One is to
equip the existing computer systems with security control
systems. However, this method is difficult to carry out at present
because of financial constraints and a shortage of human resources.
Therefore, detecting security problems is the main aim
of this research.
Among the identified means of detecting security problems are
Intrusion Detection Systems (IDS), such as network-based,
host-based and application-based intrusion detection systems, and
database intrusion detection systems.
This subject is increasingly discussed as a way
to further improve the quality of security controls for
a computer system.
In this study, we propose an architecture for a hybrid database
intrusion detection system (DB-IDS). The architecture contains
several components and sub-components. It includes Anomaly Detection and
Misuse Detection, which together are known as the detection component.
Anomaly detection performs its task based on the profiles generated by the
Profiler. Suspicious activities regarded as possible
attacks are detected by the Misuse Detector. The Data Collector
component serves to capture data for profiling. The Transformer component,
in turn, converts the original log files into a format that can be read by
the Profiler. The Anomaly Detector and Misuse Detector components
send alert information to the receiving component (Responder) when
there are suspicious activities.
1.1 Overview
1.2 Background
1.3 Problem Statement
1.4 Project Aim
1.5 Project Objectives
1.6 Project Scope
1.7 Summary
2.1 Introduction
2.2 Intrusion Detection History and Definitions
2.3 Taxonomy of IDS
2.4 IDS Classifications
2.4.1 Taxonomy of Intrusion Detection Principles
Anomaly Detection
Self-Learning Systems
Programmed
Misuse Detection
Programmed
Compound Detectors
2.4.2 Taxonomy of System Characteristics
Time of Detection
Granularity of Data-Processing
Source of Audit Data
Response to Detected Intrusions
Locus of Data-Processing
Locus of Data-Collection
Security
Degree of Inter-Operability
2.5 Intrusion Detection Systems using Data Mining
2.5.1 Applicable Data Mining Algorithms to Intrusion Detection
2.6 Database Intrusion Detection Systems
2.6.1 Database Intrusion Detection Using Data Mining
2.6.2 Database Anomaly Detection Systems
Learning-Based Anomaly Detection
2.6.3 Hybrid Methods
2.7 Database Intrusion Prevention Systems
2.8 Summary
3.1 Introduction
3.2 Project Methodology
3.2.1 Analysis
3.2.2 Design
3.2.3 Prototype Development
3.2.4 Prototype Implementation and Testing
3.3 Summary
4.1 Introduction
4.2 Approaches of Database Intrusion Detection
4.3 The Project Approach
4.4 The Proposed DB-IDS Architecture
4.4.1 Data Collector
4.4.2 Transformer
4.4.3 Profiler
4.4.4 Detector
Anomaly Detector
Misuse Detector
4.4.5 Responder
4.5 Overall Design
4.6 Summary
5.1 Introduction
5.2 Data Collector
5.2.1 Tracer and Audit Trace
5.2.2 Transformer
5.3 Auditor
5.4 Profiler and Profiles
5.4.1 Subject Profiles
5.4.2 Object Profiles
5.5 Detector
5.5.1 Anomaly Detector
5.5.2 Misuse Detector
5.6 Intrusion Detection Cycle
5.7 Summary
6.1 Introduction
6.2 Discussion
6.3 Future Works and Recommendations
6.4 Conclusion
Profiles Specification
Revised IDS taxonomy by Debar et al. (2000)
Anti-Intrusion Techniques
IDS Taxonomy provided by Stefan Axelsson [20]
Data Mining Phases
Project Methodology
Data Collector Sub-Components
Data Collector
Transformer Component
Profiler Component
Detector Sub-Components
Anomaly Detector Component
Misuse Detector Component
Responder Component
Architecture of the DB-IDS
A portion of LogRepositorySession Table
Audit Trace Sample
A Sample Table (MovieClick.dbo.Movies)
A sample Audit Table (Audit_MovieClick_Movies)
A Cut of AuditUni Table
General Log Profile
NumberOfLoginsPerDay Table
TtlLgnTm Table
NumberOfLoginsPerDayForDBRoles Table
NumberOfLoginsPerDayForServerRoles Table
UserStmntCounter View
UserStmntCounterDBLevel View
UserStmntCounter View
UserDDLCounter View
DBStmntCounter View
TableStmntCounter View
A Sample DMLSeq Table (DMLSeq_test_test2)
An Alert Generated by FindSusLogin
UserWorkingHour Table
An Alert Generated by ErlrLgnTimeDtctr
MaxNoLogins Table
An Alert Generated by ExceededNumOfLgns
An Alert Generated by FndPssvLgns
An Alert Generated by LgnMorThnOne
An Alert Generated by BrtFrcDtctr
Intrusion Detection Cycle
AI - Artificial Intelligence
AIS - Artificial Immune System
DBA - Database Administration
DB-IDS - Database Intrusion Detection System
DBS - Database System
DDL - Data Definition Language
DML - Data Manipulation Language
DoS - Denial of Service
ID - Intrusion Detection
IDS - Intrusion Detection System
IP - Internet Protocol
IT - Information Technology
RBAC - Role Based Access Control
RDBMS - Relational Database Management System
SA - System Admin
SQL - Structured Query Language
List of important events to be captured by Tracer sub-Component
Nowadays, a growing number of efforts concentrate on handling the
vast variety of security attacks. The character of such handling depends
on when we want it to occur and how we intend to deal with attack attempts.
According to [1], there are generally two ways to handle subversion attempts. One
way is to equip our systems with security controls such as cryptographic methods,
sophisticated access control mechanisms and rigorous authentication protocols
in order to prevent the subversion itself. In reality, however, this is not fully feasible for many
reasons, for example (a) flaws in cryptographic techniques, (b) the trade-off between
efficiency and the level of access control, and (c) insiders who abuse their privileges.
Doubtlessly, it is very important that the security mechanisms of a system are designed
to prevent unauthorized access to system resources and data. However, as
mentioned above, completely preventing breaches of security appears, at present,
unrealistic. We can, however, try to detect these intrusion attempts so that action
may be taken to repair the damage later [1]; this is what Intrusion Detection
refers to.
Generally, there are two types of intrusion detection techniques [1]. The first,
Anomaly Detection, establishes a profile of normal activity for the system;
any activity that deviates from this profile is
flagged as an intrusion. This method may rely on statistical approaches or predictive
pattern generation. The second technique, Misuse Detection, is mostly
based on signatures or patterns of known attacks.
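As a rough illustration of the statistical flavour of anomaly detection described above, the following minimal sketch builds a profile (mean and standard deviation) from historical values of a single metric and flags values that deviate too far from it. The function names, the sample data and the three-standard-deviation rule are assumptions made for this illustration; they are not taken from the proposed DB-IDS.

```python
from statistics import mean, pstdev


def build_profile(history):
    """Summarise normal behaviour as (mean, standard deviation) of one metric."""
    return mean(history), pstdev(history)


def is_anomalous(value, profile, k=3.0):
    """Flag a value lying more than k standard deviations from the profile mean."""
    mu, sigma = profile
    if sigma == 0:
        # A perfectly constant history: any different value is a deviation.
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical audit data: number of logins per day for one database user.
logins_per_day = [4, 5, 6, 5, 4, 6, 5]
profile = build_profile(logins_per_day)

print(is_anomalous(5, profile))   # a typical day
print(is_anomalous(40, profile))  # far outside the normal profile
```

A misuse detector, by contrast, would not need such a profile; it would compare observed events against known attack patterns.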
In both techniques, however, Artificial Intelligence [2] and Data Mining
[3] [4] [5] applications may be employed to reduce human effort and to increase
the accuracy of detection. In recent years, Data Mining-based intrusion detection
systems (IDSs) have demonstrated high accuracy, good generalization to novel types
of intrusion, and robust behavior in a changing environment [6].
Although a variety of approaches have been proposed for enhancing the
capabilities, efficiency and accuracy of intrusion detection, most of
these efforts have concentrated on detecting intrusions at the network or operating system
level (network-based and host-based intrusion detection systems,
respectively). They are not capable of detecting malicious data corruption, that is,
determining which particular data in the database are manipulated by which specific malicious
database transaction(s) [7]. This raises the issue of detecting intrusions at the
database level.
In today's business world, information is the most valuable asset of
organizations and thus requires appropriate management and protection [8].
Amongst all types of data repositories, the database is said to play the role of the heart
in the body of the IT infrastructure. Databases not only allow the efficient management and
retrieval of huge amounts of data, but also provide mechanisms that can
be employed to ensure the integrity of the stored data [8].
Databases have therefore always been an attractive target for
attackers. Getting access to a database containing hundreds of thousands of credit
card numbers is almost every hacker’s dream. Such an attack is a violation of
confidentiality; however, an intrusion can be defined as “any set of actions that
attempt to compromise the integrity, confidentiality or availability of a resource” [9].
Adequate and balanced care should therefore be taken to protect the whole of this triad.
Attack reports have been released [10] in which the intruder updated the price field
of an online store, lowered the prices of specific items, and
then bought those items for just a few dollars (an integrity violation).
According to [11], The Privacy Rights Clearinghouse reports that during the
period from January 2005 to May 2007, more than 154 million records containing
sensitive information, including credit card numbers, Social Security numbers, bank
account numbers, and driver's license numbers, were stolen from United States
organizations. The actual total could be much higher. This number only represents
reported breaches, and in many cases the total number of records compromised remains
undetermined. Approximately one third of the reported breaches were the result of a
direct attack on the database.
Hence, the need to secure databases is today more critical
than ever before. Database security is as old as
the database concept itself and encompasses a broad range of procedures that
protect a database from unintended activity. One of the most important techniques
for securing a database is the Intrusion Detection System, which is used to
detect potential violations of database security.
Anderson [12] classified intruders into two types: external intruders,
who are unauthorized users of the machines they attack, and internal intruders, who
have permission to access the system but not some portions of it. He further divided
internal intruders into those who masquerade as another user, those with
legitimate access to sensitive data, and the most dangerous type, the clandestine
intruders, who have the power to turn off audit control for themselves.
Despite the
necessity of protecting information stored in database systems (DBS), existing
security models are insufficient to prevent misuse, especially insider abuse by
legitimate users [8]. External intrusions are supposed to be detected and handled
by network-based IDSs. At the database level, however, an intruder
cannot do anything unless he obtains a valid credential and logs in to the
system. That is, transactions must be issued by a valid database user who has
logged in using a valid database login, regardless of whether the login information was provided by
a legitimate user. When we use the term “valid database user”, we
do not imply that this database user is necessarily associated with a legitimate
actual user. For example, by using “social engineering” techniques, an intruder can
obtain the login information of a valid database user.
Forrester Research estimates that nearly 80 percent of all database attacks are
internal, and Gartner estimates that more than 95 percent of intrusions that result in
significant financial loss are perpetrated by insiders [11]. If an intruder gets access
to the account information of a legitimate database user, (s)he may damage the
database by executing transactions that illegitimately manipulate sensitive data.
In such a case, the external intruder becomes an internal one who masquerades as
another user. In this scenario, it is difficult to determine whether the data corruption
was caused by the legitimate user or by someone who obtained that user’s
account information.
Risk from insiders comes in many forms and as attackers recognize the value
and importance of the information in the database, attacks are becoming more
focused. Attackers have also changed. In the past people hacked into networks to
“prove they could.” While those attacks were malicious, recently the motivation has
become financial. Attackers are seeking data to sell and that information resides in
the database [11].
Another common type of intrusion, forensically analyzed in [10], is one in which
the intruder logs in to the database system as a high-privileged user (by brute force,
for example), creates an account for himself, and then manipulates the
database while logged in under the newly created account.
On the other hand, illegal transactions executed by legitimate database
users who are not authorized to perform certain activities, but to whom, for whatever reason, the
logical permissions for these activities have not been denied, seem
to be more difficult to detect. Carter and Katz [13] have revealed that in
computer systems the primary security threat comes from insider abuse rather than
from intrusion. This observation implies that much more emphasis has to
be placed on the internal control mechanisms of systems. Furthermore, policies usually
do not sufficiently guard data stored in a database system against privileged users [8],
for example sa and members of the sysadmin fixed server role in MS SQL Server 2005.
Problem Statement
As mentioned before, a great number of database attacks come from
inside the organization, whether by privileged users, authorized users, or unauthorized
users who break into the system by gaining access to legitimate accounts. We
intend to be able to detect the attacks conducted by each of these intruder
categories. This raises the following question:
How can we enable database management systems to monitor, detect,
mitigate or/and prevent the attacks using some tools like DB-IDS and/or its built-in
Moreover, security policies often fail to prevent database attacks. There
have been many scenarios in which authorized users were inadvertently granted permission
to run certain operations. The initial database security configuration often fails to comply
with the security policies of the organization, and users often hold privileges that were not
supposed to be granted to them. We may assume that, in a realistic environment, none
of those users ever exceeds their rights by using those privileges. But what if a
hacker steals one of those accounts and enters the system? In that case we cannot
guarantee that the hacker will adhere to the same ethics. The additional question is:
How the database intrusion detection system may aid to revise the security
Project Aim
The aim of this project is to design an architecture for a hybrid intrusion
detection system for databases. The architecture contains different components
and sub-components which interact with each other. The system is called a hybrid DB-IDS since it encompasses anomaly detector and misuse detector modules. The
proposed architecture could be adapted to any DBMS, taking into consideration its features
and capabilities. We nevertheless develop a model based on this architecture for MS
SQL Server 2005 to show how it works. Moreover, by leveraging the information
provided by our DB-IDS, database security policies could be revised to strengthen
the system.
Project Objectives
As mentioned before, our hybrid DB-IDS consists of anomaly and misuse
detectors. In order to detect anomalies, a normal activity profile is created for
specific database objects. These objects may include, but are not limited to, principals
and securables¹. Any considerable deviation from the captured normal behavior of the
system may be regarded as an intrusion. Our model is designed in a way that enables
us to apply different methods, ranging from statistical measures to artificial intelligence
techniques, to build system profiles.
¹ The SQL Server 2005 security model relies on two fairly straightforward concepts: principals and
securables. Principals are those objects that may be granted permission to access particular database
objects, while securables are those objects to which access can be controlled [14].
To capture the behavior of database objects, we need to monitor and audit the
operation of the system.
This auditing system helps us collect the data necessary for
building database profiles. More precisely, whatever technique the Profiler
uses to build the profiles, the data gathered by the auditing system provides the necessary
input for it.
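To make the relationship between audit data and profiles concrete, the sketch below aggregates a handful of audit records into per-user, per-day login counts, the kind of summary a profiler could take as input. The record layout (`user`, `event`, `timestamp`), the event names and the sample values are hypothetical and chosen only for this illustration; the actual audit trace of the DB-IDS is described in later chapters.

```python
from collections import Counter


def logins_per_day(audit_records):
    """Count login events per (user, date) pair in a list of audit records."""
    return Counter(
        # The first 10 characters of an ISO timestamp are the date (YYYY-MM-DD).
        (record["user"], record["timestamp"][:10])
        for record in audit_records
        if record["event"] == "LOGIN"
    )


# Hypothetical audit trace; field names and values are assumptions.
audit_trace = [
    {"user": "alice", "event": "LOGIN", "timestamp": "2009-03-02T08:58:10"},
    {"user": "alice", "event": "SELECT", "timestamp": "2009-03-02T09:01:44"},
    {"user": "alice", "event": "LOGIN", "timestamp": "2009-03-02T13:30:05"},
    {"user": "bob", "event": "LOGIN", "timestamp": "2009-03-02T10:15:00"},
    {"user": "alice", "event": "LOGIN", "timestamp": "2009-03-03T09:02:31"},
]

profile_input = logins_per_day(audit_trace)
for (user, day), count in sorted(profile_input.items()):
    print(user, day, count)
```

Whatever profiling technique is chosen, it would consume aggregates of this kind rather than the raw audit records.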
A security alert is raised whenever an anomaly or a misuse is detected.
Depending on the level of suspicion or the severity of the intrusion, the detection mechanism
can cooperate with the access control system to deny access and prevent the intruder from
causing further corruption. Although this capability is in place, the system
is not intended to function entirely as an intrusion prevention system.
Another objective of this project is to help revise database security policies
and configurations by providing reports on a daily basis. Based on the facts these reports
provide, database use policies can be modified or even removed.
Furthermore, the reports may help us create new database security policies and/or
re-configure the database security schema.
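One way such a report could support policy revision is by listing privileges that users hold but never exercise. The sketch below is only an assumption about what a report might contain: the function name, the data layout and the sample privileges are hypothetical and do not describe the reports actually produced by the DB-IDS.

```python
def unused_privileges(granted, used):
    """Map each user to the privileges that were granted but never exercised."""
    report = {}
    for user, privileges in granted.items():
        idle = privileges - used.get(user, set())
        if idle:
            report[user] = sorted(idle)
    return report


# Hypothetical inputs: privileges granted by the current configuration and
# privileges observed in the audit data over the reporting period.
granted = {"alice": {"SELECT", "UPDATE", "DELETE"}, "bob": {"SELECT"}}
used = {"alice": {"SELECT"}, "bob": {"SELECT"}}

print(unused_privileges(granted, used))
```

Privileges that appear in such a listing over a long period would be candidates for review, since they widen what a stolen account could do without serving any observed purpose.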
In the following we accordingly list the project objectives with the aim of
designing an architecture for DB-IDS:
Proposing a database anomaly detection model
Constructing sample profiles for a database system
Developing a database audit system
Proposing a brute-force detection model for the database systems
Proposing a model for database security policy revision
Project Scope
As mentioned before, intrusion detection systems do not usually take preventive
measures when an attack is detected; an IDS is a reactive rather than a proactive agent. It
plays the role of an informant rather than a police officer [1]. The proposed
architecture therefore only takes the detection of intrusions into account. The model
is, however, capable of cooperating with the database access control system to stop the intruder from
doing further damage to the system.
The architecture of the DB-IDS is developed, implemented and
tested for Microsoft SQL Server 2005 using Structured Query Language (SQL). All
components of the proposed architecture are built using built-in means
such as stored procedures, views and triggers.
The range of intrusions detectable by this model is limited to those
conducted either by external intruders who gain access to legitimate
database account information or by insiders who abuse their privileges. Other external
intrusions and attacks such as SQL injection are not covered in this project.
In the last chapter of this project report, we go through the recommendations
and future work to enhance the accuracy, efficiency and scalability of the proposed DB-IDS. As a matter of fact, whatever is beyond the scope of this project could be
considered future work.
Nowadays, intrusion detection plays a vital role in security mechanisms.
Organizations need to be able to detect intrusions into their database systems as
soon as possible to prevent further damage to their sensitive data, which may cause
financial loss. The need for an intrusion detection system becomes
even clearer when we learn that in many real-life scenarios an intrusion remains
undetected for hours or even days. Such concealed attacks are
a DBA’s worst nightmare: they make the recovery procedure difficult
and time-consuming, and in some cases infeasible. Moreover, from a
forensic point of view, the integrity of evidence is the key point in a database intrusion
investigation, and a delay in detecting the intrusion may lead to corruption of the
digital evidence on which proof of the intruder's guilt is based.
In this project we intend to come up with an architecture model for database
intrusion detection that tries to detect suspicious transactions by comparing
the current transaction with the normal activity of the system.
As we saw in the
introduction, such systems are called Anomaly Detection systems [1]. However,
there are challenges in the design of these systems, such as selecting threshold
levels that minimize false negatives and false positives, and selecting the
features to monitor.
Furthermore, anomaly detection systems are
computationally expensive because of the overhead of keeping track of, and possibly
updating, several system profile metrics [1]. The proposed architecture is intended to
be designed in such a way that it addresses these challenges and balances efficiency
and overhead.
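The threshold challenge mentioned above can be illustrated with a small sketch: for a set of deviation scores whose true nature is known, it counts false positives and false negatives at several candidate thresholds. The scores, labels and threshold values are invented for this illustration and are not measurements from the DB-IDS.

```python
def error_counts(scores, labels, threshold):
    """Return (false_positives, false_negatives) when score > threshold raises an alert."""
    pairs = list(zip(scores, labels))
    false_positives = sum(1 for s, intrusive in pairs if s > threshold and not intrusive)
    false_negatives = sum(1 for s, intrusive in pairs if s <= threshold and intrusive)
    return false_positives, false_negatives


# Hypothetical deviation scores and ground truth (True = real intrusion).
scores = [0.5, 1.2, 2.8, 3.5, 1.9, 4.2, 2.4, 5.0]
labels = [False, False, False, True, False, True, True, True]

for threshold in (1.0, 2.0, 3.0, 4.0):
    fp, fn = error_counts(scores, labels, threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

On this toy data, raising the threshold removes false alarms but starts to miss real intrusions, which is the trade-off a threshold choice has to balance.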
The other mechanism for detecting database intrusions in this project relies on
the Misuse Detection concept, in which we intend to identify meaningful
sequences of events that amount to a kind of database misuse. Misuse detection
works on the basis of database attack patterns. The model enables us to monitor
apparently unrelated small events which, if they occur in a specific order, lead us to believe
that the database is the target of an intrusion.
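The idea of matching an ordered sequence of individually harmless events can be sketched as a subsequence check. The signature below loosely follows the attack described earlier in this chapter (a high-privileged login obtained by brute force, creation of a new account, then a login under that account); the event names and traces are hypothetical and are not the event types captured by the Tracer sub-component.

```python
def matches_signature(events, signature):
    """True if the signature's events occur in order (not necessarily adjacent)."""
    remaining = iter(events)
    # `step in remaining` consumes the iterator up to the first match, so each
    # signature step must be found after the previous one.
    return all(step in remaining for step in signature)


# Hypothetical attack pattern expressed as an ordered list of event types.
ACCOUNT_TAKEOVER = ["LOGIN_FAILED", "LOGIN_SUCCEEDED", "CREATE_LOGIN", "LOGIN_SUCCEEDED"]

suspicious_trace = [
    "LOGIN_FAILED", "LOGIN_FAILED", "LOGIN_SUCCEEDED", "SELECT",
    "CREATE_LOGIN", "LOGOUT", "LOGIN_SUCCEEDED", "UPDATE",
]
benign_trace = ["LOGIN_SUCCEEDED", "SELECT", "UPDATE", "LOGOUT"]

print(matches_signature(suspicious_trace, ACCOUNT_TAKEOVER))
print(matches_signature(benign_trace, ACCOUNT_TAKEOVER))
```

A practical detector would also have to bound the time window and tie the events to the same session or account; this sketch leaves those aspects out.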
Nowadays, computer attacks are unglamorous.
By connecting our
organization’s computers or local network to the Internet, we increase the risk that someone
breaks in, installs malicious programs and tools,
and possibly uses the systems, by controlling them remotely, to attack other
machines on the Internet. The annual FBI/CSI survey
shows that even though virus-based attacks are the most frequent, attacks based on
unauthorized access, as well as Denial of Service attacks from both internal and
external sources, are increasing drastically.
The more sensitive information we hold, the higher the probability of being the
target of security threats. Several major banks have been subject to attacks in which
attackers gained access to customers' accounts and viewed detailed information
about the activities on these accounts. In some instances the attackers stole credit
card information to blackmail e-commerce companies by threatening to sell this
information to unauthorized entities [14].
In order to combat this growing trend of computer attacks and respond to this
increasing threat, both academic and industry groups have been developing systems
to monitor networks and systems and raise alerts of suspicious activities. These
systems are called Intrusion Detection Systems (IDS).
Intrusion Detection History and Definitions
An intrusion can be defined as any set of actions that attempt to compromise
the integrity, confidentiality or availability of a resource [9]. Intrusion detection is
the process of tracking important events occurring in a computer system and
analyzing them for the possible presence of intrusions [15]. As a more comprehensive
definition, Alessandri defines intrusion detection as the set of practices and
mechanisms used towards detecting errors¹ that may lead to security failures²,
detecting security failures themselves (including anomaly and misuse detection), and diagnosing intrusions
and attacks [16].
Accordingly, an intrusion detection system (IDS) is an implementation of the
practices and mechanisms of intrusion detection [16]. It can also be thought of as a
software tool that attempts to detect an intruder hacking into a system or a genuine
user misusing the resources of the system. In other words, an IDS is a piece of
software that runs on a host and monitors the activities of users and programs on
the host, as well as the traffic on the networks to which the host is attached.
This approach should be contrasted with those that aim to strengthen the
perimeter surrounding the computer system. It is believed that both kinds of method
should be used, along with others, to increase the chances of mounting a successful
defense, relying on the age-old principle of defense in depth [17]. The goal of an IDS is
to alert the system's administrator to any suspicious and possibly intrusive event and
possibly to take action to circumvent the intrusion. These actions can be as simple as
writing the activities to a log file or as complex as controlling the system's and
network's resources automatically by closing network ports or killing suspicious
processes [14].
¹ That part of the system state that is liable to lead to failure.
² When the delivered service deviates from fulfilling the intended function; or, violation of a security
property of the intended security policy.
Originally, intrusion detection was performed manually by system
administrators. They monitored user activities via a central console and watched for
any abnormal occurrence. They might detect intrusions by noticing, for
example, that a vacationing user was logged in locally or that a seldom-used printer was
unusually active. The scalability of this early form of intrusion detection
was obviously very poor, and it was tedious and error-prone. It therefore soon
became necessary to develop automated log file readers that searched for logged events
indicating irregularities or even an intrusion by unauthorized personnel [18]. The
next step in intrusion detection was the examination of audit logs. In the late
’70s and early ’80s, administrators typically printed audit logs on fan-folded paper,
which was often stacked four to five feet high by the end of an average week [19].
It is necessary to point out that this early ID software (not systems) was mostly
individually developed and programmed and not widely spread, as only very few
organizations were in need of this kind of technology before the dawn of the Internet
age [20]. Manual analysis of such a huge amount of printed audit logs was
obviously time-consuming. Moreover, it could only be used as a forensic means of
gathering evidence related to a security incident after the fact, and not for
detecting an attack in progress.
Once it became possible to keep audit logs on digital storage, developers
came up with automated data analysis programs to make life easier for
administrators. However, analysis was slow and often computationally intensive,
and intrusion detection programs were therefore usually run at night, when the
system’s user load was low. As a result, most intrusions were still detected after they had
occurred. Until this point, intrusion detection had been a post factum analysis of
digital log files, allowing forensic analysis relatively long after the actual event, with
possible adjustments to the infrastructure [20].
In the early ’90s, researchers
developed real-time intrusion detection systems that reviewed audit data as it was
produced. This enabled the detection of attacks and attempted attacks as they
occurred, which in turn allowed for real-time response, and, in some cases, attack
preemption [19].
Due to market demand, the IT security industry then started to develop
former prototype software into actual Intrusion Detection Systems, with
user-friendly interfaces, methods to update attack patterns, various alerting methods
and even some automatically triggered reactions or actual prevention methods able to
stop attacks in progress [20]. However, given increasing security concerns,
countless new attack techniques and sophisticated attack tools, this is not an
easy job.
Taxonomy of IDS
Generally, a taxonomy serves several purposes. It helps us to describe the
world around us and to put complex phenomena into a more manageable
form (Description). Moreover, by classifying a number of objects according to the
taxonomy and then observing the ‘holes’ where objects may be missing, we can
exploit the predictive qualities of a good taxonomy (Prediction). And finally, a good
taxonomy will provide us with clues about how to explain observed phenomena
(Explanation).
Taxonomies of IDSs and ID-related technologies date back to 1999. The
taxonomy proposed by Debar et al. [21] seems to be the first real IDS taxonomy.
Other taxonomies have since been published, such as the one proposed by Axelsson
[17] and the one proposed by Halme and Bauer [22]. Figure 2.1 illustrates the
taxonomy proposed by Debar et al. This is a revised version of a previously
proposed taxonomy, to which some further classification criteria have been added. The
necessity of studying the taxonomy of intrusion detection appears when we try to
examine the different types of IDSs and the various mechanisms which have been applied
in order to detect an intrusion.
Figure 2.1
Revised IDS taxonomy by Debar et al. (2000)
The taxonomy proposed by Halme and Bauer [22], named “A Taxonomy
of Anti-Intrusion Techniques”, focuses, as its name indicates, on classifying
different methods for combating intrusive activity rather than dealing
specifically with IDSs. They introduce six anti-intrusion approaches [22]:
Prevention precludes or severely handicaps the likelihood of a particular
intrusion’s success.
Preemption strikes offensively against likely threat agents prior to an
intrusion attempt to lessen the likelihood of a particular intrusion occurring later.
Deterrence deters the initiation or continuation of an intrusion attempt by
increasing the necessary effort for an attack to succeed, increasing the risk associated
with the attack, and/or devaluing the perceived gain that would come with success.
Deflection leads an intruder to believe that he has succeeded in an intrusion
attempt, whereas instead he has been attracted or shunted off to where harm is
minimized.
Detection discriminates intrusion attempts and intrusion preparation from
normal activity and alerts the authorities.
Countermeasures actively and autonomously counter an intrusion as it is
being attempted.
A graphical illustration of this taxonomy is shown in Figure 2.2.
Figure 2.2
Anti-Intrusion Techniques
Axelsson’s taxonomy [17] explicitly deals with intrusion detection systems.
It consists of a classification first of the detection principle, and second of certain
operational aspects of the intrusion detection system as such. The study
organizes developed IDSs and in-progress research in this field into a structured
categorization that gives a comprehensive view of the field.
The survey first identifies the different types of intrusion
sources (an action or activity that is generated by the intruder). From the nature of
the source it moves to the question of how to observe this source and what problems
are likely to arise in doing so, ending with the ultimate result, the decision.
The main problem faced by the survey is that most references do not
describe explicitly the decision rules employed, but rather the framework in which
such rules could be set. Thus the categorization often has to stop at
the level of the framework, for example the expert system.
In the following, we use the IDS taxonomy proposed by Axelsson to
categorize the different types of intrusion detection system.
IDS Classifications
As we said before, intrusion detection systems may be classified by
looking at the system from different points of view. In chapter
2.4.1 we show the taxonomy of intrusion detection principles, in which the
categorization is based on the detection method and includes anomaly detection,
signature detection and compound detection. Then, in
chapter 2.4.2 we introduce the other types of IDS classification, which are based on
system characteristics, including criteria such as time of detection, source of audit
data, response to detected intrusions, etc.
Figure 2.3
IDS Taxonomy provided by Stefan Axelsson [20]
2.4.1 Taxonomy of Intrusion Detection Principles
Intrusion detection systems determine if a set of actions constitute intrusions
on the basis of one or more models of intrusion. A model classifies a sequence of
states or actions as "good" (no intrusion) or "bad" (possible intrusions) [23]. Based
on the detection method, intrusion detection techniques can be categorized
into two main types, namely, Anomaly Detection Systems, sometimes called
behavior-based intrusion detection systems, and Misuse Detection Systems, which are
based on signatures or patterns and are essentially knowledge-based.
Anomaly Detection
Anomaly detection systems, like the one presented in [24], base their
decisions on the profile of a system’s or user’s normal behavior. So, the construction
of such a detector starts by forming an opinion on what constitutes normal for the
observed subject (which can be users, groups of users, applications, or system
resource usage), and then deciding on what percentage of the activity to flag as
abnormal, and how to make this particular decision [17]. Anomaly detection systems
flag observed activities that deviate significantly from the established normal usage
profiles as anomalies, i.e., possible intrusions. For example, the normal profile of a
user may contain the averaged frequencies of some system commands used in his or
her login sessions. If, for a session that is being monitored, the frequencies are
significantly lower or higher, then an anomaly alarm will be raised. This type of
system is well suited for the detection of previously unknown attacks [23]. The main
advantage of anomaly detection is that it does not require prior knowledge of
intrusions and can thus detect new intrusions. The main drawback is that it may not
be able to describe what the attack is and may have a high false positive rate [3]. In
other words, the alarms generated by the system carry little meaning because generally
they cannot provide any diagnostic information (fault diagnosis) such as the type of
attack that was encountered; they can only signal that something unusual happened.
However, one of the benefits of this type of IDS is that it is capable of
producing information that can in turn be used to define signatures for misuse
detectors [15]. In the following section we show the different classes of
anomaly detection systems cited in [17].
Self-Learning Systems
As the name indicates, no information about the attacks is fed into the
system. Self-learning systems learn by example what constitutes normal for the
installation, typically by observing traffic for an extended period of time and
building some model of the underlying process [17].
Non-Time series
A collective term for detectors that model the normal behavior of the system
by the use of a stochastic model that does not take time series behavior into account
[17]. This type of self-learning anomaly detection system may be based on Rule
modeling or Descriptive statistics. In the rule modeling approach, the system itself
studies the traffic and formulates a number of rules that describe the normal
operation of the system. In the detection stage, the system applies the rules and
raises the alarm if the observed traffic forms a poor match (in a weighted sense) with
the rule base. In the descriptive statistics approach, by contrast, the system
collects simple, descriptive, mono-modal statistics from certain system parameters
into a profile, and constructs a distance vector for the observed traffic and the
profile. If the distance is great enough the system raises the alarm [17].
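The descriptive statistics approach can be sketched as follows. This is only a minimal illustration and not the method of any system surveyed in [17]: the feature names, the use of per-feature deviations in standard deviations as the distance vector, the Euclidean norm and the threshold of 3.0 are all assumptions.

```python
import math
from statistics import mean, pstdev

def build_profile(training_sessions):
    """Learn per-feature mean and standard deviation from attack-free sessions."""
    profile = {}
    for name in training_sessions[0].keys():
        values = [s[name] for s in training_sessions]
        # Guard against zero spread so the later division is always defined.
        profile[name] = (mean(values), pstdev(values) or 1.0)
    return profile

def distance_vector(profile, observed):
    """Per-feature deviation of the observation from the profile, in standard deviations."""
    return {name: (observed[name] - mu) / sigma
            for name, (mu, sigma) in profile.items()}

def is_anomalous(profile, observed, threshold=3.0):
    """Raise the alarm when the Euclidean length of the distance vector is large."""
    vec = distance_vector(profile, observed)
    return math.sqrt(sum(d * d for d in vec.values())) > threshold

# Hypothetical training data: counts observed per session during normal operation.
normal = [
    {"failed_logins": 0, "connections": 10},
    {"failed_logins": 1, "connections": 12},
    {"failed_logins": 0, "connections": 11},
    {"failed_logins": 1, "connections": 9},
]
profile = build_profile(normal)
```

With this profile, a session close to the training data is accepted, while a session with many failed logins and connections is flagged.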
Time series
This model is more complex because it takes time series behavior into account.
Examples of such techniques include the Hidden Markov Model (HMM) and the Artificial
Neural Network (ANN).
Programmed
In this class, someone teaches the system (programs it) to detect certain
anomalous events. Thus, it is the user of the system who forms the normal
behavior profiles of the system and decides what is considered abnormal enough for
the system to signal a security breach.
Descriptive Statistics
These systems build a profile of the normal statistical behavior of the system
by collecting descriptive statistics on a number of parameters. Such
parameters can be the number of unsuccessful logins, the number of network
connections, the number of commands with error returns, etc. [17].
In the Simple statistics model, the collected statistics are used by higher-level
components to make a more abstract intrusion detection decision. In the Simple
rule-based model, the user provides the system with simple but still compound rules
to apply to the collected statistics. The Threshold approach is arguably the simplest
example of the programmed descriptive statistics detector. When the system has
collected the necessary statistics,
the user can program predefined thresholds (perhaps in the form of simple ranges)
that define whether to raise the alarm or not. An example is:
“Alarm if number of unsuccessful login attempts > 3”
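The threshold rule quoted above can be written down directly. The sketch below is illustrative only; the class name, the statistic name `unsuccessful_logins` and the dictionary of collected statistics are assumptions, not part of any system described in [17].

```python
from dataclasses import dataclass

@dataclass
class ThresholdRule:
    statistic: str   # name of the collected statistic (hypothetical)
    limit: float     # alarm when the statistic exceeds this value

    def triggered(self, stats):
        # Missing statistics are treated as zero, i.e. nothing observed yet.
        return stats.get(self.statistic, 0) > self.limit

def check(rules, stats):
    """Return the names of all statistics whose threshold was exceeded."""
    return [r.statistic for r in rules if r.triggered(stats)]

# "Alarm if number of unsuccessful login attempts > 3"
rules = [ThresholdRule("unsuccessful_logins", 3)]
```

Because the comparison is strict, exactly three unsuccessful logins do not raise the alarm, while a fourth does.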
Default deny
The main idea in this class is to explicitly state the circumstances under which the
system operates in a safe, security-benign manner, and to flag all deviations from
them as intrusions. This approach corresponds intuitively to a default
deny security policy.
In State series modeling, the policy for security-benign
operation is encoded as a set of states. As in any state machine, once it has matched
one state, the intrusion detection engine waits for the next transition to occur.
If the monitored action is described as allowed, the system continues; if the
transition would take the system to any state that is not explicitly
mentioned, the system sounds the alarm [17].
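A default deny state machine of this kind might look as follows. The states, the actions and the transition table are hypothetical and chosen only to illustrate the principle that anything not explicitly allowed raises the alarm.

```python
# Hypothetical policy: each state maps its allowed actions to the next state.
ALLOWED = {
    "logged_out": {"login": "logged_in"},
    "logged_in": {"read": "logged_in", "write": "logged_in", "logout": "logged_out"},
}

def monitor(actions, start="logged_out"):
    """Follow the allowed transitions. Return the (state, action) pair of the
    first action that is not explicitly permitted, or None if all are benign."""
    state = start
    for action in actions:
        transitions = ALLOWED.get(state, {})
        if action not in transitions:
            return (state, action)   # default deny: sound the alarm
        state = transitions[action]
    return None
```

Note that the policy never lists forbidden actions; an unknown action such as `drop_table` is rejected simply because no transition mentions it.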
Halme and Bauer [22] categorize anomaly detection systems based on
system specifications and profiling. Depending on the components of the system whose
behaviors are to be captured and subsequently monitored, different classes may be
distinguished:
Threshold Monitoring sets values for metrics defining acceptable behavior
(e.g., fewer than some number of failed logins per time period). Thresholds provide
a clear, understandable definition of unacceptable behavior and can utilize other
facilities besides system audit logs. Unfortunately it is often difficult to characterize
intrusive behavior solely in terms of thresholds corresponding to available audit
records. It is difficult to establish proper threshold values and time intervals over
which to check. Approximation can result in a high rate of false positives, or high
rate of false negatives across a non-uniform user population [22].
User Work Profiling maintains individual work profiles to which the user is
expected to adhere in the future [22].
Group Work Profiling assigns users to specific work groups that
demonstrate a common work pattern and hence a common profile. A group profile is
calculated based upon the historic activities of the entire group. Individual users in
the group are expected to adhere to the group profile. This method can greatly
reduce the number of profiles needing to be maintained [22].
Resource Profiling monitors system-wide use of such resources as accounts,
applications, storage media, protocols, communications ports, etc., and develops a
historic usage profile. Continued system-wide resource usage - illustrating the user
community's use of system resources as a whole - is expected to adhere to the system
resources profile. However, it may be difficult to interpret the meaning of changes in
overall system usage [22].
Executable Profiling seeks to monitor executables’ use of system resources,
especially those whose activity cannot always be traced to a particular originating
user. Viruses, Trojan horses, worms, trapdoors, logic bombs and other such software
attacks are addressed by profiling how system objects such as files and printers are
normally used, not only by users, but also by other system subjects on the part of
users [22].
Static Work Profiling updates usage profiles only periodically at the behest
of the SSO. This prevents users from slowly broadening their profile by phasing in
abnormal or deviant activities which are then considered normal and included in the
user's adaptive profile calculation.
Performing profile updates may be at the
granularity of the whole profile base or, preferably, configurable to address
individual subjects [22].
Adaptive Work Profiling automatically manages work profiles to reflect
current (acceptable) activity. The work profile is continuously updated to reflect
recent system usage. Profiling may be on user, group, or application. Adaptive work
profiling may allow the SSO to specify whether flagged activity is: 1) intrusive, to be
acted upon; 2) not intrusive, and appropriate as a profile update to reflect this new
work pattern, or 3) not intrusive, but to be ignored as an aberration whose next
occurrence will again be of interest. Activity which is not flagged as intrusive is
normally automatically fed into a profile updating mechanism. If this mechanism is
automated, the SSO will not be bothered, but work profiles may change and continue
to change without the SSO's knowledge or approval [22].
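The interplay between automatic profile updates and the three possible SSO decisions can be sketched as below. The source does not specify an update rule; the exponentially weighted average, the relative tolerance and all names are assumptions made for illustration.

```python
from enum import Enum

class Verdict(Enum):
    INTRUSIVE = 1     # act upon it, do not learn from it
    NEW_PATTERN = 2   # not intrusive, fold it into the profile
    ABERRATION = 3    # not intrusive, but leave the profile unchanged

class AdaptiveProfile:
    def __init__(self, baseline, alpha=0.1, tolerance=0.5):
        self.value = float(baseline)  # current expected level of the metric
        self.alpha = alpha            # weight given to the newest observation
        self.tolerance = tolerance    # allowed relative deviation before flagging

    def is_flagged(self, observed):
        return abs(observed - self.value) > self.tolerance * self.value

    def _update(self, observed):
        self.value = (1 - self.alpha) * self.value + self.alpha * observed

    def observe(self, observed, sso_verdict=None):
        """Return True if the activity was flagged. Unflagged activity updates
        the profile automatically; flagged activity updates it only when the
        SSO declares it a new, acceptable work pattern."""
        flagged = self.is_flagged(observed)
        if not flagged or sso_verdict is Verdict.NEW_PATTERN:
            self._update(observed)
        return flagged
```

The sketch also shows the risk noted above: every unflagged observation moves the profile a little, so the profile can drift without the SSO ever being consulted.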
Adaptive Rule Based Profiling differs from other profiling techniques by
capturing the historical usage patterns of a user, group, or application in the form of
rules. Transactions describing current behavior are checked against the set of
developed rules, and changes from rule-predicted behavior are flagged. As opposed to
misuse rule-based systems, no prior expert knowledge of security vulnerabilities of
the monitored system is required. "Normal usage" rules are generated by the tool in
its training period.
However, training may be sluggish compared to straight
statistical profiling methods. Also, to be effective, a vast number of rules must be
maintained, with inherent performance issues [22].
Misuse Detection
Misuse Detection Systems, like [25] and [26], use patterns of well-known
attacks or weak spots of the system to match and identify known intrusions. In such
systems, which are also called signature detection systems, the intrusion detection
decision is formed on the basis of knowledge of a model of the intrusive process and what
traces it ought to leave in the observed system. For example, a signature rule for the
“guessing password attack” can be “there are more than 4 failed login attempts
within 2 minutes”. The main advantage of misuse detection is that it can accurately
and efficiently detect instances of known attacks. In addition, unlike those of anomaly
detection systems, the alarms generated by misuse detection systems are meaningful,
i.e., they contain diagnostic information about the cause of the alarm [16].
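The password guessing signature quoted above can be checked with a sliding time window. The following is a minimal sketch: the event model (a user name and a timestamp in seconds per failed login) and all names are assumptions, while the limits of 4 attempts and 2 minutes come from the example rule.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 120   # "within 2 minutes"
MAX_FAILURES = 4       # alarm on "more than 4" failures

class PasswordGuessingSignature:
    def __init__(self):
        # Per-user timestamps of recent failed logins (hypothetical event model).
        self.failures = defaultdict(deque)

    def on_failed_login(self, user, timestamp):
        """Record a failed login and return True if the signature matches."""
        recent = self.failures[user]
        recent.append(timestamp)
        # Drop failures that have fallen out of the time window.
        while recent and timestamp - recent[0] > WINDOW_SECONDS:
            recent.popleft()
        return len(recent) > MAX_FAILURES
```

Because the alarm names the rule that fired, it carries the diagnostic information that an anomaly detector cannot provide. An attacker who spaces the attempts more than two minutes apart, however, would not match this particular signature.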
The main disadvantage is that it lacks the ability to detect the truly innovative
(i.e. newly invented) attacks [3] as well as those whose signatures are not available.
The database of attack signatures needs to be kept up-to-date, which is a tedious task
because new vulnerabilities are discovered on a daily basis.
However, most
commercial systems used today, like some Cisco products, are knowledge-based
systems.
Programmed
The system is initially programmed with an explicit decision rule set. The
detection rule is simple since it contains a straightforward coding of what can be
expected to be observed in the event of an intrusion. In the State-Modeling method of
programmed misuse detection systems, the intrusion is encoded as a number of
different states, each of which has to be present in the observation space for the
intrusion to be considered to have taken place. They are by their nature time series
models. Two subclasses exist: in the first, state transition, the states that make up
the intrusion form a simple chain that has to be traversed from beginning to end; in
the second, Petri net, the states form a Petri net. In this case they can have a more
general tree structure, in which several preparatory states can be fulfilled in any
order, irrespective of where in the model they occur.
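The difference between the two subclasses can be illustrated with a small sketch. The attack states are hypothetical, and the second function is only a simplified stand-in for a Petri net (a set of preparatory states followed by one final state), not a full Petri net implementation.

```python
# Hypothetical attack signature: the states must be reached in this order.
ATTACK_CHAIN = ["port_scan", "login_as_guest", "privilege_escalation"]

def matches_chain(events, chain=ATTACK_CHAIN):
    """State transition model: the chain must be traversed from beginning to
    end. Unrelated events in between are ignored."""
    position = 0
    for event in events:
        if position < len(chain) and event == chain[position]:
            position += 1
    return position == len(chain)

def matches_partial_order(events, preparatory, final):
    """Simplified Petri-net-like model: preparatory states may be fulfilled in
    any order; the final state completes the intrusion only once all of them
    have been seen."""
    seen = set()
    for event in events:
        if event in preparatory:
            seen.add(event)
        elif event == final and seen == set(preparatory):
            return True
    return False
```

An event sequence in which the guest login precedes the port scan is missed by the chain model but still matched by the partially ordered model.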
In the Expert-System class, an expert system is employed to reason about the
security state of the system, given rules that describe intrusive behavior. Often
forward-chaining, production-based tools are used, since these are most appropriate
when dealing with systems where new facts (audit events) are constantly entered into
the system. The String matching method is a simple, often case-sensitive, substring
matching of the characters in text that is transmitted between systems, or that
otherwise arises from the use of the system. Simple Rule-Based systems are similar
to the more powerful expert systems, but not as advanced.
This often leads to
speedier execution [17]. Note that the lack of detectors in the signature self-learning
class is conspicuous.
Compound Detectors
These detectors form a compound decision in view of a model of both the
normal behavior of the system and the intrusive behavior of the intruder. The
detector operates by detecting the intrusion against the background of the normal
traffic in the system. These detectors have, at least in theory, a much better chance of
correctly detecting truly interesting events in the supervised system, since they both
know the patterns of intrusive behavior and can relate them to the normal behavior of
the system [17].
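One possible way to form such a compound decision is sketched below. It assumes that a signature matcher and an anomaly score are already available; the threshold and the four outcome labels are illustrative assumptions and are not taken from [17].

```python
def compound_decision(matches_signature, anomaly_score, anomaly_threshold=3.0):
    """Classify an event using both a model of intrusive behavior (signature)
    and a model of normal behavior (anomaly score)."""
    unusual = anomaly_score > anomaly_threshold
    if matches_signature and unusual:
        return "alarm: known attack pattern, unusual for this system"
    if matches_signature:
        return "low priority: matches a signature but is normal here"
    if unusual:
        return "investigate: unknown anomaly"
    return "benign"
```

The second branch shows where the compound view can help: a signature match that is entirely normal for the supervised system can be ranked lower instead of producing a full alarm.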
2.4.2 Taxonomy of System Characteristics
Intrusion detection systems are more than just a detector. For this reason,
IDSs also need to be categorized based on those characteristics that do not pertain
directly to the detection principle. In the following, we introduce some of the most
important aspects of categorization and briefly explain the respective classes.
Time of Detection
Intrusion detection systems can be divided into two classes based on the time
of detection: those that attempt to detect intrusions in real-time or near real-time, and
those that process audit data with some delay, postponing detection (non-real-time),
which in turn delays the time of detection. Obviously, real-time intrusion detection
systems can also be run in off-line mode on historical audit data [17].
Granularity of Data-Processing
Considering the granularity at which the data is processed, we can
identify two types of IDSs: those that process data continuously and those that
process data in batches at a regular interval. This category is linked with the time of
detection category above, but it should be noted that the two do not coincide, since a
system could process data continuously with a (perhaps) considerable delay, or could
process data in (small) batches in real-time [17].
Source of Audit Data
Regarding the different sources of event information used to detect intrusion,
IDSs could be divided into different categories. These sources can be drawn from
different levels of the system, with network, host, and application monitoring most
common [15].
Host-Based Intrusion Detection started in the early 1980’s before networks
were as prevalent, complex and inter-connected as they are today. In the 1980’s it
was common practice to review audit logs for suspicious and security relevant
activity. Today’s host-based IDS still use various audit logs but they are much more
automated, sophisticated, and real-time with their detection and responses.
Host-based IDSs operate on information collected from within an individual
computer system. (Note that application-based IDSs are actually a subset of host-based
IDSs.) This vantage point allows host-based IDSs to analyze activities with
great reliability and precision, determining exactly which processes and users are
involved in a particular attack on the operating system.
Furthermore, unlike
network-based IDSs, host-based IDSs can “see” the outcome of an attempted attack,
as they can directly access and monitor the data files and system processes usually
targeted by attacks [15].
Host-based IDSs normally utilize information sources of two types, operating
system audit trails, and system logs.
Operating system audit trails are usually
generated at the innermost (kernel) level of the operating system, and are therefore
more detailed and better protected than system logs. However, system logs are much
less obtuse and much smaller than audit trails, and are furthermore far easier to
comprehend [15].
The majority of commercial intrusion detection systems are Network-Based.
These IDSs detect attacks by capturing and analyzing network packets. Listening on
a network segment or switch, one network-based IDS can monitor the network traffic
affecting multiple hosts that are connected to the network segment, thereby
protecting those hosts. Network-based IDSs often consist of a set of single-purpose
sensors or hosts placed at various points in a network. These units monitor network
traffic, performing local analysis of that traffic and reporting attacks to a central
management console. As the sensors are limited to running the IDS, they can be
more easily secured against attack. Many of these sensors are designed to run in
“stealth” mode, in order to make it more difficult for an attacker to determine their
presence and location [15].
Application-Based IDSs are a special subset of host-based IDSs that analyze
the events transpiring within a software application. The most common information
sources used by application-based IDSs are the application’s transaction log files.
The ability to interface with the application directly, with significant domain or
application-specific knowledge included in the analysis engine, allows
application-based IDSs to detect suspicious behavior due to authorized users exceeding
their authorization. This is because such problems are more likely to appear in the
interaction between the user, the data, and the application [15].
The use of application semantics to detect more subtle attacks can be found in
the literature since 1986. Since then, three different types of application-based IDSs
have emerged [27]. In the first type, the IDS uses intercepted traffic going in and out
of the application. In the second type, the IDS relies on third-party logs from
Operating Systems, databases and firewalls. Finally, in the last type, the IDS directly
uses internal application messages and library calls. Thus, the last group provides
the possibility of bidirectional on-line interaction between the IDS and the
application, and more precise IDS response and analysis [28].
A Hybrid IDS combines both Host-Based IDS and Network-Based IDS
approaches and can also combine different detection methods. IDSs scan network
traffic, or also incoming and outgoing host traffic, to find potentially malicious
packets. Thus, they analyze packets at OSI layers 3 (Network) and 4 (Transport) but
are unable to consider the semantics of application protocols such as HTTP.
As a consequence, IDSs are usually ineffective at detecting inside intruders,
who have access to more information than external intruders and may even be
familiar with the security controls of the applications, but who could still be detected by
closely inspecting the nature of their interactions within the applications [28].
Response to Detected Intrusions
Response may refer to the set of actions that the system takes once it detects
intrusions. These are typically grouped into active and passive measures, with active
measures involving some automated intervention on the part of the system, and
passive measures involving reporting IDS findings to humans, who are then expected
to take action based on those reports [15].
In [15], active responses are further categorized into three types. The
first and most innocuous one is the collection of additional information about a
suspected attack. It might involve increasing the level of sensitivity of information
sources (for instance, turning up the number of events logged by an operating system
audit trail, or increasing the sensitivity of a network monitor to capture all packets,
not just those targeting a particular port or target system). The second approach is
changing the environment, which means halting an attack in progress and then blocking
subsequent access by the attacker. Typically, IDSs do not have the ability to block a
specific person’s access, but instead block the Internet Protocol (IP) addresses from
which the attacker appears to be coming. The third and last approach is
taking action against the intruder.
The most aggressive form of this response
involves launching attacks against or attempting to actively gain information about
the attacker’s host or site. However tempting it might be, this response is ill advised.
Due to legal ambiguities about civil liability, this option can represent a greater risk
than the attack it is intended to block.
The most conventional form of passive response consists of Alarms and
Notifications which are generated by IDSs to inform users when attacks are detected.
Most commercial IDSs allow users a great deal of latitude in determining how and
when alarms are generated and to whom they are displayed [15].
Locus of Data-Processing
The audit data can either be processed in a central location, irrespective of
whether the data originates from one (possibly the same) site or is collected and
collated from many different sources, or be processed in a distributed fashion [17].
Locus of Data-Collection
Audit data for the processor/detector can be collected from many different
sources in a distributed fashion, or from a single point using the centralized approach
[17].
Security
With respect to the ability of the system to withstand attacks against the
intrusion detection system itself, we can abstractly classify IDSs into two classes,
high and low.
Degree of Inter-Operability
The degree to which the system can operate in conjunction with other
intrusion detection systems, accept audit data from different sources, etc. [17].
Intrusion Detection Systems using Data Mining
Recently, there have been efforts to leverage the capabilities of various
Artificial Intelligence techniques in Intrusion Detection Systems. These techniques
can lessen the human effort required to build IDSs and can improve their
performance. Moreover, learning and induction are used to improve the
performance of search problems, while clustering has been used for data analysis and
reduction. In addition, AI has recently been used in Intrusion Detection for anomaly
detection, data reduction and induction, or discovery, of rules explaining audit data.
Amongst AI techniques, Data Mining may be thought of as the most
interesting one for accomplishing intrusion detection. Data Mining refers to a
collection of methods by which large sets of stored data are filtered, transformed, and
organized into meaningful information sets [29]. There has been a growing body of
research on applying Data Mining algorithms to different phases of the intrusion
detection mechanism [3-5, 30, 31]. Data Mining-based intrusion detection systems
have demonstrated high accuracy, good generalization to novel types of intrusion,
and robust behavior in a changing environment [6]. Figure 2.4 depicts the Data Mining
phases (Pei et al.: Data Mining Techniques for Intrusion Detection and Computer
Security).
Figure 2.4
Data Mining Phases
Sample misuse detection systems that use Data Mining include Java Agent
for Meta-learning (JAM) [32], Mining Audit Data for Automated Models for
Intrusion Detection (MADAM ID) [33], and Automated Discovery of Concise
Predictive Rules for Intrusion Detection [34].
Applications of Data Mining to
anomaly detection include Audit Data Analysis and Mining (ADAM) [35], Intrusion
Detection Using Data Mining (IDDM) [36], MINDS [37] and eBayes [38]. In the
following we briefly introduce some of these intrusion detection systems and show
how each of them applies Data Mining techniques to run different phases of intrusion
detection.
JAM uses Data Mining techniques to discover patterns of intrusion. It then
applies a meta-learning classifier to learn the signature of attacks. The association
rules algorithm determines relationships between fields in the audit trail records, and
the frequent episodes algorithm models sequential patterns of audit events. Features
are then extracted from both algorithms and used to compute models of intrusion
behavior. The classifiers build the signature of attacks. So, essentially, Data Mining
in JAM builds a misuse detection model [39].
The MADAM ID [33, 40] project at Columbia University is a powerful Data
Mining framework for constructing intrusion detection models. It consists of
classification and meta-classification programs, association rules and frequent
episodes programs, and a feature construction system. The end products are concise
and intuitive rules that can detect intrusions, and can be easily inspected and edited
by security experts when needed.
ADAM [41, 42] uses a combination of association rules mining and
classification to discover attacks in a TCPdump audit trail. First, ADAM builds a
repository of "normal" frequent item-sets that hold during attack-free periods. It
does so by mining data that is known to be free of attacks. Secondly, ADAM runs a
sliding-window, on-line algorithm that finds frequent item-sets in the last D
connections and compares them with those stored in the normal itemset repository,
discarding those that are deemed normal. With the rest, ADAM uses a classifier
which has been previously trained to classify the suspicious connections as a known
type of attack, an unknown type or a false alarm [35].
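The two mining steps described for ADAM can be sketched roughly as follows. This is not ADAM's actual algorithm: the fixed itemset size, the support counting, the attribute strings and the example data are all assumptions, and the final classification step is omitted.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(connections, min_support, size=2):
    """Itemsets of the given size that occur in at least min_support
    connections. Each connection is a set of attribute=value strings."""
    counts = Counter()
    for conn in connections:
        for itemset in combinations(sorted(conn), size):
            counts[itemset] += 1
    return {itemset for itemset, n in counts.items() if n >= min_support}

def suspicious_itemsets(normal_repository, recent_connections, min_support):
    """Frequent itemsets of the sliding window that are absent from the
    repository mined from attack-free data."""
    return frequent_itemsets(recent_connections, min_support) - normal_repository

# Step 1: build the repository of "normal" itemsets from attack-free data.
attack_free = [
    {"dst_port=80", "flag=SF"},
    {"dst_port=80", "flag=SF"},
    {"dst_port=22", "flag=SF"},
]
normal_repository = frequent_itemsets(attack_free, min_support=2)

# Step 2: mine the last D connections (here D = 5) and discard what is normal.
window = [{"dst_port=80", "flag=SF"}] * 2 + [{"dst_port=23", "flag=S0"}] * 3
suspicious = suspicious_itemsets(normal_repository, window, min_support=2)
```

In this toy example only the itemset involving port 23 survives the comparison; in ADAM such remaining itemsets would then be passed to the trained classifier.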
The IDDM project aims to explore Data Mining as a supporting paradigm in
extending intrusion detection capabilities. The project seeks to re-use, augment and
expand on previous works as required and introduce new principles from Data
Mining that are considered good candidates for this purpose.
Rather than
concentrating on the use of a particular technique in a certain application instance,
they intend to explore multiple uses for any given Data Mining principle in a variety
of ways [36].
The MINDS project [37] at the University of Minnesota uses a suite of Data
Mining techniques to automatically detect attacks against computer networks and
systems. Their system uses an anomaly detection technique to assign a score to each
connection to determine how anomalous the connection is compared to normal
network traffic. Their experiments have shown that anomaly detection algorithms
can be successful in detecting numerous novel intrusions that could not be identified
using widely popular tools such as SNORT [43].
In [4], Data Mining techniques have been used to discover consistent and
useful patterns of system features that describe program and user behavior. They
also used the set of relevant system features to compute (inductively learned)
classifiers that can recognize anomalies and known intrusions. The framework
proposed in [4] uses auditing programs to extract an
extensive set of features that describe each network connection or host session, and
applies Data Mining programs to learn rules that accurately capture the behavior of
intrusions and normal activities [30].
However, applying Data Mining techniques to the intrusion detection process
has some shortcomings. The following are some of the
disadvantages of a Data Mining based IDS [44].
Data must be collected from a raw data stream and translated into a form that
is suitable for training. In some cases data needs to be clearly labeled as “attack” or
“normal”. This process of data preparation is expensive and labor intensive.
Data Mining based IDS generally do not perform well when trained in a
simulated environment and then deployed in a real environment. They generate a lot
of false alarms and it can be quite labor intensive to sift through this data.
In order to overcome these problems, there is a need to develop methods and
tools that can be used by the system security analyst to understand the massive
amount of data that is being collected by IDS, analyze and summarize the data and
determine the importance of an alert [44].
2.5.1 Applicable Data Mining Algorithms to Intrusion Detection
The recent rapid development in Data Mining has made available a wide
variety of algorithms, drawn from the fields of statistics, pattern recognition,
machine learning, and database. Several types of algorithms are particularly useful
for mining audit data:
Classification and prediction are two forms of data analysis that can be used
to extract models describing important data classes or to predict future data trends.
For example, a classification model can be built to categorize bank loan applications
as either safe or risky. In other words, classification maps a data item into one of
several pre-defined categories. These algorithms normally output “classifiers”. A
prediction model can be built to predict the expenditures of potential customers on
computer equipment given their income and occupation.
Some of the basic
techniques for data classification are decision tree induction, Bayesian classification
and neural networks.
These techniques find a set of models that describe the
different classes of objects. These models can be used to predict the proper class of
an object for which the class is unknown. The derived model can be represented as
rules (IF-THEN), decision trees or other formulae. An ideal application in intrusion
detection will be to gather sufficient “normal” and “abnormal” audit data for a user
or a program, then apply a classification algorithm to learn a classifier that can label
or predict new unseen audit data as belonging to the normal class or the abnormal
class [44].
Association analysis: this involves the discovery of association rules showing
attribute-value conditions that occur frequently together in a given set of data. Put
simply, it determines relations between fields in the database records. This is
used frequently for market basket or transaction data analysis. For example, the
following rule says that if a customer is in the age group 20 to 29 years and has an
income greater than 40 K/year, then he or she is likely to buy a DVD player.
age(X, “20–29”) & income(X, “>40 K”) => buys(X, “DVD player”) [support = 2%, confidence = 60%]
Rule support and confidence are two measures of rule interestingness. A
support of 2% means that 2% of all transactions under analysis show that this rule is
true. A confidence of 60% means that among all customers in the age group 20–29
and income greater than 40 K, 60% of them bought DVD players [44]. In the
context of intrusion detection and analysis of audit data, correlations of system
features in audit data (for example, the correlation between command and argument
in the shell command history data of a user) can serve as the basis for constructing
normal usage profiles [44].
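The two measures can be computed directly from a set of records. The following
Python sketch is illustrative only: the shell-history records and the rule
"cmd=vi => arg=*.c" are hypothetical and are not taken from [44].

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent.

    Each transaction, the antecedent and the consequent are sets of items.
    """
    total = len(transactions)
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    support = len(with_both) / total
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return support, confidence


# Hypothetical shell command history of one user: command and argument items.
history = [
    {"cmd=vi", "arg=*.c"},
    {"cmd=vi", "arg=*.c"},
    {"cmd=vi", "arg=*.txt"},
    {"cmd=gcc", "arg=*.c"},
    {"cmd=ls", "arg=-l"},
]
support, confidence = support_and_confidence(history, {"cmd=vi"}, {"arg=*.c"})
```

Here two of the five records contain both sides of the rule (support 2/5), and
two of the three records containing "cmd=vi" also contain "arg=*.c"
(confidence 2/3).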
Sequence or path analysis: this models sequential patterns. These algorithms can
discover which (time-based) sequences of audit events occur frequently.
These frequent event patterns provide guidelines for incorporating
temporal statistical measures into intrusion detection models. For example, patterns
from audit data containing network-based denial-of-service (DOS) attacks suggest
that several per-host and per-service measures should be included [30]. Moreover,
the mined frequent patterns are important elements for framing behavior profile of a
Clustering: the training of the normality model for anomaly detection may
be performed by clustering, where similar data points are grouped together into
clusters using a distance function. As a Data Mining technique, clustering fits very
well for anomaly detection, since no knowledge of the attack classes is needed whilst
training. Contrast this to classification, where the classification algorithm needs to
be presented with both normal and known attack data to be able to separate those
classes during detection [45]. A variety of clustering techniques have been
applied in the intrusion detection literature,
for example [45-48].
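As an illustration of clustering-based anomaly detection, the Python sketch
below uses single-pass fixed-width clustering, one simple technique chosen here
for brevity and not necessarily the one used in [45-48]. The two features and
the cluster width are hypothetical. Note that training uses no attack labels.

```python
import math


def build_clusters(points, width):
    """Single-pass fixed-width clustering: a point joins the first cluster
    whose centre lies within `width`, otherwise it starts a new cluster.
    The centre stays fixed at the first point of the cluster."""
    clusters = []  # each cluster is [centre, member_count]
    for p in points:
        for cluster in clusters:
            if math.dist(p, cluster[0]) <= width:
                cluster[1] += 1
                break
        else:
            clusters.append([p, 1])
    return clusters


def is_anomalous(point, clusters, width):
    """A point is anomalous when it is farther than `width` from every centre."""
    return all(math.dist(point, centre) > width for centre, _ in clusters)


# Hypothetical unlabeled training data with two numeric features per record.
training = [(10, 2), (11, 2), (12, 3), (50, 20), (51, 21)]
clusters = build_clusters(training, width=5.0)
```

A record such as (11, 3) falls inside an existing cluster and is treated as
normal, while (30, 10) lies far from both cluster centres and is reported.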
Database Intrusion Detection Systems
Database management systems (DBMS) represent the last layer of defense
against malicious data access or corruption and implement several security
mechanisms to protect data [49]. Traditional commercial implementations of
database security mechanisms are very limited in defending against successful data
attacks. Traditional database protection techniques such as authorization, access
control mechanisms, inference control, multi-level secure databases, multi-level
secure transaction processing and database encryption mainly address how
to protect the security of a database, especially its confidentiality. However, in
practice, these techniques may be fooled by knowledgeable attackers who thwart the
security mechanisms and gain access to sensitive data. Moreover, authorized
but malicious transactions can make a database useless by impairing its integrity and
availability [50].
Neither network-based nor host-based IDSs can detect malicious behavior
from users at the database level, or more generally at the application level, because
they do not work at the application layer [51]. The inability of network-based
intrusion detection systems to detect database intrusions is straightforward.
Existing host-based intrusion detection systems, in turn, use the operating
system log or the application log to detect misuse or anomalous activities, and these
methods are not sufficient for detecting intrusions in database systems [52]. A
further reason is that users who seek to gain database privileges will
likely be invisible at the operating system level, and thus invisible to host-based
intrusion detectors. Therefore, SQL injection [53] and other SQL-based attacks
targeted at databases cannot be effectively detected by network-based or host-based
IDSs [51]. Ideally, however, a database-specific IDS should work as a
complementary mechanism to the existing network-based and host-based intrusion
detection systems rather than replacing them [51].
When an attacker or a malicious user updates the database, the resulting
damage can spread very quickly to other parts of the database through valid users.
Quick and accurate detection of an attack on a database system is therefore the
prerequisite for fast damage assessment and recovery [7]. Hence, there should be a
mechanism in place that allows the system to survive successful database
attacks, which can seriously impair the integrity and availability of a database.
This may be regarded as the main motivation for the advent of Database
Intrusion Detection Systems. Database IDSs try to detect, and possibly prevent,
intrusions into RDBMSs, which are mainly carried out through malicious transactions¹
issued either by outsiders or by insiders such as disgruntled employees who misuse
their privileges. Since the greatest threats nowadays come from internal sources,
perimeter-based security solutions may not be enough. Additionally, most
companies implement only network-based security solutions that are designed to
protect network resources, despite the fact that information is more often the
target of the attack. Database intrusion detection systems identify suspicious,
abnormal or downright malicious accesses to the database system [55]. Thus, a
database intrusion detection system may be a critical part of a
defense-in-depth security strategy, and a considerable amount of research has been
conducted on database intrusion detection systems [8, 49, 56-59].
Explicitly categorizing database intrusion detection systems is
not simple, since the criteria on which the different DB-IDSs have been
built vary considerably. Therefore, to understand the database intrusion
detection area thoroughly, a comprehensive study is needed to
construct a taxonomy of database intrusion detection systems which considers the
different criteria and classifies DB-IDSs based on them. Constructing this
taxonomy is beyond the scope of this project. However, some of these criteria
may include:
types and sources of database attacks
response to the detected attack
detection strategy or analysis type (anomaly detection, misuse detection or hybrid approaches)
in a 2- or 3-tier architecture, the layer on which the DB-IDS resides
sources of the audit data to be analyzed and the data collection method (for example, [51] focuses on the data acquisition methods tailored to the needs of database IDSs)
in anomaly detection systems, the granularity of the objects whose normal behavior is to be profiled (for example, in [49] three abstraction levels are used to define the user profile representing his/her behavior in the database)
¹ Malicious transactions may be defined as transactions that access the database without authorization,
or transactions that are issued by users who are authorized but abuse their privileges. [54] Dai, J. and
H. Miao, D_DIPS: An Intrusion Prevention System for Database Security. 2005.
In the literature on database intrusion detection systems we encountered
a variety of systems, each applying a different combination of the features and
criteria mentioned above, which prevents us from simply classifying the different
DB-IDSs into discrete and explicit categories. In the following sections we
therefore discuss some of the research in the area as a whole. Our main
focus, however, will be on database anomaly/misuse detection systems.
The combination of Data Mining techniques and intrusion detection
has resulted in a great deal of research, which is briefly discussed in
Section 2.5. However, apart from Data Mining, other AI techniques such as artificial
neural networks have also been of interest in intrusion detection [60]. The
contribution of disciplines like Artificial Immune Systems (AIS) to database
intrusion detection has also been studied in [61] and [62]. Moreover, like other
types of intrusion detection systems (network-based IDSs, for example), database
IDSs may also be categorized into anomaly detection and misuse detection. These
systems will be discussed in the following sections.
The challenge that most database intrusion detection systems
face is that they assume the database system works in isolation. The
problem is that for a popular web site, it is nearly impossible to monitor the access
pattern of each individual user due to the great number of daily visitors. Also, the
focus in systems such as [8] is mainly on an isolated database system [56].
This has led to the need for detecting intrusions against a web-based database
service, which is studied in [56] and [63].
In addition, in [53] an application-layer intrusion detection system is
proposed in which an IDS sensor situated at the database server detects
SQL injection attacks.
This sensor is specifically designed to inspect SQL
The concept of fingerprinting the database transaction is discussed in [55].
The presented technique characterizes legitimate accesses through fingerprinting
their constituent SQL statements.
These fingerprints are then used to detect
illegitimate accesses. The system works by matching SQL statements against a
known set of legitimate database transaction fingerprints. Subsequently, in [64],
which complements the previous work, the author explores the various issues that
arise in the collation, representation and summarization of a potentially huge set of
legitimate transaction fingerprints.
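The matching idea can be sketched in a few lines of Python. This is a
simplification under stated assumptions and not the representation used in
[55]: here a fingerprint is simply the statement with its string and numeric
literals replaced by a placeholder, and the example statements are hypothetical.

```python
import re


def fingerprint(sql):
    """Reduce an SQL statement to its structure by replacing literals
    with '?' and normalising whitespace and case."""
    s = re.sub(r"'(?:[^']|'')*'", "?", sql)      # string literals
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)     # numeric literals
    return re.sub(r"\s+", " ", s).strip().lower()


def is_legitimate(sql, known_fingerprints):
    """A statement is accepted only if its fingerprint is already known."""
    return fingerprint(sql) in known_fingerprints


# Hypothetical statements collected from legitimate transactions.
known = {fingerprint(q) for q in [
    "SELECT name FROM users WHERE id = 42",
    "UPDATE accounts SET balance = 10.5 WHERE owner = 'alice'",
]}
```

A statement that differs only in its literal values matches a known
fingerprint, whereas a statement whose structure has been altered, for example
by an injected "OR 1 = 1" condition, does not.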
Lee et al. in [57] used time signatures to discover intrusions in real-time
database systems. If a transaction attempts to update a temporal data
item which has already been updated within a certain period, the system detects it as an
2.6.1 Database Intrusion Detection Using Data Mining
As mentioned earlier, researchers have recently started
using Data Mining techniques in the emerging field of information and system
security and especially in intrusion detection systems [65]. A variety of
intrusion detection systems have applied Data Mining techniques, especially for
constructing the normal behavior of the system and the user profile in anomaly
detection systems. These techniques have also been used in database intrusion
detection systems for building the normal working scope of different objects such as
database users.
Another important application of Data Mining techniques in database
intrusion detection systems is discovering the data dependencies among data items in
the database. In [7], data dependency is defined as the data access correlation
between two or more data items. The techniques employed use a Data Mining
approach to generate data dependencies among data items. The generated
dependencies are in the form of classification rules, i.e., before one data item is
updated in the database what other data items probably need to be read, and after this
data item is updated what other data items are most likely to be updated by the same
transaction. Transactions that do not comply with the generated data dependencies
are flagged as anomalous transactions [7].
In [52], malicious database transactions are likewise identified
using data dependency relationships. Typically, before a data item
is updated in the database some other data items are read or written, and after the
update other data items may also be written. The data items read or written in the
course of updating one data item form the read set, the pre-write set and the
post-write set of this data item. The proposed method identifies malicious transactions
by comparing these sets with the data items read or written in user transactions [52].
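The comparison can be sketched as follows in Python. The sketch assumes a strict
interpretation in which every item of the read set and pre-write set must have
been accessed before the update and every item of the post-write set must be
written after it; the matching criteria actually used in [52] may differ, and
the data items and rule shown are hypothetical.

```python
def violates_dependencies(transaction, rules):
    """Check a transaction against read, pre-write and post-write sets.

    transaction: ordered list of (operation, item), operation 'r' or 'w'.
    rules: item -> dict with the keys 'read', 'pre_write' and 'post_write'.
    """
    for pos, (op, item) in enumerate(transaction):
        if op != "w" or item not in rules:
            continue
        before, after = transaction[:pos], transaction[pos + 1:]
        reads_before = {i for o, i in before if o == "r"}
        writes_before = {i for o, i in before if o == "w"}
        writes_after = {i for o, i in after if o == "w"}
        rule = rules[item]
        if not (rule["read"] <= reads_before
                and rule["pre_write"] <= writes_before
                and rule["post_write"] <= writes_after):
            return True
    return False


# Hypothetical dependency sets for the data item "balance".
rules = {
    "balance": {
        "read": {"balance", "amount"},
        "pre_write": set(),
        "post_write": {"last_updated"},
    }
}
benign = [("r", "amount"), ("r", "balance"), ("w", "balance"), ("w", "last_updated")]
malicious = [("w", "balance")]
```

The benign transaction reads and writes the expected items around the update,
while the second one updates "balance" directly and is flagged.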
In [7], the author compares existing approaches
for modeling database behavior [66] and transaction characteristics [8, 57] to detect
malicious database transactions. The advantage of the approach in [7] is that it is less
sensitive to changes in user behavior and database transactions. It is observed
from real-world database applications that although transaction programs change
often, the overall database structure and essential data correlations rarely change.
In [6] a Database-centric Architecture for Intrusion Detection (DAID) is
proposed for the Oracle Database. This RDBMS-centric framework can be used to
build, manage, deploy, score, and analyze Data Mining-based intrusion detection
In [67] and subsequently [68] an elementary algorithm for mining
transaction-level user profiles is proposed, which is based on user query frequent
item-sets with item constraints.
Srivastava et al. in [65] and [23] propose an intrusion detection algorithm
named weighted data dependency rule miner (WDDRM) for finding dependencies
among the data items. The main idea is that in every database there are a few
attributes or columns that are more important to track for malicious
modifications than the other attributes. The algorithm takes the sensitivity
of the attributes into consideration. The sensitivity of an attribute signifies how
important it is to track that attribute against malicious modifications.
2.6.2 Database Anomaly Detection Systems
Relational databases operate on attributes within relations, i.e., on data with a
very uniform structure, which makes them a prime target for anomaly detection
systems [69]. Generally, an anomaly-based IDS is a bi-modal system with a training
mode and a detection mode [53]. In more detail, a database anomaly
detector first examines the regular state and
behavior of a system and computes from them a set of reference data which captures
their characteristic properties. Then, the same computations are applied to the
system in operation and the current set is compared with the reference set. Whenever
the difference exceeds a specified threshold, the anomaly detector reports an
anomaly, viz. an unusual deviation [69].
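The two modes can be illustrated with a small Python sketch. The choice of
mean and standard deviation as reference data and the threshold of three
standard deviations are assumptions made for illustration and are not taken
from [69]; the training values are hypothetical.

```python
from statistics import mean, pstdev


def build_reference(training_values):
    """Training mode: compute reference data (mean and standard deviation)
    from values observed while the system is in its regular state."""
    return mean(training_values), pstdev(training_values)


def is_anomaly(current_value, reference, threshold=3.0):
    """Detection mode: report an anomaly when the current value deviates
    from the reference mean by more than `threshold` standard deviations."""
    mu, sigma = reference
    if sigma == 0:
        return current_value != mu
    return abs(current_value - mu) / sigma > threshold


# Hypothetical observations of one monitored numeric attribute.
reference = build_reference([100, 102, 98, 101, 99])
```

With this reference, a value of 101 stays well within the threshold, whereas a
value of 150 is reported as an anomaly.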
Anomaly detection works best, i.e. produces the fewest wrong hints and
alarms, on systems with clear patterns of regularity. The identification or extraction
of these patterns is the most difficult task in the design of an anomaly detection
system (ADS) for networks and operating systems – with well designed relational
databases many of them come for free [69].
DEMIDS [8] is a misuse detection system for databases which is essentially
based on the fact that the access patterns of users typically form working
scopes, which comprise sets of attributes that are usually referenced together with
some values. DEMIDS considers domain knowledge about the data structures and
semantics encoded in a given database schema through the notion of a distance
measure. Distance measures are used to guide the search for frequent item-sets
describing the working scopes of users. In DEMIDS such frequent item-sets are
computed efficiently from audit logs using the database system's data management
and query processing features [8].
The tool designed in [59] analyses the transactions the users execute and
compares them with the profile of the authorized transactions that were previously
learned in order to detect potential deviations. This tool, named IIDD - Integrated
Intrusion Detection in Databases, works in two modes: transactions learning and
intrusion detection. During transactions learning, the IIDD extracts the information
it needs directly from the network packets sent from client applications to the
database server using a network sniffer. The result is the directed graph representing
the sequence of SQL commands that composes the authorized transactions. The
learned graph is used later on by the concurrent intrusion detection engine [59].
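The learning and detection modes can be sketched in Python as follows. The
sketch assumes that each transaction is available as an ordered list of
command templates; the artificial BEGIN and END nodes and the example commands
are hypothetical and not part of IIDD as described in [59].

```python
def learn_graph(transactions):
    """Learning mode: collect the directed edges between consecutive
    SQL commands of authorised transactions."""
    edges = set()
    for commands in transactions:
        path = ["BEGIN", *commands, "END"]
        edges.update(zip(path, path[1:]))
    return edges


def is_authorised(commands, edges):
    """Detection mode: every transition of the observed transaction
    must follow an edge of the learned graph."""
    path = ["BEGIN", *commands, "END"]
    return all(step in edges for step in zip(path, path[1:]))


# Hypothetical authorised transactions observed during learning.
learned = learn_graph([
    ["SELECT account", "UPDATE account", "INSERT audit"],
    ["SELECT account"],
])
```

Checking single transitions is a simplification: a sequence that recombines
learned edges in a new way would still be accepted by this sketch.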
The approach proposed in [58] for intrusion detection is based on mining
database traces stored in log files. The result of the mining process is used to form
profiles that can model normal behavior and identify intruders. An additional
feature of this approach is that the author couples the mechanism with Role Based
Access Control (RBAC). The IDS is able to determine role intruders, that is,
individuals who, while holding a specific role, behave differently from the
normal behavior of the role [58]. The main idea is that databases typically have a very
large number of users, so keeping a profile for every single user is not feasible in
practice. The approach therefore constructs the normal behavior of database roles
rather than of database users. Role profiles are built using a classifier, which is then
used for detecting anomalous behavior.
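A role profile classifier of this kind can be sketched in Python. The sketch
uses a small Naive Bayes classifier with add-one smoothing as a stand-in; the
text above does not state which classifier or which query features [58]
employs, so both are assumptions, as are the roles and queries shown.

```python
import math
from collections import Counter, defaultdict


class RoleClassifier:
    """Naive Bayes over query features with add-one smoothing."""

    def __init__(self):
        self.role_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, role, features):
        self.role_counts[role] += 1
        self.feature_counts[role].update(features)
        self.vocabulary.update(features)

    def predict(self, features):
        total = sum(self.role_counts.values())
        best_role, best_score = None, -math.inf
        for role, n in self.role_counts.items():
            denom = sum(self.feature_counts[role].values()) + len(self.vocabulary)
            score = math.log(n / total)
            for f in features:
                score += math.log((self.feature_counts[role][f] + 1) / denom)
            if score > best_score:
                best_role, best_score = role, score
        return best_role


def is_role_intruder(clf, claimed_role, features):
    """Flag a query whose most likely role differs from the claimed role."""
    return clf.predict(features) != claimed_role


# Hypothetical training log: (role, features of one query).
clf = RoleClassifier()
clf.train("clerk", ["cmd=SELECT", "table=orders"])
clf.train("clerk", ["cmd=SELECT", "table=customers"])
clf.train("dba", ["cmd=ALTER", "table=orders"])
clf.train("dba", ["cmd=DROP", "table=tmp"])
```

A query issued under the clerk role that the classifier attributes to the dba
role, such as dropping a table, would be reported as a role intrusion.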
The proposed mechanism in [49] is based on anomaly detection and includes
a learning phase and a detection phase. Very briefly, the database utilization profile
is gathered as a first step to feed the learning phase. Once the database utilization
profile is established, the information collected is used to concurrently detect
database intrusions. Three abstraction levels are used to define the user profile
representing his/her database activity: command level, transaction level, and session
level. The intrusion detection is based on a set of security constraints defined at each
of these three levels.
Learning-Based Anomaly Detection
Learning-based anomaly detection represents a class of approaches that relies
on training data to build profiles of the normal, benign behavior of users and
applications [63]. In [63] an anomaly-based system is developed for the detection of
attacks that exploit vulnerabilities in Web-based applications to compromise a
back-end database. The approach uses multiple models to characterize the profiles of
normal access to the database. These profiles are learned automatically during a
training phase by analyzing a number of sample database accesses. Then, during the
detection phase, the system is able to identify anomalous queries that might be
associated with an attack. Although this system mostly focuses on detecting the
database attacks originated from web-based systems, it still may be considered as
database intrusion detection, but the one that is located at the higher level than
2.6.3 Hybrid Methods
In systems like [69] a combination of anomaly detection and misuse detection
methods as well as statistical functions has been applied to construct a
3-component system for database intrusion detection. This work presents a system for
the database extension and the user interaction with a DBMS; it also proposes a
misuse detection system for the database scheme. In a comprehensive investigation
the author compares two approaches to deal with the database extension, one based
on reference values and one based on ∆-relations, and shows that even standard
statistical functions yield good detection results. The misuse detection component of
this system stores a list of possibly dangerous commands in a library
of signatures and compares the current command to it.
Database Intrusion Prevention Systems
Intrusion prevention is a proactive defense technique which is an extension of
intrusion detection. Intrusion prevention systems detect ongoing attacks in real time
and stop the attacks before they succeed, thus avoiding the damage caused by the attacks.
There has also been research on database intrusion prevention systems
that detect attacks caused by malicious transactions and cancel them in time, before
they succeed. Mattsson presented an intrusion prevention system for databases in [50,
70]. His research focuses on monitoring the rates at which each user accesses
database objects (such as tables, attributes, etc.). If the access rate exceeds a threshold,
the access control system is notified to make the user's request an unauthorized
request before the result is transmitted to the user. In [54], the focus is on monitoring
transactions rather than access rates, and proactive protection is based on the atomicity
of transactions rather than modification of users' authorization.
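The access-rate idea can be sketched in Python with a sliding time window.
The class name, the window length and the limit are hypothetical, and the
sketch only returns a decision instead of notifying a real access control
system as described in [50, 70].

```python
from collections import defaultdict, deque


class AccessRateMonitor:
    """Flags a user whose accesses to an object exceed `limit`
    within a sliding window of `window_seconds`."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.history = defaultdict(deque)  # (user, obj) -> access timestamps

    def allow(self, user, obj, timestamp):
        """Record the access and return False once the rate is exceeded."""
        times = self.history[(user, obj)]
        # Forget accesses that have left the sliding window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        times.append(timestamp)
        return len(times) <= self.limit


# Hypothetical policy: at most 3 accesses per user and object in 60 seconds.
monitor = AccessRateMonitor(limit=3, window_seconds=60)
decisions = [monitor.allow("bob", "salary", t) for t in (0, 10, 20, 30, 100)]
```

In this run the fourth access within the same minute is refused, and access is
allowed again once the earlier accesses have left the window.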
In [71] a framework is described for a highly distributed real-time monitoring
approach to database security using Intelligent Multi-Agents. The intrusion
prevention system described in that paper uses a combination of statistical
anomaly prevention and rule-based misuse prevention in order to detect a misuser.
The Misuse Prevention System uses a set of rules that define typical illegal user
behavior. A separate rule subsystem is designed for this misuse detection system and
it is known as Temporal Authorization Rule Markup Language (TARML).
In this chapter we studied the Intrusion Detection area in a top-down manner.
First, we stated the different definitions of IDS and related terms and gave a brief
history of the field. The necessity of applying IDS in security mechanisms was
discussed next. After that, we studied different taxonomies of Intrusion Detection
Systems as well as the importance of studying these taxonomies. Some classifications
of intrusion detection systems and the criteria on which these classifications are
based were studied next. Then, we discussed the application of Data
Mining techniques in intrusion detection systems and showed how these techniques
may improve the efficiency of IDSs. Finally, we studied intrusion detection
methods at the database level and showed the lack of such systems in the current
security world. In more detail, we reviewed database intrusion detection
systems and the mechanisms on which each of these systems works.
In the next chapter we propose the methodology on which we will
conduct the project. The different phases of our methodology will be discussed in
their respective sections.
In this section we explain the methodology through which we reach the project
objectives. First we briefly discuss the concept of methodology to give a better
understanding of what is to be covered in this section. The methodology tends to
govern, or at least limit, the range of choices as to how the data will be collected, how
it will be analyzed, how results will be reported, and even the nature of the conclusions
that may reasonably be drawn from the results [72].
There exist different categorizations of research types. Each of these
categories is established based on certain criteria. However, according to [73] the
basic types of research are as follows:
Descriptive vs. Analytical
Applied vs. Fundamental
Quantitative vs. Qualitative
Conceptual vs. Empirical
The methodology to be followed differs according to the specification and
requirements of each type of research. Known research
methodologies may be customized and adapted by the researcher in order to
appropriately address the requirements of a certain research project. The current
project is closest to empirical (experimental) research, for which we adapt a known
methodology and describe it in the next section.
Figure 3.1 Project Methodology
Project Methodology
The purpose of this section is to detail how the project is conducted. The
methodology of the proposed approach is based on Figure 3.1. Each step
of this methodology is discussed in the following.
3.2.1 Analysis
In the first step, we study related works and analyze the
mechanisms of similar systems in detail. The mechanisms proposed in the literature
for database intrusion detection systems range from applying statistical approaches
to leveraging artificial intelligence and Data Mining techniques. They also range
from misuse detection, in which the decision is mainly based on the signatures of
normal and legitimate transactions, to anomaly detection systems, which rely on the
comparison between the current state and the pre-established profile of normal
behavior of the system. Depending on its objectives and scope, each system
unavoidably takes some constraints and assumptions into account. In this
phase we aim to analyze the features and specification of each system and derive its
drawbacks and advantages. Using the result of this analysis, and also the project
objectives, we will design our system architecture in the next step.
3.2.2 Design
In this phase of the project we propose the architecture of the required system
components. Based on what we learn in the literature review of similar approaches, we
aim to design an efficient architecture which addresses the
requirements for achieving the project objectives. The way the different components of
our system connect to each other, as well as the data flow through these components,
can dramatically affect the performance and efficiency of our solution. Hence,
sufficient care should be taken in designing and constructing the components, as
well as in how the output of one component may feed the other parts of the system.
The level of abstraction at which our system is supposed to work should
also be determined in this phase. In addition, different implementation alternatives for
data capturing can be followed, including approaches external to the database, such as
using a proxy or a sniffer, or taking advantage of the database auditing features
available in most DBMSs. According to our architecture specification, we choose an
alternative for our data capturing mechanism.
Proposing an architectural design for a database intrusion detection system is
the main aim of this project. Based on this architecture, different models might be
implemented to address attacks on different database environments. That is, the
proposed architecture for DB-IDS is scalable enough to be adapted for different
3.2.3 Prototype Development
In this step we develop a prototype according to the architecture designed in
the previous section. The proposed model works based on the correlation and
interaction between different components. Each component has a specific job. All
these components will be constructed using built-in SQL Server 2005 capabilities
such as SQL Server Jobs, SQL Server Trace, Auditing, Stored Procedures and Triggers.
The scripts will be written using SQL (Structured Query Language).
3.2.4 Prototype Implementation and Testing
In this phase we aim to implement and test the model using a mock database
system. We have to assume that during a certain period of time there has been no
intrusion into the system. This enables us to construct the normal behavior of the
different system objects. Here we need to make our system receive some benign
transactions issued by legitimate users. Once the behavior profiles of the database
system are constructed, we try to intrude into the database system with some malicious
transactions issued by, for example, insiders (a disgruntled user for instance). A
sample of the output of each component will be provided in this phase as well.
In this chapter we introduced the methodology which will be followed
throughout the project. The next chapter deals with the design phase of the project.
First, we give an overview of different approaches to database intrusion detection
systems, specifically those closer to our proposed approach. Then, through the
chapter, we discuss the different components of the system separately and show
step by step how these components interact with each other.
In this chapter we first discuss approaches to DB-IDS and
explain how they are able to detect database intrusions. Then, we show how our
model could improve the efficiency and scalability of database intrusion detection. The
model architecture of our database intrusion detection system is introduced
subsequently. This chapter is organized as follows. First we discuss the
necessity of each component. Then we present a figure showing that
component along with the corresponding sub-components. Step by step we add other
components and expand the architecture.
Approaches of Database Intrusion Detection
Even though database intrusion detection may seem
to be a new area, valuable approaches have recently been proposed by
researchers which deal with detecting intrusions at the database level. These
approaches vary from misuse detection, which is originally based on transaction
fingerprinting, to anomaly detection, in which the detection of intrusions relies on the
comparison between the normal behavior of the system objects and the current state.
Database anomaly detection systems derive the normal behavior of the
system in different ways. For example, in [69], which is a combined misuse and
anomaly detection system, the anomaly detection component works based on the
history of changes of the values of the monitored attributes between two runs of the
ADS. Approaches like [7], [23] and [52] are based on the data dependency concept.
However, a great number of database anomaly detection approaches focus on
extracting the normal behavior of database users. The most similar work to ours is
[74], in which the author tries to derive profiles describing the typical behavior of
users and roles in a relational database system using the working scopes and distance
measure concepts. The model proposed in [58] also focuses on extracting the normal
behavior of roles in RBAC-administered databases.
The Project Approach
Previous database intrusion detection systems and models solely try to profile
the typical behavior of database subjects, and not objects. (In the SQL Server 2005
literature, database subjects and objects are named principals and securables
respectively.) They are not designed to construct the normal behavior of objects
(as opposed to subjects).
It is believed that constructing profiles for both principals and securables
could, at least in theory, improve the capability and quality of detecting malicious
transactions. By securable profiling, we mean capturing the manner in which
principals work with securables and query them. Besides principal profiling,
capturing the way in which the securables are treated could also help us to
detect intrusions accurately and efficiently.
We provide some examples to support this idea. But first we explain the
intrusion detection mechanism of a system that functions only on the basis of
principal profiles. Then we provide scenarios of database attacks that such a system
is not able to detect.
When we capture the behavior of principals (database users or roles, for
example), we are able to detect intrusions carried out by insiders, such as
disgruntled or fired employees, as well as those carried out by external intruders
(outsiders). We need to assume that, for a time, the system works in an attack-free,
benign environment. This period of time is called the training phase. It allows us
to safely construct the normal behavior of system elements. After the principal
profiles are built, suppose that an inside user (for example, a fired manager who
seeks revenge on his or her ex-boss) decides to attack the database system by
exceeding his or her authorization. If the attempted actions are beyond the scope of
that user's behavior profile, they can be treated as an attack and the DBA should be
informed about them.
Also imagine a situation in which a hacker (external intruder) somehow gains
access to the account information of a legitimate database user, for example by a
brute force attack or social engineering. With the account information, the intruder
can log in to the database and maliciously modify some sensitive data (a
modification that is not part of the ordinary routine job of that legitimate user).
If the issued transactions are not within the borders of the normal behavior of that
legitimate user, the intrusion detection system is able to detect those malicious
transactions and send an alarm to the DBA.
Now suppose that the intruder hacks into several database user accounts.
In such a scenario the intruder could log in to the database several times, each
time using one of the accounts to perform a small part of the intrusion. In such a
situation, the deviation from the normal behavior of each user profile is too small
to be detected as an anomaly by an intrusion detection system that only keeps
principal profiles. However, if securable profiles are in place, the probability of
detecting that kind of attack would be increased.
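The scenario above can be illustrated with a minimal Python sketch. It is not part of the DB-IDS prototype; the account names, the table name, the counts and the threshold of 10 updates per day are hypothetical values chosen only for illustration.

```python
from collections import Counter

# Hypothetical audit events for one day: (login, table) for each UPDATE.
# The intruder spreads 12 updates over three stolen accounts.
events = (
    [("alice", "Salary")] * 4
    + [("bob", "Salary")] * 4
    + [("carol", "Salary")] * 4
)

# Assumed threshold: more than 10 updates per day is treated as abnormal,
# both for a single login and for a single table.
THRESHOLD = 10

per_principal = Counter(login for login, _ in events)
per_securable = Counter(table for _, table in events)

principal_alerts = [p for p, n in per_principal.items() if n > THRESHOLD]
securable_alerts = [s for s, n in per_securable.items() if n > THRESHOLD]

print(principal_alerts)  # no single account exceeds the threshold
print(securable_alerts)  # the table as a whole does
```

Each account stays below the per-login threshold, while the aggregated activity on the table does not, which is the kind of deviation a securable profile is meant to expose.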
Another scenario is that in database attacks like [10], the intruder gets access
to the sa account (by brute force, for example), creates a user and grants all the
necessary permissions to this new user. Then, the intruder logs in as the newly
created user and maliciously updates some sensitive values. Profiling user behavior
alone cannot detect such database intrusions, since the newly created database user
does not have any profile of normal activities. The system therefore cannot detect
any of the transactions issued by the new user as an intrusion, since there is no
benchmark to compare the transactions with. Yet the existence of a profile of the
normal behavior of securables (here, the attribute) enables the system to detect
this kind of intrusion. For example, if the intruder in our scenario tries to modify
a value in an unusual manner that does not conform to the normal profile of that
table, this modification may be considered a suspicious activity.
One may ask why RDBMS access control systems (Discretionary Access
Control) are not sufficient for protecting the data residing in databases. There are
a couple of answers to this question. First of all, according to [74], very often
security officers do not use the available means to guard the information stored in
the DBS because the security policies are not well known. In addition, not all the
essential security policies can be addressed appropriately by means of the database
access control system alone. No matter how accurate and rigid the association
between security policies and the access control mechanism is, some gaps may
unavoidably be left, which could afterward be exploited by an outsider or insider to
attack the database.
Furthermore, in a poorly designed database access control mechanism, some
permissions that should be denied to some users may be overlooked. This means that a
database user may issue transactions that are actually unauthorized, while the
access control system does not prevent him or her; the security officer has simply
forgotten to deny some permissions to specific users. Another scenario is that
database designers often neglect to define data integrity constraints on tables. For
example, in the underlying database of a web shop, the prices in a specific table
must not be less than $40 or more than $400. Doubtlessly, this rule can be enforced
by database capabilities (using constraints in SQL Server). However, due to a poor
design, such a constraint, which would prevent an intruder from maliciously
modifying the value of this column, may not have been defined.
In practice, however, the values of this column may rarely get close to the
border values. It is believed that the values normally lie around a certain value.
So, if a transaction tries to update a value in this column abnormally, and the
updated value is considerably far from the normal range, the transaction may be
considered suspicious.
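A minimal sketch of such a value-based check is given below. It assumes a simple statistical profile (mean and standard deviation of the column during the training phase) and a cut-off of three standard deviations; the training prices, the function name and the cut-off are hypothetical and are not taken from the project.

```python
import statistics

# Hypothetical prices observed during the attack-free training phase.
training_prices = [95.0, 100.0, 105.0, 98.0, 102.0, 110.0, 90.0, 100.0]

mean = statistics.mean(training_prices)
stdev = statistics.stdev(training_prices)

def is_suspicious_update(new_value, k=3.0):
    """Flag a new value lying more than k standard deviations from the mean."""
    return abs(new_value - mean) > k * stdev

print(is_suspicious_update(104.0))  # close to the usual prices
print(is_suspicious_update(399.0))  # inside the 40-400 range, yet far from normal
```

The second value would pass a range constraint of $40 to $400, but it deviates strongly from the values the column normally holds.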
The Proposed DB-IDS Architecture
In this section we show the process of designing the architecture of the
underlying components of our database intrusion detection system and show how
they communicate with each other. The four basic components of our system are Data
Collector, Profiler, Detector and Responder. The structure of the proposed
architecture is basically derived from [8]. However, we remove some components from
it and add others to address our project objectives. For example, we add the
Responder component, which is responsible for gathering raised alarms and taking
the necessary action. In the following sections we explain the job of each component
in detail. We also show how the components interact with each other.
As mentioned in section 1.4, the proposed architecture could be adopted for
any commercial RDBMS. However, the structure and arrangement of components and data
flows may need to be customized according to the capabilities of that RDBMS. We
have assumed that the designed model will be implemented in MS SQL Server 2005, and
this assumption could affect the structure of the proposed architecture.
4.4.1 Data Collector
Two different types of data acquisition methods for database IDSs are
mentioned in [51], namely interception and built-in DBMS auditing. The
anomaly-based IDS presented in [63] detects attacks that exploit vulnerabilities in
web-based applications to compromise a back-end database. The IDS taps into the
communication channel between the web-based applications and the database server.
SQL queries performed by the applications are intercepted and sent to the IDS for
analysis. However, the IDS relies on modifying the library of MySQL (an open-source
database) that is responsible for the communication between the two parties.
Thus, this approach cannot be generalized to commercial databases such as MS
SQL Server.
Most database IDSs in the literature use the native auditing functionality
provided by the DBMS to collect audit data. This includes using built-in auditing
tools of the DBMS, manipulating the database log files and invoking DBMS-specific
utilities [51]. DB-IDSs like [8], [55], [58], [69] and [71] all use the built-in auditing
functionality to capture data.
Jin et al. in [51] name some advantages and
disadvantages of using built-in auditing capabilities. Advantages include:
Easy To Deploy
Accessibility To More Comprehensive Information
And disadvantages include:
Impact On Performance
Complex And Hard To Meet Individual Corporate Auditing
Control of the Database Implies Full Control of the Auditing
In our model we use the built-in auditing functionality of the database
system as well as a custom method for collecting the audit data that the built-in
auditing mechanism is unable to provide.
The Data Collector component is responsible for gathering the data necessary for
building the system profiles. A set of interesting features, consisting of the
principals and securables to audit, is selected by the security officer, depending
on the security policy to establish or verify. In this component we use the built-in
SQL Trace capability of SQL Server 2005 to capture the desired data.
SQL Trace is a means of capturing SQL Server events from the server and
saving those events in what is known as a trace file. Trace files are usually used
to analyze performance problems. However, we can also use this utility to monitor
several areas of server activity, such as analyzing SQL and auditing and reviewing
security activity. We are interested in events like successful login, failed login,
logout, add/drop login, schema object access, etc. Afterward we derive the necessary
data from the trace files and feed them into the Profiler component.
Figure 4.1 Data Collector Sub-Components
Nonetheless, SQL Trace is unable to provide everything we need to know to build
system profiles. For example, if a table is updated, the old and new values are not
captured in the trace files. Hence, we need to develop our own data capturing
sub-component, which is called Auditor. It keeps track of DML (data manipulation
language) statements. In the next chapter we explain how the Auditor actually works.
In a nutshell, we set up a mechanism to capture the INSERT, DELETE and UPDATE
statements issued on a table. A copy of any value inserted into or deleted from the
table is exported to another table to keep track of the table before and after the
transaction. Likewise, the old and new values in an UPDATE transaction are captured
as well. Additional information, such as the login name that issued the transaction,
the timestamp, the application from which the transaction comes, etc., is stored
too. The data collected by Tracer and Auditor are then fed into the Profiler
component.
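The capture of old and new values can be sketched as follows. This is only an illustration under stated assumptions: it uses triggers in an in-memory SQLite database (through Python's sqlite3 module) as a stand-in for SQL Server, and the table, column and trigger names are hypothetical. The actual Auditor mechanism is described in the next chapter and may differ. SQLite has no notion of login or client application, so those fields are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (id INTEGER PRIMARY KEY, price REAL);

-- Audit table holding a copy of the values before and after each statement.
CREATE TABLE Product_Audit (
    action TEXT, product_id INTEGER, old_price REAL, new_price REAL,
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER trg_ins AFTER INSERT ON Product BEGIN
    INSERT INTO Product_Audit (action, product_id, old_price, new_price)
    VALUES ('INSERT', NEW.id, NULL, NEW.price);
END;
CREATE TRIGGER trg_upd AFTER UPDATE ON Product BEGIN
    INSERT INTO Product_Audit (action, product_id, old_price, new_price)
    VALUES ('UPDATE', NEW.id, OLD.price, NEW.price);
END;
CREATE TRIGGER trg_del AFTER DELETE ON Product BEGIN
    INSERT INTO Product_Audit (action, product_id, old_price, new_price)
    VALUES ('DELETE', OLD.id, OLD.price, NULL);
END;
""")

conn.execute("INSERT INTO Product (id, price) VALUES (1, 100.0)")
conn.execute("UPDATE Product SET price = 399.0 WHERE id = 1")
conn.execute("DELETE FROM Product WHERE id = 1")

audit_rows = conn.execute(
    "SELECT action, product_id, old_price, new_price "
    "FROM Product_Audit ORDER BY rowid"
).fetchall()
for row in audit_rows:
    print(row)
```

Each DML statement leaves one row in the audit table, so the state of the monitored table before and after the transaction can be reconstructed later.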
Figure 4.2 Data Collector
SQL Trace can capture many different events, but we are not interested in all
of them. Depending on business requirements, security policies and other criteria,
the security officer decides which events need to be captured. In the
Implementation chapter we list important events that are worth capturing.
Likewise, the auditing mechanism should be set up only for sensitive elements of the
database system. Not all database tables and views, for example, require an
auditing mechanism. We may only want to monitor certain tables, users, database
roles, etc. In other words, there should be a set of policies in place to determine
which database elements need to be profiled based on their importance and
sensitivity (Figure 4.2).
4.4.2 Transformer
Tracer produces the Audit Trace, and Auditor produces data that we call
Refined Logs. We use the term Refined Logs for data that are ready to be fed
into the Profiler component. Since the Auditor sub-component is designed by us, we
can implement it so that it directly generates what we need to build the profiles.
However, the Audit Trace needs to be refined to provide us with clean data.
Thus we need another component that takes the raw data produced by Tracer and
converts it to Refined Logs.
Figure 4.3 Transformer Component
As we can see in Figure 4.3, the Transformer component is responsible for
deriving meaningful and clean data from the raw log files provided by Tracer.
Although we are able to configure the trace so that it captures only the desired
events, a large amount of unnecessary extra data is unavoidably collected as well.
That is why we put the Transformer in place. The Transformer is responsible for
preprocessing the raw data in the audit logs and converting them into appropriate
data structures for the Profiler. More importantly, it groups the raw audit data
into audit sessions. This is a critical step because the way audit data are
aggregated into audit sessions determines what profiles are generated. For
instance, the data can be grouped according to users or according to the roles
adopted by the users. User profiles are generated in the former case and role
profiles in the latter [8].
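The effect of the grouping key can be shown with a short Python sketch. The trace rows, their three-field layout and the function name are hypothetical; the real Audit Trace contains many more columns.

```python
from collections import defaultdict

# Hypothetical raw trace rows: (login, role, event).
raw_trace = [
    ("bob", "db_datawriter", "LOGIN"),
    ("alice", "db_datareader", "LOGIN"),
    ("bob", "db_datawriter", "UPDATE Employee"),
    ("carol", "db_datawriter", "INSERT Employee"),
    ("alice", "db_datareader", "SELECT Employee"),
]

def group_into_sessions(rows, by):
    """Group events by 'user' or by 'role'; the key decides the profile type."""
    index = {"user": 0, "role": 1}[by]
    sessions = defaultdict(list)
    for row in rows:
        sessions[row[index]].append(row[2])
    return dict(sessions)

user_sessions = group_into_sessions(raw_trace, by="user")
role_sessions = group_into_sessions(raw_trace, by="role")

print(user_sessions["bob"])            # events that feed Bob's user profile
print(role_sessions["db_datawriter"])  # events that feed the role profile
```

The same raw data yields three user sessions or two role sessions, depending only on the key chosen for aggregation.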
4.4.3 Profiler
Refined Logs are data prepared to be analyzed for constructing system
profiles. This is accomplished by the Profiler component, which generates profiles
for principals and securables. No matter what technique is applied to create the
profiles, the necessary underlying data is prepared beforehand by the Data Collector
module. We can look at the profile concept from different points of view; that is,
the normal behavior of the system elements could be captured based on various
criteria. What is considered a profile in one system may not be regarded as a
profile in another. For example, the number of logins per day for each user could
form a part of that user's profile in one system, while it could be unimportant in
another.
We emphasize again that the most important objective of our project is to propose
the architecture for a novel database intrusion detection system. The platform
provided is intended to be scalable. Thus, various methods, ranging from statistical
approaches to AI techniques such as Data Mining, or a combination of them, could be
used to draw the working scope of each processed element. Deriving different
profiles from the data logs depends on the creativity of the security officer or DBA
and on the requirements of the system. Over time, the Profiler could be extended to
encompass various types of profiles.
Figure 4.4 Profiler Component
The profile structure varies depending on the technique used. Applying
Data Mining to profile the database objects and subjects results in data mining
models. A data mining model, or mining model, can be thought of as a relational
table. Each model is associated with a Data Mining algorithm on which the model is
trained. Training a mining model means finding patterns in the training dataset by
using specified Data Mining algorithms with proper algorithm parameters. Model
training is also called model processing. During the training stage, Data
Mining algorithms consume input cases (Refined Logs in our terminology) and
analyze correlations among attribute values. After training, the data mining model
stores the patterns that the Data Mining algorithm discovered about the dataset
[75]. In our terminology the training dataset and the discovered patterns
correspond to Refined Logs and profiles respectively. MS SQL Server 2005 introduces
a built-in Data Mining feature, which enables us to integrate Data Mining-based
profiling into our database intrusion detection architecture. Although it may affect
the performance of the system, there are several obvious advantages. For instance,
data flow between the different modules of the system becomes much easier than if we
had to export the dataset to an external module and send the patterns (profiles)
back to the Detector component.
There are a number of algorithms available in SQL Server 2005, including
Decision Trees, Association Rules, Naive Bayes, Sequence Clustering, Time Series,
Neural Nets and Text Mining [76]. As mentioned in section 2.5.1, Association
Rules, Sequence Clustering and Decision Trees (a type of classification algorithm)
are thought to be the most suitable Data Mining algorithms for intrusion detection.
SQL Server Data Mining features are embedded throughout the process, are able
to run in real time, and their results can be fed back into the process of
integration, analysis, or reporting [76]. Hence, it is believed that when a
suspicious transaction is detected, an appropriate response could be triggered right
away.
Statistical approaches are another technique for building the profiles. Spalka
et al. in [69] apply basic statistical functions to the elements of single
attributes to obtain reference values. They believe that the statistical approach
yields surprisingly good results, so they dropped the initial intention of applying
Data Mining techniques to the extension. The characteristics of the database system
on which the IDS is running determine which methods are suitable for profiling. For
example, the IDS proposed in [69] works best for databases in which deletions or
updates of a large number of tuples occur only seldom.
The profiling technique also varies among web-based applications (online shopping
websites, for example), organizational systems (for instance, a company intranet)
and complex infrastructures (systems with huge databases and data warehouses). The
association between database logins and real users is another criterion that affects
the applied technique and the nature of the established profiles.
The term profile in our project is abstract. A profile could contain different
data structures. For example, a database user profile may consist of several
database tables, views and mining models, each one pertaining to a specific aspect
of user behavior. The combination of these data structures forms the database user
profile.
For the purpose of this chapter, and to show some sample profiles, we
mention a number of aspects of database user behavior as well as profiles related to
database securables. In the next chapter we show in detail how to implement such
profiles using SQL.
Principal-related profiles:
Total number of times each user has logged in to the system using a
distinct application from a distinct host. This profile tells us, for
example, that database user Bob has so far logged in to the system 50
times using the SQLCMD application from the PC06-MSWin2008 host, 45
times using the OSQL application from the PC06-MSWin2008 host, and so
on.
Total number of times each user has logged in to the system per day.
This profile tells us, for instance, that database user Bob logged in to
the system six times on July 4 2009, eight times on July 5 2009, and
so on.
Total number of times the members of each database role have logged
in to the system per day. This profile tells us, for example, that the
members of the db_datawriter database role logged in to the system 32
times in total on July 4 2009, and so on.
Securable-related profiles:
Total number of DELETE/INSERT/UPDATE/SELECT commands
issued on each table by database users. This profile tells us, for
example, that database user Bob has issued 34 DELETE commands on
the Employee table.
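The count-based profiles listed above can be sketched in a few lines of Python. The project itself implements them in SQL in the next chapter; the sample Refined Logs below, including the host name PC02, are hypothetical and much smaller than the figures quoted in the text.

```python
from collections import Counter

# Hypothetical Refined Logs.
logins = [  # (login, application, host, day)
    ("Bob", "SQLCMD", "PC06-MSWin2008", "2009-07-04"),
    ("Bob", "SQLCMD", "PC06-MSWin2008", "2009-07-04"),
    ("Bob", "OSQL", "PC06-MSWin2008", "2009-07-05"),
    ("Alice", "SQLCMD", "PC02", "2009-07-04"),
]
dml = [  # (login, command, table)
    ("Bob", "DELETE", "Employee"),
    ("Bob", "DELETE", "Employee"),
    ("Alice", "UPDATE", "Employee"),
]

# Principal-related profiles.
logins_per_app_host = Counter((u, a, h) for u, a, h, _ in logins)
logins_per_day = Counter((u, d) for u, _, _, d in logins)

# Securable-related profile: commands issued on each table, per user.
commands_per_table = Counter((t, c, u) for u, c, t in dml)

print(logins_per_app_host[("Bob", "SQLCMD", "PC06-MSWin2008")])
print(logins_per_day[("Bob", "2009-07-04")])
print(commands_per_table[("Employee", "DELETE", "Bob")])
```

Each profile is simply a table of counts keyed by the attributes of interest, which is why the same data can serve principal and securable profiling alike.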
In a dynamic environment, system profiles need to be kept up to date. After the
initial training phase of the system, in which the primary profiles are built, we
should take the natural changes in system behavior into account over time and renew
the profiles on a regular basis. Essentially, we need to compare the current
state of the system with what reflects the normal behavior of the system. Thus, we
should guarantee that the system profiles represent the actual normal behavior of
the system. Otherwise, the result could be false positives (anomalous activities
that are not intrusive are flagged as intrusive [1]) and false negatives (events are
not flagged as intrusive, though they actually are [1]).
Therefore, an update mechanism must be in place. We can, for example, let
the current activities be merged into the existing profile if no intrusive activity
is detected. The process might also be done manually by the DBA; that is, the DBA
could inspect the activities during the last n minutes, hours or days, for example,
and at his or her discretion combine them with the Refined Logs, which are the
information ready to be fed into the Profiler. However, if any intrusion remains
undetected, it could spoil the profiles, since it would unknowingly be merged into
the Refined Logs. That is why anomaly detection systems are thought to be prone to
error.
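A minimal sketch of the automatic variant of this update rule is shown below, assuming a count-based profile. The function name and the sample data are hypothetical.

```python
from collections import Counter

def update_profile(profile, recent_activity, intrusion_detected):
    """Merge recent activity into the profile only if the window was clean."""
    if intrusion_detected:
        return profile  # keep suspicious activity out of the profile
    merged = Counter(profile)
    merged.update(recent_activity)
    return merged

profile = Counter({("Bob", "SQLCMD"): 50})
clean_window = Counter({("Bob", "SQLCMD"): 3})
flagged_window = Counter({("Bob", "OSQL"): 1})

profile = update_profile(profile, clean_window, intrusion_detected=False)
profile = update_profile(profile, flagged_window, intrusion_detected=True)
print(profile)
```

The sketch also makes the weakness visible: if an intrusion is not flagged, its activity is merged like any other and becomes part of the normal behavior.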
4.4.4 Detector
The Detector is responsible for identifying suspicious activities in the database
system. It could be considered the most important component in our architecture.
As mentioned in previous chapters, there are two types of intrusion detection:
anomaly detection and misuse detection. Thus the Detector component is divided into
two sub-components: the anomaly detector and the misuse detector (Figure 4.5). In
the following we restate some advantages and disadvantages of anomaly detection and
misuse detection systems.
The main advantage of anomaly detection is that it does not require prior
knowledge of intrusions and can thus detect new intrusions. The main drawback is
that it may not be able to describe what the attack is and may have a high false
positive rate [3]. In other words, the alarms generated by the system are
meaningless because they generally cannot provide any diagnostic information
(fault diagnosis) such as the type of attack that was encountered; they can only
signal that something unusual happened [16]. However, one of the benefits of this
type of IDS is that it is capable of producing information that can in turn be used
to define signatures for misuse detectors [15].
Figure 4.5 Detector Sub-Components
The main advantage of misuse detection is that it can accurately and
efficiently detect instances of known attacks. In addition, unlike those of anomaly
detection systems, the alarms generated by misuse detection systems are meaningful;
for example, they contain diagnostic information about the cause of the alarm [16].
The main disadvantage is that it lacks the ability to detect truly innovative
(i.e. newly invented) attacks [3] as well as those whose attack patterns are not
available. The database of attack signatures needs to be kept up to date, which is a
tedious task because new vulnerabilities are discovered on a daily basis.
Figure 4.6 Anomaly Detector Component
As we can see, both anomaly and misuse detection have benefits and
drawbacks. Yet applying both of them could, at least in theory, result in more
efficient and more accurate detection of intrusions. In the following we explain
each sub-component in more detail.
Anomaly Detector
Basically, the job of the anomaly detector is to identify any activity that
deviates considerably from the pre-established profiles. Such abnormal activities
could be thought of as potential attacks. The comparison method depends on the time
of detection (real-time vs. non-real-time), the profile structure and what is
considered the current state of the system, which is to be compared with the
profiles.
Plainly, comparison methods that need heavy computation and many resources
cannot be run in real time. Instead, they could run during idle times of the system
or at the end of the working hours. If a Data Mining approach is applied for
profiling, the comparison between profiles (patterns) and recent transactions
(cases) is called model prediction in Data Mining terminology. In many
Data Mining projects, finding patterns is just half of the work; the final goal is
to use these models for prediction. Prediction is also called scoring. To give
predictions, we need a trained model and a set of new cases [75]. Prediction can
tell us whether new cases (recent events) conform to the patterns or not. For
example, if a classification algorithm is used, model prediction specifies whether
a transaction is classified as an attack or not.
In real-time and near-real-time anomaly detection, we need to compare
the incoming transactions with the profiles as soon as they reach the database
engine, or with a small delay. As illustrated in Figure 4.6, there exists a
connection between the Audit Trace and the Anomaly Detector component. Here the
Audit Trace represents the incoming transactions reaching the database engine.
It is necessary to determine at which intervals anomaly detection takes
place. Depending on the applied techniques and the sensitivity of the anomalies,
there could be several intervals. The anomaly detector may consist of several
sub-components, each of which is responsible for detecting a specific type of
anomaly. These sub-components (stored procedures, for instance) could be run at
different intervals, such as every n minutes, every n hours, every day and so forth.
The more sensitive the anomaly, the faster we want it to be detected.
Even in cases where the detection process is not real-time, we are still
interested in detecting the anomalies. Detection enables us to analyze the attack,
inspect the attack propagation, investigate data damage and recover corrupted data
by tracking the audit trace files and log files. All that is needed to do so has
been recorded by the Data Collector component beforehand.
For the purpose of this project, and to demonstrate how the anomaly detector
works, in the next chapter we develop some sub-components of the anomaly detector of
the DB-IDS prototype as SQL stored procedures. These modules compare the recent
events with the profiles every two minutes and identify the anomalies. Additionally,
some anomaly detector sub-components run once a day.
In the following we name some sub-components of the anomaly detector module
that work based on the sample profiles provided in the previous section.
Find Suspicious Login: If a user logs in to the system for the first
time from an unusual host or using an unusual application, this could be
thought of as a suspicious login. The Find Suspicious Login procedure
is responsible for detecting such logins. Suppose that database user
Bob always logs in to the system using the Microsoft SQL Server
Management Studio application or the SQLCMD command tool from the
PC06-MSWin2008 host. If someday Bob uses the OSQL command tool to log
in to the system, it is considered a suspicious login that needs the
DBA's attention. It is probable that Bob's account information has been
stolen and a hacker is using his account to log in with an application
that Bob has never used before.
ExceededNumOfLgns: If a database user logs in to the system more
than the allowable number of times within a day, this could be considered
a suspicious activity. We can set the threshold statically or calculate
it from the average number of times each user logs in to the system
within a day. This anomaly would be detected by the ExceededNumOfLgns
stored procedure.
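The two checks described above can be sketched in Python as follows. The prototype implements them as SQL stored procedures; the function names, the SSMS label used for SQL Server Management Studio, and the threshold of eight logins per day are hypothetical.

```python
# Hypothetical profile data built during the training phase.
known_sources = {
    "Bob": {("SSMS", "PC06-MSWin2008"), ("SQLCMD", "PC06-MSWin2008")},
}
max_logins_per_day = {"Bob": 8}

def find_suspicious_login(login, application, host):
    """True if this login has never used this application/host pair before."""
    return (application, host) not in known_sources.get(login, set())

def exceeded_num_of_logins(login, logins_today):
    """True if today's login count is above the allowed threshold.

    A login without a profile gets a threshold of 0 in this sketch,
    so any login from it is flagged.
    """
    return logins_today > max_logins_per_day.get(login, 0)

print(find_suspicious_login("Bob", "OSQL", "PC06-MSWin2008"))
print(exceeded_num_of_logins("Bob", 9))
```

Both checks only compare a recent event with the stored profile, which is why they are cheap enough to run at short intervals.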
Misuse Detector
Essentially, the job of the misuse detector is to detect database intrusions
based on attack patterns. A meaningful sequence of events, commands and statements
could indicate a database misuse, even though each of them is immaterial if
considered separately. Such a meaningful sequence forms a database attack pattern.
Unfortunately, database attack patterns have not been studied as much as network
attack patterns. Network traffic is generally based on the TCP/IP protocol, which
enables researchers to model network attack patterns easily. Different network IDSs
are then able to share the detected intrusion patterns and therefore function more
efficiently. Antivirus software is another type of security system that works based
on attack patterns. It could be considered a type of host-based IDS.
However, for database misuse detection there is no such unified approach to
standardizing the attacks. Nonetheless, in this project we embed a simple database
attack pattern repository in the DB-IDS architecture. The misuse detector compares
the sequence of incoming events and statements with the attack patterns. We define a
table of events such as deletion/insertion/update/select on a table, login/role
creation, login/role deletion, database drop, table drop and so forth. Then, we
define specific suspicious sequences of these events as probable database misuse.
For example, the following sequence could likely be a database attack:
Considerable number of failed logins for sa >
sa successfully logs in to the system >
sa creates the login Alice >
Alice logs in to the system >
Alice drops a database table
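Matching such a sequence against the incoming events can be sketched as an ordered subsequence test. This is one possible matching rule, assumed here for illustration; the event labels are a hypothetical encoding and not the format used by the prototype.

```python
def matches_pattern(events, pattern):
    """True if pattern occurs in events as an ordered subsequence.

    Unrelated events may be interleaved between the steps of the pattern.
    """
    remaining = iter(events)
    # 'step in remaining' consumes the iterator up to the first match,
    # so later steps are only searched for after earlier ones.
    return all(step in remaining for step in pattern)

attack_pattern = [
    "FAILED_LOGINS sa",
    "LOGIN sa",
    "CREATE_LOGIN Alice",
    "LOGIN Alice",
    "DROP_TABLE Alice",
]

event_stream = [
    "FAILED_LOGINS sa",
    "LOGIN Bob",
    "LOGIN sa",
    "CREATE_LOGIN Alice",
    "SELECT Bob",
    "LOGIN Alice",
    "DROP_TABLE Alice",
]

print(matches_pattern(event_stream, attack_pattern))      # full pattern present
print(matches_pattern(event_stream[1:], attack_pattern))  # first step missing
```

Allowing interleaved events matters in practice, because legitimate users keep working while the attack unfolds.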
As illustrated in Figure 4.7, the Misuse Detector component has two inputs: one
from the Audit Trace and the other from Attack Patterns. The connection between the
Misuse Detector and Attack Patterns is straightforward. Here, by Audit Trace we mean
the incoming stream of transactions and events.
The attack pattern repository could be populated in two ways: either manually by
the DBA, or based on what is reported by the Responder component as an attack. As
illustrated in Figure 4.7, there exists a connection between the Policy and Attack
Patterns data repositories. This connection indicates that attack patterns could be
defined manually, by the DBA for example. For instance, a specific sequence of
events may reflect an attack in one system, while being natural in another. The
attack pattern repository is populated at the discretion of the DBA and based on the
security requirements and policies of the system.
Figure 4.7 Misuse Detector Component
Moreover, with the support of the anomaly detector component, new database
attacks could be identified and translated into a format understandable by the
attack pattern repository. This is accomplished via the Responder component. Whether
manually by the DBA or with the support of the Anomaly Detector component, the
attack pattern repository needs to be kept up to date so that it addresses new
attacks. The update mechanism should be defined. It could take place every n minutes
or daily, depending on the severity of the attacks. Since updating tremendously
affects the performance of the system, the attack pattern repository cannot be
updated frequently. This is because the incoming events have to be compared with all
the patterns, so we need to keep the pattern table in memory to facilitate the
comparison.
We stated before that one of the objectives of our project is to provide an
approach for database policy revision. The policies concerned are mainly database
security policies that pertain to the discretionary access control system. We also
embed the sub-components responsible for database policy revision in the Misuse
Detector module. One of these sub-components is the FndPssvLgns stored procedure,
which is responsible for identifying those database users who have not logged in to
the system at all within the day. At the end of the working hours, the list of those
users is created and stored. It helps the DBA to find out which users are passive.
The login accounts of those users could be disabled if necessary.
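The logic behind FndPssvLgns amounts to a set difference between all known logins and the logins seen during the day. A minimal Python sketch, with a hypothetical function name and sample data, is:

```python
def find_passive_logins(all_logins, logins_seen_today):
    """Return, sorted, the logins that did not log in at all today."""
    return sorted(set(all_logins) - set(logins_seen_today))

passive = find_passive_logins(
    ["Alice", "Bob", "Carol"],   # every login defined on the server
    ["Bob", "Bob", "Alice"],     # successful login events of the day
)
print(passive)
```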
Another sub-component embedded into the Misuse Detector is BrtFrcDtctr. As the
name indicates, it is responsible for detecting brute force attacks on the system.
In many real-life scenarios like [10], the intruder uses brute force tools to hack
into the system. Therefore it is essential to detect such attacks to stop further
corruption. BrtFrcDtctr, which would be implemented using SQL stored procedures,
raises an alarm if it notices that a number of failed logins have occurred within a
period of time. The threshold could be set by the DBA. The DBA is also able to
configure BrtFrcDtctr so that it disables the suspicious login account. Thus,
BrtFrcDtctr not only can detect a brute force attack, but is also able to function
as a database intrusion prevention system that cooperates with the database access
control system.
4.4.5 Responder
The Responder component is responsible for taking the necessary action against
a detected intrusion. In its simplest form, the Responder just acts as a monitoring
center which holds the details of attacks: information such as the associated login
name, host, application, timestamp and so forth. The DBA can regularly check the Alert
table and take countermeasures.
Figure 4.8 Responder Component
The Misuse Detector and the Anomaly Detector send information about detected
intrusions to the Responder component.
The sub-components of both detectors are tuned to report this information in a
unified format that is understandable for the DBA.
Different levels of severity could be assigned to intrusions. Based on the
severity of each intrusion, an appropriate response might be chosen. For
example, in the case of dangerous attacks like brute force, as discussed before,
the Responder could ask the database access control system to disable the login
(note the connection between the Responder and the database access control system in
Figure 4.8). Afterward, even if the correct password is provided, that user would not
be able to log in to the system, since the login has been disabled.
Another type of response to intrusions could be enabling a higher level
of auditing, C2 auditing for example.
C2 auditing allows DBAs to meet U.S.
government standards for auditing both unauthorized use of and damage to resources
and data. The recorded information goes beyond server-level events, such as shutdown
or restart and successful and failed login attempts, and extends to successful and
failed use of permissions when accessing individual database objects and executing all
Data Definition, Data Access Control, and Data Manipulation Language statements [77].
Having this information in place facilitates and accelerates attack inspection and
data recovery.
The audit information contains the timestamp, the identifier of the
account that triggered the event, the target server name, the event type, its outcome
(success or failure), the name of the user's application, the server process ID of the
user's connection and the name of the database.
However, the main limitation of this kind of auditing is that it reduces the
performance of SQL Server, because every action is saved to a
file. The second limitation is hard disk space: the audit files grow rapidly and
consume the available disk space. Under C2 audit mode, if SQL Server is not able to
write to the trace file, it shuts down.
We can also use e-mail to send attack
information and notifications to the DBA. Most commercial DBMSs
offer a mail facility such as SQL Mail. SQL Server 2005 introduces Database Mail,
which is SMTP based rather than MAPI based [78]. In the following we highlight the main
features of Database Mail:
Database Mail can be configured with multiple profiles and multiple
SMTP accounts, which can be on several SMTP servers. If one SMTP server
fails, the next available server takes over
the task of sending e-mails.
This increases the reliability of the
mailing system.
Mailing is an external process, so it does not reduce database
performance. This external process is handled by an executable called
DatabaseMail90.Exe located in the MSSQL\Binn directory.
The availability of an auditing facility is a major improvement in
Database Mail. Formerly, DBAs could not verify whether the system
had sent an e-mail. All mail events are now logged, so DBAs can
easily view the mail history. In addition, DBAs can view the errors to
fix SMTP-related issues. There is also the capability to send HTML-formatted e-mail.
We can configure the Responder component to support several types of
responses. The DBA decides which responses need to be defined and assigns each of
them to a specific level of severity. For example, it is reasonable to give the
highest severity to critical cases like a brute force attack or disabling the audit
trace. An appropriate response to such attacks could be restricting or even revoking
the privileges of the login that issued the commands. Less severe events might only be
reported to the DBA by e-mail. However, all these configurations should be made by the
DBA based on the security requirements of the system, or according to security or
database use policies (note the connection from the Policy box to the Responder in
Figure 4.8).
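The severity-based configuration described above can be sketched as follows. This is only a minimal illustration in Python rather than the T-SQL of our prototype; the severity levels, the action names and the `choose_response` function are hypothetical and would in practice be defined by the DBA.

```python
# Hypothetical responses per severity level; the DBA would define the real ones.
RESPONSES = {
    "critical": ["disable_login", "enable_c2_audit", "email_dba"],
    "medium": ["email_dba"],
    "low": ["log_only"],
}

# Hypothetical assignment of alert types to severity levels.
SEVERITY = {
    "brute_force": "critical",
    "audit_trace_disabled": "critical",
    "early_login": "medium",
    "passive_login": "low",
}


def choose_response(alert_type):
    """Return the list of actions configured for an alert type."""
    # Alert types that were never classified fall back to the lowest level.
    severity = SEVERITY.get(alert_type, "low")
    return RESPONSES[severity]
```

Keeping the two mappings separate lets the DBA change the severity of an attack type without touching the responses attached to each level.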
In addition, as mentioned before, our DB-IDS helps to revise the database
security policies. Using the information provided by the Responder,
we can see which policies need revision. The connection from the Responder to the
Policy box in Figure 2.1 indicates this in abstract form.
Overall Design
The final schema of the proposed architecture for DB-IDS is illustrated in
Figure 4.9.
In this chapter we explained the process of designing the DB-IDS.
The components of our architecture are the Data Collector (including
Tracer and Auditor), Transformer, Profiler, Detector (including Anomaly Detector
and Misuse Detector) and Responder. As we can see in Figure 4.9, different
users connect to the database server via different applications.
However, all
transactions reach the database engine. The data needed for system profiling is
collected by the Data Collector component.
System/database use/security policies
specify which subjects and objects are worth profiling. Tracer uses the built-in
tracing facilities of commercial DBMSs. However, we need to develop our own
Auditor sub-component to capture the data that cannot be collected by Tracer.
Auditor is responsible for gathering the data values in DML statements. Since we
develop it ourselves, we can tune it to capture only what we need.
The data collected by Tracer, which is called the Audit Trace, requires
transformation to become understandable for Profiler.
This is the job of
Transformer. The Profiler component derives the characteristics of the normal behavior
of database subjects and objects from the Refined Logs and generates the Profiles.
Profiles reflect the attack-free state of the system and serve as a benchmark for
the Anomaly Detector component.
Generally, the Detector component is responsible for detecting malicious
activities in the database system.
Detector consists of two parts: Anomaly
Detector and Misuse Detector. Anomaly Detector discovers previously unknown
attacks by identifying any activity that deviates significantly from the profiles.
Misuse Detector works based on attack patterns: specific sequences of events in the
database system that form an attack pattern or signature. The attack pattern database
might be populated manually or automatically. The detection process could take place
at different intervals. In real-time or near real-time intrusion detection, Detector
needs to compare recent transactions with the system profiles. Non-real-time intrusion
detection could instead run after working hours, for example. In either case, we need
to guarantee that the profiles are up to date and truly reflect the benign state of
the system. In other words, we should prevent attacks from reaching the profiles and
corrupting them.
Information about database intrusions detected by the Detector component is
reported to the Responder. A set of administrative policies specifies what action
needs to be taken against the detected attacks. Several types of responses could be
defined according to the severity and criticality of the attacks.
For example,
Responder may cooperate with the database access control system to revoke the
authorization of a malicious insider and prevent further damage. The
information provided by Responder could also help us to revise system/database
use/security policies.
In the next chapter, we show the implementation of our DB-IDS prototype. All
components and sub-components are developed using the built-in facilities of MS
SQL Server 2005.
Figure 4.9 Architecture of the DB-IDS
In this chapter we demonstrate the implementation of the DB-IDS prototype.
The aim of the prototype implementation is to show how the
components and sub-components of our DB-IDS could be built and how they
communicate with each other. The prototype is implemented in MS SQL Server
2005. Nonetheless, the architecture could be adapted to other commercial DBMSs.
We have used underlying database objects such as tables, views, stored procedures,
functions and triggers to build the components. Moreover, we have used SQL
Server Agent to create Jobs that repeat the detection process.
The implemented prototype is a simplified model of the DB-IDS. We apply
simple statistical techniques for building the profiles. The Responder component in
this prototype functions as a monitoring module: the DBA could check the information
provided by Responder to take the necessary action. In the following we explain each
component and its associated database objects step by step.
Data Collector
To construct the Data Collector component we use the server trace utility
in MS SQL Server 2005.
We also develop the Auditor sub-component using triggers.
In the following we go into detail about building these two sub-components
of Data Collector.
5.2.1 Tracer and Audit Trace
We are interested in capturing several server events, from which we can later derive
the system profiles.
In Appendix A, we list some of the
important events worth capturing. Most of these events are security related.
However, the DBA might choose a selection of these events according to business
requirements.
Besides choosing the specific events, we can also select which columns are
captured. Generally we are interested in the columns StartTime, EndTime, Duration,
TextData, SPID,
DBUserName, NTUserName, NTDomainName, HostName and ApplicationName,
amongst others.
We can even filter the columns to capture or exclude specific values.
The finer and cleaner the audit trace, the more efficiently Profiler can derive the
profiles. For example, we can set a filter to exclude the events of the tempdb,
ReportServer, mssqlsystemresource and msdb databases, since most of the
rows related to these system databases are immaterial for us. Figure 5.2
illustrates a part of an audit trace. We store the audit trace in a table named
LogRepository. We assume that the data in the LogRepository table is attack-free,
which enables us to derive the profiles from this table.
5.2.2 Transformer
As we said before, the job of Transformer is to convert the raw audit trace into
a format that Profiler can understand. For the purpose of this prototype, we only
intend to derive the transactions of each session in sequential order.
The MakeSession stored procedure extracts the sessions from the LogRepository table
and puts them into the LogRepositorySessions table. This table tells us what has
been done in each session. In Figure 5.1 we can see two sessions in red boxes.
Event classes 14 and 15 indicate login to the system and logout from the system
respectively. We can observe what the database user has done from login
until logout.
Figure 5.1 A Portion of the LogRepositorySessions Table
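The session extraction described above can be sketched as follows. This is a minimal Python illustration of the idea, not the MakeSession stored procedure itself; the function name `make_sessions`, the tuple layout and the sample rows are assumptions. Only the meaning of event classes 14 (login) and 15 (logout) is taken from the text.

```python
def make_sessions(events):
    """Group trace rows into sessions keyed by SPID.

    events: iterable of (spid, event_class, text) tuples in time order.
    Event class 14 opens a session (login) and 15 closes it (logout).
    """
    open_sessions = {}  # spid -> statements seen since the login
    finished = []
    for spid, event_class, text in events:
        if event_class == 14:
            open_sessions[spid] = []
        elif event_class == 15:
            if spid in open_sessions:
                finished.append((spid, open_sessions.pop(spid)))
        elif spid in open_sessions:
            # Any other event class is recorded as part of the open session.
            open_sessions[spid].append(text)
    return finished


# Made-up trace rows for two interleaved sessions.
trace = [
    (51, 14, None),
    (52, 14, None),
    (51, 12, "SELECT * FROM Movies"),
    (52, 12, "DELETE FROM Customers"),
    (51, 15, None),
    (52, 15, None),
]
sessions = make_sessions(trace)
```

Each entry of `sessions` pairs an SPID with the ordered statements issued between its login and logout events.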
We have implemented the Auditor using database triggers. Suppose that we
want to audit the MovieClick.dbo.Customers table. We simply pass the table
name to the MkAdtTbl stored procedure as a parameter.
It then creates the audit table and the
trigger for us.
As a matter of fact, the
audit_*.* tables are considered part of the Refined Logs (Figure 4.9).
The trigger is
responsible for inserting the data values of DML statements (Delete, Insert and
Update) into the audit_MovieClick.dbo.Customers table.
Figure 5.2 Audit Trace Sample
The structure of the audit_*.* tables is similar to that of the underlying table, plus
some extra columns for recording additional information about the transaction,
among them [audit_terminal], [audit_login], [audit_user] and [audit_statement].
These columns tell us the time of the
transaction, the name of the application from which the transaction was issued, the
host name, the login name that issued the transaction, the database user associated
with the login name, the audit statement (Delete, Insert or Update) and the audit value
type (New or Old). If a row is inserted into the table, a copy of the inserted values
is sent to the audit_*.* table. The audit value type for an Insert transaction is New,
indicating that a new row has been inserted into the table. If a Delete command is
issued, a copy of the deleted values is sent to the audit_*.* table.
The audit value type in that case
is Old. For Update transactions, two rows are sent to the audit_*.* table: the old
value as well as the new value. In Figure 5.3 and Figure 5.4 we can see a sample
table with its associated audit table. Having the audit_*.* tables in place enables us
to recover the damaged data if a database attack causes data corruption.
Figure 5.3 A Sample Table (MovieClick.dbo.Movies)
Figure 5.4 A Sample Audit Table (Audit_MovieClick_Movies)
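The trigger-based auditing described above can be sketched with SQLite through Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the prototype uses T-SQL triggers in SQL Server 2005, and the table layout, the trigger names and the `audit_value_type` and `audit_timestamp` column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movies (id INTEGER PRIMARY KEY, title TEXT);
-- Audit table: the columns of the underlying table plus audit columns.
CREATE TABLE audit_Movies (
    id INTEGER, title TEXT,
    audit_statement TEXT, audit_value_type TEXT,
    audit_timestamp TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TRIGGER trg_ins AFTER INSERT ON Movies BEGIN
    INSERT INTO audit_Movies (id, title, audit_statement, audit_value_type)
    VALUES (NEW.id, NEW.title, 'Insert', 'New');
END;
CREATE TRIGGER trg_del AFTER DELETE ON Movies BEGIN
    INSERT INTO audit_Movies (id, title, audit_statement, audit_value_type)
    VALUES (OLD.id, OLD.title, 'Delete', 'Old');
END;
-- An update stores two rows: the old value and the new value.
CREATE TRIGGER trg_upd AFTER UPDATE ON Movies BEGIN
    INSERT INTO audit_Movies (id, title, audit_statement, audit_value_type)
    VALUES (OLD.id, OLD.title, 'Update', 'Old');
    INSERT INTO audit_Movies (id, title, audit_statement, audit_value_type)
    VALUES (NEW.id, NEW.title, 'Update', 'New');
END;
""")

conn.execute("INSERT INTO Movies VALUES (1, 'Alien')")
conn.execute("UPDATE Movies SET title = 'Aliens' WHERE id = 1")
conn.execute("DELETE FROM Movies WHERE id = 1")
audit_rows = conn.execute(
    "SELECT title, audit_statement, audit_value_type "
    "FROM audit_Movies ORDER BY rowid"
).fetchall()
```

After the three statements, `audit_rows` holds one row for the insert, two for the update and one for the delete, which is the information needed to reconstruct earlier values of the row.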
Another table which is a part of the Refined Logs is AuditUni. This table is
actually the union of the audit_*.* tables, but without the data values.
Figure 5.5 A Cut of the AuditUni Table
The AuditUni table tells us who issued which type of statement on which table of
which database, and when, using which application and from which host. This table
plays an important role in building user profiles: it tells us which type of
statements each user mostly issues on which tables.
Profiler and Profiles
In this section we introduce several profiles of the system. These profiles are
mainly built from the LogRepository table. In the following, we explain each
profile in detail and show how it reflects the normal behavior of the system.
However, the profiles are not limited to what we mention here. In different systems,
different profiles could be established depending on the characteristics and
requirements of the system.
5.4.1 Subject Profiles
Subject profiles are supposed to reflect the normal behavior of database users
and roles. In the following we introduce several subject profiles. These profiles are
grouped on a daily basis. However, they could be grouped on a weekly or monthly
basis, depending on the business requirements.
GnrlLgnPrfl (General Log Profile) is simply a view on the
LogRepository table. It tells us how many times each login has logged in to the
system from each distinct host and using each distinct application. For instance, as
we can see in Figure 5.6, sa has logged in to the system 4 times from the SHARAGIM
host via the SQLCMD command line tool.
Figure 5.6 General Log Profile
NumberOfLoginsPerDay (Number of Logins per Day) is another view on
the LogRepository table. It specifies how many times each user has logged in to
the system per day. The average number of logins (or any other metric reflecting the
behavior of a database user regarding the number of logins per day) could be derived
from this profile. The appropriate anomaly detection sub-component can then check
whether the number of logins for a specific user within a day conforms to the expected
value.
Figure 5.7 NumberOfLoginsPerDay Table
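A view of this kind can be sketched as follows, again with SQLite through Python's `sqlite3` module instead of SQL Server. The reduced three-column LogRepository layout and the sample rows are assumptions made for the illustration; the real trace table has more columns, and the view in the prototype may be defined differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LogRepository (LoginName TEXT, EventClass INTEGER, StartTime TEXT);
CREATE VIEW NumberOfLoginsPerDay AS
    SELECT LoginName, date(StartTime) AS Day, COUNT(*) AS Logins
    FROM LogRepository
    WHERE EventClass = 14          -- 14 = login event
    GROUP BY LoginName, date(StartTime);
""")

# Made-up trace rows; only login events (class 14) are counted by the view.
conn.executemany("INSERT INTO LogRepository VALUES (?, ?, ?)", [
    ("sa", 14, "2009-02-01 08:05:00"),
    ("sa", 15, "2009-02-01 09:00:00"),
    ("sa", 14, "2009-02-01 10:30:00"),
    ("to", 14, "2009-02-01 08:45:00"),
    ("sa", 14, "2009-02-02 08:10:00"),
])
profile = conn.execute(
    "SELECT LoginName, Day, Logins FROM NumberOfLoginsPerDay "
    "ORDER BY LoginName, Day"
).fetchall()
```

An average or another expected value per login can then be computed over the `Logins` column and compared with the count of the current day.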
Besides the number of logins per day, the total amount of time each user has
been logged in to the system is also important for us. The TtlLgnTm (Total Login
Time) stored procedure returns a table indicating the total login time for each user
per day. As we can see in Figure 5.8, for instance, sa was logged in to the
system for 2 hours on February 1 2009.
Figure 5.8 TtlLgnTm Table
If the total time a user is logged in to the system within a day
is considerably more or less than the expected value, it could be regarded as an
anomaly. An appropriate anomaly detector sub-component might be assigned to detect
that kind of anomaly.
NumberOfLoginsPerDayForDBRoles (Number of Logins per Day for
Database Roles) is a stored procedure that returns a table specifying how many
times in total the members of each database role have logged in to the system per day.
This helps us to model the behavior of roles, which are considered a type of subject.
Figure 5.9 NumberOfLoginsPerDayForDBRoles Table
A database role name is unique only within its database. This is why we can
see db_datawriter in both the MyProject and AdventureWorks databases. The table
tells us, for example, that the members of the db_ddladmin database role of the
MyProject database logged in to the system 12 times in total on February 1 2009, and
so on.
NumberOfLoginsPerDayForServerRoles (Number of Logins per Day for
Server Roles) is similar to the previous profile. The only difference is that it returns
the total number of logins of the members of each server role per day. Server roles
are another type of role in SQL Server 2005, defined at the level of the
server. A sample profile is illustrated in Figure 5.10.
Figure 5.10 NumberOfLoginsPerDayForServerRoles Table
The number of select, insert, delete and update statements a user issues is
an important part of his/her behavior in the database system. Based on
these values, we can monitor the number of statements of each type the user is
currently issuing and raise an alert if the number of statements of a specific type
does not conform to the normal behavior. In the following we describe several profiles
pertaining to the number of statements.
UserStmntCounter (User Statement Counter) is a view on the
LogRepository table.
It tells us how many insert, delete, update and select
statements each user has issued per day. As shown in Figure 5.11, for example, sa
issued a total of 467 select statements to the database system on September 9 2008.
Figure 5.11 UserStmntCounter View
UserStmntCounterDBLevel (User Statement Counter Database Level) is
similar to the previous profile, but it goes down to the database level. This profile
tells us how many insert, delete, update and select statements each user has issued on
each database per day. We can see in Figure 5.12 that sa issued 3 and 464 select
statements on the “a” and MyProject databases respectively on September 9 2008.
Figure 5.12 UserStmntCounterDBLevel View
UserStmntCounterTableLevel (User Statement Counter Table Level) is
similar to the two previous profiles, but goes down to the table level. It shows us how
many Insert, Delete, Update and Select statements each user has issued on which tables
per day. As illustrated in Figure 5.13, for example, on September 9 2008 sa
issued 2 and 1 select statements on the “c” and “b” tables of the “a” database
respectively, and so on.
Figure 5.13 UserStmntCounterTableLevel View
UserDDLCounter (User DDL Counter) is similar to the UserStmntCounter
view, but instead of DML statements it returns the number of DDL events each user
has issued per day.
DDL statements include Create, Drop and Alter. These
statements affect databases, tables, procedures, triggers, views, logins, users and
roles.
Figure 5.14 UserDDLCounter View
5.4.2 Object Profiles
In previous chapters we explained why subject profiling alone is not
sufficient for detecting anomalies in database systems. In this section we
introduce some securable profiles reflecting the manner in which the objects are being
treated. Securables are the database objects to which we can control access and on
which we can grant permissions to principals. SQL Server 2005 distinguishes between
three scopes at which different objects can be secured: server scope, database scope
and schema scope [78]. Among those, and for the purpose of this prototype, we are
interested in databases (server scope) and tables (schema scope).
The DBStmntCounter (Database Statement Counter) profile specifies how
many insert, delete, update and select statements have been issued on each database
per day, no matter which user has issued them.
For example, if the corresponding
anomaly detector sub-component finds out that the number of delete statements
within the day considerably exceeds the expected value, it could report this
anomaly to the Responder component.
Figure 5.15 DBStmntCounter View
The TableStmntCounter (Table Statement Counter) profile specifies how many Insert,
Delete, Update and Select statements have been issued on each table per day. This
profile is similar to the previous one, but goes down to the table level. As we can
see in Figure 5.16, 17 Select statements were issued on the IQTable on
September 2 2009. Note that in the DBStmntCounter and TableStmntCounter profiles, we
have considered the number of statements, but not the affected rows. However,
appropriate profiles could be built to address the number of affected rows as well.
Figure 5.16 TableStmntCounter View
The DMLSeq_*_* tables show us in what order Insert, Delete, Update and
Select statements have been issued on the underlying table. Not only the number of
statements of each type issued to the table per day is important for us, but sometimes
also the order of those statements. For example, in Figure 5.17 we
can see the order and the number of statements on the test.test2 table on March 12
Figure 5.17 A Sample DMLSeq Table (DMLSeq_test_test2)
Using data mining techniques like sequence analysis, we can derive more
meaningful profiles from the DMLSeq tables. Sequence analysis is used to find patterns
in a discrete series. A sequence is composed of a series of discrete values or states
[75] (statements, in our example). We can discover the more frequent sequences of
statements on the table, which to some extent reflect the manner in which the table is
being treated.
The DMLSeq_*_* tables are associated with the corresponding underlying tables.
However, the approach could be extended to databases: we might be interested in
discovering the frequent sequences of statements issued on each database as well.
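As a rough illustration of how frequent statement sequences could be derived from a DMLSeq table, the sketch below counts runs of consecutive statements (n-grams). This is a deliberately simple stand-in for the sequence analysis techniques mentioned above, not the data mining method itself; the function name `frequent_sequences` and the sample statement order are assumptions.

```python
from collections import Counter


def frequent_sequences(statements, n=2):
    """Count every run of n consecutive statements (n-grams)."""
    grams = zip(*(statements[i:] for i in range(n)))
    return Counter(grams)


# Made-up statement order for one table on one day.
dml_seq = ["Select", "Insert", "Select", "Update",
           "Select", "Insert", "Select", "Delete"]
counts = frequent_sequences(dml_seq, n=2)
```

Calling `counts.most_common(3)` then lists the three most frequent pairs, and a larger `n` yields longer patterns at the cost of sparser counts.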
In Table 5.1, we summarize the profiles introduced in this section.
Table 5.1 Profiles Specification
Profile Alias | Profile Name | Description
General Log | GnrlLgnPrfl | Specifies how many times each login has logged in to the system from distinct host and using distinct application.
Number of Logins per Day | NumberOfLoginsPerDay | Specifies how many times each user has logged in to the system per day.
Total Login Time | TtlLgnTm | Specifies the total login time for each user per day.
Number of Logins per Day for Database Roles | NumberOfLoginsPerDayForDBRoles | Specifies how many times the members of each database role have logged in to the system in total per day.
Number of Logins per Day for Server Roles | NumberOfLoginsPerDayForServerRoles | Specifies how many times the members of each server role have logged in to the system in total per day.
User Statement Counter | UserStmntCounter | Specifies how many insert, delete, update and select statements each user has issued per day.
User Statement Counter Database Level | UserStmntCounterDBLevel | Specifies how many insert, delete, update and select statements each user has issued on each database per day.
User Statement Counter Table Level | UserStmntCounterTableLevel | Specifies how many Insert, Delete, Update and Select statements each user has issued on which tables per day.
User DDL Counter | UserDDLCounter | Specifies how many Create, Drop and Alter statements each user has issued affecting database, table, procedure, trigger, view, login, user and role per day.
Database Statement Counter | DBStmntCounter | Specifies how many Insert, Delete, Update and Select statements have been issued to each database per day.
Table Statement Counter | TableStmntCounter | Specifies how many Insert, Delete, Update and Select statements have been issued to each table per day.
- | DMLSeq_*_* | Specifies in what order Insert, Delete, Update and Select statements have been issued on the underlying table.
Generally, intrusion detection is divided into misuse detection and anomaly
detection. In the following sections we explain the implementation of each of
these sub-components.
5.5.1 Anomaly Detector
In this section we introduce the anomaly detector sub-components developed
for our prototype. These sub-components are mainly implemented using SQL stored
procedures. Each procedure is responsible for detecting a specific type of anomaly.
This means that we must first decide which anomaly we intend to address. Then the
appropriate anomaly detector sub-component can be developed.
The procedures we explain in this section work on the profiles
introduced in the previous section. When a procedure detects an anomaly, it
sends an alert to the Responder component. These alerts are stored in a table named
Alerts. Additional information such as the host name, application name, login name,
SPID, timestamp and the name of the procedure generating the alert is stored as well.
FindSusLogin (Find Suspicious Login)
We assume that each database user logs in to the system from the host and
with the application (s)he has used before. In the GnrlLgnPrfl view, we keep
the host names and application names from which each user has connected to the
database server so far. This helps us to discover whether the user is connecting to
the database server from a usual host and application or not.
For example, in Figure 5.6 we can see that the login “to” has so far used the
Microsoft SQL Server Management Studio and OSQL-32 applications to log in to
the system from the RAHANA host. So if we discover that “to” is logging in to the
system from the SHARAGIM host, it is considered an anomaly, since it is possible
that the account of “to” has been stolen and an attacker is connecting to the database
from a suspicious host. FindSusLogin is responsible for detecting that kind of anomaly.
The generated alert looks like the one in Figure 5.18.
Figure 5.18 An Alert Generated by FindSusLogin
This alert indicates that the “to” login has connected to the database server from
the SQLCMD command line tool.
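The check performed by FindSusLogin can be sketched as a lookup against the known (host, application) pairs of a login. The Python function `find_suspicious_login`, the dictionary layout of the profile and the fields of the alert are hypothetical; in the prototype the profile is the GnrlLgnPrfl view and the check is a stored procedure.

```python
def find_suspicious_login(profile, login, host, app):
    """Return an alert dict if (host, app) was never seen for this login."""
    known = profile.get(login, set())
    if (host, app) in known:
        return None
    return {"procedure": "FindSusLogin", "login": login, "host": host, "app": app}


# Profile in the spirit of GnrlLgnPrfl: login -> set of (host, application).
profile = {
    "to": {
        ("RAHANA", "Microsoft SQL Server Management Studio"),
        ("RAHANA", "OSQL-32"),
    },
}
alert = find_suspicious_login(profile, "to", "SHARAGIM", "SQLCMD")
```

Because the pair is checked as a whole, a known application used from an unknown host is flagged as well. A login that does not appear in the profile at all is always flagged, which may or may not be the desired behavior for newly created logins.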
ErlrLgnTimeDtctr (Earlier Login Time Detector) and LtrLgnTimeDtctr
(Later Login Time Detector)
We assume that each user is allowed to log in to the system only within an
allowable range of time, which could be thought of as the working hours. We keep these
ranges in a table named UserWorkingHour (Figure 5.19).
Figure 5.19 UserWorkingHour Table
Once a login is created, the login name is automatically inserted into the
UserWorkingHour table with the default values of 8 AM and 4 PM as the start time
and end time. It means that the user is only allowed to log in to the system within
the specified range of time. However, these values can be modified.
Now, if a user logs in to the system before or after the allowable time, it might
be considered an anomaly. The ErlrLgnTimeDtctr and LtrLgnTimeDtctr procedures
are responsible for detecting such anomalies.
Figure 5.20 An Alert Generated by ErlrLgnTimeDtctr
The alert raised by these procedures is illustrated in Figure 5.20.
As we can see in the figure, the “to” login logged in to the system at 7:25 AM, which
is before the allowable time.
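The two login time checks can be sketched together as follows. The dictionary `user_working_hour` stands in for the UserWorkingHour table and `check_login_time` is a hypothetical helper; only the default range of 8 AM to 4 PM is taken from the text.

```python
from datetime import time

# login -> (start, end) of the allowable login period.
user_working_hour = {"to": (time(8, 0), time(16, 0))}


def check_login_time(login, login_time):
    """Return 'early', 'late' or None for a login at the given time of day."""
    # Logins without an entry get the default range stated in the text.
    start, end = user_working_hour.get(login, (time(8, 0), time(16, 0)))
    if login_time < start:
        return "early"  # the case ErlrLgnTimeDtctr is meant to flag
    if login_time > end:
        return "late"   # the case LtrLgnTimeDtctr is meant to flag
    return None
```

With the example of Figure 5.20, `check_login_time("to", time(7, 25))` reports an early login.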
ExceededNumOfLgns (Exceeding Number of Logins)
The number of logins of each user per day is also important for us. We
assume that users are not allowed to log in to the system more than an allowable
number of times. Once a user is created, the login name is automatically inserted into
a table named MaxNoLogins with a default maximum number of logins of
10. This means that each user can log in to the system no more than 10 times per day.
This value can be modified.
Figure 5.21 MaxNoLogins Table
If “to”, for example, connects to the database more than 10 times within a
day, it could be considered an anomaly. The ExceededNumOfLgns procedure is
responsible for detecting such anomalies. It would raise an alert like the one in
Figure 5.22 in the Alerts table.
Figure 5.22 An Alert Generated by ExceededNumOfLgns
FndPssvLgns (Find Passive Logins)
This procedure identifies those database users who have not logged in to the
system by the end of a working day. It could help the DBA to find the passive
logins and consider dropping them if necessary. The alert generated by this
procedure lists the passive login names at the end of the day.
The FndPssvLgns procedure runs once a day.
Figure 5.23 An Alert Generated by FndPssvLgns
This sub-component is one of those that could help us to revise the database
security policies.
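In essence, finding the passive logins is a set difference between all defined logins and the logins seen during the day. The short Python sketch below shows that idea only; the function name and the sample login names are assumptions, and the prototype performs this step in a stored procedure.

```python
def find_passive_logins(all_logins, todays_logins):
    """Return the logins that never appear in today's login events."""
    return sorted(set(all_logins) - set(todays_logins))


passive = find_passive_logins(["sa", "to", "Bob"], ["sa", "sa", "to"])
```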
5.5.2 Misuse Detector
In this section we introduce the stored procedures responsible for detecting
database misuse.
The anomaly detector sub-components
work by comparing the current state with the profiles. Misuse
detection sub-components, in contrast, work based on attack or misuse patterns. In this
section we explain the procedures developed as the misuse detector sub-components
of our prototype.
LgnMorThnOne (Login More than one time)
Since we have assumed that each login is associated with one real
database user, each user is logically supposed to be connected to the database
only once at a time. So if we discover that one database user has logged in to the
system more than once at the same time, it is considered a database misuse.
The LgnMorThnOne procedure is responsible for detecting such misuse.
Figure 5.24 An Alert Generated by LgnMorThnOne
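The misuse pattern described above can be sketched by keeping a count of open sessions per login. The function `find_concurrent_logins` and the event tuples are hypothetical; event classes 14 and 15 denote login and logout as in the audit trace.

```python
def find_concurrent_logins(events):
    """Return logins that open a second session before closing the first.

    events: iterable of (login, event_class) in time order;
    14 = login, 15 = logout.
    """
    open_count = {}
    flagged = []
    for login, event_class in events:
        if event_class == 14:
            open_count[login] = open_count.get(login, 0) + 1
            if open_count[login] > 1 and login not in flagged:
                flagged.append(login)
        elif event_class == 15 and open_count.get(login, 0) > 0:
            open_count[login] -= 1
    return flagged


# Made-up events: "sa" logs out before logging in again, "to" does not.
events = [("to", 14), ("sa", 14), ("sa", 15), ("sa", 14), ("to", 14)]
flagged = find_concurrent_logins(events)
```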
BrtFrcDtctr (Brute Force Detector)
The BrtFrcDtctr procedure is responsible for detecting brute force attacks. If a
number of failed logins occur within a period of time, a brute
force attack is probable. Both the threshold (number of failed logins) and the time
window within which we want to count the failed logins can be set. The setting depends
on the power of the brute force tools: the more powerful the tools, the smaller
the threshold needs to be.
Figure 5.25 An Alert Generated by BrtFrcDtctr
As we can see in Figure 5.25, 60 failed logins happened within the 2
minutes before this alert was raised. It is likely that a brute force tool is trying to
guess the password of the Bob login. When it detects a brute force attack, the
BrtFrcDtctr procedure disables the corresponding login.
From then on, even if the correct
password is provided, the login cannot log in to the system. To some extent, this
could be interpreted as intrusion prevention, since it stops the intruder from
further attacks.
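The threshold check at the heart of brute force detection can be sketched as follows. The Python function `detect_brute_force` and its default values are assumptions chosen to mirror the example of Figure 5.25 (60 failed logins within 2 minutes); the prototype implements the check as a stored procedure with a threshold set by the DBA, and disabling the login is left out of the sketch.

```python
from datetime import datetime, timedelta


def detect_brute_force(failed_logins, now,
                       window=timedelta(minutes=2), threshold=60):
    """Return logins with at least `threshold` failed attempts in the window.

    failed_logins: iterable of (login, timestamp) for failed login events.
    """
    counts = {}
    for login, ts in failed_logins:
        if now - window <= ts <= now:
            counts[login] = counts.get(login, 0) + 1
    return sorted(login for login, n in counts.items() if n >= threshold)


# Made-up data: one failed attempt per second on "Bob" for one minute.
now = datetime(2009, 2, 1, 10, 0, 0)
failures = [("Bob", now - timedelta(seconds=i)) for i in range(60)]
failures.append(("to", now - timedelta(minutes=1)))
suspects = detect_brute_force(failures, now)
```

A single mistyped password, like the one for “to” in the sample data, stays well below the threshold and is not reported.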
Intrusion Detection Cycle
In this section we briefly explain the mechanism for running the detector
sub-components.
As mentioned before, we use SQL Server Agent Jobs
to repeat the detection process: we define a Job and schedule it to
run every 2 minutes. The logs of each 2-minute interval are stored in a temporary table
named CurrentLogRepository.
This enables us to execute the detector
procedures on the logs of the last 2 minutes. Therefore, we are able to detect any
anomalous activity at most 2 minutes after it occurs. If no anomalous activity is
detected, the content of the CurrentLogRepository table can be flushed into the
LogRepository table. The cycle of intrusion detection is illustrated in Figure 5.26.
The interval of the intrusion detection process depends on the complexity of the
detector procedures. In this prototype, a 2-minute interval is sufficient to
run all the procedures.
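[Flowchart: every 2 minutes, flush the trace file content of the last 2 minutes into CurrentLogRepository and launch the new trace; execute the detector sub-components on CurrentLogRepository; if any intrusion is detected, send an alert to the Responder; otherwise flush CurrentLogRepository into the LogRepository table; empty the CurrentLogRepository table.]
Figure 5.26 Intrusion Detection Cycle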
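One pass of the detection cycle can be sketched as a single function. This is an illustration under assumptions: Python lists stand in for the CurrentLogRepository, LogRepository and Alerts tables, the detectors are plain callables, and the 2-minute scheduling done by SQL Server Agent is left out.

```python
def detection_cycle(current_log, log_repository, detectors, alerts):
    """Run one detection pass over the rows captured since the last pass."""
    found = []
    for detector in detectors:
        found.extend(detector(current_log))
    if found:
        alerts.extend(found)                # send the alerts to the Responder
    else:
        log_repository.extend(current_log)  # only clean rows feed the profiles
    current_log.clear()                     # empty the temporary table
    return found
```

Rows from an interval in which an intrusion was detected are not copied into the long-term log, which keeps the data used for profiling attack-free as assumed earlier.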
In this chapter we walked through the implementation phases of the database
intrusion detection prototype. We used server-side tracing in SQL Server 2005
to capture the data necessary for profiling. Moreover, we developed several triggers
to capture the data values in DML statements. The data collected by Tracer needs to be
converted to a format that Profiler can understand. Profiler derives the profiles from
the data provided by Tracer and Auditor (which are called Refined Logs). We then
introduced some subject and object profiles. However,
profiles are not limited to those addressed in this chapter: using other methods,
different types of profiles could be derived from the Refined Logs. The Detector
component is responsible for detecting database anomalies and misuses. The anomaly
detector compares the current state of the database with the profiles to find any
anomalous activity. By current state we mean the log files of the last n minutes; in
other words, the state of the system is reflected in the most recent log files. The
anomaly detector raises an alert and sends it to the Responder component in case of any
suspicious activity. However, Responder in this prototype only functions as a
repository for holding the alerts. The misuse detector, on the other hand, looks for
database misuses in the most recent log files and, like the anomaly detector, raises an
alert if it discovers any database misuse. The misuse detector works based on database
attack patterns. We defined some simple attack patterns to show how the mechanism
works. Moreover, we developed another misuse detector sub-component that addresses
brute force attacks.
The detection process repeats every 2 minutes, which means we are able to detect
database intrusions within 2 minutes. However, the only response defined in our
prototype is simply disabling the login of the intrusion perpetrator.
The next chapter presents the conclusion of this project as well as several
recommendations to enhance the implemented model. In addition, we will discuss
future work related to this project.
Databases have become increasingly vulnerable to attacks. Generally, there
are two approaches for handling the attacks. The first is to strengthen the systems
with security controls such as cryptographic techniques and sophisticated authentication
methods in order to prevent the subversion itself.
However, in practice these
preventive measures usually fail to stop the attacks. Hence, the second approach, detecting the attacks, is
considered one of the most critical steps in handling security breaches. Obviously,
undetected attacks could cause further damage to the valuable information of
organizations. According to [79], it usually takes the average attacker less than 10
seconds to hack in and out of a database, hardly enough time for the database
administrator to even notice the intruder. It is therefore no surprise that many database
attacks go unnoticed by organizations until long after the data has been
compromised. The benefit of detecting database breaches is straightforward:
it helps us to identify database vulnerabilities and come up with solutions to stop
future attacks.
Nowadays, enterprise database infrastructures, which often contain the crown
jewels of an organization, are subject to a wide range of attacks. Therefore, amongst
the different types of intrusion detection systems (such as network-based, host-based and
application-based IDS), DB-IDS, which is considered a type of application-based
IDS, has become a matter of increasing concern.
The key aim of this project was to propose an architectural design for DB-IDS.
The proposed architecture is a comprehensive platform on which
different practical models could be implemented. These IDSs might be adapted for
commercial and open source DBMSs.
Additionally, the configuration of DB-IDS may vary according to the database
specification and business requirements.
In either case, the proposed DB-IDS architecture could be used to develop a customized
database intrusion detection system.
In chapter one, we first briefly discussed the challenges of the security world
and specifically of database systems. Furthermore, we explained how important it is
for enterprises to equip their databases with security controls, since DBMSs represent
the ultimate layer in preventing malicious data access or corruption [49].
Afterward, the problem statements of this project were presented. We then
defined the aim, objectives and scope of this project. In the following sections we
discuss how we reached these objectives.
Chapter two was entirely dedicated to a comprehensive literature review
of intrusion detection systems. We followed a top-down approach in which
we first presented the taxonomy of intrusion detection systems. Next, intrusion
detection systems using data mining techniques were discussed. Finally, we discussed
the different DB-IDSs proposed in the literature, followed by those leveraging data
mining techniques.
Approaches to anomaly and misuse detection for
databases were then studied. This comprehensive study of DB-IDSs helped us to
identify the advantages and disadvantages of different systems and come up with a sound
architecture for a hybrid DB-IDS.
In chapter three we presented the methodology followed in this
project. Our methodology included analysis, design, development, implementation
and testing phases. In chapter four (analysis and design of the DB-IDS architecture)
we first briefly went through the studies similar to our approach for DB-IDS, and
then designed the different components of our system.
Based on the architectural design for DB-IDS, in chapter five we developed a
model to demonstrate how it could be implemented in a mock database system. The
prototype was implemented in MS SQL Server 2005. All components and
sub-components were built using the SQL language and basic means such as procedures,
triggers, tables and views. We also utilized the Job capability of SQL Server Agent
to iterate the detection process.
The results of the components and sub-components of our DB-IDS were
presented along with an explanation of each component. We provided several
snapshots to show the mechanism of our DB-IDS tested on a mock database system
in MS SQL Server 2005.
Future Works and Recommendations
In the previous chapters we mentioned several points for enhancing
the capabilities and accuracy of the database intrusion detection system. These
recommendations range from applying more comprehensive profiling techniques
to improving the Responder component.
The data collection method
introduced in this project works properly for small and medium-sized databases.
However, for a large database system with hundreds or thousands of
transactions per second, using triggers and server-side tracing could considerably
affect the performance of the system.
Other data collection methods, such as
third-party applications, could be considered to enhance scalability.
A comprehensive study of profiling techniques (deriving the profiles
from Refined Logs) could be considered as further future work. For the purpose of
this project, and to show how the subject profiling mechanism works, we mostly
focused on the login behavior of users, such as the number of logins per day, the total
login time per day, login times and so on. However, the normal behavior of
database users could be modeled in different ways using different methods.
In this section we show how the objectives of this project were
achieved. For each objective discussed in section 1.5, we explain how it
was achieved.
Proposing an architectural design for hybrid DB-IDS
An architectural design for a hybrid DB-IDS was the core aim of this project.
The overall schema of the proposed architecture is illustrated in Figure 4.9. In
chapter 2, we studied different DB-IDSs and examined their advantages and
disadvantages. Then, we came up with our architecture, which is intended to
encompass the advantages of the other systems.
One of the important features of our proposed architecture is that its components can be
segregated. For example, the Refined Logs might be provided by any data
collection mechanism.
We are even able to utilize third-party applications to
efficiently gather the data to be fed into the Profiler component. This is important
because the efficiency of data collection methods might vary between DBMSs.
In one system we may prefer to apply built-in means for data collection, while in
another system we might have to use third-party tools to do so.
The same holds for almost all the components. For example, we can use any
technique and tool to derive the profiles from Refined Logs. As mentioned before,
one of the well-known methods for profiling in anomaly detection systems is data
mining. We may either apply the built-in data mining algorithms of the DBMS, if it
supports them, or use third-party data mining applications. All we
need to make sure is that the data flow between the different components and
sub-components is appropriately established, and that the data structure of the inputs and
outputs of each component is in a format that the other components can understand.
Proposing a database anomaly detection model
The Anomaly Detector component is responsible for comparing the present state
of the system with pre-established profiles. We first presented the
design principles of this component. Later, in section 5.5.1, we implemented a sample
anomaly detector that works based on securable and principal profiles.
Constructing sample profiles for a database system
In section 5.4 we constructed some sample profiles for database elements
such as database users and roles (as principals) and tables (as securables) to
demonstrate what a profile looks like.
Although these profiles might seem
straightforward, they reflect simple yet important aspects of the normal
behavior of the system elements. Furthermore, more comprehensive profiles could be
derived from these simple profiles. The principal profiles presented in section 5.4 mostly
reflect the login behavior of database users. Measures such as the number of logins
per day, the total time logged in to the system, and the host and application
through which the user connected are the basic aspects on which the
user profiles are established.
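To make these measures concrete, the following minimal Python sketch derives a principal profile from login records and compares a new observation against it. It is an illustration only and is not the profiling code of the prototype. The names PrincipalProfile, build_profiles and deviations, as well as the record layout, are hypothetical, and total login time is left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date, datetime
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical record layout: (login name, login time, host, application).
LoginRecord = Tuple[str, datetime, str, str]


@dataclass
class PrincipalProfile:
    max_logins_per_day: int = 0
    hosts: Set[str] = field(default_factory=set)
    applications: Set[str] = field(default_factory=set)


def build_profiles(records: Iterable[LoginRecord]) -> Dict[str, PrincipalProfile]:
    """Derive one profile per login from historical login records."""
    profiles: Dict[str, PrincipalProfile] = defaultdict(PrincipalProfile)
    per_day: Dict[Tuple[str, date], int] = defaultdict(int)
    for login, when, host, app in records:
        profiles[login].hosts.add(host)
        profiles[login].applications.add(app)
        per_day[(login, when.date())] += 1
    for (login, _day), count in per_day.items():
        profile = profiles[login]
        profile.max_logins_per_day = max(profile.max_logins_per_day, count)
    return dict(profiles)


def deviations(profile: PrincipalProfile, logins_today: int,
               host: str, app: str) -> List[str]:
    """Name every measure on which the observation departs from the profile."""
    found = []
    if logins_today > profile.max_logins_per_day:
        found.append("logins_per_day")
    if host not in profile.hosts:
        found.append("host")
    if app not in profile.applications:
        found.append("application")
    return found
```

An empty result from deviations means the observation is consistent with the profile; any non-empty result would be a candidate for an alert.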
For securable profiling, we focused on the number and sequence of DML
statements on the databases and tables. We believe that securable profiling
alongside principal profiling could, at least in theory, enhance the accuracy and
quality of the detection mechanism. In section 4.3, we also described
database attack scenarios in which principal profiling alone could not
detect the attacks.
Developing a database audit system
We developed our own database auditing system (Auditor) to capture specific
data that cannot be collected by Tracer. Auditor is responsible for capturing the
values in DML statements and storing them in another table. The implementation
details are presented in section 5.3. Independently of DB-IDS, this component
could be utilized in any database system for auditing purposes.
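The idea of capturing DML values with triggers can be illustrated with a small self-contained sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so the trigger syntax differs from the prototype (SQLite exposes NEW and OLD row references, whereas SQL Server triggers read the inserted and deleted tables). The table names salary and audit_log and the function create_audited_table are hypothetical.

```python
import sqlite3


def create_audited_table(conn: sqlite3.Connection) -> None:
    """Create a sample table plus triggers that copy DML values into audit_log."""
    conn.executescript("""
        CREATE TABLE salary (emp_id INTEGER PRIMARY KEY, amount INTEGER);
        CREATE TABLE audit_log (
            table_name TEXT,
            operation  TEXT,
            emp_id     INTEGER,
            old_amount INTEGER,
            new_amount INTEGER,
            logged_at  TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- One trigger per DML statement type on the audited table.
        CREATE TRIGGER salary_ins AFTER INSERT ON salary BEGIN
            INSERT INTO audit_log (table_name, operation, emp_id, old_amount, new_amount)
            VALUES ('salary', 'INSERT', NEW.emp_id, NULL, NEW.amount);
        END;
        CREATE TRIGGER salary_upd AFTER UPDATE ON salary BEGIN
            INSERT INTO audit_log (table_name, operation, emp_id, old_amount, new_amount)
            VALUES ('salary', 'UPDATE', NEW.emp_id, OLD.amount, NEW.amount);
        END;
        CREATE TRIGGER salary_del AFTER DELETE ON salary BEGIN
            INSERT INTO audit_log (table_name, operation, emp_id, old_amount, new_amount)
            VALUES ('salary', 'DELETE', OLD.emp_id, OLD.amount, NULL);
        END;
    """)
```

After any INSERT, UPDATE or DELETE on the sample table, the old and new values are available in audit_log for later profiling or forensic review.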
Proposing a brute-force detection model for the database systems
Multiple failed logins often indicate brute-force or enumeration attempts, or
an attack in progress. One of the first challenges for an attacker is to
penetrate the authentication of a database, that is, either to authenticate as a
legitimate existing user or to bypass the authentication process in order to access the
contents of the database. One technique an attacker can use to penetrate
the database is to brute-force the database authentication. This can be used for
several purposes, such as:
a) Guessing a user's password using an automated attack tool
b) Enumeration of usernames in order to validate the existence of an
From the security point of view, one way to identify such an attack
attempt is to monitor failed login attempts. Multiple failed login attempts during
a short period of time very likely indicate an attack in progress. We
developed a procedure named BrtFrcDtctr, which is responsible for detecting
brute-force attacks against the database system.
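The monitoring of failed logins over a short period can be sketched as a sliding-window count. The Python code below is an illustration under stated assumptions and is not the BrtFrcDtctr procedure itself: the function name detect_brute_force, the threshold of 5 failures and the 2-minute window are hypothetical choices.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, List, Tuple


def detect_brute_force(failed_logins: Iterable[Tuple[str, datetime]],
                       threshold: int = 5,
                       window: timedelta = timedelta(minutes=2)) -> List[str]:
    """Return logins with at least `threshold` failed attempts within `window`."""
    recent: Dict[str, Deque[datetime]] = defaultdict(deque)
    flagged: List[str] = []
    for login, when in sorted(failed_logins, key=lambda event: event[1]):
        attempts = recent[login]
        attempts.append(when)
        # Drop attempts that fell out of the sliding window.
        while when - attempts[0] > window:
            attempts.popleft()
        if len(attempts) >= threshold and login not in flagged:
            flagged.append(login)
    return flagged
```

Grouping the failures by client host instead of by login name would adapt the same idea to username enumeration, where many different names are tried from one source.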
Proposing a model for database security policy revision
Using the information provided by the Responder component, we are
able to revise the current database security policies. For example, we may notice that
a user usually logs in to the system around 10 AM, although (s)he is allowed to
log in from 9 AM. In such a case, the DBA could reconsider the start of
the working hours of that user and change it from 9 AM to 10 AM. Then, if
someday the account information of that user is stolen and the intruder tries to log in
to the system earlier than 10 AM, an alert would be sent to the DBA indicating a
misuse of the system. Although such a scenario might seem trivial, in many
real-life attack scenarios the intrusion could be detected and mitigated this way.
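The working-hours example can be expressed as two small checks. This is a minimal Python sketch under assumptions: the function names suggest_start_time and is_outside_working_hours are hypothetical, the suggestion simply rounds the earliest observed login down to the hour, and the end of the working day is a made-up parameter.

```python
from datetime import time
from typing import Iterable, Optional


def suggest_start_time(permitted_start: time,
                       observed_logins: Iterable[time]) -> Optional[time]:
    """Suggest a later permitted start if the user never logs in that early."""
    observed = list(observed_logins)
    if not observed:
        return None
    # Round the earliest observed login down to the full hour.
    candidate = time(min(observed).hour, 0)
    return candidate if candidate > permitted_start else None


def is_outside_working_hours(login_at: time, start: time, end: time) -> bool:
    """True if a login falls outside the permitted working hours."""
    return not (start <= login_at <= end)
```

With a permitted start of 9 AM and observed logins shortly after 10 AM, the sketch suggests 10 AM as the new start, after which a 9:30 AM login would be reported as falling outside working hours.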
We also developed a procedure that identifies, at the end of the day, the logins that have not logged
in to the system.
The DBA could consider deleting such logins
to stop potential attacks. These logins are known as
orphan logins. Such enabled yet passive logins could be exploited by hackers to
intrude into the system.
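The end-of-day check amounts to a set difference between the enabled logins and the logins seen that day. The following Python sketch illustrates the idea; it is not the procedure from the prototype, and the function name find_orphan_logins and the event layout are hypothetical.

```python
from datetime import date, datetime
from typing import Iterable, List, Tuple


def find_orphan_logins(enabled_logins: Iterable[str],
                       login_events: Iterable[Tuple[str, datetime]],
                       day: date) -> List[str]:
    """Return the enabled logins that did not log in on the given day."""
    active = {login for login, when in login_events if when.date() == day}
    return sorted(set(enabled_logins) - active)
```

In practice a longer observation period than a single day would probably be needed before a login is proposed for deletion; the one-day period here follows the description above.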
Sundaram, A., An Introduction to Intrusion Detection. 1996.
Frank, J., Artificial Intelligence and Intrusion Detection: Current and Future
Directions. In Proceedings of the 17th National Computer Security
Conference, 1994.
Lee, W., et al., A Data Mining and CIDF Based Approach for Detecting
Novel and Distributed Intrusions. In Proceedings of 3rd International
Workshop on the Recent Advances in Intrusion Detection, 2000.
Lee, W. and S.J. Stolfo, Data Mining Approaches for Intrusion Detection. In
Proceedings of the 7th USENIX Security Symposium, San Antonio,
Texas, 1998.
Lee, W., Applying Data Mining to Intrusion Detection: the Quest for
Automation, Efficiency, and Credibility.
Campos, M.M. and B.L. Milenova, Creation and Deployment of Data
Mining-Based Intrusion Detection Systems in Oracle Database 10g.
Hu, Y. and B. Panda, A Data Mining Approach for Database Intrusion
Detection. ACM Symposium on Applied Computing, 2004.
Chung, C.Y., M. Gertz, and K. Levitt, DEMIDS: A Misuse Detection System
for Database Systems. In Third Annual IFIP TC-11 WG 11.5 Working
Conference on Integrity and Internal Control in Information Systems, 1999.
Heady, R., et al., The architecture of a network level intrusion detection
system. 1990.
Fowler, K., Forensic Analysis of a SQL Server 2005 Database Server. 2007.
(2009) Addressing the Insider Threat, Improving Database Security to
Manage Risk within the Federal Government.
Computer Security Threat Monitoring and Surveillance. 1980.
Carter, D.L. and A.J. Katz, Computer Crime: An Emerging Challenge for
Law Enforcement. FBI Law Enforcement Bulletin, 1997: p. 1-8.
Labib, K., Computer Security and Intrusion Detection, in The ACM Student
Bace, R. and P. Mell, NIST Special Publication on Intrusion Detection
Alessandri, D., Towards a Taxonomy of Intrusion Detection Systems and
Attacks. MAFTIA deliverable D3, 2001.
Axelsson, S., Intrusion Detection Systems : A Survey and Taxonomy. 2000.
Fuchsberger, A., Intrusion Detection Systems and Intrusion Prevention
Systems. Information Security Technical Report, 2005. 10: p. 134-139.
Kemmerer, R.A. and G. Vigna, Intrusion Detection: A Brief History and
Overview. 2002.
Allen, J., et al., State of the Practice of Intrusion Detection Technologies.
Debar, H., M. Dacier, and A. Wespi, Towards a Taxonomy of Intrusion
Detection Systems. Computer Networks, 1999. 31.
Halme, L.R. and R.K. Bauer, AINT Misbehaving: A Taxonomy of Anti-Intrusion Techniques.
Srivastava, A., S. Sural, and A.K. Majumdar, Database Intrusion Detection
using Weighted Sequence Mining. Journal of Computers, 2006. 1(4).
Lunt, T., et al., A real-time intrusion detection expert system (IDES). 1992.
Kumar, S. and E.H. Spafford, A software architecture to support misuse
intrusion detection. Proceedings of the 18th National Information Security
Conference, 1995: p. 194-204.
Ilgun, K., R.A. Kemmerer, and P.A. Porras, State transition analysis: A
rule-based intrusion detection approach. IEEE Transactions on Software
Engineering, 1995.
Marc, G.W. and H. Andrew, Interfacing Trusted Applications with
Intrusion Detection Systems, in Proceedings of the 4th International
Symposium on Recent Advances in Intrusion Detection. 2001, Springer-Verlag.
Access Control from an Intrusion Detection Perspective.
Fayyad, U., G.P. Shapiro, and P. Smyth, The KDD Process for Extracting
Useful Knowledge from Volumes of Data. Communications of the ACM,
Lee, W., S.J. Stolfo, and K.W. Mok, A Data Mining Framework for Building
Intrusion Detection Models.
Pietraszek, T. and A. Tanner, Data mining and machine learning - Towards
reducing false positives in intrusion detection. Information Security
Technical Report, 2005.
Stolfo, S., et al., JAM: Java agents for Meta-Learning over Distributed
Databases. 1997.
Lee, W., A Data Mining Framework for Constructing Features and Models
for Intrusion Detection Systems, in Graduate School of Arts and Sciences.
Helmer, G., J. Wong, and V.H.L. Miller, Automated Discovery of Concise
Predictive Rules for Intrusion Detection. 1999.
Daniel, B., et al., ADAM: a testbed for exploring the use of data mining in
intrusion detection. SIGMOD Rec., 2001. 30(4): p. 15-24.
Abraham, T., IDDM: Intrusion Detection using Data Mining Techniques.
Lazarevic, A., et al., A Comparative Study of Anomaly Detection Schemes in
Network Intrusion Detection. in Proc. Third SIAM International Conference
on Data Mining, San Francisco, 2003.
Valdes, A. and K. Skinner, Adaptive, Model-based Monitoring for Cyber
Attack Detection. SRI International.
Daniel, B., Applications of Data Mining in Computer Security, ed. J. Sushil.
2002: Kluwer Academic Publishers. 272.
Lee, W., S.J. Stolfo, and K.W. Mok, Mining Audit Data to Build Intrusion
Detection Models. in Proc. Fourth International Conference on Knowledge
Discovery and Data Mining, New York, 1998.
Barbara, D., et al., ADAM: Detecting Intrusions by Data Mining. Proceedings
of the 2001 IEEE Workshop on Information Assurance and Security, 2001.
Barbará, D., N. Wu, and S. Jajodia, Detecting novel network intrusions using
bayes estimators. in Proc. First SIAM Conference on Data Mining, Chicago,
SNORT, SNORT Intrusion Detection System.
[cited; Available from:
Anoop, S. and J. Sushil, Data warehousing and data mining techniques for
intrusion detection systems. Distrib. Parallel Databases, 2006. 20(2): p. 149-166.
Burbeck, K. and S. Nadjm-Tehrani, Adaptive real-time anomaly detection
with incremental clustering. Information Security Technical Report 12, 2007.
Portnoy, L., Intrusion detection with unlabeled data using clustering.
Leung, K. and C. Leckie, Unsupervised Anomaly Detection in Network
Intrusion Detection Using Clusters. 28th Australasian Computer Science
Conference, The University of Newcastle, Australia, 2005.
Shah, H., J. Undercoffer, and A. Joshi, Fuzzy Clustering for Intrusion
Detection. The IEEE International Conference on Fuzzy Systems, 2003.
Fonseca, J., M. Vieira, and H. Madeira, Monitoring Database Application
Behavior for Intrusion Detection. 12th Pacific Rim International Symposium
on Dependable Computing, 2006.
Mattsson, U.T., A Practical Implementation of a Real-time Intrusion
Prevention System for Commercial Enterprise Databases.
Jin, X. and S.L. Osborn, Architecture for Data Collection in Database
Intrusion Detection Systems. 2007.
Hu, Y. and B. Panda, Identification of Malicious Transactions in Database
Systems. Proceedings of the Seventh International Database Engineering and
Applications Symposium (IDEAS’03), 2003.
Rietta, F.S., Application Layer Intrusion Detection for SQL Injection. ACM
Symposium on Applied Computing, 2006.
Dai, J. and H. Miao, D_DIPS: An Intrusion Prevention System for Database
Security. 2005.
Wenhui, S. and D. Tan T H, A Novel Intrusion Detection System Model for
Securing Web-based Database Systems. 25th Annual International Computer
Software and Applications Conference (COMPSAC'01), 2001: p. 249.
Lee, V.C.S., J.A. Stankovic, and S.H. Son, Intrusion Detection in Real-time
Database Systems Via Time Signatures. In Proceedings of the Sixth IEEE
Real Time Technology and Applications Symposium, 2000.
Elisa, B., et al., Intrusion Detection in RBAC-administered Databases, in
Proceedings of the 21st Annual Computer Security Applications Conference.
2005, IEEE Computer Society.
Fonseca, J., M. Vieira, and H. Madeira, Integrated Intrusion Detection in
Databases. 2007.
Ramasubramanian, P. and A. Kannan, A genetic-algorithm based neural
network short-term forecasting framework for database intrusion prediction
system. 2005.
Asmawi, A. and Z.M. Sidek, A Survey on Artificial Immune System-Based
Intrusion Detection System for DBMS. Postgraduate Annual Research
Seminar, 2007.
Chen, K., G. Chen, and J. Dong, An Immunity-Based Intrusion Detection
Solution for Database Systems.
Valeur, F., D. Mutz, and G. Vigna, A Learning-Based Approach to the
Detection of SQL Attacks.
Lee, S.Y., W.L. Low, and P.Y. Wong, Learning Fingerprints for a Database
Intrusion Detection System. 2002.
Srivastava, A., S. Sural, and A.K. Majumdar, Weighted Intra-transactional
Rule Mining for Database Intrusion Detection. 2006.
Barbara, D., R. Goel, and S. Jajodia, Mining malicious data corruption with
hidden markov models. in Research Directions in Data and Applications
Security, 2002.
International Conference on Machine Learning and Cybernetics, Shanghai,
Zhong, Y. and X.-l. Qin, Database Intrusion Detection Based on User
Query Frequent Itemsets Mining with Item Constraints. Conference
InfoSecu04, 2004.
Spalka, A. and J. Lehnhardt, A Comprehensive Approach to Anomaly
Detection in Relational Databases. 2005.
Ramasubramanian, P. and A. Kannan, Intelligent Multi-agent Based
Database Hybrid Intrusion Prevention System. 2004.
Mauch, J. and N. Park, Guide To The Successful Thesis And Dissertation - A
Handbook For Students And Faculty. 2003: Routledge, USA.
Kothari, C.R., Research Methodology: Methods & Techniques. 2005: New
Age Publishers.
Chung, C.Y., M. Gertz, and K. Levitt, Misuse Detection in Database Systems
Through User Profiling.
Tang, Z. and J. MacLennan, Data Mining with SQL Server 2005. 2005: Wiley
Publishing, Inc.
Utley, C. (2005) Introduction to SQL Server 2005 Data Mining.
Asanka, D. (2004) Basics of C2 Auditing.
Rizzo, T., et al., Pro SQL Server 2005. 2006: Apress.
Higgins, K.J. (2008) Hacker's Choice: Top Six Database Attacks.
List of important events to be captured by the Tracer sub-component
Event Name | Description
Audit Login | Occurs when a user successfully logs in to SQL Server.
Audit Logout | Occurs when a user logs out of SQL Server.
Audit Login Failed | Indicates that a login attempt to SQL Server from a client
 | Indicates that error events have been logged in the SQL Server error log.
Audit Statement GDR | Occurs every time a GRANT, DENY, REVOKE for a statement permission is issued by any user in SQL Server.
Audit Object GDR | Occurs every time a GRANT, DENY, REVOKE for an object permission is issued by any user in SQL Server.
Audit AddLogin Event | Occurs when a SQL Server login is added or removed; for sp_addlogin and sp_droplogin.
Audit Login GDR Event | Occurs when a Windows login right is added or removed; for sp_grantlogin, sp_revokelogin, and sp_denylogin.
Audit Login Change Property Event | Occurs when a property of a login, except passwords, is modified; for sp_defaultdb and sp_defaultlanguage.
Audit Login Change Password Event | Occurs when a SQL Server login password is changed. Passwords are not recorded.
Audit Add Login to Server Role Event | Occurs when a login is added or removed from a fixed server role; for sp_addsrvrolemember, and
Audit Add DB User | Occurs when a login is added or removed as a database user (Windows or SQL Server) to a database; for sp_grantdbaccess, sp_revokedbaccess, sp_adduser, and
Audit Add Member to DB Role Event | Occurs when a login is added or removed as a database user (fixed or user-defined) to a database; for sp_addrolemember, sp_droprolemember, and
Audit Add Role Event | Occurs when a login is added or removed as a database user to a database; for sp_addrole and sp_droprole.
Audit App Role Change Password Event | Occurs when a password of an application role is changed.
Audit Statement Permission Event | Occurs when a statement permission (such as CREATE TABLE) is used.
Audit Schema Object Access Event | Occurs when an object permission (such as SELECT) is used, both successfully or unsuccessfully.
Audit Backup/Restore | Occurs when a BACKUP or RESTORE command is issued.
Audit Object Derived Permission Event | Occurs when a CREATE, ALTER, and DROP object commands are issued.
Audit Database Management Event | Occurs when a database is created, altered, or dropped.
Audit Database Object Management Event | Occurs when a CREATE, ALTER, or DROP statement executes on database objects, such as schemas.
Audit Database Principal Management | Occurs when principals, such as users, are created, altered, or dropped from a database.
Audit Schema Object Management Event |
Audit Server Object Take Ownership Event | Occurs when the owner is changed for objects in server
Audit Database Object Take Ownership Event | Occurs when a change of owner for objects within database scope occurs.
Audit Change Database | Occurs when ALTER AUTHORIZATION is used to change the owner of a database and permissions are checked to do that.
Audit Schema Object Take Ownership Event | Occurs when ALTER AUTHORIZATION is used to assign an owner to an object and permissions are checked to do
Audit Server Scope GDR Event | Indicates that a grant, deny, or revoke event for permissions in server scope occurred, such as creating a
Audit Server Object GDR Event | Indicates that a grant, deny, or revoke event for a schema object, such as a table or function, occurred.
Audit Database Object GDR Event | Indicates that a grant, deny, or revoke event for database objects, such as assemblies and schemas, occurred.
Audit Server Operation | Occurs when Security Audit operations such as altering settings, resources, external access, or authorization are
Audit Server Object Management Event | Occurs when server objects are created, altered, or
Audit Server Principal Management Event | Occurs when server principals are created, altered, or
Audit Database Object Access Event | Occurs when database objects, such as schemas, are