AN ARCHITECTURAL DESIGN FOR A HYBRID INTRUSION DETECTION SYSTEM FOR DATABASE

MOHAMMAD HOSSEIN HARATIAN

A project report submitted in partial fulfillment of the requirements for the award of the degree of Master of Computer Science (Information Security)

Centre for Advanced Software Engineering (CASE)
Faculty of Computer Science and Information Systems
Universiti Teknologi Malaysia

APRIL 2009

To my beloved mother and father

ACKNOWLEDGEMENT

First and foremost, I offer my sincerest appreciation to my supervisor, Assoc. Prof. Dr. Zailani Mohamed Sidek, who has supported me throughout my project with his patience and knowledge. I attribute the level of my Master's degree to his encouragement and effort; without him this thesis, too, would not have been completed or written. One simply could not wish for a better or friendlier supervisor. I also gratefully acknowledge all my colleagues for their advice, supervision, and crucial contributions throughout my project.

ABSTRACT

In today's business world, information is the most valuable asset of organizations and thus requires appropriate management and protection. Amongst all types of data repositories, the database is said to play the role of the heart in the body of the IT infrastructure. At the same time, a growing number of efforts have concentrated on handling the vast variety of security attacks. The nature of such handling depends on when we want it to occur and how we intend to deal with attack attempts. Generally, there are two ways to handle subversion attempts. One way is to equip our systems with security controls; in reality, however, this is not feasible for many reasons. Hence, we are interested in detecting security attacks. Amongst the different types of intrusion detection systems (such as network-based, host-based and application-based IDS), database intrusion detection systems, which are considered a type of application-based IDS, have become a matter of increasing concern. In this report we propose an architecture for a hybrid database intrusion detection system (DB-IDS). This architecture consists of several components and sub-components. It encompasses Anomaly Detection and Misuse Detection sub-components within the Detector component. The Anomaly Detector works based on the profiles constructed by the Profiler. Suspicious sequences of events, which are considered potential attacks, are detected by the Misuse Detector. The Data Collector component is responsible for capturing the data necessary for profiling. Moreover, the Transformer component is in place to convert the raw log files into a format understandable by the Profiler. Finally, the Anomaly Detector and Misuse Detector components send alerts to the Responder component upon detection of any suspicious activity.

ABSTRAK

In today's business world, information is the most valuable asset of an organization and therefore requires competent management and protection. Among the various types of data repositories, the database forms the core of an information technology infrastructure. Consequently, various measures have been taken to contain the widespread problem of security attacks. How these problems are handled depends on how we want the handling to be carried out and how we will deal with the attack attempts. In general, there are two methods of handling subversion attempts.
One method is to equip the existing computer systems with security controls. However, this method is difficult to implement nowadays because of financial constraints and a shortage of human resources. Detecting security attacks is therefore the main aim of this study. Among the recognized means of detecting security problems are Intrusion Detection Systems (IDS), such as network-based, host-based and application-based IDS, as well as database intrusion detection systems. The latter are increasingly being discussed as a way of further improving the security controls of a computer system. In this study, we propose an architecture for a hybrid database intrusion detection system (DB-IDS). The architecture comprises several components and sub-components. It includes Anomaly Detection and Misuse Detection, which together make up the Detector component. The Anomaly Detector performs its task based on the profiles generated by the Profiler. Suspicious activities regarded as potential attacks are detected by the Misuse Detector. The Data Collector component serves to capture the data required for profiling. The Transformer component, in turn, converts the raw log files into a format that can be read by the Profiler. The Anomaly Detector and Misuse Detector components send alerts to the Responder component whenever suspicious activities are found.

TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF APPENDICES

CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 Background
1.3 Problem Statement
1.4 Project Aim
1.5 Project Objectives
1.6 Project Scope
1.7 Summary

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Intrusion Detection History and Definitions
2.3 Taxonomy of IDS
2.4 IDS Classifications
2.4.1 Taxonomy of Intrusion Detection Principals
2.4.1.1 Anomaly Detection
2.4.1.1.1 Self-Learning Systems
2.4.1.1.2 Programmed
2.4.1.2 Misuse Detection
2.4.1.2.1 Programmed
2.4.1.3 Compound Detectors
2.4.2 Taxonomy of System Characteristics
2.4.2.1 Time of Detection
2.4.2.2 Granularity of Data-Processing
2.4.2.3 Source of Audit Data
2.4.2.4 Response to Detected Intrusions
2.4.2.5 Locus of Data-Processing
2.4.2.6 Locus of Data-Collection
2.4.2.7 Security
2.4.2.8 Degree of Inter-Operability
2.5 Intrusion Detection Systems using Data Mining
2.5.1 Applicable Data Mining Algorithms to Intrusion Detection
2.6 Database Intrusion Detection Systems
2.6.1 Database Intrusion Detection Using Data Mining
2.6.2 Database Anomaly Detection Systems
2.6.2.1 Learning-Based Anomaly Detection
2.6.3 Hybrid Methods
2.7 Database Intrusion Prevention Systems
2.8 Summary

CHAPTER 3 PROJECT METHODOLOGY
3.1 Introduction
3.2 Project Methodology
3.2.1 Analysis
3.2.2 Design
3.2.3 Prototype Development
3.2.4 Prototype Implementation and Testing
3.3 Summary

CHAPTER 4 ANALYSIS AND DESIGN OF THE DB-IDS ARCHITECTURE
4.1 Introduction
4.2 Approaches of Database Intrusion Detection
4.3 The Project Approach
4.4 The Proposed DB-IDS Architecture
4.4.1 Data Collector
4.4.2 Transformer
4.4.3 Profiler
4.4.4 Detector
4.4.4.1 Anomaly Detector
4.4.4.2 Misuse Detector
4.4.5 Responder
4.5 Overall Design
4.6 Summary

CHAPTER 5 PROTOTYPE DEVELOPMENT, IMPLEMENTATION AND TESTING
5.1 Introduction
5.2 Data Collector
5.2.1 Tracer and Audit Trace
5.2.2 Transformer
5.3 Auditor
5.4 Profiler and Profiles
5.4.1 Subject Profiles
5.4.2 Object Profiles
5.5 Detector
5.5.1 Anomaly Detector
5.5.2 Misuse Detector
5.6 Intrusion Detection Cycle
5.7 Summary

CHAPTER 6 CONCLUSION AND FUTURE WORKS
6.1 Introduction
6.2 Discussion
6.3 Future Works and Recommendations
6.4 Conclusion

REFERENCES
APPENDIX A

LIST OF TABLES

TABLE NO. TITLE
5.1 Profiles Specification

LIST OF FIGURES

FIGURE NO. TITLE
2.1 Revised IDS taxonomy by Debar et al. (2000)
2.2 Anti-Intrusion Techniques
2.3 IDS Taxonomy provided by Stefan Axelsson [20]
2.4 Data Mining Phases
3.1 Project Methodology
4.1 Data Collector Sub-Components
4.2 Data Collector
4.3 Transformer Component
4.4 Profiler Component
4.5 Detector Sub-Components
4.6 Anomaly Detector Component
4.7 Misuse Detector Component
4.8 Responder Component
4.9 Architecture of the DB-IDS
5.1 A portion of LogRepositorySession Table
5.2 Audit Trace Sample
5.3 A Sample Table (MovieClick.dbo.Movies)
5.4 A Sample Audit Table (Audit_MovieClick_Movies)
5.5 A Cut of AuditUni Table
5.6 General Log Profile
5.7 NumberOfLoginsPerDay Table
5.8 TtlLgnTm Table
5.9 NumberOfLoginsPerDayForDBRoles Table
5.10 NumberOfLoginsPerDayForServerRoles Table
5.11 UserStmntCounter View
5.12 UserStmntCounterDBLevel View
5.13 UserStmntCounter View
5.14 UserDDLCounter View
5.15 DBStmntCounter View
5.16 TableStmntCounter View
5.17 A Sample DMLSeq Table (DMLSeq_test_test2)
5.18 An Alert Generated by FindSusLogin
5.19 UserWorkingHour Table
5.20 An Alert Generated by ErlrLgnTimeDtctr
5.21 MaxNoLogins Table
5.22 An Alert Generated by ExceededNumOfLgns
5.23 An Alert Generated by FndPssvLgns
5.24 An Alert Generated by LgnMorThnOne
5.25 An Alert Generated by BrtFrcDtctr
5.26 Intrusion Detection Cycle

LIST OF ABBREVIATIONS

AI - Artificial Intelligence
AIS - Artificial Immune System
DBA - Database Administrator
DB-IDS - Database Intrusion Detection System
DBS - Database System
DDL - Data Definition Language
DML - Data Manipulation Language
DOS - Denial of Service
ID - Intrusion Detection
IDS - Intrusion Detection System
IP - Internet Protocol
IT - Information Technology
RBAC - Role Based Access Control
RDBMS - Relational Database Management System
sa - System Admin
SQL - Structured Query Language

LIST OF APPENDICES

APPENDIX A - List of important events to be captured by the Tracer sub-component

CHAPTER 1

INTRODUCTION

1.1 Overview

Nowadays, a growing number of efforts have concentrated on handling the vast variety of security attacks. The nature of such handling depends on when we want it to occur and how we intend to deal with attack attempts. According to [1], there are generally two ways to handle subversion attempts. One way is to equip our systems with all available security controls, such as cryptographic methods, sophisticated access control mechanisms, rigorous authentication protocols and so on, in order to prevent the subversion itself. However, in reality this is not feasible for many reasons, for example (a) flaws in cryptographic techniques, (b) the trade-off between efficiency and the level of access control, and (c) insiders who abuse their privileges. Doubtlessly, it is very important that the security mechanism of a system is designed to prevent unauthorized access to system resources and data. However, as mentioned before, completely preventing breaches of security appears, at present, unrealistic. We can, however, try to detect these intrusion attempts so that action may be taken to repair the damage later [1], and this is what intrusion detection refers to. Generally, there are two types of intrusion detection techniques [1]. One is the Anomaly Detection technique, in which a profile is established for the system and any activity that causes a deviation from the normal activity profile is flagged as an intrusion. This method may rely on statistical approaches or predictive pattern generation. The other technique, called Misuse Detection, is mostly based on signatures or patterns of attacks. In both techniques, however, Artificial Intelligence [2] and Data Mining [3] [4] [5] applications may be employed to reduce the human effort and to increase the accuracy of the detection.
In recent years, Data Mining-based intrusion detection systems (IDSs) have demonstrated high accuracy, good generalization to novel types of intrusion, and robust behavior in a changing environment [6]. Although a variety of approaches have been proposed for enhancing the capabilities of intrusion detection, as well as its efficiency and accuracy, most of these efforts have concentrated on detecting intrusions at the network or operating system level (referred to as network-based and host-based intrusion detection systems, respectively). Such systems are not capable of detecting malicious data corruption, that is, determining what particular data in the database are manipulated by which specific malicious database transaction(s) [7]. This opens the issue of detecting intrusions at the database level.

1.2 Background

In today's business world, information is the most valuable asset of organizations and thus requires appropriate management and protection [8]. Amongst all types of data repositories, databases are said to play the role of the heart in the body of the IT infrastructure. This is not only because they allow the efficient management and retrieval of huge amounts of data, but also because they provide mechanisms that can be employed to ensure the integrity of the stored data [8]. Thus, databases have obviously always been an attractive target of attacks for hackers. Getting access to a database containing hundreds of thousands of credit card numbers is almost every hacker's dream. Such an attack indicates a violation of confidentiality; however, an intrusion can be defined as "any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource" [9]. So, adequate and balanced care should be taken to protect all three elements of this triad.

Attack reports have been released [10] in which the intruder updated the price field in an on-line store website, decreased the values of specific items, and then bought those items for just a few dollars (an integrity violation). According to [11], the Privacy Rights Clearinghouse reports that during the period from January 2005 to May 2007, more than 154 million records containing sensitive information, including credit card numbers, Social Security numbers, bank account numbers, and driver's license numbers, were stolen from United States organizations. The actual total could be much higher; this number only represents reported breaches, and in many cases the total number of records compromised remains undetermined. Approximately one third of the reported breaches were the result of a direct attack on the database. Hence, today, the need for securing databases has become more critical than ever before.

Database security is as old as the database concept itself and encompasses a broad range of procedures that protect a database from unintended activity. One of the most important techniques for securing a database is applying an Intrusion Detection System, which is used to detect potential violations of database security.

Anderson [12] has classified intruders into two types: external intruders, who are unauthorized users of the machines they attack, and internal intruders, who have permission to access the system, but not some portions of it. He further divided internal intruders into intruders who masquerade as another user, those with legitimate access to sensitive data, and the most dangerous type, the clandestine intruders who have the power to turn off audit controls for themselves.
Despite the necessity of protecting information stored in database systems (DBS), existing security models are insufficient to prevent misuse, especially insider abuse by legitimate users [8]. External intrusions are supposed to be detected and handled by network-based IDSs. However, when it comes to the database level, an intruder cannot do anything unless he obtains valid credentials and logs in to the system. That is, transactions must be issued by a valid database user who has logged in using a valid database login, whether or not the login information was supplied by its legitimate owner. Here, the term "valid database user" does not imply that this database user is necessarily associated with a legitimate actual user. For example, by using social engineering techniques an intruder can gain access to valid database user information. Forrester Research estimates that nearly 80 percent of all database attacks are internal, and Gartner estimates that more than 95 percent of intrusions that result in significant financial loss are perpetrated by insiders [11].

If an intruder gains access to the account information of a legitimate database user, he or she may cause damage to the database by executing transaction(s) that illegitimately manipulate the sensitive data. In such a case, the external intruder becomes an internal one who masquerades as another user. In this scenario, identifying whether the data corruption was indeed caused by the legitimate user or by someone who obtained that user's account information is difficult.

Risk from insiders comes in many forms, and as attackers recognize the value and importance of the information in the database, attacks are becoming more focused. Attackers have also changed. In the past, people hacked into networks to "prove they could." While those attacks were malicious, recently the motivation has become financial. Attackers are seeking data to sell, and that information resides in the database [11].

Another common type of intrusion, forensically analyzed in [10], is one in which the intruder logs in to the database system as a high-privileged user (by brute force, for example), then creates an account for himself and starts to manipulate the database by logging in with the newly created account.

On the other hand, illegal transactions executed by legitimate database users who are not authorized to perform certain activities, but for whom, for whatever reason, the logical permissions for these activities have not been denied, seem to be even more difficult to detect. Carter and Katz [13] have revealed that in computer systems the primary security threat comes from insider abuse rather than from intrusion. This observation implies that much more emphasis has to be placed on the internal control mechanisms of systems. Furthermore, policies usually do not sufficiently guard data stored in a database system against privileged users [8], such as sa and members of the sysadmin fixed server role in MS SQL Server 2005, for example.

1.3 Problem Statement

As mentioned before, a great number of database attacks come from inside the organization, whether by privileged users, authorized users or unauthorized users who hack into the system by gaining access to legitimate accounts. In either case, we intend to be able to detect the attacks conducted by each of these intruder categories.
The following question may therefore be raised: how can we enable database management systems to monitor, detect, mitigate and/or prevent attacks, using tools such as a DB-IDS and/or their built-in capabilities?

Moreover, security policies often fail to prevent database attacks. There have been many scenarios in which authorized users are inadvertently granted access to run certain operations. The initial database security configuration often fails to comply with the security policies of organizations. Users usually hold privileges which were never supposed to be granted to them. We can assume that in a realistic environment none of those users ever exceeds their rights and uses those privileges. But what if a hacker steals those accounts and enters the system? In that case we cannot guarantee that the hacker will adhere to any code of ethics. This raises an additional question: how can a database intrusion detection system aid in revising the security policies?

1.4 Project Aim

The aim of this project is to design an architecture for a hybrid intrusion detection system for databases. This architecture contains different components and sub-components which interact with each other. The system is called a hybrid DB-IDS since it encompasses anomaly detector and misuse detector modules. The proposed architecture can be adapted to any DBMS, taking its features and capabilities into consideration. However, we develop a model based on this architecture for MS SQL Server 2005 to show how it works. Moreover, by leveraging the information provided by our DB-IDS, database security policies can be revised to strengthen the system.

1.5 Project Objectives

As mentioned before, our hybrid DB-IDS consists of anomaly and misuse detectors. In order to detect anomalies, a normal activity profile is created for specific database objects. These objects may include, but are not limited to, principals and securables. (The SQL Server 2005 security model relies on two fairly straightforward concepts: principals and securables. Principals are those objects that may be granted permission to access particular database objects, while securables are those objects to which access can be controlled [14].) Any considerable deviation from the captured normal behavior of the system may be regarded as an intrusion. Our model is designed in a way that enables us to apply different methods, ranging from statistical measures to artificial intelligence techniques, to build the system profiles.

To capture the behavior of database objects, we need to monitor and audit the system's operation. This auditing system helps us to collect the data necessary for building the database profiles. To be more precise, whatever technique the Profiler utilizes to build the profiles, the data gathered by the auditing system provides the necessary input for it.

A security alert is raised in case of any anomaly or misuse detection. Depending on the suspicion level or the sensitivity of the intrusion, the detection mechanism can cooperate with the access control system to deny access and prevent the intruder from causing further corruption. However, although such a capability is in place, the system is not intended to function entirely as an intrusion prevention system.

Another objective of this project is to help revise database security policies and configurations by providing daily reports. Based on the facts these reports provide, database use policies can be changed, modified or even removed. Furthermore, the reports may help us to create new database security policies and/or reconfigure the database security schema.
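To illustrate the kind of profile the Profiler might build from the audited data, the following T-SQL sketch aggregates audited statements into a per-user, per-day activity profile. It is a minimal sketch only: the object and column names (AuditTrail, EventTime, LoginName, StatementType, UserDailyActivityProfile) are hypothetical placeholders introduced for this example and are not the objects used in the prototype.

    -- Minimal sketch: derive a per-user daily activity profile from a hypothetical audit table.
    CREATE VIEW dbo.UserDailyActivityProfile
    AS
    SELECT
        LoginName,
        CONVERT(varchar(10), EventTime, 120) AS ActivityDate,   -- one row per user, day and statement type
        StatementType,                                           -- e.g. SELECT, INSERT, UPDATE, DELETE
        COUNT(*) AS StatementCount
    FROM dbo.AuditTrail
    GROUP BY
        LoginName,
        CONVERT(varchar(10), EventTime, 120),
        StatementType;

The Anomaly Detector could then compare a user's current activity against the historical distribution captured by such a view.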
In the following, we accordingly list the project objectives towards the aim of designing an architecture for a DB-IDS:

i. Proposing a database anomaly detection model
ii. Constructing sample profiles for a database system
iii. Developing a database audit system
iv. Proposing a brute-force detection model for database systems
v. Proposing a model for database security policy revision

1.6 Project Scope

As stated before, intrusion detection systems do not usually take preventive measures when an attack is detected; an IDS is a reactive rather than a pro-active agent. It plays the role of an informant rather than a police officer [1]. So, the proposed architecture only takes into account the detection of an intrusion. However, the model is capable of cooperating with the database access control system to stop the intruder from causing further damage to the system.

The DB-IDS architecture design is developed, implemented and tested for Microsoft SQL Server 2005 using Structured Query Language (SQL). All components of the proposed DB-IDS architecture are built using built-in means, including stored procedures, views, triggers and so on.

The range of intrusions detectable by this model is limited to those conducted either by external intruders who gain access to legitimate database account information, or by insiders who abuse their privileges. External intrusions and attacks such as SQL injection will not be covered in this project.

In the last chapter of this project report, we go through the recommendations and future works to enhance the accuracy, efficiency and scalability of the proposed DB-IDS. As a matter of fact, whatever is beyond the scope of this project could be considered future work.

1.7 Summary

Nowadays, intrusion detection plays a vital role in security mechanisms. Organizations need to be able to detect intrusions into their database systems as soon as possible to prevent further damage to their sensitive data, which may cause financial loss. The critical necessity of having an intrusion detection system in place becomes especially apparent when we hear that in many real-life scenarios the intrusion remains undetected for hours or even days. These concealed attacks are thought to be a DBA's worst nightmare, as they make the recovery procedure difficult, time-consuming and, in some cases, infeasible. On the other hand, from a forensic point of view, the intactness of evidence is the key point in a database intrusion investigation, and any delay in detecting an intrusion may lead to corruption of the digital evidence on which the intruder's guilt is established.

In this project we intend to come up with an architectural model for database intrusion detection which tries to detect suspicious transactions by comparing the current transaction with the normal activity of the system. As we saw in the introduction, such systems are called Anomaly Detection systems [1]. However, there are challenges in the design of these systems, such as the selection of threshold levels that minimize false negatives and false positives, and the selection of features to monitor. Furthermore, anomaly detection systems are computationally expensive because of the overhead of keeping track of, and possibly updating, several system profile metrics [1]. The proposed architecture is intended to be designed in such a way that it addresses these challenges and balances efficiency against overhead.
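To make the threshold-selection problem concrete, the sketch below shows one simple way an anomaly check could compare today's activity against a stored profile and raise an alert when a configurable threshold is exceeded. It is a minimal sketch under stated assumptions: it reuses the hypothetical UserDailyActivityProfile view from the earlier sketch and assumes a hypothetical dbo.Alerts table; it is not the prototype's actual detector.

    -- Minimal sketch of a threshold-based anomaly check.
    -- Assumed hypothetical objects: dbo.UserDailyActivityProfile, dbo.Alerts(AlertTime, LoginName, Description).
    CREATE PROCEDURE dbo.CheckStatementCountAnomaly
        @Threshold float = 3.0   -- allowed multiple of the historical daily average
    AS
    BEGIN
        INSERT INTO dbo.Alerts (AlertTime, LoginName, Description)
        SELECT
            GETDATE(),
            t.LoginName,
            t.StatementType + ' activity today exceeds the configured multiple of the historical daily average'
        FROM dbo.UserDailyActivityProfile AS t
        JOIN (  -- historical per-user, per-statement-type daily average, excluding today
                SELECT LoginName, StatementType,
                       AVG(CAST(StatementCount AS float)) AS AvgCount
                FROM dbo.UserDailyActivityProfile
                WHERE ActivityDate < CONVERT(varchar(10), GETDATE(), 120)
                GROUP BY LoginName, StatementType
             ) AS h
          ON h.LoginName = t.LoginName AND h.StatementType = t.StatementType
        WHERE t.ActivityDate = CONVERT(varchar(10), GETDATE(), 120)
          AND t.StatementCount > @Threshold * h.AvgCount;
    END;

How high the threshold is set directly trades false positives against false negatives, which is exactly the tuning problem discussed above.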
The other mechanism for detecting database intrusions in this project relies on the Misuse Detection concept, in which we intend to identify meaningful sequences of events that turn out to constitute database misuse. Misuse detection works based on database attack patterns. The model enables us to monitor small, apparently irrelevant events which, if they occur in a specific order, lead us to believe that the database is the target of an intrusion.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Nowadays, computer attacks are commonplace. By connecting our organization's computers or local network to the Internet, we increase the risk of having someone break in, make the installation of malicious programs and tools far more likely, and possibly allow our systems to be used, under remote control, to attack other machines on the Internet. The annual FBI/CSI survey shows that even though virus-based attacks are the most frequent, attacks based on unauthorized access, as well as Denial of Service attacks from both internal and external sources, are increasing drastically. The more sensitive information we hold, the higher the probability of being the target of security threats. Several major banks have been subject to attacks in which attackers gained access to customers' accounts and viewed detailed information about the activities on these accounts. In some instances the attackers stole credit card information to blackmail e-commerce companies by threatening to sell this information to unauthorized entities [14].

In order to combat this growing trend of computer attacks and respond to this increasing threat, both academic and industry groups have been developing systems that monitor networks and systems and raise alerts on suspicious activities. These systems are called Intrusion Detection Systems (IDS).

2.2 Intrusion Detection History and Definitions

An intrusion can be defined as any set of actions that attempt to compromise the integrity, confidentiality or availability of a resource [9]. Intrusion detection is the process of tracking important events occurring in a computer system and analyzing them for the possible presence of intrusions [15]. As a more comprehensive definition, Alessandri has defined intrusion detection as the set of practices and mechanisms used towards detecting errors (that part of the system state that is liable to lead to failure) that may lead to security failures (where the delivered service deviates from fulfilling the intended function, or a security property of the intended security policy is violated), and diagnosing intrusions and attacks, including anomaly and misuse detection [16]. Accordingly, an intrusion detection system (IDS) is an implementation of the practices and mechanisms of intrusion detection [16]. It can also be thought of as a software tool that attempts to detect an intruder hacking into a system or a genuine user exploiting the resources of the system. In other words, an IDS is a piece of software that runs on a host, monitors the activities of users and programs on the host, and monitors the network traffic on the networks to which the host is attached [14]. This method should be contrasted with those that aim to strengthen the perimeter surrounding the computer system. It is believed that both of these methods should be used, along with others, to increase the chances of mounting a successful defense, relying on the age-old principle of defense in depth [17]. The goal of an IDS is to alert the system's administrator to any suspicious and possibly intrusive event, and possibly to take action to circumvent the intrusion.
These actions can be as simple as writing the activities to a log file or as complex as automatically controlling the system's and network's resources by closing network ports or killing suspicious processes [14].

History

Originally, intrusion detection was performed manually by system administrators. They monitored user activities via a central console and watched for any abnormal occurrence or anomaly. They might detect intrusions by noticing, for example, that a vacationing user was logged in locally or that a seldom-used printer was unusually active. Obviously, the scalability of this early form of intrusion detection was dramatically poor; it was clearly tedious and error-prone. Therefore, it soon became necessary to develop automated log file readers that searched for logged events indicating irregularities or even an intrusion by unauthorized personnel [18].

The next step in the intrusion detection mechanism was the examination of audit logs. In the late '70s and early '80s, administrators typically printed audit logs on fan-folded paper, which were often stacked four to five feet high by the end of an average week [19]. It is necessary to point out that this early ID software (not yet systems) was mostly individually developed and programmed and not widely distributed, as only very few organisations needed this kind of technology before the dawn of the Internet age [20]. Manual analysis of such a huge amount of printed audit logs was obviously time-consuming. Moreover, it could only be used as a forensic means for gathering the evidence related to a security incident after the fact, not for detecting an attack in progress. As keeping the audit logs on digital storage became possible, developers came up with automated data analysis programs to make life easier for administrators. However, the analysis was slow and often computationally intensive, and therefore intrusion detection programs were usually run at night when the system's user load was low. As a result, most intrusions were still detected after they had occurred. Until this point, intrusion detection had been a post factum analysis of digital log files, allowing forensic analysis relatively long after the actual event with possible adjustments to the infrastructure [20]. In the early '90s, researchers developed real-time intrusion detection systems that reviewed audit data as it was produced. This enabled the detection of attacks and attempted attacks as they occurred, which in turn allowed for real-time response and, in some cases, attack preemption [19].

Due to market demand, the IT security industry then started to develop the former prototype software into actual Intrusion Detection Systems, consisting of user-friendly interfaces, methods to update attack patterns, various alerting methods and even some automatically triggered reactions or actual prevention methods able to stop attacks in progress [20]. Considering the increasing security concerns, the countless new attack techniques and the sophisticated attack tools, however, this does not seem to be an easy job.

2.3 Taxonomy of IDS

Generally, a taxonomy serves several purposes. It helps us to describe the world around us and to put complex phenomena into a more manageable form (Description).
Moreover, by classifying a number of objects according to the taxonomy and then observing the ‘holes’ where objects may be missing, we can exploit the predictive qualities of a good taxonomy (Prediction). And finally, a good taxonomy will provide us with clues about how to explain observed phenomena (Explanation). Taxonomies of IDSs and ID related technologies backs to 1999. The taxonomy proposed by Debar et al. [21] seems to be the first real IDS taxonomies. Other taxonomies have since been published, such as the one proposed by Axelsson [17] and then, one proposed by Halme and Bauer [22]. Figure 2-1 illustrates the taxonomy proposed by Debar et al. Actually this is a revised version of a previously proposed one to which some other criteria for classification have been added. The necessity of studying the taxonomy for intrusion detection appears when we try to examine the different type of IDSs and various mechanisms which have been applied in order to detect an intrusion. 14 Figure 2.1 Revised IDS taxonomy by Debar et al. (2000) Taxonomy proposed by Halme and Bauer [22] which is named “A Taxonomy of Anti-Intrusion Techniques”, as its name indicates, have focused on classifications of different methods for combating the intrusive activity, rather than dedicatedly dealing with IDSs. They introduce six anti-intrusion approaches, including [22]: Prevention precludes or severely handicaps the likelihood of a particular intrusion’s success. Preemption strikes offensively against likely threat agents prior to an intrusion attempt to lessen the likelihood of a particular intrusion occurring later. Deterrence deters the initiation or continuation of an intrusion attempt by increasing the necessary effort for an attack to succeed, increasing the risk associated with the attack, and/or devaluing the perceived gain that would come with success. 15 Deflection leads an intruder to believe that he has succeeded in an intrusion attempt, whereas instead he has been attracted or shunted off to where harm is minimized. Detection discriminates intrusion attempts and intrusion preparation from normal activity and alerts the authorities. Countermeasures actively and autonomously counter an intrusion as it is being attempted. A graphical illustration of this taxonomy is shown in the figure 2-2. Figure 2.2 Anti-Intrusion Techniques Axelsson’s taxonomy [17] explicitly deals with intrusion detection systems. It consists of a classification first of the detection principle, and second of certain operational aspects of the intrusion detection system as such. This study tries to examine the developed IDSs and in-progress research in this field into a structural categorization which helps to comprehensively know the field. The survey is accomplished by first identifying the different type of intrusion sources (an action or activity that is generated by the intruder). From the nature of source they move to the question of how to observe this source, and what problems are likely to be raised in doing so, and ending with the ultimate result, the decision. 16 The main problem they have been faced is that most references do not describe explicitly the decision rules employed, but rather the framework in which such rules could be set. Thus they often have to stop the categorization when they reach the level of the framework, for example the expert system. In the following, we use the IDS taxonomy proposed by Axelsson to categorize the different types of intrusion detection system. 
2.4 IDS Classifications As we said before, the classification of intrusion detection systems may be accomplished by looking at the system from different points of view. In chapter 2.4.1 we show the taxonomy for intrusion detection systems which is built using the principals of IDS in which the categorization is based on the detection method include Anomaly detection, Signature detection and compound detection. Then, in chapter 2.4.2 we introduce the other types of IDS classification which are based on system characteristics, including the criteria such as time of detection, source of audit data, response to detection intrusion and etc. Figure 2.3 IDS Taxonomy provided by Stefan Axelsson [20] 17 2.4.1 Taxonomy of Intrusion Detection Principals Intrusion detection systems determine if a set of actions constitute intrusions on the basis of one or more models of intrusion. A model classifies a sequence of states or actions as "good" (no intrusion) or "bad" (possible intrusions) [23]. By consideration of detection methods, intrusion detection techniques can be categorized into two main types, namely, Anomaly Detection System which sometimes is called behavior-based intrusion detection system, and Misuse Detection System which is based on signatures or patterns and basically is knowledge-based. 2.4.1.1 Anomaly Detection The anomaly detection systems, like what is presented in [24], bases its decision on the profile of a system’s or user’s normal behavior. So, the construction of such a detector starts by forming an opinion on what constitutes normal for the observed subject (which can be users, groups of users, applications, or system resource usage), and then deciding on what percentage of the activity to flag as abnormal, and how to make this particular decision [17]. Anomaly detection system flag observed activities that deviate significantly from the established normal usage profiles as anomalies, i.e., possible intrusions. For example, the normal profile of a user may contain the averaged frequencies of some system commands used in his or her login sessions. If for a session that is being monitored, the frequencies are significantly lower or higher, then an anomaly alarm will be raised. This type of system is well suited for the detection of previously unknown attacks [23]. The main advantage of anomaly detection is that it does not require prior knowledge of intrusion and can thus detect new intrusions. The main drawback is that it may not be able to describe what the attack is and may have false positive rate [3]. In other word, the generated alarms by the system are meaningless because generally they cannot provide any diagnostic information (fault-diagnosis) such as the type of attack that was encountered. Means, they can only signal that something unusual happened [16]. 18 However, one of the benefits of this type of IDSs is that they are capable of producing information that can in turn be used to define signatures for misuse detectors [15]. In the following section we show the different classification of anomaly detection systems cited in [17]. 2.4.1.1.1 Self-Learning Systems As the name indicates, no information about the attacks is feed into the system. Self-Learning systems learn for example what constitutes normal for the installation; typically by observing traffic for an extended period of time and building some model of the underlying process [17]. 
Non-Time series A collective term for detectors that model the normal behavior of the system by the use of a stochastic model that does not take time series behavior into account [17]. This type of self-learning anomaly detection system may be based on Rule modeling or Descriptive statistics. In the rule modeling approach, the system itself studies the traffic and formulates a number of rules that describe the normal operation of the system. In the detection stage, the system applies the rules and raises the alarm if the observed traffic forms a poor match (in a weighted sense) with the rule base. But in descriptive statistic approach, the system collects simple, descriptive, mono-modal statistics from certain system parameters into a profile, and constructs a distance vector for the observed traffic and the profile. If the distance is great enough the system raises the alarm [17]. Time series This model is more complex due to taking time series behavior into account. Examples of such techniques include a Hidden Markov Model (HMM), an Artificial Neural Network (ANN). 19 2.4.1.1.2 Programmed In this class, someone teaches the system-programs ito detect certain anomalous events. Thus, this is the user of the system who forms the normal behavior profiles of the system and decides what is considered abnormal enough for the system to signal a security breach. Descriptive Statistics These systems build a profile of normal statistical behavior by the parameters of the system by collecting descriptive statistics on a number of parameters. Such parameters can be the number of unsuccessful logins, the number of network connections, the number of commands with error returns, etc [17]. In Simple statistics model, the collected statistics were used by higher level components to make a more abstract intrusion detection decision. In Simple rule-based the user provides the system with simple but still compound rules to apply to the collected statistics. Threshold approach is arguably the simplest example of the programmeddescriptive statistics detector. When the system has collected the necessary statistics, the user can program predefined thresholds (perhaps in the form of simple ranges) that define whether to raise the alarm or not. An example is: “Alarm if number of unsuccessful login attempts > 3” Default deny The main idea in this class is to explicitly state the status under which the system operates in a safe and security-benign manner, and flag all deviations from this status as intrusion. This approach intuitively may correspond with a default deny security policy. In State series modeling the policy for security benign operation is encoded as a set of states. As in any state machine, once it has matched one state, the intrusion detection system engine waits for the next transition to occur. If the monitored action is described as allowed the system continues, while if the 20 transition would take the system to another state, any state that is not explicitly mentioned will cause the system to sound the alarm [17]. Halme and Bauer [22] categorize the anomaly detection systems based on system specifications and profiling. Depend on the components of the system whose behaviors are to be captured and subsequently monitored, different classes may be appeared. Threshold Monitoring sets values for metrics defining acceptable behavior (e.g., fewer than some number of failed logins per time period). 
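As a concrete illustration of the threshold rule quoted above ("Alarm if number of unsuccessful login attempts > 3"), the following T-SQL sketch flags logins with more than three failed attempts within the last hour. The dbo.FailedLogins and dbo.Alerts tables are hypothetical placeholders introduced only for this example, not objects defined by the cited works.

    -- Minimal sketch of a threshold alarm on failed logins within a time window.
    -- Assumed hypothetical tables: dbo.FailedLogins(LoginName, AttemptTime), dbo.Alerts(AlertTime, LoginName, Description).
    INSERT INTO dbo.Alerts (AlertTime, LoginName, Description)
    SELECT
        GETDATE(),
        LoginName,
        'More than 3 failed login attempts within the last hour'
    FROM dbo.FailedLogins
    WHERE AttemptTime > DATEADD(hour, -1, GETDATE())
    GROUP BY LoginName
    HAVING COUNT(*) > 3;

Choosing the threshold value and the time window is exactly the difficulty noted above: values that are too tight inflate false positives, while values that are too loose miss real attacks.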
Thresholds provide a clear, understandable definition of unacceptable behavior and can utilize other facilities besides system audit logs. Unfortunately it is often difficult to characterize intrusive behavior solely in terms of thresholds corresponding to available audit records. It is difficult to establish proper threshold values and time intervals over which to check. Approximation can result in a high rate of false positives, or high rate of false negatives across a non-uniform user population [22]. User Work Profiling maintains individual work profiles to which the user is expected to adhere in the future [22]. Group Work Profiling assigns users to specific work groups that demonstrate a common work pattern and hence a common profile. A group profile is calculated based upon the historic activities of the entire group. Individual users in the group are expected to adhere to the group profile. This method can greatly reduce the number of profiles needing to be maintained [22]. Resource Profiling monitors system-wide use of such resources as accounts, applications, storage media, protocols, communications ports, etc., and develops a historic usage profile. Continued system-wide resource usage - illustrating the user community's use of system resources as a whole - is expected to adhere to the system resources profile. However, it may be difficult to interpret the meaning of changes in overall system usage [22]. 21 Executable Profiling seeks to monitor executables’ use of system resources, especially those whose activity cannot always be traced to a particular originating user. Viruses, Trojan horses, worms, trapdoors, logic bombs and other such software attacks are addressed by profiling how system objects such as files and printers are normally used, not only by users, but also by other system subjects on the part of users [22]. Static Work Profiling updates usage profiles only periodically at the behest of the SSO. This prevents users from slowly broadening their profile by phasing in abnormal or deviant activities which are then considered normal and included in the user's adaptive profile calculation. Performing profile updates may be at the granularity of the whole profile base or, preferably, configurable to address individual subjects [22]. Adaptive Work Profiling automatically manages work profiles to reflect current (acceptable) activity. The work profile is continuously updated to reflect recent system usage. Profiling may be on user, group, or application. Adaptive work profiling may allow the SSO to specify whether flagged activity is: 1) intrusive, to be acted upon; 2) not intrusive, and appropriate as a profile update to reflect this new work pattern, or 3) not intrusive, but to be ignored as an aberration whose next occurrence will again be of interest. Activity which is not flagged as intrusive is normally automatically fed into a profile updating mechanism. If this mechanism is automated, the SSO will not be bothered, but work profiles may change and continue to change without the SSO's knowledge or approval [22]. Adaptive Rule Based Profiling differs from other profiling techniques by capturing the historical usage patterns of a user, group, or application in the form of rules. Transactions describing current behavior are checked against the set of developed rules, and changes from rule-predicted behavior flagged. As opposed to misuse rule-based systems, no prior expert knowledge of security vulnerabilities of the monitored system is required. 
"Normal usage" rules are generated by the tool in its training period. However, training may be sluggish compared to straight 22 statistical profiling methods. Also, to be effective, a vast number of rules must be maintained with inherent performance issues [22]. 2.4.1.2 Misuse Detection Misuse Detection Systems like [25] and [26], use patterns of well-known attacks or weak spots of the system to match and identify known intrusions. In such systems which also called signature detection system, the intrusion detection decision is formed on the basis of knowledge of a model of the intrusive process and what traces it ought to leave in the observed system. For example, a signature rule for the “guessing password attack” can be “there are more than 4 failed login attempts within 2 minutes”. The main advantage of misuse detection is that it can accurately and efficiently detect instances of known attacks. In addition, despite of anomaly detection system, the alarms generated by misuse detection systems are meaningful e.g., they contain diagnostic information about the cause of the alarm [16]. The main disadvantage is that it lacks the ability to detect the truly innovative (i.e. newly invented) attacks [3] as well as those whose signatures are not available. The database of attack signatures needs to be kept up-to-date, which is a tedious task because new vulnerabilities are discovered on a daily basis. However, most commercial systems used today, like some of Cisco products, are knowledge-based systems. 2.4.1.2.1 Programmed The system is initially programmed with an explicit decision rule set. The detection rule is simple since it contains a straightforward coding of what can be expected to be observed in the event of an intrusion. In State-Modeling method of programmed misuse detection systems, the intrusion is encoded as a number of different states, each of which has to be present in the observation space for the 23 intrusion to be considered to have taken place. They are by their nature time series models. Two subclasses exist: in the first, state transition, the states that make up the intrusion form a simple chain that has to be traversed from beginning to end; in the second, petri-net, the states form a petri-net. In this case they can have a more general tree structure, in which several preparatory states can be fulfilled in any order, irrespective of where in the model they occur. In Expert-System class, an expert system is employed to reason about the security state of the system, given rules that describe intrusive behavior. Often forward-chaining, production-based tool are used, since these are most appropriate when dealing with systems where new facts (audit events) are constantly entered into the system. String matching method is a simple, often case sensitive, substring matching of the characters in text that is transmitted between systems, or that otherwise arise from the use of the system. Simple Rule-Based systems are similar to the more powerful expert system, but not as advanced. This often leads to speedier execution [17]. Note that the lack of detectors in the signature-self-learning class is conspicuous. 2.4.1.3 Compound Detectors These detectors form a compound decision in view of a model of both the normal behavior of the system and the intrusive behavior of the intruder. The detector operates by detecting the intrusion against the background of the normal traffic in the system. 
These detectors have-at least in theory-a much better chance of correctly detecting truly interesting events in the supervised system, since they both know the patterns of intrusive behavior and can relate them to the normal behavior of the system [17]. 24 2.4.2 Taxonomy of System Characteristics Intrusion detection systems are more than just a detector. Due to this reason IDSs also need to be categorized based on those characteristics that do not pertain directly to the detection principle. In the following, we introduce some of the most important aspects of categorization and briefly explain the respective classes. 2.4.2.1 Time of Detection Intrusion detection system can be divided into two classes based on the time of detection: those that attempt to detect intrusions in real-time or near real-time, and those that process audit data with some delay, postponing detection (non-real-time), which in turn delays the time of detection. Obviously real-time intrusion detection system can by run in the off-line mode on historical audit data [17]. 2.4.2.2 Granularity of Data-Processing With consideration of granularity on which the data is processed, we can identify two types of IDSs; those that process data continuously and those that process data in batches at a regular interval. This category is linked with the time of detection category above, but it should be noted that they do not overlap, since a system could process data continuously with a (perhaps) considerable delay, or could process data in (small) batches in real-time [17]. 2.4.2.3 Source of Audit Data Regarding the different sources of event information used to detect intrusion, IDSs could be divided into different categories. These sources can be drawn from 25 different levels of the system, with network, host, and application monitoring most common [15]. Host-Based Intrusion Detection started in the early 1980’s before networks were as prevalent, complex and inter-connected as they are today. In the 1980’s it was common practice to review audit logs for suspicious and security relevant activity. Today’s host-based IDS still use various audit logs but they are much more automated, sophisticated, and real-time with their detection and responses. Host-based IDSs operate on information collected from within an individual computer system. (Note that application-based IDSs are actually a subset of hostbased IDSs.) This vantage point allows host-based IDSs to analyze activities with great reliability and precision, determining exactly which processes and users are involved in a particular attack on the operating system. Furthermore, unlike network-based IDSs, host-based IDSs can “see” the outcome of an attempted attack, as they can directly access and monitor the data files and system processes usually targeted by attacks [15]. Host-based IDSs normally utilize information sources of two types, operating system audit trails, and system logs. Operating system audit trails are usually generated at the innermost (kernel) level of the operating system, and are therefore more detailed and better protected than system logs. However, system logs are much less obtuse and much smaller than audit trails, and are furthermore far easier to comprehend [15]. The majority of commercial intrusion detection systems are Network-Based. These IDSs detect attacks by capturing and analyzing network packets. 
Listening on a network segment or switch, one network-based IDS can monitor the network traffic affecting multiple hosts that are connected to the network segment, thereby protecting those hosts. Network-based IDSs often consist of a set of single-purpose sensors or hosts placed at various points in a network. These units monitor network traffic, performing local analysis of that traffic and reporting attacks to a central management console. As the sensors are limited to running the IDS, they can be 26 more easily secured against attack. Many of these sensors are designed to run in “stealth” mode, in order to make it more difficult for an attacker to determine their presence and location [15]. Application-Based IDSs are a special subset of host-based IDSs that analyze the events transpiring within a software application. The most common information sources used by application-based IDSs are the application’s transaction log files. The ability to interface with the application directly, with significant domain or application-specific knowledge included in the analysis engine, allows applicationbased IDSs to detect suspicious behavior due to authorized users exceeding their authorization. This is because such problems are more likely to appear in the interaction between the user, the data, and the application [15]. The use of application semantics to detect more subtle attacks can be found in the literature since 1986. Since then, three different types of application-based IDSs have emerged [27]. In the first type, the IDS uses intercepted traffic going in and out of the application. In the second type, the IDS relies on third-party logs from Operating Systems, databases and firewalls. Finally, in the last type, the IDS directly uses internal application messages and library calls. Thus, the last group provides the possibility of bidirectional on-line interaction between the IDS and the application, and more precise IDS response and analysis [28]. A Hybrid IDS combines both Host-Based IDS and Network-Based IDS approaches and can also combine different detection methods. IDSs scan network traffic or also incoming and outgoing host traffic to find potentially malicious packets. Thus, they analyze packets at OSI layers 3 (Network) and 4 (Transport) but are unable to consider the semantics of application protocols like HTTP, for example. As a consequence, IDSs are usually ineffective to detect inside intruders who have access to more information than external intruders and may even be familiar to the security controls of the applications, but who could still be detected by closely inspecting the nature of their interactions within the applications [28]. 27 2.4.2.4 Response to Detected Intrusions Response may refer to the set of actions that the system takes once it detects intrusions. These are typically grouped into active and passive measures, with active measures involving some automated intervention on the part of the system, and passive measures involving reporting IDS findings to humans, who are then expected to take action based on those reports [15]. In [15], the active responses has got further categorized into three types: the first and most innocuous one is the collection of additional information about a suspected attack. 
It might involve increasing the level of sensitivity of information sources (for instance, turning up the number of events logged by an operating system audit trail, or increasing the sensitivity of a network monitor to capture all packets, not just those targeting a particular port or target system.) The second approach is the changing the environment which is to halt an attack in progress and then block subsequent access by the attacker. Typically, IDSs do not have the ability to block a specific person’s access, but instead block Internet Protocol (IP) addresses from which the attacker appears to be coming. The third and the last approach is the taking action against the intruder. The most aggressive form of this response involves launching attacks against or attempting to actively gain information about the attacker’s host or site. However tempting it might be, this response is ill advised. Due to legal ambiguities about civil liability, this option can represent a greater risk that the attack it is intended to block. The most conventional form of passive response consists of Alarms and Notifications which are generated by IDSs to inform users when attacks are detected. Most commercial IDSs allow users a great deal of latitude in determining how and when alarms are generated and to whom they are displayed [15]. 28 2.4.2.5 Locus of Data-Processing The audit data can either be processed in a central location, irrespective of whether the data originates from one-possibly the same-site or is collected and collated from many different sources in a distributed fashion [17]. 2.4.2.6 Locus of Data-Collection Audit data for the processor/detector can be collected from many different sources in a distributed fashion, or from a single point using the centralized approach [17]. 2.4.2.7 Security Respect to the ability of the system to withstand against the attacks to intrusion detection system itself, we can abstractly classify the IDSs into two classes of high and low scale. 2.4.2.8 Degree of Inter-Operability The degree to which the system can operate in conjunction with other intrusion detection systems, accept audit data from different sources, etc. [17]. 2.5 Intrusion Detection Systems using Data Mining Recently, there have been efforts to leverage the capabilities of variant Artificial Intelligence techniques in Intrusion Detection Systems. These techniques can lessen the human effort required to build IDSs and can get better their 29 performance. Moreover, Learning and induction are used to improve the performance of search problems, while clustering has been used for data analysis and reduction. In addition, AI has recently been used in Intrusion Detection for anomaly detection, data reduction and induction, or discovery, of rules explaining audit data [2]. Amongst AI techniques, Data Mining may be thought of as the most interesting one in accomplishment of intrusion detection. Data Mining refers to a collection of methods by which large sets of stored data are filtered, transformed, and organized into meaningful information sets [29]. There has been growing number of researches in application of Data Mining algorithms to different phases of intrusion detection mechanism [3-5, 30, 31]. Data Mining-based intrusion detection systems have demonstrated high accuracy, good generalization to novel types of intrusion, and robust behavior in a changing environment [6]. 
Figure 2.4 depicts the phases of Data Mining (adapted from Pei et al.: Data Mining Techniques for Intrusion Detection and Computer Security).

Figure 2.4 Data Mining Phases

Sample misuse detection systems that use Data Mining include the Java Agent for Meta-learning (JAM) [32], Mining Audit Data for Automated Models for Intrusion Detection (MADAM ID) [33], and Automated Discovery of Concise Predictive Rules for Intrusion Detection [34]. Applications of Data Mining to anomaly detection include Audit Data Analysis and Mining (ADAM) [35], Intrusion Detection Using Data Mining (IDDM) [36], MINDS [37] and eBayes [38]. In the following we briefly introduce some of these intrusion detection systems and show how each of them applies Data Mining techniques to the different phases of intrusion detection.

JAM uses Data Mining techniques to discover patterns of intrusion. It then applies a meta-learning classifier to learn the signatures of attacks. The association rules algorithm determines relationships between fields in the audit trail records, and the frequent episodes algorithm models sequential patterns of audit events. Features are then extracted from both algorithms and used to compute models of intrusion behavior. The classifiers build the signatures of attacks. So, essentially, Data Mining in JAM builds a misuse detection model [39].

The MADAM ID project [33, 40] at Columbia University is a powerful Data Mining framework for constructing intrusion detection models. It consists of classification and meta-classification programs, association rules and frequent episodes programs, and a feature construction system. The end products are concise and intuitive rules that can detect intrusions and can easily be inspected and edited by security experts when needed.

ADAM [41, 42] uses a combination of association rule mining and classification to discover attacks in a TCPdump audit trail. First, ADAM builds a repository of "normal" frequent item-sets that hold during attack-free periods. It does so by mining data that is known to be free of attacks. Secondly, ADAM runs a sliding-window, on-line algorithm that finds frequent item-sets in the last D connections and compares them with those stored in the normal item-set repository, discarding those that are deemed normal. For the rest, ADAM uses a classifier which has been previously trained to classify the suspicious connections as a known type of attack, an unknown type, or a false alarm [35].

The IDDM project aims to explore Data Mining as a supporting paradigm for extending intrusion detection capabilities. The project seeks to re-use, augment and expand on previous works as required, and to introduce new principles from Data Mining that are considered good candidates for this purpose. Rather than concentrating on the use of a particular technique in a certain application instance, the authors intend to explore multiple uses for any given Data Mining principle in a variety of ways [36].

The MINDS project [37] at the University of Minnesota uses a suite of Data Mining techniques to automatically detect attacks against computer networks and systems. Their system uses an anomaly detection technique to assign a score to each connection that indicates how anomalous the connection is compared to normal network traffic. Their experiments have shown that anomaly detection algorithms can be successful in detecting numerous novel intrusions that could not be identified using widely popular tools such as SNORT [43].
In [4], Data Mining techniques have been used to discover consistent and useful patterns of system features that describe program and user behavior. They also used the set of relevant system features to compute (inductively learned) classifiers that can recognize anomalies and known intrusions. The development of proposed framework in [4] consists of utilizing the auditing programs to extract an extensive set of features that describe each network connection or host session, and applying Data Mining programs to learn rules that accurately capture the behavior of intrusions and normal activities [30]. However, there are some shortcomings in application of Data Mining techniques to intrusion detection process. The following are some of the disadvantages of a Data Mining based IDS [44]. Data must be collected from a raw data stream and translated into a form that is suitable for training. In some cases data needs to be clearly labeled as “attack” or “normal”. This process of data preparation is expensive and labor intensive. Data Mining based IDS generally do not perform well when trained in a simulated environment and then deployed in a real environment. They generate a lot of false alarms and it can be quite labor intensive to sift through this data. 32 In order to overcome these problems, there is a need to develop methods and tools that can be used by the system security analyst to understand the massive amount of data that is being collected by IDS, analyze and summarize the data and determine the importance of an alert [44]. 2.5.1 Applicable Data Mining Algorithms to Intrusion Detection The recent rapid development in Data Mining has made available a wide variety of algorithms, drawn from the fields of statistics, pattern recognition, machine learning, and database. Several types of algorithms are particularly useful for mining audit data: Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends. For example, a classification model can be built to categorize bank loan applications as either safe or risky. In other word, classification maps a data item into one of several pre-defined categories. These algorithms normally output “classifiers”. A prediction model can be built to predict the expenditures of potential customers on computer equipment given their income and occupation. Some of the basic techniques for data classification are decision tree induction, Bayesian classification and neural networks. These techniques find a set of models that describe the different classes of objects. These models can be used to predict the proper class of an object for which the class is unknown. The derived model can be represented as rules (IF-THEN), decision trees or other formulae. An ideal application in intrusion detection will be to gather sufficient “normal” and “abnormal” audit data for a user or a program, then apply a classification algorithm to learn a classifier that can label or predict new unseen audit data as belonging to the normal class or the abnormal class [44]. Association analysis This involves discovery of association rules showing attribute-value conditions that occur frequently together in a given set of data. In a simple word, it determines relations between fields in the database records. This is 33 used frequently for market basket or transaction data analysis. 
For example, the following rule says that if a customer is in age group 20 to 29 years and income is greater than 40 K/year then he or she is likely to buy a DVD player. Age (X, “20–29”) & income(X, “>40 K”) => buys (X, “DVD player”) [support = 2%, confidence = 60%] Rule support and confidence are two measures of rule interestingness. A support of 2% means that 2% of all transactions under analysis show that this rule is true. A confidence of 60% means that among all customers in the age group 20–29 and income greater than 40 K, 60% of them bought DVD players [44]. In the context of intrusion detection and analysis of audit data, correlations of system features in audit data (for example, the correlation between command and argument in the shell command history data of a user) can serve as the basis for constructing normal usage profiles [44]. Sequence or path analysis models sequential patterns. These algorithms can discover what (time-based) sequence of audit events are frequently occurring together. These frequent event patterns provide guidelines for incorporating temporal statistical measures into intrusion detection models. For example, patterns from audit data containing network-based denial-of-service (DOS) attacks suggest that several per-host and per-service measures should be included [30]. Moreover, the mined frequent patterns are important elements for framing behavior profile of a user. Clustering The training of the normality model for anomaly detection may be performed by clustering, where similar data points are grouped together into clusters using a distance function. As a Data Mining technique, clustering fits very well for anomaly detection, since no knowledge of the attack classes is needed whilst training. Contrast this to classification, where the classification algorithm needs to be presented with both normal and known attack data to be able to separate those classes during detection [45]. In the intrusion detection literature there have been other researches in which a variety different clustering techniques have been used, for example [45-48]. 34 2.6 Database Intrusion Detection Systems Database management systems (DBMS) represent the ultimate layer in preventing malicious data access or corruption and implement several security mechanisms to protect data [49]. Traditional commercial implementations of database security mechanisms are very limited in defending successful data attacks. These traditional database protection techniques like authorization, access control mechanisms, inference control, multi-level secure databases, multi-level secure transactions processing, database encryption and etc. mainly address primarily how to protect the security of a database, especially its confidentiality. However, in practice, these techniques may be fooled by knowledgeable attackers who thwart the security mechanisms and gain access to sensitive data. On the other hand, authorized but malicious transactions can make a database useless by impairing its integrity and availability [50]. Neither network-based nor host-based IDSs can detect malicious behavior from users at the database level, or more generally, the application level, because they do not work at the application layer [51]. The inability of network-based intrusion detection systems in detecting the database intrusions is straightforward. Nevertheless, existing host-based intrusion detection systems use the operating system log or the application log to detect misuse or anomaly activities. 
These methods are not sufficient for detecting intrusions in database systems [52]. It can also be said that host-based intrusion detection fails at database intrusion detection because users who seek to gain database privileges will likely be invisible at the operating system level, and thus invisible to host-based intrusion detectors. Therefore, SQL injection [53] and other SQL-based attacks targeted at databases cannot be effectively detected by network-based or host-based IDSs [51]. Ideally, though, a database-specific IDS should work as a complementary mechanism to the existing network-based and host-based intrusion detection systems rather than replacing them [51].

When an attacker or a malicious user updates the database, the resulting damage can spread very quickly to other parts of the database through valid users. So quick and accurate detection of a cyber attack on a database system is the prerequisite for fast damage assessment and recovery [7]. Hence, there should be a mechanism in place to practically and efficiently survive successful database attacks, which can seriously impair the integrity and availability of a database. This may be thought of as the main motivation for the advent of Database Intrusion Detection Systems. Database IDSs try to detect, or possibly prevent, intrusions into RDBMSs, which are mainly accomplished by malicious transactions (i.e., transactions that access the database without authorization, or transactions that are issued by users who are authorized but abuse their privileges [54]) launched either by outsiders or by insiders such as disgruntled employees who misuse their privileges. Nowadays the greatest threats come from internal sources, which means that perimeter-based security solutions may not be enough. Additionally, most companies implement only network-based security solutions that are designed to protect network resources, despite the fact that the information itself is more often the target of the attack. Database intrusion detection systems identify suspicious, abnormal or downright malicious accesses to the database system [55]. Thus, the existence of a database intrusion detection system may be critical as part of a defense-in-depth security strategy. For this reason, much research has been conducted on database intrusion detection systems [8, 49, 56-59].

Explicitly categorizing database intrusion detection systems may not be that simple, since the criteria on which the different DB-IDSs are established vary considerably. Therefore, to thoroughly understand the database intrusion detection area, a comprehensive study is needed to construct a taxonomy of database intrusion detection systems that considers the different criteria and classifies DB-IDSs based on them. Constructing this taxonomy is beyond the scope of this project. However, some of these criteria may include:
i. types and sources of database attacks
ii. response to the detected attack
iii. detection strategy or analysis type (anomaly detection, misuse detection or hybrid approaches)
iv. in a 2- or 3-tier architecture, the layer on which the DB-IDS resides
v. sources of audit data that need to be analyzed, and also the data collection method (for example, [51] focuses on data acquisition methods tailored to the needs of database IDSs)
vi.
in anomaly detection systems the granularity of objects whose normal behavior are to be profiled (for example in [49], three abstraction levels are used to define the user profile representing his/her behavior in database.) In the literature review of database intrusion detection systems we faced variant systems in each of which a combination of some mentioned features and criterion had been applied, and it prevents us from simply classify the different DBIDSs into discrete and explicit categories. However in the following section we are going to holistically discuss about some of the researches in the area. Our main focus, however, will be on the database anomaly/misuse detection systems. The contribution of Data Mining techniques and intrusion detection warfare has resulted in a great number of researches which will be briefly discussed in the 2.5. However, apart from Data Mining, the other AI techniques like artificial neural network have also been of interest in Data Mining intrusion detection [60]. The contribution of disciplines like Artificial Immune System (AIS) with database intrusion detection has also studied in [61] and [62]. Moreover, as like the other types of intrusion detection systems (Network-based IDSs, for example), database IDSs may also be categorized in anomaly detection and misuse detection. These systems will be discussed in the following chapters. The challenge to which most database intrusion detection systems have encountered is that they assume the database system works in an isolated area. The 37 problem is that for a popular web site, it is nearly impossible to monitor access pattern for each individual user due to the great number of daily visitors. Also, the focus in systems, for example like [8], is mainly on isolated database system [56]. This fact has lead to the need for detecting intrusions against a web-based database service which is studied in [56] and [63]. In addition in [53], an application layer intrusion detection system is proposed in which an IDS sensor is situated at the database server that will detect SQL injection attacks. This sensor is specifically designed to inspect SQL statements. The concept of fingerprinting the database transaction is discussed in [55]. The presented technique characterizes legitimate accesses through fingerprinting their constituent SQL statements. These fingerprints are then used to detect illegitimate accesses. The system works by matching SQL statements against a known set of legitimate database transaction fingerprints. Subsequently, in [64] which is the complement of previous work, the author explore the various issues that arise in the collation, representation and summarization of potentially huge set of legitimate transaction fingerprints. Lee at al. in [57] used time signatures in discovering the intrusions in realtime database systems. If a transaction tried to attempt to update a temporal data item which is already updated within certain period, the systems detects it as an anomaly. 2.6.1 Database Intrusion Detection Using Data Mining As we mentioned in previous chapters, recently, researchers have started using Data Mining techniques in the emerging field of information and system security and especially in intrusion detection systems [65]. Variety different intrusion detection systems have applied Data Mining techniques especially in constructing the normal behavior of the system and user profile in anomaly detection 38 systems. 
Meanwhile, these techniques also have been used in database intrusion detection systems for building the normal working scope of different objects like database users. The other important application of Data Mining techniques in database intrusion detection system is discovering the data dependencies among data-items in the database. In [7], data dependency is defined as the data access correlations between two or more data items. The techniques employed use Data Mining approach to generate data dependencies among data items. These dependencies generated are in the form of classification rules, i.e., before one data item is updated in the database what other data items probably need to be read and after this data item is updated what other data items are most likely to be updated by the same transaction. Transactions that are not compliant to the data dependencies generated are flagged as anomalous transactions [7]. In [52] also, the identification of malicious database transaction is accomplished by using data dependency relationships. Typically before a data item is updated in the database some other data items are read or written. And after the update other data items may also be written. These data items read or written in the course of updating one data item construct the read set, pre-write set, and the postwrite set for this data item. The proposed method identifies malicious transactions by comparing these sets with data items read or written in user transactions [52]. In [7], the author has come up with a comparison between existing approach for modeling database behavior [66] and transaction characteristics [8, 57] to detect malicious database transactions. The advantage of their approach is that it is less sensitive to the change of user behaviors and database transactions. It is observed from real-world database applications that although transaction program changes often, the whole database structure and essential data correlations rarely change. In [6] an Database-centric Architecture for Intrusion Detection (DAID) is proposed for the Oracle Database. This RDBMS-centric framework can be used to 39 build, manage, deploy, score, and analyze Data Mining-based intrusion detection models. In [67] and subsequently [68] an elementary transaction-level user profiles mining algorithm is proposed which is based on user query frequent item-sets with item constraints. Srivastava, A., et al. in [65] and [23] propose an intrusion detection algorithm named weighted data dependency rule miner (WDDRM) for finding dependencies among the data items. The main idea is that in every database, there are a few attributes or columns that are more important to be tracked or sensed for malicious modifications as compared to the other attributes. The algorithm takes the sensitivity of the attributes into consideration. Sensitivity of an attribute signifies how important the attribute is for tracking against malicious modifications. 2.6.2 Database Anomaly Detection Systems Relational databases operate on attributes within relations, i.e., on data with a very uniform structure, which makes them a prime target for anomaly detection systems [69]. Generally, an anomaly-based IDS is a bi-modal system with a training mode and detection mode [53]. Speaking in more details, in database anomaly detection systems usually firstly an anomaly detector examines the regular state and behavior of a system and computes from them a set of reference data, which captures their characteristic properties. 
Then, the same computations are applied to the system in operation and the current set is compared with the reference set. Whenever the difference exceeds a specified threshold, the anomaly detector reports an anomaly, viz an unusual deviation [69]. Anomaly detection works best, i.e. produces the fewest wrong hints and alarms, on systems with clear patterns of regularity. The identification or extraction of these patterns is the most difficult task in the design of an anomaly detection 40 system (ADS) for networks and operating systems – with well designed relational databases many of them come for free [69]. DEMIDS [8] is a misuse detection system for database which is essentially based on this fact that the access patterns of users typically form some working scopes which comprise sets of attributes that are usually referenced together with some values. DEMIDS considers domain knowledge about the data structures and semantics encoded in a given database schema through the notion of distance measure. Distance measures are used to guide the search for frequent item-sets describing the working scopes of users. In DEMIDS such frequent item-sets are computed efficiently from audit logs using the database system's data management and query processing features [8]. The tool designed in [59] analyses the transactions the users execute and compares them with the profile of the authorized transactions that were previously learned in order to detect potential deviations. This tool, named IIDD - Integrated Intrusion Detection in Databases, works in two modes: transactions learning and intrusion detection. During transactions learning, the IIDD extracts the information it needs directly from the network packets sent from client applications to the database server using a network sniffer. The result is the directed graph representing the sequence of SQL commands that composes the authorized transactions. The learned graph is used later on by the concurrent intrusion detection engine [59]. The approach proposed in [58] for intrusion detection is based on mining database traces stored in log files. The result of the mining process is used to form user profiles that can model normal behavior and identify intruders. An additional feature of this approach is that the author couples the mechanism with Role Based Access Control (RBAC). The IDS is able to determine role intruders, that is, individuals that while holding a specific role, have a behavior different from the normal behavior of the role [58]. The main idea is that databases typically have very large number of users. Thus, keeping a profile for each single user is not feasible in practice. So they try to construct a normal behavior of database roles rather than 41 database users. Role profiles would be built using a classifier. This classifier is then used for detecting anomalous behavior. The proposed mechanism in [49] is based on anomaly detection and includes a learning phase and a detection phase. Very briefly, the database utilization profile is gathered as a first step to feed the learning phase. Once the database utilization profile is established, the information collected is used to concurrently detect database intrusions. Three abstraction levels are used to define the user profile representing his/her database activity: command level, transaction level, and session level. The intrusion detection is based on a set of security constraints defined at each of these three levels. 
2.6.2.1 Learning-Based Anomaly Detection Learning-based anomaly detection represents a class of approaches that relies on training data to build profiles of the normal, benign behavior of users and applications [63]. In [63] an anomaly-based system is developed for the detection of attacks that exploit vulnerabilities in Web-based applications to compromise a backend database. The approach uses multiple models to characterize the profiles of normal access to the database. These profiles are learned automatically during a training phase by analyzing a number of sample database accesses. Then, during the detection phase, the system is able to identify anomalous queries that might be associated with an attack. Although this system mostly focuses on detecting the database attacks originated from web-based systems, it still may be considered as database intrusion detection, but the one that is located at the higher level than database. 42 2.6.3 Hybrid Methods In systems like [69] a combination of anomaly detection and misuse detection methods as well as statistical functions have been applied to construct a 3componenet system for database intrusion detection. This work presents a system for the database extension and the user interaction with a DBMS; it also proposes a misuse detection system for the database scheme. In a comprehensive investigation the author compares two approaches to deal with the database extension, one based on reference values and one based on ∆-relations, and show that already standard statistical functions yield good detection results. The misuse detection nature of this system is due to the fact of storing a list of possibly dangerous commands in a library of signatures and comparing the current command to it. 2.7 Database Intrusion Prevention Systems Intrusion prevention is a proactive defense technique which is extension of intrusion detection. Intrusion prevention systems detect ongoing attacks in real time and stop the attacks before they succeed, thus, avoids damage caused by the attacks [54]. There also have been researches on database intrusion prevention systems that detect attacks caused by malicious transactions and cancel them timely before they succeed. Mattsson presented an intrusion prevention system for database in [50, 70]. His researches focus on monitoring database objects (such as tables, attributes, etc) access rates associating each user, if the access rates exceed the threshold , notifying the access control system to make the user’s request an unauthorized request before the result is transmitted to the user. In [54], the focus is on monitoring transactions rather than access rates, and proactive protection is based on atomicity of transactions rather than modification of users’ authorization. In [71] a framework is describes for highly distributed real-time monitoring approach to database security using Intelligent Multi-Agents. The intrusion 43 prevention system described in this paper uses a combination of both statistical anomaly prevention and rule based misuse prevention in order to detect a misuser. The Misuse Prevention System uses a set of rules that define typical illegal user behavior. A separate rule subsystem is designed for this misuse detection system and it is known as Temporal Authorization Rule Markup Language (TARML). 2.8 Summary In this chapter we studied the Intrusion Detection area in a top-down manner. First, we stated the different definition of IDS and related terms as well as a brief history about it. 
Next, we discussed why IDSs are a necessary part of security mechanisms. After that, we studied different taxonomies of Intrusion Detection Systems and the importance of studying them. We then examined several classifications of intrusion detection systems and the criteria on which these classifications are established. Subsequently, we discussed the application of Data Mining techniques in intrusion detection systems and showed how these techniques may improve the efficiency of IDSs. Finally, we studied intrusion detection methods at the database level and showed how scarce such systems still are in the current security landscape. In more detail, we reviewed existing database intrusion detection systems and the mechanism on which each of them works. In the next chapter we propose the methodology based on which we will conduct the project. The different phases of our methodology will be discussed in the respective sections.

CHAPTER 3

PROJECT METHODOLOGY

3.1 Introduction

In this chapter we explain the methodology through which we achieve the project objectives. Firstly, we briefly discuss the concept of methodology to give a better understanding of what is covered in this chapter. A methodology tends to govern, or at least limit, the range of choices as to how the data will be collected, how it will be analyzed, how results will be reported, and even the nature of the conclusions that may reasonably be drawn from the results [72]. There exist different categorizations of research types, each established based on a certain criterion. According to [73], the basic types of research are as follows:
i. Descriptive vs. Analytical
ii. Applied vs. Fundamental
iii. Quantitative vs. Qualitative
iv. Conceptual vs. Empirical
Based on the specification and requirements of each type of research, the methodology that needs to be followed differs. Well-known research methodologies may be customized and adapted by the researcher in order to appropriately address the requirements of a particular research. The current project is closest to empirical (experimental) research, for which we adapt a known methodology and describe it in the next section.

Figure 3.1 Project Methodology

3.2 Project Methodology

The purpose of this section is to detail how the project is conducted. The methodology of the proposed approach is based on Figure 3.1. Each step of this methodology is discussed in the following.

3.2.1 Analysis

In the first step, we study related works and analyze the mechanisms of similar systems in detail. The mechanisms proposed for database intrusion detection in the literature range from applying statistical approaches to leveraging artificial intelligence and Data Mining techniques. They also range from misuse detection, in which the decision is mainly based on the signatures of normal, legitimate transactions, to anomaly detection systems, which rely on the comparison between the current state and a pre-established profile of the normal behavior of the system. Depending on its objectives and scope, each system unavoidably takes certain constraints and assumptions into account. In this phase we aim to analyze the features and specifications of each system and derive its advantages and drawbacks. Using the result of this analysis, together with the project objectives, we design our system architecture in the next step.
3.2.2 Design

In this phase of the project we propose the architecture of the required system components. Based on what we learn in the literature review of similar approaches, we aim to design an efficient architecture that addresses the requirements for achieving the project objectives. The way the different components of our system connect to each other, as well as the data flow through these components, can dramatically affect the performance and efficiency of our solution. Hence, enough consideration should be given to the design and construction of the components, and to how the output of one component feeds the other parts of the system. The level of abstraction at which our system is supposed to work should also be determined in this phase. In addition, different implementation alternatives for data capturing can be followed, including approaches external to the database, such as using a proxy or a sniffer, or taking advantage of the database auditing features available in most DBMSs. According to our architecture specification, we choose an alternative for our data-capturing mechanism. Proposing an architectural design for a database intrusion detection system is the main aim of this project. Based on this architecture, different models might be implemented to address attacks in different database environments; in other words, the proposed DB-IDS architecture is scalable enough to be adapted to different environments.

3.2.3 Prototype Development

In this step we develop a prototype according to the architecture designed in the previous phase. The proposed model works based on the correlation and interaction between different components, and each component has a specific job. All these components will be constructed using built-in SQL Server 2005 capabilities such as SQL Server Jobs, SQL Server Trace, auditing, stored procedures and triggers. The scripts will be written in SQL (Structured Query Language).

3.2.4 Prototype Implementation and Testing

In this phase we aim to implement and test the model using a mock database system. We have to assume that during a certain period of time there has been no intrusion into the system. This enables us to construct the normal behavior of the different system objects. Here the system needs to receive some benign transactions issued by legitimate users. Once the behavior profiles of the database system are constructed, we try to intrude into the database system with some malicious transactions issued by, for example, insiders (a disgruntled user, for instance). A sample of the output of each component will be provided in this phase as well.

3.3 Summary

In this chapter we introduced the methodology that will be followed throughout the project. The next chapter deals with the design phase of the project. First, we give an overview of different approaches to database intrusion detection, specifically those closest to our proposed approach. Then, through the chapter, we separately discuss the different components of the system and show, step by step, how these components interact with each other.

CHAPTER 4

ANALYSIS AND DESIGN OF THE DB-IDS ARCHITECTURE

4.1 Introduction

In this chapter we first discuss the approaches of DB-IDS and explain how they are able to detect database intrusions. Then, we show how our model can improve database intrusion detection efficiency and scalability. The architecture of our database intrusion detection system is introduced subsequently. This chapter is organized as follows.
First we discuss the necessity of each component. Then we illustrate a figure showing that component along with its corresponding sub-components. Step by step we add further components and expand the architecture.

4.2 Approaches of Database Intrusion Detection

Even though database intrusion detection may seem to be a relatively new area, valuable approaches have recently been proposed by researchers to detect intrusions at the database level. These approaches vary from misuse detection, which is essentially based on transaction fingerprinting, to anomaly detection, in which the detection of an intrusion relies on the comparison between the normal behavior of the system objects and the current state.

Database anomaly detection systems derive the normal behavior of the system in different ways. For example, in [69], which combines misuse and anomaly detection, the anomaly detection component works based on the history of changes of the values of the monitored attributes between two runs of the ADS. Approaches like [7], [23] and [52] are based on the data dependency concept. However, a great number of database anomaly detection approaches focus on extracting the normal behavior of database users. The work most similar to ours is [74], in which the author derives profiles that describe the typical behavior of users and roles in a relational database system using the concepts of working scopes and distance measures. Similarly, the model proposed in [58] focuses on extracting the normal behavior of roles in RBAC-administered databases.

4.3 The Project Approach

Previous database intrusion detection systems and models try to profile the typical behavior of database subjects only, and not of objects. (In the SQL Server 2005 literature, database subjects and objects are called principals and securables respectively.) They are not designed to construct the normal behavior of objects (as opposed to subjects). It is believed that constructing profiles for both principals and securables could, at least in theory, improve the capability and quality of detecting malicious transactions. By securable profiling, we mean capturing the manner in which principals work with securables and query them. Besides principal profiling, capturing the way in which the securables are treated could also help us to detect intrusions more accurately and efficiently. We provide some examples to support this idea. But first we explain the intrusion detection mechanism of a system that functions only on the basis of principal profiles. Then we provide scenarios of database attacks that such a system is not able to detect.

When we capture the behavior of principals (database users or roles, for example), we are able to detect intrusions carried out by insider intruders, such as disgruntled or fired employees, as well as those committed by external intruders (outsiders). We need to assume that, for a time, the system works in an attack-free, benign environment. This period of time can be called the training phase. It enables us to safely construct the normal behavior of the system elements.
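To make the distinction concrete, the following is a minimal sketch of how the two kinds of profiles might be stored as ordinary relational tables. The table and column names are our own illustrative assumptions and are not prescribed by the architecture; the prototype described in later chapters may organize this information differently.

-- Hypothetical storage for principal (user/role) profiles: how often each
-- principal has logged in, broken down by application and host.
CREATE TABLE PrincipalLoginProfile (
    LoginName       sysname        NOT NULL,
    ApplicationName nvarchar(128)  NOT NULL,
    HostName        nvarchar(128)  NOT NULL,
    LoginCount      int            NOT NULL DEFAULT 0,
    CONSTRAINT PK_PrincipalLoginProfile
        PRIMARY KEY (LoginName, ApplicationName, HostName)
);

-- Hypothetical storage for securable (table/column) profiles: which
-- principals touch which tables, with which commands, and how often.
CREATE TABLE SecurableAccessProfile (
    TableName    sysname      NOT NULL,
    LoginName    sysname      NOT NULL,
    CommandType  varchar(10)  NOT NULL,   -- SELECT / INSERT / UPDATE / DELETE
    CommandCount int          NOT NULL DEFAULT 0,
    CONSTRAINT PK_SecurableAccessProfile
        PRIMARY KEY (TableName, LoginName, CommandType)
);

A transaction whose pattern of use falls outside the counts accumulated in such tables during the training phase becomes a candidate anomaly, as the scenarios below illustrate.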
After principal profiles are built, suppose an inside user – a fired manager, for example, who seeks revenge on his ex-boss – decides to attack the database system by exceeding his/her authorization. If what he/she is about to do lies beyond the scope of his/her behavior profile, this can be regarded as an attack and the DBA should be informed about it. Also imagine a situation in which a hacker (an external intruder) somehow gains access to the account information of a legitimate database user, for example by a brute-force attack or social engineering. By providing that account information, the intruder can log in to the database and maliciously modify some sensitive data (a data modification that is not part of the ordinary routine of that legitimate user). If the issued transactions are not within the borders of the normal behavior of that legitimate user, the intrusion detection system is able to detect those malicious transactions and send an alarm to the DBA.

Now suppose that the intruder hacks into a couple of database user accounts. In such a scenario the intruder can log in to the database several times, each time using one of the accounts, and perform a small part of his intrusion. In this situation, the deviation from the normal behavior of each user profile is too small to be detected as an anomaly by an intrusion detection system that keeps only principal profiles. However, if securable profiles are in place, the probability of detecting that kind of attack increases. In another scenario, as in database attacks like [10], the intruder gets access to the sa account (by brute force, for example), creates a user and grants all the necessary permissions to this new user. Then he logs in as the newly created user and maliciously updates some sensitive values. It is obvious that profiling user behavior alone cannot detect such database intrusions, since the newly created database user does not hold any profile of normal activities. So the system cannot flag any of the transactions issued by the new user as intrusions, since there is no benchmark to compare the transactions with. Yet the existence of a profile of the normal behavior of securables (here, the attribute) enables the system to detect this kind of intrusion. For example, if the intruder in our scenario tries to modify a value in an unusual manner that does not conform to the normal profile of that table, this modification may be considered a suspicious activity.

One may ask why RDBMS access control systems (Discretionary Access Control) are not sufficient for protecting the data residing in databases. There are a couple of answers to this question. First of all, according to [74], security officers very often do not use the available means to guard the information stored in the DBS because the security policies are not well known. Moreover, not all essential security policies can be addressed appropriately by the database access control system alone. No matter how accurate and rigid the association between security policies and the access control mechanism is, some gaps are unavoidably left, which can afterwards be exploited by an outsider or insider to attack the database. In addition, in a poorly designed database access control mechanism, some permissions may be neglected and not denied to certain users. It means that a database user may issue transactions that are actually unauthorized, while the access control system does not prevent him/her.
Simply put, suppose the security officer has inadvertently forgotten to deny some permission to specific users. Another possible scenario is that database designers often neglect to define data integrity constraints on tables. For example, in the underlying database of a web shop, the prices in a specific table may never be allowed to fall below $40 or exceed $400. Doubtlessly, this constraint can be enforced by database capabilities (using constraints in SQL Server). However, due to a poor design, such a constraint may not have been defined to prevent an intruder from maliciously modifying the values of this column. In practice, moreover, the values of this column may rarely get close to the border values; the range of values normally clusters around a certain value. So, if a transaction tries to update a value in this column abnormally, and the updated value is considerably far from the normal range, the transaction may be suspicious.

4.4 The Proposed DB-IDS Architecture

In this section we show the process of designing the architecture of the underlying components of our database intrusion detection system and show how they communicate with each other. The four basic components of our system are the Data Collector, Profiler, Detector and Responder. The structure of the proposed architecture is basically derived from [8]. However, we eliminate components from it and add components to it in order to address our project objectives. For example, we add the Responder component, which is responsible for gathering raised alarms and taking the necessary action. In the following sections we explain the job of each component in detail. We also show how the components interact with each other. As mentioned before in section 1.4, the proposed architecture can be adopted for any commercial RDBMS. However, the structure and arrangement of components and data flows may need to be customized according to the capabilities of that RDBMS. Nevertheless, we have assumed that the designed model would be implemented in MS SQL Server 2005. This assumption, however, may affect the structure of the proposed architecture.

4.4.1 Data Collector

Two different types of data acquisition methods for database IDSs are mentioned in [51], namely interception and built-in DBMS auditing. The anomaly-based IDS presented in [63] detects attacks exploiting vulnerabilities in web-based applications to compromise a back-end database. The IDS taps into the communication channel between the web-based applications and the database server. SQL queries performed by the applications are intercepted and sent to the IDS for analysis. However, the IDS relies on modifying the library of MySQL (an open-source database) that is responsible for the communication between the two parties. Thus, this approach cannot be generalized to commercial databases such as MS SQL Server. Most database IDSs in the literature use the native auditing functionality provided by the DBMS to collect audit data. This includes using the built-in auditing tools of the DBMS, manipulating the database log files and invoking DBMS-specific utilities [51]. DB-IDSs like [8], [55], [58], [69] and [71] all use the built-in auditing functionality to capture data. Jin et al. in [51] name some advantages and disadvantages of using built-in auditing capabilities. Advantages include: i. Easy To Deploy ii. Accessibility To More Comprehensive Information And disadvantages include: i. Impact On Performance ii.
Complex And Hard To Meet Individual Corporate Auditing Requirements iii. Control of the Database Implies Full Control of the Auditing Functionality

In our model, we apply the built-in auditing functionality of the database system as well as a custom method for collecting audit data that the built-in auditing mechanism is unable to provide. The Data Collector component is responsible for gathering the necessary data for building the system profiles. A set of interesting features, consisting of principals and securables to audit, is selected by the security officer, depending on the security policy to establish or verify. In this component we embed the built-in SQL Trace capability of SQL Server 2005 to capture the desired data. SQL Trace is a means that allows us to capture SQL Server events from the server and save those events in what is known as a trace file. Trace files are usually used to analyze performance problems; however, we can leverage this utility to monitor several areas of server activity, such as analyzing SQL statements and auditing and reviewing security activity. We are interested in events such as successful login, failed login, logout, add/drop login and schema object access. Afterwards, we derive the necessary data from the trace files and feed it into the Profiler component.

Figure 4.1 Data Collector Sub-Components

Nonetheless, SQL Trace is unable to provide everything we need to build system profiles. For example, if a table is updated, the old and new values are not captured in the trace files. Hence, we need to develop our own data-capturing sub-component, which is called the Auditor. This sub-component functions as an auditor that keeps track of DML (data manipulation language) statements. In the next chapter we explain how the Auditor component actually works, but in a nutshell, we set up a mechanism to capture the INSERT, DELETE and UPDATE statements against a table. A copy of any value inserted into or deleted from the table is exported to another table, to keep track of the state of the table before and after the transaction. Likewise, the old and new values in an UPDATE transaction are captured as well. Additional information, such as the login name that issued the transaction, a timestamp and the application from which the transaction is coming, is stored too. The data collected by the Tracer and the Auditor is then fed into the Profiler component.

Figure 4.2 Data Collector

SQL Trace can capture many different events, but we are not interested in all of them. Depending on business requirements, security policies and other criteria, the security officer decides which events need to be captured. In the Implementation chapter we list the important events that are worth capturing. Likewise, the auditing mechanism should be set up only for the sensitive elements of the database system. Undoubtedly, not all database tables and views, for example, require an auditing mechanism; we may only want to monitor certain tables, users, database roles and so on. In other words, there should be a set of policies in place to determine which database elements need to be profiled, based on their importance and sensitivity (Figure 4.2).
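As a minimal sketch of the Auditor idea, assume a hypothetical monitored table Employee (with columns EmployeeID and Salary) and a hypothetical audit table Employee_Audit; neither name is prescribed by the architecture, and the prototype in the next chapter may differ in detail. A DML trigger in SQL Server 2005 can record the before and after images of every change together with the issuing login, the application and a timestamp:

-- Hypothetical audit table holding before/after images of changes to Employee.
CREATE TABLE Employee_Audit (
    AuditID     int IDENTITY(1,1) PRIMARY KEY,
    ActionType  varchar(10)   NOT NULL,            -- 'INSERT', 'DELETE' or 'UPDATE'
    ImageType   varchar(3)    NOT NULL,            -- 'OLD' or 'NEW'
    EmployeeID  int           NULL,
    Salary      money         NULL,
    LoginName   sysname       NOT NULL DEFAULT SUSER_SNAME(),
    AppName     nvarchar(128) NULL     DEFAULT APP_NAME(),
    AuditTime   datetime      NOT NULL DEFAULT GETDATE()
);
GO

-- Trigger that records every DML statement issued against Employee.
CREATE TRIGGER trg_Employee_Audit
ON Employee
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    -- New images (inserted rows, or the new values of updated rows).
    INSERT INTO Employee_Audit (ActionType, ImageType, EmployeeID, Salary)
    SELECT CASE WHEN EXISTS (SELECT * FROM deleted) THEN 'UPDATE' ELSE 'INSERT' END,
           'NEW', EmployeeID, Salary
    FROM inserted;
    -- Old images (deleted rows, or the old values of updated rows).
    INSERT INTO Employee_Audit (ActionType, ImageType, EmployeeID, Salary)
    SELECT CASE WHEN EXISTS (SELECT * FROM inserted) THEN 'UPDATE' ELSE 'DELETE' END,
           'OLD', EmployeeID, Salary
    FROM deleted;
END;

The Tracer and the Auditor together therefore give the Profiler both the who/when/from-where view (trace events) and the what-changed view (old and new values) of database activity.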
4.4.2 Transformer

The Tracer produces the Audit Trace, and the Auditor produces data that we call Refined Logs. We use the term Refined Logs for data that is ready to be fed into the Profiler component. Since the Auditor sub-component is designed by ourselves, we can implement it in a manner that directly and immediately generates what we need to build the profiles. The Audit Trace, however, needs to be refined to provide us with clean data. Thus we need another component that takes the raw data produced by the Tracer and converts it to Refined Logs.

Figure 4.3 Transformer Component

As we can see in Figure 4.3, the Transformer component is responsible for deriving meaningful and clean data from the raw log files provided by the Tracer. Although we are able to configure the trace so that it captures only the desired events, dozens of unnecessary extra records are unavoidably collected, and that is why we put the Transformer in place. The Transformer is responsible for preprocessing the raw data in the audit logs and converting the raw data into appropriate data structures for the Profiler. More importantly, it groups the raw audit data into audit sessions. This is a critical step because the way audit data are aggregated into audit sessions determines which profiles are generated. For instance, the data can be grouped according to users or to the roles adopted by the users. User profiles are generated in the former case and role profiles in the latter [8].

4.4.3 Profiler

Refined Logs are data prepared to be analyzed for constructing the system profiles. This is accomplished by the Profiler component. The Profiler generates profiles for principals and securables. No matter what technique is applied to create the profiles, the necessary underlying data is prepared beforehand by the Data Collector module. We can look at the profile concept from different points of view; that is, the normal behavior of the system elements can be captured based on various criteria. What is considered a profile in one system may not be regarded as a profile in another. For example, the number of logins per day for each user could form part of that user's profile in one system, while it could be unimportant in another environment. We emphasize again that the most important objective of our project is to propose the architecture for a novel database intrusion detection system. The primary platform provided is intended to be scalable, so a variety of methods, ranging from statistical approaches to AI techniques such as Data Mining, or a combination of them, can be utilized to draw the working scope of each profiled element. Deriving different profiles from the data logs depends on the creativity of the security officer or DBA and also on the requirements of the system. Over time, the Profiler can be extended to encompass various types of profiles and become larger.

Figure 4.4 Profiler Component

Depending on the technique used, the profile structure varies. Applying Data Mining to profile the database objects and subjects results in data mining models. A data mining model, or mining model, can be thought of as a relational table. Each model is associated with a Data Mining algorithm on which the model is trained. Training a mining model means finding patterns in the training dataset by using the specified Data Mining algorithm with proper algorithm parameters. Model training is also called model processing. During the training stage, Data Mining algorithms consume input cases (Refined Logs in our terminology) and analyze correlations among attribute values. After training, the data mining model stores the patterns that the Data Mining algorithm discovered about the dataset [75]. In our terminology, the training dataset and the discovered patterns correspond to the Refined Logs and the profiles respectively. Fortunately, MS SQL Server 2005 introduces a built-in Data Mining feature.
This enables us to integrate the Data Mining-based profiling into our database intrusion detection architecture. Although it may affect the performance of the system, there are several obvious advantages. For instance, the data flow between the different modules of the system becomes much easier than if we had to export the dataset to an external module and send the patterns (profiles) back to the Detector component. There are a number of algorithms available in SQL Server 2005, including Decision Trees, Association Rules, Naive Bayes, Sequence Clustering, Time Series, Neural Nets and Text Mining [76]. As mentioned in section 2.5.1, Association Rules, Sequence Clustering and Decision Trees (a type of classification algorithm) are thought to be the most suitable Data Mining algorithms for intrusion detection. SQL Server Data Mining features are embedded throughout the process, are able to run in real time, and their results can be fed back into the process of integration, analysis or reporting [76]. Hence, it is believed that when a suspicious transaction is detected, an appropriate response can be triggered right away.

Statistical approaches are another technique for building the profiles. Spalka et al. in [69] apply basic statistical functions to the elements of single attributes to obtain reference values. They believe the statistical approach yields surprisingly good results, so they dropped their initial intention of applying Data Mining techniques to the database extension. The characteristics of the database system on which the IDS is running determine which methods are suitable for profiling. For example, the IDS proposed in [69] works best for databases in which deletions or updates of a large number of tuples occur only seldom. The profiling technique also varies between web-based applications (online shopping websites, for example), organizational systems (for instance a company intranet) and complex infrastructures (systems with huge databases and data warehouses). The association between database logins and real users is another criterion that affects the applied technique and the nature of the established profiles.

The term profile in our project is abstract; it can comprise different data structures. For example, a database user profile may consist of several database tables, views and mining models, each pertaining to a specific aspect of user behavior. The combination of these data structures forms the database user profile. For the purpose of this chapter, and to show some sample profiles, we mention a number of aspects of database user behavior as well as profiles related to database securables. In the next chapter we show in detail how to implement such profiles using SQL.

Principal-related profiles:
i. Total number of times each user has logged in to the system, per distinct application and distinct host. This profile tells us, for example, that database user Bob has so far logged in to the system 50 times using the SQLCMD application from the PC06-MSWin2008 host, 45 times using the OSQL application from the PC06-MSWin2008 host, and so on.
ii. Total number of times each user has logged in to the system per day. This profile tells us, for instance, that database user Bob logged in to the system six times on July 4 2009, eight times on July 5 2009, and so on.
iii. Total number of times the members of each database role have logged in to the system per day. This profile tells us, for example, that the members of the db_datawriter database role logged in to the system 32 times in total on July 4 2009, and so on.
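Before turning to the securable-related profiles, the following sketch shows how the first two principal-related profiles above might be materialized with plain SQL. It assumes a hypothetical RefinedLoginLog table with columns LoginName, ApplicationName, HostName and LoginTime produced by the Data Collector and Transformer; the names are illustrative assumptions, not fixed by the architecture.

-- Profile (i): number of logins per user, per application and host.
SELECT  LoginName, ApplicationName, HostName,
        COUNT(*) AS LoginCount
FROM    RefinedLoginLog
GROUP BY LoginName, ApplicationName, HostName;

-- Profile (ii): number of logins per user per day
-- (CONVERT style 120 yields 'yyyy-mm-dd', suitable for SQL Server 2005).
SELECT  LoginName,
        CONVERT(varchar(10), LoginTime, 120) AS LoginDay,
        COUNT(*) AS LoginCount
FROM    RefinedLoginLog
GROUP BY LoginName, CONVERT(varchar(10), LoginTime, 120);

Profile (iii) is analogous, with each login joined to the role memberships of the corresponding user before grouping by role and day.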
Securables related profiles: 62 i. Total number of DELETE/INSERT/UPDATE/SELECT commands issued on each table by database users. This profile tells us for example database user Bob has issued 34 DELETE commands on Employee table. In a dynamic environment, system profiles need to be kept updated. After the initial training phase of the system in which the primary profiles are built, over the time we should take the natural changes in the system behavior into the account and renew the profiles in a regular basis. Essentially we need to compare the current state of the system with what reflects the normal behavior of the system. Thus, we should guarantee that system profiles present the actual normal behavior of the system. Otherwise, it could be resulting in false positives (Anomalous activities that are not intrusive are flagged as intrusive [1]) and false negatives (events are not flagged intrusive, though they actually are [1]). Therefore, the update mechanism must be in place. We can for example let the current activities be merged into the existing profile if no intrusive activity is detected. Also the process might be done manually by DBA. Means, the DBA could inspect the activities during the last n minutes/hours/day for example, and discretionary combine them with Refined Logs which are information ready to be fed into the Profiler. However, if any intrusion remains undetected, it could spoil the profiles since it unknowingly would be spread into the Refined Logs. That is why the Anomaly Detection Systems are thought to be prone to the error. 4.4.4 Detector Detector is responsible for identifying the suspicious activities in database system. It could be considered as the most important component in our architecture. As mentioned in previous chapters, there are two types of intrusion: anomaly and misuse. Thus the Detector component is divided into two sub-components: anomaly 63 detector and misuse detector (Figure 4.5). In the following we restate some advantages and disadvantages of anomaly detection and misuse detection systems. The main advantage of anomaly detection is that it does not require prior knowledge of intrusion and can thus detect new intrusions. The main drawback is that it may not be able to describe what the attack is and may have false positive rate [3]. In other word, the generated alarms by the system are meaningless because generally they cannot provide any diagnostic information (fault-diagnosis) such as the type of attack that was encountered. Means, they can only signal that something unusual happened [16]. However, one of the benefits of this type of IDSs is that they are capable of producing information that can in turn be used to define signatures for misuse detectors [15]. Figure 4.5 Detector Sub-Components The main advantage of misuse detection is that it can accurately and efficiently detect instances of known attacks. In addition, despite of anomaly detection system, the alarms generated by misuse detection systems are meaningful e.g., they contain diagnostic information about the cause of the alarm [16]. The main disadvantage is that it lacks the ability to detect the truly innovative (i.e. newly invented) attacks [3] as well as those whose attack patterns are not available. The database of attack signatures needs to be kept up-to-date, which is a tedious task because new vulnerabilities are discovered on a daily basis. 64 Figure 4.6 Anomaly Detector Component As we can see both anomaly and misuse detection has benefits and drawbacks. 
Yet, applying both of them could – at least in theory – result in efficient and more accurate detection of intrusions. In the following we explain each subcomponent in more details. 4.4.4.1 Anomaly Detector Basically, anomaly detector job is identifying any activity with considerable deviation from pre-established profiles. These abnormal activities could be thought of as potential attacks. The comparison method depends on the time of detection 65 (real-time vs. non-real-time), profile structure and what is considered as the current state of the system which is supposed to be compared with profiles. Plainly the comparison methods that need heavy computational works and several resources could not be run in real-time. Instead, it could be happen in idle times of the system or at the end of the working hour. If Data Mining approach is applied for profiling, the comparison between profiles (pattern) and recent transactions (case) is named model prediction in Data Mining terminology. In many Data Mining projects, finding patterns is just half of the work; the final goal is to use these models for prediction. Prediction is also called scoring. To give predictions, we need to have a trained model and a set of new cases [75]. Prediction can tell us whether new cases (recent events) are conforming to patterns or not. For example if the classification algorithm is used, model prediction specifies whether transaction is classified as an attack or not. In the real-time and near-real-time anomaly detection, we need to compare the coming transactions with profiles as soon as they reach to database engine or with a small detail. As illustrated in Figure 4.6 there exist a connection between Audit Trace and Anomaly Detector component. Here Audit Trace indicates the coming transactions reaching to database engine. It is necessary to determine in which intervals we plan the anomaly detection takes place. Depending on applied techniques and the sensitivity of anomalies there could be couple of intervals. Anomaly detector may consist of several sub- components, each of which responsible for detecting specific type of anomaly. Thus, these sub-components (let’s say stored procedures for instance) could be run in different intervals such as every n minute, every n hour, every day and so forth. The more sensitive anomaly, the faster we want it to be detected. However in cases that detection process is not real-time, we are still interested in detecting the anomalies. It enables us to analyze the attack, inspect the attack propagation, investigate data damages and recover corrupted data by tracking 66 the audit trace files and log files. All is needed to do so have been recorded by Data Collector component beforehand. For the purpose of this project and to demonstrate how the anomaly detector works, in next chapter we develop some sub-components of anomaly detector of DBIDS prototype through SQL stored procedures. These modules compare the recent events with profiles every two minutes and identify the anomalies. Additionally, some anomaly detector sub-components run once a day. In the following we name some sub-components of anomaly detector module which work based on the sample profiles provided in previous section, i. Find Suspicious Login: If one user logs in to the system for the first time from/using an odd host/application, it could be thought of as a suspicious login. Find Suspicious Login procedure is responsible for detecting such logins. 
Suppose that database user Bob always logs in to the system using Microsoft SQL Server Management Studio or the SQLCMD command-line tool from the PC06-MSWin2008 host. If one day Bob uses the OSQL command-line tool to log in, it is considered a suspicious login which needs the DBA's attention. It is probable that Bob's account information has been stolen and a hacker is using his account to log in, with an application that Bob has never used before.

ii. ExceededNumOfLgns: If a database user logs in to the system more than the allowable number of times within a day, it could be considered a suspicious activity. We can set the threshold statically or calculate the average number of times each user logs in to the system per day. This anomaly would be detected by the ExceededNumOfLgns stored procedure.

4.4.4.2 Misuse Detector

Essentially, the misuse detector's job is detecting database intrusions based on attack patterns. A meaningful sequence of events, commands and statements could indicate a database misuse, even though each event is immaterial if considered separately. This meaningful sequence forms a database attack pattern. Unfortunately, database attack patterns have not been studied as much as network attack patterns. Network traffic is generally based on the TCP/IP protocol, which enables researchers to easily model network attack patterns. Different network IDSs are then able to share the detected intrusion patterns and therefore function more efficiently. Antivirus software is another type of security system that works based on attack patterns; it could be considered a type of host-based IDS. However, for database misuse detection, no such unified approach for standardizing the attacks exists. Nonetheless, in this project we embed a simple database attack pattern repository in the DB-IDS architecture.

The misuse detector compares the sequence of incoming events and statements with the attack patterns. As a matter of fact, we define a table of events which, taken individually, seem unimportant. It could include an unexpected number of deletions/insertions/updates/selects on a table, login/role creation, login/role deletion, database drop, table drop and so forth. Then, we define a specific suspicious sequence of events as a probable database misuse. For example, the following sequence could likely be a database attack:

Considerable number of failed logins for sa > sa successfully logs in to the system > sa creates the login Alice > Alice logs in to the system > Alice drops a database table

As illustrated in Figure 4.7, the Misuse Detector component has two inputs: one from the Audit Trace and the other from the Attack Patterns. The connection between the Misuse Detector and the Attack Patterns is straightforward. However, here, by Audit Trace we mean the incoming stream of transactions and events. The attack pattern repository could be populated in two ways: either manually by the DBA, or based on what is reported by the Responder component as an attack. As illustrated in Figure 4.7, there also exists a connection between the Policy and Attack Patterns data repositories. This connection indicates that attack patterns could be defined manually by the DBA, for example. A specific sequence of events may reflect an attack in one system, while being natural in another. The attack pattern repository would therefore be populated at the discretion of the DBA and based on the security requirements and policies of the system.
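The sketch below shows one way the sample sequence above could be stored in such a repository. The table name, column names and event-type labels are hypothetical and purely illustrative; they are not the encoding used by the prototype.

-- Illustrative sketch of an attack pattern repository (names are hypothetical).
CREATE TABLE AttackPatterns
(
    PatternID   INT          NOT NULL,
    StepNo      INT          NOT NULL,
    EventType   VARCHAR(50)  NOT NULL,   -- e.g. 'FailedLogin', 'CreateLogin', 'DropTable'
    Description VARCHAR(200) NULL,
    PRIMARY KEY (PatternID, StepNo)
);

-- The example attack sequence above, stored as the steps of one pattern
INSERT INTO AttackPatterns VALUES (1, 1, 'FailedLogin', 'Repeated failed logins for sa');
INSERT INTO AttackPatterns VALUES (1, 2, 'Login',       'sa successfully logs in');
INSERT INTO AttackPatterns VALUES (1, 3, 'CreateLogin', 'sa creates a new login');
INSERT INTO AttackPatterns VALUES (1, 4, 'Login',       'The new login connects');
INSERT INTO AttackPatterns VALUES (1, 5, 'DropTable',   'The new login drops a table');

The misuse detector would then try to match the incoming stream of events against the stored steps of each pattern, in order.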
Figure 4.7 Misuse Detector Component

Moreover, with the support of the anomaly detector component, new database attacks could be identified and translated into a format understandable by the attack pattern repository. This is accomplished via the Responder component. Either manually by the DBA or with the support of the Anomaly Detector component, the attack pattern repository needs to be kept updated to address new attacks. The update mechanism should be defined; it could take place every n minutes or daily, depending on the severity of the attacks. Since it considerably affects the performance of the system, the attack pattern repository cannot be updated frequently. This is because the incoming events must be compared with all the patterns, so we need to keep the pattern table in memory to facilitate the comparison.

We stated before that one of the objectives of our project is providing an approach for database policy revision. These are mainly database security policies which pertain to the discretionary access control system. We also embed the sub-components responsible for database policy revision in the Misuse Detector module. One of these sub-components is the FndPssvLgns stored procedure, which is responsible for identifying those database users who have not logged in to the system at all within the day. At the end of the working hours, the list of those users will be created and stored. It helps the DBA to find out which users are passive; the login accounts of those users could be disabled if necessary.

Additionally, another sub-component embedded into the Misuse Detector is BrtFrcDtctr. As the name indicates, it is responsible for detecting brute force attacks against the system. In many real-life scenarios like [10], the intruder uses brute force tools to hack into the system. Therefore it is essential to detect such attacks to stop further corruption. BrtFrcDtctr, which would be implemented using SQL stored procedures, raises an alarm if it notices that a number of failed logins has occurred within a period of time. The threshold could be set by the DBA. The DBA is also able to configure BrtFrcDtctr so that it disables the suspicious login account. So, BrtFrcDtctr not only can detect the brute force attack, but is also able to function as a database intrusion prevention system which cooperates with the database access control system.

4.4.5 Responder

The Responder component is responsible for taking the necessary action against a detected intrusion. In a very simple form, the Responder just acts as a monitoring center which holds the specifications of attacks: information like the associated login name, host, application, timestamp and so forth. The DBA can regularly check the Alert table and take countermeasure actions.

Figure 4.8 Responder Component

The Misuse Detector and Anomaly Detector send the information about detected intrusions to the Responder component. The sub-components of both the Misuse and Anomaly Detector are tuned to report the information about detected intrusions in a unified format understandable by the DBA. Different levels of severity could be assigned to the intrusions, and based on the severity of each intrusion an appropriate response might be chosen against it. For example, in case of dangerous attacks like brute force, as discussed before, the Responder could talk to the database access control system to disable the login (notice the connection between the Responder and the database access control system in Figure 4.8).
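As a minimal sketch of such a disable action, the procedure below uses the standard SQL Server 2005 statement ALTER LOGIN ... DISABLE; the procedure name and its parameter are hypothetical, and a real Responder would of course also record the action.

-- Sketch of a Responder action that disables a suspicious login.
-- ALTER LOGIN ... DISABLE is the standard SQL Server 2005 statement;
-- dynamic SQL is used because ALTER LOGIN does not accept a variable name.
CREATE PROCEDURE DisableSuspiciousLogin
    @LoginName SYSNAME
AS
BEGIN
    DECLARE @sql NVARCHAR(200);
    SET @sql = N'ALTER LOGIN ' + QUOTENAME(@LoginName) + N' DISABLE';
    EXEC (@sql);
END;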
Afterward, even if the correct password is provided, that user would not be able to logs in to the system since it has been disabled. Another type of respond to the intrusions could be enabling the higher level of auditing, C2 auditing for example. C2 auditing allows DBAs to meet U.S. government standards for auditing both unauthorized use of and damage to resources and data. It records information goes beyond server-level events, such as shutdown or restart, successful and failed login attempts, extending it to successful and failed use of permissions when accessing individual database objects and executing all Data Definition, Data Access Control, and Data Manipulation Language statements [77]. Having this information in place facilitates and accelerates the attack inspection and data recovery. The audit information contains the timestamp, identifier of the account that triggered the event, target server name, event type, its outcome (success or failure), name of the user's application and Server process ID of the user's connection and name of the database. However, the main limitation of the auditing is that it reduces the performance of the SQL Server. This happens due to saving the every action to the file. Second limitation is the hard disk space. These auditing files grow rapidly, which will reduce the disk space. According to the C2, if it is not able to write to the trace file, SQL Server will be shutdown. 72 We can also utilize the E-mail functionality for sending the attack information and notifications to DBA via email. Almost all commercial DBMSs support SQL Mail concept. SQL Server 2005 introduces Database Mail, which is SMTP based, rather than MAPI based [78]. In the following we highlight the mail features of Database Mail: i. Database Mail can be configured with multiple profiles and multiple SMTP accounts, which can be on several SMTP servers. In the case of failure of one SMTP server, the next available server will take up the task of sending e-mails. This increases the reliability of the mailing system. ii. Mailing is an external process so it does not reduce our database performance. This external process is handled by an executable called DatabaseMail90.Exe located in the MSSQL\Binn directory. iii. Availability of an auditing facility is a major improvement in Database Mail. Formerly, DBAs could not verify whether the system had sent an e-mail. All mail events are logged so that DBAs can easily view the mail history. In addition, DBAs can view the errors to fix SMTP related issues. Plus, there is the capability to send HTML messages. We can configure the Responder component to address several types of responds. DBA decides what responds need to be defines, and also assigns each of them to specific level of severity. For example, severity of critical cases like brute force attack or disabling the audit trace is reasonable to be the highest. The appropriate respond to such attacks could be minimizing or even disabling the authority of launcher the commands. Less severe events might only be reported to DBA by email. However, all these configurations should be accomplished by DBA based on the security requirements of the system, or according to security or database use policies (Notice to the connection from Policy box to Responder in Figure 4.8). 73 In addition, as mentioned before, our DB-IDS help to revise the database security policies. As a matter of fact using the information provided by Responder, we realize which policy need revision. 
The connection from the Responder to the Policy box in Figure 4.8 abstractly indicates this matter.

4.5 Overall Design

The final schema of the proposed architecture for the DB-IDS is illustrated in Figure 4.9.

4.6 Summary

In this chapter we explained the process of designing the DB-IDS architecture. The components of our architecture are the Data Collector (including Tracer and Auditor), Transformer, Profiler, Detector (including Anomaly Detector and Misuse Detector) and Responder. As we can see in Figure 4.9, different users connect to the database server via different applications; however, all transactions reach the database engine. The necessary data for system profiling would be collected by the Data Collector component. System/database use/security policies specify which subjects and objects are worth profiling. The Tracer utilizes the built-in tracing facilities of commercial DBMSs. However, we need to develop our own Auditor sub-component to capture those data which cannot be collected by the Tracer. The Auditor is responsible for gathering the data values in DML statements; since it would be developed by ourselves, we can tune it to capture only what we need. In any case, the data collected by the Tracer, which is called the Audit Trace, requires transformation to become understandable for the Profiler. This is the job of the Transformer. The Profiler component derives the characteristics of the normal behavior of database subjects and objects from the Refined Logs and generates the Profiles. Profiles reflect the attack-free state of the system; they are considered a benchmark for the Anomaly Detector component.

Generally, the Detector component is responsible for detecting malicious activities in the database system. The Detector is a twofold component: Anomaly Detector and Misuse Detector. The Anomaly Detector discovers previously unknown attacks; it identifies any activity with significant deviation from the profiles. The Misuse Detector works based on attack patterns: specific sequences of events in the database system form an attack pattern or signature. The attack pattern database might be populated manually or automatically.

The detection process could take place at different intervals. In real-time or near-real-time intrusion detection, the Detector needs to inspect the recent transactions against the system profiles; in non-real-time intrusion detection, it could take place after working hours, for example. In either case, we need to guarantee that the profiles are up to date and truly reflect the benign state of the system. In other words, we should prevent attacks from leaking into the profiles and corrupting them.

Information about database intrusions detected by the Detector component would be reported to the Responder. A set of administrative policies specifies what action needs to be taken against the detected attacks. Several types of responses could be defined according to the severity and criticality of the attacks. For example, the Responder may cooperate with the database access control system to revoke the authorization of a malicious insider and prevent further damage. Additionally, the information provided by the Responder could help us to revise system/database use/security policies.

In the next chapter, we show the implementation of our DB-IDS prototype. All components and sub-components would be developed using the built-in means of MS SQL Server 2005.

Figure 4.9 Architecture of the DB-IDS

CHAPTER 5

PROTOTYPE DEVELOPMENT, IMPLEMENTATION AND TESTING

5.1 Introduction

In this chapter we are going to demonstrate the implementation of the DB-IDS prototype.
The aim of the prototype implementation is to show how the various components and sub-components of our DB-IDS could be established and communicate with each other. The prototype is implemented in MS SQL Server 2005; nonetheless, the architecture could be adapted for other commercial DBMSs. We have used underlying database objects such as tables, views, stored procedures, functions and triggers to build the components. Moreover, we have utilized SQL Server Agent Jobs to iterate the detection process. The implemented prototype is a simplified model of the DB-IDS: we apply simple statistical techniques for building the profiles, and the Responder component in this prototype functions as a monitoring module that the DBA can check in order to take the necessary action. In the following we explain each component and its associated database objects step by step.

5.2 Data Collector

For constructing the Data Collector component we apply the server trace utility in MS SQL Server 2005. Also, using triggers, we develop the Auditor sub-component. In the following we go into the details of building these two sub-components of the Data Collector.

5.2.1 Tracer and Audit Trace

We are interested in capturing several server events, from which we can later derive the system profiles. In Appendix A, we list some of the important events worth capturing. Most of these are security-related events; however, the DBA might choose a selection of these events according to business requirements. Besides choosing specific events, we can also select which columns to capture. Generally we are interested in the columns StartTime, EndTime, Duration, EventClass, TextData, SPID, ClientProcessID, DatabaseName, LoginName, DBUserName, NTUserName, NTDomainName, HostName and ApplicationName, amongst others. We can even filter the columns so that specific values are, or are not, captured. The finer and cleaner the audit trace, the more efficiently the Profiler can derive the profiles. For example, we can set a filter to exclude the events of the tempdb, ReportServer, mssqlsystemresource and msdb databases, since most rows related to these system databases are immaterial for us. In Figure 5.2 we illustrate a part of an audit trace. We store the audit trace in a table named LogRepository. We assume that the data in the LogRepository table is attack-free, which enables us to derive the profiles from this table.

5.2.2 Transformer

As we said before, the job of the Transformer is converting the raw audit trace into a format understandable by the Profiler. For the purpose of this prototype, we only intend to derive the transactions of the sessions in sequential order. The MakeSession stored procedure draws the sessions from the LogRepository table and puts them into the LogRepositorySessions table. This table tells us what has been done in each session. In Figure 5.1 we can see two sessions in red boxes. Event classes 14 and 15 indicate login to the system and logout from the system respectively, so we can observe what the database user has done from login until logout.

Figure 5.1 A portion of the LogRepositorySession Table

5.3 Auditor

We have implemented the Auditor using database triggers. Suppose that we want to audit the MovieClick.dbo.Customers table. We simply pass the table name to the MkAdtTbl stored procedure as a parameter. It then creates the audit_MovieClick.dbo.Customers table and the tr_MovieClick.dbo.Customers trigger for us. As a matter of fact, the audit_*.* tables are considered as Refined Logs (Figure 4.9).
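Before describing what the generated trigger records, here is a simplified sketch of the kind of trigger MkAdtTbl might generate, shown for a hypothetical dbo.Customers(CustomerID, Name) table; only a subset of the audit columns described in the next paragraph is included, so this is an illustration of the idea rather than the generated code itself.

-- Simplified sketch of an Auditor trigger (table and column names are hypothetical).
CREATE TRIGGER tr_Customers ON dbo.Customers
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    -- copy the new values (Insert/Update) into the audit table
    INSERT INTO dbo.audit_Customers (CustomerID, Name, audit_timestamp, audit_login, audit_value_type)
    SELECT CustomerID, Name, GETDATE(), SUSER_SNAME(), 'New'
    FROM   inserted;

    -- copy the old values (Delete/Update) into the audit table
    INSERT INTO dbo.audit_Customers (CustomerID, Name, audit_timestamp, audit_login, audit_value_type)
    SELECT CustomerID, Name, GETDATE(), SUSER_SNAME(), 'Old'
    FROM   deleted;
END;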
The trigger is responsible for inserting the data values of DML statements (Delete, Insert and Update) into the audit_MovieClick.dbo.Customers table.

Figure 5.2 Audit Trace Sample

The structure of the audit_*.* tables is similar to that of the underlying table, plus some extra columns for recording additional information about the transaction. These columns are [audit_timestamp], [audit_appname], [audit_terminal], [audit_login], [audit_user], [audit_statement] and [audit_value_type]. They tell us, respectively, the time of the transaction, the name of the application from which the transaction was issued, the host name, the login name that issued the transaction, the database user associated with that login name, the audit statement (Delete, Insert, Update) and the audit value type (New or Old). If a row is inserted into the table, a copy of the inserted values is sent to the audit_*.* table; the audit value type for an Insert transaction is New, indicating that a new row was inserted into the table. If a Delete command is issued, a copy of the deleted values is sent to the audit_*.* table; the audit value type in such a case is Old. For Update transactions, two rows are sent to the audit_*.* table: the old values as well as the new values. In Figure 5.3 and Figure 5.4 we can see a sample table with its associated audit table. Having the audit_*.* tables in place enables us to recover damaged data if a database attack causes data corruption.

Figure 5.3 A Sample Table (MovieClick.dbo.Movies)

Figure 5.4 A Sample Audit Table (Audit_MovieClick_Movies)

Another table which is a part of the Refined Logs is AuditUni. This table is actually the union of the audit_*.* tables, but without the data values.

Figure 5.5 A Cut of the AuditUni Table

The AuditUni table tells us who, when, using which application and from which host issued which type of statement on which table of which database. This table plays an important role in building user profiles: it tells us which types of statements each user mostly issues on which tables.

5.4 Profiler and Profiles

In this section we introduce several profiles of the system. These profiles are mainly built based on the LogRepository table. In the following, we explain each profile in detail and show how it reflects the normal behavior of the system. However, the profiles are not limited to what we mention here; in different systems, various profiles could be established depending on the characteristics and requirements of the system.

5.4.1 Subject Profiles

Subject profiles are supposed to reflect the normal behavior of database users and roles. In the following we introduce several subject profiles. These profiles are grouped on a daily basis; however, they could be grouped on a weekly or monthly basis, depending on the business requirements.

GnrlLgnPrfl (General Log Profile) is simply a view on the LogRepository table. It tells us how many times each login has logged in to the system from each distinct host and using each distinct application. For instance, as we can see in Figure 5.6, sa has logged in to the system 4 times from the SHARAGIM host via the SQLCMD command-line tool.

Figure 5.6 General Log Profile

NumberOfLoginsPerDay (Number of Logins per Day) is another view on the LogRepository table. It specifies how many times each user has logged in to the system per day. The average number of logins (or any other metric reflecting the behavior of a database user regarding the number of logins per day) could be derived from this profile.
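A minimal sketch of how such a view could be defined over the LogRepository table is shown below. It assumes the stored trace keeps the column names listed in section 5.2.1, and it uses event class 14 (Audit Login) to identify login events; the view defined in the actual prototype may differ in detail.

-- Sketch of a logins-per-day profile over the stored audit trace.
CREATE VIEW NumberOfLoginsPerDaySketch
AS
SELECT  LoginName,
        CONVERT(VARCHAR(10), StartTime, 120) AS LoginDate,   -- yyyy-mm-dd
        COUNT(*)                             AS NumberOfLogins
FROM    LogRepository
WHERE   EventClass = 14                                      -- Audit Login events
GROUP BY LoginName, CONVERT(VARCHAR(10), StartTime, 120);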
The appropriate anomaly detection sub-component can then discover whether the number of logins for a specific user within a day conforms to the expected value or not.

Figure 5.7 NumberOfLoginsPerDay Table

Besides the number of logins per day, the total amount of time each user has been logged in to the system is also important for us. The TtlLgnTm (Total Login Time) stored procedure returns a table indicating the total login time for each user per day. As we can see in Figure 5.8, for instance, sa was logged in to the system for 2 hours on February 1 2009.

Figure 5.8 TtlLgnTm Table

If the total time a user has been logged in to the system within a day is more or less than the expected value, it could be thought of as an anomaly. An appropriate anomaly detector sub-component might be assigned to detect that kind of anomaly.

NumberOfLoginsPerDayForDBRoles (Number of Logins per Day for Database Roles) is a stored procedure that returns a table specifying how many times the members of each database role have logged in to the system per day in total. This helps us to model the behavior of roles, which are considered a type of subject.

Figure 5.9 NumberOfLoginsPerDayForDBRoles Table

A database role name is unique only within its database; this is why we can see db_datawriter in both the MyProject and AdventureWorks databases. The table tells us, for example, that the members of the db_ddladmin database role of the MyProject database logged in to the system 12 times in total on February 1 2009, and so forth.

NumberOfLoginsPerDayForServerRoles (Number of Logins per Day for Server Roles) is similar to the previous profile. The only difference is that it returns the total number of logins of the members of each server role per day. Server roles are another type of role in SQL Server 2005, defined at the level of the server. A sample profile is illustrated in Figure 5.10.

Figure 5.10 NumberOfLoginsPerDayForServerRoles Table

The number of select, insert, delete and update statements a user issues is considered an important part of his/her behavior in the database system. Based on these values, we can monitor how many statements of each type the user is issuing and raise an alert if the amount of a specific statement type does not conform to the normal behavior. In the following we describe several profiles pertaining to the number of statements.

UserStmntCounter (User Statement Counter) is a view on the LogRepository table. It tells us how many insert, delete, update and select statements each user has issued per day. As shown in Figure 5.11, for example, sa issued a total of 467 select statements against the database system on September 9 2008.

Figure 5.11 UserStmntCounter View

UserStmntCounterDBLevel (User Statement Counter Database Level) is similar to the previous profile, but it goes down to the database level. This profile tells us how many insert, delete, update and select statements each user has issued on each database per day. We can see in Figure 5.12 that sa issued 3 and 464 select statements on the "a" and MyProject databases respectively on September 9 2008.

Figure 5.12 UserStmntCounterDBLevel View

UserStmntCounterTableLevel (User Statement Counter Table Level) is similar to the two previous profiles, but goes down to the table level. It shows us how many Insert, Delete, Update and Select statements each user has issued on which tables per day.
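The three counters above differ only in their grouping level. A minimal sketch of the first of them, UserStmntCounter, is given below; the database- and table-level variants simply add the database (and object) name to the grouping. Classifying statements by inspecting the TextData column is an assumption made for illustration only; the prototype may instead rely on specific trace event classes.

-- Sketch of a per-user, per-day statement counter over the stored audit trace.
CREATE VIEW UserStmntCounterSketch
AS
SELECT  LoginName,
        CONVERT(VARCHAR(10), StartTime, 120) AS StmtDate,
        SUM(CASE WHEN CAST(TextData AS NVARCHAR(MAX)) LIKE 'SELECT%' THEN 1 ELSE 0 END) AS SelectCount,
        SUM(CASE WHEN CAST(TextData AS NVARCHAR(MAX)) LIKE 'INSERT%' THEN 1 ELSE 0 END) AS InsertCount,
        SUM(CASE WHEN CAST(TextData AS NVARCHAR(MAX)) LIKE 'UPDATE%' THEN 1 ELSE 0 END) AS UpdateCount,
        SUM(CASE WHEN CAST(TextData AS NVARCHAR(MAX)) LIKE 'DELETE%' THEN 1 ELSE 0 END) AS DeleteCount
FROM    LogRepository
GROUP BY LoginName, CONVERT(VARCHAR(10), StartTime, 120);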
As illustrated in Figure 5.13, we can see, for example, that on September 9 2008 sa issued 2 and 1 select statements on the "c" and "b" tables of the "a" database respectively, and so on.

Figure 5.13 UserStmntCounterTableLevel View

UserDDLCounter (User DDL Counter) is similar to the UserStmntCounter view, but instead of DML statements it returns the number of DDL events each user has issued per day. DDL statements include Create, Drop and Alter; these statements affect databases, tables, procedures, triggers, views, logins, users and roles.

Figure 5.14 UserDDLCounter View

5.4.2 Object Profiles

In previous chapters we explained why subject profiling alone is not sufficient for detecting anomalies in database systems. In this section we introduce some securable profiles reflecting the manner in which the objects are being treated. Securables are the database objects to which we can control access and on which we can grant principals permissions. SQL Server 2005 distinguishes between three scopes at which different objects can be secured: server scope, database scope and schema scope [78]. Among those, and for the purpose of this prototype, we are interested in databases (server scope) and tables (schema scope).

DBStmntCounter (Database Statement Counter) specifies how many insert, delete, update and select statements have been issued on each database per day, no matter which user issued them. For example, if the corresponding anomaly detector sub-component finds that the number of delete statements within the day considerably exceeds the expected value, it could report this anomaly to the Responder component.

Figure 5.15 DBStmntCounter View

TableStmntCounter (Table Statement Counter) specifies how many Insert, Delete, Update and Select statements have been issued on each table per day. This profile is similar to the previous one, but goes down to the table level. As we can see in Figure 5.16, 17 Select statements were issued on the IQTable on September 2 2009. Note that in the DBStmntCounter and TableStmntCounter profiles we have counted the number of statements, not the affected rows; appropriate profiles could be built to address the number of affected rows as well.

Figure 5.16 TableStmntCounter View

The DMLSeq_*_* tables show us in what order Insert, Delete, Update and Select statements have been issued on the underlying table. Not only is the number of each statement issued on a table per day important for us, but sometimes the order of those statements matters as well. For example, in Figure 5.17 we can see the order and the number of statements on the test.test2 table on March 12 2009.

Figure 5.17 A Sample DMLSeq Table (DMLSeq_test_test2)

Using Data Mining techniques like sequence analysis, we can derive more meaningful profiles from the DMLSeq tables. Sequence analysis is used to find patterns in a discrete series; a sequence is composed of a series of discrete values or states [75] (statements, in our example). We can discover the most frequent sequences of statements on a table, which to some extent reflects the manner in which the table is being queried. The DMLSeq_*_* tables are associated with their corresponding underlying tables; however, the idea could be extended to databases, since we might be interested in discovering the frequent sequences of statements issued on each database as well. In Table 5.1, we summarize the profiles introduced in this section.
Table 5.1 Profiles Specification

GnrlLgnPrfl (General Log Profile), Subject: Specifies how many times each login has logged in to the system from each distinct host and using each distinct application.

NumberOfLoginsPerDay (Number of Logins per Day), Subject: Specifies how many times each user has logged in to the system per day.

TtlLgnTm (Total Login Time), Subject: Specifies the total login time for each user per day.

NumberOfLoginsPerDayForDBRoles (Number of Logins per Day for Database Roles), Subject: Specifies how many times the members of each database role have logged in to the system per day in total.

NumberOfLoginsPerDayForServerRoles (Number of Logins per Day for Server Roles), Subject: Specifies how many times the members of each server role have logged in to the system per day in total.

UserStmntCounter (User Statement Counter), Subject: Specifies how many insert, delete, update and select statements each user has issued per day.

UserStmntCounterDBLevel (User Statement Counter Database Level), Subject: Specifies how many insert, delete, update and select statements each user has issued on each database per day.

UserStmntCounterTableLevel (User Statement Counter Table Level), Subject: Specifies how many insert, delete, update and select statements each user has issued on which tables per day.

UserDDLCounter (User DDL Counter), Subject: Specifies how many Create, Drop and Alter statements each user has issued per day, affecting databases, tables, procedures, triggers, views, logins, users and roles.

DBStmntCounter (Database Statement Counter), Object: Specifies how many insert, delete, update and select statements have been issued on each database per day.

TableStmntCounter (Table Statement Counter), Object: Specifies how many insert, delete, update and select statements have been issued on each table per day.

DMLSeq_*_* (DML Sequence Counter), Object: Specifies in what order insert, delete, update and select statements have been issued on the underlying table.

5.5 Detector

Generally, intrusion detection is divided into misuse detection and anomaly detection. In the following sections we explain the implementation of each of these sub-components.

5.5.1 Anomaly Detector

In this section we introduce the anomaly detector sub-components developed for our prototype. These sub-components are mainly implemented as SQL stored procedures, each of which is responsible for detecting a specific type of anomaly; that is, first we must decide what anomaly we intend to address, and then the appropriate anomaly detector sub-component can be developed. The procedures explained in this section work based on the profiles introduced in the previous section. On detecting any anomaly, the procedure sends an alert to the Responder component. These alerts are stored in a table named Alerts, together with additional information such as the host name, application name, login name, SPID, timestamp and the name of the procedure generating the alert.

FindSusLogin (Find Suspicious Login)

We assume that each database user logs in to the system from the host and using the application (s)he has used before. In the GnrlLgnPrfl view, we keep the host name and application name from which each user has connected to the database server so far. This helps us to discover whether the user is connecting to the database server from a usual host and application or not.
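A minimal sketch of this check is shown below. It assumes the recent trace rows are staged in the CurrentLogRepository table (introduced in section 5.6) and that GnrlLgnPrfl holds the known (login, host, application) combinations; the Alerts columns shown here are a simplified assumption about that table's layout.

-- Sketch of the FindSusLogin check: flag login events whose
-- (login, host, application) combination has never been seen in the profile.
INSERT INTO Alerts (AlertTime, LoginName, HostName, ApplicationName, Detector)
SELECT  GETDATE(), c.LoginName, c.HostName, c.ApplicationName, 'FindSusLogin'
FROM    CurrentLogRepository AS c
WHERE   c.EventClass = 14                       -- login events only
  AND   NOT EXISTS (SELECT 1
                    FROM   GnrlLgnPrfl AS p
                    WHERE  p.LoginName       = c.LoginName
                      AND  p.HostName        = c.HostName
                      AND  p.ApplicationName = c.ApplicationName);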
For example, in Figure 5.6 we can see that login "to" has so far used the Microsoft SQL Server Management Studio and OSQL-32 applications to log in to the system from the RAHANA host. So, if we discover that "to" is logging in to the system from the SHARAGIM host, it is considered an anomaly, since it is probable that to's account has been stolen and a hacker is connecting to the database from a suspicious host. FindSusLogin is responsible for detecting that kind of anomaly. The generated alert looks like Figure 5.18.

Figure 5.18 An Alert Generated by FindSusLogin

This alert indicates that the "to" login has connected to the database server from the SQLCMD command-line tool.

ErlrLgnTimeDtctr (Earlier Login Time Detector) and LtrLgnTimeDtctr (Later Login Time Detector)

We assume that each user is allowed to log in to the system only within an allowable range of time, which could be thought of as the working hours. We keep these ranges in a table named UserWorkingHour (Figure 5.19).

Figure 5.19 UserWorkingHour Table

Once a login is created, the login name is automatically inserted into the UserWorkingHour table with the default values of 8 AM and 4 PM as the start time and end time. It means that the user is only allowed to log in to the system within the specified range of time; however, these values may be modified. Now, if a user logs in to the system before or after the allowable time, it might be considered an anomaly. The ErlrLgnTimeDtctr and LtrLgnTimeDtctr procedures are responsible for detecting such anomalies.

Figure 5.20 An Alert Generated by ErlrLgnTimeDtctr

The alert raised by these procedures is illustrated in Figure 5.20. As we can see in the figure, the "to" login logged in to the system at 7:25 AM, which is before the allowable time.

ExceededNumOfLgns (Exceeding Number of Logins)

The number of logins of each user per day is also important for us. We assume that users are not allowed to log in to the system more than an allowable number of times. Once a login is created, the login name is automatically inserted into a table named MaxNoLogins with the maximum number of logins set to 10; that is, each user may log in to the system no more than 10 times per day. This value could be modified.

Figure 5.21 MaxNoLogins Table

If "to", for example, connects to the database more than 10 times within a day, it could be considered an anomaly. The ExceededNumOfLgns procedure is responsible for detecting such an anomaly; in that case it raises an alert like the one in Figure 5.22.

Figure 5.22 An Alert Generated by ExceededNumOfLgns

FndPssvLgns (Find Passive Logins)

This procedure inspects, at the end of a working day, those database users who have not logged in to the system. It could help the DBA to find the passive logins and consider dropping them if necessary. The alert generated by this procedure lists the passive login names at the end of the day. Obviously, the FndPssvLgns procedure runs once a day.

Figure 5.23 An Alert Generated by FndPssvLgns

This sub-component is one of those that could help us to revise the database security policies.

5.5.2 Misuse Detector

In this section we introduce the stored procedures responsible for detecting database misuse. The function of the anomaly detector sub-components is based on comparing the current state with the profiles, whereas the misuse detection sub-components work based on attack or misuse patterns. In this section we explain the procedures developed as the misuse detector sub-components of our prototype.
LgnMorThnOne (Login More than One Time)

Since we have assumed that each login is associated with one real database user, logically each user is supposed to be connected to the database only once at a time. So, if we discover that one database user has logged in to the system more than once at the same time, it is considered a database misuse. The LgnMorThnOne procedure is responsible for detecting such a misuse.

Figure 5.24 An Alert Generated by LgnMorThnOne

BrtFrcDtctr (Brute Force Detector)

The BrtFrcDtctr procedure is responsible for detecting brute force attacks. If a number of failed logins occurs within a period of time, it could indicate a probable brute force attack. The appropriate threshold (number of failed logins), as well as the time window within which we count the failed logins, can be configured. The setting depends on the power of the brute force tools: the more powerful the tools, the smaller the threshold needs to be.

Figure 5.25 An Alert Generated by BrtFrcDtctr

As we can see in Figure 5.25, 60 failed logins have occurred within the 2 minutes before this alert is raised. It is likely that a brute force tool is trying to guess the password of the Bob login. On detecting a brute force attempt, the BrtFrcDtctr procedure disables the corresponding login. Therefore, even if the correct password is later provided, the login cannot log in to the system. In some contexts this could be interpreted as an intrusion prevention job, since it stops the intruder from further attacks.

5.6 Intrusion Detection Cycle

In this section we briefly explain the mechanism of running the detector sub-components. As mentioned before, we utilize the SQL Server Agent Jobs capability to iterate the detection process; that is, we define a Job and schedule it to run every 2 minutes. The logs of each 2-minute window are stored in a temporary table named CurrentLogRepository, which enables us to execute the detector procedures on the logs of the last 2 minutes. Therefore, we are able to detect any anomalous activity at most 2 minutes after it occurs. If no anomalous activity is detected, the content of the CurrentLogRepository table is flushed into the LogRepository table. The cycle of intrusion detection is illustrated in Figure 5.26: every 2 minutes, the trace file content of the last 2 minutes is flushed into CurrentLogRepository and a new trace is launched; the detector sub-components are executed on CurrentLogRepository; if an intrusion is detected, an alert is sent to the Responder, otherwise CurrentLogRepository is flushed into LogRepository; finally, CurrentLogRepository is emptied for the next cycle. The interval of the intrusion detection process depends on the complexity of the detector procedures. In this prototype, a 2-minute interval is adequate to run all the procedures.

Figure 5.26 Intrusion Detection Cycle

5.7 Summary

In this chapter we walked through the implementation phases of the database intrusion detection prototype. We utilized server-side tracing in SQL Server 2005 to capture the necessary data for profiling. Moreover, we developed several triggers to capture the data values in DML statements. The data collected by the Tracer needs to be converted into a format understandable by the Profiler. Using the data provided by the Tracer and Auditor (which are called the Refined Logs), the Profiler derives the profiles. We then introduced some object and subject profiles; however, profiles are not limited to those addressed in this chapter, and different types of profiles could be derived from the Refined Logs using various methods.
Detector component was responsible for detecting the database anomalies and misuses. Anomaly detector compares the current state of the database with profile to finds out any anomalous activity. By current state we mean the log files within last n-minute. In another word, the state of the system is reflected in most recent log files. Anomaly detector raises an alert and sends it to Responder component in case of any suspicious activity. However, Responder in this prototype only functions as a repository for holding the alerts. On the other hand, Misuse detector seeks for database misuses in most recent log files and like anomaly detector raises an alert if discovers any database misuse. Misuse detector works based on the database attack patterns. We defined some simple attack pattern to show how the mechanism works. Moreover we developed another misuse detector sub-component that addresses brute attacks. The detection process iterates every 2-minute. Means, we are able to detect database intrusions within 2 minutes. However, they only response defined in our prototype is simply disabling the intrusion perpetrator login. Next chapter deals with the conclusion of this project as well as several recommendations to enhance the implemented model. In addition, we will discuss about future works related to this project. CHAPTER 6 CONCLUSION AND FUTURE WORKS 6.1 Introduction Databases have become increasingly vulnerable to attacks. Generally, there are two approaches for handling the attacks. First of all, strengthening the systems by security controls like cryptographic techniques, sophisticated authentication methods and etc. to prevent the subversion itself. However, in practice these preventive measures usually fail to stop the attacks. Hence, detecting the attacks is considered as one the most critical steps in handling security breaches. Obviously, undetected attacks could cause further damages to valuable information of organizations. According to [79], it usually takes the average attacker less than 10 seconds to hack in and out of a database - hardly enough time for the database administrator to even notice the intruder. So it’s no surprise that many database attacks go unnoticed by organizations until long after the data has been compromised. Yet, the importance of detecting database breaches is straightforward. It could help us to identify database vulnerabilities and come up with solution to stop future attacks. Nowadays, enterprise database infrastructures, which often contain the crown jewels of an organization, are subject to a wide range of attacks. Therefore, amongst different types of intrusion detection systems (like network-based, host-based and application-based IDS), DB-IDS which is considered as a type of application-based IDS has become a matter of increasing concern. 99 The key aim of this project was to propose an architectural design for DBIDS. The proposed architecture is a comprehensive platform based on which different practical models could be implemented. These IDSs might be adapted for commercial and open source DBMSs. Additionally, according to the database specification and business requirements, the configuration of DB-IDS may be varied. However, the proposed DB-IDS architecture could be used to develop customized database intrusion detection system. 6.2 Discussion In chapter one, we first briefly discuss about the challenges of security world and specifically database system. 
Furthermore, it was mentioned how important it is for enterprises to equip their database with security controls, since DBMSs represent the ultimate layer in preventing malicious data access or corruption [49]. Afterward, the problem statements of this project were proposed. We then came up with the aim, objectives and scope of this project. In following sections we discuss how we reach to mentioned project objectives. Chapter two was entirely dedicated to a comprehensive literature review around intrusion detection systems. We followed a top-down approach in which firstly we presented the taxonomy of intrusion detection systems. Next, intrusion detection systems using data mining techniques was discussed. Finally we discuss about proposed different DB-IDSs in literature, followed by those leveraging data mining techniques. Approaches of anomaly and misuse detection systems for databases were studied then. This comprehensive study on DB-IDSs helps us to figure out advantages and disadvantages of different systems and come up with a fair architecture for a hybrid DB-IDS. In chapter three we present the methodology which was followed in this project. Our methodology included analysis, design, development, implementation 100 and testing phases. In chapter four - analysis and design of the DB-IDS architecture - first we briefly went through the similar studies to our approach for DB-IDS, and then begin to design different components of our system. Based on the architectural design for DB-IDS, in fifth chapter we developed a model to demonstrate how it could be implemented in a mock database system. The prototype was implemented in MS SQL Server 2005. All components and subcomponents were built using SQL language and basic means like procedures, triggers, tables and views. We also utilized the Job capability of SQL Server Agent to iterate the process of detection. The results of components and sub-components of our DB-IDS were presented along with the explanation of each component. We provided a couple of snapshots to show the mechanism of our DB-IDS tested on a mock database system in the MS SQL Server 2005. 6.3 Future Works and Recommendations In the previous chapters we have mentioned a couple of points for enhancing the capabilities and accuracy of the database intrusion detection system. These recommendations range from applying the more comprehensive techniques for profiling to Responder component improvements. The data collection method introduced in this project works properly for small and medium-sized database system. However, for large database system with hundreds and thousands transactions per second, using triggers and server-side tracing could considerably affect the performance of the system. Other data collection methods such as applying third party applications could be considered to enhance the scalability of the DB-IDS. A comprehensive study on the profiling techniques (deriving the profiles from Refined Logs) could be considered as another future work. For the purpose of 101 this project and to show how the subject profiling mechanism works, we mostly focused on the behavior of logins such as the number of logins per day, the total time of logins per day, login times and so on. However, the normal behavior of the database users could be modeled in different ways using different methods. 6.4 Conclusion In this section we intend to show how the objectives of this project were achieved. 
First we review the objectives discussed in section 1.5, followed by the achievement explanation. • Proposing an architectural design for hybrid DB-IDS An architectural design for a hybrid DB-IDS was the core aim of this project. The overall schema of the proposed architecture is illustrated in Figure 4.9. In the chapter 2, we studied different DB-IDSs and examine their advantages and disadvantages. Then, we come up with our architecture which we intended to encompass the advantages of other systems. One the important keys of our proposed architecture is its capability to be segregated. Means, for example, the Refined Logs might be provided by any Data Collection mechanism. Even we are able to utilize third party applications to efficiently gather necessary data to be fed into Profiler component. It is important because in different DBMSs the efficiency of data collection methods might vary. So, in one system we may rather to apply built-in means for data collection, while in another system we might have to use third party tools to do so. It also works for almost all the components. For example, we can use any technique and tool to derive the profiles from Refined Logs. As mentioned before, one the well-know methods for profiling in anomaly detection systems is Data Mining. We may either apply the built-in data mining algorithms of DBMS – if it 102 supports – for profiling, or use third party data mining applications to do so. All we need to make sure is that the data flow between different components and subcomponents is appropriately established. Also, data structure of the inputs and outputs of each component should be in a understandable format for other components. i. Proposing a database anomaly detection model Anomaly Detector component is responsible for comparing the present state of the system with pre-established profiles. In the section 4.4.4.1 we presented the design principals of this component. Later in section 5.5.1 we implemented a sample anomaly detector which works based on securable and principal profiles. ii. Constructing sample profiles for a database system In the section 5.4 we construct some sample profiles for database elements such as database users and roles (as principals) and table (as securables) to demonstrate how a profile looks like. Although these profiles might seem straightforward, however they reflect the simple, yet important aspects of the normal behavior of the system elements. Furthermore more comprehensive profiles could be derived from these simple profiles. Principal profiles presented in section 5.4 mostly reflect the login behavior of the database users. Measures like the number of logins per day and the total logging time into the system and also the host and application via which the user has connected to the system are the basic aspects on which the user profiles are established. For securable profiling, we also focused on the number and sequence of DML statements on the databases and tables as well. We believed that securable profiling besides principal profiling could – at least in theory – enhance the accuracy and quality of the detection mechanism. In section 4.3, we also demonstrated about database attack scenarios for which solely principal profiling could not help us to detect the attacks. 103 iii. Developing an database audit system We developed our own database auditing system (Auditor) to capture specific data which are not collectable by Tracer. 
Auditor is responsible for capturing the DML statement values and sending them to another table. The implementation details are presented in section 5.3. Independent from DB-IDS, this component could be utilized in any database system for auditing purposes. iv. Proposing a brute-force detection model for the database systems Multiple failed logins often indicate brute-force or enumeration attempts or an ongoing attack in progress. One of the first challenges for an attacker is to penetrate the authentication of a database - that is, either to authenticate as a legitimate existing user or to bypass the authentication process, in order to access the contents of the database. Among the different techniques for an attacker to penetrate the database is to Brute-Force the database authentication - this can be used for several purposes, such as: a) Guessing a user's password using an automated attack tool b) Enumeration of usernames in order to validate the existence of an account From the security point of view, one of the ways to identify such an attack attempt is to monitor the failed login attempts. Multiple failed login attempts during a short period of time may very likely be an attack attempt in progress. We developed a procedure named BrtFrcDtctr which is responsible for detecting the brute force attack into the database system. v. Proposing a model for database security policy revision According to the information provided by the Responder component, we are able to revise the current database security policies. For example, we may notice that 104 a user usually logs in to the system around 10 AM. However, (s)he is allowed to logs in to the system from 9 AM. In such case, the DBA could reconsider the start of the working hour of that user and change it to 9 AM from 10 AM. Therefore, if someday the account information of that user is stolen and the intruder tries to logs in to the system earlier than 10 AM, an alert would be sent to the DBA indicating a misuse into the system. Even such a scenario might seem immaterial, in many real life attack scenarios; the intrusion could be detected and mitigated then. We also develop a procedure which identifies the logins who have not logged in to the system at the end of the day. The deletion of such logins could be considered by the DBA to stop any potential attacks. These logins are known as orphan logins. Those enabled yet passive logins could be exploited by hackers to intrude to the system. 105 REFERENCES 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. Sundaram, A., An Introduction to Intrusion Detection. 1996. Frank, J., Artificial Intelligence and Intrusion Detection: Current and Future Directions. In Proceedings of the 17th National Computer Security Conference, 1994. Lee, W., et al., A Data Mining and CIDF Based Approach for Detecting Novel and Distributed Intrusions. In Proceedings of 3rd International Workshop on the Recent Advances in Intrusion Detection, 2000. Lee, W. And S.J. Stolfo, Data Mining Approaches for Intrusion Detection. in the Proceedings of the 7th USENIX Security Symposium San Antonio, Texas, 1998. Lee, W., Applying Data Mining to Intrusion Detection: the Quest for Automation, Efficiency, and Credibility. Campos, M.M. And B.L. Milenova, Creation and Deployment of Data Mining-Based Intrusion Detection Systems in Oracle Database 10g. Hu, Y. And B. Panda, A Data Mining Approach for Database Intrusion Detection. ACM Symposium on Applied Computing, 2004. Chung, C.Y., M. Gertz, and K. 
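Similarly, the orphan-login check can be sketched in a few lines of T-SQL. The catalog view sys.server_principals is part of SQL Server 2005; the table dbo.LoginEvent(LoginName, LoginTime), assumed here to be filled by the Tracer, is hypothetical, and the actual procedure implemented in chapter 5 may differ.

    -- Sketch: list enabled logins that have produced no successful login event today.
    SELECT  sp.name AS OrphanLogin
    FROM    sys.server_principals AS sp
    WHERE   sp.type IN ('S', 'U')          -- SQL Server logins and Windows logins
      AND   sp.is_disabled = 0
      AND   NOT EXISTS (SELECT 1
                        FROM   dbo.LoginEvent AS le
                        WHERE  le.LoginName = sp.name
                          AND  le.LoginTime >= DATEADD(day, DATEDIFF(day, 0, GETDATE()), 0));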
APPENDIX A

List of important events to be captured by the Tracer sub-component

Event No | Event Name | Description
14 | Audit Login | Occurs when a user successfully logs in to SQL Server.
15 | Audit Logout | Occurs when a user logs out of SQL Server.
20 | Audit Login Failed | Indicates that a login attempt to SQL Server from a client failed.
22 | ErrorLog | Indicates that error events have been logged in the SQL Server error log.
102 | Audit Statement GDR Event | Occurs every time a GRANT, DENY, or REVOKE for a statement permission is issued by any user in SQL Server.
103 | Audit Object GDR Event | Occurs every time a GRANT, DENY, or REVOKE for an object permission is issued by any user in SQL Server.
104 | Audit AddLogin Event | Occurs when a SQL Server login is added or removed; for sp_addlogin and sp_droplogin.
105 | Audit Login GDR Event | Occurs when a Windows login right is added or removed; for sp_grantlogin, sp_revokelogin, and sp_denylogin.
106 | Audit Login Change Property Event | Occurs when a property of a login, except passwords, is modified; for sp_defaultdb and sp_defaultlanguage.
107 | Audit Login Change Password Event | Occurs when a SQL Server login password is changed. Passwords are not recorded.
108 | Audit Add Login to Server Role Event | Occurs when a login is added or removed from a fixed server role; for sp_addsrvrolemember and sp_dropsrvrolemember.
109 | Audit Add DB User Event | Occurs when a login is added or removed as a database user (Windows or SQL Server) to a database; for sp_grantdbaccess, sp_revokedbaccess, sp_adduser, and sp_dropuser.
110 | Audit Add Member to DB Role Event | Occurs when a login is added or removed as a member of a database role (fixed or user-defined); for sp_addrolemember, sp_droprolemember, and sp_changegroup.
111 | Audit Add Role Event | Occurs when a database role is added to or removed from a database; for sp_addrole and sp_droprole.
112 | Audit App Role Change Password Event | Occurs when the password of an application role is changed.
113 | Audit Statement Permission Event | Occurs when a statement permission (such as CREATE TABLE) is used.
114 | Audit Schema Object Access Event | Occurs when an object permission (such as SELECT) is used, both successfully and unsuccessfully.
115 | Audit Backup/Restore Event | Occurs when a BACKUP or RESTORE command is issued.
118 | Audit Object Derived Permission Event | Occurs when a CREATE, ALTER, or DROP command is issued for an object.
128 | Audit Database Management Event | Occurs when a database is created, altered, or dropped.
129 | Audit Database Object Management Event | Occurs when a CREATE, ALTER, or DROP statement executes on database objects, such as schemas.
130 | Audit Database Principal Management Event | Occurs when principals, such as users, are created, altered, or dropped from a database.
131 | Audit Schema Object Management Event | Occurs when schema objects, such as tables and indexes, are created, altered, or dropped.
134 | Audit Server Object Take Ownership Event | Occurs when the owner is changed for objects in server scope.
135 | Audit Database Object Take Ownership Event | Occurs when a change of owner for objects within database scope occurs.
152 | Audit Change Database Owner | Occurs when ALTER AUTHORIZATION is used to change the owner of a database and permissions are checked to do that.
153 | Audit Schema Object Take Ownership Event | Occurs when ALTER AUTHORIZATION is used to assign an owner to an object and permissions are checked to do that.
170 | Audit Server Scope GDR Event | Indicates that a grant, deny, or revoke event for permissions in server scope occurred, such as creating a login.
171 | Audit Server Object GDR Event | Indicates that a grant, deny, or revoke event for a schema object, such as a table or function, occurred.
172 | Audit Database Object GDR Event | Indicates that a grant, deny, or revoke event for database objects, such as assemblies and schemas, occurred.
173 | Audit Server Operation Event | Occurs when Security Audit operations such as altering settings, resources, external access, or authorization are used.
176 | Audit Server Object Management Event | Occurs when server objects are created, altered, or dropped.
177 | Audit Server Principal Management Event | Occurs when server principals are created, altered, or dropped.
180 | Audit Database Object Access Event | Occurs when database objects, such as schemas, are accessed.
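To indicate how the events listed above can be subscribed to, the following is a minimal server-side trace sketch using the SQL Server 2005 system procedures sp_trace_create, sp_trace_setevent, and sp_trace_setstatus. The file path, the choice of only three events (14, 15, and 20), and the selected columns are assumptions for illustration; the Tracer sub-component described in chapter 4 is not limited to this configuration.

    -- Create a rollover trace file and capture the three login-related audit events.
    DECLARE @traceid int, @maxsize bigint, @on bit;
    SET @maxsize = 50;   -- maximum size of each trace file, in MB
    SET @on = 1;

    EXEC sp_trace_create @traceid OUTPUT, 2, N'C:\Traces\dbids_audit', @maxsize;

    -- Trace columns: 11 = LoginName, 8 = HostName, 10 = ApplicationName, 14 = StartTime.
    EXEC sp_trace_setevent @traceid, 14, 11, @on;   -- Audit Login
    EXEC sp_trace_setevent @traceid, 14, 8,  @on;
    EXEC sp_trace_setevent @traceid, 14, 10, @on;
    EXEC sp_trace_setevent @traceid, 14, 14, @on;
    EXEC sp_trace_setevent @traceid, 15, 11, @on;   -- Audit Logout
    EXEC sp_trace_setevent @traceid, 15, 14, @on;
    EXEC sp_trace_setevent @traceid, 20, 11, @on;   -- Audit Login Failed
    EXEC sp_trace_setevent @traceid, 20, 8,  @on;
    EXEC sp_trace_setevent @traceid, 20, 14, @on;

    EXEC sp_trace_setstatus @traceid, 1;   -- start the trace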