QUERY RE-EVALUATION FOR HANDLING SQL INJECTION ATTACKS

Xiaoying Shen
B.S., Donghua University, China, 2000

PROJECT

Submitted in partial satisfaction of the requirements for the degree of
MASTER OF SCIENCE in COMPUTER SCIENCE
at CALIFORNIA STATE UNIVERSITY, SACRAMENTO

FALL 2011

Approved by:
Ying Jin, Ph.D., Committee Chair
Jinsong Ouyang, Ph.D., Second Reader

Student: Xiaoying Shen

I certify that this student has met the requirements for format contained in the University format manual, and that this project is suitable for shelving in the Library and credit is to be awarded for the Project.

Nikrouz Faroughi, Ph.D., Graduate Coordinator
Department of Computer Science

Abstract

Most modern web applications rely on retrieving up-to-date data from a database. In response to a request from a web page, the application generates a SQL query and often incorporates portions of the user input into the query. SQL injection refers to injecting crafted malicious SQL query segments to change the intended effect of a SQL query. An attacker can access unauthorized data, or even gain complete control over the web server or back-end database system. SQL injection has become one of the top web application vulnerabilities.

In this project, I surveyed different types of SQL injection attacks and the corresponding countermeasure strategies proposed by other researchers. A new technique to detect and prevent SQL injection attacks is presented; the basic idea is to insert a validation process between the generation of a SQL query and the query execution.
The technique consists of both static analysis of web application code and a runtime validation check of dynamically generated SQL queries. The following four steps are involved: identify hotspots; analyze the SQL query; initialization; and runtime validation check. The project was implemented in Java. A performance evaluation was also conducted.

ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Ying Jin, for the guidance that she has provided throughout this project. I also want to thank my second reader, Dr. Jinsong Ouyang, for taking the time to read the report and give me advice. I would like to thank my family for their continuous support during my graduate study.

TABLE OF CONTENTS

Acknowledgments
List of Tables
List of Figures
Chapter
1. INTRODUCTION
2. SQL INJECTION DEFINITION AND BACKGROUND
   2.1 Background of SQL Injection
   2.2 Different Types of SQL Injection
       2.2.1 SQL Manipulation
       2.2.2 Code Injection
       2.2.3 Fingerprinting and Enumeration
       2.2.4 Denial of Service Attack
3. DETECTION AND PREVENTION OF SQL INJECTION ATTACKS
   3.1 Defensive Coding
       3.1.1 Prepared Statement
   3.2 Detection and Prevention Techniques
       3.2.1 Static Analysis
       3.2.2 Combine Static and Runtime Analysis
       3.2.3 Runtime Analysis
       3.2.4 Using Intrusion Detection System
4. IMPLEMENTATION
   4.1 System Architecture
       4.1.1 Identify Hotspots
       4.1.2 SQL Query Analyzer
       4.1.3 Validation Check Initialization
       4.1.4 Runtime Validation Check
   4.2 Implementation Discussion
       4.2.1 Hash Function
       4.2.2 Rolling Key
       4.2.3 Building Index for encryptionT Table
       4.2.4 Trigger
   4.3 Performance Evaluation
5. CONCLUSION
Appendix Source Code
Bibliography

LIST OF TABLES

1. user_tb table
2. encryptionT table

LIST OF FIGURES

1. Three-tiered Web Application Architecture
2. Percent of Web Application Vulnerabilities
3. Prepared Statement Example
4. User Login Code in JAVA; Hotspot is Bolded
5. Wrap the Hotspot with Additional Validation Check
6. XML Parse Tree Generated by General SQL Parser
7. Java Code Implementing HMAC Function
8. Java Code Implementing a Task to be Scheduled
9. Java Code Implementing Scheduling a Task
10. Create encryptionT table with index
11. Query encryptionT Table with Index
12. Create a insertTrigger on user_tb
13. Create a deleteTrigger on user_tb
14. Create a updateTrigger on user_tb
15. Runtime Response Time

Chapter 1
INTRODUCTION

As Internet use has grown rapidly in recent years, database-driven web applications have become widely used in all kinds of organizations, including large and small companies, government agencies, universities, and other institutions. Network security is therefore of utmost importance.

Confidentiality, integrity, and availability are the three fundamental objectives of security. Confidentiality refers to limiting information access and disclosure to authorized users with sufficient privileges and preventing access by or disclosure to unauthorized ones. When information is accessed by someone who is not authorized to do so, the result is known as loss of confidentiality. Integrity means that data cannot be modified without detection. When information is modified in an unexpected way, the result is known as loss of integrity. Availability ensures that a resource is available when it is needed.

A SQL injection attack injects crafted malicious SQL query segments to change the intended effect of a SQL query. It can result in loss of confidentiality, integrity, and availability.
SQL injection attacks allow attackers to access unauthorized data, or even gain complete control over the web server or back-end database system. Because the database typically contains the critical data for the application, it is very attractive to attackers. In addition, web application code often contains vulnerabilities. SQL injection has become one of the top ten web application vulnerabilities.

This project presents our approach to handling SQL injection attacks. In the first phase of this project, I studied different forms of SQL injection attacks, as well as the prevention and detection techniques proposed by other researchers. In the second phase, a new technique to detect and prevent SQL injection attacks is presented. The basic idea is to insert a validation process between the generation of a SQL query and the query execution. After the system architecture is described, the implementation details of each individual step are discussed. The system performance is evaluated at the end.

This report is organized as follows. Chapter 2 introduces the background and different types of SQL injection attacks. Chapter 3 surveys related work and the countermeasure techniques proposed by other researchers. Chapter 4 presents our approach to detecting and preventing SQL injection attacks. Chapter 5 concludes the project and proposes a plan for future work.

Chapter 2
SQL INJECTION DEFINITION AND BACKGROUND

SQL injection is a code injection technique. By injecting illegal content into a SQL query, an attacker can gain unauthorized access to the backend database of a web application and retrieve, modify, or delete the data stored in the database.

2.1 Background of SQL Injection

Figure 1 shows a typical three-tiered web application architecture.

Figure 1 Three-tiered Web Application Architecture

The client tier runs only the user interface on the end user's computer.
The middle tier is the application server, which runs the business logic, processes the data, and queries the backend database. The backend database server stores information for the web application.

Web pages present users with dynamically generated content. Based on user input, the web application retrieves data and shows it in the user interface. For example, in an online library application, two users searching for books will each get their own search results. The application server queries the database to access the data. The interaction is normally implemented with a general-purpose programming language, such as Java, through an application programming interface (API), such as JDBC. Put simply, the web application takes input from users and incorporates it into SQL queries against the underlying database.

According to the Web Application Security Statistics Project sponsored by the Web Application Security Consortium (WASC), about 49% of web applications contain high-risk vulnerabilities. SQL injection is one of the top 10 web application security risks. Figure 2 [1] shows that about 7% of all web application vulnerabilities are caused by SQL injection.

Figure 2 Percent of Web Application Vulnerabilities

SQL injection refers to a class of code-injection attacks in which data provided by the user is included in a SQL query in such a way that part of the user input is treated as SQL code. There are two main reasons why SQL injection is both possible and attractive. First, SQL queries are constructed by incorporating user input; if the user input is not handled properly, it can create a serious system vulnerability. Second, the underlying database often contains sensitive and confidential information, which is very attractive to an attacker.
Using SQL injection, the attacker may extract, insert, or modify data in the database; bypass authentication; perform privilege escalation; perform database fingerprinting to discover the type and version of the database that the web application is using; or perform a denial of service attack.

SQL injection can have very severe consequences. In April 2011, attackers from Japan and China used SQL injection to gain access to customers' credit card data from Neo Beat, an Osaka-based company. The theft affected 12,191 customers. The attack can be targeted at all types of database servers, including Oracle, Microsoft SQL Server, and MySQL. The main cause of SQL injection vulnerabilities is insufficient validation of user input.

2.2 Different Types of SQL Injection

In this section, I present four main types of SQL injection attacks.

2.2.1 SQL Manipulation

This is the most common type of SQL injection attack. The attacker manipulates the SQL statement by changing the where clause [2]. The following SQL statement can be used to check user authentication in a web application; if any row is returned, the user is authenticated.

    SELECT * FROM users WHERE username = 'aUser' AND password = 'aPassword'

The attacker may manipulate the SQL statement by inserting a tautology that disables password verification. For example, if the attacker enters the value "' or 1=1 --" as the username and leaves the password empty, the original SQL statement becomes:

    SELECT * FROM users WHERE username = '' or 1=1 -- ' AND password = ''

The condition 1=1 is always true, and the trailing "--" turns the rest of the statement, including the password check, into a comment, so the where clause is true for every row. When the above SQL query is executed, all the information in the users table is returned. The attacker can therefore gain unauthorized access to the application without a valid username and password.

2.2.2 Code Injection

A code injection attack attempts to add additional SQL statements to the existing SQL statement [2]. This type of attack works only if the database supports the execution of multiple SQL statements per database request.
Oracle does not support this, so a code injection attack against an Oracle database results in an error. The attack is frequently used against Microsoft SQL Server applications.

One way to perform this type of attack is to insert a UNION query into the SQL statement. For example, if the attacker enters "' UNION SELECT * FROM sometable --" for the username, the query becomes the following:

    SELECT * FROM users WHERE username = '' UNION SELECT * FROM sometable --' AND password = 'anypassword';

The first query returns an empty set, and the second query returns data from sometable. The database unions the result sets of these two queries and returns them to the application. Without a legitimate username and password, the above SQL statement returns all the records in sometable. More severe damage can be caused by attaching DML and DDL statements.

Another form of code injection attack adds a second query to the original query. For example, if the attacker enters "'; drop table sometable --" for the password input, the query becomes the following:

    SELECT * FROM users WHERE user='someuser' AND password=''; drop table sometable --'

After completing the first query, the database executes the second query, which drops sometable. In general, any type of SQL query can be inserted. Therefore, this type of attack is extremely harmful.

2.2.3 Fingerprinting and Enumeration

An attack usually starts with fingerprinting: gathering information about the target system. The useful information includes the type and version of the database server and the table schemas. Most web applications display database errors to users, and each type of database server returns its own distinctive error messages. The attacker can use crafted inputs to provoke error messages and gather database information from them.

2.2.4 Denial of Service Attack

SQL injection can be used to perform denial of service (DoS) attacks.
A SQL injection DoS attack overloads a target system by submitting queries that consume a large amount of system resources, so that the system is unable to provide normal service to its users.

Chapter 3
DETECTION AND PREVENTION OF SQL INJECTION ATTACKS

Researchers have proposed different techniques to detect and prevent SQL injection attacks. In this chapter, I summarize the advantages and disadvantages associated with each technique.

3.1 Defensive Coding

The most effective way to prevent SQL injection attacks is to apply defensive coding practices.

3.1.1 Prepared Statement

Prepared statements (parameterized queries) force the code to first define the query structure with typed placeholders for all user input, and then pass in each parameter later. This ensures that an attacker is not able to change the intent of a query. In the Java code example in Figure 3, the string query contains a SQL query with question marks, which serve as placeholders for the user input. User input for the bind variables is passed to the placeholders by the setString method. If an attacker enters "someuser' or '1=1'", the parameterized query looks for a username that literally matches the entire string, and the lookup fails.

    String query = "SELECT * FROM users WHERE username=? AND password=?";
    PreparedStatement pstmt = connection.prepareStatement(query);
    pstmt.setString(1, username);
    pstmt.setString(2, password);
    ResultSet resultset = pstmt.executeQuery();

Figure 3 Prepared Statement Example

The example shown is in Java. Practically all common programming languages provide a comparable parameterized-query facility.

Although enforcing defensive coding practices is the best way to prevent SQL injection attacks, it still cannot guarantee flawless application code, because human errors are inevitable. Developers sometimes forget to validate input or do not validate it adequately. In addition, a great deal of legacy application code exists, and patching all of those systems can be very tedious.
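To illustrate what the prepared statement in Figure 3 guards against, the following minimal sketch builds the login query of Section 2.2.1 by plain string concatenation and prints the result for a benign and a malicious username. The class and method names (ConcatenationDemo, buildLoginQuery) are hypothetical and are not part of the project code; no database is contacted, and only the generated query text is shown.

```java
class ConcatenationDemo {
    // Hypothetical helper: builds the login query by string concatenation,
    // the vulnerable pattern that a prepared statement replaces.
    static String buildLoginQuery(String username, String password) {
        return "SELECT * FROM users WHERE username='" + username
                + "' AND password='" + password + "'";
    }

    public static void main(String[] args) {
        // Benign input: the where clause checks both columns as intended.
        System.out.println(buildLoginQuery("aUser", "aPassword"));

        // Malicious input: the quote closes the string literal, "or 1=1" adds a
        // tautology, and "--" comments out the password check.
        System.out.println(buildLoginQuery("' or 1=1 --", ""));
        // prints: SELECT * FROM users WHERE username='' or 1=1 --' AND password=''
    }
}
```

With a prepared statement the same malicious string would be bound as a literal value for the username parameter, so the structure of the query would not change.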

3.2 Detection and Prevention Techniques

Researchers have proposed different types of detection and prevention techniques as countermeasures against SQL injection attacks.

3.2.1 Static Analysis

Huang and colleagues [3] propose WAVES (Web Application Vulnerability and Error Scanner), a black-box technique for testing web applications for SQL injection vulnerabilities. It attacks the target system, monitors the application's response, and uses machine learning techniques to improve its attacks. The drawback of this approach is that it cannot guarantee completeness.

Gould and colleagues [4, 5] proposed a technique called JDBC Checker. It performs static analysis and verifies the type correctness of dynamically generated SQL queries, using finite state automata to enforce type correctness. It is able to find type mismatch errors in the web application. However, it does not find more general forms of SQL injection attacks that generate syntactically correct and type-correct queries.

3.2.2 Combine Static and Runtime Analysis

Halfond and Orso [6, 7] proposed a tool called AMNESIA to detect SQL injection attacks by combining static analysis and runtime monitoring techniques. At static time, they use a string analysis technique to identify the hotspots and build a SQL-query model for each hotspot. At runtime, they check the dynamically generated queries against the SQL-query model, and reject and report queries that violate the model. The primary limitation of this technique is that its success depends on the accuracy of the static analysis used to build the query models. In certain situations the tool can generate false positives and false negatives.

Huang et al. [8] developed a tool called WebSSARI (Web application Security by Static Analysis and Runtime Inspection), which detects input-validation-related errors using information flow analysis. It uses a lattice-based static analysis algorithm.
They check taint flows against preconditions for sensitive functions at static time, and insert a runtime guard at the points where preconditions have not been met. Filters and sanitization functions are added to the web application code to satisfy the preconditions. The primary drawback of WebSSARI is that it cannot find all the vulnerabilities in a system.

3.2.3 Runtime Analysis

SQLGuard [9] is a runtime analysis technique that uses parse tree validation to detect SQL injection attacks. This approach detects an attack by comparing the tree structure of the SQL query before and after the concatenation of user input. It uses a secret key to wrap user input, so this technique requires code changes. The key must be kept secret; otherwise the attacker may bypass the check.

Boyd and Keromytis [10] proposed SQLrand, a technique based on instruction-set randomization. Developers create queries using randomized instructions instead of normal SQL keywords. A proxy filter then de-randomizes the query and converts it to a proper query for the database. Code injected by an attacker would not have been constructed using the randomized instruction set and would cause runtime exceptions. The drawbacks of this technique are the complexity of configuration and the fact that the security of the approach depends on the security of the key.

3.2.4 Using Intrusion Detection System

Valeur and colleagues [11] propose the use of an Intrusion Detection System (IDS) to detect SQL injection attacks. It is an anomaly-based IDS built on machine learning techniques. In the training phase, a number of normal application queries are fed into the machine-learning algorithm, which generates models that characterize the profiles of normal usage. In the detection phase, the technique monitors the application to identify queries that do not match the models. The system is able to detect attacks with a high rate of success.
However, a poor training set can cause the learning technique to generate a large number of false positives and false negatives.

Chapter 4
IMPLEMENTATION

This chapter presents our approach to detecting and preventing SQL injection attacks. The basic idea is to insert a validation process between the generation of a SQL query and the query execution. This chapter covers the implementation details.

4.1 System Architecture

The system consists of two parts: static analysis of the web application code and a runtime validity check of dynamically generated SQL queries. The following four steps are involved, where Steps 1, 2, and 3 are static-time analysis and Step 4 is runtime analysis.

1. Identify hotspots: parses the web application code to detect and locate the hotspots.
2. SQL query analyzer: analyzes the SQL query string for each hotspot.
3. Validation check initialization: prepares for the validity check using the table names and attribute names retrieved in the second step. This includes applying the hash function to each attribute value, building the encryptionT table, and creating triggers on the tables being queried.
4. Runtime validation check: checks the validity of user input by applying the hash function to each user input and comparing the hash value with the values in the encryptionT table.

The following subsections present the details of each step.

4.1.1 Identify Hotspots

A hotspot is a place where the code interacts with the underlying database and a SQL query is executed. In web applications implemented in Java, a typical hotspot is a place where any of the following methods is called: execute, executeQuery, or executeUpdate. In the code example shown in Figure 4, the hotspot is at line 4. Only SQL query strings that contain user input are analyzed further. When a hotspot is identified, we add two methods to wrap the hotspot, the SQL query analyzer and the SQL query validation check, as shown in Figure 5. We use the approach of [12] to implement hotspot identification.
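As a rough illustration of hotspot identification with a regular expression, the sketch below scans source text line by line for calls to execute, executeQuery, or executeUpdate. It is a simplified assumption and not the approach of [12]: the pattern, the class name HotspotFinder, and the method findHotspots are hypothetical, and a real scanner would also need to handle calls split across lines, comments, and string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HotspotFinder {
    // Assumed pattern: a method call of the form .execute( / .executeQuery( / .executeUpdate(
    private static final Pattern HOTSPOT =
            Pattern.compile("\\.(execute|executeQuery|executeUpdate)\\s*\\(");

    // Returns the 1-based numbers of the source lines that contain a hotspot call.
    static List<Integer> findHotspots(String source) {
        List<Integer> lines = new ArrayList<>();
        String[] rows = source.split("\n");
        for (int i = 0; i < rows.length; i++) {
            Matcher matcher = HOTSPOT.matcher(rows[i]);
            if (matcher.find()) {
                lines.add(i + 1);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String source = String.join("\n",
                "java.sql.Statement statement = connection.createStatement();",
                "String queryString = \"SELECT * FROM user_tb WHERE username=\" + username;",
                "ResultSet resultSet = statement.executeQuery(queryString);",
                "return resultSet;");
        System.out.println(findHotspots(source)); // prints [3]
    }
}
```

Reporting line numbers rather than only a yes/no answer lets a later step wrap exactly the statement that executes the query.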
A regular expression can be created to match all the different source code forms of a hotspot. Using this regular expression, a matcher is created to parse the web application code and identify the hotspots.

    public class SomeApp extends HttpServlet {
        public ResultSet getUser(String username, String password) {
    1.      java.sql.Connection connection = DriverManager.getConnection( );
    2.      java.sql.Statement statement = connection.createStatement();
    3.      String queryString = "SELECT * FROM user_tb WHERE username=" + username + " AND password=" + password;
    4.      ResultSet resultSet = statement.executeQuery(queryString);
    5.      return resultSet;
        }
    }

Figure 4 User Login Code in JAVA; Hotspot is Bolded

    4a.     QueryAnalyze(queryString);
    4b.     if (QueryCheck.hashCompare())
            {
    4c.         ResultSet resultSet = statement.executeQuery(queryString);
    5.          return resultSet;
            }

Figure 5 Wrap the Hotspot with Additional Validation Check

4.1.2 SQL Query Analyzer

The SQL query analyzer retrieves the table and attribute names from a SQL query string. We built our SQL query analyzer using General SQL Parser [13]. General SQL Parser consists of two main components: 1) a Lex parser and 2) a Yacc parser. The SQL query string is first tokenized into a list of tokens by the Lex parser. Then, based on the BNF of the relevant database dialect, the Yacc parser converts the source tokens into a parse tree. The top node of the parse tree is the type of SQL statement, such as SelectStatement, InsertStatement, DeleteStatement, or UpdateStatement. The subtree nodes include the column list, the table list, the where clause, and so on.

The SQL query analyzer consists of two sub-steps. First, it takes a SQL query string as input, parses it, and writes the result to an XML file.
For example, the following SQL query is converted to the XML file shown in Figure 6:

    SELECT * FROM user_tb WHERE username='panda';

    <sqlscript>
      <TStatementList size='1'>
        <TSelectSqlStatement setOperator='0'>
          <TResultColumnList size='1'>
            <TResultColumn>
              <TExpression type='15'>
                <TObjectName type='1'>*</TObjectName>
              </TExpression>
            </TResultColumn>
          </TResultColumnList>
          <TJoinList size='1'>
            <TJoin type='1'>
              <TTable type='objectname'>user_tb</TTable>
              <TJoinItemList size='0'>
              </TJoinItemList>
            </TJoin>
          </TJoinList>
          <TWhereClause>
            <TExpression type='40'>
              <comparisonOperator>=</comparisonOperator>
              <TExpression type='15'>
                <TObjectName type='1'>username</TObjectName>
              </TExpression>
              <TExpression type='16'>
                <TConstant>'panda'</TConstant>
              </TExpression>
            </TExpression>
          </TWhereClause>
        </TSelectSqlStatement>
      </TStatementList>
    </sqlscript>

Figure 6 XML Parse Tree Generated by General SQL Parser

The advantage of converting the SQL query string to an XML file is that the conversion works for any type of query, including SELECT, DELETE, INSERT, and UPDATE. Moreover, it is able to retrieve joined tables in the from clause.

Second, by parsing the converted XML file, the analyzer returns the name of the table being queried and the attribute names in the where clause. For the example shown above, the table name is 'user_tb' and the only attribute name is 'username'. For each hotspot, the table name and attribute names are recorded. They are used in the initialization and runtime validity check steps.

4.1.3 Validation Check Initialization

The initialization process includes 1) creating and initializing the encryptionT table and 2) setting up triggers on the original table. For example, user_tb, shown in Table 1, has two columns, username and password. The encryptionT table, shown in Table 2, consists of four columns: table_name, attribute_name, attribute_value, and hashed_value. Table_name keeps track of the name of the table. Attribute_name keeps track of the name of the attribute.
Attribute_value stores the value of the corresponding attribute. Hashed_value stores the result of hashing each attribute value in the original user_tb table. The particular hash function we use is discussed in Section 4.2.1. Since user_tb can be modified at any time, triggers have to be created so that the encryptionT table reflects the most recent updates to user_tb. The detailed implementation of the triggers is discussed in Section 4.2.4.

Table 1 user_tb table

    username    password
    panda       123
    cat         456

Table 2 encryptionT table

    table_name  attribute_name  attribute_value  hashed_value
    user_tb     username        panda            bf52d6e7c590c26743c917658fe7c6ee014725cf
    user_tb     password        123              b14e92eb17f6b78ec5a205ee0e1ab220fb7f86d7
    user_tb     username        cat              e778c7ffbb72f2c05d426d84b3aeaae0b3952105
    user_tb     password        456              ab567f1ae9fcb23472379151a24705cbc106ea0e

4.1.4 Runtime Validation Check

Since we wrapped the SQL query execution (the hotspot) with the query validation check method, each time a hotspot is reached the program continues executing only if the runtime validity check returns true; otherwise a warning message is given and execution halts.

The runtime input validity checker takes the user input and applies the same hash function that was used for initialization. If the hashed value obtained at runtime is v1, then v1 is used to query the encryptionT table:

    SELECT attribute_value as v_set FROM encryptionT WHERE hashed_value = v1

If the user input matches any value in v_set, it is considered valid input and program execution continues. If the input is malicious, no value is returned from the above query and the program cannot pass the validation.

4.2 Implementation Discussion

In this section, the implementation details are discussed.

4.2.1 Hash Function

A hash function is a mathematical function that maps a string of arbitrary length (up to a pre-determined maximum size) to a fixed-length string.
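The fixed-length property can be seen in a small sketch that hashes two inputs of different lengths. This example uses plain, unkeyed SHA-1 from the standard java.security.MessageDigest class purely for illustration; it is not the keyed function used by the project, and the class and method names (FixedLengthHashDemo, sha1Hex) are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FixedLengthHashDemo {
    // Returns the SHA-1 digest of the input as a lowercase hexadecimal string.
    static String sha1Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is a standard algorithm name that Java platforms are required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inputs of different lengths produce digests of the same length:
        // 20 bytes, written as 40 hexadecimal characters.
        System.out.println(sha1Hex("panda").length());                        // prints 40
        System.out.println(sha1Hex("a considerably longer input").length()); // prints 40
    }
}
```

The 40-character width matches the varchar(40) size that a hexadecimal SHA-1 based value needs when stored in a table column.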
A keyed-hash message authentication code (HMAC) is a message authentication code that uses a cryptographic key in conjunction with a hash function. HMAC can be used with any iterative cryptographic hash function. The cryptographic strength of HMAC depends on the properties of the underlying hash function. Below is the definition of HMAC [14].

Let:
    H(·) be a cryptographic hash function
    K be a secret key padded to the right with extra zeros to the input block size of the hash function, or the hash of the original key if it is longer than that block size
    m be the message to be authenticated
    ∥ denote concatenation
    ⊕ denote exclusive or (XOR)
    opad be the outer padding (0x5c5c5c…5c5c, one-block-long hexadecimal constant)
    ipad be the inner padding (0x363636…3636, one-block-long hexadecimal constant)

Then HMAC(K,m) is mathematically defined by

    HMAC(K,m) = H((K ⊕ opad) ∥ H((K ⊕ ipad) ∥ m))

The key for HMAC can be of any length up to B = 64 bytes, the block length of the hash function. Keys longer than B bytes are first hashed using H, and the result is used as the actual key to HMAC. The minimal recommended length for K is the byte length of the hash output (20 bytes for SHA1 [14]). We use a 40-byte key for the hash function.

The code in Figure 7 calculates the hash value for an input string. The Java Cryptography Extension (JCE) is used to implement the HMAC function. First, a 40-byte random alphanumeric string is generated by the Apache Commons RandomStringUtils class and then converted to a byte array. The SecretKeySpec class is used to construct a secret key for the specified HMAC algorithm from the given byte array. The Mac class provides the functionality of a message authentication code algorithm. Finally, the resulting byte array is converted to a hexadecimal string as output.
public static void GetKey() {
    key = RandomStringUtils.randomAlphanumeric(40);
}

//HMAC
public static String CalHmac(String input, String aKey) {
    byte[] digest = null;
    try {
        //Construct a secret key with a byte array
        SecretKeySpec secret = new SecretKeySpec(aKey.getBytes(), "HmacSHA1");
        Mac mac;
        //Get instance of Mac object implementing HMAC-SHA1 and
        //initialize it with the secret key
        mac = Mac.getInstance("HmacSHA1");
        mac.init(secret);
        digest = mac.doFinal(input.getBytes());
    } catch (NoSuchAlgorithmException e) {
        e.printStackTrace();
    } catch (InvalidKeyException e) {
        e.printStackTrace();
    }
    //Convert the digest to a two-digit-per-byte hexadecimal string
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < digest.length; i++) {
        String hex = Integer.toHexString(0xff & digest[i]);
        if (hex.length() == 1) sb.append('0');
        sb.append(hex);
    }
    return sb.toString();
}

Figure 7 Java Code Implementing HMAC Function

4.2.2 Rolling Key

RFC 2104 [14] suggests that the key used for the HMAC function be refreshed periodically to limit the damage of an exposed key. It gives no specific recommended frequency for key changes. In my implementation, the interval is set to 24 hours, so a new random key is generated every 24 hours. Consequently, the encryptionT table needs to be rebuilt and the triggers need to be recreated. As shown in Figure 8, the QueryCheck class extends the TimerTask class; the work to be scheduled is implemented in TimerTask's abstract run method. The java.util.Timer class is used to schedule the task. The task can be scheduled to start at a particular date and repeat at any time interval, for example once a day at 3 am, as shown in Figure 9.
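The daily schedule described above could also be expressed with java.util.concurrent instead of java.util.Timer. The sketch below is an alternative under that assumption, not the project's implementation: it computes the delay until the next 3 am and hands a task to a ScheduledExecutorService with a 24-hour period. The names KeyRotationSchedule, delayUntilNext and scheduleDaily are hypothetical, and the Runnable passed in stands for the key refresh, table rebuild and trigger update.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; the project itself uses java.util.Timer.
class KeyRotationSchedule {

    // Time from 'now' until the next occurrence of 'runAt' (later today or tomorrow).
    static Duration delayUntilNext(LocalDateTime now, LocalTime runAt) {
        LocalDateTime next = now.toLocalDate().atTime(runAt);
        if (!next.isAfter(now)) {
            next = next.plusDays(1);
        }
        return Duration.between(now, next);
    }

    // Runs 'task' at the next 'runAt' and then every 24 hours.
    // The caller is responsible for shutting the returned scheduler down.
    static ScheduledExecutorService scheduleDaily(Runnable task, LocalTime runAt) {
        long initialDelayMs = delayUntilNext(LocalDateTime.now(), runAt).toMillis();
        long periodMs = TimeUnit.HOURS.toMillis(24);
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(task, initialDelayMs, periodMs, TimeUnit.MILLISECONDS);
        return scheduler;
    }

    public static void main(String[] args) {
        // Only print the computed delay here so that the demo terminates.
        Duration d = delayUntilNext(LocalDateTime.now(), LocalTime.of(3, 0));
        System.out.println("Next key rotation in " + d.toMinutes() + " minutes");
    }
}
```

Computing the first delay from the current time avoids hard-coding a start date, at the cost of ignoring daylight-saving shifts, which a fixed 24-hour period ignores as well.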
import java.util.TimerTask;

public class QueryCheck extends TimerTask {
    public void run() {
        con = getConnection();
        //generate a new key
        GetKey();
        //reset EncryptionT table
        InitializeEncryptionT(con, tableName, columns);
        //update three triggers
        setTrigger(con, tableName, columns);
    }
}

Figure 8 Java Code Implementing a Task to be Scheduled

import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Timer;
import java.util.TimerTask;

//schedule a task everyday at 3am, beginning at a specific date
public class ScheduleTask {
    private final static long PERIOD = 1000*60*60*24;
    private final static int YEAR = 2011;
    //month is 0 based, January is 0
    private final static int MONTH = 9;
    private final static int DAY = 1;
    private final static int HOUR = 3;
    private final static int MINUTE = 0;

    public static void main(String[] args) {
        TimerTask task = new QueryCheck();
        Timer timer = new Timer();
        timer.scheduleAtFixedRate(task, setDate(YEAR,MONTH,DAY,HOUR,MINUTE), PERIOD);
    }

    //construct a GregorianCalendar with the given date and time
    private static Date setDate(int year, int month, int day, int hour, int minute) {
        Calendar result = new GregorianCalendar(year,month,day,hour,minute);
        return result.getTime();
    }
}

Figure 9 Java Code Implementing Scheduling a Task

4.2.3 Building Index for encryptionT Table

If there are many related tables and attributes, the encryptionT table can be very large. To enhance the search performance, we build an index on table_name and attribute_name for the encryptionT table, as shown in Figure 10.

CREATE TABLE EncryptionT (
    table_name varchar(15),
    attribute_name varchar(15),
    attribute_value varchar(15),
    hashvalue varchar(40),
    INDEX (table_name, attribute_name)
);

Figure 10 Create encryptionT table with index

The query in Figure 11 can be used to query the encryptionT table.
SELECT attribute_value FROM encryptionT
WHERE hashvalue = v1
  AND table_name = 'user_tb'
  AND attribute_name = 'username';

Figure 11 Query encryptionT Table with Index

4.2.4 Trigger

The database content of a web application is not static; it can be edited dynamically. For example, a user can be added, deleted or modified in the database. Three triggers have been built so that the content of encryptionT reflects the most recent updates to the original table.

Use the user_tb table as an example. Three triggers have been created on user_tb: insertTrigger, deleteTrigger and updateTrigger. After a new row is inserted in user_tb, insertTrigger applies the HMAC function to the inserted attribute values and inserts two new rows into the encryptionT table, one for each attribute in user_tb: username and password. The code to create insertTrigger is shown in Figure 12.

DROP TRIGGER IF EXISTS insertTrigger;
CREATE TRIGGER insertTrigger AFTER INSERT ON user_tb
FOR EACH ROW
BEGIN
    INSERT INTO EncryptionT SET
        table_name = 'user_tb',
        attribute_name = 'username',
        attribute_value = NEW.username,
        hashvalue = HMACSHA1(key, NEW.username);
    INSERT INTO EncryptionT SET
        table_name = 'user_tb',
        attribute_name = 'password',
        attribute_value = NEW.password,
        hashvalue = HMACSHA1(key, NEW.password);
END

Figure 12 Create an insertTrigger on user_tb

Figure 13 shows the code to create deleteTrigger. It deletes all the records that relate to the rows being deleted from the user_tb table. updateTrigger applies the HMAC function to the updated attribute values and updates all the records that relate to the rows being updated in the user_tb table. The code is shown in Figure 14.
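The combined effect of the three triggers can be modeled in memory, which may help when reasoning about what encryptionT contains after a sequence of changes to user_tb. The sketch below is such a model under simplifying assumptions, not the project's SQL: TriggerModel, Row, afterInsert, afterDelete and afterUpdate are hypothetical names, and the hash function is passed in as a parameter that stands for HMACSHA1 with the current key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical in-memory model of the three triggers on user_tb.
class TriggerModel {

    // One row of encryptionT.
    static final class Row {
        final String table;
        final String attribute;
        String value;
        String hash;

        Row(String table, String attribute, String value, String hash) {
            this.table = table;
            this.attribute = attribute;
            this.value = value;
            this.hash = hash;
        }
    }

    final List<Row> encryptionT = new ArrayList<>();
    private final UnaryOperator<String> hmac; // stands in for HMACSHA1(key, value)

    TriggerModel(UnaryOperator<String> hmac) {
        this.hmac = hmac;
    }

    // insertTrigger: one new encryptionT row per attribute of the inserted tuple.
    void afterInsert(String username, String password) {
        encryptionT.add(new Row("user_tb", "username", username, hmac.apply(username)));
        encryptionT.add(new Row("user_tb", "password", password, hmac.apply(password)));
    }

    // deleteTrigger: remove the rows whose attribute and value match the deleted tuple.
    void afterDelete(String oldUsername, String oldPassword) {
        encryptionT.removeIf(r -> r.attribute.equals("username") && r.value.equals(oldUsername));
        encryptionT.removeIf(r -> r.attribute.equals("password") && r.value.equals(oldPassword));
    }

    // updateTrigger: rewrite value and hash of the rows that match the old tuple.
    void afterUpdate(String oldUsername, String oldPassword,
                     String newUsername, String newPassword) {
        for (Row r : encryptionT) {
            if (r.attribute.equals("username") && r.value.equals(oldUsername)) {
                r.value = newUsername;
                r.hash = hmac.apply(newUsername);
            } else if (r.attribute.equals("password") && r.value.equals(oldPassword)) {
                r.value = newPassword;
                r.hash = hmac.apply(newPassword);
            }
        }
    }

    public static void main(String[] args) {
        TriggerModel model = new TriggerModel(s -> "h(" + s + ")"); // toy hash for the demo
        model.afterInsert("panda", "123");
        model.afterUpdate("panda", "123", "panda", "789");
        for (Row r : model.encryptionT) {
            System.out.println(r.table + " " + r.attribute + " " + r.value + " " + r.hash);
        }
    }
}
```

One behavior the model makes visible is that rows are matched by attribute value only, so if two users share a password, deleting or updating one of them also affects the rows stored for the other.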
DROP TRIGGER IF EXISTS deleteTrigger;
CREATE TRIGGER deleteTrigger AFTER DELETE ON user_tb
FOR EACH ROW
BEGIN
    DELETE FROM EncryptionT
        WHERE attribute_name = 'username' AND attribute_value = OLD.username;
    DELETE FROM EncryptionT
        WHERE attribute_name = 'password' AND attribute_value = OLD.password;
END

Figure 13 Create a deleteTrigger on user_tb

DROP TRIGGER IF EXISTS updateTrigger;
CREATE TRIGGER updateTrigger AFTER UPDATE ON user_tb
FOR EACH ROW
BEGIN
    UPDATE EncryptionT SET
        attribute_value = NEW.username,
        hashvalue = HMACSHA1(key, NEW.username)
        WHERE attribute_name = 'username' AND attribute_value = OLD.username;
    UPDATE EncryptionT SET
        attribute_value = NEW.password,
        hashvalue = HMACSHA1(key, NEW.password)
        WHERE attribute_name = 'password' AND attribute_value = OLD.password;
END

Figure 14 Create an updateTrigger on user_tb

4.3 Performance Evaluation

The evaluation was performed on a machine with an Intel Core i7 CPU and 4 GB RAM, running the 64-bit Windows 7 Enterprise operating system. The database server is MySQL 5.1, which is installed on the same machine as the application server.

The proposed technique introduces some overhead at runtime. The overhead mainly results from querying the encryptionT table. For evaluation purposes, the user_tb table was populated with 100, 250, 750, 2500, 5000, and 25000 rows. The table schema is shown in Table 1. Accordingly, the encryptionT table had 200, 500, 1500, 5000, 10000, and 50000 rows.

The overhead is measured by executing the query in Figure 11 to look up the hashed value in the encryptionT table. The result is shown in Figure 15.

Figure 15 Runtime Response Time (plot of the execution time in ms of our approach against the number of rows in the encryptionT table, from 200 to 50000 rows)

The time overhead of the proposed approach is less than 250 milliseconds even for a large set of records. It has no significant impact on the performance of the web application.
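The report does not show how the execution time was measured. One plausible way, sketched below purely as an assumption, is to time repeated executions of the lookup with System.nanoTime and report the average. The class QueryTimer and the method averageMillis are hypothetical; in a real measurement the Runnable would execute the lookup query through JDBC, whereas the stand-in task in main only does some arithmetic.

```java
// Hypothetical timing helper; the measurement method is an assumption.
class QueryTimer {

    // Average wall-clock time in milliseconds over 'runs' executions of 'task'.
    static double averageMillis(Runnable task, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / runs;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the JDBC lookup when measuring for real.
        Runnable standIn = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        };
        System.out.println("average ms per run: " + averageMillis(standIn, 20));
    }
}
```

Averaging over several runs smooths out one-off effects such as connection setup and caching, which can otherwise dominate a single measurement.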
Chapter 5

CONCLUSION

In this project, different types of SQL injection attacks were examined, and the countermeasure techniques proposed by researchers were surveyed. Based on the study, this project presented a new technique to detect and prevent SQL injection attacks. The technique combines static analysis of web application code with a runtime input validation check. Statically, after identifying the code segments where the web application code interacts with the underlying database, it analyzes the SQL query string. The retrieved table names and attribute names are used to build the encryptionT table, which stores the related attribute values and their hashed values. At run time, the same hash function is applied to the user input. Execution of the web application resumes only if both the hashed value and the user input match the data stored in the encryptionT table. The advantage of this approach is that the existing database tables used in the web application are not affected. The implementation details were described in the report.

There are a few improvements that can be made in the future. Currently, the system is implemented in Java and is applicable to Java-based web applications that use the JDBC API. In the future, a more general system that works for web applications based on the Java Persistence API, .NET, and PHP can be developed. Moreover, a more extensive and realistic evaluation needs to be performed to test the system.
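The runtime check summarized in this chapter can be sketched without a database. The example below is a simplified stand-in, not the project's implementation: a HashMap plays the role of the encryptionT table, table and attribute names are ignored, and the names RuntimeCheckSketch, register and isValid as well as the key "demo-key" are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical in-memory stand-in for encryptionT and the runtime check.
class RuntimeCheckSketch {
    private final byte[] key;
    // hashed value -> attribute values stored under that hash
    private final Map<String, Set<String>> byHash = new HashMap<>();

    RuntimeCheckSketch(String key) {
        this.key = key.getBytes(StandardCharsets.UTF_8);
    }

    // HMAC-SHA1 of the input as a lowercase hex string.
    String hmacHex(String input) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(key, "HmacSHA1"));
            byte[] digest = mac.doFinal(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Initialization: store a known attribute value together with its hash.
    void register(String value) {
        byHash.computeIfAbsent(hmacHex(value), h -> new HashSet<>()).add(value);
    }

    // Runtime check: hash the input, look the hash up, then compare the stored value.
    boolean isValid(String userInput) {
        Set<String> vSet = byHash.get(hmacHex(userInput));
        return vSet != null && vSet.contains(userInput);
    }

    public static void main(String[] args) {
        RuntimeCheckSketch check = new RuntimeCheckSketch("demo-key");
        for (String v : new String[] {"panda", "123", "cat", "456"}) {
            check.register(v);
        }
        System.out.println("panda       -> " + check.isValid("panda"));
        System.out.println("' OR '1'='1 -> " + check.isValid("' OR '1'='1"));
    }
}
```

Because only values already present in the protected table are accepted, the check behaves as a whitelist: any input carrying injected SQL fragments fails unless that exact string is stored as an attribute value.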
APPENDIX

Source Code

Base.java

import java.sql.*;
import org.apache.commons.lang.*;
import java.security.*;
import javax.crypto.*;
import javax.crypto.spec.SecretKeySpec;

/**
 * Class Base handles database connection and HMAC calculation
 * @author Xiaoying Shen
 *
 */
public class Base {
    //Connect to db
    public static Connection getConnection(){
        Connection con = null;
        String url = "jdbc:mysql://localhost/oo";
        String username = "sa";
        String password = "B33f34t3r";
        try {
            Class.forName("com.mysql.jdbc.Driver");
            con = DriverManager.getConnection(url, username, password);
            System.out.println("Connect to MySql.");
        } catch (Exception e){
            e.printStackTrace();
        }
        return con;
    }

    //Close db connection
    public static void closeConnection(Connection con) {
        if(con != null){
            try {
                con.close();
            } catch (SQLException e) {
                e.printStackTrace();
            }
        }
    }

    //initiate appuser table, populate the table with specified rows of data
    public static void InitiateUserTable(Connection con, int rowNum) {
        ResultSet rs = null;
        String user = "";
        String pass = "";
        try {
            for(int i = 0; i < rowNum; i++) {
                //randomly generate username and password
                user = RandomStringUtils.randomAlphabetic(6);
                pass = RandomStringUtils.randomAlphanumeric(6);
                String query = "insert into user_tb values('"+ user+"','" +pass+"')";
                Statement stmt = con.createStatement();
                stmt.executeUpdate(query);
            }
        } catch (Exception e){
            e.printStackTrace();
        }
    }

    //HMAC
    public static String CalHmac(String input, String aKey) {
        byte[] digest = null;
        try {
            SecretKeySpec secret = new SecretKeySpec(aKey.getBytes(),"HmacSHA1");
            Mac mac;
            mac = Mac.getInstance("HmacSHA1");
            mac.init(secret);
            digest = mac.doFinal(input.getBytes());
        } catch (NoSuchAlgorithmException e) {
            e.printStackTrace();
        } catch (InvalidKeyException e) {
            e.printStackTrace();
        }
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < digest.length; i++) {
            String hex = Integer.toHexString(0xff & digest[i]);
            if(hex.length() == 1) sb.append('0');
            sb.append(hex);
        }
        return sb.toString();
    }
}

QueryCheck.java

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.TimerTask;
import org.apache.commons.lang.RandomStringUtils;

/**
 * Class QueryCheck initializes encryptionT table, creates trigger, and performs run time user input validation check.
 * @author Xiaoying Shen
 *
 */
public class QueryCheck extends TimerTask{
    static String key = "";

    public static void main(String[] args) {
        Connection con = null;
        String tableName = "user_tb";
        String[] columns = {"username", "password"};
        String attribute = "username";
        String userInput = "'1=1--";
        try {
            con = Base.getConnection();
            //Base.InitiateUserTable(con, 100);
            //Initialization
            GetKey();
            InitializeEncryptionT(con, tableName, columns);
            setTrigger(con, tableName, columns);
            //runtime check
            System.out.println(hashCompare(con,tableName, attribute, userInput));
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            Base.closeConnection(con);
        }
    }

    public static void GetKey() {
        key = RandomStringUtils.randomAlphanumeric(40);
    }

    //read the tuples in user_tb, calculate hash value
    //and write the input and hashValue to encryptionT
    public static void InitializeEncryptionT(Connection con, String tableName, String[] attributes) {
        //Connection con = null;
        ResultSet rs = null;
        String attribute = null;
        String input;
        String hashValue;
        String query = null;
        Statement stmt = null;
        try{
            stmt = con.createStatement();
            query = "drop table if exists EncryptionT";
            stmt.executeUpdate(query);
            query = "CREATE TABLE EncryptionT(table_name varchar(15), " +
                "attribute_name varchar(15), attribute_value varchar(15), hashvalue varchar(40),index(table_name, attribute_name))";
            stmt.executeUpdate(query);
            query = "select * from " + tableName;
            rs = stmt.executeQuery(query);
            while(rs.next()){
                for(int i = 0; i < attributes.length; i++) {
                    attribute = attributes[i];
                    input = rs.getString(i+1);
                    System.out.println(input);
                    hashValue = Base.CalHmac(input, key);
                    writeToDb(con,
                        tableName, attribute, input, hashValue);
                }
            }
        } catch (Exception e){
            e.printStackTrace();
        }
    }

    //write the hashed input into EncryptionT table
    private static void writeToDb (Connection con, String tableName, String attribute, String input, String hashInput){
        try{
            String query = "insert into EncryptionT values('"+ tableName +"','"+ attribute+"','" + input+ "','"+hashInput+"')";
            //System.out.println(query);
            Statement stmt = con.createStatement();
            stmt.executeUpdate(query);
        } catch(Exception e){
            e.printStackTrace();
        }
    }

    //create insertTrigger, deleteTrigger and updateTrigger
    public static void setTrigger(Connection con, String tableName, String [] attributes) {
        try {
            Statement stmt = con.createStatement();

            //insert trigger
            String dropQuery = "DROP TRIGGER IF EXISTS insertTrigger";
            stmt.executeUpdate(dropQuery);
            String insertQuery = "";
            for (int i = 0; i < attributes.length; i++) {
                insertQuery += "INSERT INTO EncryptionT SET table_name = '"+ tableName+"', "+
                    "attribute_name = '"+ attributes[i] +"', "+
                    "attribute_value = NEW."+attributes[i]+ ", "+
                    "hashvalue = HMACSHA1('"+ key + "',"+"New."+attributes[i]+");";
            }
            String insertTriggerQuery = "CREATE TRIGGER insertTrigger " +
                "AFTER INSERT ON " + tableName +
                " FOR EACH ROW" +
                " BEGIN " +
                insertQuery+
                " END";
            stmt.executeUpdate(insertTriggerQuery);

            //delete trigger
            dropQuery = "DROP TRIGGER IF EXISTS deleteTrigger";
            stmt.executeUpdate(dropQuery);
            String deleteQuery = "";
            for (int i = 0; i < attributes.length; i++) {
                deleteQuery += "DELETE FROM EncryptionT WHERE table_name = '" + tableName + "' and "+
                    "attribute_name = '"+ attributes[i] + "' and "+
                    "attribute_value = OLD."+ attributes[i] + " ;";
            }
            String deleteTriggerQuery = "CREATE TRIGGER deleteTrigger "+
                "AFTER DELETE ON "+ tableName +
                " FOR EACH ROW"+
                " BEGIN " +
                deleteQuery +
                " END";
            stmt.executeUpdate(deleteTriggerQuery);

            //update trigger
            dropQuery = "DROP TRIGGER IF EXISTS updateTrigger";
            stmt.executeUpdate(dropQuery);
            String updateQuery = "";
            for (int i = 0; i <
                    attributes.length; i++) {
                updateQuery += "UPDATE EncryptionT SET attribute_value = NEW."+attributes[i]+", "+
                    "hashvalue = HMACSHA1('"+ key + "',"+"New."+attributes[i]+") "+
                    "WHERE table_name = '" + tableName + "' and "+
                    "attribute_name = '"+ attributes[i] + "' and "+
                    "attribute_value = OLD."+ attributes[i] + " ;";
            }
            String updateTriggerQuery = "CREATE TRIGGER updateTrigger " +
                "AFTER UPDATE ON " + tableName +
                " FOR EACH ROW" +
                " BEGIN " +
                updateQuery +
                " END";
            stmt.executeUpdate(updateTriggerQuery);
        } catch (Exception e){
            e.printStackTrace();
        }
    }

    //when there's a user input in web app, calculate the hash value for the input first, then try to find
    //the same hashValue in EncryptionT table, if there's a match, check if the attribute and user input value also match.
    public static boolean hashCompare(Connection con, String tableInput, String attriInput, String userInput){
        boolean result = false;
        ResultSet rs = null;
        String input = null;
        //calculate hash value for user input
        try{
            String hashedInput = Base.CalHmac(userInput, key);
            String query = "select attribute_value from EncryptionT where hashvalue=? and table_name=? and attribute_name=?";
            PreparedStatement stmt = con.prepareStatement(query);
            stmt.setString(1, hashedInput);
            stmt.setString(2, tableInput);
            stmt.setString(3, attriInput);
            rs = stmt.executeQuery();
            while(rs.next()){
                input = rs.getString(1);
                //compare user input with return result from query
                if (input.equals(userInput)) {
                    result = true;
                }
            }
        } catch (Exception e){
            e.printStackTrace();
        }
        return result;
    }

    @Override
    public void run() {
        Connection con = Base.getConnection();
        //generate a new key
        GetKey();
        //initialize EncryptionT table
        String[] columns = {"username","password"};
        InitializeEncryptionT(con,"user_tb",columns);
        System.out.println("encryptionT table updated.");
        //update three triggers
        setTrigger(con, "user_tb", columns);
        System.out.println("triggers updated.");
        Base.closeConnection(con);
    }
}

ScheduleTask.java

import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Timer;
import java.util.TimerTask;

/**
 * schedule a task everyday at 3am, beginning at a specific date
 * @author Xiaoying Shen
 *
 */
public class ScheduleTask {
    private final static long PERIOD = 1000*60*60*24;
    private final static int YEAR = 2011;
    //month is 0 based, January is 0
    private final static int MONTH = 10;
    private final static int DAY = 8;
    private final static int HOUR = 3;
    private final static int MINUTE = 0;

    public static void main(String[] args) {
        System.out.println("About to schedule task.");
        TimerTask task = new QueryCheck();
        Timer timer = new Timer();
        timer.scheduleAtFixedRate(task, setDate(YEAR,MONTH,DAY,HOUR,MINUTE), PERIOD);
        //timer.schedule(task, setDate(YEAR, MONTH, DAY, HOUR, MINUTE));
        System.out.println("Task scheduled.");
    }

    //construct a GregorianCalendar with the given date and time
    private static Date setDate(int year, int month, int day, int hour, int minute) {
        Calendar result = new GregorianCalendar(year,month,day,hour,minute);
        return result.getTime();
    }
}

BIBLIOGRAPHY

[1] Web Application Security Statistics, [Online]
Available: http://www.webappsec.org/projects/statistics/

[2] W. G. Halfond, J. Viegas, and A. Orso. A Classification of SQL-Injection Attacks and Countermeasures. In Proceedings of the International Symposium on Secure Software Engineering (ISSSE 2006), Mar. 2006.

[3] Y. Huang, S. Huang, T. Lin, and C. Tsai. Web Application Security Assessment by Fault Injection and Behavior Monitoring. In Proceedings of the 11th International World Wide Web Conference (WWW 03), May 2003.

[4] C. Gould, Z. Su, and P. Devanbu. JDBC Checker: A Static Analysis Tool for SQL/JDBC Applications. In Proceedings of the 26th International Conference on Software Engineering (ICSE 04), Formal Demos, pages 697–698, 2004.

[5] C. Gould, Z. Su, and P. Devanbu. Static Checking of Dynamically Generated Queries in Database Applications. In Proceedings of the 26th International Conference on Software Engineering (ICSE 04), pages 645–654, 2004.

[6] W. G. Halfond and A. Orso. AMNESIA: Analysis and Monitoring for NEutralizing SQL-Injection Attacks. In Proceedings of the IEEE and ACM International Conference on Automated Software Engineering (ASE 2005), Long Beach, CA, USA, Nov 2005.

[7] W. G. Halfond and A. Orso. Combining Static Analysis and Runtime Monitoring to Counter SQL-Injection Attacks. In Proceedings of the Third International ICSE Workshop on Dynamic Analysis (WODA 2005), pages 22–28, St. Louis, MO, USA, May 2005.

[8] Y. Huang, F. Yu, C. Hang, C. H. Tsai, D. T. Lee, and S. Y. Kuo. Securing Web Application Code by Static Analysis and Runtime Protection. In Proceedings of the 12th International World Wide Web Conference (WWW 04), May 2004.

[9] G. T. Buehrer, B. W. Weide, and P. A. G. Sivilotti. Using Parse Tree Validation to Prevent SQL Injection Attacks. In International Workshop on Software Engineering and Middleware (SEM), 2005.

[10] S. W. Boyd and A. D. Keromytis. SQLrand: Preventing SQL Injection Attacks.
In Proceedings of the 2nd Applied Cryptography and Network Security (ACNS) Conference, pages 292–302, June 2004.

[11] F. Valeur, D. Mutz, and G. Vigna. A Learning-Based Approach to the Detection of SQL Attacks. In Proceedings of the Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA), Vienna, Austria, July 2005.

[12] B. Smith and L. Williams. Using SQL Hotspots in a Prioritization Heuristic for Detecting Web Application Vulnerabilities. In International Conference on Software Testing, Verification, and Validation (ICST), Berlin, pages 220–229, 2011.

[13] General SQL Parser Documents, [Online] Available: http://www.sqlparser.com/document.php

[14] HMAC: Keyed-Hashing for Message Authentication, [Online] Available: http://www.ietf.org/rfc/rfc2104.txt