Solving Some Modeling Challenges when Testing Rich Internet Applications for Security

Software Security Research Group (SSRG), University of Ottawa
In collaboration with IBM

SSRG Members, University of Ottawa
• Prof. Guy-Vincent Jourdan
• Prof. Gregor v. Bochmann
• Suryakant Choudhary (Master's student)
• Emre Dincturk (PhD student)
• Khaled Ben Hafaiedh (PhD student)
• Seyed M. Mir Taheri (PhD student)
• Ali Moosavi (Master's student)

In collaboration with Research and Development, IBM® Security AppScan® Enterprise
• Iosif Viorel Onut (PhD)

Introduction: Traditional Web Applications
• Navigation is achieved using links (URLs)
• Synchronous communication
[Figure: the traditional synchronous communication pattern. Each user interaction sends a request, the user waits while the server processes it, and the response causes a full page refresh before the next interaction.]

Introduction: Rich Internet Applications
• More interactive and responsive web apps
▫ Page changes via client-side code (JavaScript)
▫ Asynchronous communication
[Figure: the asynchronous communication pattern in RIAs. User interactions send requests without blocking, and each response produces a partial page update rather than a full page refresh.]

Crawling and web application security testing
• All parts of the application must be discovered before we can analyze it for security.
• Why are automatic crawling algorithms important for security testing?
▫ Most RIAs are too large for manual exploration
▫ Efficiency
▫ Coverage

What we present…
• Techniques and approaches that make web application security assessment tools perform better
• How do we improve their performance?
▫ By making them efficient: analyze only what is important and ignore irrelevant information
▫ By making rich internet applications accessible to them

Web application crawlers
• Main components:
▫ Crawling strategy: the algorithm that guides the crawler
▫ State equivalence: the algorithm that decides what should be considered new

State Equivalence
• Operates on client states
• Decides whether two client states of an application should be considered different or the same
• Why is it important?
▫ An equivalence that is too fine leads to infinite runs or state explosion
▫ An equivalence that is too coarse leads to incomplete coverage of the application

Techniques
• Load-Reload: discovering the non-relevant dynamic content of web pages
• Identifying session variables and parameters

1. Load-Reload: Discovering non-relevant dynamic content of web pages
• Goal: extract the relevant information from a page.

What we propose
• Reload the web page (URL) to determine which parts of the content are relevant.
• Calculate Delta(X): the content that changed between the two loads.

What we propose (2)
• Delta(X): X is any web page, and Delta(X) is the collection of XPaths of the contents that are not relevant
• E.g. Delta(X) = {html\body\div\, html\body\a\@href}
• A minimal sketch of this computation follows the example screenshots below.

Example
[screenshot slide]

Example (2)
[screenshot slide]
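The following is a minimal sketch of how Delta(X) could be computed from two loads of the same URL; it is illustrative only, not the AppScan implementation. It assumes the page parses as well-formed XML and that both loads yield the same tree shape, and the helper names (changed_xpaths, delta_of) are invented for this sketch.

```python
# A minimal sketch of the Load-Reload idea, not the AppScan implementation.
# Assumptions: the page parses as well-formed XML and both loads yield the
# same tree shape; a real crawler would use a fault-tolerant HTML parser
# and also handle nodes that appear or disappear between loads.
import xml.etree.ElementTree as ET

def changed_xpaths(a, b, path="/html"):
    """Walk two DOM trees in parallel and collect the XPaths of the
    content that differs between them."""
    delta = set()
    if (a.text or "").strip() != (b.text or "").strip():
        delta.add(path)                        # volatile text (e.g. a timestamp)
    for name in set(a.attrib) | set(b.attrib):
        if a.attrib.get(name) != b.attrib.get(name):
            delta.add(path + "/@" + name)      # volatile attribute (e.g. a token)
    for i, (child_a, child_b) in enumerate(zip(a, b)):
        # Simplified positional path; real XPath indexes per tag name.
        delta |= changed_xpaths(child_a, child_b,
                                "%s/%s[%d]" % (path, child_a.tag, i + 1))
    return delta

def delta_of(first_load, second_load):
    """Delta(X): XPaths of the content that changed between two loads of X."""
    return changed_xpaths(ET.fromstring(first_load), ET.fromstring(second_load))
```

Anything that falls in Delta(X) (timestamps, advertisements, rotating tokens) can then be masked out before two client states are compared, so that volatile content does not create spurious new states.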
2. Identifying Session Variables and Parameters
• What is a session?
▫ A session is a conversation between the server and a client.
• Why should a session be maintained?
▫ HTTP is stateless: when a series of requests and responses comes from the same client, the server by itself cannot tell which client it is getting requests from.

Identifying Session Variables and Parameters (2)
• Session tracking methods:
▫ User authorization
▫ Hidden fields
▫ URL rewriting
▫ Cookies
▫ Session tracking APIs
• Problems that are addressed:
▫ Redundant crawling: might result in a crawler trap or infinite runs.
▫ Session termination problem: incomplete coverage of the application if the application requires a session throughout the access.

What we propose
• Record the log-in sequence twice on the same website, using the same user input (e.g. the same user name and password) and the same user actions.
• The values that differ between the two recordings identify the session variables and parameters (a minimal sketch of this comparison appears at the end of the deck).

Example
[screenshot slide]

3. Crawling Strategies for RIAs
• Crawling extracts a “model” of the application that consists of:
▫ States, which are the “distinct” web pages
▫ Transitions, which are triggered by event executions
• The strategy decides how the exploration of the application should proceed.

Standard Crawling Strategies
• Breadth-First and Depth-First
• They are not flexible:
▫ They do not adapt themselves to the application
• Breadth-First often goes back to the initial page
▫ This increases the number of reloads (loadings of the URL)
• Depth-First requires traversing long paths
▫ This increases the number of event executions

What we propose
• Model-Based Crawling
▫ A model is an assumption about the structure of the application.
▫ Specify a good strategy for crawling any application that follows the model.
▫ Specify how to adapt the crawling strategy when the application being crawled deviates from the model.

What we propose (2)
• Existing models:
▫ Hypercube Model, whose assumptions are:
  1. Events are independent.
  2. The set of events enabled at a state is the same as at the initial state, minus the ones executed to reach that state.
  [Figure: the hypercube of states {}, {e1}, {e2}, {e1,e2}, with transitions labeled e1 and e2.]
▫ Probability Model
  Statistics gathered about the results of event executions are used to guide the exploration strategy.

Conclusion
• Crawling is essential for the automated security testing of web applications.
• We introduced two techniques that enhance the security testing of web applications:
▫ Identifying and ignoring irrelevant web page contents
▫ Identifying and ignoring session information
• We have also worked on new crawling algorithms.

Thank You!

Demonstration
• Rich Internet Application Security Testing – IBM® Security AppScan® Enterprise

DEMO – IBM® Security AppScan® Enterprise
• IBM Security AppScan Enterprise is an automated web application scanner
• We added RIA crawling capability to a prototype of AppScan
• We will demo how the coverage of the tool increases with the RIA crawling capability

DEMO – Test Site (Altoro Mutual)
[screenshot slide]

DEMO – Results
• Without RIA crawling
[screenshot slide]

DEMO – Results
• With RIA crawling
[screenshot slide]
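To close, here is the minimal sketch of the session-identification technique proposed in section 2, where the same log-in sequence is recorded twice. Everything in it is illustrative: the recording format and the parameter names (uid, passw, JSESSIONID) are assumptions made for this sketch, not the format our tools use.

```python
# A minimal sketch of session-variable identification, not the AppScan
# implementation. Assumption: each recording is a hypothetical list of
# per-request parameter dictionaries (cookies, hidden fields, URL query
# parameters) captured while replaying the same log-in sequence.

def session_parameters(recording_1, recording_2):
    """Parameters whose values differ across two otherwise identical
    recordings are candidate session variables (IDs, tokens, nonces)."""
    candidates = set()
    for request_1, request_2 in zip(recording_1, recording_2):
        for name in set(request_1) & set(request_2):
            if request_1[name] != request_2[name]:
                candidates.add(name)   # same action, different value
    return candidates

# Same user, same actions, recorded twice: only the session ID differs.
first = [{"uid": "jsmith", "passw": "demo1234"},
         {"JSESSIONID": "A1B2C3", "page": "account"}]
second = [{"uid": "jsmith", "passw": "demo1234"},
          {"JSESSIONID": "X9Y8Z7", "page": "account"}]
print(session_parameters(first, second))   # {'JSESSIONID'}
```

Once identified, such parameters can be treated as session data: ignored when comparing client states (avoiding redundant crawling) and refreshed when the session expires (addressing the session termination problem).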