Identifying Malicious Web Requests through Changes in Locality and Temporal Sequence
DIMACS Workshop on Security of Web Services and E-Commerce
Li-Chiou Chen, lchen@pace.edu
School of Computer Science and Information Systems, Pace University
May 4th, 2005

Needs for anomaly detection in distributed network traces
- Fast-spreading Internet worms and other malicious programs interrupt web services
- Early detection and response are vital
- These attacks are usually launched from distributed locations
- Network traces left at distributed locations are invaluable when searching for clues to potential future attacks
  - E.g. DShield, the Honeynet Project
© Li-Chiou Chen, 5/6/2005

Types of IDS
- Based on data
  - Network-based IDS: monitors and inspects network traffic
  - Host-based IDS: runs on a single host
- Based on detection techniques
  - Signature-based IDS: uses pattern matching to identify known attacks
  - Anomaly-based IDS: uses statistical, data mining or other techniques to distinguish normal from abnormal activities

Outline
- Toolkits for inferring anomalous patterns from distributed network traces
- Previous work
- Changes of locality over time
- Markov chain analysis
- Preliminary results
- Summary
- Future work

TIAP: Toolkits for inferring anomalous patterns in distributed network traces
[Diagram; components as labelled on the slide:]
- Network traces (web log, tcpdump, etc.)
- Data conversion
- Locality pattern analysis
- Sequence pattern analysis
- Alerts from other IDS or TIAP peers (using IDMEF)
- Response module
- Alerts to other IDS or TIAP peers (using IDMEF)
- Alerts to administrators

Web level IDS
- Anomaly detection
  - Structure of an HTTP request (Kruegel and Vigna 03)
  - Normality on streams of data access patterns (Sion et al 03)
- Misuse detection
  - State transition analysis of HTTP requests (Vigna et al 03)
  - Look for attack signatures (Almgren et al 01)

Changes in locality patterns and temporal sequence patterns
- Locality: where the web request is sent from,
  such as the source IP address; and which web server is requested, such as the destination IP address
- Temporal sequence: the order of requested objects during a given period of time

Locality pattern analysis in distributed network traces
[Figure: request sequences ABAA, ABCD, KIKL, ABPO; time steps t1: AB, t2: ...., t3: ...., t4: ....]

An example: web traces in common log format from 6 web servers
Servers: S1 S2 S3 S4 S5 S6

tstamp, ip, server, doc_tpe, user_agent
62978, 38.0.69.1, 1, 2, 3
62979, 38.0.69.1, 1, 2, 3
62979, 38.0.69.1, 2, 2, 3
63001, 38.0.69.1, 1, 2, 3
…….. ………
(Slide annotation on these rows: "A session")

Data profiles
- 6 web servers (2 of them link to each other, 4 of them are independent)
- One day of web traces
- One session: a distinct IP, 10-minute interval
- 193,070 HTTP requests, 11,177 sessions
- HTTP requests from outside the organization

Locality pattern analysis

Web sites accessed   Document types accessed   % browser   % web bot
        1                      1                  99%          1%
        1                      2                  22%         78%
        1                      3                  94%          6%
        1                      4                  93%          7%
        1                      5                   0%        100%
        1                      6                 100%          0%
        1                      7                 100%          0%
        2                      1                   0%        100%
        2                      2                  12%         88%
        2                      3                   0%        100%
        2                      4                   0%        100%
        3                      2                   0%        100%
        3                      3                   0%        100%
        3                      4                   0%        100%
        4                      2                   0%        100%
        4                      3                   0%        100%
        4                      4                   0%        100%

(Slide annotation: 86 sessions by only two web bots)

Markov chain analysis
[Figure: a stream of requests (labelled X, Y, Z, W) at times t1 through t16, annotated with the states N, S and O and grouped into sampling window 1 and sampling window 2]
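
The session definition on the data-profile slide (a distinct IP, 10-minute interval) and the per-session counts of web sites and document types used in the locality pattern analysis can be sketched roughly as below. This is a minimal illustration, not the TIAP implementation. It assumes that `tstamp` is in seconds and that "10 minutes interval" means a session ends after 10 idle minutes. The function names `sessionize`, `locality_profile` and `profile_counts` are hypothetical.

```python
from collections import defaultdict

SESSION_GAP = 600  # seconds; ASSUMPTION: a session ends after 10 idle minutes


def sessionize(records, gap=SESSION_GAP):
    """Group (tstamp, ip, server, doc_type, user_agent) tuples into sessions.

    A new session starts for an IP when more than `gap` seconds have
    passed since that IP's previous request (assumed definition).
    """
    sessions = []
    open_session = {}  # ip -> index of that IP's current session
    last_seen = {}     # ip -> timestamp of that IP's previous request
    for rec in sorted(records, key=lambda r: r[0]):
        tstamp, ip = rec[0], rec[1]
        if ip not in last_seen or tstamp - last_seen[ip] > gap:
            open_session[ip] = len(sessions)
            sessions.append([])
        sessions[open_session[ip]].append(rec)
        last_seen[ip] = tstamp
    return sessions


def locality_profile(session):
    """Return (number of web sites, number of document types) in a session."""
    servers = {rec[2] for rec in session}
    doc_types = {rec[3] for rec in session}
    return len(servers), len(doc_types)


def profile_counts(sessions):
    """Count how many sessions fall into each locality profile."""
    counts = defaultdict(int)
    for session in sessions:
        counts[locality_profile(session)] += 1
    return dict(counts)


# The four sample rows shown on the example slide.
records = [
    (62978, "38.0.69.1", 1, 2, 3),
    (62979, "38.0.69.1", 1, 2, 3),
    (62979, "38.0.69.1", 2, 2, 3),
    (63001, "38.0.69.1", 1, 2, 3),
]
sessions = sessionize(records)
print(len(sessions), locality_profile(sessions[0]))  # prints: 1 (2, 1)
```

Splitting each profile's sessions by user agent class (browser or web bot) would then give percentages of the kind shown in the locality pattern table. How user agents were classified is not stated on the slides.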

Data profiles
- 1 web server
- One week of web traces
- Window size: 30
- Reference list: 30

Change of distinct IP over time: browsers
[Figure: number of unique IPs per five minutes (y-axis, 0 to 100) against hours since 09/30/2004 0:00 AM (x-axis, 0 to 192)]

Change of distinct IP over time: web bots
[Figure: number of unique IPs per five minutes (y-axis, 0 to 25) against hours since 09/30/2004 0:00 AM (x-axis, 0 to 192)]

Markov chain results
[State transition diagram over the states Old (O), New (N) and Same (S). Edge labels in extraction order; the edge each label belongs to is not recoverable from the text: 0.43 (0.14), 0.42 (0.21), 0.43 (0.17), 0.13 (0.10), 0.18 (0.16), 0.13 (0.08), 0.40 (0.22), 0.06 (0.04), 0.83 (0.10)]

Illustration of the state transition probability
[Figure: probability (y-axis, 0 to 1) of the transitions S->S, S->O and S->N against hours since 09/30/04 0:00 AM (x-axis, 0 to 168)]

Summary
- The preliminary locality pattern analysis works well for identifying distinct web bot access patterns
- The Markov chain analysis provides a way to infer attacks that use random IP addresses
- A combination of the two approaches is needed

Ongoing work
- Incorporate the analytical results into malware or intrusion detection
- A distributed framework of data collection and information sharing for inferring malware or intrusion attempts across servers/platforms/geographical locations
  - Collection of attack logs for analytical purposes
  - Use of the Intrusion Detection Message Exchange Format (IDMEF) for message exchanges among servers
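
The Markov chain analysis over the states Old (O), New (N) and Same (S) can be illustrated with a short Python sketch. The slides do not define the states or the windowing precisely, so everything below is an assumption made for illustration. S means the same source IP as the immediately preceding request. O means a different IP that is still in a reference list of the most recently seen distinct IPs. N means an IP not in that list. Transition probabilities are estimated per non-overlapping window of requests. The names `label_states`, `transition_matrix` and `windowed_matrices` are hypothetical.

```python
from collections import deque

STATES = ("O", "N", "S")


def label_states(ips, ref_size=30):
    """Map each request's source IP to O, N or S (ASSUMED definitions)."""
    reference = deque(maxlen=ref_size)  # most recently seen distinct IPs
    labels = []
    prev = None
    for ip in ips:
        if ip == prev:
            labels.append("S")   # same IP as the previous request
        elif ip in reference:
            labels.append("O")   # seen recently, but not the previous one
        else:
            labels.append("N")   # not in the reference list
        # Move this IP to the most recent end of the reference list.
        if ip in reference:
            reference.remove(ip)
        reference.append(ip)
        prev = ip
    return labels


def transition_matrix(labels):
    """Estimate row-normalized transition probabilities between states."""
    counts = {a: {b: 0 for b in STATES} for a in STATES}
    for a, b in zip(labels, labels[1:]):
        counts[a][b] += 1
    matrix = {}
    for a in STATES:
        total = sum(counts[a].values())
        matrix[a] = {b: (counts[a][b] / total if total else 0.0)
                     for b in STATES}
    return matrix


def windowed_matrices(labels, window=30):
    """One transition matrix per non-overlapping window (assumed scheme)."""
    return [transition_matrix(labels[i:i + window])
            for i in range(0, len(labels) - window + 1, window)]


# Toy input, not taken from the slides.
labels = label_states(["a", "a", "b", "a", "c", "c"])
print(labels)
print(transition_matrix(labels)["N"])
```

Under these assumptions, a burst of requests from random source addresses would show up as a drop in the S->S probability and a rise in transitions into N within the affected windows. That is the kind of change the S->S, S->O and S->N plot tracks over time.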