Report on Statistical Intrusion Detection Systems
By Ganesh Godavari

Outline of the talk
• Intrusion Detection
• Motivation
• Approaches for intrusion detection

Intrusion Detection & Data Fusion
• Intrusion Detection System
  – Protects and provides availability, confidentiality and integrity of critical information infrastructures
• Data fusion: the data-processing task of making decisions on the basis of distributed data sources that describe an object

Motivation & challenges
• Threat analysis
  – Known and unknown threats
  – Pattern templates, traffic analysis, statistical anomaly detection and state-based detection
• Provide reliability
  – Reduce false alarms, increase user confidence

Characteristics of IDS
• Key to an IDS
  – Minimize the occurrence of unjustified alerts (false positives)
  – Maximize accurate alerts (true positives)
• Some of the methods
  – Data mining
  – Statistical
  – Signature based or rule based

Signature based method
• A signature-based IDS is only as strong as its rule sets
• If X events of interest are detected across a Y-sized time window, raise an alert
• Advantages
  – Potential for low alarm rates
  – Accuracy of detection
  – Detailed textual log
• Disadvantages
  – Rules need to be updated continually
  – Inability to detect new and previously unidentified attacks

Statistical-Based Intrusion Detection (SBID)
• Determine normal network activity; any traffic pattern outside the normal scope is treated as abnormal
• An SBID system relies on statistical models such as Bayes' theorem to detect anomalous packets on the network

Disadvantages of SBID
• An SBID system must learn what normal traffic is for a particular network
• It takes longer to adapt, so it is less useful in the short run than a signature-based intrusion detection system
• If the traffic learned as normal is itself malicious, the SBID system is rendered useless
• The alerts produced mean little to the untrained eye

Snort IDS
• Snort
  – Popular open-source IDS
  – Uses signature-based and statistical-based intrusion detection
• Statistical-based intrusion detection is provided
by the SPADE preprocessor plugin

SPADE
• Statistical Packet Anomaly Detection Engine
  – Silicon Defence
  – Probability measurements for anomalous packet detection
  – Anomaly score determined by evaluating
    • Source IP
    • Destination IP
    • Destination port

SPADE (contd.)
• Spade
  – Automatically adjusts threshold settings to reduce false positives
  – Generates reports about the distribution of anomaly scores

Spade alerts

[**] [104:1:1] spp_anomsensor: Anomaly threshold exceeded: 3.8919 [**]
08/22-22:37:00.419813 24.234.114.96:3246 -> VICTIM.HOST:80
TCP TTL:116 TOS:0x0 ID:25395 IpLen:20 DgmLen:48 DF
******S* Seq: 0xEBCF8EB7 Ack: 0x0 Win: 0x4000 TcpLen: 28
TCP Options (4) => MSS: 1460 NOP NOP SackOK

The alert shows an attempt to connect to a local web server. There is no web server at the VICTIM.HOST address, so this is unusual activity. Yet Spade did not give this packet a high anomaly score. In this specific case, the low score is likely due to the Code Red epidemic: the system had become accustomed to seeing traffic to port 80. Spade clearly did not consider this packet exceedingly anomalous; it treated the port 80 request like the scenario where the newspaper landed on the driveway, which was anomalous but not particularly unusual.

[**] [104:1:1] spp_anomsensor: Anomaly threshold exceeded: 10.5464 [**]
08/22-22:22:46.577210 24.41.81.216:2065 -> VICTIM.HOST:27374
TCP TTL:108 TOS:0x0 ID:10314 IpLen:20 DgmLen:48 DF
******S* Seq: 0x63B97FE2 Ack: 0x0 Win: 0x4000 TcpLen: 28
TCP Options (4) => MSS: 1460 NOP NOP SackOK

This packet shows a highly anomalous trace. With a score of 10.5464, it is extremely unusual for the network. The destination port makes clear why this packet should not be sent to the network: no service on the network uses port 27374.
In fact, further investigation shows that this port is usually associated with the Sub Seven Trojan [22]. The packet therefore warrants investigation, and Spade correctly assigned a high anomaly score to the trace.

Survey log
The survey log listed below displays the distribution of anomaly scores over time (in 60-minute periods). The file shows the hour relative to the start of the Spade program, the total number of packets in the specified hour, the median anomaly score (Median Anom), and the 90th and 99th percentile anomaly scores.

Logfile.txt
392 packets recorded
51 packets reported as alerts
Threshold learning results: top 200 anomaly scores over 23.58361 hours
Suggested threshold based on observation: 3.522590
Top scores: 3.52317, 3.52433, 3.52549, 3.52665, 3.52782, 3.52898, 3.53015, 3.53132, 3.53249, 3.53366, 3.53483, 3.53601, 3.53718, 3.53836, 3.53954, 3.54072, 3….10.29728, 10.33199, 10.33308, 10.33351, 10.33360, 10.33362
First runner up is 3.52201, so use threshold between 3.52201 and 3.52317 for 8.523 packets/hr
H(dip)=5.30397479
H(dport|dip)=9.69742991
P(dip=44044824)= 0.064466877730
P(dip=44044824,dport=1)= 0.000062047043
P(dip=44044824,dport=2)= 0.000077558804
P(dip=44044824,dport=3)= 0.000062047043
P(dip=44044824,dport=4)= 0.000062047043
P(dip=44044824,dport=5)= 0.000062047043

The log first displays basic packet statistics and the threshold learning results, which show how and why Spade settles on a particular threshold for the observation period. Towards the bottom of the file, probability statistics are listed, where H = entropy, dip = destination IP, dport = destination port, and P = probability.

Questions?

References
• http://www.silicondefence.com/
• http://www.sans.org/resources/idfaq/statistics_ids.php
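
Backup: sketch of a Spade-style anomaly score
The log above lists probabilities such as P(dip, dport), while the alerts report anomaly scores. The relationship between the two can be illustrated with a minimal Python sketch. It assumes the score is the negative base-2 logarithm of the estimated probability, as Spade is commonly described; it is not Spade's actual code. The names anomaly_score and PacketHistory are hypothetical, and counting the scored packet itself (so the probability is never zero) is a simplification made for this example.

```python
import math
from collections import Counter


def anomaly_score(probability: float) -> float:
    """Return -log2(P): rarer events get higher scores (assumed formula)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


class PacketHistory:
    """Hypothetical table of observed (destination IP, destination port) pairs."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.total = 0

    def observe(self, dip: str, dport: int) -> None:
        """Record one packet seen on the network."""
        self.counts[(dip, dport)] += 1
        self.total += 1

    def score(self, dip: str, dport: int) -> float:
        """Score a packet against the history.

        Assumption: the packet being scored is counted once, so a
        never-seen pair gets a small non-zero probability.
        """
        count = self.counts[(dip, dport)] + 1
        return anomaly_score(count / (self.total + 1))


def exceeds_threshold(score: float, threshold: float) -> bool:
    """An alert is raised only when the score is above the threshold."""
    return score > threshold


if __name__ == "__main__":
    history = PacketHistory()
    # Port 80 traffic is common (as during Code Red), so it scores low.
    for _ in range(7):
        history.observe("10.0.0.5", 80)
    print(history.score("10.0.0.5", 80))
    # A port never seen before, such as 27374, scores higher.
    print(history.score("10.0.0.5", 27374))
    # Threshold taken from the "Suggested threshold" line in the log.
    print(exceeds_threshold(10.5464, 3.522590))
```

Under these assumptions, frequently seen destinations drift towards a score of zero while unseen ports score high, which matches the behaviour described for the port 80 and port 27374 alerts.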