Intrusion Detection using Sequences of System Calls
By S. Hofmeyr & S. Forrest

Overview
- Focus: privileged processes
- Discriminator: system call sequences
- Building a database: defining “normal”
- Detecting anomalies: how to measure
- Results: promising numbers
- Concerns: remaining doubts
- Extensions of research: Jones, Li & Lin

Inspiration
- Human immune system
  - Recognition of self
  - Rejection of nonself
- How would we describe “self” for a software system, or a program?

Focus and Motivation
- Focus on privileged processes
  - Exploitation can give a user root access
  - They provide a natural boundary
  - e.g. telnet daemon, login daemon
- Privileged processes are easier to track
  - Specific, limited function
  - Stable over time
  - Contrast with the diversity of user actions

Where do we look?
- Need to distinguish when:
  - Privileged process runs normally
  - Privileged process exhibits an anomaly
- The discriminator is the observable entity used to distinguish between these two
- Use sequences of system calls as the discriminator, the signature

How much detail?
- Discriminator is sequences of system calls
- Simple temporal ordering is chosen
  - Ignore parameters
  - Ignore specific timing information
  - Ignore everything else!
- Why? As much as possible, work with simple assumptions
- Is it “enough”?

Is it enough detail?
- Does the discriminator include enough detail for this hypothesis to hold?
- Answer seems to be yes!
- Extra complication: due to the variability in configuration and use of individual systems, the set of “normal” sequences of system calls will be different on different systems

Design Decisions
- Remember temporal ordering of calls
- Not total sequence, but sequences of length k
- What size should k be?
- Long enough to detect anomalies, yet as short as possible
- Empirical observation: length 6 to 10 is sufficient
- So “self” is a database of (unordered) short call sequences

Building the “normal” database
- Synthetic
  - Assurance that the normal database contains no intrusions; reproducible
  - But does not reflect any particular real user activity
- Actual use
  - Necessary to generate from actual use in order to have a unique “self”
  - How long to accumulate? Is it clean?

The normal database
- Database of normal sequences does not contain all legal sequences
  - If it did, anomalies would not be detected
  - Some rare sequences will not be used during database initialization
- Database is stored as a forest to save space

Signature Database Structure (length 3)
[Figure: example signature database for sequences of length 3, built from the calls fopen, fread, and strcmp.]

Derive Robust Signature Database
[Figure: Robust Signature Database. Database size (0 to 600) plotted against total sequences scanned (0 to 10000).]

Detecting anomalies
- A call sequence not in the database is an anomalous sequence
- Strength of that anomalous sequence is measured by the “Hamming distance” to the closest normal sequence (called dmin)
- Any call trace with an anomalous sequence is an anomalous trace

Detecting anomalies
- Strength of an anomalous trace is the maximum dmin of the trace, normalized for the value of k (length of sequences in the database):
  ŜA = max{dmin values for the trace} / k
- Value is between 0 and 1
- By adjusting the threshold value for ŜA, false positives can be reduced

Efficiency
- Complexity of computing dmin: O(k(RA·N + 1))
  - k is the sequence length, RA is the ratio of anomalous to normal sequences, N is the number of sequences in the database
- dmin is calculated after every system call
- The constant associated with this algorithm is very important
- Not yet running in real time

Results (synthetic)
- Sanity test: If different programs are not
  distinguishable, anomalies within one program will certainly not be either
- Easy to distinguish between programs; mismatches on well over 50% of the instruction sequences (and ŜA >= 0.6)
- All intrusions (both attempted & successful) produced anomalies of varying strengths

Results (real environment)
- The conjecture of unique normal databases
  - Experiments in two configurations (at UNM and MIT) had very different databases for the same program (lpr)
  - Is this typical?

Closing concerns
- False positives vs false negatives
  - If forced to choose, UNM prefers to have false negatives because layering can mitigate
  - Saw 1 per 100 print jobs (lpr)
  - Due to system problems
- Is ŜA a good measure?
  - It could help generate false positives
  - Single extra system call might make ŜA = 0.5

Annex Material

Some UVa experiments
- S. Li, Y. Lin, and A. Jones
- Illustrated by two attacks on Apache
- Varied sequence length from 2 to 30
- We chose length 10 to have margin of error

Signature Length Has Little Effect
[Figure: normalized anomaly signal (0 to 1.2) plotted against sequence length (0 to 40).]

Effectiveness: Buffer Overflow

Attack                  #Mismatches  %Mismatches  Normalized Anomaly Signal
Stack Overwrite         467          3.5          0.7
Realpath Vulnerability  569          2.7          0.6

- High normalized anomaly signals indicate attacks
- Successfully detected buffer overflow attacks against wu-ftpd
- Works well because attacker code adds new sequences of library calls

Effectiveness: Denial of Service
- Simulated DOS attack that uses up all available memory
- As attack progresses, library calls requesting memory return abnormally and are re-issued
- DOS attack caused application to invoke new library call, fsync

Program - vi  #Mismatches  %Mismatches  Normalized Anomaly Signal
Normal Run    0            0            0
DOS Attack    101          2.6          0.6

- Normal run: no intrusion detected
- DOS attack: high normalized anomaly signal indicates attack
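
Sketch: database, dmin, and ŜA

The detection scheme from the "Design Decisions" and "Detecting anomalies" slides can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names, the call names in the example traces, and the choice of k = 3 are assumptions made for illustration. The database is held in a plain Python set rather than the forest mentioned in the slides, and dmin is found by a linear scan over the whole database, so the sketch shows the definitions only and says nothing about the efficiency discussed above.

```python
from typing import Iterable, Iterator, List, Set, Tuple

Seq = Tuple[str, ...]


def windows(trace: List[str], k: int) -> Iterator[Seq]:
    """Yield every contiguous length-k window of a call trace."""
    for i in range(len(trace) - k + 1):
        yield tuple(trace[i:i + k])


def build_database(normal_traces: Iterable[List[str]], k: int) -> Set[Seq]:
    """Collect the unordered set of length-k sequences seen in normal runs.

    Assumption: a set stands in for the forest used in the slides.
    """
    db: Set[Seq] = set()
    for trace in normal_traces:
        db.update(windows(trace, k))
    return db


def hamming(a: Seq, b: Seq) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def d_min(seq: Seq, db: Set[Seq]) -> int:
    """Hamming distance from seq to the closest normal sequence."""
    if seq in db:
        return 0
    if not db:
        return len(seq)  # nothing to compare against: maximally anomalous
    return min(hamming(seq, normal) for normal in db)


def anomaly_signal(trace: List[str], db: Set[Seq], k: int) -> float:
    """Normalized trace strength: max dmin over the trace, divided by k."""
    distances = [d_min(w, db) for w in windows(trace, k)]
    return max(distances, default=0) / k


# Hypothetical traces; the call names are illustrative only.
K = 3
normal_trace = ["open", "read", "mmap", "mmap", "open", "read", "mmap"]
suspect_trace = ["open", "read", "execve", "mmap", "open"]

db = build_database([normal_trace], K)
clean_signal = anomaly_signal(normal_trace, db, K)
suspect_signal = anomaly_signal(suspect_trace, db, K)
```

In this toy run, the single unfamiliar call in suspect_trace makes every window that contains it differ from its nearest normal sequence in one position, so the normalized signal is 1/k. This is the same effect raised under "Closing concerns": with short sequences, one extra call can produce a large ŜA.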