Improving Self-Defense by Learning from Limited Experience
Karen Haigh, BBN Technologies
Steven Harp, Adventium Labs
Haigh & Harp, Learning from Limited Experience

Overview
• Goal: systems that autonomously improve their defenses with experience.
• There are several ways to do this.
• Examples discussed:
  – Learning to recognize anomalies
  – Self-immunizing against observed exploits
  – Acquiring multistage attack concepts
  – Learning effective responses

Learning in Cyber Security
• What is (machine) learning?
  – Automatically using prior experience to improve performance over time
• Which problems can learning address?
  – Detection: distinguish a problem from a non-problem
  – Immunity:
    • Good: “an exploit should succeed at most once”
    • Better: “a vulnerability should be exploitable at most once”
  – Response: how best to actively counter an attack?
Long-Term Goal: Cognitive Immunity

Opportunities & Techniques
• Detecting attacks by passive observation: relatively well explored (example: anomaly detection). Work remains to be done to detect attacks that extend over multiple hosts and steps.
• Responding to attacks by passive observation: not well explored. CSISM innovation: situation-dependent utility of responses.
• Detecting attacks by experiment (e.g. in a sandbox / taster / laboratory):
  – Cortex innovation: use experiments to generalize from instances of attacks to classes of attacks.
  – CSISM innovation: use experiments to identify the necessary & sufficient elements of multi-step attacks.
• Responding to attacks by experiment: not well explored. CSISM innovation: variations on responses.
Modelling Defended Systems
• Expert rules
• Offline learning
• Online learning
• Experimental sandbox
Strengths (+) and weaknesses (-) of each approach:
• Expert heuristics: + good data; - complex environment; - dynamic system
• Offline training: + good data; + complex environment; - dynamic system
• Online training: - unknown data; + complex environment; + dynamic system
• Experimental sandbox: + good data (self-labeled); + complex environment; + dynamic system
Very hard for an adversary to “train” the learner!

Complex Domain: Human Rules are Incomplete
• Quads 0 & 1 are slower than Quads 2 & 3.
• Human calibration (incorrectly) claimed that Quad 1 was slowest, missing Quad 0.
Figure: Registration time by quad (DPASA, DARPA OASIS)

Complex Domain (2)
• caf_plan, chem_haz and maf_plan are slower than the other clients.
• Human calibration (incorrectly) claimed that caf_plan & maf_plan were slowest because of the hand-typed password, missing chem_haz.
Figure: Registration time by client type (DPASA, DARPA OASIS)

Learning for Calibration (CSISM / BBN)
• Calibrate the parameters of rules for normal operating conditions
  – An important first step, because it learns how to respond to normal conditions
  – For example, learn timing parameters for the rapid-response controller, e.g.:
    • Client registration, PSQ server local probes, SELinux enforcement, SELinux flapping, file integrity checks
  – Need to handle multi-modal data

Results for All Registration Times (CSISM / BBN)
Figure: Calibrated registration times. The two “shoulder” points indicate the upper and lower limits (plot annotation: “Beta=0.00 05”).
• As more observations are collected, the estimates become more confident of the range of expected values (i.e. the estimated limits fit the observations more tightly).
• Algorithm of Last & Kandel, 2001
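Calibration of this kind can be illustrated with a small sketch. The slides cite the algorithm of Last & Kandel (2001), and the code below does not reproduce that algorithm. It is a simpler stand-in with two assumptions. First, each observation carries a known mode label, such as the quad or client type from the registration data, so multi-modal data can be calibrated one mode at a time. Second, a slack margin shrinks as 1/sqrt(n), so the limits tighten as observations accumulate. The names `calibrate`, `Limits`, `is_normal` and `slack_factor` are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Tuple


@dataclass
class Limits:
    lower: float
    upper: float
    n: int  # number of observations behind these limits


def calibrate(observations: Iterable[Tuple[Hashable, float]],
              slack_factor: float = 1.0) -> Dict[Hashable, Limits]:
    """Learn [lower, upper] limits of normal values separately for each mode.

    Grouping by a known mode label (e.g. client type) avoids a single range
    that is too wide for fast clients and too narrow for slow ones.
    """
    by_mode = defaultdict(list)
    for mode, value in observations:
        by_mode[mode].append(value)

    limits = {}
    for mode, values in by_mode.items():
        lo, hi = min(values), max(values)
        n = len(values)
        # Slack shrinks as 1/sqrt(n): more observations give tighter limits.
        slack = slack_factor * (hi - lo) / math.sqrt(n)
        limits[mode] = Limits(lo - slack, hi + slack, n)
    return limits


def is_normal(limits: Dict[Hashable, Limits], mode: Hashable, value: float) -> bool:
    """True if value lies inside the learned limits for its mode."""
    lim = limits.get(mode)
    return lim is not None and lim.lower <= value <= lim.upper


if __name__ == "__main__":
    # Made-up registration times in seconds, for illustration only.
    obs = [("caf_plan", 9.0), ("caf_plan", 11.0), ("other", 2.0), ("other", 3.0)]
    limits = calibrate(obs)
    print(is_normal(limits, "caf_plan", 10.0))  # inside the slow mode's range
    print(is_normal(limits, "other", 10.0))     # outside the fast mode's range
```

A single range fitted to all four values would accept 10.0 s for any client. The per-mode limits accept it for caf_plan and flag it for the fast mode.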
Generalization of Attack Signatures: Cortex Project (Honeywell)

Generalization
• Goal: learn the most general concept from instances of attacks, and block all similar attacks against the vulnerability (dealing with zero-day attacks).
• Payload analysis challenges:
  – How to automatically recognize which element(s) of an attack are essential?
  – How to generalize them to their boundary conditions?
    • avoiding the fragility of simple pattern-matching rules
• Approach: experimentation
  – Validation of attack concepts: 0 false positives

Generalization by Experimentation
Figure: Model of normal traffic; Taste Tester; Experiment. The model contains axes of vulnerability.
1) Score suspicious elements
2) Replace them with innocuous or generalized values
3) Validate in the tester

Blocking Rules
• Payload content
  – Binary machine instructions
  – Unusual payload (e.g. Unix commands, registry keys, database administrative commands)
  – Length (# bytes/terms)
• Resource consumption patterns
• Probing (e.g. password guessing)
• Session-wide (multiple queries)

Cortex Demo Architecture and Use Cases
Figure: Normal-query use case. Components: clients (AMP, Mission Planning, CSM), Proxy (Dexter), Replicator, Tasters, Learner, Master DB, RTS; labelled “once per phase”.
  – Proxy (Dexter): block known bad queries; taste test; log results; replicate; switch tasters; rebuild tasters; send to learning
  – Replicator: replicate queries; switch tasters; create tasters; delete tasters; heartbeat; status
  – Learner: read training data; experiment; generate rules
Figure: Attack use case. An attack query from CSM reaches the Proxy (Dexter); the two outcomes shown are “attack is blocked” and “attack gets through”. Other components as in the normal-query use case.
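The score / replace / validate loop can be sketched as follows. The sketch assumes three things:
• A query is a dictionary of fields.
• A normal-traffic model supplies an innocuous value for each field.
• A `taster` callable replays a query against a sandboxed replica and reports whether the failure still occurs.

The field-by-field reduction is a simple stand-in, similar in spirit to delta debugging, and not the Cortex implementation. `essential_fields`, `length_boundary` and `toy_taster` are hypothetical.

```python
from typing import Callable, Dict, List

Query = Dict[str, object]


def essential_fields(attack: Query, innocuous: Query,
                     taster: Callable[[Query], bool],
                     order: List[str]) -> List[str]:
    """Return the fields the attack cannot do without.

    order: fields sorted from most to least suspicious (step 1, scoring).
    Each field is replaced by its innocuous value (step 2) and the modified
    query is replayed in the taster (step 3, validation).
    """
    if not taster(attack):
        raise ValueError("the original attack does not reproduce in the taster")
    current = dict(attack)
    essential = []
    for name in order:
        trial = dict(current)
        trial[name] = innocuous[name]
        if taster(trial):
            current = trial          # still fails: this field was incidental
        else:
            essential.append(name)   # attack stops working: field is essential
    return essential


def length_boundary(fails_at: Callable[[int], bool], max_len: int) -> int:
    """Smallest length that still triggers the failure.

    Assumes monotonicity (every length >= the boundary fails) and that
    fails_at(max_len) is True; binary search needs O(log max_len) tests.
    """
    lo, hi = 0, max_len
    while lo < hi:
        mid = (lo + hi) // 2
        if fails_at(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Toy taster (hypothetical): the "server" fails on passwords over 16 bytes.
def toy_taster(q: Query) -> bool:
    return q["cmd"] == "login" and len(str(q["password"])) > 16
```

Take the attack `{"cmd": "login", "user": "evil", "password": "A" * 300}`, the innocuous values `{"cmd": "login", "user": "alice", "password": "secret"}`, and `order=["password", "user"]`. The sketch reports only `password` as essential, and `length_boundary` finds 17 as the smallest failing length. A blocking rule can then bound the password length instead of matching the literal 300-byte string.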
Example Results: MySQL Attacks (Cortex / Honeywell)
• String buffer overflow (password): correctly generalized a single attack to the number of valid bytes.
• Integer overflow: correctly generalized a single attack to the 0x7FFF maximum value.
• MySQL DoS attack: noted that hex bytes were suspicious, so generalized the bytes, and correctly blocked the integer overflow!
Project was tested with a red-team model.

Identification of Multistage Attacks: CSISM Project (BBN)

Multistage Attacks: Challenges
• Detect and generalize multi-step attacks across time and space.
  – Multistage attacks involve a sequence of actions that span multiple hosts and take multiple steps to succeed.
• Challenges:
  – Which observations are necessary & sufficient?
    • Incidental observations that are either
      – side effects of normal operations, or
      – chaff explicitly added by an attacker to divert the defender
    • Concealment (e.g. to remove evidence)
    • Probabilistic actions (e.g. to improve the probability of attack success)
  – What are the most reliable observations?
  – What are the parameter boundaries?
• Approach: experimentation
  – Allows validation of pruning

Architectural Schema
Figure: CSISM sensors (ILC, IDS) record observations 1 to 6 of actions (A, B, C, D) in a “sandbox”. The observations end in failure of the protected system, and only some are essential. An Attack Theory Experimenter yields viable attack theories; a Defense Measures Experimenter yields viable defense strategies and detection rules.
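The attack-theory experimenter's search can be sketched as a generate-and-test loop over ordered subsets of the observations. The sketch makes three assumptions:
• A hypothetical `sandbox` callable replays a candidate theory and reports whether the protected system fails.
• Each observation has a hypothetical suspicion score.
• Two of the heuristics named in this section apply: shorter attacks first, and more suspicious elements first.

Every ordering of a subset is tried, because step order may matter. `count_theories` counts the non-empty ordered subsets, which is the size of Permutations(Powerset(observations)) without the empty theory.

```python
import itertools
import math
from typing import Callable, Dict, Iterator, Optional, Sequence, Tuple

Theory = Tuple[str, ...]


def count_theories(n: int) -> int:
    """Number of non-empty ordered subsets of n distinct observations."""
    return sum(math.perm(n, k) for k in range(1, n + 1))


def theories(observations: Sequence[str],
             suspicion: Dict[str, float]) -> Iterator[Theory]:
    """Yield candidate theories: shorter first, then more suspicious first."""
    for k in range(1, len(observations) + 1):
        subsets = sorted(itertools.combinations(observations, k),
                         key=lambda s: -sum(suspicion.get(o, 0.0) for o in s))
        for subset in subsets:
            # Step order may matter, so every ordering is a separate theory.
            yield from itertools.permutations(subset)


def find_attack(observations: Sequence[str], suspicion: Dict[str, float],
                sandbox: Callable[[Theory], bool],
                budget: int = 10_000) -> Tuple[Optional[Theory], int]:
    """Test theories in heuristic order; return (theory or None, tests used)."""
    tests = 0
    for theory in theories(observations, suspicion):
        if tests >= budget:
            break
        tests += 1
        if sandbox(theory):
            return theory, tests
    return None, tests


# Toy sandbox (hypothetical): the system fails only if A happens before B;
# C and D play the role of chaff or side effects of normal operation.
def toy_sandbox(theory: Theory) -> bool:
    return "A" in theory and "B" in theory and theory.index("A") < theory.index("B")
```

Use observations `["C", "A", "D", "B"]` with suspicion scores `{"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.2}`. `find_attack` returns `("A", "B")` after 5 sandbox tests, out of `count_theories(4) == 64` theories in total. For 10 observations, `count_theories(10)` is 9,864,100, which is why the order of generation matters so much.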
Multi-Stage Learner (CSISM / BBN)
do {
  – Generate a theory according to a heuristic (the hard part! The complete set of theories is Permutations(Powerset(observations)))
  – Test the theory
  – Incrementally update the controller rulebases
} while theories remain
• For only 10 observations, there are nearly 10,000,000 possible theories (9,864,100 non-empty ordered subsets), not including variations on steps!

Hypothesis Generation
• The query learner generates attack hypotheses
  – in heuristic order, to acquire the concept rapidly
• Candidate heuristics:
  – Look for shorter attacks first (adjustable prior)
  – Suspect that the order of steps has an influence
  – Suspect that steps interact positively (for the attacker)
  – Prefer hypotheses with less common / more suspicious elements
Project was tested with a red-team model.

Response Learning: CSISM Project (BBN)

Situation-Dependent Action Utilities
• Learn the tradeoffs among potential responses; as the context changes, the appropriateness of responses changes.
  – Context includes descriptions of users, attack elements, system performance, etc.
  – Benefit is the effectiveness of the defense action.
  – Cost includes the effort to mount the response and the impact on availability.
• Challenges:
  – Measuring the effect of responses is hard:
    • Complex domain: situations are rarely identical, and actions and their effects are non-deterministic.
• Approach: experimentation
  – System “snapshots” get close to identical conditions.

Response Learning: Results Pending
• Bias toward responses that worked in similar situations in the past
  – Hybrid reinforcement learning and nearest-neighbour approaches
• Given a set of hypotheses about the locus of an attack:
  – Search for the true locus:
    • hierarchically, based on the system architecture
    • biased by historical attack patterns
  – Select a response based on similarity match to prior attacks:
    • the same response when quality was high
    • an alternate response when quality was low
Project will be tested by a red team on 20 May 2008. The goal is to demonstrate “better” responses over time.
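The rule “same response when quality was high, alternate response when quality was low” can be sketched as a nearest-neighbour lookup over past cases. The sketch assumes:
• Context is a numeric feature vector. The slides only say that it describes users, attack elements and system performance.
• Quality is benefit minus cost.
• Untried responses get an optimistic default score.
• `Case`, `utility`, `choose_response` and the response names are hypothetical.

This covers only the nearest-neighbour half of the hybrid approach, not the CSISM reinforcement-learning method.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Case:
    context: Sequence[float]  # numeric features describing the situation
    response: str             # response that was applied
    quality: float            # observed utility of that response


def utility(benefit: float, effort_cost: float, availability_cost: float) -> float:
    """Situation-dependent utility: effectiveness minus effort and availability impact."""
    return benefit - effort_cost - availability_cost


def choose_response(history: List[Case], context: Sequence[float],
                    responses: List[str], threshold: float = 0.5, k: int = 3) -> str:
    """Reuse the nearest case's response if it worked well, else try an alternate."""
    if not history:
        return responses[0]
    nearest = sorted(history, key=lambda c: math.dist(c.context, context))[:k]
    best = nearest[0]
    if best.quality >= threshold:
        return best.response  # same response when quality was high

    def mean_quality(response: str) -> float:
        qs = [c.quality for c in nearest if c.response == response]
        # Untried responses get an optimistic default so they get explored.
        return sum(qs) / len(qs) if qs else threshold

    alternatives = [r for r in responses if r != best.response]
    if not alternatives:
        return best.response
    return max(alternatives, key=mean_quality)  # alternate when quality was low
```

Suppose restarting the server scored poorly in the nearest past situation, and blocking the source also scored poorly nearby. The sketch then moves on to a response it has not tried yet in that neighbourhood, such as blocking the query type.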
Conclusion

Learning Benefits
• Learning can improve the defensive posture
  – better knowledge (about the attacks or the attacker), better policies
• Learning can improve how the system responds to symptoms
  – a better connection between response actions and their triggers
• Active learning
  – A mechanism for recognizing zero-day attacks
  – No false positives: only validated attacks are added
• Learning techniques are enablers for the next level of enhancements in adaptive defense
Adaptation is the key to survival.

From Proof-of-Concept to Production
• Generalization
  – Demonstrated: able to generalize instances to classes
  – Future directions: more axes of vulnerability; more handling of joint probabilities; more domains; meta-learning to induce new axes
• Multi-stage attacks
  – Demonstrated: able to identify chaff
  – Future directions: probabilistic actions; concealment
• Responses
  – Demonstrated: able to map context to response
  – Future directions: model of normal; generalization; richer context, richer responses; automatic measurement of benefit; scalable “snapshots”

Backup

Multistage Attacks
• Detect and then generalize multi-step attacks across time and space.
• Multistage attacks involve a sequence of actions that span multiple hosts and take multiple steps to succeed.
  – A sequence of actions with causal relationships.
  – Action A must occur to set up the initial conditions for action B.
  – Action B would have no effect without previously executing action A.
• For example:
  1. Gain the ability to execute commands on Box1 as an unprivileged user by exploiting a buffer overflow in Service1.
  2. Gain a root shell by running an exploit of a race condition.
  3. Disable a protection mechanism, e.g. SELinux.
  4. Replace the DPASA jar with the attacker's jar.
  5. Run attacker code that sends bad refs to Box2, Box3, Box4.
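The causal structure above can be sketched with precondition/effect sets, in the style of classical planning operators: action A establishes the conditions that make action B effective. Condition names such as `user_exec@Box1` are hypothetical labels for the example steps. `necessary_steps` is a hypothetical helper that drops one step at a time to separate necessary steps from chaff.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Action:
    name: str
    requires: FrozenSet[str]  # conditions that must already hold
    provides: FrozenSet[str]  # conditions the action establishes


def act(name: str, requires: Iterable[str], provides: Iterable[str]) -> Action:
    return Action(name, frozenset(requires), frozenset(provides))


def run(plan: List[Action], initial: Iterable[str] = ()) -> Optional[Set[str]]:
    """Apply the actions in order; None if some step's preconditions are unmet."""
    state = set(initial)
    for action in plan:
        if not action.requires <= state:
            return None  # this step would have no effect
        state |= action.provides
    return state


def necessary_steps(plan: List[Action], goal: FrozenSet[str]) -> List[str]:
    """Names of steps whose removal prevents the goal from being reached."""
    needed = []
    for i, action in enumerate(plan):
        final = run(plan[:i] + plan[i + 1:])
        if final is None or not goal <= final:
            needed.append(action.name)
    return needed


# The five example steps, plus one chaff step inserted by the attacker.
PLAN = [
    act("overflow Service1", [], ["user_exec@Box1"]),
    act("scan Box5 (chaff)", [], ["scanned@Box5"]),
    act("race-condition exploit", ["user_exec@Box1"], ["root@Box1"]),
    act("disable SELinux", ["root@Box1"], ["selinux_off@Box1"]),
    act("replace DPASA jar", ["root@Box1", "selinux_off@Box1"], ["attacker_jar@Box1"]),
    act("send bad refs", ["attacker_jar@Box1"],
        ["bad_refs@Box2", "bad_refs@Box3", "bad_refs@Box4"]),
]
GOAL = frozenset({"bad_refs@Box2"})
```

Dropping the first step makes the race-condition exploit inapplicable, so `run` returns `None`. `necessary_steps(PLAN, GOAL)` keeps the five real steps and discards the scan.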
Attacks (MySQL DoS-1)
• mysql-com_table-dump-memory-corruption
  – A malformed request leaves MySQL unstable.
• Countermeasures:
  – Block the malformed com_table_dump command using a learned pattern and proxy filter rules.
  – Restart the server.
  – Block all requests from the offending sources.

Attacks (MySQL DoS-2)
• mysql-password-handler-buffer-overflow
  – An excessive password length can crash the server.
• Countermeasures:
  – Block connections that proffer “abnormal” passwords (learned response or statistical anomaly).
  – Restart the server.
  – Block all requests from the offending sources.

Attacks (MySQL DoS-3)
• mysql-remote-fulltext-search-DoS
  – A malformed request crashes the server.
• Countermeasures:
  – Detect and block malformed queries.
  – Block all queries of this type (fulltext-search).
  – Block all requests from the offending sources.
  – Restart the server.
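These countermeasures combine learned blocking rules in a proxy with blocking of the offending sources. The sketch assumes:
• Requests reach the proxy as dictionaries with hypothetical `source`, `command` and `password` keys. This is not the MySQL wire protocol.
• A password-length limit has already been learned, e.g. from boundary experiments or calibration.
• `ProxyFilter`, `abnormal_password` and `blocked_command` are hypothetical names.

Restarting the server is outside the sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Request = Dict[str, object]
Rule = Callable[[Request], bool]  # returns True if the request should be blocked


@dataclass
class ProxyFilter:
    rules: List[Rule] = field(default_factory=list)
    blocked_sources: Set[str] = field(default_factory=set)

    def admit(self, request: Request) -> bool:
        """Forward the request only if its source is clean and no rule fires."""
        source = str(request.get("source", ""))
        if source in self.blocked_sources:
            return False
        if any(rule(request) for rule in self.rules):
            self.blocked_sources.add(source)  # block all later requests from it
            return False
        return True


def abnormal_password(max_len: int) -> Rule:
    """Block passwords longer than a learned maximum length."""
    def rule(request: Request) -> bool:
        password = request.get("password")
        return isinstance(password, str) and len(password) > max_len
    return rule


def blocked_command(name: str) -> Rule:
    """Block a command identified as malformed or dangerous."""
    return lambda request: request.get("command") == name
```

Once a source trips any rule, its later requests are refused even when they look benign. This corresponds to “block all requests from the offending sources”.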