Process Detection
George Cybenko
Dartmouth
gvc@dartmouth.edu

Acknowledgements

Current Members: George Bakos, Alex Barsamian, Marion Bates, Vincent Berk, Chad Behre*, Wayne Chung*, Valentino Crespi (Prof. Cal State LA), George Cybenko, Ian deSouza, Annarita Giani*, Doug Madory*, Glenn Nofsinger*, Robert Savell, Jan-Peter Schutt*, Yong Sheng*, William Stearns
(* graduate students)

Alumni: Naomi Fox (UMass, Ph.D. student), Hrithik Govardhan (Rocket), Robert Gray (BAE Systems), Diego Hernando (UIUC, Ph.D. student), Guofei Jiang (NEC Research), Alex Jordan (BAE Systems), Han Li (China Shipping Corp), Josh Peteet (Greylock Partners), Chris Roblee (LLNL)

Research Support: DHS, ARDA, AFOSR, NGA, DARPA

Overview of Lectures
1. Process modeling
2. Process detection, theory
3. Software and applications

Why be interested in this....
• Sensor networks
• Airborne plume detection
• Cyber security
• Autonomic server pool management
• Dynamics of social networks
• Genomics and biological pathways*
• Human situation awareness*
*Possible applications.
[Figure: Total Successful Requests vs. Time (s)]

Overview
• Lecture 1: Process models
– Notion of "state"
– Differential equations
– State Machines and Automata
– Probabilistic and quantum states
– Constructing state representations
– Some

Newton's Big Idea(s)
• Calculus
• Laws of Physics
• Concept of "state"
[Portrait: Isaac Newton]

Contrast with Aristotle
• Nature consists of objects and “rules”
• Examples
– Ancient law (religious and civil)
– Astronomical observations
– Superstition
• Crisis - could not explain the natural world

A Closer Look at F=ma
[Figure: the previous state and the input pass through the dynamics to give the next state]
[Figure: state-transition diagram with states s_i, s_m, s_n and inputs u_a, u_b]
Concept of state: the future evolution of the system depends only on the current state and future inputs.
I.e., the past's influence on the future is totally summarized by the state.
The next state is determined by the current state and the current input (or control, etc.).

Outputs/Observables
• Inputs, u: forces
• Black box: states may not be observable by an external agent; x = (Position, Momentum)
• Outputs, y: position only

Automaton
[Portrait: Alan Turing]

Graphical Depiction of Automata
[Figure: automaton with start state a, transitions labeled u and v, and outputs 0 (state a) and 1 (states b, c, d)]
Q = States = { a , b , c , d }, X = { u , v }, Y = { 0 , 1 }
δ and β shown in graph

Caution/Nuisance
• Some models of automata have observables generated by state occupancy
• Other models have observables generated by state transitions
• There are simple mechanisms for transforming one to the other; the two kinds of model are equivalent.

Automata and Languages
• The set of all possible finite-length outputs of the previous example is a "language"
• The language can be represented by a regular expression - (0*1|0*11|0*111)*
• "Classical relationship" between regular languages and nondeterministic finite automata, i.e., given one, construct the other (Kleene's Theorem)
• How about constructing an automaton from the input-output relationship?

Nerode Equivalence
• Theorem: Every causal, time-invariant system has a state space description.
• "Constructive" proof:
– use the input-output description of a system
– two finite-length input strings belong to the same equivalence class if all the corresponding outputs (beyond the inputs' lengths) are the same
– i.e., if inputs w1w2 and w3w2 have outputs z1z2 and z3z2 for all w2, then w1 is equivalent to w3
– the resulting equivalence classes are the states

Partial Differential Equations

Quantum Mechanical Systems
$i \, \partial x(t) / \partial t = H x(t)$

Other process formalisms
• A Petri Net (PN) is given a state by marking its places.
• Marking of a PN consists of assigning a nonnegative integer to each place.
– Graphically, tokens are inserted in places of a PN
• Input place - arrow goes from the place to the transition
• Output place - arrow goes from the transition to the place
(Concurrency Examples, R. Apcar, E. Chiu, H. Jerejian)

Definitions
• A transition may have one or more input and output places
• A transition is enabled if there is at least one token in each of its input places.
• An enabled transition may fire:
– one token is removed from each input place and one token is inserted in each output place of the transition

An example
[Figure: Petri net example]

Example continued
[Figure: Petri net example, continued]

A “Process” has...
• Hidden states (discrete or continuous)
• State transitions (nondeterministic, probabilistic)
• Observables/events
• Relationship between observables and states
• An algorithm to “score” assignments of observations/events to state sequences
• Examples:
– Nondeterministic automata
– Hidden Markov Models
– Petri Nets
– Linear Systems
– Nonlinear Systems
– etc

Models for Organizational Processes
(W. Chung, J.-P. Schutt, R. Savell, G. Cybenko)
• Observables of the process: messages between A, B, C, … (ENRON, eBay, etc.); “static” analysis
• Dynamics of the process: A asks B to join a project; B accepts; A adds B to a list of recipients; “dynamic” analysis

Example of a Multistage Process Model in Computer Security
[Figure: states Start/Normal, Scanned, Infected, Data Access, Exfiltration; observables include snort alerts, Samba, Tripwire, ftp, covert channel, etc.; transitions are marked as potential malicious activity or potential normal activity]

Real time Fish Tracking
• Objective: Track several fish in the fish tank
• Why: Very strong example of the power of PQS (Process Query Systems)
– Fish swim very quickly and erratically
– Lots of missed observations
– Lots of noise
– Classical Kalman filters don’t work (non-linear movement and acceleration)
– “Easier” than getting permission to track people (we mistakenly thought)

Fish Tracking Details
• 5 gallon tank with 2 red Platys named Bubble and Squeak
• Camera generates a stream of “centroids”: for each frame a series of (X,Y) pairs is generated.
• Model describes the kinematics of a fish: the model evaluates whether new (X,Y) pairs could belong to the same fish, based on measured position, momentum, and predicted next position. This way, multiple “tracks” are formed, one for each object.
• Model was built in under 3 days!!!

Kinematic Tracking (2)
• Model: the motion of a feature moving at "human" speed. The model evaluates whether new (X,Y) pairs could belong to the same hot spot, based on measured position, momentum, and predicted next position. This way, multiple “tracks” are formed, one for each object.
• Sensors: an infrared video camera provides the datastream. The camera generates a stream of “centroids”: for each frame a series of (X,Y) pairs is generated.

An Example of a Process
A “Process” Model
• Two states - { 1 , 2 }
• Two observables – { a , b }; state 1 emits a, state 2 emits b
• Legal transitions between states are depicted by arrows.
• When occupying a state, the process emits an observable.
• All states are initial/start states and there are no terminal states.
• Some legal sequences of observables: abbab , bababbb , abbb
• Some illegal sequences of observables: aa , baab
Further reading: Automata Theory, Regular Languages, etc

A More Complex Process
Another “Process” Model
• Three states - { 1 , 2 , 3 }
• Three observables – { a , b , c }; states 1 and 3 emit a or c, state 2 emits b
• Some legal sequences of observables: abab , babaccab , ab
• Some illegal sequences of observables: bb , baabb
Problem: Given a sequence of possible observations, is it legal? What states could have produced it?
Solution:
1 Read the first observable, mark states that emit that observable
2 Read an observable, z
3 New marked states = (states reachable from old marked states) intersected with (states that could have emitted z)
4 If no new marked states, illegal sequence; else go to 2

Extensions: Hidden Markov Model (HMM)
Add probabilities
[Figure: the three-state model with emission probabilities p(a|1) = 0.8, p(c|1) = 0.2, p(b|2) = 1, p(a|3) = 0.8, p(c|3) = 0.2 and transition probabilities 0.8, 0.2, 0.5, 0.5 and 1 on its edges]
Hidden Markov Models consist of two ingredients:
- the dynamics: state transition probabilities in a Markov chain
- the emissions: p(observation|state)
Given a sequence of observations of length t, what are the possible states at time t?
Unlike the case for a nondeterministic automaton, all we can say in general for an HMM is what the probability distribution on states is.
The probability distribution at time t+1 is obtained by combining:
- propagation of the distribution from time t using only the dynamics
- factoring in the observation made at time t+1

Two Simple Processes
• Model Instance A: states A1 (emits a) and A2 (emits b)
• Model Instance B: states B1 (emits a) and B2 (emits b)
• aabb is a legal observation sequence
• A1 B1 A2 A2 , A1 B1 A2 B2 , B1 A1 B2 B2 , ... are all legal state sequences
• We can reduce this to a single process....
[Figure labels: A1 A2 A2 B1 , A1 A2 B1 B2 , A1 B1 B2 B2 ; "a track" ; "a hypothesis"]

Multiple Process Representation
[Figure: Model Instance A (A1 emits a, A2 emits b) combined with a second copy of Model Instance A and with Model Instance B (B1 emits a, B2 emits b); product states such as A1 B1]
$M = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}, \qquad M \times M = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}$
If the observation sequence is aaaaaa and multiple copies of the model are allowed, then we get a product model of size $2^n$.
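The four-step marking solution for the three-state model can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code. The emission sets come from the slide, but the arrows of the diagram are not recoverable from the text, so the successor relation `SUCC` below is an assumption, chosen to agree with the legal and illegal example sequences. The names `EMITS`, `SUCC` and `mark` are hypothetical.

```python
from typing import Dict, Optional, Set

# Emissions from the three-state slide: states 1 and 3 emit a or c, state 2 emits b.
EMITS: Dict[int, Set[str]] = {1: {"a", "c"}, 2: {"b"}, 3: {"a", "c"}}

# ASSUMPTION: the diagram's arrows are lost, so this successor relation is a
# guess that is consistent with the slide's legal/illegal example sequences.
SUCC: Dict[int, Set[int]] = {1: {2}, 2: {1, 3}, 3: {2, 3}}


def mark(observations: str) -> Optional[Set[int]]:
    """Return the states possible after the whole sequence, or None if it is illegal."""
    marked: Set[int] = set(EMITS)  # before any observation, every state is possible
    for t, z in enumerate(observations):
        could_emit = {s for s, symbols in EMITS.items() if z in symbols}
        if t == 0:
            # Step 1: all states are start states, so mark every state that emits z.
            marked = could_emit
        else:
            # Step 3: states reachable from the old marks, intersected with emitters of z.
            reachable = set().union(*(SUCC[s] for s in marked))
            marked = reachable & could_emit
        if not marked:
            return None  # Step 4: no marked states left, so the sequence is illegal.
    return marked


if __name__ == "__main__":
    for seq in ("abab", "babaccab", "ab", "bb", "baabb"):
        print(seq, mark(seq))
```

Under the assumed transitions, the marked set after "ac" is {3} and after "acb" is {2}, which matches the state sequences 33 and 332 listed on the later "Estimating states in an automaton" slide.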
A Simple Example of Process Detection
NETWORK WORM MODEL (NW): A {a} → B {b} → C {b,c} → D {c,d}
(a,b,c,d ICMP traffic levels)
ROUTER FAILURE MODEL (RF): E {a} → F {b}
• a,b,c,d are events that can be observed
• states A, B, C, D, E, F are hidden
• observe a sequence of events
• Which process or combination of processes explains the observed events?

Sequence   Hypotheses
ab         NW | RF
abab       (NW & NW)|(RF & NW)...
ababc      (NW & RF)|(NW & NW)
ababcc     NW & NW

Rules for the router failure model:
E,F = 0
repeat
  read event e
  if e==a then E
  if E and e==b then F
until F

Two models; states have different semantics; sets of observables intersect – what is the “diagnosis”?

Key Questions
• How is a process model built?
– from first principles
– from expert insights
– from data (lots)
• Given an event sequence, is it feasible or what is its probability?
• Given an event sequence, estimate the current state
• Given an event sequence, estimate the state sequence
• How good are those estimates (i.e., variance)?

Homework Problems
What are the states, dynamics and observables of the following processes:
– intercontinental ballistic missile
– soccer, American football, baseball games
– Avian bird flu epidemic
– terrorist cell
– blogosphere
– US/global economy
– poker
– romance

Overview
• Lecture 2: Detecting processes
– What does detection of processes mean?
– Automata
– Hidden Markov Models
– Kalman filtering
– Particle filters

Process Detection Problems
• Given a sequence of observations...
• What is the current state of the process?
• What is the probability distribution on the states?
• What are the most likely state sequences?
• What is the uncertainty/error of the estimates?

Graphical Depiction of Automata
[Figure: automaton with start state a, transitions labeled u and v, and outputs 0 (state a) and 1 (states b, c, d)]
Q = States = { a , b , c , d }, X = { u , v }, Y = { 0 , 1 }
δ and β shown in graph

Input-Output Description

Input      Output
uuuu       01010
uuvu       01001
vuuuu      001010
vvuuuu     0001010
uvvuuuu    01101010
.....

State   Equivalent input strings
a       φ = v = vv = uu = uvv = ...
b       u = vu = vuuu = ....
c       uv = vuv = vuuuv = ...
d       uvu = vuvu = vvuvu = ...

Estimating states in an automaton
[Figure: the three-state model (states 1 and 3 emit a or c, state 2 emits b), redrawn with the possible states marked after each observation]
• Observe a
• Observe ab. Sequences: 12, 32
• Observe ac. Sequences: 33
• Observe acb. Sequences: 332

Commentary
• Trivial algorithm....
• Interesting question: What is the worst case growth of state sequences? Tomorrow.
• No probabilities, only possibilities.
• What if we add probabilities?

Simplest Hidden Markov Model
• States 1 and 2
• Transition probabilities: a11 = 0.7, a12 = 0.3, a21 = 0.1, a22 = 0.9
• Emission probabilities: b1(u) = 0.9, b1(v) = 0.1, b2(u) = 0.1, b2(v) = 0.9
• p(1)=0.5, p(2)=0.5 are initial probabilities

Applications of HMM's
• Speech recognition
• Gene sequencing
• Motion modeling and detection
• Pattern recognition (OCR)
• Darpa Grand Challenge (autonomic systems)
• etc

Estimating States
[Figure: the simplest HMM above, with the same parameters]

Estimating Another State
[Figure: the simplest HMM above, with the same parameters]
• Propagate using dynamics
• Factor in the observation

Sequences of Observations
[Figure: trellis of states 1 and 2 over times 1 to 5, with observations O1 = u, O2 = v, O3 = u, O4 = v, O5 = v]
Problems: Given a sequence of observations O1O2O3 ...
1. What is the most likely state at time t ?
2. What is the most likely state sequence over all time ?
3. What is the probability of the observation sequence?
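The two-step update ("propagate using dynamics", then "factor in the observation") can be sketched with the parameters of the simplest HMM above. The function name `forward` and the dictionary layout are illustrative choices rather than anything from the lecture. The function answers problems 1 and 3 for a given observation string: it returns the state distribution after the last observation and the probability of the whole sequence.

```python
from typing import Dict, Tuple

STATES = (1, 2)
PRIOR: Dict[int, float] = {1: 0.5, 2: 0.5}                       # p(1), p(2)
A: Dict[int, Dict[int, float]] = {1: {1: 0.7, 2: 0.3},            # A[i][j] = a_ij
                                  2: {1: 0.1, 2: 0.9}}
B: Dict[int, Dict[str, float]] = {1: {"u": 0.9, "v": 0.1},        # B[i][z] = b_i(z)
                                  2: {"u": 0.1, "v": 0.9}}


def forward(observations: str) -> Tuple[Dict[int, float], float]:
    """Return (distribution over states after the last observation, P(observations))."""
    belief = dict(PRIOR)
    likelihood = 1.0
    for t, z in enumerate(observations):
        if t > 0:
            # Propagate the distribution one step using only the dynamics.
            belief = {j: sum(belief[i] * A[i][j] for i in STATES) for j in STATES}
        # Factor in the observation made at this time step.
        weighted = {j: belief[j] * B[j][z] for j in STATES}
        norm = sum(weighted.values())
        if norm == 0.0:
            raise ValueError("observation sequence has probability zero")
        likelihood *= norm
        belief = {j: w / norm for j, w in weighted.items()}
    return belief, likelihood


if __name__ == "__main__":
    print(forward("u"))      # distribution after the first observation
    print(forward("uvuvv"))  # the five observations from the trellis slide
```

After the single observation u the distribution is (0.9, 0.1), and the probability of the sequence uv works out to 0.194. Normalizing at every step keeps the numbers well scaled on long sequences while the running product still recovers the sequence probability.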
Best state vs best sequence
• States 1 and 2
• Transition probabilities: a11 = 0.7, a12 = 0.3, a21 = 0, a22 = 1
• Emission probabilities: b1(u) = 0.9, b1(v) = 0.1, b2(u) = 0, b2(v) = 1
• p(1)=0.5, p(2)=0.5 are initial probabilities
• Observe v - most likely state is 2
• Observe u next - must be in state 1, but no transition from 2 to 1 is possible
• The sequence vu could only have been produced by starting and staying in state 1

Probability of the Observations
[Figure: trellis of states 1 and 2 over times 1 to 5, with observations O1 = u, O2 = v, O3 = u, O4 = v, O5 = v]

Optimal Sequences
[Figure: trellis of states 1 and 2 over times 1 to 5, with observations O1 = u, O2 = v, O3 = u, O4 = v, O5 = v]

Viterbi's Algorithm
• These computations were discovered by A. Viterbi, a founder of Qualcomm.
• The algorithms are used in all modern cell phones and telecom devices in general.
[Figure: noisy channel. Source sequence 11221212122212; received sequence uvvuvuvvuvuvvv; decoded sequence 11221212222212]

Other issues for HMM
• Learning an HMM, i.e., what are the various probabilities?
– Baum/Welch Algorithm
– variational algorithms
• Finite, discrete state spaces

How about continuous state spaces?
• Major challenge
– in the finite, discrete case (HMM), we can represent and store the whole probability distribution as an n-vector
– what continuous state probability distributions have simple representations?
• Gaussians - mean and variance specify them
– what if the distribution is more general than a Gaussian?
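Before moving on to continuous states, the "best state vs best sequence" example can be checked with a small Viterbi sketch. The parameters are the ones on that slide; the function name `viterbi` and its argument layout are illustrative assumptions, and the input is assumed to be a non-empty string.

```python
from typing import Dict, List, Tuple

PRIOR = {1: 0.5, 2: 0.5}
TRANS = {1: {1: 0.7, 2: 0.3}, 2: {1: 0.0, 2: 1.0}}          # TRANS[i][j] = a_ij
EMIT = {1: {"u": 0.9, "v": 0.1}, 2: {"u": 0.0, "v": 1.0}}   # EMIT[i][z] = b_i(z)


def viterbi(observations: str,
            prior: Dict[int, float] = PRIOR,
            trans: Dict[int, Dict[int, float]] = TRANS,
            emit: Dict[int, Dict[str, float]] = EMIT) -> Tuple[List[int], float]:
    """Return (most likely state sequence, its joint probability with the observations)."""
    states = list(prior)
    # delta[s] = probability of the best path that ends in s and explains the data so far
    delta = {s: prior[s] * emit[s][observations[0]] for s in states}
    back: List[Dict[int, int]] = []
    for z in observations[1:]:
        pointers: Dict[int, int] = {}
        new_delta: Dict[int, float] = {}
        for j in states:
            best_i = max(states, key=lambda i: delta[i] * trans[i][j])
            pointers[j] = best_i
            new_delta[j] = delta[best_i] * trans[best_i][j] * emit[j][z]
        back.append(pointers)
        delta = new_delta
    # Trace the best final state back through the stored pointers.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    print(viterbi("v"))   # best single state after observing v
    print(viterbi("vu"))  # best state sequence for vu
```

For the single observation v the best state is 2, yet for vu the best sequence is 1, 1: the most likely state at each time and the most likely sequence are different questions.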
Madory's Goats
• Goat herder
• Herd state is the number of infant females, adult females, infant males and adult males
• Dynamics are generation to generation: how many infant females and males are born, how many infants of each gender become adults and how many adults survive
• Observables are goat milk revenues and goat baby inoculation costs - these are noisy
• Problem: estimate total number of goats and number of adult females
(Example and code due to Doug Madory)

Quantification of the State

Quantification of the Dynamics

Quantification of Observations

Basic Concept in Kalman Filtering
• Use the fact that the sum of independent variables with Gaussian distributions is also Gaussian
• A Gaussian is characterized by its mean and variance
• Use dynamics to predict the next state
• Use measurement (observation) to correct that prediction
• Update the error covariance (i.e., confidence in the estimate)

Kalman Equations and Geometry

Extensions
• To nonlinear systems (linearize locally)
• Learn the system dynamics
• Use the estimates to control the state (feedback)
• To non-Gaussian noise problems – particle filter methods

Particle Filters
• Represent a probability distribution using a discrete distribution of particles
• Sample the particles, propagate using dynamics and correct using observations
• This creates a new distribution for the next time step

Deep Connections to Information Theory
• This is all part of a much larger problem description - cybernetics à la N. Wiener
[Figure: Environment → Noisy Channel → Receiver → Decode → Estimate of Environment, with Learning Models of Environment and Actions closing the loop]

Summary of Lecture 2

Process class            Distribution       Algorithm
Automaton                None               Simple marking
HMM                      Discrete, finite   Viterbi
Linear, continuous       Gaussian           Kalman
Continuous, nonlinear    Arbitrary          Particle filters

What are the observables? What are the states? What are the dynamics?
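The predict/correct cycle in the "Linear, continuous / Gaussian / Kalman" row of the summary can be illustrated with a one-dimensional sketch. This is not the goat model, whose state has four components and whose equations are not recoverable from the text. It assumes a scalar system x' = a·x + w and a measurement z = h·x + v, with process noise variance q and measurement noise variance r. All parameter names and default values are hypothetical.

```python
from typing import Tuple


def kalman_step(mean: float, var: float, z: float,
                a: float = 1.0, q: float = 0.0,
                h: float = 1.0, r: float = 1.0) -> Tuple[float, float]:
    """One predict/correct cycle of a scalar Kalman filter.

    mean, var : current Gaussian estimate of the state
    z         : new measurement
    a, q      : assumed dynamics x' = a*x + w, with Var(w) = q
    h, r      : assumed measurement z = h*x + v, with Var(v) = r
    """
    # Predict: push the estimate through the dynamics.
    mean_pred = a * mean
    var_pred = a * var * a + q
    # Correct: blend prediction and measurement according to their variances.
    gain = var_pred * h / (h * var_pred * h + r)
    mean_new = mean_pred + gain * (z - h * mean_pred)
    # Update the error variance (confidence in the estimate).
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new


if __name__ == "__main__":
    mean, var = 0.0, 1.0
    for z in (2.0, 1.5, 1.8):
        mean, var = kalman_step(mean, var, z)
        print(mean, var)
```

With the defaults, a prior of mean 0 and variance 1 and a measurement of 2 give a gain of 0.5, so the corrected estimate is mean 1 with variance 0.5. The variance keeps shrinking as more measurements arrive because q is zero in this sketch.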
Overview of Lecture 3
Detecting multiple processes
– Instead of one process, we now have some unknown number of them
– Multiple hypothesis tracking (MHT) framework
– The basic algorithms
– Complexity theory
– Process Query Systems
– Applications

Multiple Hidden Process Models
[Figure, from top to bottom]
• Observations missed, noise added, unlabelled (this is what we see): abacfkhdcbgdbkhagda
• Observations are interleaved: a b c c f h d c c a b g d b a g d a
• Observations related to state sequences: abcdabbada and cfhccgdg
• Underlying (hidden) state spaces: Model 1 ... Model n, with states emitting sets of observables such as {f, g}, {a, c}, {a, b}, {c, d}, {e}, {f, c}, {h}

Basic Concepts of Process Query Systems (PQS)
[Figure: a cycle]
• An operational network consists of multiple processes (λ1 router failure, λ2 worm, λ3 scan)
• that produce events in the real world
• that are seen as unlabelled sensor reports
• that PQS resolves into hypotheses (sets of scored tracks)
• that detect complex attacks and anticipate the next steps
• giving indicators and warnings that are used to defend the network
Sample console: "129.170.46.3 is at high risk", "129.170.46.33 is a stepping stone", ...
[Plot: Track Scores over time]

Discrete Source Separation Problem
(viz. Blind Source Separation, “Cocktail Party” Problem)
• Process/model example: 3 states + transition probabilities; n observable events: a,b,c,d,e,…; Pr( state | observable event ) given/known
• Observed event sequence: ….abcbbbaaaababbabcccbdddbebdbabcbabe….
• Given a catalog of processes/models, which combination of which process models “best” accounts for the observations? This is what we want to compute.
• Events not associated with a known process are “anomalies”.
[Figure labels: a hypothesis; a track]

Multiple Hypothesis Approach to the "Discrete Source Separation Problem"
[Figure: "solutions" at time t (Hypothesis 1, Hypothesis 2, ...) are combined with the observables at time t+1 (Obs1, Obs2) to give candidates at time t+1 (Hypothesis 1a, Hypothesis 1b, ...), which differ in how Obs1 and Obs2 are assigned]
[Figure: the candidates at time t+1 are given "scores" at time t+1 (Hypothesis 1a: Score=79; Hypothesis 1b: Score=43), and the hypotheses are then pruned]

Terminology
• Tracks are associations of observations to individual processes.
• Hypotheses are consistent tracks that explain all the observables.
• Hypothesis extension is the conjectural assignment of new observations to existing hypotheses.
• Track initiation is the instantiation of a new process in a hypothesis' extension.
• Handling missed detections means that an intermediate observation may have been dropped.

A Simple Example of Process Detection (revisited)
The network worm (NW) and router failure (RF) models from Lecture 1, with the same question: which process or combination of processes explains the observed events? Two models; states have different semantics; sets of observables intersect – what is the “diagnosis”?

Add Rules for Missed Detections and Disambiguation
WORM MODEL: A {a} → B {b} → C {b,c} → D {c,d}
(a,b,c,d ICMP traffic levels)

A,B,C,D = 0
repeat
  read event e
  if e==a then A
  if A and e==b then B
  if A and e==c then C,D
  if A and e==d then D
  if B and (e==b or e==c) then C
  if C then (E=0, F=0)
  if C and (e==c or e==d) then D
  if D then (E=0, F=0)
until D

Blue statements handle missed detections. Red statements handle consistency.
This clearly does not scale and does not lead to manageable sets/systems of rules.

Approaches to Detecting Processes
• Aristotelian - Traditional information retrieval is based on specification of a query in terms of Boolean expressions based on record fields, i.e., SQL ( name = “smith” & age > 20 & age < 40 ) + rule-based logics + decision trees, etc.
• Newtonian - Next generation process detection requires retrieval based on specification of a set of discrete, dynamic processes, i.e., descriptions of a Hidden Markov Model, Hidden Petri Net, weak models, FSMs, attack trees, etc.
Main Concept: Move from an Aristotelian to a Newtonian Paradigm.
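In contrast to hand-written rules, hypothesis extension and track initiation as defined in the Terminology slide can be written once and applied to any catalog of models. The sketch below is an illustration, not the PQS implementation. It assumes that each model is a simple chain, that every track starts in the first state of its chain, and that a track advances exactly one state per assigned event (no missed detections and no scoring). `MODELS`, `extend`, `hypotheses` and `labels` are hypothetical names.

```python
from typing import Dict, List, Set, Tuple

# Each model is a chain of states; MODELS[name][k] is the set of events state k can emit.
MODELS: Dict[str, List[Set[str]]] = {
    "NW": [{"a"}, {"b"}, {"b", "c"}, {"c", "d"}],  # states A, B, C, D
    "RF": [{"a"}, {"b"}],                          # states E, F
}

Track = Tuple[str, int]         # (model name, index of the track's current state)
Hypothesis = Tuple[Track, ...]  # tracks in order of initiation


def extend(hyp: Hypothesis, event: str) -> List[Hypothesis]:
    """All consistent ways of explaining one new event under hypothesis hyp."""
    out: List[Hypothesis] = []
    # Hypothesis extension: assign the event to an existing track and advance it.
    for i, (name, k) in enumerate(hyp):
        chain = MODELS[name]
        if k + 1 < len(chain) and event in chain[k + 1]:
            out.append(hyp[:i] + ((name, k + 1),) + hyp[i + 1:])
    # Track initiation: the event starts a new instance of some model.
    for name, chain in MODELS.items():
        if event in chain[0]:
            out.append(hyp + ((name, 0),))
    return out


def hypotheses(events: str) -> List[Hypothesis]:
    """Grow the hypothesis pool one event at a time (no pruning in this sketch)."""
    pool: List[Hypothesis] = [()]
    for e in events:
        pool = [h2 for h in pool for h2 in extend(h, e)]
    return pool


def labels(pool: List[Hypothesis]) -> List[Tuple[str, ...]]:
    """Distinct combinations of model instances present in the pool."""
    return sorted({tuple(sorted(name for name, _ in h)) for h in pool})


if __name__ == "__main__":
    for seq in ("ab", "abab", "ababc"):
        print(seq, len(hypotheses(seq)), labels(hypotheses(seq)))
```

Under these assumptions the sequence ab is explained by NW or RF alone, and ababc only by the combinations NW & NW or NW & RF, in agreement with the hypothesis table of the simple example. Real trackers also score each hypothesis and prune the pool, which this sketch leaves out.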
Process Query Systems (PQS)
• Process Query Systems solve the Discrete Source Separation Problem in a generic way:
– inputs
  • a sequence of unlabelled observations (stream, logfiles, etc)
  • a collection of process models
– outputs
  • estimates of which processes produced those observations
  • estimates of which states those processes are in
• Basic theory and technology has been developed by the PQS team at Dartmouth
• Now being applied to a variety of applications

Algorithms/Operations of PQS
[Figure: processing loop, recursive in time, around a hypothesis pool (Hypothesis 1 ... Hypothesis n, each a set of tracks). Labeled steps: Build or Learn Models; Subscribed Data Arrives; Update Tracks Within Hypotheses (Viterbi / Kalman / NDFA, etc) and Create New Hypotheses; Manage Hypotheses (MHT); Evaluate Solutions and Process Outputs]

The COBOL and pre-PQS Analogy
• Pre-SQL programs: interwoven logic, all of it the user's responsibility
  … application logic statement 1; application logic statement 2; file management statement 1; record management statement 1; file management statement 2; record management statement 2; application logic statement 3; record management statement 3; file management statement 3; application logic statement 4; …
• Post-SQL programs: application logic + database management system
  – User responsibility: … application logic statement 1; application logic statement 2; SQL statement 1; application logic statement 3; SQL statement 2; application logic statement 4; …
  – System responsibility: … file management operation 1; record management operation 1; file management operation 2; record management operation 2; record management operation 3; file management operation 3; …
• Current process detection programs: interwoven logic, all of it the user's responsibility
  … model logic statement 1; model logic statement 2; sensor access statement 1; state estimate statement 1; sensor access statement 2; state estimate statement 2; model logic statement 3; sensor access statement 3; state estimate statement 3; model logic statement 4; …
• PQS-based programs: model description + process query system
  – User responsibility: … model description statement 1; model description statement 2; model description statement 3; model description statement 4; …
  – System responsibility: … sensor access statement 1; state estimate statement 1; sensor access statement 2; state estimate statement 2; sensor access statement 3; state estimate statement 3; …

Network Security
(V. Berk, I. De Souza, A. Barsamian, A. Giani, M. Bates, D. Madory, G. Bakos, et al)
• Objective: Detect, disambiguate, and predict the course of concerted network attacks in an enterprise class network.
• Why: Problem domain demands the power of PQS
– Hundreds of “processes” occurring at once
– Lots of missed observations and noise
– All commercial technology focuses on collection and presentation of data
– Existing correlation efforts very weak at best

SENSORS INTEGRATED

SENSOR      DESCRIPTION
DIB:s       Dartmouth ICMP-T3 Bcc: System
CovChan     Timing Covert Channel Detection
Snort       Signature Matching IDS
IPtables    Linux Netfilter firewall, log based
Samba       SMB server - file access reporting
Weblog      IIS, Apache, SSL error logs, …
US-agent    Userspace host monitoring agent
Tripwire    Host filesystem integrity checker

SCOPE: Global, Network, Host

Example of a Multistage Process Model
[Figure: the multistage security model from Lecture 1, with states Start/Normal, Scanned, Infected, Data Access, Exfiltration]

PQS-Net supply chain
• Tier 1 Models
– Focus on individual host status
– Report on status changes
• Tier 2 Models
– Focus on correlating host activity
– Report chains of events
• Pipeline: sensors → sensor data → Tier 1 Tracker → attack steps → Tier 2 Tracker → attack sequences and scores → analyst’s front-end

Tier 1 Output:
Mon Feb 21 20:06:17 2005 000000 131.58.63.160 (hostile) recon on 100.10.20.4 SNORT 469 proto: 1
Mon Feb 21 20:30:24 2005 000000 138.158.170.45 (hostile) attacked 100.10.20.4 ERRORLOG 400 proto: 6 dport: 443

Tier 2 Output:
Hypothesis 1 (Score: 0.8): A scans B; B scans E
Hypothesis 2 (Score: 0.2): A scans B; B attacks E

Example Scenario
[Figure: network diagram with the Internet and hosts A, B, C, D, E]

Tier 1 alerts and their indicators:
• A scans B
  Snort: 02/21-20:06:17.904500 [**] [1:469:1] ICMP PING NMAP [**] [Classification: Attempted Information Leak] [Priority: 2] {ICMP} 131.58.63.160 -> 100.10.20.4
• C attacks B (success)
  SSL error log (host 100.10.20.4):
  [Mon Feb 21 20:30:24 2005] [error] mod_ssl: SSL handshake failed (server www.osis.gov:443, client 138.185.170.45) (OpenSSL library error follows)
  [Mon Feb 21 20:30:24 2005] [error] OpenSSL: error:1406908F:lib(20):func(105):reason(143)

Example Cont’d
[Figure: hosts B, D, E]

Tier 1 alerts and their indicators:
• B scans D
  02/21-20:31:17.528602 [**] [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [**] [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34074 -> 100.10.20.169:80
• B attacks D (fails)
  100.20.1.169 - - [21/Feb/2005:08:31:22 -0500] "GET /default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 404 1287 "-" "-"
• B scans E
  02/21-20:32:01.622465 [**] [1:1807:2] WEB-MISC Chunked-Encoding transfer attempt [**] [Classification: Web Application Attack] [Priority: 1] {TCP} 100.10.20.4:34076 -> 100.10.20.170:80
• B attacks E (succeeds)
  100.20.1.170 - - [21/Feb/2005:08:32:06 -0500] "GET /default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 200 1287 "-" "-"

Results

Dataset                                        3s8      3s26     3s28     3s29
#Alerts                                        22930    18391    12522    39270
Lines in trunk_alert                           4830     5959     1159     8168
Lines in snort files generated from tcpdump    11751    7284     7006     19866
Lines in weblogs (apache, IIS)                 6349     5148     4357     11236
Number of tracks produced                      100      75       51       107
Attack tracks not in ground truth              1        0        0        0
Attackers identified                           3 of 3   4 of 4   0 of 2   3 of 5
Decoys found                                   5 of 5   2 of 2   2 of 2   6 of 6
Victims identified                             2 of 2   2 of 2   1 of 2   10 of 11
Stepping stones identified                     1 of 1   1 of 1   1 of 2   2 of 3

Autonomic Server Monitoring
(C. Roblee, V. Berk)
Funded by DHS

Autonomic Server Monitoring
• Objective: Detect and predict deteriorating service situations
• Why: Another strong example of the power of PQS
– Software and hardware are buggy and vulnerable
– Hot market, large profits for “The ONE” application
– Very ambiguous observations
– Sys-admins also want vacation

The Environment
• Hundreds of servers and services
• Various non-intrusive sensors check for:
– CPU load
– Memory footprint
– Process table (forking behavior)
– Disk I/O
– Network I/O
– Service query response times
– Suspicious network activities (e.g., Snort)
• Models describe the kinematics of failures and attacks: the model evaluates load balancing problems, memory leaks, suspicious forking behavior (like /bin/sh), service hiccups correlated with network attacks…

Server Compromise Model: Generic Attack Scenario

1. Snort NIDS sensor output
.. .
Nov 21 20:57:16 [10.0.0.6] snort: [1:613:7] SCAN myscan [Classification: attempted-recon] [Priority: 2]: {TCP} 212.175.64.248-> 10.0.0.24
.. .

2. Monitored host sensor output (system level)
Current system record for host 10.0.0.24 (10 records):
Average memory over previous 10 samples: 251.000
Average CPU over previous 10 samples: 0.970
| time       | mem used | CPU load | num procs | flag |
|------------|----------|----------|-----------|------|
| 1101094903 | 251      | 0.970    | 64        |      |
| 1101094911 | 252      | 0.820    | 64        |      |
| 1101094920 | 251      | 0.920    | 64        |      |
| 1101094928 | 251      | 0.930    | 64        |      |
| 1101094937 | 251      | 0.870    | 65        |      |
| 1101094946 | 251      | 0.970    | 65        |      |
| 1101094955 | 251      | 0.820    | 65        |      |
| 1101094964 | 253      | 1.220    | 65        | !    |
| 1101094973 | 255      | 1.810    | 65        | !    |
| 1101094982 | 258      | 2.470    | 65        | !    |

3. PQS Tracker Output
Last Modified: Mon Nov 21 21:01:03
Model Name: server_compromise1
Likelihood: 0.9182
Target: 10.0.0.24
Optimal Response: SIGKILL proc 6992
[Figure: timeline from t0 to t4 with observations o1, o2, o3 at t1, t2, t3 and the response (SIGKILL)]

Experimental Results: Tracking vs. No Tracking
[Plots: Successful Requests (Total Successful Requests vs. Time (s)) and System Memory Consumed (% System Memory Used vs. Time (s)), with and without tracking]
Captions: 210,000 requests serviced; 380,000 requests serviced

Chemical Plume Process Detection
Funded by DHS
Glenn Nofsinger

The Forward Problem
Concentration in a 2D region as a function of time: $c(x, y, t)$
Concentration equation composed of diffusion (Fick's law) and advection (wind):
$\frac{\partial c}{\partial t} = D_x \frac{\partial^2 c}{\partial x^2} + D_y \frac{\partial^2 c}{\partial y^2} - V_d \left( \frac{\partial c}{\partial x} + \frac{\partial c}{\partial y} \right)$
Forward model result:
• arbitrary initial sources
• pseudo-random wind
• includes diffusion and wind

Current technology on DC Mall. Future sensors will be smaller and greater in number, with a need for measurement correlation.

Multiple Source Case With Terrain: Connectivity determined by wind and geography
[Figure: Source 1, Source 2, wind and sensors; connectivity between source and sensor shaded from high to low]

Inverse Source Likelihood
Estimating the probability that a sensor observation is generated by a source at a given location. Based on wind direction history and diffusion properties of the agent.
[Figure: wind, sensors (S) and sources]

Correlation Between Observations at Different Locations
Picking any two sensors, we evaluate the probability that the observation at one sensor is connected to observations at other sensors in the region. This is a function of wind history, distance, and diffusion properties.
90 wind 80 70 60 50 40 30 20 10 0 109 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 Source Estimation Compared to True Source Location Estimated Source based on inverse correlation of plume observations and tracks Forward Simulation 159.4 182 170 111.0 98.7 160 150 74.0 61.7 49.3 37.0 24.7 12.3 0.0 140 130 120 110 100 90 80 70 60 50 40 30 20 110 10 0 0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170 182 Social Network Analysis Comparison of Static vs Dynamic (W. Chung, R. Savell, J.-P. Schuett) Temporal sequence of transactions Time Analyze projected, non-temporal data Projection removes temporal relationships Analysis of Static Artifacts Temporal sequence of transactions Time Analysis of temporal aspects of transactions Extraction of Dynamic Processes 111 Process Primitives Decay kernel correlates potentially related emails - eg. links Functional roles based on conversation segments shown below A. Initiator B. Broker C. Bridge D. Triad E. Terminator 112 Combining Primitives into Processes P(t'-t) > f P(t''-t') > f P(t'''-t'') < f t' t t'' X t''' Probabilities of temporal relationships are used to grow tracks 113 Methodology Details 1. Crude Naïve Bayes Text Classification w/ Temporal Correlations to isolate coarse 2. Local structure via Process Primitives on the Dynamic Social Network. thread. 114 Theory • PQS offer a principled approach that enables – understanding how distinguishable models (attack and failure) are – developing a notion of processes that are “trackable,” given models and sensing infrastructure (ie a “sampling theory”) 115 Hypothesis Growth A “hypothesis” is a consistent assignment of events to processes and/or states(ie, each event assigned to only one process instance). Given a set of “hypotheses” for an event stream of length k-1, update the hypotheses to length k to explain the new event. NP-Complete in general. Need to prune the pool of hypotheses, keeping the most suitable. 
[Figure: events over time. An individual path is a “track”, ie one process instance. Consistent tracks form a “hypothesis”.]

Models and Hypothesis Growth
“Weak” model: FSM with “emission” vectors
Emission for state i = 0/1 vector of sensor reports, eg obs(i) = ( 0 , 1 , 1 , 0 , 0 , 1 , 1 )
Observation vector at time t collected by sensors, eg sensors(t) = ( 0 , 1 , 1 , 1 , 1 , 1 , 0 )
Possible states at time t are determined by:
P = { i | Hamming_distance( obs(i) , sensors(t) ) <= HD }
R = { i | i is reachable from some j that is possible at time t - 1 }
P ∩ R is the set of possible states at time t
The number of hypotheses at time t is computed recursively as above.
Theorem: For a fixed value of HD, the worst-case number of hypotheses at time t is either polynomial or exponential in t. (Crespi, Cybenko, Jiang 2005)

[Figure: axes “Longer tracking time” and “More noise (worse model)”; regions labeled “Nice Demo!!” and “Oh, %#&@!!”.]

[Figure: the same axes, “Longer tracking time” and “More noise (worse model)”, with regions labeled Excellent Models and Sensor Coverage, Acceptable Models and Sensor Coverage, and Poor Models and Sensor Coverage.]

Basic Idea Behind the Proof
[Figure: trellis of N states at times t, t+1, t+2, ..., k.]
Process dynamics (ie what is reachable from each state in a time step) + observations + noise threshold determine a “trellis”.
If there are two distinct paths from one node to itself over some period of time, the number of distinct paths grows exponentially by repeating the construct.

Basic Idea Behind the Proof
[Figure: trellis of N states at times t, t+1, t+2, ..., k.]
If there are never two distinct paths from any node to itself over any period of observation, there is a simple injective mapping (ie, unique labeling) of the paths into {0, 1, ... , k} x {0, 1, ... , k} x ... x {0, 1, ... , k} (2N times). So the number of paths is < (k+1)^{2N}.
The label for each path records, for each state, the time the path first occupies that state and the time it last occupies that state.
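The sets P and R and the path-counting argument can be sketched in a few lines. The two tiny models below are invented for illustration only: one contains a state with two distinct paths back to itself, the other does not. Function names and data layout are assumptions of this sketch.

```python
def hamming(a, b):
    """Number of positions at which two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def possible_states(prev, reach, obs, sensors_t, hd):
    """States possible at time t, computed as the intersection of P and R."""
    p = {i for i, emission in obs.items() if hamming(emission, sensors_t) <= hd}
    r = {i for j in prev for i in reach[j]}
    return p & r


def count_paths(start, reach, obs, readings, hd):
    """Count state paths through the trellis consistent with all readings."""
    counts = {s: 1 for s in start}
    for sensors_t in readings:
        p = {i for i, emission in obs.items() if hamming(emission, sensors_t) <= hd}
        nxt = {}
        for j, c in counts.items():
            for i in reach[j]:
                if i in p:
                    nxt[i] = nxt.get(i, 0) + c
        counts = nxt
    return sum(counts.values())


obs = {"a": (0, 1), "b": (1, 1)}        # hypothetical emission vectors
readings = [(0, 1)] * 5                 # five identical sensor vectors

looped = {"a": {"a", "b"}, "b": {"a", "b"}}   # two distinct paths a -> a
chain = {"a": {"a", "b"}, "b": {"b"}}         # no such pair of paths

exponential = count_paths({"a"}, looped, obs, readings, hd=1)
polynomial = count_paths({"a"}, chain, obs, readings, hd=1)
exact = count_paths({"a"}, chain, obs, readings, hd=0)
```

With the noise threshold HD = 1 both states match every reading, so the looped model doubles its path count at every step while the chain model gains only one path per step. With HD = 0 only state "a" matches, and a single path survives.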
Relationship to Joint Spectral Radius

New Ideas for Large-Scale Hypothesis Management
• Data structures for maintaining one copy of many hypotheses that are variants of one another
• Viewing the set of hypotheses as the solution (instead of, eg, the highest ranked hypothesis)
– propagating the set can be done in linear space, constant time
– some properties of the set of hypotheses can be computed in constant time, others in linear time, others seem to require exponentially much time and/or space, etc.
• Development of a “nonparametric” approach to tracking and Situational Awareness, not unlike nonparametric statistical techniques (order statistics, etc)
• Reduce dependencies on probabilistic parameters and model building

Distinguishability of models (Yong Sheng)
• Given two “models”, how distinguishable are they?
• Example: Model of router failure vs worm attack?
• Do we need to build more refined models or do we need to add additional sensors/data sources?

Different degrees of distinguishability between models given sensing capabilities (eg DDOS vs router failure)
[Figure: Red: probability of deciding model 2 given model 1. Blue: probability of deciding model 1 given model 2.]
The entropies of the two ergodic models are different.
The decision rule is based on maximum likelihood (ML) as determined by the Viterbi algorithm.
The Shannon-McMillan-Breiman Ergodic Theorem states that “most” observation sequences are “typical” and have probability related to the entropy.

Different degrees of distinguishability between models given sensing capabilities (eg DDOS vs router failure)
However, nonmonotonic behaviors are possible (in general), and the error probabilities need not converge to zero (if the entropies are the same).

Where do models come from?
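A maximum-likelihood decision between two models, scored with the Viterbi algorithm, can be sketched as below. The two hidden Markov models, their labels ("failure" and "attack"), and all probabilities are invented for illustration and are not taken from the experiments behind the slides.

```python
import math


def viterbi_log_prob(init, trans, emit, observations):
    """Log-probability of the single most likely state path.

    init[s] is the initial probability of state s, trans[r][s] the
    probability of moving from r to s, and emit[s][o] the probability
    that state s emits symbol o.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    n = len(init)
    best = [log(init[s]) + log(emit[s][observations[0]]) for s in range(n)]
    for o in observations[1:]:
        best = [
            max(best[r] + log(trans[r][s]) for r in range(n)) + log(emit[s][o])
            for s in range(n)
        ]
    return max(best)


def decide(models, observations):
    """Pick the model whose best path explains the observations best."""
    scores = {
        name: viterbi_log_prob(init, trans, emit, observations)
        for name, (init, trans, emit) in models.items()
    }
    return max(scores, key=scores.get)


sticky = [[0.9, 0.1], [0.1, 0.9]]  # hypothetical transition matrix
models = {
    # (initial distribution, transition matrix, emission matrix)
    "failure": ([0.5, 0.5], sticky, [[0.9, 0.1], [0.8, 0.2]]),
    "attack": ([0.5, 0.5], sticky, [[0.2, 0.8], [0.1, 0.9]]),
}

quiet = decide(models, [0, 0, 0, 0])
noisy = decide(models, [1, 1, 1, 1, 1])
```

Here the two models emit very different symbol statistics, so short sequences already separate them. Estimating the two error probabilities over many sampled sequences of increasing length would give curves of the kind described on the distinguishability slides.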
• In practice, we build models of processes by:
– First principles – ie, symmetry, physical laws, etc.
– “Expert” models/rules/experience – ie, chess playing computers, military tactics, etc
– Empirical analysis (from real or simulated data) – ie, backgammon, stock market models, etc.
• Process Query Markup Language developed and almost implemented – allows rapid insertion of new attack models into PQS

PQS INPUTS: PROCESS MODEL SEMANTICS AND SENSOR DATA REQUIREMENTS
[Figure: pipeline Represent, Compile, Execute, Learn. Three kinds of input are compiled to code: rules + signatures, etc; reachability (weak) models; probabilistic models (HMM, Bayes Nets, Fuzzy models, etc). The example state machines have states A, B, C labeled Failed, Marginal, Normal, with transition probabilities in the probabilistic version.]

Example rule/signature:
    alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP Destination Unreachable (Host Unreachable)"; itype: 3; icode: 1; sid:399; classtype:misc-activity; rev:4;)

Example compiled code:
    if (src_ip_new.equals(src_ip_track)) {
        // Same source address as the track: is it local?
        if (IPv4_in_CIDR_ints(208, 253, 154, 0, 24, src_ip_new)) {
            // Local: average 0.90 with the current likelihood
            new_likelihood = new Likelihood((0.90f + likelihood.getProbability()) / 2.0f);
        } else {
            // Else don't care
            new_likelihood = new Likelihood(0.0);
        }
    }

More details....
gvc@dartmouth.edu
See www.pqsnet.net