Process Detection

Process Detection
George Cybenko
Dartmouth
gvc@dartmouth.edu
1
Acknowledgements
Current Members
George Bakos
Alex Barsamian
Marion Bates
Vincent Berk
Chad Behre*
Wayne Chung*
Valentino Crespi (Prof. Cal State LA)
George Cybenko
Ian deSouza
Annarita Giani*
Doug Madory*
Glenn Nofsinger*
Robert Savell
Jan-Peter Schutt*
Yong Sheng*
William Stearns
* graduate students
Alumni
Naomi Fox (UMass, Ph.D. student)
Hrithik Govardhan (Rocket)
Robert Gray (BAE Systems)
Diego Hernando (UIUC, Ph.D. student)
Guofei Jiang (NEC Research)
Alex Jordan (BAE Systems)
Han Li (China Shipping Corp)
Josh Peteet (Greylock Partners)
Chris Roblee (LLNL)
Research Support: DHS, ARDA, AFOSR, NGA, DARPA
Cybenko
2
Overview of Lectures
1. Process modeling
2. Process detection, theory
3. Software and applications
3
Why be interested in this....
• Sensor networks
• Airborne plume detection
• Cyber security
• Autonomic server pool management
• Dynamics of social networks
• Genomics and biological pathways*
• Human situation awareness*
*Possible applications.
[Figure: Total Successful Requests vs. Time (s)]
4
Overview
• Lecture 1: Process models
– Notion of "state"
– Differential equations
– State Machines and Automata
– Probabilistic and quantum states
– Constructing state representations
– Some
5
Newton's Big Idea(s)
Calculus
Laws of Physics
Concept of "state"
Isaac Newton
6
Contrast with Aristotle
Nature consists of objects and “rules”
Examples
Ancient law (religious and civil)
Astronomical observations
Superstition
Crisis - could not explain the natural world
7
A Closer Look at F=ma
8
A Closer Look at F=ma
9
A Closer Look at F=ma
Previous state
Next state
Dynamics
Input
10
A Closer Look at F=ma
Concept of state: the future evolution of the system depends
only on the current state and future inputs.
I.e., the past's influence on the future is totally summarized by the
state.
The next state is determined by the current state and the current
input (or control, etc).
[Figure: states si, sm, sn connected by transitions under inputs ua, ub]
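A minimal sketch of this idea for F=ma, assuming the state is (position, momentum) as on the next slide and a simple fixed-step update. The function name `step`, the unit mass, and the step size `dt` are illustrative choices, not part of the lecture.

```python
def step(state, force, mass=1.0, dt=0.01):
    """Next state from the current state and the current input only."""
    position, momentum = state
    new_momentum = momentum + force * dt              # dp/dt = F
    new_position = position + (momentum / mass) * dt  # dx/dt = p/m
    return (new_position, new_momentum)

# Drive the system with a constant force for 100 steps.
state = (0.0, 0.0)
for _ in range(100):
    state = step(state, force=2.0)
```

Nothing about the earlier trajectory is passed to `step`: the pair (position, momentum) is all that the update needs.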
11
Outputs/Observables
Inputs, u
Forces
Black Box:
States may not be
observable by an
external agent
x =(Position, Momentum)
Outputs, y
Position only
12
Automaton
Alan Turing
13
Graphical Depiction of Automata
[Figure: state-transition graph with start state a and states b, c, d; edges are labeled with inputs u, v and states with outputs 0, 1]
Q = States = { a , b , c , d }, X = { u , v } , Y = { 0 , 1 }
δ and β shown in graph
14
Caution/Nuisance
• Some models of automata have
observables generated by state
occupancy
• Other models have observables generated
by state transitions
• There are simple mechanisms for
transforming one to the other....they are
equivalent.
15
Automata and Languages
• The set of all possible finite length outputs of the
previous example is a "language"
• The language can be represented by a regular
expression - (0*1|0*11|0*111)*
• "Classical relationship" between regular
languages and nondeterministic finite automata, i.e., given one,
construct the other (Kleene's Theorem)
• How about constructing an automaton from the
input-output relationship?
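As a small illustration, membership in the language denoted by the regular expression above can be tested with Python's standard `re` module. The helper name `in_language` is an assumption made for this sketch.

```python
import re

# The regular expression quoted on the slide.
LANGUAGE = re.compile(r"(0*1|0*11|0*111)*")

def in_language(outputs: str) -> bool:
    """True if the whole output string is matched by the expression."""
    return LANGUAGE.fullmatch(outputs) is not None
```

For example, `in_language("0111")` is true, while `in_language("010")` is false because the expression cannot match a trailing 0.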
16
Nerode Equivalence
• Theorem: Every causal, time-invariant system
has a state space description.
• "Constructive" proof:
– use the input-output description of a system
– two finite length input strings belong to the same
equivalence class if all the corresponding outputs
(beyond the inputs' lengths) are the same
– ie, if inputs w1w2 and w3w2 have outputs z1z2 and z3z2
for all w2 then w1 is equiv to w3
– the resulting equivalence classes are the states
17
Partial Differential Equations
18
Quantum Mechanical Systems
$i \, \frac{\partial x(t)}{\partial t} = H x(t)$
19
Other process formalisms
• A Petri Net (PN) is given a state by marking its
places.
• Marking of a PN consists of assigning a
nonnegative integer to each place.
– Graphically, tokens are inserted in places of a PN
• Input place - arrow goes from the place to the
transition
• Output place - arrow goes from the transition to
the place
Concurrency Examples
R. Apcar, E. Chiu, H. Jerejian
20
Definitions
• A transition may have one or more Input and
Output places
• A transition is enabled if there is at least one
token in each of its input places.
• An Enabled transition may fire:
– one token is removed from each input place and one
token is inserted in each output place of the transition
Concurrency Examples
R. Apcar, E. Chiu, H. Jerejian
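The enabling and firing rules above can be sketched as follows. A marking is represented as a dictionary from place name to token count, and a transition as a pair (input places, output places); both representations, and the place names `p1`, `p2`, `p3`, are assumptions made for this example.

```python
def enabled(marking, transition):
    """A transition is enabled if every input place holds at least one token."""
    inputs, _ = transition
    return all(marking[p] >= 1 for p in inputs)

def fire(marking, transition):
    """Remove one token from each input place, add one to each output place."""
    if not enabled(marking, transition):
        raise ValueError("transition is not enabled")
    inputs, outputs = transition
    new_marking = dict(marking)
    for p in inputs:
        new_marking[p] -= 1
    for p in outputs:
        new_marking[p] += 1
    return new_marking

marking = {"p1": 1, "p2": 1, "p3": 0}
t = (["p1", "p2"], ["p3"])
after = fire(marking, t)
```

After firing, both input places are empty, so the same transition is no longer enabled.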
21
An example
Concurrency Examples
R. Apcar, E. Chiu, H. Jerejian
22
Example continued
Concurrency Examples
R. Apcar, E. Chiu, H. Jerejian
23
A “Process” has...
• Hidden states (discrete or continuous)
• State transitions (nondeterministic, probabilistic)
• Observables/events
• Relationship between observables and states
• An algorithm to “score” assignments of observations/events to state
sequences
• Examples:
– Nondeterministic automata
– Hidden Markov Models
– Petri Nets
– Linear Systems
– Nonlinear Systems
– etc
24
Models for Organizational Processes
(W. Chung, J.-P. Schutt, R. Savell, G. Cybenko)
Observables of the Process
A
A
B
B
A
B
A asks B to join
a project
ENRON,
Ebay,
etc
“Static” Analysis
B accepts
A adds B to a
list of
recipients
AB, C, …
Dynamics of the Process
“Dynamic” Analysis
25
Example of a Multistage Process Model
in Computer Security
Potential malicious activity
snort alerts
Potential normal activity
Scanned
Data Access
Samba
Start/Normal
Tripwire
Infected
ftp, covert channel, etc
Exfiltration
26
Cybenko
Real time Fish Tracking
• Objective:
Track several fish in the fish tank
• Why:
Very strong example of the power of PQS
– Fish swim very quickly and erratically
– Lots of missed observations
– Lots of noise
– Classical Kalman filters don’t work (non-linear
movement and acceleration)
– “Easier” than getting permission to track people
(we mistakenly thought)
Cybenko
27
Fish Tracking Details
• 5 Gallon tank with 2 red Platys
named Bubble and Squeak
• Camera generates a stream of
“centroids”:
For each frame a series of (X,Y) pairs
is generated.
• Model describes the
kinematics of a fish:
The model evaluates if new (X,Y)
pairs could belong to the same
fish, based on measured position,
momentum, and predicted next
position. This way, multiple
“tracks” are formed. One for each
object.
• Model was built in under 3
days!!!
Cybenko
28
Kinematic Tracking (2)
Model: the motion of a feature
moving at "human" speed:
The model evaluates if new (X,Y)
pairs could belong to the same
hot spot, based on measured
position, momentum, and
predicted next position. This
way, multiple “tracks” are
formed. One for each object.
Sensors: Infrared video camera
provides datastream
Camera generates a stream of
“centroids”
For each frame a series of
(X,Y) pairs is generated.
29
An Example of a Process
A “Process” Model
Two states - { 1 , 2 }
a
b
1
2
Two observables – { a , b }
Legal transitions between states are depicted by arrows.
When occupying a state, the process emits an observable.
All states are initial/start states and there are no terminal states.
Some legal sequences of observables: abbab , bababbb, abbb
Some illegal sequences of observables: aa , baab
Further reading: Automata Theory, Regular Languages, etc
30
A More Complex Process
Another “Process” Model
a,c
b
a,c
1
2
3
Three states - { 1 , 2 , 3 } Three observables – { a , b , c }
Some legal sequences of observables: abab , babaccab, ab
Some illegal sequences of observables: bb , baabb
Problem: Given a sequence of observations, is it legal? If so, which states could the process be in?
Solution:
1. Read the first observable; mark the states that emit that observable
2. Read the next observable, z
3. New marked states = (states reachable from old marked states)
intersected with (states that could have emitted z)
4. If there are no new marked states, the sequence is illegal; else go to 2
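The marking procedure can be sketched in a few lines. The emissions below follow the slide (states 1 and 3 emit a or c, state 2 emits b). The transition structure in `NEXT` is an assumption reconstructed from the legal and illegal sequences listed above, since the arrows of the diagram are not recoverable here.

```python
EMITS = {1: {"a", "c"}, 2: {"b"}, 3: {"a", "c"}}
NEXT = {1: {2}, 2: {1, 3}, 3: {2, 3}}  # assumed arrows, see lead-in

def possible_states(observations):
    """Return the marked states after the sequence, or set() if it is illegal."""
    marked = set(EMITS)  # all states are start states
    first = True
    for z in observations:
        could_emit = {s for s, symbols in EMITS.items() if z in symbols}
        if first:
            marked = could_emit
            first = False
        else:
            reachable = set().union(*(NEXT[s] for s in marked))
            marked = reachable & could_emit
        if not marked:
            return set()  # no state explains z: illegal sequence
    return marked
```

Under these assumptions the procedure accepts abab, babaccab and ab, and rejects bb and baabb, in agreement with the examples on the slide.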
31
Extensions: Hidden Markov Model (HMM)
p(a|1) = 0.8 , p(c|1) = 0.2
p(b|2) = 1
0.8
1
Add probabilities
1
p(a|3) = 0.8, p(c|3) = 0.2
3
2
0.2
0.5
0.5
Hidden Markov Models consist of two ingredients:
- the dynamics: state transition probabilities in a Markov chain
- the emissions: p(observation|state)
Given a sequence of observations of length t, what are the possible states at
time t? Unlike the case for a nondeterministic automaton, all we can say in
general for an HMM is what the probability distribution on states is.
32
Extensions: Hidden Markov Model (HMM)
p(a|1) = 0.8 , p(c|1) = 0.2
p(b|2) = 1
0.8
1
1
p(a|3) = 0.8, p(c|3) = 0.2
3
2
0.2
0.5
0.5
Probability distribution at time t+1 is obtained by combining:
- propagation of the distribution from time t using only the dynamics
- factoring in the observation observed at time t+1
33
Two Simple Processes
Model Instance A
Model Instance B
a
b
A1
A2
a
b
B1
B2
aabb is a legal observation sequence
A1 B1 A2 A2 , A1 B1 A2 B2 , B1 A1 B2 B2 , ... are all legal state sequences
A1 A2 A2
B1
, A1 A2
B1 B2
, A1
B1 B2 B2
We can reduce this to a single process....
a track
a hypothesis
34
Multiple Process Representation
A1 B1
Model Instance A
Model Instance A
Model Instance B
a
b
A1
A2
a
b
A1
A2
a
b
B1
B2
0
1
M=
MxM=
0
0
0
1
A1
B1
1
1
0 0
0 1
1 0
1 1
1
1
1
1
If the observation sequence is aaaaaa and multiple copies of the
model are allowed, then we get a product model of size 2^n for n copies.
35
A Simple Example of Process Detection
NETWORK WORM MODEL (NW)
States A, B, C, D, labeled {a}, {b}, {b,c}, {c,d}
(a,b,c,d ICMP traffic levels)
ROUTER FAILURE MODEL (RF)
States E, F, labeled {a}, {b}
E,F = 0
repeat
read event e
if e==a then E
if E and e==b then F
until F
• a,b,c,d are events that can be observed
• states A, B, C, D, E, F are hidden
• observe a sequence of events
• Which process or combination of processes explains the observed events?
Sequence    Hypotheses
ab          NW | RF
abab        (NW & NW)|(RF&NW)...
ababc       (NW & RF)|(NW & NW)
ababcc      NW & NW
Two models; states have different semantics;
sets of observables intersect – what is the “diagnosis”?
36
Key Questions
• How is a process model built?
– from first principles
– from expert insights
– from data (lots)
• Given an event sequence, is it feasible or what
is its probability?
• Given an event sequence, estimate the current
state
• Given an event sequence, estimate the state
sequence
• How good are those estimates (ie variance)
37
Homework Problems
What are the states, dynamics and
observables of the following processes:
– intercontinental ballistic missile
– soccer, American football, baseball games
– Avian bird flu epidemic
– terrorist cell
– blogosphere
– US/global economy
– poker
– romance
38
Overview
• Lecture 2: Detecting processes
– What does detection of processes mean?
– Automata
– Hidden Markov Models
– Kalman filtering
– Particle filters
46
Process Detection Problems
• Given a sequence of observations...
• What is the current state of the process?
• What is the probability distribution on the
states?
• What are the most likely state sequences?
• What is the uncertainty/error of the
estimates?
47
Graphical Depiction of Automata
[Figure: state-transition graph with start state a and states b, c, d; edges are labeled with inputs u, v and states with outputs 0, 1]
Q = States = { a , b , c , d }, X = { u , v } , Y = { 0 , 1 }
δ and β shown in graph
48
Input-Output Description
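[Figure: the automaton of the previous slide, with start state a]
Input strings and the output strings they produce:
uuuu      01010
uuvu      01001
vuuuu     001010
vvuuuu    0001010
uvvuuuu   01101010
.....
Input strings grouped by the state they lead to:
a:  φ = v = vv = uu = uvv = ...
b:  u = vu = vuuu = ....
c:  uv = vuv = vuuuv = ...
d:  uvu = vuvu = vvuvu = ...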
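The input-output table can be reproduced by simulating the automaton. The transition table `DELTA` and output map `BETA` below are a reconstruction inferred from the input/output pairs and equivalence classes on this slide, not a copy of the original diagram, so treat them as assumptions.

```python
# Assumed transitions: (state, input) -> next state.
DELTA = {
    ("a", "u"): "b", ("a", "v"): "a",
    ("b", "u"): "a", ("b", "v"): "c",
    ("c", "u"): "d", ("c", "v"): "a",
    ("d", "u"): "a", ("d", "v"): "a",
}
# Assumed output emitted while occupying each state.
BETA = {"a": "0", "b": "1", "c": "1", "d": "1"}

def run(inputs, start="a"):
    """Return (final state, output string) for an input string."""
    state = start
    outputs = BETA[state]
    for x in inputs:
        state = DELTA[(state, x)]
        outputs += BETA[state]
    return state, outputs
```

With this reconstruction, `run("uvvuuuu")` yields the output 01101010, and the inputs uvu and vvuvu both end in state d, which is what it means for them to be equivalent.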
49
Estimating states in an automaton
a
1
a
Observe a
1
a
Observe ab
1
a
Observe ac
1
a
Observe acb
1
b
a,c
2
3
b
a,c
2
3
b
a,c
2
3
b
a,c
2
3
b
a,c
2
3
Sequences: 12, 32
Sequences: 33
Sequences: 332
50
Commentary
• Trivial algorithm....
• Interesting question: What is the worst
case growth of states sequences?
Tomorrow.
• No probabilities, only possibilities.
• What if we add probabilities?
51
Simplest Hidden Markov Model
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
a21 = 0.1
a12 = 0.3
p(1)=0.5, p(2)=0.5 are initial probabilities
2
a22 = 0.9
b2(u) = 0.1, b2(v) = 0.9
52
Applications of HMM's
• Speech recognition
• Gene sequencing
• Motion modeling and detection
• Pattern recognition (OCR)
• DARPA Grand Challenge (autonomic systems)
• etc
53
Estimating States
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
a21 = 0.1
a12 = 0.3
p(1)=0.5, p(2)=0.5 are initial probabilities
2
a22 = 0.9
b2(u) = 0.1, b2(v) = 0.9
54
Estimating Another State
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
a21 = 0.1
a12 = 0.3
p(1)=0.5, p(2)=0.5 are initial probabilities
2
a22 = 0.9
b2(u) = 0.1, b2(v) = 0.9
Propagate using
dynamics
Factor in the observation
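The two steps, propagate and then factor in the observation, can be sketched for the two-state model on this slide. The sketch assumes that a12 denotes the probability of moving from state 1 to state 2, and that the initial probabilities apply at the time of the first observation; the function names are illustrative.

```python
A = [[0.7, 0.3], [0.1, 0.9]]  # A[i][j] = a_(i+1)(j+1)
B = [{"u": 0.9, "v": 0.1}, {"u": 0.1, "v": 0.9}]
PRIOR = [0.5, 0.5]

def propagate(dist):
    """Push the state distribution one step through the dynamics."""
    n = len(dist)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]

def correct(dist, obs):
    """Weight each state by p(obs | state) and renormalize."""
    weighted = [p * B[i][obs] for i, p in enumerate(dist)]
    total = sum(weighted)
    return [w / total for w in weighted]

def filter_states(observations):
    """Distribution over states after a non-empty observation sequence."""
    dist = correct(PRIOR, observations[0])
    for obs in observations[1:]:
        dist = correct(propagate(dist), obs)
    return dist
```

After observing u the distribution is (0.9, 0.1); propagating gives (0.64, 0.36), and observing v then shifts most of the weight to state 2.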
55
Sequences of Observations
Time 1
States
2
3
4
O 2= v
O 3= u
O 4= v
5
1
2
Observations
O1 = u
O 5= v
Problems: Given a sequence of observations O1O2O3 ...
1. What is the most likely state at time t ?
2. What is the most likely state sequence over all time ?
3. What is the probability of the observation sequence?
56
Best state vs best sequence
b1(u) = 0.9, b1(v) = 0.1
a11 = 0.7
1
a21 = 0
a12 = 0.3
p(1)=0.5, p(2)=0.5 are initial probabilities
2
a22 = 1
b2(u) = 0, b2(v) = 1
Observe v - most likely state is 2
Observe u next - must be in state 1 but no transition from 2 to 1 is possible
The sequence vu could only have been produced by starting and staying
in state 1
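The difference can be checked numerically with a best-path (Viterbi-style) recursion on the model of this slide. This is a sketch that assumes a12 is the probability of moving from state 1 to state 2; the function name and list layout are illustrative.

```python
A = [[0.7, 0.3], [0.0, 1.0]]  # A[i][j] = a_(i+1)(j+1)
B = [{"u": 0.9, "v": 0.1}, {"u": 0.0, "v": 1.0}]
PRIOR = [0.5, 0.5]

def viterbi(observations):
    """Most likely state sequence (states numbered from 1)."""
    n = len(PRIOR)
    score = [PRIOR[i] * B[i][observations[0]] for i in range(n)]
    back = []
    for obs in observations[1:]:
        # Best predecessor of each state j, then the score of the best path into j.
        prev = [max(range(n), key=lambda i: score[i] * A[i][j]) for j in range(n)]
        score = [score[prev[j]] * A[prev[j]][j] * B[j][obs] for j in range(n)]
        back.append(prev)
    state = max(range(n), key=lambda j: score[j])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return [s + 1 for s in reversed(path)]
```

Here `viterbi("v")` returns [2], but `viterbi("vu")` returns [1, 1]: the best sequence does not pass through the state that looked best after the first observation.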
57
Probability of the Observations
Time 1
States
2
3
4
O 2= v
O 3= u
O 4= v
5
1
2
Observations
O 1= u
O 5= v
58
Optimal Sequences
Time 1
States
2
3
4
O 2= v
O 3= u
O4= v
5
1
2
Observations
O 1= u
O 5= v
59
Viterbi's Algorithm
• These computations were discovered by
A. Viterbi, a founder of Qualcomm.
• The algorithms are used in all modern cell
phones and telecom devices in general.
Noisy Channel
Source sequence
11221212122212
Decode
Receive
11221212222212
uvvuvuvvuvuvvv
60
Other issues for HMM
• Learning an HMM, i.e., what are the
various probabilities?
– Baum/Welch Algorithm
– variational algorithms
• Finite, discrete state spaces
61
How about continuous state
spaces?
• Major challenge
– in the finite, discrete case (HMM), we can
represent and store the whole probability
distribution as an n-vector
– what continuous state probability distributions
have simple representations?
• Gaussians - mean and variance specify them
– what if the distribution is more general than a
Gaussian?
62
Madory's Goats
• Goat herder
• Herd state is the number of infant females, adult
females, infant males and adult males
• Dynamics are generation to generation: how many infant
females and males are born, how many infants of each
gender become adults and how many adults survive
• Observables are goat milk revenues and goat baby
inoculation costs - these are noisy
• Problem: estimate total number of goats and number of
adult females
(Example and code due to Doug Madory)
63
64
Quantification of the State
65
Quantification of the Dynamics
66
Quantification of Observations
67
68
Basic Concept in Kalman Filtering
• Use the fact that the sum of variables with
Gaussian distributions is also Gaussian
• Gaussian is characterized by mean and
variance
• Use dynamics to predict the next state
• Use measurement (observation) to correct
that prediction
• Update the error covariance (ie confidence
in the estimate)
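A one-dimensional sketch of the predict, correct and update cycle follows. It assumes a scalar linear model x' = a·x + process noise (variance q) and z = h·x + measurement noise (variance r); these symbols and the function name are chosen for the example, since the equation slides are not reproduced in this text.

```python
def kalman_step(mean, var, z, a=1.0, q=0.0, h=1.0, r=1.0):
    """One Kalman cycle for a scalar Gaussian state estimate."""
    # Predict the next state from the dynamics.
    mean_pred = a * mean
    var_pred = a * var * a + q
    # Correct the prediction with the measurement z.
    gain = var_pred * h / (h * var_pred * h + r)
    mean_new = mean_pred + gain * (z - h * mean_pred)
    # Update the error variance (confidence in the estimate).
    var_new = (1.0 - gain * h) * var_pred
    return mean_new, var_new
```

With prior mean 0, prior variance 1, and a measurement z = 2 of variance 1, the estimate moves halfway to the measurement and the variance halves.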
69
70
71
Kalman Equations and Geometry
72
Extensions
• To nonlinear systems (linearize locally)
• Learn the system dynamics
• Use the estimates to control the state
(feedback)
• To non-Gaussian noise problems
– particle filter methods
73
Particle Filters
• Represent a probability distribution using a discrete
distribution of particles
• Sample the particles, propagate using dynamics and
correct using observations
• This creates a new distribution for the next time step
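One step of this scheme can be sketched as a bootstrap-style particle filter. The model here is an assumption made for illustration: a scalar random-walk state with Gaussian process noise and a direct, Gaussian-noise observation of the state.

```python
import math
import random

def particle_filter_step(particles, z, rng, process_std=0.1, obs_std=1.0):
    """Propagate particles, weight them by the observation z, and resample."""
    # Propagate each particle through the (assumed random-walk) dynamics.
    moved = [x + rng.gauss(0.0, process_std) for x in particles]
    # Correct: weight by the Gaussian likelihood of the observation.
    weights = [math.exp(-0.5 * ((z - x) / obs_std) ** 2) for x in moved]
    # Resample to obtain the particle set for the next time step.
    return rng.choices(moved, weights=weights, k=len(moved))

rng = random.Random(0)
particles = [rng.uniform(-10.0, 10.0) for _ in range(2000)]
particles = particle_filter_step(particles, z=5.0, rng=rng)
estimate = sum(particles) / len(particles)
```

Starting from a broad, non-Gaussian prior, a single observation concentrates the particles near the observed value; no Gaussian assumption on the state distribution is needed.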
74
Deep Connections to
Information Theory
• This is all part of a much larger problem
description - cybernetics à la N. Wiener
•
Noisy Channel
Environment
Decode
Receiver
Estimate
of Environment
Learning
Models of
Environment
Actions
75
Summary of Lecture 2
Process class            Distribution        Algorithm
Automaton                None                Simple marking
HMM                      Discrete, finite    Viterbi
Linear, continuous       Gaussian            Kalman
Continuous, nonlinear    Arbitrary           Particle filters
What are the observables?
What are the states?
What are the dynamics?
76
Overview of Lecture 3
Detecting multiple processes
– Instead of one process, we now have some
unknown number of them
– Multiple hypothesis tracking (MHT) framework
– The basic algorithms
– Complexity theory
– Process Query Systems
– Applications
77
Multiple Hidden Process Models
[Figure: state graphs for Model 1 through Model n, with states labeled by the observables they emit (a, b; a, c; c, d; e; f, g; f, c; h)]
Underlying (hidden) state spaces: Model 1, ..., Model n
Observations related to state sequences: abcdabbada and cfhccgdg
Observations are interleaved: a b c c f h d cc a b g d b a g d a
Observations missed, noise added, unlabelled (this is what we see): abacfkhdcbgdbkhagda
78
Why be interested in this....
• Sensor networks
• Airborne plume detection
• Cyber security
• Autonomic server pool management
• Dynamics of social networks
• Genomics and biological pathways*
• Human situation awareness*
*Possible applications.
[Figure: Total Successful Requests vs. Time (s)]
79
Basic Concepts of Process Query Systems (PQS)
[Figure: the PQS cycle, steps 1 to 6, with a sample console plotting track scores over time]
An Operational Network consists of Multiple Processes (λ1 = router failure, λ2 = worm, λ3 = scan)
that produce Events over Time in the Real World
that are seen as Unlabelled Sensor Reports over Time
that PQS resolves into Hypotheses (Hypothesis 1, Hypothesis 2, each a set of Tracks with Track Scores)
that detect complex attacks and anticipate the next steps, giving Indicators and Warnings
(129.170.46.3 is at high risk; 129.170.46.33 is a stepping stone; ......)
that are used to defend the network
80
Discrete Source Separation Problem
(viz Blind Source Separation, “Cocktail Party” Problem)
Process/Model Example:
3 states + transition probabilities
n observable events: a,b,c,d,e,…
Pr( state | observable event ) given/known
Observed event sequence:
….abcbbbaaaababbabcccbdddbebdbabcbabe….
A Hypothesis
Catalog of
Processes/Models
A Track
Which combination of which process models “best” accounts
for the observations? This is what we want to compute. Events
not associated with a known process are “anomalies”.
Cybenko
81
Multiple Hypothesis Approach to the
"Discrete Source Separation Problem"
Obs1
Obs1
Obs2
Obs2
Hypothesis 1
.
.
.
Hypothesis 1a
Obs2
Obs1
Hypothesis 2
.
.
.
Observables at time t+1
"Solutions" at time t
Hypothesis 1b
82
Candidates at time t+1
Multiple Hypothesis Approach to the
"Discrete Source Separation Problem"
Score=79
Score=79
Obs1
Obs1
Obs1
Obs2
Obs2
Obs2
Hypothesis 1a
Hypothesis 1a
Hypothesis 1a
Score=43
Score=43
Obs2
Obs2
Obs2
Obs1
Obs1
Obs1
Hypothesis 1b
Hypothesis 1b
Hypothesis 1b
83
Candidates at time t+1
"Scores" at time t+1
Prune hypotheses
Terminology
Tracks are associations of observations to individual
processes.
Hypotheses are consistent tracks that explain all the
observables.
Hypothesis extension is the conjectural assignment of new
observations to existing hypotheses.
Track initiation is the instantiation of a new process in a
hypothesis' extension.
Handling missed detections means that an intermediate
observation may have been dropped.
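Hypothesis extension and track initiation can be sketched as follows. A hypothesis is represented as a list of tracks and a track as a list of observations; these representations are assumptions for the example, and the sketch deliberately omits model consistency checks, scoring, pruning and missed detections.

```python
def extend(hypothesis, observation):
    """Yield every way to assign one new observation to a hypothesis."""
    # Hypothesis extension: append the observation to an existing track.
    for i, track in enumerate(hypothesis):
        yield hypothesis[:i] + [track + [observation]] + hypothesis[i + 1:]
    # Track initiation: start a new track with this observation.
    yield hypothesis + [[observation]]

def all_hypotheses(observations):
    """Enumerate every assignment of the observations to tracks."""
    pool = [[]]
    for obs in observations:
        pool = [new for old in pool for new in extend(old, obs)]
    return pool
```

Without pruning, three observations already give 5 hypotheses and four give 15, which is why scoring and pruning are part of the multiple hypothesis approach.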
84
Cybenko
A Simple Example of Process Detection
NETWORK WORM MODEL (NW)
States A, B, C, D, labeled {a}, {b}, {b,c}, {c,d}
(a,b,c,d ICMP traffic levels)
ROUTER FAILURE MODEL (RF)
States E, F, labeled {a}, {b}
E,F = 0
repeat
read event e
if e==a then E
if E and e==b then F
until F
• a,b,c,d are events that can be observed
• states A, B, C, D, E, F are hidden
• observe a sequence of events
• Which process or combination of processes explains the observed events?
Sequence    Hypotheses
ab          NW | RF
abab        (NW & NW)|(RF&NW)...
ababc       (NW & RF)|(NW & NW)
ababcc      NW & NW
Two models; states have different semantics;
sets of observables intersect – what is the “diagnosis”?
85
Add Rules for Missed Detections and
Disambiguation
{a}
{b}
{b,c}
{c,d}
A
B
C
D
WORM MODEL
(a,b,c,d ICMP traffic levels)
A,B,C,D = 0
repeat
read event e
if e==a then A
if A and e==b then B
if A and e==c then C,D
if A and e==d then D
if B and (e==b or e==c) then C
if C then (E=0, F=0)
if C and (e==c or e==d) then D
if D then (E=0, F=0)
until D
Blue statements handle
missed detections
Red statements handle
consistency
This clearly does not scale and does not lead to
manageable sets/systems of rules.
Cybenko
86
Approaches to Detecting Processes
• Aristotelian - Traditional information retrieval is based
on specification of a query in terms of Boolean
expressions based on record fields, e.g. SQL ( name =
“smith” & age > 20 & age < 40 ) + rule-based logics +
decision trees, etc.
• Newtonian - Next generation process detection
requires retrieval based on specification of a set of
discrete, dynamic processes, e.g. descriptions of a
Hidden Markov Model, Hidden Petri Net, weak models,
FSMs, attack trees, etc.
Main Concept: Move from an Aristotelian to a
Newtonian Paradigm.
Cybenko
87
Process Query Systems (PQS)
• Process Query Systems solve the Discrete
Source Separation Problem in a generic way:
– inputs
• a sequence of unlabelled observations (stream, logfiles, etc)
• a collection of process models
– outputs
• estimates of which processes produced those observations
• estimates of which states those processes are in
• Basic theory and technology has been developed
by the PQS team at Dartmouth
• Now being applied to a variety of applications
88
Cybenko
Algorithms/Operations of PQS
2
Track
Track
Manage
Hypotheses
(MHT)
Subscribed
Data
Arrives
Hypothesis 1
4
Track
Track
Track
Track
Track
Tracks Track
Track
Tracks
Tracks
Track
Tra
cks
Tracks
Tracks
Track
Track
Tracks
Tra
cks
Hypothesis
Pool
Track
Tra
cks
Tra
cks
Tracks
Hypothesis n
Build or
Learn
Models
1
Recursive in Time
Cybenko
Track
Update Tracks Within
Hypotheses (Viterbi / Kalman /
NDFA,etc) and Create New Hypotheses
3
5
Evaluate
Solutions
and
Process
Outputs
89
The COBOL and pre-PQS Analogy
…
application logic statement 1;
application logic statement 2;
file management statement 1;
record management statement 1;
file management statement 2;
record management statement 2;
application logic statement 3;
record management statement 3;
file management statement 3;
application logic statement 4;
…
User responsibility
System responsibility
…
application logic statement 1;
application logic statement 2;
SQL statement 1;
application logic statement 3;
SQL statement 2;
application logic statement 4;
…
…
file management operation 1;
record management operation 1;
file management operation 2;
record management operation 2;
record management operation 3;
file management operation 3;
…
+
Application logic
Database management system
Interwoven logic
Post-SQL Programs
Pre-SQL Programs
…
model logic statement 1;
model logic statement 2;
sensor access statement 1;
state estimate statement 1;
sensor access statement 2;
state estimate statement 2;
model logic statement 3;
sensor access statement 3;
state estimate statement 3;
model logic statement 4;
…
User responsibility
System responsibility
…
model description statement 1;
model description statement 2;
model description statement 3;
model description statement 4;
…
…
sensor access statement 1;
state estimate statement 1;
sensor access statement 2;
state estimate statement 2;
sensor access statement 3;
state estimate statement 3;
…
Model description
Interwoven logic
Current Process Detection Programs
+
Process query system
90
PQS-based Programs
Network Security
(V. Berk, I. De Souza, A. Barsamian, A. Giani, M.
Bates, D. Madory, G. Bakos, et al)
• Objective:
Detect, disambiguate, and predict the course
of concerted network attacks in an
enterprise class network.
• Why:
Problem domain demands the power of PQS
– Hundreds of “processes” occurring at once
– Lots of missed observations and noise
– All commercial technology focuses on collection
and presentation of data
– Existing correlation efforts very weak at best
Cybenko
91
SENSORS INTEGRATED
SENSOR
DESCRIPTION
DIB:s
Dartmouth ICMP-T3 Bcc: System
CovChan
Timing Covert Channel Detection
Snort
Signature Matching IDS
IPtables
Linux Netfilter firewall, log based
Samba
SMB server - file access reporting
Weblog
IIS, Apache, SSL error logs, …
US-agent
Userspace host monitoring agent
Tripwire
Host filesystem integrity checker
SCOPE
Global
Network
Host
92
Cybenko
Example of a Multistage Process Model
Potential malicious activity
snort alerts
Potential normal activity
Scanned
Data Access
Samba
Start/Normal
Tripwire
Infected
ftp, covert channel, etc
Exfiltration
93
Cybenko
PQS-Net supply chain
Tier 1 Models
• Focus on individual host
status
• Report on status changes
Tier 2 Models
• Focus on correlating host
activity
• Report chains of events
Tier 1 Output
Tier 2 Output
Mon Feb 21 20:06:17 2005 000000 131.58.63.160
(hostile) recon on 100.10.20.4 SNORT 469
proto: 1
Hypothesis 1
Score: 0.8
Hypothesis 2
Score 0.2
A scans B
A scans B
Mon Feb 21 20:30:24 2005 000000 138.158.170.45
(hostile) attacked 100.10.20.4 ERRORLOG 400
proto: 6 dport: 443
B scans E
B attacks E
sensor data
sensors
Cybenko
Tier 1
Tracker
Attack steps
Tier 2
Tracker
Attack sequences
and scores
94
Analyst’s front-end
Example Scenario
Internet
A
C
D
B
E
Tier1 Alerts
Indicators
A scans B
Snort:
02/21-20:06:17.904500 [**] [1:469:1] ICMP PING NMAP [**] [Classification:
Attempted Information Leak] [Priority: 2] {ICMP} 131.58.63.160 -> 100.10.20.4
C attacks B
(success)
SSL error log (host 100.10.20.4):
[Mon Feb 21 20:30:24 2005] [error] mod_ssl: SSL handshake failed (server
www.osis.gov:443, client 138.185.170.45) (OpenSSL library error follows)
[Mon Feb 21 20:30:24 2005] [error] OpenSSL:
error:1406908F:lib(20):func(105):reason(143)
95
Cybenko
Example Cont’d
D
B
E
Tier1 Alerts
Indicators
B scans D
02/21-20:31:17.528602 [**] [1:1807:2] WEB-MISC Chunked-Encoding
transfer attempt [**] [Classification: Web Application Attack] [Priority: 1]
{TCP} 100.10.20.4:34074 -> 100.10.20.169:80
B attacks D (fails)
B scans E
B attacks E
(succeeds)
Cybenko
100.20.1.169 - - [21/Feb/2005:08:31:22 -0500] "GET
/default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 404 1287 "-" "-"
02/21-20:32:01.622465 [**] [1:1807:2] WEB-MISC Chunked-Encoding
transfer attempt [**] [Classification: Web Application Attack] [Priority: 1]
{TCP} 100.10.20.4:34076 -> 100.10.20.170:80
100.20.1.170 - - [21/Feb/2005:08:32:06 -0500] "GET
/default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 200 1287 "-" "-"
96
Results
Dataset:                                       3s8       3s26      3s28      3s29
#Alerts                                        22930     18391     12522     39270
Lines in trunk_alert                           4830      5959      1159      8168
Lines in snort files generated from tcpdump    11751     7284      7006      19866
Lines in weblogs (apache, IIS)                 6349      5148      4357      11236
Number of tracks produced                      100       75        51        107
Attack Tracks not in ground truth              1         0         0         0
Attackers identified                           3 of 3    4 of 4    0 of 2    3 of 5
Decoys found                                   5 of 5    2 of 2    2 of 2    6 of 6
Victims identified                             2 of 2    2 of 2    1 of 2    10 of 11
Stepping stones identified                     1 of 1    1 of 1    1 of 2    2 of 3
97
Autonomic Server Monitoring
(C. Roblee, V. Berk)
Funded by DHS
98
Cybenko
Autonomic Server Monitoring
• Objective:
Detect and predict deteriorating service
situations
• Why:
Another strong example of the power of PQS
– Software and hardware are buggy and vulnerable
– Hot market, large profits for “The ONE” application
– Very ambiguous observations
– Sys-admins also want vacation
99
Cybenko
The Environment
• Hundreds of servers and services
• Various non-intrusive sensors check for:
–
–
–
–
–
–
–
CPU load
Memory footprint
Process table (forking behavior)
Disk I/O
Network I/O
Service query response times
Suspicious network activities (e.g., Snort)
• Models describe the kinematics of failures and
attacks:
The model evaluates load balancing problems, memory leaks,
suspicious forking behavior (like /bin/sh), service hiccups
correlated with network attacks…
Cybenko
100
Server Compromise Model:
Generic Attack Scenario
2.
Monitored host sensor output (system level)
3.
PQS Tracker Output
Current system record for host 10.0.0.24 (10 records):
Average memory over previous 10 samples: 251.000
Average CPU over previous 10 samples: 0.970
| time       | mem used | CPU load | num procs | flag |
---------------------------------------------------------
| 1101094903 | 251      | 0.970    | 64        |      |
| 1101094911 | 252      | 0.820    | 64        |      |
| 1101094920 | 251      | 0.920    | 64        |      |
| 1101094928 | 251      | 0.930    | 64        |      |
| 1101094937 | 251      | 0.870    | 65        |      |
| 1101094946 | 251      | 0.970    | 65        |      |
| 1101094955 | 251      | 0.820    | 65        |      |
| 1101094964 | 253      | 1.220    | 65        | !    |
| 1101094973 | 255      | 1.810    | 65        | !    |
| 1101094982 | 258      | 2.470    | 65        | !    |
1.
Last Modified:
Mon Nov 21 21:01:03
Model Name:
server_compromise1
Likelihood:
0.9182
Target:
10.0.0.24
Optimal Response: SIGKILL proc 6992
o1 o2 o3
Snort NIDS sensor output
..
.
Nov 21 20:57:16 [10.0.0.6] snort: [1:613:7]
SCAN myscan [Classification: attempted-recon] [Priority: 2]:
{TCP} 212.175.64.248-> 10.0.0.24
..
.
Cybenko
o1
SIGKILL
t0
t4
101
Response
t 1 t2 t3
Observations
Experimental Results:
[Figure: panels "Tracking" and "No Tracking", each plotting Total Successful Requests vs. Time (s) and % System Memory Used vs. Time (s); panel captions: 210,000 requests serviced and 380,000 requests serviced]
102
Chemical Plume
Process Detection
Funded by DHS
Glenn Nofsinger
103
The Forward Problem
Concentration in a 2D region as a function of time: $c(x, y, t)$
Concentration equation composed of diffusion and advection:
Fick's Law (diffusion) + Advection (wind)
$\frac{\partial c}{\partial t} = D_x \frac{\partial^2 c}{\partial x^2} + D_y \frac{\partial^2 c}{\partial y^2} - V_d \left( \frac{\partial c}{\partial x} + \frac{\partial c}{\partial y} \right)$
Forward model result:
• arbitrary initial sources
• pseudo-random wind
• includes diffusion and wind
104
Current technology on DC Mall.
Future sensors will be smaller and greater in number, with a need for
measurement correlation.
105
Multiple Source Case With Terrain:
Connectivity determined by wind and geography
Source 1
Connectivity
Source 2
source
high
low
Wind
sensor
106
Multiple Source Case With Terrain:
Connectivity determined by wind and geography
Source 1
Connectivity
Source 2
source
high
low
Wind
sensor
107
Inverse Source Likelihood
Estimating the probability that a sensor observation is
generated by a source at a given location. Based on wind
direction history and diffusion properties of agent.
wind
sensors
S
S
sources
108
Correlation Between Observations
at Different Locations
Picking any two sensors we evaluate a probability that the
observation at that sensor is connected to observations at
different sensors in the region. This is a function of wind
history, distance, and diffusion properties.
[Figure: Forward Likelihood of Observations, with the wind direction indicated]
109
Source Estimation Compared to
True Source Location
Estimated Source based on inverse correlation of plume observations
and tracks
[Figure: Forward Simulation compared with the estimated source; both axes run from 0 to 182]
110
Social Network Analysis
Comparison of Static vs Dynamic
(W. Chung, R. Savell, J.-P. Schuett)
Temporal sequence of transactions
Time
Analyze projected, non-temporal data
Projection
removes
temporal
relationships
Analysis
of
Static
Artifacts
Temporal sequence of transactions
Time
Analysis of
temporal
aspects of
transactions
Extraction
of
Dynamic
Processes
111
Process Primitives
Decay kernel correlates potentially related emails - eg. links
Functional roles based on conversation segments shown below
A. Initiator
B. Broker
C. Bridge
D. Triad
E. Terminator
112
Combining Primitives into
Processes
P(t'-t) > f
P(t''-t') > f
P(t'''-t'') < f
t'
t
t''
X
t'''
Probabilities of temporal
relationships are used to
grow tracks
113
Methodology Details
1. Crude Naïve Bayes Text
Classification w/ Temporal
Correlations to isolate coarse
thread.
2. Local structure via Process
Primitives on the Dynamic
Social Network.
114
Theory
• PQS offers a principled approach that enables
– understanding how distinguishable models (attack and failure) are
– developing a notion of processes that are “trackable,” given models and sensing infrastructure (i.e., a “sampling theory”)
Hypothesis Growth
A “hypothesis” is a consistent assignment of events to processes and/or states (i.e., each event is assigned to only one process instance).
Given a set of hypotheses for an event stream of length k-1, update the hypotheses to length k to explain the new event.
The problem is NP-complete in general, so the pool of hypotheses must be pruned, keeping the most suitable.
[Figure: tracks plotted against time. An individual path is a “track” (i.e., one process instance); consistent tracks form a “hypothesis”.]
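The update-and-prune step can be sketched as follows. This is an illustrative beam-style pruning under stated assumptions, not the PQS implementation: a hypothesis is a tuple of tracks, a track is a tuple of events, and the consistency test `can_extend`, the ranking function `score`, and the pool size `beam` are hypothetical and supplied by the caller.

```python
def grow_hypotheses(hypotheses, event, can_extend, score, beam=100):
    """Extend every hypothesis to explain `event`, then keep the `beam` best."""
    grown = []
    for hyp in hypotheses:
        # Option 1: the event starts a new process instance (a new track).
        grown.append(hyp + ((event,),))
        # Option 2: the event continues exactly one existing track.
        for i, track in enumerate(hyp):
            if can_extend(track, event):
                grown.append(hyp[:i] + (track + (event,),) + hyp[i + 1:])
    # Prune the pool, keeping the most suitable hypotheses.
    grown.sort(key=score, reverse=True)
    return grown[:beam]

# Toy run: events are times, a track may continue if the gap is at most 2,
# and hypotheses with fewer tracks are preferred.
close_in_time = lambda track, e: e - track[-1] <= 2
fewer_tracks = lambda hyp: -len(hyp)

pool = [()]  # start with the single empty hypothesis
for e in [1, 2, 10]:
    pool = grow_hypotheses(pool, e, close_in_time, fewer_tracks, beam=10)
```

Each event is assigned to exactly one track in every hypothesis, which matches the consistency requirement stated on the slide. Without the `beam` cutoff the pool can grow very quickly.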
Models and Hypothesis Growth
“Weak” model: FSM with “emission” vectors
Emission for state i = 0/1 vector of sensor reports, e.g. obs(i) = (0, 1, 1, 0, 0, 1, 1)
Observation vector at time t collected by sensors, e.g. sensors(t) = (0, 1, 1, 1, 1, 1, 0)
Possible states at time t are determined by:
P = { i | Hamming_distance(obs(i), sensors(t)) <= HD }
R = { i | i is reachable from some state j that was possible at time t - 1 }
P ∩ R is the set of possible states at time t.
The number of hypotheses at time t is computed recursively as above.
Theorem: For a fixed value of HD, the worst-case number of hypotheses at time t is either polynomial or exponential in t.
(Crespi, Cybenko, Jiang 2005)
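The state-filtering step for the weak model can be sketched directly from the definitions of P and R. The sketch assumes the possible states are those in both sets (the intersection), since a state must be reachable and also consistent with the observation. The dictionary layout and the three-state example are hypothetical; only the two 0/1 vectors come from the slide.

```python
def hamming(a, b):
    """Number of positions where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def possible_states(prev_possible, reachable, obs, sensors_t, hd):
    """States consistent with the observation (P) and reachable in one step (R)."""
    p = {i for i, vec in obs.items() if hamming(vec, sensors_t) <= hd}
    r = {i for j in prev_possible for i in reachable[j]}
    return p & r

# Emission vectors per state; state 0 uses the obs(i) vector from the slide.
obs = {
    0: (0, 1, 1, 0, 0, 1, 1),
    1: (0, 1, 1, 1, 1, 1, 0),
    2: (1, 0, 0, 0, 0, 0, 1),
}
reachable = {0: {0, 1}, 1: {2}, 2: {0}}   # hypothetical one-step reachability
sensors_t = (0, 1, 1, 1, 1, 1, 0)         # sensors(t) from the slide

strict = possible_states({0}, reachable, obs, sensors_t, hd=1)
loose = possible_states({0}, reachable, obs, sensors_t, hd=3)
```

The slide's obs(i) and sensors(t) differ in three positions, so state 0 is accepted only when HD is at least 3. A larger HD tolerates more sensor noise but leaves more states, and therefore more hypotheses, alive.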
[Figure: axes are tracking time (longer) and noise (more noise, i.e., a worse model); regions are labeled “Nice Demo!!” and “Oh, %#&@!!”]
[Figure: axes are tracking time (longer) and noise (more noise, i.e., a worse model); regions are labeled “Excellent Models and Sensor Coverage”, “Acceptable Models and Sensor Coverage”, and “Poor Models and Sensor Coverage”]
Basic Idea Behind the Proof
[Figure: trellis of N states at times t, t+1, t+2, ..., k]
Process dynamics (i.e., what is reachable from each state in a time step), observations, and the noise threshold together determine a “trellis”. If there are two distinct paths from one node to itself over some period of time, the number of distinct paths grows exponentially by repeating the construct.
Basic Idea Behind the Proof (continued)
[Figure: trellis of N states at times t, t+1, t+2, ..., k]
If there are never two distinct paths from any node to itself over any period of observation, there is a simple injective mapping (i.e., unique labeling) of the paths into {0, 1, ..., k} x {0, 1, ..., k} x ... x {0, 1, ..., k} (2N factors). So the number of paths is < (k+1)^(2N), which is polynomial in k for fixed N. The label for each path records, for each state, the time the path first occupies that state and the time it last occupies it.
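The dichotomy can be seen numerically by counting paths through a small trellis. The sketch below assumes a time-invariant trellis, meaning the same allowed transitions at every step, which is a simplification of the observation-dependent trellis in the slides. The two adjacency matrices are hypothetical examples.

```python
def count_paths(adj, steps):
    """Count distinct paths with `steps` transitions in a time-invariant trellis.

    adj[j][i] is truthy when state i is reachable from state j in one step.
    """
    n = len(adj)
    counts = [1] * n  # one zero-length path ending at each state
    for _ in range(steps):
        counts = [sum(counts[j] for j in range(n) if adj[j][i]) for i in range(n)]
    return sum(counts)

# State 0 returns to itself two ways over two steps: 0->0->0 and 0->1->0.
two_cycles = [[1, 1],
              [1, 0]]
# No state has two distinct return paths: self-loops plus a one-way edge 0->1.
one_way = [[1, 1],
           [0, 1]]

exponential = [count_paths(two_cycles, k) for k in range(11)]
polynomial = [count_paths(one_way, k) for k in range(11)]
```

The first trellis yields Fibonacci-like growth (2, 3, 5, 8, ...), while the second grows only linearly (2, 3, 4, 5, ...), consistent with the polynomial bound when no node has two distinct paths back to itself.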
Relationship to Joint Spectral Radius
New Ideas for Large-Scale Hypothesis Management
• Data structures for maintaining one copy of many hypotheses that are variants of one another
• Viewing the set of hypotheses as the solution (instead of, e.g., only the highest-ranked hypothesis)
– propagating the set can be done in linear space, constant time
– some properties of the set of hypotheses can be computed in constant time, others in linear time, and others seem to require exponential time and/or space
• Development of a “nonparametric” approach to tracking and situational awareness, not unlike nonparametric statistical techniques (order statistics, etc.)
• Reduce dependencies on probabilistic parameters and model building
Distinguishability of models
(Yong Sheng)
• Given two “models”, how distinguishable are they?
• Example: a model of router failure vs. a model of a worm attack
• Do we need to build more refined models, or do we need to add sensors/data sources?
Different degrees of distinguishability between models given sensing capabilities (e.g., DDoS vs. router failure)
Red: probability of deciding model 2 given model 1
Blue: probability of deciding model 1 given model 2
The entropies of the two ergodic models are different.
The decision rule is based on maximum likelihood (ML) as determined by the Viterbi algorithm.
The Shannon-McMillan-Breiman ergodic theorem states that “most” observation sequences are “typical” and have probability related to the entropy.
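A minimal sketch of a Viterbi-based decision rule between two hidden Markov models: score the observation sequence by the log-probability of its best state path under each model and pick the larger. The two 2-state models and their parameters are hypothetical examples, not the DDoS or router-failure models from the slides.

```python
import math

def viterbi_log_prob(obs, init, trans, emit):
    """Log-probability of the most likely state path for `obs` under one HMM."""
    n = len(init)

    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    best = [lg(init[s]) + lg(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        best = [
            max(best[r] + lg(trans[r][s]) for r in range(n)) + lg(emit[s][o])
            for s in range(n)
        ]
    return max(best)

def decide(obs, model_1, model_2):
    """Return 1 or 2 according to which model gives the higher Viterbi score."""
    score_1 = viterbi_log_prob(obs, *model_1)
    score_2 = viterbi_log_prob(obs, *model_2)
    return 1 if score_1 >= score_2 else 2

emit = [[0.9, 0.1], [0.2, 0.8]]  # emit[state][symbol], shared by both models
sticky = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], emit)     # tends to stay put
flipping = ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], emit)   # tends to alternate

constant_run = decide([0] * 10, sticky, flipping)
alternating_run = decide([0, 1] * 5, sticky, flipping)
```

Running many sequences sampled from one model through `decide` and counting how often the other model is chosen would give an empirical estimate of the red and blue error probabilities.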
Different degrees of distinguishability between models given sensing capabilities (e.g., DDoS vs. router failure)
However, nonmonotonic behavior is possible in general, and the error probabilities need not converge to zero if the entropies are the same.
Where do models come from?
• In practice, we build models of processes from:
– First principles, e.g., symmetry, physical laws, etc.
– “Expert” models/rules/experience, e.g., chess-playing computers, military tactics, etc.
– Empirical analysis (from real or simulated data), e.g., backgammon, stock market models, etc.
• Process Query Markup Language has been developed and is almost implemented; it allows rapid insertion of new attack models into PQS
PQS INPUTS: PROCESS MODEL SEMANTICS
AND SENSOR DATA REQUIREMENTS
Stage labels in the figure: Represent, Compile, Execute, Learn

Rules + signatures, etc. Example Snort rule:
alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP Destination Unreachable (Host Unreachable)"; itype: 3; icode: 1; sid:399; classtype:misc-activity; rev:4;)

[Figure: two state diagrams over the states Normal, Marginal, and Failed, with transitions labeled A, B, and C. One diagram is labeled “Reachability (weak) Models”, the other “Probabilistic Models (HMM, Bayes Nets, Fuzzy models, etc)”. The values 0.03, 0.05, 0.2, and 0.9 annotate the diagrams.]

Compiled code (fragment):
if (src_ip_new.equals(src_ip_track))
{
    // local? (is the source address inside 208.253.154.0/24)
    if (IPv4_in_CIDR_ints(208, 253, 154, 0, 24, src_ip_new))
    {
        // Average the current likelihood with 0.90
        new_likelihood = new Likelihood((0.90f + likelihood.getProbability()) / 2.0f);
    }
    else
    {
        // Else don't care
        new_likelihood = new Likelihood(0.0);
    }
}
More details....
gvc@dartmouth.edu
See www.pqsnet.net