The Mathematics and Algorithmics of
Process Detection
George Cybenko
Dartmouth College
Hanover NH 03755 USA
gvc@dartmouth.edu
Cybenko
IPAM 27-7-2005
Acknowledgements
Active Members
George Bakos
Alex Barsamian
Marion Bates
Vincent Berk
Wayne Chung
Valentino Crespi (Cal State LA)
George Cybenko
Ian deSouza
Annarita Giani
Doug Madory
Glenn Nofsinger
Yong Sheng
William Stearns
Alumni
Naomi Fox (UMass, Ph.D. student)
Hrithik Govardhan (Rocket)
Robert Gray (BAE Systems)
Diego Hernando (UIUC, Ph.D. student)
Guofei Jiang (NEC Research)
Alex Jordan (BAE Systems)
Han Li (China)
Josh Peteet (Greylock Partners)
Chris Roblee (LLNL)
Research Support: DARPA, DHS, ARDA, ISTS, I3P, AFOSR, Microsoft
Outline
Background and basics
Software and Applications
Theory
Summary
An Example of a Process
A “Process” Model
Two states: { 1 , 2 }
Two observables: { a , b }
[Diagram: states 1 and 2 with observables a and b.]
Legal transitions between states are depicted by arrows.
When occupying a state, the process emits an observable.
All states are initial/start states and there are no terminal states.
Some legal sequences of observables: abbab , bababbb, abbb
Some illegal sequences of observables: aa , baab
Further reading: Automata Theory, Regular Languages, etc.
A More Complex Process
Another “Process” Model
Three states: { 1 , 2 , 3 }. Three observables: { a , b , c }
[Diagram: states 1, 2, 3 with observable sets {a,c}, {b}, {a,c}.]
Some legal sequences of observables: abab , babaccab, ab
Some illegal sequences of observables: bb , baabb
Problem: Given a sequence of observations, is it legal? If so, which states can the process be in?
Solution:
1 Read the first observable and mark the states that emit it
2 Read the next observable, z
3 New marked states = (states reachable from old marked states)
intersected with (states that could have emitted z )
4 If there are no new marked states, the sequence is illegal; otherwise go to 2 until the input is exhausted
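The state-marking steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the talk's own code. It uses the earlier two-state example, and its transition table (1 to 2, 2 to 1, 2 to 2) is an assumption inferred from the legal and illegal sequences listed for that example, since the arrows of the diagram are not available here.

```python
# Assumed two-state model: state 1 emits a, state 2 emits b.
# The transitions below are inferred from the slide's examples, not stated on it.
EMITS = {1: {"a"}, 2: {"b"}}
NEXT = {1: {2}, 2: {1, 2}}


def mark_states(observations, emits=EMITS, nxt=NEXT):
    """Return the states the process may occupy after the sequence (empty if illegal)."""
    if not observations:
        return set(emits)  # every state is a start state
    # Step 1: mark the states that emit the first observable.
    marked = {s for s in emits if observations[0] in emits[s]}
    for z in observations[1:]:  # Step 2: read the next observable z.
        # Step 3: (reachable from old marked states) intersected with (could emit z).
        reachable = set().union(*(nxt[s] for s in marked))
        marked = {s for s in reachable if z in emits[s]}
        if not marked:  # Step 4: no marked states means the sequence is illegal.
            break
    return marked


def is_legal(observations):
    return bool(mark_states(observations))


for seq in ["abbab", "bababbb", "abbb", "aa", "baab"]:
    print(seq, is_legal(seq), sorted(mark_states(seq)))
```

The marked set doubles as the answer to "what states?": for a legal sequence it lists the states the process may currently be in.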
Two Simple Processes
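[Diagram: Model Instance A (states A1, A2) and Model Instance B (states B1, B2), each with observables a and b.]
aabb is a legal observation sequence
A1 B1 A2 A2 , A1 B1 A2 B2 , B1 A1 B2 B2 , ... are all legal state sequences
Split by instance, these become the track pairs
(A1 A2 A2 ; B1) , (A1 A2 ; B1 B2) , (A1 ; B1 B2 B2)
Each single-instance sequence is a track; each pair of tracks is a hypothesis.
We can reduce this to a single process....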
Multiple Process Representation
[Diagram: Model Instance A (states A1, A2) and Model Instance B (states B1, B2), each with observables a and b, together with the 0-1 transition matrix M of one instance and the product MxM, whose rows and columns are indexed by the joint states 0 0, 0 1, 1 0, 1 1.]
If the observation sequence is aaaaaa and multiple copies of the
model are allowed, then n copies give a product model of size 2^n.
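The growth of the product model can be checked with a small sketch. The matrix M below is an assumption: it encodes the two-state example with transitions 1 to 2, 2 to 1 and 2 to 2, which is consistent with the entries that survive from the slide but is not fully legible there. The helper names are hypothetical.

```python
# Assumed transition matrix of one two-state instance (row = from, column = to).
M = [[0, 1],
     [1, 1]]


def kron(X, Y):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_x for y in row_y]
            for row_x in X for row_y in Y]


def product_model(matrix, copies):
    """Transition matrix of `copies` independent instances moving in lockstep."""
    result = [[1]]
    for _ in range(copies):
        result = kron(result, matrix)
    return result


for n in range(1, 7):
    print(n, "copies ->", len(product_model(M, n)), "joint states")
```

With two states per instance the joint state count doubles with every added copy, which is the exponential cost of reducing multiple processes to a single one.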
Multistage Process Model
[Diagram: states Scanned, Data Access, Start/Normal, Infected, Exfiltration. Legend: potential malicious activity, potential normal activity.]
Extensions: Hidden Markov Model (HMM)
Add probabilities:
p(a|1) = 0.8 , p(c|1) = 0.2
p(b|2) = 1
p(a|3) = 0.8, p(c|3) = 0.2
[Diagram: states 1, 2, 3 with edge labels 0.8, 0.2, 0.5, 0.5 and 1, unrolled into k copies of the states at times t=0, t=1, t=2, t=3, ..., t=k-2, t=k-1.]
Take logs of probabilities so this is a shortest path problem and can use dynamic programming (Viterbi algorithm).
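The shortest-path view can be illustrated with a compact Viterbi sketch. The emission probabilities are the ones listed on the slide. The transition table and the uniform start distribution are assumptions, chosen to be consistent with the probabilities visible in the diagram and with the legal sequences given earlier for the three-state model.

```python
import math

EMIT = {1: {"a": 0.8, "c": 0.2}, 2: {"b": 1.0}, 3: {"a": 0.8, "c": 0.2}}  # from the slide
TRANS = {1: {2: 0.8, 3: 0.2}, 2: {1: 1.0}, 3: {1: 0.5, 2: 0.5}}  # assumed
START = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}  # assumed uniform


def cost(p):
    """Edge length in the trellis: the negative log of a probability."""
    return -math.log(p)


def viterbi(obs):
    """Return (total cost, state path) of the cheapest path, or None if obs is illegal."""
    best = {s: (cost(START[s] * EMIT[s][obs[0]]), [s])
            for s in EMIT if obs[0] in EMIT[s]}
    for z in obs[1:]:
        step = {}
        for s, (c, path) in best.items():
            for t, p in TRANS[s].items():
                if z in EMIT[t]:
                    candidate = c + cost(p * EMIT[t][z])
                    if t not in step or candidate < step[t][0]:
                        step[t] = (candidate, path + [t])
        best = step  # keep only the cheapest path into each state
        if not best:
            return None
    return min(best.values()) if best else None


print(viterbi("abab"))
print(viterbi("bb"))
```

Because costs add where probabilities multiply, keeping the cheapest path into each state at each time step is sufficient, which is what makes the dynamic program linear in the sequence length.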
Hidden Process Models
Observations missed, noise added, unlabelled (this is what we see):
aba cfkhdcbgdbkhagda
Observations are interleaved:
a b c c f h d cc a b g d b a g d a
Observations related to state sequences:
abcdabbada
cfhccgdg
Underlying (hidden) state spaces:
[Diagram: Model 1 through Model n, with states emitting observable sets such as {f, g}, {a, c}, {a, b}, {c, d}, {e}, {f, c}, {h}.]
Terminology and Summary
Processes have states.
The states are hidden.
States emit observables that are possibly not unique to a state.
Observables are not labeled, can be noisy and might be dropped.
Multiple processes might be instantiated.
The problem is to determine which processes are possible and which
states those processes can be in.
Multiple process detection can be reduced to single process detection
at the expense of exponential growth.
Tracks are associations of observations to processes.
Hypotheses are consistent tracks that explain all the observables.
Discrete Source Separation Problem
(viz Blind Source Separation, “Cocktail Party” Problem)
Process/Model Example:
3 states + transition probabilities
n observable events: a,b,c,d,e,…
Pr( state | observable event ) given/known
Observed event sequence:
….abcbbbaaaababbabcccbdddbebdbabcbabe….
A Hypothesis
Catalog of
Processes/Models
A Track
Which combination of which process models “best” accounts
for the observations? This is what we want to compute. Events
not associated with a known process are “anomalies”.
A Simple Example of Process Detection
• a,b,c,d are events that can be observed
• states A, B, C, D, E, F are hidden
• observe a sequence of events
• Which process or combination of processes explains the observed events?
NETWORK WORM MODEL (NW)
(a,b,c,d ICMP traffic levels)
States and observables: A {a}, B {b}, C {b,c}, D {c,d}
ROUTER FAILURE MODEL (RF)
States and observables: E {a}, F {b}
E,F = 0
repeat
    read event e
    if e==a then E
    if E and e==b then F
until F
Sequence | Hypotheses
ab       | NW | RF
abab     | (NW & NW)|(RF&NW)...
ababc    | (NW & RF)|(NW & NW)
ababcc   | NW & NW
Two models; states have different semantics;
sets of observables intersect – what is the “diagnosis”?
Detecting a Process Using Rules
WORM MODEL
(a,b,c,d ICMP traffic levels)
States and observables: A {a}, B {b}, C {b,c}, D {c,d}
A,B,C,D = 0
repeat
    read event e
    if e==a then A
    if A and e==b then B
    if B and (e==b or e==c) then C
    if C and (e==c or e==d) then D
until D
ROUTER FAILURE MODEL
States and observables: E {a}, F {b}
E,F = 0
repeat
    read event e
    if e==a then E
    if E and e==b then F
until F
What does “ab” mean ? (Process ambiguity)
What does “ac” mean ? (Missed Detections)
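Both questions can be reproduced with a small Python rendering of the two rule sets. This is a sketch, not the slide's pseudocode verbatim: the function names are hypothetical, and later stages are tested first so that a single event advances a detector by at most one stage, which is an added design choice.

```python
def worm_stage(events):
    """Run the worm rules; return the furthest stage reached ('' if none)."""
    A = B = C = D = False
    for e in events:
        # Test later stages first so one event advances at most one stage.
        if C and e in ("c", "d"):
            D = True
        elif B and e in ("b", "c"):
            C = True
        elif A and e == "b":
            B = True
        elif e == "a":
            A = True
        if D:
            break
    return "D" if D else "C" if C else "B" if B else "A" if A else ""


def router_stage(events):
    """Run the router failure rules; return the furthest stage reached."""
    E = F = False
    for e in events:
        if E and e == "b":
            F = True
        elif e == "a":
            E = True
        if F:
            break
    return "F" if F else "E" if E else ""


# "ab": both detectors advance, so the events do not identify the process.
print(worm_stage("ab"), router_stage("ab"))
# "ac": the worm detector stalls at A because the b event was missed.
print(worm_stage("ac"), router_stage("ac"))
```

On "ab" the router model completes while the worm model is still consistent, and on "ac" neither rule set can account for the c event without extra rules for missed detections.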
Rules for Process Disambiguation
WORM MODEL
(a,b,c,d ICMP traffic levels)
States and observables: A {a}, B {b}, C {b,c}, D {c,d}
A,B,C,D = 0
repeat
    read event e
    if e==a then A
    if A and e==b then B
    if B and (e==b or e==c) then C
    if C then (E=0, F=0)
    if C and (e==c or e==d) then D
until D
ROUTER FAILURE MODEL
States and observables: E {a}, F {b}
E,F = 0
repeat
    read event e
    if e==a then E
    if E and e==b then F
until F
Cannot decide which process is instantiated until
more data arrives.
Rules for Missed Detections
WORM MODEL
(a,b,c,d ICMP traffic levels)
States and observables: A {a}, B {b}, C {b,c}, D {c,d}
A,B,C,D = 0
repeat
    read event e
    if e==a then A
    if A and e==b then B
    if A and e==c then C,D
    if A and e==d then D
    if B and (e==b or e==c) then C
    if C then (E=0, F=0)
    if C and (e==c or e==d) then D
    if D then (E=0, F=0)
until D
This clearly does not scale and does not lead to
manageable sets/systems of rules.
Complexity of Rule-Based Systems
for Multiple Process Detection



m process models, each with n states
Potentially as few as mn state transitions in the
original models
Potentially need to add:
• O(m^2 n^2) rules for disambiguation
• O(m n^2) rules for missed detections
These are “overhead” processing steps that can be
done generically, not by the decision tree or rule set
Process Query System software handles this
overhead processing
Approaches to Detecting Processes

Aristotelian - Traditional information retrieval is based
on specification of a query in terms of Boolean
expressions based on record fields, e.g. SQL ( name =
“smith” & age > 20 & age < 40 ), rule-based logics,
decision trees, etc.

Newtonian - Next generation process detection
requires retrieval based on specification of a set of
discrete, dynamic processes, e.g. descriptions of a
Hidden Markov Model, Hidden Petri Net, weak models,
FSMs, attack trees, etc.
Main Concept: Move from an Aristotelian to a
Newtonian Paradigm.
Examples of Process Detection Problems
• Is there an unusual pattern of computer network events, host activities,
system calls, etc? (Network and computer security)
• Is a complex infrastructure (telecom, electricity, financial networks)
operating normally or in a failure mode? (Critical Infrastructures)
• Is my software operating normally? (Autonomic computing)
• What biological pathways/processes are engaged? (Molecular Biology)
• Is there an unusual pattern of document accesses within an enterprise
document control system? (Insider Threat Detection)
• Does a group of unusual transactions constitute a threat? (Homeland
Security)
• Has the physical border/perimeter been breached? (National and
industrial physical intrusion detection)
• Is there a large ground vehicle convoy moving towards our position?
(Tactical military)
• What’s going on around me? (Human Cognitive Processing)
IMPORTANT – All are “adversarial” situations, not cooperative, so the
observations are not necessarily labeled for easy identification and
association with a process!

Related Disciplines
                                | “Weak” Models         | Hidden Markov Models           | Linear State Space Systems                     | Multiple Target Tracking
Underlying Model                | Finite State Machines | Markov Chains, Shannon Channel | x’ = Ax + Bu, y = Cx + Dv, u, v Gaussian noise | Any
Algorithms for Single Processes | State marking         | eg Viterbi algorithms          | Kalman Filtering                               | Not applicable
Multiple Processes              | Process Query Systems | Process Query Systems          | Process Query Systems                          | Multiple Hypothesis Tracking (MHT)
Software and Applications

• Sensor networks
• Airborne plume detection
• Cyber security
• Server pool management
• Dynamics of social networks*
• Genomics and biological pathways*
• Human situation awareness*
*In process or planned.
[Figure: Total Successful Requests (0 to 400000) against Time (s) (0 to 500).]
Process Query Systems (PQS)

Process Query Systems solve the Discrete
Source Separation Problem in a generic way:
inputs
• a sequence of unlabelled observations (stream, logfiles, etc)
• a collection of process models
outputs
• estimates of which processes produced those observations
• estimates of which states those processes are in
Basic theory and technology have been developed
by the PQS team at Dartmouth
Now being applied to a variety of applications
Algorithms/Operations of PQS
1 Build or Learn Models
2 Subscribed Data Arrives
3 Update Tracks Within Hypotheses (Viterbi / Kalman / NDFA, etc) and Create New Hypotheses
4 Manage Hypotheses (MHT)
5 Evaluate Solutions and Process Outputs
Recursive in Time
[Diagram: a Hypothesis Pool holding Hypothesis 1 through Hypothesis n, each a set of tracks.]
Software: Process Query System
One platform, many applications
DISCUS
Cyberlog Analysis
Vehicle Tracking
Attacks on utilities
DHS
DARPA
PQSnet.net
Plume detection
Computer Security
ARDA
Robust Server
Pooling
Sensor networks
DHS
DHS
Generic Process Query System
The COBOL and pre-PQS Analogy
Pre-SQL Programs: interwoven logic (user responsibility)
…
application logic statement 1;
application logic statement 2;
file management statement 1;
record management statement 1;
file management statement 2;
record management statement 2;
application logic statement 3;
record management statement 3;
file management statement 3;
application logic statement 4;
…
Post-SQL Programs: application logic (user responsibility) + database management system (system responsibility)
Application logic:
…
application logic statement 1;
application logic statement 2;
SQL statement 1;
application logic statement 3;
SQL statement 2;
application logic statement 4;
…
Database management system:
…
file management operation 1;
record management operation 1;
file management operation 2;
record management operation 2;
record management operation 3;
file management operation 3;
…
Current Process Detection Programs: interwoven logic (user responsibility)
…
model logic statement 1;
model logic statement 2;
sensor access statement 1;
state estimate statement 1;
sensor access statement 2;
state estimate statement 2;
model logic statement 3;
sensor access statement 3;
state estimate statement 3;
model logic statement 4;
…
PQS-based Programs: model description (user responsibility) + process query system (system responsibility)
Model description:
…
model description statement 1;
model description statement 2;
model description statement 3;
model description statement 4;
…
Process query system:
…
sensor access statement 1;
state estimate statement 1;
sensor access statement 2;
state estimate statement 2;
sensor access statement 3;
state estimate statement 3;
…
Computer Security Example
(V. Berk and N. Fox)
Funded by ARDA and DHS
Network Security

Objective:
Detect, disambiguate, and predict the course
of concerted network attacks in an
enterprise-class network.

Why:
Problem domain demands the power of PQS
• Hundreds of “processes” occurring at once
• Lots of missed observations and noise
• All commercial technology focuses on collection
and presentation of data
• Existing correlation efforts very weak at best
Goal of PQS in network monitoring
• Create a system that quickly and
accurately correlates related activity.
• Assist a security analyst in deciding:
    • What activity is irrelevant.
    • What activity needs attention and further
    investigation.
SENSORS INTEGRATED
SENSOR   | DESCRIPTION
DIB:s    | Dartmouth ICMP-T3 Bcc: System
CovChan  | Timing Covert Channel Detection
Snort    | Signature Matching IDS
IPtables | Linux Netfilter firewall, log based
Samba    | SMB server - file access reporting
Weblog   | IIS, Apache, SSL error logs, …
US-agent | Userspace host monitoring agent
Tripwire | Host filesystem integrity checker
SCOPE: the slide groups these sensors as Global, Network, Host
Multistage Process Model
[Diagram: states Scanned, Data Access, Start/Normal, Infected, Exfiltration. Legend: potential malicious activity, potential normal activity.]
PQS-Net Testbed at Dartmouth
DIB:s
Dartmouth
Internet
CovChan
PQS-Net
IPTables
Snort
SaMBa
US-Agent
ISTS
DMZ
WWW
Mail
WS
172.18.12.32-38
PQS
Attack Hosts:
• Skaion
• Custom Exploits
• Core Impact™
• Normal Traffic
WinXP/LINUX targets
• Covert Channels
192.168.24.192/26
• Worms
www.pqsnet.net
PQS-Net supply chain
Tier 1 Models
• Focus on individual host status
• Report on status changes
Tier 2 Models
• Focus on correlating host activity
• Report chains of events
Tier 1 Output
Mon Feb 21 20:06:17 2005 000000 131.58.63.160
(hostile) recon on 100.10.20.4 SNORT 469
proto: 1
Mon Feb 21 20:30:24 2005 000000 138.158.170.45
(hostile) attacked 100.10.20.4 ERRORLOG 400
proto: 6 dport: 443
Tier 2 Output
Hypothesis 1 (Score: 0.8), Hypothesis 2 (Score: 0.2)
Events shown in the hypotheses: A scans B, A scans B, B scans E, B attacks E
Pipeline: sensors -> sensor data -> Tier 1 Tracker -> attack steps -> Tier 2 Tracker -> attack sequences and scores -> Analyst’s front-end
Example Scenario
[Diagram: Internet and hosts A, C, D, B, E.]
Tier1 Alert: A scans B
Indicator: Snort:
02/21-20:06:17.904500 [**] [1:469:1] ICMP PING NMAP [**] [Classification:
Attempted Information Leak] [Priority: 2] {ICMP} 131.58.63.160 -> 100.10.20.4
Tier1 Alert: C attacks B (success)
Indicator: SSL error log (host 100.10.20.4):
[Mon Feb 21 20:30:24 2005] [error] mod_ssl: SSL handshake failed (server
www.osis.gov:443, client 138.185.170.45) (OpenSSL library error follows)
[Mon Feb 21 20:30:24 2005] [error] OpenSSL:
error:1406908F:lib(20):func(105):reason(143)
Example Cont’d
[Diagram: hosts D, B, E.]
Tier1 Alerts: B scans D; B attacks D (fails)
Indicators:
02/21-20:31:17.528602 [**] [1:1807:2] WEB-MISC Chunked-Encoding
transfer attempt [**] [Classification: Web Application Attack] [Priority: 1]
{TCP} 100.10.20.4:34074 -> 100.10.20.169:80
100.20.1.169 - - [21/Feb/2005:08:31:22 -0500] "GET
/default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 404 1287 "-" "-"
Tier1 Alerts: B scans E; B attacks E (succeeds)
Indicators:
02/21-20:32:01.622465 [**] [1:1807:2] WEB-MISC Chunked-Encoding
transfer attempt [**] [Classification: Web Application Attack] [Priority: 1]
{TCP} 100.10.20.4:34076 -> 100.10.20.170:80
100.20.1.170 - - [21/Feb/2005:08:32:06 -0500] "GET
/default.idq?AAAAAAAAAAA………..AAAAAAA HTTP/1.1" 200 1287 "-" "-"
Fish Tracking (Kinematic Tracking)
A. Jordan, W. Chung, V. Crespi
Funded by DARPA and DHS
Real time Fish Tracking

Objective:
Track the fish in the fish tank

Why:
Very strong example of the power of PQS
• Fish swim very quickly and erratically
• Lots of missed observations
• Lots of noise
• Classical Kalman filters don’t work (non-linear
movement and acceleration)
• “Easier” than getting permission to track people
(we mistakenly thought)
Fish Tracking Details


5 gallon tank with 2 red platys
named Bubble and Squeak
Camera generates a stream of
“centroids”:
For each frame a series of (X,Y) pairs
is generated.

Model describes the
kinematics of a fish:
The model evaluates whether new (X,Y)
pairs could belong to the same
fish, based on measured position,
momentum, and predicted next
position. This way, multiple
“tracks” are formed, one for each
object.

Model was built in under 3
days!!!
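The kinematic test described above (does a new centroid fit a track's predicted next position?) can be sketched as follows. This is not the PQS fish model: it swaps in a plain constant-velocity prediction with greedy nearest-neighbour gating, and the function names and the `gate` radius are assumptions made for illustration.

```python
import math


def predict(track):
    """Constant-velocity guess at the next (x, y) from the last two points."""
    if len(track) < 2:
        return track[-1]
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def associate(tracks, centroids, gate=15.0):
    """Extend each track with the nearest unused centroid inside the gate.

    Centroids that fit no track start new tracks (new objects or noise).
    """
    unused = list(centroids)
    for track in tracks:
        if not unused:
            break
        px, py = predict(track)
        best = min(unused, key=lambda c: math.hypot(c[0] - px, c[1] - py))
        if math.hypot(best[0] - px, best[1] - py) <= gate:
            track.append(best)
            unused.remove(best)
    tracks.extend([c] for c in unused)
    return tracks


tracks = [[(0, 0), (10, 0)], [(100, 100), (100, 90)]]
frame = [(101, 80), (21, 1), (300, 300)]
for t in associate(tracks, frame):
    print(t)
```

A track that receives no centroid in a frame is simply left unchanged here, which is one crude way to tolerate missed observations; a fuller treatment would keep several competing associations as hypotheses.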
Autonomic Server Monitoring
(C. Roblee, V. Berk)
Funded by DHS, ARDA
Autonomic Server Monitoring

Objective:
Detect and predict deteriorating service
situations

Why:
Another strong example of the power of PQS
• Software and hardware are buggy and vulnerable
• Hot market, large profits for “The ONE” application
• Very ambiguous observations
• Sys-admins also want vacation
The Environment


Hundreds of servers and services
Various non-intrusive sensors check for:
• CPU load
• Memory footprint
• Process table (forking behavior)
• Disk I/O
• Network I/O
• Service query response times
• Suspicious network activities (e.g. Snort)
Models describe the kinematics of failures and
attacks:
The model evaluates load balancing problems, memory leaks,
suspicious forking behavior (like /bin/sh), service hiccups
correlated with network attacks…
Server Compromise Model:
Generic Attack Scenario
1. Snort NIDS sensor output
...
Nov 21 20:57:16 [10.0.0.6] snort: [1:613:7]
SCAN myscan [Classification: attempted-recon] [Priority: 2]:
{TCP} 212.175.64.248-> 10.0.0.24
...
2. Monitored host sensor output (system level)
Current system record for host 10.0.0.24 (10 records):
Average memory over previous 10 samples: 251.000
Average CPU over previous 10 samples: 0.970
| time       | mem used | CPU load | num procs | flag |
---------------------------------------------------------
| 1101094903 | 251      | 0.970    | 64        |      |
| 1101094911 | 252      | 0.820    | 64        |      |
| 1101094920 | 251      | 0.920    | 64        |      |
| 1101094928 | 251      | 0.930    | 64        |      |
| 1101094937 | 251      | 0.870    | 65        |      |
| 1101094946 | 251      | 0.970    | 65        |      |
| 1101094955 | 251      | 0.820    | 65        |      |
| 1101094964 | 253      | 1.220    | 65        | !    |
| 1101094973 | 255      | 1.810    | 65        | !    |
| 1101094982 | 258      | 2.470    | 65        | !    |
3. PQS Tracker Output
Last Modified: Mon Nov 21 21:01:03
Model Name: server_compromise1
Likelihood: 0.9182
Target: 10.0.0.24
Optimal Response: SIGKILL proc 6992
[Timeline: times t0, t1, t2, t3, t4, with observations o1, o2, o3 and the SIGKILL response.]
Experimental Results:
[Figure: four plots comparing runs with No Tracking and with Tracking. Top row: Total Successful Requests (0 to 400000) against Time (s) (0 to 500). Bottom row: % System Memory Used (0 to 100) against Time (s) (0 to 500). Panel captions: Successful Requests; System Memory Consumed. Annotations: 210,000 requests serviced; 380,000 requests serviced.]
Theory

Process Query System frameworks offer a
principled approach that enables
• understanding how distinguishable models
(attack and failure) are
• developing a notion of processes that are
“trackable,” given models and sensing
infrastructure (ie a “sampling theory”)
Hypothesis Growth
A “hypothesis” is a consistent
assignment of events to
processes and/or states (ie,
each event assigned to only
one process instance).
Given a set of “hypotheses”
for an event stream of length
k-1, update the hypotheses to
length k to explain the new
event.
NP-Complete in general.
Need to prune the pool of
hypotheses, keeping the
most suitable.
[Figure: paths plotted against time. An individual path is a “track”, ie one process instance; consistent tracks form a “hypothesis”.]
Models and Hypothesis Growth
“Weak” model: FSM with “emission” vectors
Emission for state i = 0/1 vector of sensor reports
eg obs(i) = ( 0 , 1 , 1 , 0 , 0 , 1 , 1 )
Observation vector at time t collected by
sensors: eg sensors(t) = ( 0 , 1 , 1 , 1 , 1 , 1 , 0 )
Possible states at time t are determined by:
P = { i | Hamming_distance( obs(i) , sensors(t) ) <= HD }
R = { i | i is reachable from some state j that was possible at time t - 1 }
P ∩ R is the set of possible states at time t
Number of hypotheses at time t recursively computed as above.
Theorem: For a fixed value of HD, the worst-case number of hypotheses
at time t is either polynomial or exponential in t.
(Crespi, Cybenko, Jiang 2004)
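The update of the possible-state set can be sketched directly from the definitions of P and R. The two emission and sensor vectors marked below are the ones quoted on the slide. The second state, the reachability table and the function names are hypothetical, added only to make the example runnable.

```python
def hamming_distance(u, v):
    """Number of positions where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(u, v))


def possible_states(previous, sensors_t, obs, reachable, hd):
    """Possible states at time t: P intersected with R."""
    # P: states whose emission vector is within hd of the sensor vector.
    P = {i for i, vec in obs.items() if hamming_distance(vec, sensors_t) <= hd}
    # R: states reachable in one step from a state possible at time t - 1.
    R = {i for j in previous for i in reachable[j]}
    return P & R


OBS = {
    0: (0, 1, 1, 0, 0, 1, 1),  # obs(i) from the slide
    1: (0, 1, 1, 1, 1, 1, 0),  # hypothetical second state
}
REACHABLE = {0: {0, 1}, 1: {0}}      # hypothetical dynamics
SENSORS_T = (0, 1, 1, 1, 1, 1, 0)    # sensors(t) from the slide

print(hamming_distance(OBS[0], SENSORS_T))
for hd in (0, 2, 3):
    print(hd, sorted(possible_states({0, 1}, SENSORS_T, OBS, REACHABLE, hd)))
```

Raising the threshold HD tolerates more sensor noise but admits more states at each step, which is where the growth in the number of hypotheses comes from.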
[Figure: two plots with axis labels “Longer tracking time” and “More noise (worse model)”. Annotations: “Ouch!!!” and “Nice Demo!!”]
[Figure: plot with axis labels “Longer tracking time” and “More noise (worse model)”, divided into regions labeled Poor, Acceptable, and Excellent Models and Sensor Coverage.]
Basic Idea Behind the Proof
[Diagram: trellis with N states at each of the times t, t+1, t+2, ..., k.]
If there are never two distinct paths from any node to itself over any
period of observation, there is a simple injective mapping (ie. unique
labeling) of the paths into {0, 1, ... , k} x {0, 1, ... , k} x ... x
{0, 1, ... , k} (2N factors). So the number of paths is < (k+1)^{2N}. The label
for each path records, for each of the N states, the time the path first
occupies that state and the time it last occupies that state.
Basic Idea Behind the Proof
[Diagram: trellis with N states at each of the times t, t+1, t+2, ..., k.]
Process dynamics (ie what is reachable from each state in a time step)
+ observations + noise threshold determine a “trellis”. If there are two
distinct paths from one node to itself over some period of time, the
number of distinct paths grows exponentially by repeating the construct.
Relationship to Spectral Radius


Classical spectral radius: $\rho(A) = |\lambda_{\max}|$
Joint spectral radius of a set, $S = \{A_1, \dots, A_n\}$, of
matrices:
$\rho(S) = \lim_{t \to \infty} \max_{B_k \in S,\ 0 < k < t+1} \rho\left(\prod_{k} B_k\right)^{1/t}$
Hypothesis growth is polynomial iff $\rho(S) \le 1$
Whether $\rho(S) \le 1$ is undecidable for real or rational
matrices (Tsitsiklis and Blondel, 2000)
If S consists of 0-1 matrices, decidable but NP hard.
Distinguishability of models


Given two “models”, how distinguishable are
they?
Example: How different are these two models?
Model 1:
p(a|1) = 0.8 , p(c|1) = 0.2
p(b|2) = 1
p(a|3) = 0.8, p(c|3) = 0.2
Model 2:
p(a|1) = 0.9 , p(d|1) = 0.1
p(b|2) = 1
p(a|3) = 0.8, p(c|3) = 0.2
[Diagrams: the two three-state models (states 1, 2, 3) with their transition probabilities.]
Distinguishability of models
The goal is to answer questions such as: “Do we
need to build more refined models or do we
need to add additional sensors/data sources or
improve tracking/hypothesis management?”
Different degrees of distinguishability between
models given their sensing capabilities: 1
Red: Prob of deciding model 2 given model 1
Blue: Prob of deciding model 1 given model 2
The entropies of the two ergodic models are different.
Decision rule is based on ML as determined by
the Viterbi algorithm.
The Shannon-McMillan-Breiman Ergodic Theorem
states that “most” observation sequences
are “typical” and have probability
related to the entropy.
Different degrees of distinguishability between
models given their sensing capabilities: 2
However, nonmonotonic behaviors are possible
(in general) and without convergence to zero
(if the entropies are the same)
Different degrees of distinguishability between
models given their sensing capabilities: 3
However, nonmonotonic behaviors are possible
(in general) and without convergence to zero
(if the entropies are the same)
Unifilar models
Definition: for any pair of state s_i and
input y_j, there is at most one
successor state.
One state sequence, one observation seq.
One observation seq., at most 1 state seq.
If acceptable, there is 1 state seq.
If unacceptable, there is 0 state seq.
A WM can be reduced to a DFA.
Every DFA has a unique minimum state
unifilar WM:
WM->DFA->Minimization->WM
Example: states T1, T2, T3 with emissions {0}, {1}, {1}
$A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix}, \quad \pi_0 = \begin{pmatrix} 1 & 1 & 0 \end{pmatrix}$
For a unifilar WM, counting acceptable strings
with length n, for n sufficiently large:
$|L(M) \cap \Sigma^n| = \pi_0 A^{n-1} \mathbf{1} \sim \lambda_1^{n-1}$
where $\lambda_1$ is the maximum eigenvalue of A.
Y. Sheng thesis, efficient estimates of $\lambda_1$
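The counting formula can be checked numerically on the three-state example. The sketch below assumes the reading A = [1 1 0; 0 0 1; 1 1 0], emissions T1 = {0}, T2 = {1}, T3 = {1}, and start vector (1, 1, 0), which is a reconstruction of the matrices on the slide. It compares the matrix count against brute-force enumeration of accepted strings.

```python
from itertools import product

# Reconstructed example (an assumption about the slide's matrices).
A = [[1, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]
EMIT = ["0", "1", "1"]   # symbol emitted by T1, T2, T3
PI0 = [1, 1, 0]          # allowed start states


def count_by_matrix(n):
    """Number of length-n state paths: pi0 * A^(n-1) * 1."""
    v = [1, 1, 1]
    for _ in range(n - 1):
        v = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
    return sum(p * x for p, x in zip(PI0, v))


def accepts(s):
    """State marking: is s an acceptable observation string?"""
    marked = {i for i in range(3) if PI0[i] and EMIT[i] == s[0]}
    for z in s[1:]:
        marked = {j for i in marked for j in range(3) if A[i][j] and EMIT[j] == z}
        if not marked:
            return False
    return bool(marked)


def count_by_enumeration(n):
    return sum(accepts("".join(bits)) for bits in product("01", repeat=n))


for n in range(1, 9):
    print(n, count_by_matrix(n), count_by_enumeration(n))
print(count_by_matrix(20) / count_by_matrix(19))
```

Because the model is unifilar, each acceptable string has exactly one state path, so the two counts agree. The ratio of successive counts approaches the largest eigenvalue of this A, which works out to (1 + sqrt(5)) / 2.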
Summary
Multiple process detection is a ubiquitous problem
with many applications but it has not been
systematically studied.
Existing approaches are either very ad hoc, very
specialized or very unscalable.
There is a promising generic software system for
solving multiple process detection.
The theory is rich and largely unexplored.
Questions
See www.pqsnet.net for papers.