Dynamic Social Network Analysis

Mining for Social Processes in
Intelligence Data Streams
Robert Savell, Ph.D.
SBP '08
April 1, 2008
Overview:
1. Introduction: Process Based SNA.
2. Process Detection and the Process Query System (PQS).
3. Experiment: The Alibaba Data Set.
4. Results.
5. Conclusion.
Traditional SNA and DSNA are reductionist:
Project Rich Data Sets onto:
• Graph representations
Define Structure and Properties:
• Directed and undirected
• Reachability and connectedness
• Centrality and prestige
• Sub groups - clustering
Analyze Role and Position:
• Structural Equivalence
• Block Models
• Network Level Equivalence
Real World DSNA: Complex Systems on Networks.
Real World social networks are composed of dynamic multimodal systems whose
attendant processes and interactions both determine and are determined by the
network topology.
Process Based Social Network Analysis:
Problem:
1. Identify and track active
process threads in
transactional datasets.
2. Identify supporting control
and communication processes
in the social network.
3. Establish structural roles of
agents.
4. Define active individual or
group processes and track the
state of these processes.
Note: Paradoxically, added complexity can sometimes simplify the analysis.
Methodology:
Process Detection and Tracking
[Figure: layers of the process detection problem, from hidden processes up to the observed stream]
• Underlying (hidden) state spaces: Process 1 (state labels f, g; a, c; a, b; c, d; e) through Process n (state labels f, c; c, d; h).
• Observations related to state sequences: abcdabbada and cfhccgdg
• Observations are interleaved: a b c c f h d cc a b g d b a g d a
• Observations missed, noise added, unlabelled (this is what we see): aba cfkhdcbgdbkhagda
Note: Complexity from Entanglement of Distributed Simple Processes
Discrete Source Separation:
The Process Query System (PQS):
Sensor: Upon query, produces a constrained set of recent email events from the stream.
Subscriber: Queries Sensors. Preprocesses streams. Produces attribute-rich encapsulated observations.
Trafen Engine: Partitions the observation set into tracks (evidence of underlying social processes).
Produces the maximum likelihood hypothesis (collections of tracks and inferred process descriptions).
[The current implementation is based on the MHT (Multiple Hypothesis Tracking) algorithm.]
Publisher: Formats the output of the Trafen Engine.
Please refer to www.pqsnet.net
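A toy illustration of partitioning an interleaved observation stream into tracks. This is only a sketch: it makes a single greedy assignment per observation rather than maintaining multiple hypotheses as MHT does, and the process models (`PROCESS_MODELS`), symbols, and function name are hypothetical, not part of PQS.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical process models: each maps a symbol to the set of symbols
# allowed to follow it. Illustrative only; not taken from PQS.
PROCESS_MODELS: Dict[str, Dict[str, Set[str]]] = {
    "process_1": {"a": {"b"}, "b": {"a"}},
    "process_2": {"c": {"d"}, "d": {"c"}},
}

def partition_stream(stream: str) -> Dict[str, List[Tuple[int, str]]]:
    """Greedily assign each observation to the first process that accepts it."""
    tracks: Dict[str, List[Tuple[int, str]]] = {name: [] for name in PROCESS_MODELS}
    tracks["unexplained"] = []
    for index, symbol in enumerate(stream):
        for name, model in PROCESS_MODELS.items():
            track = tracks[name]
            if not track:
                # An empty track can start on any symbol the model knows.
                accepts = symbol in model
            else:
                # Otherwise the symbol must be a legal successor of the last one.
                accepts = symbol in model.get(track[-1][1], set())
            if accepts:
                track.append((index, symbol))
                break
        else:
            # No process model explains this observation (noise).
            tracks["unexplained"].append((index, symbol))
    return tracks

tracks = partition_stream("acbdxac")
```

Each track keeps the original stream index of its observations, so the interleaving can be reconstructed. A full tracker would score competing partitions instead of committing to the first match.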
Weak Process Detection:
Task: The Alibaba Dataset (Scenario 1)

A Simulated SigInt and HumInt collection.
• Approximately 800 reports.
• 8 month plot window.
• 409 named entities.
• 98 locations.
Ground Truth: A 12-member terrorist cell connected with the Ali Baba Network
plans to “bake a cake” (build a bomb) that will be used to
blow up a water treatment facility near London. The plot takes place
from April to September of 2003.
A close-knit association of terrorists and sympathizers from other organizations
will fill the air with fake chatter and decoy plots.
Alibaba Scn 1: discover the plot.
Scenario 1:
820 reports.
409 named entities.
98 locations.
Approx. 8 months.
Alibaba Scenario 1 Ground Truth
(Lethal characters in green w/
their connected component in cyan).
Alibaba Scenario 1 Ground Truth
Ground Truth:
Leader: Imad Abdul.
Planner: Tarik Mashal.
Hacker: Ali Hakem.
Financier: Salam Seeweed.
Recruiter: Yakib Abbaz.
Security: Ramad Raed.
Demolitions: Quazi Aziz.
Demolitions: Hosni Abdel.
Associate: Phil Salwah.
Associate: Lu’ay.
Alibaba terrorist Network in green.
Background connected component in cyan.
Alibaba Scenario 1: SNA (cluster analysis)
Stationary clustering
finds some key suspects:
1-Phil Salwah
2-Abdul
3-Yakib Abbaz
4-Tarik Mashal
5-Qazi
6-Fawzan
7-Alvaka
8-Afia
9-Mazhar
10-Salam
11-Ahlima Amit
12-Wazir Bengazi
13-Raed
14-Saud Uvmyuzik
15-Mahira
Algorithm: Extract triads. Collect common neighbors. Threat score for node is
proportional to number of triads containing node. Top 15 suspects shown at right.
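A minimal sketch of the triad-count score described above. It assumes that "triads" means closed triads (triangles) in an undirected co-occurrence graph, found through the common neighbors of each edge. The function names and toy edge list are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def triad_scores(edges):
    """Count, for each node, the closed triads (triangles) that contain it."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbors[u].add(v)
            neighbors[v].add(u)
    scores = defaultdict(int)
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:
            continue
        # Each common neighbor w of the edge (u, v) closes a triad {u, v, w}.
        for w in neighbors[u] & neighbors[v]:
            if w > v:  # u < v < w, so each triad is counted exactly once
                for node in (u, v, w):
                    scores[node] += 1
    return dict(scores)

def top_suspects(edges, k=15):
    """Rank nodes by triad count, breaking ties alphabetically."""
    scores = triad_scores(edges)
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]

# Toy co-occurrence graph with hypothetical entity labels.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")]
scores = triad_scores(edges)
ranking = top_suspects(edges, k=3)
```

Nodes that appear in no triad (here `E`) receive no score and never enter the ranking.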
Alibaba S1: SNA Results
Results vs. Ground Truth
Stationary clustering:
1-Phil Salwah
2-Abdul
3-Yakib Abbaz
4-Tarik Mashal
5-Qazi
6-Fawzan
7-Alvaka
8-Afia
9-Mazhar
10-Salam
11-Ahlima Amit
12-Wazir Bengazi
13-Raed
14-Saud Uvmyuzik
15-Mahira
Ground Truth:
Leader: Imad Abdul.
Planner: Tarik Mashal.
Hacker: Ali Hakem.
Financier: Salam Seeweed.
Recruiter: Yakib Abbaz.
Security: Ramad Raed.
Demolitions: Quazi Aziz.
Demolitions: Hosni Abdel.
Associate: Phil Salwah.
Associate: Lu’ay.
----> Significant Deviations from Ground Truth
DSNA: A Process View
--- What we’d like:
Full Transactional Data
Ex. A Complete Meeting FSM
A Process View I--- What we have:
Colocation Information:
Event(d) = {date, location, named entities x 3}.
A Process View II--- Process Fragments:
A Process View III --- and singleton evidence of local state:
Some Example Target/Event Strings from Alibaba Scenario 1:
• 'Abdul tasked Yakib to recruit'.
• 'Declining invitation to meet Phil Salwah'.
• 'Discussed planning schedule'.
• 'Arranged for meeting next week'.
• 'Charity fundraiser'.
• 'Discussed payment for assisting in baking of cakes'.
• 'Informed that deception is in effect'.
• 'Discussed training arrangements for baking cake'.
• 'Attempted theft of chemicals'.
• 'Casing Portsmouth Facility'.
Stages of Process Detection (1):
Track Individual Entities.
A. Remove Broadcasts. (Minimal
information content).
1. Infer a home location for entities, and
track individual trajectories.
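A rough sketch of this stage over colocation events. The slides do not state how broadcasts are recognized or how a home location is chosen, so both rules below are placeholders: the caller supplies a broadcast test, and "home" is assumed to be the most frequently visited location. The `Event` class and all labels are hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Event:
    day: int                    # day index within the collection window
    location: str
    entities: Tuple[str, ...]   # named entities reported together

def track_entities(events, is_broadcast):
    """Drop broadcast events, then build a time-ordered trajectory per entity."""
    trajectories = defaultdict(list)
    for event in sorted(events, key=lambda e: e.day):
        if is_broadcast(event):
            continue
        for name in event.entities:
            trajectories[name].append((event.day, event.location))
    return dict(trajectories)

def infer_home(trajectory):
    """Assumed heuristic: home is the most frequently visited location."""
    counts = Counter(location for _, location in trajectory)
    return counts.most_common(1)[0][0]

# Toy data with hypothetical entity and location labels.
events = [
    Event(1, "loc_A", ("X", "Y", "Z")),
    Event(4, "loc_B", ("X", "Y", "W")),
    Event(8, "loc_A", ("X", "W", "Z")),
    Event(9, "loc_A", ("X", "Y", "Z", "W", "V")),  # treated as a broadcast below
]
# Placeholder broadcast test: more than three named entities in one event.
trajectories = track_entities(events, is_broadcast=lambda e: len(e.entities) > 3)
homes = {name: infer_home(path) for name, path in trajectories.items()}
```

Entities that appear only in broadcast events (here `V`) get no trajectory, which matches the idea that broadcasts carry minimal information.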
Stages of Process Detection (2):
Track Group Coordination Processes.
2. Aggregate trajectories according to the
group synchronization FSM.
Weak Process Methodology (Stage 1):
Given: A Constrained Alibaba corpus--- colocation event tuples:
The Problem: Make Group and Subgroup coordination and broadcast
process assignments (partition the event space):
Define a quality measure for the partition:
Results: Alibaba Network Discovery.
Ground Truth:
1 Leader: Imad Abdul.
2 Planner: Tarik Mashal.
3 Hacker: Ali Hakem.
4 Financier: Salam Seeweed.
5 Recruiter: Yakib Abbaz.
6 Security: Ramad Raed.
7 Associate: Phil Salwah.
8 Demolitions: Quazi Aziz.
9 Demolitions: Hosni Abdel.
9 ******: Omar.
10 Recruitee: Fawzan.
11 Decoy: Ahmet, Ali,…
12 Associate: Lu’ay.
12 ******: Sinan.
Ali Baba Network Discovery 2:
Result: The technique successfully assigns significant hierarchical relationships across the net.
Ali Baba Cell Process Signature:
[Figure annotations: Downstream Control; Coordination w/ Top 4 Suspects; Peak Logistical Preparation; Initiation of plot; Peak planning period]
Ali Baba Subgroup Signatures:
Downstream Control
Ali Baba Top 4 Suspects:
High Level Coordination
Early and persistent meeting events.
Decoy Plot (Ship or Port):
Also lacks upstream coordination.
Sparse early structure. No meetings.
Ali Baba Role Differentiation I:
Downstream Control
Predominance of
Home Events
Leader: Imad Abdul
Ali Baba Role Differentiation II:
Downstream Control
Operational Independence
(from Abdul)
Balanced Travel and
Home Events
Planner: Tarik Mashal
Ali Baba Role Differentiation III:
Few Downstream Events
(not a subgroup leader)
Close Interaction w/
Imad Abdul
Predominantly
Home Events
w/ Abdul
Financier: Salam Seeweed
Stages of Process Detection (4):
Track an Evolving Threat.
Note: too few training examples to do this systematically. But...
Some potential Keyword Mappings:
Requisite processes:
1-personnel
2-skill
3-leadership
4-train
5-finance
6-material
7-transport
8-house
9-stealth
10-recon
11-action
[Figure: keywords grouped under the requisite processes. Keywords shown:]
Recommend, Recruit, Invitation, Propose, Arrange, Meeting, Plan, Assign,
Reprimand, Report, Terminate, Assassinate, Disengage, Train, Finance, Request,
Exit, Return, Trip, Price, Payment, Material, Smuggle, Break and Enter,
Housing, Deception, Case, Task, Skill, Target, Sleep, Activate, Attack
Detecting an aliased plot:
Cake Plot vs. Water Plot Sync Events
[Figure panels: Cake Plot; Water Plot]
Activity profiles of suspects associated with each plot.
Similar but not obviously related.
Aliased plot detection:
Cake vs Water Threat Spectra
Legend:
1-personnel
2-skill
3-leadership
4-train
5-finance
6-material
7-transport
8-house
9-stealth
10-recon
11-action
AND THE WINNER IS…
Cake: blue
Water: red
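One way a threat spectrum over the eleven requisite processes might be computed and compared. This is a sketch under assumptions: the keyword-to-process table below is hypothetical (the slide's actual mapping is not reproduced), keyword matching is plain substring search, cosine similarity is a swapped-in comparison measure, and the grouping of example strings into `profile_a` and `profile_b` is arbitrary.

```python
import math
from collections import Counter

PROCESSES = ["personnel", "skill", "leadership", "train", "finance", "material",
             "transport", "house", "stealth", "recon", "action"]

# Hypothetical keyword-to-process mapping, for illustration only.
KEYWORD_TO_PROCESS = {
    "recruit": "personnel",
    "training": "train",
    "payment": "finance",
    "theft": "material",
    "casing": "recon",
}

def threat_spectrum(event_strings):
    """Normalized histogram of process categories hit by keyword matches."""
    counts = Counter()
    for text in event_strings:
        lowered = text.lower()
        for keyword, process in KEYWORD_TO_PROCESS.items():
            if keyword in lowered:
                counts[process] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in PROCESSES]

def cosine_similarity(a, b):
    """Cosine of the angle between two spectra (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Arbitrary groupings of example event strings; not the actual plot assignments.
profile_a = ["Discussed payment for assisting in baking of cakes",
             "Discussed training arrangements for baking cake"]
profile_b = ["Discussed payment for assisting in baking of cakes",
             "Casing Portsmouth Facility"]
spectrum_a = threat_spectrum(profile_a)
spectrum_b = threat_spectrum(profile_b)
similarity = cosine_similarity(spectrum_a, spectrum_b)
```

Two plots that alias one another would be expected to show similar spectra even when their surface vocabulary differs.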
Conclusion:
• Process analysis provides a generic framework for the identification,
tracking, and categorization of social organisms.
• Excellent results so far from process based techniques, even
on restricted attribute sets.
• Just the beginning of exploration of this methodology within the
complex systems framework.
Questions?
Thanks To:
George Cybenko: Postdoctoral advisor.
Gary Kuhn: IC advisor.
and the Process Query Systems Group.
For further examples of PQS applications please visit: www.pqsnet.net.
Extra Slides:
A hostile network as an autopoietic system (collection of processes):
Sustaining Processes (a partial list):
• Structural coherence:
    - planning. leadership. synchronization.
    - Differentiation (from environment) -- deception, active defense.
    - obsolescence and termination.
• Metabolism:
    - financial and material support.
    - transportation and housing.
• Sustainability/Reproduction:
    - recruitment. reward. indoctrination.
    - fission. merger.
• Responsiveness (environmental interaction):
    - plot generation. planning. execution.
    - adaptive strategies. replanning.

DSSP Step I --- Partition the event space:
1) Entity Tracking and Stream Aggregation:
   i. Isolate low entropy events such as broadcasts via process signature (broadcast FSM).
   ii. Track spatio- and socio-temporal trajectories of individuals.
   iii. Identify trajectory collisions (co-occurrences).
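A small sketch of finding trajectory collisions. It assumes a collision means two entities recorded at the same location on the same day. The function name and the toy trajectories are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

def trajectory_collisions(trajectories):
    """Count, for each pair of entities, how often their trajectories coincide.

    trajectories: dict mapping entity name -> list of (day, location).
    Returns a Counter keyed by alphabetically ordered (name, name) pairs.
    """
    present = defaultdict(set)  # (day, location) -> entities seen there
    for name, path in trajectories.items():
        for day, location in path:
            present[(day, location)].add(name)
    collisions = Counter()
    for names in present.values():
        for pair in combinations(sorted(names), 2):
            collisions[pair] += 1
    return collisions

# Toy trajectories with hypothetical labels.
trajectories = {
    "X": [(1, "loc_A"), (2, "loc_B")],
    "Y": [(1, "loc_A"), (2, "loc_B")],
    "Z": [(1, "loc_A"), (3, "loc_C")],
}
collisions = trajectory_collisions(trajectories)
```

Pairs with high collision counts are candidates for aggregation into the same coordinated subgroup.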
DSSP Step Ia --- Derive network hierarchy from partition structure:
1) Identify coordinated entities (subgroups):
   i. Aggregate entities into structurally coherent units.
   ii. Establish hierarchical relationship of units.
   iii. Identify primary communication channels between units.
DSSP Step 2 --- State Assignments:
1) Identify Process Signatures:
   i. Assign event states in Coordination FSM using hierarchical context defined in Step I.
   ii. Distribution of event states defines the weak process signature for the individual and subgroup.
   iii. Qualitatively (for now) assess individual roles via analysis of synch process event distributions.
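A sketch of how a weak process signature could be tabulated for one individual. The state labels used here (home or travel, combined with upstream, downstream, or peer) are assumptions suggested by the role slides, not the actual Coordination FSM. The `rank` and `home` tables stand in for the hierarchy and home locations derived in Step I, and every name below is hypothetical.

```python
from collections import Counter

def event_state(entity, event, home, rank):
    """Label one event from the entity's point of view.

    A lower rank number means a higher position in the hierarchy (assumption).
    """
    place = "home" if event["location"] == home[entity] else "travel"
    others = [e for e in event["entities"] if e != entity]
    if any(rank[o] < rank[entity] for o in others):
        direction = "upstream"      # a more senior entity is present
    elif any(rank[o] > rank[entity] for o in others):
        direction = "downstream"    # only peers and subordinates are present
    else:
        direction = "peer"
    return (place, direction)

def process_signature(entity, events, home, rank):
    """Normalized distribution of event states over the entity's events."""
    states = Counter(event_state(entity, ev, home, rank)
                     for ev in events if entity in ev["entities"])
    total = sum(states.values())
    return {state: count / total for state, count in states.items()}

# Toy inputs with hypothetical labels.
home = {"L": "loc_A", "P": "loc_B", "F": "loc_A"}
rank = {"L": 0, "P": 1, "F": 1}
events = [
    {"location": "loc_A", "entities": ("L", "P")},
    {"location": "loc_A", "entities": ("L", "F")},
    {"location": "loc_B", "entities": ("L", "P")},
    {"location": "loc_B", "entities": ("P", "F")},
]
signature_l = process_signature("L", events, home, rank)
signature_p = process_signature("P", events, home, rank)
```

In this toy case `L` shows mostly home events with downstream contacts, the kind of profile the role slides associate with a leader.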