Patterns and Behaviors in Complex Systems
James M. Brase
Deputy Associate Director, Computation
Lawrence Livermore National Laboratory
LLNL-PRES-671957
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract
DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
9-8-10-SSCI
Document analysis – Is this document relevant to topic Y? Topics are defined as distributions of terms, phrases, phrase graphs ….
Cybersecurity – How many network connections do we expect node A to make in the next minute?
Materials science – Discovery of patterns in component material attributes and critical reaction parameters to produce custom-designed properties
Adaptive mesh simulation – Will this simulation parameter set cause the mesh to tangle?
Image and multimedia analysis – Can we label the objects in this image? Can we find other, similar videos?
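The document-analysis question above treats a topic as a distribution of terms. A minimal sketch of that idea follows, assuming a topic is a normalized term-frequency distribution and relevance is scored by cosine similarity. Both choices, the function names, and the example texts are illustrative assumptions, not the method used in this work.

```python
import math
from collections import Counter


def term_distribution(text):
    """Return a normalized term-frequency distribution for a text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def relevance(doc, topic):
    """Cosine similarity between two term distributions (0 means no shared terms)."""
    dot = sum(p * topic.get(term, 0.0) for term, p in doc.items())
    norm_doc = math.sqrt(sum(p * p for p in doc.values()))
    norm_topic = math.sqrt(sum(p * p for p in topic.values()))
    if norm_doc == 0.0 or norm_topic == 0.0:
        return 0.0
    return dot / (norm_doc * norm_topic)


# Hypothetical topic definition and two hypothetical documents.
topic = term_distribution("refugees crossing border camps displaced families border")
on_topic = term_distribution("displaced families reached the border camps")
off_topic = term_distribution("mesh tangling in adaptive simulation codes")

print(relevance(on_topic, topic), relevance(off_topic, topic))
```

A single-term distribution ignores phrases and phrase graphs, which the slide also lists as topic components; extending the keys from terms to phrases would follow the same pattern.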
[Figure: Training: a training set of feature vectors with labels l (+1 or -1) is used to learn a mapping $\hat{f}$. Applying: $\hat{f}$ maps the feature vector of new data to a predicted label, $\hat{l} = \hat{f}(\text{feature vector})$.]
Supervised learning – Mapping feature vectors to labels
• Discrete labels – classifiers
• Continuous labels – regression
• Function mapping
• Logistic regression
• Random forests
• Neural networks
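As an illustration of mapping feature vectors to discrete labels, here is a minimal sketch of logistic regression trained by batch gradient descent, using the +1 / -1 label convention. The toy data, learning rate, epoch count, and function names are all hypothetical choices for this example.

```python
import math


def train_logistic(features, labels, lr=1.0, epochs=2000):
    """Fit weights and bias by gradient descent on the mean logistic loss."""
    dim = len(features[0])
    n = len(features)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Derivative of log(1 + exp(-margin)) with respect to the score.
            coeff = -y / (1.0 + math.exp(margin))
            for i in range(dim):
                grad_w[i] += coeff * x[i]
            grad_b += coeff
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Map a feature vector to a discrete label (+1 or -1)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical toy training set: two features per example.
features = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [1.0, 0.9], [0.8, 1.1], [1.2, 0.7]]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_logistic(features, labels)
print([predict(w, b, x) for x in features])
```

Replacing the thresholded output with the raw score turns the same fit into a continuous-valued predictor, which is the classifier versus regression distinction drawn in the list above.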
Unsupervised learning – Finding structure in data
• Association rules
• Clustering
• Density estimation
• Autoencoders
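To make "finding structure in data" concrete, the following is a minimal sketch of clustering with Lloyd's k-means algorithm. No labels are used. The deterministic initialization, the toy points, and the function name are assumptions made for this example.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate an assignment step and a centroid update."""
    # Deterministic initialization (an assumption for reproducibility):
    # take k points spread evenly over the input order.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        for idx, p in enumerate(points):
            dists = [sum((a - c) ** 2 for a, c in zip(p, cen)) for cen in centroids]
            assignment[idx] = dists.index(min(dists))
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical 2-D points forming two well-separated groups.
points = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```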
[Figure: document relevance pipeline. Box labels: New documents; Weak filtering; Entity extractor; Collocation filter; Keyphrase extractor; New document graph; Graph classifier (relevant graphs vs background graphs); Forced migration reference documents; Training graph models; Relevance score]
Relevance to forced migration reference document set
Cybersecurity uses machine learning and graph analysis to model network behavior
Collect packets, flow and process data from the full physical network
Stream processing for feature and signature extraction
Build a dynamic graph representation of activity
Machine learning on the dynamic graph
• Node and group classification algorithms
• Temporal activity models – dynamic Bayesian networks
• Anomaly detection algorithms
Applications
• Inferring node and group roles
• Prediction of activity distributions
• Cueing analysts to anomalous behaviors
• Functional network discovery and characterization
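The steps above (collect flow data, build a dynamic graph, run anomaly detection on it) can be sketched in a few lines. This is a simplified illustration under stated assumptions: flow records are hypothetical `(timestamp, src, dst)` tuples, the dynamic graph is a sequence of fixed-width time-window snapshots, and anomalies are flagged by a plain deviation threshold on out-degree, which stands in for the anomaly detection algorithms the slide refers to.

```python
from collections import defaultdict
from statistics import mean, pstdev


def windowed_graphs(flows, window):
    """Group (timestamp, src, dst) records into per-window edge sets.

    Windows that contain no flows produce no snapshot.
    """
    graphs = defaultdict(set)
    for timestamp, src, dst in flows:
        graphs[int(timestamp // window)].add((src, dst))
    return [graphs[w] for w in sorted(graphs)]


def out_degrees(edges):
    """Count distinct destinations per source in one graph snapshot."""
    degree = defaultdict(int)
    for src, _dst in edges:
        degree[src] += 1
    return degree


def anomalous_windows(snapshots, node, threshold=3.0):
    """Flag snapshots where a node's out-degree departs from its own history."""
    flagged = []
    history = []
    for index, edges in enumerate(snapshots):
        degree = out_degrees(edges).get(node, 0)
        if len(history) >= 3:  # wait for a short baseline first
            mu, sigma = mean(history), pstdev(history)
            if abs(degree - mu) > threshold * max(sigma, 1.0):
                flagged.append(index)
        history.append(degree)
    return flagged


# Hypothetical traffic: host "A" contacts 2 hosts per minute, then 20.
flows = [(w * 60 + d, "A", f"host{d}") for w in range(4) for d in range(2)]
flows += [(4 * 60 + d, "A", f"host{d}") for d in range(20)]
snapshots = windowed_graphs(flows, window=60)
print(anomalous_windows(snapshots, "A"))
```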
Dynamic IP-IP graph
Host role learning
Learning Markov models for behavior forecasting
Reduced prediction error using host roles
Host roles are local characteristics of the IP-IP graph structure, e.g. “center of star”, end node, …
Anomaly detection in host role distribution
Ryan Rossi, Brian Gallagher, Jennifer Neville, Keith Henderson. Modeling Dynamic Behavior in Large Evolving Graphs. ACM International Conference on Web Search and Data Mining (WSDM), 2013.
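As a rough illustration of learning Markov models over host roles, the sketch below estimates first-order transition probabilities from per-host role sequences and forecasts the most likely next role. It is a simplified stand-in, not the model from the cited paper; the role names `star_center` and `end_node` and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(role_sequences):
    """Estimate first-order Markov transition probabilities from role sequences."""
    counts = defaultdict(Counter)
    for sequence in role_sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        role: {nxt: n / sum(nxt_counts.values()) for nxt, n in nxt_counts.items()}
        for role, nxt_counts in counts.items()
    }


def forecast(transitions, role):
    """Return the most likely next role, or None if the role was never observed."""
    candidates = transitions.get(role)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical role histories for two hosts, one role per time step.
role_sequences = [
    ["end_node", "end_node", "star_center", "star_center", "star_center"],
    ["end_node", "end_node", "end_node", "star_center"],
]
transitions = fit_transitions(role_sequences)
print(transitions["end_node"], forecast(transitions, "star_center"))
```

A host whose observed role sequence has low probability under the fitted transitions is one simple candidate for an anomaly in the host role distribution.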
[Figure: Training: a training set of N feature vectors, each of dimension D, with labels (+1 or -1), used to learn the mapping $\hat{f}$.]
Features have traditionally been hand-engineered. Is there a principled approach to finding a good set of features?
Deep learning
We usually deal with N >> D. In emerging applications (e.g. genomics, ...) we can have N << D. Can we regularize (constrain the solutions) with mechanistic models?
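One way to read the regularization question is as a penalized fit. The formulation below is an illustrative assumption, not taken from the slides: it supposes a linear model with squared loss, a data matrix $X$ of N rows and D columns, targets $y$, weights $w$, a weight vector $w_{\mathrm{mech}}$ suggested by a mechanistic model, and a penalty strength $\lambda$.

```latex
\hat{w} \;=\; \arg\min_{w \in \mathbb{R}^{D}} \;
  \lVert X w - y \rVert_2^2 \;+\; \lambda \, \lVert w - w_{\mathrm{mech}} \rVert_2^2,
\qquad X \in \mathbb{R}^{N \times D},\; \lambda > 0

\hat{w} \;=\; \left( X^{\top} X + \lambda I \right)^{-1}
  \left( X^{\top} y + \lambda \, w_{\mathrm{mech}} \right)
```

When N << D the unpenalized least-squares problem has infinitely many minimizers. Under these assumptions the penalty makes the solution unique and pulls it toward the mechanistic prediction.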
Deep learning provides an unsupervised approach to learning feature sets from data
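A very small example of learning a feature from unlabeled data is a tied-weight linear autoencoder with a single hidden unit, sketched below. It is far simpler than a deep network and only illustrates the principle: the weights are trained to reconstruct the input, and the hidden value becomes the learned feature. The data, initial weights, learning rate, and function names are hypothetical.

```python
import math


def reconstruct(w, x):
    """Encode x to one hidden value h = w.x, then decode with the same (tied) weights."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return [wi * h for wi in w], h


def mean_loss(w, data):
    """Mean squared reconstruction error over the data set."""
    total = 0.0
    for x in data:
        x_hat, _ = reconstruct(w, x)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train(data, w, lr=0.1, epochs=200):
    """Batch gradient descent on the reconstruction error; no labels are used."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x in data:
            x_hat, h = reconstruct(w, x)
            residual = [a - b for a, b in zip(x_hat, x)]
            r_dot_w = sum(r * wi for r, wi in zip(residual, w))
            for k in range(len(w)):
                # Gradient of the squared error through both decoder and encoder.
                grad[k] += 2.0 * (residual[k] * h + r_dot_w * x[k])
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


# Hypothetical 2-D data lying along one direction, so one feature suffices.
direction = (1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0))
data = [(t / 5.0 * direction[0], t / 5.0 * direction[1]) for t in range(-5, 6)]

initial_w = [0.5, 0.1]
learned_w = train(data, initial_w)
print(mean_loss(initial_w, data), mean_loss(learned_w, data))
```

Stacking several such layers with nonlinearities between them is the usual route from this toy to the deep networks discussed here.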
100B synapse deep learning networks
Learning patterns in 100M random images from Flickr
[Figure panels: “Airplanes” neuron; “Fireworks” neuron; “Images with text” neuron]
• Discovering complex patterns in massive multisource intelligence data sets guided by science-based models – not exact keywords
• Image recognition performance now surpasses human accuracy
• Partnership with Stanford and UC Berkeley on algorithms, NVIDIA on large GPU implementations, and IBM on neurosynaptic architectures
Data movement is the limiting factor for analytics – supplementing the memory hierarchy
Partnership with Intel and Cray to develop a 150 TF/s data analytics computer
Technical focus on NVRAM layers in the memory hierarchy supporting a 24-core node – prototyping analytics in a new environment
Initial applications will focus on:
• Prototyping exascale simulation analysis architectures
• Bioinformatics algorithms
• Graph analytics
Over 5GB DRAM & 36GB NVRAM per core