Application of graph analytics and machine learning


Application of Machine Learning

Patterns and Behaviors in Complex Systems

James M. Brase

Deputy Associate Director, Computation

Lawrence Livermore National Laboratory

LLNL-PRES-671957

This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

9-8-10-SSCI

Machine learning is applied to a broad set of applications at LLNL

Document analysis – Is this document relevant to topic Y? Topics are defined as distributions of terms, phrases, phrase graphs ….

Cybersecurity – How many network connections do we expect node A to make in the next minute?

Materials science – Discovery of patterns in component material attributes and critical reaction parameters to produce custom-designed properties

Adaptive mesh simulation – Will this simulation parameter set cause the mesh to tangle?

Image and multimedia analysis – Can we label the objects in this image? Can we find other, similar videos?


Machine learning – statistical inference of patterns in data

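[Figure: Training. Training data are converted into a training set of feature vectors with labels $l \in \{1, -1\}$, from which a function $\hat{f}$ is learned. Applying. New data are converted into a feature vector, and the learned function predicts its label: $\hat{l} = \hat{f}(\cdot)$.]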

Supervised learning –

Mapping feature vectors to labels

• Discrete labels – classifiers

• Continuous labels – regression

• Function mapping

• Logistic regression

• Random forests

• Neural networks

Unsupervised learning –

Finding structure in data

• Association rules

• Clustering

• Density estimation

• Autoencoders
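As an illustration of the supervised case (mapping feature vectors to discrete labels in {1, -1}), the following is a minimal sketch of logistic regression trained by gradient descent. It is not code from the presentation: the function names `train_logistic` and `predict`, the learning rate, the epoch count, and the toy data are all assumptions chosen for the example.

```python
import math

def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit weights w and bias b by gradient descent on the logistic loss.

    X: list of feature vectors (lists of floats).
    y: list of labels, each +1 or -1.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Derivative of log(1 + exp(-margin)) with respect to the score.
            coeff = -yi / (1.0 + math.exp(margin))
            for j in range(d):
                grad_w[j] += coeff * xi[j]
            grad_b += coeff
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Return the predicted label (+1 or -1) for feature vector x."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical toy training set: two features, labels in {1, -1}.
X_train = [[2.0, 1.0], [1.5, 2.0], [3.0, 2.5],
           [-1.0, -1.5], [-2.0, -0.5], [-1.5, -2.0]]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_logistic(X_train, y_train)
new_label = predict(w, b, [2.5, 2.0])  # label for a new feature vector
```

The same training and applying split appears here as in the slide: `train_logistic` learns the function from labeled feature vectors, and `predict` applies it to a new feature vector.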


Learning language models for estimating document relevance

New document graph

Entity extractor

New documents

Weak filtering

Collocation filter

Keyphrase extractor

Graph classifier

Relevant graphs vs background graphs

Forced migration reference documents

Training graph models

Relevance score
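The idea of scoring a new document against a reference set versus a background set can be sketched with a much simpler stand-in than the graph classifier in the diagram. The example below uses smoothed unigram term distributions and an average log-likelihood ratio; it does not implement entity extraction, collocation filtering, keyphrase extraction, or phrase graphs. The function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter

def term_distribution(docs, vocab, alpha=1.0):
    """Estimate a smoothed term distribution over vocab from a list of documents."""
    counts = Counter(t for d in docs for t in d.lower().split() if t in vocab)
    total = sum(counts.values()) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}

def relevance_score(doc, topic, background):
    """Average log-likelihood ratio of the document's terms, topic vs background.

    Positive scores mean the terms are more typical of the topic model.
    Terms outside the vocabulary are ignored.
    """
    tokens = [t for t in doc.lower().split() if t in topic]
    if not tokens:
        return 0.0
    return sum(math.log(topic[t] / background[t]) for t in tokens) / len(tokens)

# Hypothetical reference (relevant) and background documents.
reference_docs = ["families fled across the border as refugees",
                  "refugees displaced by conflict crossed the border"]
background_docs = ["the team won the match in the final minute",
                   "markets rose as the bank cut rates"]

vocab = {t for d in reference_docs + background_docs for t in d.lower().split()}
topic_model = term_distribution(reference_docs, vocab)
background_model = term_distribution(background_docs, vocab)

score = relevance_score("refugees crossed the border", topic_model, background_model)
```

A document whose terms are more frequent in the reference set than in the background set receives a positive score, which mirrors the "relevant vs background" comparison in the pipeline.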


Document relevance for the NYT corpus

Relevance to forced migration reference document set


Cybersecurity uses machine learning and graph analysis to model network behavior

Collect packets, flow and process data from the full physical network

Stream processing for feature and signature extraction

Build a dynamic graph representation of activity

Machine learning on the dynamic graph

Node and group classification algorithms

Temporal activity models – dynamic Bayesian networks

Anomaly detection algorithms

Applications

Inferring node and group roles

Prediction of activity distributions

Cueing analysts to anomalous behaviors

Functional network discovery and characterization
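The steps on this slide (collect flow records, build a dynamic graph, then model activity on it) can be sketched roughly as follows. This is an assumption-laden illustration, not the LLNL system: flow records are taken to be hypothetical `(timestamp, src, dst)` tuples, the dynamic graph is a sequence of fixed-width time-window snapshots, and the anomaly cue is a plain z-score on a node's outgoing connection count.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_snapshots(flows, window=60):
    """Group (timestamp, src, dst) records into per-window sets of directed edges."""
    snapshots = defaultdict(set)
    for ts, src, dst in flows:
        snapshots[int(ts // window)].add((src, dst))
    return dict(snapshots)

def out_degree_series(snapshots, node):
    """Distinct outgoing connections of node in each window, including empty windows."""
    if not snapshots:
        return []
    first, last = min(snapshots), max(snapshots)
    return [sum(1 for s, _ in snapshots.get(w, ()) if s == node)
            for w in range(first, last + 1)]

def is_anomalous(history, current, threshold=3.0):
    """Flag current if it lies more than threshold standard deviations from the history mean."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) / sigma > threshold

# Hypothetical flows: node "A" talks to one or two hosts per minute, then to ten.
flows = [(0, "A", "B"), (10, "A", "C"),
         (65, "A", "B"), (70, "A", "B"),
         (130, "A", "B"), (131, "A", "C")]
flows += [(180 + i, "A", f"H{i}") for i in range(10)]

series = out_degree_series(build_snapshots(flows), "A")
alert = is_anomalous(series[:-1], series[-1])
```

Here the history of per-minute connection counts plays the role of the expected activity for node A, and a large deviation in the latest window is what would be cued to an analyst.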


Dynamic IP-IP graph

Host role learning

Learning Markov models for behavior forecasting

Reduced prediction error using host roles

Host roles are local characteristics of the IP-IP graph structure, e.g. “center of star”, end node, …

Anomaly Detection in host role distribution

Ryan Rossi, Brian Gallagher, Jennifer Neville, Keith Henderson. Modeling Dynamic Behavior in Large Evolving Graphs. ACM International Conference on Web Search and Data Mining (WSDM), 2013.
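To make the idea of a Markov model over host roles concrete, here is a minimal sketch of a first-order Markov chain estimated from observed role sequences. It is not the method of the cited paper: the role names `star_center` and `end_node`, the function names, and the sample sequences are hypothetical, and role learning itself is assumed to have already produced one role per host per time step.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Estimate P(next role | current role) from lists of per-time-step roles."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
            for cur, ctr in counts.items()}

def forecast(transitions, current):
    """Return the most probable next role, or None if current was never observed."""
    dist = transitions.get(current)
    if not dist:
        return None
    return max(dist, key=dist.get)

# Hypothetical role histories for two hosts.
role_sequences = [
    ["end_node", "end_node", "star_center", "star_center", "star_center"],
    ["end_node", "star_center", "star_center", "end_node"],
]

transitions = fit_transitions(role_sequences)
next_role = forecast(transitions, "end_node")
```

A forecast that disagrees sharply with the role a host actually takes in the next time step is one simple way to connect such a model to anomaly detection in the host role distribution.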


Some R&D directions in machine learning

[Figure: Training. Training data are converted into a training set of N feature vectors of dimension D with labels in {1, -1}, from which $\hat{f}(\cdot)$ is learned.]

Features have traditionally been hand engineered. Is there a principled approach to finding a good set of features?

→ Deep learning

We usually deal with N >> D. In emerging applications (e.g. genomics, ...) we can have N << D. Can we regularize (constrain the solutions) with mechanistic models?
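Why regularization matters when N << D can be seen in a small worked example. With fewer samples than features, infinitely many weight vectors fit the training data exactly, and a penalty term selects one of them. The sketch below uses a generic L2 (ridge) penalty as a stand-in; it does not encode a mechanistic model, and the function name `ridge_fit`, the penalty weight, and the data are assumptions.

```python
def ridge_fit(X, y, lam=0.1, lr=0.5, iters=200):
    """Minimize (1/(2N)) * sum((x_i . w - y_i)^2) + (lam/2) * ||w||^2 by gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        residuals = [sum(wj * xj for wj, xj in zip(w, xi)) - yi
                     for xi, yi in zip(X, y)]
        grad = [sum(r * xi[j] for r, xi in zip(residuals, X)) / n + lam * w[j]
                for j in range(d)]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w

# Hypothetical data with N = 2 samples and D = 4 features (N < D).
X = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
y = [2.0, 4.0]

w = ridge_fit(X, y, lam=0.1)
```

For this data the penalized problem has a unique minimizer, with the first and third weights equal to 1/(1 + lam) and the second and fourth equal to 2/(1 + lam). Without the penalty, any weights with w[0] + w[2] = 2 and w[1] + w[3] = 4 would fit equally well. A mechanistic model would replace the generic penalty with constraints derived from domain knowledge.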


Deep learning provides an unsupervised approach to learning feature sets from data


Deep machine learning research is extending pattern recognition and discovery beyond human capabilities

100B synapse deep learning networks

Learning patterns in 100M random images from Flickr

[Figure panels: “Airplanes” neuron, “Fireworks” neuron, “Images with text” neuron]

• Discovering complex patterns in massive multisource intelligence data sets guided by science-based models – not exact keywords

• Image recognition performance now surpasses human accuracy

• Partnership with Stanford and UC Berkeley on algorithms, NVIDIA on large GPU implementations, and IBM on neurosynaptic architectures


Data movement is the limiting factor for analytics – supplementing the memory hierarchy

Partnership with Intel and Cray to develop a 150 TF/s data analytics computer

Technical focus on NVRAM layers in the memory hierarchy supporting a 24-core node – prototyping analytics in the new environment

Initial applications will focus on

• Prototyping exascale simulation analysis architectures

• Bioinformatics algorithms

• Graph analytics


Over 5GB DRAM & 36GB NVRAM per core
