SAS Visual Analytics Overview

ANALYTICS IN BIG DATA ERA
ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY,
DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNTS OF DATA
MAURIZIO SALUSTI, SAS
Copyright © 2012, SAS Institute Inc. All rights reserved.
NEW QUESTIONS WITH BIG DATA
• Data are not always held in a structured data model
• Often we need to join data that do not share the same keys
• Often data arrive as a periodic flow in real time
• Often we need to recognize patterns in data that change frequently
New ways are needed to manage data that are distributed and not structured in the classical way:
We need a different paradigm to organize data and, above all, to query them.
Collecting several sources and managing them opens several new problems:
• Relational data (GRAPH DATA) can be useful to understand how events spread in a population.
• Data in motion coming from several tools in the field (sensor devices) provide dynamic patterns, often without a history of their form
Copyright © 2013, SAS Institute Inc. All rights reserved.
ANALYSIS
• You cannot always apply sampling to extract data
• You cannot always join data to define an ABT (analytical base table)
• Often you need to know how the environment can influence changes in events.
• Often we need to merge information collected in different time windows.
• SQL queries are often useless for reaching these data:
• Information is not organized into DB structures
• Data provide information in very different ways: e.g. text is not easy to query using traditional query languages.
• Merging is driven by fuzzy keys, where you can assign group information according to statistical relationships.
• Events can be driven by relationships with other data rather than by specific behavior.
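The point about fuzzy keys can be illustrated with a minimal sketch. It assumes plain string similarity from Python's standard `difflib` module as the matching criterion; the function names, the 0.8 threshold, and the sample records are hypothetical and are not taken from the slides, which refer more generally to statistical relationships.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two keys, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def fuzzy_join(left, right, threshold=0.8):
    """Pair each left record with its most similar right record.

    left and right are lists of (key, value) tuples. A pair is kept only
    when the best similarity reaches the threshold (an assumed cut-off).
    """
    matches = []
    for lkey, lval in left:
        best = max(right, key=lambda r: similarity(lkey, r[0]), default=None)
        if best is not None and similarity(lkey, best[0]) >= threshold:
            matches.append((lkey, best[0], lval, best[1]))
    return matches


# Hypothetical records: the same customers spelled differently in two sources.
crm = [("Acme Corp.", 120), ("Globex Inc", 75), ("Initech", 40)]
billing = [("ACME Corp", "IT"), ("Globex Inc.", "US"), ("Umbrella", "UK")]

for row in fuzzy_join(crm, billing):
    print(row)
```

An exact SQL join on these keys would return no rows, while the similarity-based join pairs the two customers whose names differ only in case and punctuation and leaves the unmatched record out.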
SAS PROCEDURES
BIG DATA ALSO REQUIRES SEVERAL METHODOLOGICAL STRATEGIES:
• Methods for pattern recognition coming from statistical inference, using the SEMMA paradigm for supervised and unsupervised data patterns.
• Others coming from stochastic process analysis, for both continuous time and discrete events, such as diffusion processes or Markov chains.
• Time series forecasting: stochastic processes in continuous time with continuous space
• Multivariate analysis applied to semantic rules to discover text patterns
• Graph analysis
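As an illustration of the Markov chain item above, the following minimal sketch estimates first-order transition probabilities from an observed sequence of discrete states. The function name and the sample log are hypothetical, and the sketch uses plain Python rather than a SAS procedure.

```python
from collections import Counter, defaultdict


def estimate_transitions(states):
    """Estimate first-order Markov transition probabilities from a state sequence."""
    counts = defaultdict(Counter)
    # Count every observed transition current -> next.
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into probabilities.
    return {
        s: {t: n / sum(c.values()) for t, n in c.items()}
        for s, c in counts.items()
    }


# Hypothetical machine states logged at regular intervals.
log = ["ok", "ok", "warn", "ok", "ok", "warn", "fail", "ok"]
P = estimate_transitions(log)

for state, row in P.items():
    print(state, row)
```

Each row of the result is the empirical distribution of the next state given the current one, which is the basic ingredient for describing how discrete events evolve over time.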
ANALYTICAL CATEGORIES AND TARGET USAGE
Statistics
• Binary target & continuous number predictions
• Linear, nonlinear, & mixed linear modeling
Data Mining
• Complex relationships
• Tree-based classification
• Variable selection
Text Mining
• Parsing large-scale text collections
• Extract entities
• Automatic stemming & synonym detection
Forecasting
• Large-scale, multiple hierarchy problems
Optimization
• Local search optimization
• Large-scale linear & mixed integer problems
• Graph theory
Econometrics
• Probability of events
• Severity of random events
Data coming from different sources can be tied together using different methods, such as linear or nonlinear canonical decomposition.
Pattern variability in data in motion, such as data coming from devices, can be sampled, or the pattern distribution can be simulated.
Sparse vector data with missing values can be simulated using particular regression methods.
Discrete choices among different events can be defined using multinomial discrete models.
Automatic time series forecasting considers many series at the same time.
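To make the multinomial discrete choice idea concrete, here is a minimal sketch of the multinomial logit formula, one common member of that model family. The choice of this specific model, the function name, and the utility values are assumptions for illustration only; the slides do not say which multinomial model is used.

```python
import math


def choice_probabilities(utilities):
    """Multinomial logit: P(j) = exp(u_j) / sum_k exp(u_k)."""
    # Subtract the maximum utility for numerical stability; the ratios are unchanged.
    m = max(utilities.values())
    exps = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical utilities of three mutually exclusive events.
utilities = {"no_action": 0.0, "maintenance": 1.2, "shutdown": -0.5}
probs = choice_probabilities(utilities)

for event, p in probs.items():
    print(f"{event}: {p:.3f}")
```

In a fitted model the utilities would come from estimated coefficients applied to observed covariates; here they are fixed numbers so that only the probability step is shown.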
GRAPH ANALYSIS
Graph analysis can be used to:
• Measure the importance of nodes and the relationships among them.
• Measure changes over time in a network.
• Identify how events spread through the network using particular diffusion processes.
[Figure: network diagram with the labels Network, Link, and Node]
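Two of the uses listed above can be sketched on a toy network. The example assumes an undirected graph stored as an adjacency dictionary, uses degree centrality as the importance measure, and uses a breadth-first search as a deliberately simple, deterministic stand-in for a diffusion process. All names and the sample network are hypothetical.

```python
from collections import deque


def degree_centrality(graph):
    """Share of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def spread_steps(graph, source):
    """Number of hops an event starting at source needs to reach each node."""
    steps = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in steps:
                steps[nbr] = steps[node] + 1
                queue.append(nbr)
    return steps


# Hypothetical undirected network: each node maps to the set of its neighbours.
network = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B", "E"},
    "E": {"D"},
}

print(degree_centrality(network))
print(spread_steps(network, "A"))
```

A stochastic diffusion model would replace the fixed one-hop-per-step rule with transmission probabilities on each link; the traversal structure stays the same.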
Scenario
REAL TIME MONITORING SYSTEM:
• Building and managing the behavioral patterns of the measures for each type of sensor, to detect abnormal processes through alarm rules (offline process).
• Building scenarios of how events spread and influence different parts of the system
• Monitoring measures to detect anomalies and to check the validity of the rules over time (online process).
• Producing models to predict abnormalities in the medium term.
Scenario
INTEGRATED PROCESS CONTROL:
• Shewhart-type control charts, identifying the role of the history of the measures and of the trend-cycle components according to the Box-Jenkins methodology
• Multivariate analysis of processes: this is the main tool for statistical process control of measures in relation to each other, considering Markov chain or diffusion processes
• Classification of system components: the machines can be classified according to their behavior and some information about their specific characteristics
• Identifying alarm patterns: rules for diagnostic thresholds identified by the control charts to minimize false alarms, depending on the history of the event to be monitored in real time
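The control chart idea above can be sketched as follows. This is a simplified illustration: it assumes limits at the baseline mean plus or minus three sample standard deviations, and it does not model the trend-cycle components or the Box-Jenkins step mentioned in the slide. Function names and measurements are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits estimated from in-control baseline data."""
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation, a simplifying assumption
    return center - k * sigma, center, center + k * sigma


def out_of_control(values, limits):
    """Return the indexes of the values that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical sensor history (offline step) and new measurements (online step).
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
limits = control_limits(baseline)
new_values = [10.0, 10.3, 11.5, 9.9]

print(limits)
print(out_of_control(new_values, limits))
```

Widening `k` reduces false alarms at the cost of slower detection, which is the trade-off the alarm rules on the slide are meant to tune.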
ADMINISTRATION SYSTEM: EXAMPLE
System interface
Extraction rules DABT
Pattern recognition and event handling Module
Management of event process thresholds for the alert process
Measures Metadata and classification
Historical process data storage
REAL TIME MONITORING SYSTEM: EXAMPLE
Alert Rules and pattern thresholds
Module in real time check
Real time modelling.
Streaming data analysis and update of historical data.
Real time Feedback
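The loop of checking each new measure against thresholds and then updating the historical data can be sketched with an online mean and variance (Welford's algorithm). This is an assumed, simplified design rather than the system shown in the slides; the class name, the warm-up length, and the 3-sigma rule are hypothetical.

```python
import math


class StreamingMonitor:
    """Checks each new value against running limits, then updates the history."""

    def __init__(self, k=3.0, warmup=5):
        self.k = k            # width of the alert band in standard deviations
        self.warmup = warmup  # values to observe before any alert is raised
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0         # running sum of squared deviations from the mean

    def update(self, x):
        """Return True if x triggers an alert, then fold x into the running statistics."""
        alert = False
        if self.n >= self.warmup:
            sigma = math.sqrt(self.m2 / (self.n - 1))
            alert = abs(x - self.mean) > self.k * sigma
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return alert


# Hypothetical sensor stream with one abnormal reading.
stream = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 15.0, 10.0]
monitor = StreamingMonitor()
alerts = [monitor.update(x) for x in stream]
print(alerts)
```

In this sketch every value, including an abnormal one, is folded into the history, so the limits widen after an alert. A production rule would likely exclude or down-weight flagged values.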