Use SAS to Process and Analyze Your Data in Hadoop

make connections • share ideas • be inspired
Mikael Turvall
Copyright © 2014, SAS Institute Inc. All rights reserved.
Architecture
[Architecture diagram: SAS® Visual Analytics and SAS® Visual Statistics, and SAS® In-Memory Statistics for Hadoop. Labeled components: web-based client (SAS VA/VS, SAS® Studio), mid-tier, metadata server (optional), workspace server, SAS® LASR™ Analytic Server (in-memory store, blade environment), and an MPP datastore (Hadoop with the SAS Embedded Process, Teradata, Pivotal, Oracle). Hadoop distributions: Cloudera, Hortonworks. Data sources: RDBMS, nonrelational, click stream, PC files, other.]
Why?
Hadoop as a platform for data
Hadoop as the core of the next-generation analytics platform
[Cycle diagram: Identify / Formulate Problem, Data Preparation, Data Exploration, Transform & Select, Build Model, Validate Model, Deploy Model, Evaluate / Monitor Results.]
From data to decisions
MANAGE DATA
• SAS/ACCESS
• SAS Data Management
• SAS Federation Server
• SAS Data Loader for Hadoop
EXPLORE DATA
• SAS Visual Analytics
• SAS In-memory Statistics for Hadoop
DEVELOP MODELS
• SAS HPA Products
• SAS Visual Statistics
• SAS In-memory Statistics for Hadoop
• SAS Enterprise Miner
DEPLOY & MONITOR
• SAS Scoring Accelerator for Hadoop
• SAS Code Accelerator for Hadoop
Get started quickly
Capabilities
• Transparent access to Hadoop tables through ordinary SAS libraries
• You can program in SAS SQL and the SAS DATA step as usual
• You can manage Hadoop from SAS:
  • Native HDFS commands
  • MapReduce, Pig, and HiveQL
YOU CAN START TODAY
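As a hedged illustration of the "native HDFS commands" capability, the sketch below uses PROC HADOOP to create an HDFS directory and copy a local file into it. The HDFS directory, the local file, and the credentials are assumptions chosen for illustration; the configuration file path is borrowed from the FILENAME example later in this deck.

```sas
/* Assumed: a Hadoop configuration file reachable from the SAS session */
filename cfg "C:\Users\hadoop_config.xml";

proc hadoop options=cfg username="hadoop" password="hadoop" verbose;
   /* Create a directory in HDFS (hypothetical path) */
   hdfs mkdir="/user/hadoop/demo";
   /* Copy a local file into HDFS (hypothetical file names) */
   hdfs copyfromlocal="C:\temp\sales.txt"
        out="/user/hadoop/demo/sales.txt" overwrite;
run;
```

The same procedure also provides MAPREDUCE and PIG statements for submitting jobs from a SAS session.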
Benefits
• You do not need to be an expert in Hadoop-specific syntax
• Switching to Hadoop is as easy as changing a libname
• Existing SAS programs, reports, etc. can be reused
• Many different ways to access data give IT different options for making use of the capacity
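A minimal sketch of what "changing a libname" could look like in practice. The libref sales, the table orders, the column amount, and the BASE library path are hypothetical; the server name, port, and credentials reuse the values from the libname example in this deck.

```sas
/* Before: data in an ordinary SAS library (hypothetical path) */
/* libname sales base "/data/sales"; */

/* After: the same libref now points at Hive tables in Hadoop */
libname sales hadoop server=sascldserv02 port=10000
        user=hadoop password="hadoop";

/* Unchanged program code: orders and amount are hypothetical names */
proc means data=sales.orders n mean max;
   var amount;
run;
```

Only the LIBNAME statement differs between the two variants; the procedure step is the same in both.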
Where do I get Hadoop?
SAS/ACCESS to Hadoop
[Diagram: the SAS server sends HiveQL to Hadoop]
Move parts of the job into Hadoop
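One way to move part of a job into Hadoop is explicit SQL pass-through, sketched below under stated assumptions. The table employees and the column dept are hypothetical; the connection values mirror the libname example in this deck.

```sas
proc sql;
   /* Connection values reuse the ones in this deck's libname example */
   connect to hadoop (server=sascldserv02 port=10000
                      user=hadoop password="hadoop");

   /* The inner query is HiveQL and runs inside Hadoop;
      employees and dept are hypothetical names */
   create table work.dept_counts as
   select * from connection to hadoop
      (select dept, count(*) as n
         from employees
        group by dept);

   disconnect from hadoop;
quit;
```

With this pattern the aggregation runs in Hive, and only the summarized rows are returned to the SAS session.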
Getting started with Hadoop
libname elefant hadoop PORT=10000 SERVER=sascldserv02
        USER=hadoop PASSWORD="hadoop";
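Once the libref is assigned, it can be useful to check what SAS actually sends to Hive. The sketch below turns on SAS/ACCESS tracing and runs a small query; the table mytable and the column col1 are hypothetical names, not objects from this presentation.

```sas
/* Write the SQL that SAS/ACCESS generates to the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

/* List the Hive tables visible through the libref */
proc datasets lib=elefant;
quit;

/* mytable and col1 are hypothetical; the WHERE clause is a
   candidate for being passed down to Hive */
proc sql;
   select count(*) as n
     from elefant.mytable
    where col1 > 100;
quit;
```

The trace output in the log shows whether the filter and the count were executed in Hive or in SAS.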
Hadoop Filename Statement
/* Define a fileref */
FILENAME hdpfile1 hadoop "/user/hadoop/gutenberg/pg20417.txt"
         cfg="C:\Users\hadoop_config.xml" user='hadoop';
/* Use it as usual */
DATA my_analysis_data;
   INFILE hdpfile1;
   INPUT …;
RUN;
NOTE: Do not move ALL the data into a SAS table
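In the spirit of the note about not copying all data into a SAS table, the following sketch scans the HDFS file and keeps only a count. The search word, the line length, and the variable names are assumptions for illustration; the fileref definition repeats the one from the slide.

```sas
filename hdpfile1 hadoop "/user/hadoop/gutenberg/pg20417.txt"
         cfg="C:\Users\hadoop_config.xml" user='hadoop';

/* DATA _NULL_ reads the file without creating a SAS table */
data _null_;
   infile hdpfile1 truncover end=last;
   input line $char256.;              /* assumed maximum line length */
   if find(line, 'science', 'i') > 0 then hits + 1;  /* case-insensitive */
   if last then put 'Matching lines: ' hits;
run;
```

The file is still streamed to the SAS session, but nothing is stored on the SAS side except the counter.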
Hadoop File Reader
• SAS 9.4 can read "non-Hive" files as tables
• File formats
  • Delimited
  • CSV
  • XML
  • JSON (experimental)
  • Binary files
• Multiple files in a directory
Hadoop File Reader
/* Define a libname */
libname HDP hadoop user=hadoop pw=Hadoop
   config = '/home/sasinst/hadoop_config.xml'
   hdfs_tempdir = '/user/hadoop/tmp'
   hdfs_metadir = '/user/hadoop/metadata'
   hdfs_permdir = '/user/hadoop/dataload';
/* Specify the file format */
proc hdmd name=hdp.pipedata_dept
   format=delimited sep='|'
   DATA_FILE='pipedata_dept.txt';
   COLUMN col1 int;
   COLUMN col2 char(15);
run;
/* Use it as usual */
proc print data=hdp.pipedata_dept; run;
DI Studio
• Access data in Hadoop
• Transform data inside Hadoop using HiveQL
• Create new data in Hadoop
SPDE
Traditional file system
libname spdlib spde '/path';
proc print data=spdlib.mytab;
run;
[Diagram: SPDE opens, reads, and closes the files mytab.mdf, mytab.dpf1, and mytab.dpf2 in the file system.]
SPDE - Hadoop
libname spdlib spde '/path' hdfshost=default;
proc print data=spdlib.mytab;
run;
[Diagram: SPDE opens, reads, and closes mytab.mdf, mytab.dpf1, and mytab.dpf2 through the HDFS client. The client gets the data block locations from the Namenode and then gets the data from the Datanodes, which hold the blocks M1, D1, and D2.]
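A minimal sketch of writing an SPDE table to HDFS and reading it back, assuming a SAS 9.4 release that supports HDFSHOST=DEFAULT. The two directory paths in the OPTIONS statements are hypothetical placeholders for wherever the cluster's configuration files and JAR files are installed; '/path' is kept as on the slide.

```sas
/* Assumed: directories holding the Hadoop cluster configuration
   files and JAR files (hypothetical paths) */
options set=SAS_HADOOP_CONFIG_PATH="/opt/sas/hadoop/conf";
options set=SAS_HADOOP_JAR_PATH="/opt/sas/hadoop/jars";

libname spdlib spde '/path' hdfshost=default;

/* Write a table to HDFS in SPDE format */
data spdlib.mytab;
   set sashelp.class;
run;

proc print data=spdlib.mytab;
run;
```

Apart from the HDFSHOST= option and the environment variables, the code is the same as for SPDE on a traditional file system.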
Next step - SAS jobs in Hadoop
[Diagram: the SAS server sends SAS DATA step and DS2 code to Hadoop]
• SAS® Data Loader for Hadoop
• SAS® Code Accelerator for Hadoop
• SAS® Scoring Accelerator for Hadoop
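To give a rough idea of what running SAS code inside Hadoop could look like, here is a sketch of a DS2 thread program submitted with the DS2ACCEL option. It assumes that SAS Code Accelerator for Hadoop is licensed and configured and that the libref elefant from the earlier libname example is assigned. The tables orders and orders_eur, the columns amount and amount_eur, and the conversion factor are hypothetical.

```sas
proc ds2 ds2accel=yes;
   /* Thread program: the part intended to run in parallel in Hadoop */
   thread convert_th / overwrite=yes;
      dcl double amount_eur;
      method run();
         set elefant.orders;              /* hypothetical Hive table */
         amount_eur = amount * 0.11;      /* hypothetical conversion */
      end;
   endthread;

   /* Data program: collects the thread output into a new Hive table */
   data elefant.orders_eur (overwrite=yes);
      dcl thread convert_th t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;
```

Whether the code actually executes in the cluster depends on the installation; without the accelerator, the same program would be expected to run in the SAS session instead.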
SAS® Data Loader for Hadoop
[Screenshot: directive selection page, "What directive do you want to perform?"]
• Copy Data to Hadoop: Copy data from a source and load it into Hadoop. Existing data in the target file will be replaced.
• Transform Data in Hadoop: Transform the data in a Hadoop data file.
• Join Tables in Hadoop: Create a table in Hadoop from multiple tables.
• Pivot a Table in Hadoop: Transpose the columns of a table in Hadoop.
• Copy Data for Visualization: Copy data from Hadoop and load it into LASR for visualization. Existing data in the target table will be replaced.
• Profile Data: Create a report profiling the data in a table.
• Verify Mailing Address: Check the validity of the mailing address data in a table.
• Generate Business Rules: Analyze data in a table and generate business rules.
• Send Data for Remediation: Select data to send to the remediation queue for further action.
• Chain Directives Together: Run a number of directives in a specific order.
• Schedule a Directive to Run: Schedule a directive to run at specified dates and times.
• Saved Directives: Open a previously created directive to run, view, or edit.
mikael.turvall@sas.com