make connections • share ideas • be inspired

Use SAS to process and analyze your data in Hadoop
Mikael Turvall
Copyright © 2014, SAS Institute Inc. All rights reserved.

Architecture
[Architecture diagram. Labels: SAS Visual Analytics and SAS Visual Statistics; SAS In-Memory Statistics for Hadoop; MPP datastore; blade environment; web-based client; SAS VA/VS; metadata; in-memory store; server (optional); SAS LASR Analytic Server; SAS Embedded Process; Hadoop (Cloudera, Hortonworks); Teradata; Pivotal; Oracle; SAS Studio; mid-tier; workspace server; RDBMS; nonrelational; click stream; PC files; other.]

Why?
• Hadoop as a platform for data
• Hadoop as the core of the next-generation analytics platform
[Analytics lifecycle diagram: identify / formulate problem; data preparation; data exploration; transform & select; build model; validate model; deploy model; evaluate / monitor results.]

From data to decisions
• Manage data: SAS/ACCESS, SAS Data Management, SAS Federation Server, SAS Data Loader for Hadoop
• Explore data: SAS Visual Analytics, SAS In-Memory Statistics for Hadoop
• Develop models: SAS HPA products, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop, SAS Enterprise Miner
• Deploy & monitor: SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop

Get started quickly
Capabilities
• Transparent access to Hadoop tables through ordinary SAS libraries
• You can program in SAS SQL and the SAS DATA step as usual
• You can manage Hadoop from SAS:
  • Native HDFS commands
  • MapReduce, Pig, and HiveQL
YOU CAN START TODAY
Benefits
• You do not need to be an expert in Hadoop-specific syntax
• Switching to Hadoop is as simple as changing a libname
• Existing SAS programs, reports, etc. can be reused
• Many different ways of accessing data give IT different options for making use of the capacity

Where do I get Hadoop?

SAS/ACCESS to Hadoop
[Diagram: SAS server and Hadoop, connected through HiveQL.]
Move parts of the job into Hadoop

Getting started with Hadoop
    libname elefant hadoop PORT=10000 SERVER=sascldserv02
            USER=hadoop PASSWORD="hadoop";

Hadoop Filename Statement
    /* Define a fileref */
    FILENAME hdpfile1 hadoop "/user/hadoop/gutenberg/pg20417.txt"
             cfg="C:\Users\hadoop_config.xml" user='hadoop';

    /* Use it as usual */
    DATA my_analysis_data;
       INFILE hdpfile1;
       INPUT …;
    RUN;
Note: do not move ALL of the data over into a SAS table.

Hadoop File Reader
SAS 9.4 can read "non-Hive" files as tables.
File formats
• Delimited
• CSV
• XML
• JSON (experimental)
• Binary files
Multiple files in a directory

Hadoop File Reader
    /* Define a libname */
    libname HDP hadoop user=hadoop pw=Hadoop
            config='/home/sasinst/hadoop_config.xml'
            hdfs_tempdir='/user/hadoop/tmp'
            hdfs_metadir='/user/hadoop/metadata'
            hdfs_permdir='/user/hadoop/dataload';

    /* Specify the file format */
    proc hdmd name=hdp.pipedata_dept
              format=delimited sep='|'
              DATA_FILE='pipedata_dept.txt';
       COLUMN col1 int;
       COLUMN col2 char(15);
    run;

    /* Use it as usual */
    proc print data=hdp.pipedata_dept;
    run;

DI Studio
• Access data in Hadoop
• Transform data inside Hadoop using HiveQL
• Create new data in Hadoop

SPDE: traditional file system
    libname spdlib spde '/path';
    proc print data=spdlib.mytab; run;
[Diagram: SPDE opens, reads, and closes mytab.mdf, mytab.dpf1, and mytab.dpf2 on the file system.]
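The SPDE slide above only reads an existing table. As a minimal sketch of where the component files in the diagram come from, the step below writes a small table through the same libref and then prints it. The source table sashelp.class is an assumption chosen for illustration (it is a standard SAS sample table, not part of the deck), and '/path' remains the slide's placeholder.

```sas
/* Same libref as on the slide; '/path' is the slide's placeholder directory. */
libname spdlib spde '/path';

/* Writing through the SPD Engine creates a metadata component file (.mdf)
   and one or more data partition files (.dpf) in that directory.
   sashelp.class is an assumed sample source, used only for illustration. */
data spdlib.mytab;
   set sashelp.class;
run;

/* Reading it back triggers the open/read/close sequence shown in the diagram. */
proc print data=spdlib.mytab;
run;
```

The program refers to the table only by libref and table name, so the engine options on the libname statement are the one place where the storage location is decided.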
SPDE: Hadoop HDFS
    libname spdlib spde '/path' hdfshost=default;
    proc print data=spdlib.mytab; run;
[Diagram: through the HDFS client, SPDE gets the data block locations from the namenode and then gets the data from the datanodes (blocks labeled M1, D1, D2); mytab.mdf, mytab.dpf1, and mytab.dpf2 are each opened, read, and closed.]

Next step: SAS jobs in Hadoop
[Diagram: SAS server and Hadoop, connected through the SAS DATA step and DS2.]
• SAS® Data Loader for Hadoop
• SAS® Code Accelerator for Hadoop
• SAS® Scoring Accelerator for Hadoop

SAS® Data Loader for Hadoop
[Screenshot: SAS® Data Director, "What directive do you want to perform?"]
• Saved Directives: Open a previously created directive to run, view, or edit.
• Schedule a Directive to Run: Schedule a directive to run at specified dates and times.
• Copy Data for Visualization: Copy data from Hadoop and load it into LASR for visualization. Existing data in the target table will be replaced.
• Copy Data to Hadoop: Copy data from a source and load it into Hadoop. Existing data in the target file will be replaced.
• Pivot a Table in Hadoop: Transpose the columns of a table in Hadoop.
• Transform Data in Hadoop: Transform the data in a Hadoop data file.
• Chain Directives Together: Run a number of directives in a specific order.
• Join Tables in Hadoop: Create a table in Hadoop from multiple tables.
• Verify Mailing Address: Check the validity of the mailing address data in a table.
• 1 Click Profile Data: Create a report profiling the data in a table.
• Generate Business Rules: Analyze data in a table and generate business rules.
• Send Data for Remediation: Select data to send to the remediation queue for further action.

From data to decisions
• Manage data: SAS/ACCESS, SAS Data Management, SAS Federation Server, SAS Data Loader for Hadoop
• Explore data: SAS Visual Analytics, SAS In-Memory Statistics for Hadoop
• Develop models: SAS HPA products, SAS Visual Statistics, SAS In-Memory Statistics for Hadoop, SAS Enterprise Miner
• Deploy & monitor: SAS Scoring Accelerator for Hadoop, SAS Code Accelerator for Hadoop

mikael.turvall@sas.com
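As a closing sketch of the "move parts of the job into Hadoop" idea from the SAS/ACCESS slide, the program below shows two routes to the same result: ordinary SAS SQL against a Hadoop libref, which SAS/ACCESS can translate to HiveQL where possible, and explicit pass-through, where the inner query is written in HiveQL. The connection details are copied from the libname slide. The Hive table weblog and its column page are hypothetical names introduced only for illustration.

```sas
/* Connection details as on the libname slide. */
libname elefant hadoop port=10000 server=sascldserv02
        user=hadoop password="hadoop";

/* Route 1: ordinary SAS SQL against the Hadoop libref.
   'weblog' and 'page' are hypothetical names for illustration. */
proc sql;
   create table work.hits_per_page as
   select page, count(*) as hits
   from elefant.weblog
   group by page;
quit;

/* Route 2: explicit pass-through. The query inside the parentheses
   is sent to Hive as written, and only the result set returns to SAS. */
proc sql;
   connect to hadoop (server=sascldserv02 port=10000
                      user=hadoop password="hadoop");
   create table work.hits_per_page_hive as
   select * from connection to hadoop
      (select page, count(*) as hits
       from weblog
       group by page);
   disconnect from hadoop;
quit;
```

Either way the aggregation is intended to run inside Hadoop, in line with the deck's advice not to move all of the data into a SAS table first.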