BIG DATA, FAST PROCESSING SPEEDS
SEPTEMBER 10, 2013
Gary T. Ciampa
SAS® Solutions OnDemand Advanced Analytics Lab
NESUG 2013

Copyright © 2013, SAS Institute Inc. All rights reserved. This information is confidential and covered under the terms of any SAS agreements as executed by customer and SAS Institute Inc.

OVERVIEW AND AGENDA
• Big data introduction
• SAS language performance tuning
  • SAS system facilities
  • SQL, MACRO and DATA STEP examples
• Case study - SAS Revenue Optimization Solution
  • History and tuning techniques
  • High Performance Revenue Optimization – GRID environment
• SAS emerging big data technologies

BIG DATA INTRODUCTION
• Wiki Knows All: … is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications
• Forrester: … software and/or hardware solutions that allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources to improve business performance or mitigate risk.
• Gartner: … technology is the management of high-volume, high-velocity and high-variety information assets that demand cost-effective and innovative forms of information processing for enhanced insight and decision making.
… the management of high-volume, high-velocity and high-variety assets that demand cost-effective and innovative forms of processing for enhanced insight and decision making

BIG DATA ACCORDING TO SAS
• Incorporates concepts of IDC dimensions
  • Volume – transactions, streaming, sensors, …
  • Variety – database, warehouse, text, email, metered, OLAP, stocks, etc.
  • Velocity – how fast the data is produced and processed (near real-time)
• SAS considers additional dimensions
  • Variability - in velocity and variety of the data (peaks and valleys, seasonal)
  • Complexity - handling disparate sources to cleanse, transform, correlate and establish relationships and hierarchies
• SAS Big Data Starting Point: http://www.sas.com/big-data

APPROACHES TO PROCESSING BIG DATA
• Bigger, Faster, More Powerful is Better
  • Increase CPU processor speed and count
  • Increase MEMORY capability or speed
  • Faster Networks and Network Devices
  • High-speed disk arrays, or direct memory disk arrays
• Parallel Processing
  • Multi-threading capabilities, distributed processing within or across nodes
  • Segmented data along with distributed processing
• Viable, but not always feasible within constraints (time, resource and dollars)

SAS SYSTEM FACILITIES
• SAS command line options, AUTOEXEC and CONFIG processing
  • Customizes the SAS execution environment
  • Settings can affect performance significantly
  • Settings may have unexpected or unintended consequences
  • Set on command line, configuration or within the program
• SAS Companion for <OS> (Windows, UNIX, z/OS)
• Bonus Options
  • VERBOSE option – emits options and configuration details
  • RTRACE option – emits list of resources that are read, loaded

SAS SYSTEM AND HOST OPTIONS
• System Options, SAS Files
  • BUFNO, BUFSIZE, OBS, IBUFNO, IBUFSIZE (index processing)
• System Administration, Memory
  • MEMSIZE, SORTSIZE, SUMSIZE
• System Administration, Performance
  • CPUCOUNT, THREADS
• System Options for Macros
  • MLOGIC, MPRINT, SYMBOLGEN (everyone has their favorites)
NOTE: Use the *correct* SAS Companion for the target OS
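
The option groups listed above can be inspected from inside a session before any tuning is attempted. The following is a minimal sketch, not a tuning recommendation: the BUFNO and BUFSIZE values are illustrative assumptions, and the right values depend on the host and the data.

```sas
/* List the current settings by option group (groups as named in the SAS docs) */
proc options group=memory;      /* MEMSIZE, SORTSIZE, SUMSIZE, ...  */
run;
proc options group=performance; /* CPUCOUNT, THREADS, BUFNO, ...    */
run;
proc options group=macro;       /* MLOGIC, MPRINT, SYMBOLGEN, ...   */
run;

/* Session-level changes; the values here are assumptions for illustration only */
options bufno=10 bufsize=64k fullstimer;

/* MEMSIZE cannot be changed here: it is set at SAS invocation
   (command line or configuration file), per the SAS Companion for the OS */
```

Comparing FULLSTIMER output before and after a change is one way to confirm that a setting helped rather than hurt.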
SAS SYSTEM FACILITIES
• SAS option STIMER or FULLSTIMER
  • System performance statistics, CPU, memory, real and elapsed time
  • Subtle differences depending on the OS
• SAS option MSGLEVEL – level of detail for messages to SAS log
• SAS option OBS – last observation or record to process
• ARM and PERF macro facility
  • Default or custom performance metrics at programmer's discretion
  • PROC or DATA STEP statistics
  • User controlled START and STOP semantics across segments of SAS code
  • Discrete log and format to include macros to process and report on metrics

SAMPLE OPTIONS STATEMENTS & LOG

    options obs=max fullstimer;
    data work.sort500k;
       set sgf2013.sort_500000;
    run;

    NOTE: DATA statement used (Total process time):
          real time           1.66 seconds
          user cpu time       0.12 seconds
          system cpu time     0.34 seconds
          memory              356.15k
          OS Memory           10424.00k
          Timestamp           04/25/2013 03:16:21 PM

    options obs=10;
    data work.sort500k;
       set sgf2013.sort_500000;
    run;
    …
    NOTE: DATA statement used (Total process time):
          real time           0.03 seconds
          user cpu time       0.00 seconds
          system cpu time     0.03 seconds
    …

SAMPLE ARM / PERF MACRO EXECUTION

    %let _armexec=1;
    %perfinit(applname="Glm_Appl_1");
    %perfstrt(txnname="Glm_Txn1");
    …. Do some work….
    %perfstop;
    %perfstrt(txnname="Glm_Txn2");
    ods exclude all;
    proc GLM data=one;
       model y = x1;
       by by;
    quit;
    ods select all;
    %perfstop;

SAMPLE ARM / PERF MACRO EXECUTION

    …lines deleted…
    G,1682537590.504000,2,2,Glm_Txn1, CPU ,IO_CNT ,MEMORY INFO ,THREAD
    S,1682537590.426000,2,1,1,1.060806,1.341608,327491731,7266304,7532544,6,6
    P,1682537590.504000,2,1,1,1.123207,1.357208,0,335645285,7266304,7532544,6,6
    …lines deleted…
    G,1682537590.504000,2,2,Glm_Txn2, CPU ,IO_CNT ,MEMORY INFO ,THREAD
    S,1682537590.504000,2,2,2,1.123207,1.357208,335674088,7266304,7532544,6,6
    P,1682537591.845000,2,2,2,1.653610,1.575610,0,340575257,11984896,11984896,6,6

SAS 9.3 Interface to Application Response Measurement (http://support.sas.com)

OVERVIEW: ENVIRONMENT AND INTRODUCTION
• Sample Environment
  • RHEL Linux 5.6, Intel Xeon 2.67 GHz, 32 Cores, 256 MB; SAS 9.3
  • Oracle Table, 44 columns, 10 million records
• SAS Language Reference (cost, benefit and considerations)
  • Understanding SAS Indexes
  • Understanding Integrity Constraints
• Use EXISTS (0:04.6) rather than IN (0:05.2).
  • For example,
    select * from table_a a where exists (select * from orders o where a.prod_id=o.prod_id);
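
For comparison with the EXISTS example above, the two forms can be run side by side under FULLSTIMER. This is a sketch only: `table_a`, `orders` and `prod_id` come from the slide, while the output table names are hypothetical, and the relative timings will depend on the data and on where the tables live.

```sas
options fullstimer;

proc sql;
   /* IN form: a.prod_id is compared against the list returned by the subquery */
   create table work.in_form as
   select *
   from table_a a
   where a.prod_id in (select o.prod_id from orders o);

   /* EXISTS form from the slide: correlated subquery, true on any matching row */
   create table work.exists_form as
   select *
   from table_a a
   where exists (select * from orders o where a.prod_id = o.prod_id);
quit;
```

Both queries return the same rows from `table_a` when `prod_id` has no missing values, so the log statistics can be compared directly.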
INDEXES: USING INDEXES FOR PERFORMANCE OPTIMIZATION
• INDEX Considerations (TANSTAAFL)
  • Data file size, small tables would be suitable for sequential processing
  • Change rate of the data and choice of key variables, e.g. NAME versus GENDER
  • Generally used where sub-setting the data, 25% or less is typical
  • Sort by key variables, ordered data improves index behavior
• Some operators, conditions are not optimized with an INDEX
  • Arithmetic, variable-to-variable, sounds-like operator
  • CONTAINS, IS NULL or IS MISSING, TRIM, SUBSTR*
  • where amount ne 0;
    • 0:28.0 (Minutes:Seconds.Tenths)
  • where amount > 0;
    • 0:26.0 (Minutes:Seconds.Tenths)

PROC SQL: OPTIMIZING PROC SQL
• HAVING versus WHERE
  • HAVING operates on all rows returned, not a subset
  • Use HAVING on summary operations, after a restricted WHERE step
  • Order statements, filter or select rows before grouping
  • select state from order group by state having state='nc';
    • 01:50
  • select state from order where state='nc' group by state;
    • 01:31
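
The advice to use HAVING only on summary operations, after a restricting WHERE, can be sketched as a single query. The table `work.orders`, the second state value and the threshold of 100 are hypothetical and chosen only for illustration.

```sas
proc sql;
   select state,
          count(*) as n_orders
   from work.orders                 /* hypothetical table with a STATE column */
   where state in ('nc', 'va')      /* row filter: applied before grouping    */
   group by state
   having count(*) > 100;           /* summary filter: applied after grouping */
quit;
```

The WHERE clause reduces the rows that reach GROUP BY, and HAVING is left with only the condition that genuinely needs the aggregated value.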
PROC SQL: OPTIMIZING PROC SQL
• Nested (sub-)queries
  • Minimize nested queries with a small number of tables
• SUBQUERY versus JOIN
  • select ename from employees emp where exists (select price from prices where prod_id = emp.prod_id and prices.class='j');
    • >05:00 minutes (terminated with prejudice)
  • select ename from prices pr, employees emp where pr.prod_id=emp.prod_id and pr.class='j';
    • 01:40 (minutes:seconds)

PROC SQL: OPTIMIZING PROC SQL
• TABLE order
  • Order of tables within the SQL statement impacts performance
  • List the tables from the greatest number of rows to the fewest, left to right in the query
  • SQL processing scans the last table listed, and merges all of the rows
  • Assuming TAB1 has 20,000 rows, TAB2 has 10 rows
  • select count(*) from tab2, tab1
    • 0.61
  • select count(*) from tab1, tab2
    • 0.52

PROC SQL: OPTIMIZING PROC SQL
• EXISTS versus DISTINCT for table join
  • select distinct date, name from sales s, employee emp where s.prod_id=emp.prod_id;
    • > 7 minutes
  • select date, name from sales s where exists (select 'x' from employee emp where emp.prod_id = s.prod_id);
    • 0:11 seconds (including post distinct step)

SAS 9.3 SQL Procedure User's Guide

SAS MACRO: OPTIONS AND CONSIDERATIONS
• Use MLOGIC, MPRINT & SYMBOLGEN – development phase
• Do NOT use MLOGIC, MPRINT & SYMBOLGEN – production
• Stored Compiled Macro Facility
  • Permanent SAS catalog
  • Protect intellectual property
  • Both AUTOCALL and SESSION macros are available
  • Override compiled macros with session instances or AUTOCALL semantics
• Minimize nesting macro definitions

SAS MACRO: NESTING MACRO INSTANCE
• Avoid nesting macros where possible
  • %macro m1; %macro m2; %mend m2; %mend m1;   /* nested macro */
    • 02.81
  • %macro m1; <macro 1 code goes here> %mend m1; %macro m2; <macro 2 code goes here> %mend m2;
    • 02.45

SAS DATA STEP: A FEW EXAMPLES TO CONSIDER
• Missing values may perturb performance; "." is propagated across all calculations
  • total=t4+(x*b)+c*(abc);
    • 01:03 (63 seconds)
  • total=(x*b)+c*(abc) + t4;
    • 00:59
• Superior practice, check for "." before expression
  • if <operand> ne . then do; <expression>; end;
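
The "check for missing before the expression" practice can be sketched with the NMISS function, which counts missing numeric arguments. The data set names `work.inputs` and `work.totals` are hypothetical; the variable names are taken from the slide.

```sas
data work.totals;
   set work.inputs;   /* hypothetical input containing t4, x, b, c, abc */

   /* Evaluate the expression only when every operand is non-missing, */
   /* so no missing value is propagated through the arithmetic.       */
   if nmiss(t4, x, b, c, abc) = 0 then
      total = t4 + (x*b) + c*(abc);
run;
```

Rows with any missing operand simply keep `total` missing, without SAS generating and propagating a missing result through each operation.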

SAS DATA STEP: A FEW EXAMPLES TO CONSIDER
• PROC FORMAT: User defined formats associated with variables
  • Details in the Base SAS 9.3 Procedures Guide
  • Reference the format throughout the code, simplifies logic and support
  • if educ = 0 then neweduc="< 3 yrs old"; else if educ=1 then neweduc="no school"; else if educ=2 then neweduc="nursery school";
    • 10:54
  • proc format; value educf 0="< 3 yrs old" 1="no school" 2="nursery school"; … neweduc=put(educ,educf.); …
    • 10:32

SAS DATA STEP: A FEW EXAMPLES TO CONSIDER
• Using the IN operator, versus OR conditions
  • OR operator checks all the conditions
  • IN operator matches first occurrence
  • if x=8 or x=9 or x=23 or x=45 then do; end;
    • 01:04
  • if x in (8,9,23,45) then do; end;
    • 00:58
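
The PROC FORMAT fragment shown earlier is abbreviated on the slide. A fuller sketch might look like the following, where the data set names and the OTHER fallback label are assumptions added for illustration and are not part of the original example.

```sas
proc format;
   value educf
      0     = "< 3 yrs old"
      1     = "no school"
      2     = "nursery school"
      other = "unknown";      /* assumption: fallback label, not on the slide */
run;

data work.recoded;
   set work.survey;           /* hypothetical input containing EDUC */
   length neweduc $ 16;
   /* One PUT call replaces the IF / ELSE IF chain */
   neweduc = put(educ, educf.);
run;
```

Note the trailing period in `educf.`; without it SAS treats the name as a variable rather than a format.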
SAS USER FEEDBACK: "IN" VERSUS "OR" VALIDATION
• Thanks to Bruce Gilsen at Federal Reserve for independent validation
• Bruce's Optimization Validation
  • 1,000,000 OBS, 100 VARIABLES with RANGE VALUES 1 to 100
  • Independent DATA STEP, using IN versus OR
  • IN 8.15 / 7.88 Seconds (REAL / CPU)
  • OR 21.75 / 21.73 Seconds (REAL / CPU)

    data two;
       set one;
       array vall (*) v1-v100;
       drop i;
       do i = 1 to 100;
          if vall(i) in (1 2 3 4 5 6 7 8 9 10 … 99)
             then vall(i) = vall(i) + 100;
       end;
    run;

    data two;
       set one;
       array vall (*) v1-v100;
       drop i;
       do i = 1 to 100;
          if vall(i) = 1 or vall(i) = 2 or vall(i) = 3 or vall(i) = 4 … vall(i) = 99
             then vall(i) = vall(i) + 1000;
       end;
    run;

CASE STUDY - SAS REVENUE OPTIMIZATION SOLUTION
• Big Data Introduction
• SAS Language Performance Tuning
  • SAS System Facilities
  • SQL, MACRO and DATA STEP examples
• Case Study - SAS Revenue Optimization Solution
  • History and Tuning Techniques
  • High Performance Revenue Optimization – GRID Environment
• SAS Emerging Big Data Technologies
SOLUTIONS ONDEMAND ADVANCED ANALYTICS LAB
• Over a petabyte of data, 400+ customers
• Customer Profiles
  • Variety of industry sectors, private as well as public
  • Multi-tier deployments, client, mid-tier, analytic tier and RDBMS
  • Daily and Weekly ETL feed requirements
• PROD, QA, DEV environments and data synchronization
• Disparate analytic processing (batch) schedules
• Backup and restore processing that minimizes performance impacts
• 99.5% up time service level agreements

CASE STUDY: SAS REVENUE OPTIMIZATION SOLUTION
• Problem Statement: 33 hours of processing time for one batch component using 30% of projected data. Linear projection approximately 110 hours or 4 ½ days processing time.
• Requirement to fit batch into a 40 hour window
• AIX 6.1+, Power7, 64 Bit attached to EMC SAN Arrays
  • 7 CPUS, SMT=4, 128GB RAM, 3700 IOPS, CPU 45%
  • Approximately 1.2 TB of DATA, target 1.6 TB primary warehouse
• Focus on the most significant issues and then repeat as new issues arise
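
The linear projection in the problem statement is simple arithmetic, reproduced here as a short DATA step sketch. The variable names are hypothetical; the inputs (33 hours, 30% of data, 40 hour window) are the figures from the slide.

```sas
data _null_;
   hours_observed = 33;     /* measured run time on the partial data */
   data_fraction  = 0.30;   /* share of projected data processed     */
   window_hours   = 40;     /* required batch window                 */

   projected_hours = hours_observed / data_fraction;  /* about 110 hours */
   projected_days  = projected_hours / 24;            /* about 4.6 days  */
   overrun_factor  = projected_hours / window_hours;  /* about 2.75x     */

   put projected_hours= 8.1 projected_days= 8.2 overrun_factor= 8.2;
run;
```

Under the linear assumption the batch would need to run roughly 2.75 times faster to fit the window, which frames the tuning effort that follows.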

• SAS WORK volume
  • Eight-way stripe with eight paths
• Warehouse
  • Fixed Tier 1 EMC storage; 80 x 100GB disk arrays
  • Moved support directories off of volume

Weekly Performance
• Parallel Executions
  • 16 processes
  • 54 processes
• IO/SEC
  • 8.5K to 15.3K
• CPU Idle Time
  • 42% to 13%
• Weekly Batch Time
  • 60 hours
  • 43 hours
• GEO_PRODS
  • 67 Million
  • 92 Million

SAS GRID: SAS REVENUE OPTIMIZATION SOLUTION
• Initial RO Versions used SAS/CONNECT parallel processing
  • Single host deployments with concurrent analytics
  • Flat data warehouse structure, non-partitioned SAS tables
• SAS High Performance Revenue Optimization Enhancements
  • SAS TK GRID architecture distributed processing across grid nodes
  • SAS data partitions distributed across grid nodes
  • ETL processes, daily and weekly to distribute data across partitions
  • Grid Captain to manage the processing and analytic across grid nodes

SAS GRID: SAS REVENUE OPTIMIZATION NON GRID

SAS GRID: SAS HIGH PERFORMANCE REVENUE OPTIMIZATION
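
The single-host SAS/CONNECT parallel processing used by the initial RO versions can be pictured with a generic MP CONNECT pattern. This is a sketch under stated assumptions, not the solution's actual code: it assumes SAS/CONNECT is licensed, and the task names and the work done inside each task are hypothetical.

```sas
/* Start each remote session on the same host using the current SAS command */
options autosignon sascmd="!sascmd";

/* Task 1 runs asynchronously in its own SAS session */
rsubmit task1 wait=no;
   proc means data=sashelp.class noprint;
      var height;
      output out=work.height_stats mean=mean_height;
   run;
endrsubmit;

/* Task 2 runs concurrently with task 1 */
rsubmit task2 wait=no;
   proc means data=sashelp.class noprint;
      var weight;
      output out=work.weight_stats mean=mean_weight;
   run;
endrsubmit;

/* Block until both tasks finish, then close the sessions */
waitfor _all_ task1 task2;
signoff _all_;
```

Each task has its own WORK library, so results that must outlive the task would need to be written to a shared permanent library.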

SAS GRID & EMERGING TECHNOLOGIES
• SAS Grid Manager: distributed SAS processing
  • Scheduling, Workload Balancing, High Availability & Management
• SAS In-Database: queries, aggregations, analytics within DBMS
  • 9.2M3: DB2, EDW & Oracle; 9.3 Netezza
• HADOOP
  • Scalable, fault tolerant, distributed file system
  • SAS integration includes access, analysis and management
• SAS In-Memory Analytics
  • Distributed, descriptive, inferential to visualization analytics
  • Visual Analytics and Visual Analytics HPA

SAS TECHNICAL SUPPORT

SAS BIG-DATA HOME PAGE

SUMMARY: CONSIDERATIONS – PERFORMANCE IMPROVEMENT IS A CONTINUAL PROCESS
• Focus on the most severe hotspots within SAS program and operating environment
• Use INDEX where appropriate
• Exploit SAS OPTIONS tuning
• Consider SAS Grid Products
• Evaluate SAS Visual Analytics and Visual Analytics HPA

SAS SOLUTION ON DEMAND ADVANCED ANALYTICS LAB
GARY.CIAMPA@SAS.COM

www.SAS.com