BIG DATA, FAST PROCESSING SPEEDS
SEPTEMBER 10, 2013
Gary T. Ciampa
SAS® Solutions OnDemand Advanced Analytics Lab
NESUG 2013
Copyright © 2013, SAS Institute Inc. All rights reserved.
This information is confidential and covered under the terms of any SAS agreements as executed by customer and SAS Institute Inc.
OVERVIEW AND AGENDA
• Big data introduction
• SAS language performance tuning
  • SAS system facilities
  • SQL, MACRO and DATA STEP examples
• Case study - SAS Revenue Optimization Solution
  • History and tuning techniques
  • High Performance Revenue Optimization – GRID environment
• SAS emerging big data technologies
BIG DATA INTRODUCTION
• Wiki Knows All: … is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
• Forrester: … software and/or hardware solutions that allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources to improve business performance or mitigate risk.
• Gartner: … technology is the management of high-volume, high-velocity and high-variety information assets that demand cost-effective and innovative forms of information processing for enhanced insight and decision making.
… the management of high-volume, high-velocity and high-variety assets that demand
cost-effective and innovative forms of processing for enhanced insight and decision
making
BIG DATA ACCORDING TO SAS
• Incorporates concepts of the IDC dimensions
  • Volume – transactions, streaming, sensors, …
  • Variety – database, warehouse, text, email, metered, OLAP, stocks, etc.
  • Velocity – how fast the data is produced and processed (near real time)
• SAS considers additional dimensions
  • Variability – in the velocity and variety of the data (peaks and valleys, seasonal)
  • Complexity – handling disparate sources to cleanse, transform, correlate and establish relationships and hierarchies
• SAS Big Data Starting Point: http://www.sas.com/big-data
APPROACHES TO PROCESSING BIG DATA
• Bigger, Faster, More Powerful is Better
  • Increase CPU processor speed and count
  • Increase MEMORY capacity or speed
  • Faster networks and network devices
  • High-speed disk arrays, or direct memory disk arrays
• Parallel Processing
  • Multi-threading capabilities, distributed processing within or across nodes
  • Segmented data along with distributed processing
• Viable, but not always feasible within constraints (time, resources and dollars)
SAS SYSTEM FACILITIES
• SAS command line options, AUTOEXEC and CONFIG processing
  • Customizes the SAS execution environment
  • Settings can affect performance significantly
  • Settings may have unexpected or unintended consequences
  • Set on the command line, in a configuration file or within the program
  • See the SAS Companion for <OS> (Windows, UNIX, z/OS)
• Bonus Options
  • VERBOSE option – emits option and configuration details
  • RTRACE option – emits a list of the resources that are read or loaded
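The option checks described above can be scripted. A minimal sketch using PROC OPTIONS and the GETOPTION function, both standard Base SAS; the SORTSIZE value of 512M is an arbitrary illustration, not a recommendation. MEMSIZE appears only as a value to inspect, because it can be set at invocation or in a configuration file but not in an OPTIONS statement.

```sas
/* Display the current value of each option before tuning. */
proc options option=memsize value; run;
proc options option=sortsize value; run;

/* SORTSIZE may be changed inside a program; MEMSIZE may not.   */
/* 512M is an illustrative value only, not a recommendation.    */
options sortsize=512M fullstimer;

/* GETOPTION returns the current setting as a character string. */
%put NOTE: SORTSIZE is now %sysfunc(getoption(sortsize));
%put NOTE: FULLSTIMER setting is %sysfunc(getoption(fullstimer));
```

Running the PROC OPTIONS steps again after a change is a simple way to confirm which settings actually took effect.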
SAS SYSTEM AND HOST OPTIONS
• System Options, SAS Files
  • BUFNO, BUFSIZE, OBS
  • IBUFNO, IBUFSIZE (index processing)
• System Administration, Memory
  • MEMSIZE, SORTSIZE, SUMSIZE
• System Options for Macros
  • MLOGIC, MPRINT, SYMBOLGEN (everyone has their favorites)
• System Administration, Performance
  • CPUCOUNT, THREADS
NOTE: Use the *correct* SAS Companion for the target OS
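Most of these options can be set in an OPTIONS statement, and BUFNO and BUFSIZE can also be scoped to a single data set. A sketch under stated assumptions: WORK.BIG is a hypothetical table, and the BUFSIZE=64K and BUFNO=10 values are placeholders to benchmark with FULLSTIMER on your own host, not tuned settings.

```sas
/* Performance options: let threaded procedures use every CPU */
/* that the operating system reports.                         */
options threads cpucount=actual;

/* SAS Files options scoped to one output data set:           */
/* BUFSIZE= fixes the page size stored with the data set,     */
/* BUFNO= sets the number of buffers used in this step.       */
/* 64K and 10 are illustrative values to benchmark, not advice. */
data work.big (bufsize=64k bufno=10);
  do id = 1 to 1000000;
    value = mod(id, 97);
    output;
  end;
run;

/* PROC CONTENTS reports the data set page size that was recorded. */
proc contents data=work.big;
run;
```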
SAS SYSTEM FACILITIES
• SAS option STIMER or FULLSTIMER
  • System performance statistics: CPU, memory and real (elapsed) time
  • Subtle differences depending on the OS
• SAS option MSGLEVEL – level of detail for messages written to the SAS log
• SAS option OBS – last observation or record to process
• ARM and PERF macro facility
  • Default or custom performance metrics at the programmer's discretion
  • PROC or DATA step statistics
  • User-controlled START and STOP semantics across segments of SAS code
  • Discrete log format, with macros to process and report on the metrics
SAMPLE OPTIONS STATEMENTS & LOG
options obs=max fullstimer;
data work.sort500k;
   set sgf2013.sort_500000;
run;

NOTE: DATA statement used (Total process time):
      real time           1.66 seconds
      user cpu time       0.12 seconds
      system cpu time     0.34 seconds
      memory              356.15k
      OS Memory           10424.00k
      Timestamp           04/25/2013 03:16:21 PM

options obs=10;
data work.sort500k;
   set sgf2013.sort_500000;
run;
…
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.00 seconds
      system cpu time     0.03 seconds
…
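An OBS= value set in an OPTIONS statement stays in effect for every later step until it is reset, so a quick OBS=10 trial can silently truncate the rest of a job. A sketch of limiting only one step with the OBS= data set option instead; WORK.SORT_500000 is a hypothetical stand-in for sgf2013.sort_500000.

```sas
/* Hypothetical stand-in for sgf2013.sort_500000. */
data work.sort_500000;
  do key = 1 to 500000;
    output;
  end;
run;

options obs=max fullstimer;

/* OBS= as a data set option limits only this step. */
data work.sample10;
  set work.sort_500000 (obs=10);
run;

/* Later steps still read every observation. */
data work.sort500k;
  set work.sort_500000;
run;
```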
SAMPLE ARM / PERF MACRO EXECUTION
%let _armexec=1;
%perfinit(applname="Glm_Appl_1");
%perfstrt(txnname="Glm_Txn1");
   /* …. Do some work …. */
%perfstop;
%perfstrt(txnname="Glm_Txn2");
ods exclude all;
proc glm data=one; model y = x1; by by; quit;
ods select all;
%perfstop;
%perfend;
SAMPLE ARM / PERF MACRO EXECUTION
…lines deleted…
G,1682537590.504000,2,2,Glm_Txn1, CPU
,IO_CNT ,MEMORY INFO ,THREAD
S,1682537590.426000,2,1,1,1.060806,1.341608,327491731,7266304,7532544,6,6
P,1682537590.504000,2,1,1,1.123207,1.357208,0,335645285,7266304,7532544,6,6
…lines deleted…
G,1682537590.504000,2,2,Glm_Txn2, CPU
,IO_CNT ,MEMORY INFO ,THREAD
S,1682537590.504000,2,2,2,1.123207,1.357208,335674088,7266304,7532544,6,6
P,1682537591.845000,2,2,2,1.653610,1.575610,0,340575257,11984896,11984896,6,6
SAS 9.3 Interface to Application Response Measurement (http://support.sas.com)
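The start (S) and stop (P) records above can be turned into per-transaction figures with a short DATA step. A sketch only: it assumes, from the CPU column heading on the slide, that the sixth and seventh fields are cumulative user and system CPU seconds and that the second field is a timestamp in seconds; the names given to the third to fifth fields are likewise assumptions. Check the ARM documentation for the record layout of your release. The four records are copied from the log above.

```sas
data work.arm_txn;
  infile datalines dsd truncover;
  length type $ 1;
  /* Assumed layout: record type, timestamp (seconds), three id      */
  /* fields, user CPU, system CPU, ... Only the first seven fields   */
  /* are read; the remaining fields on each record are ignored.      */
  input type $ time appid txnclass txnid usercpu syscpu;
  retain s_time s_user s_sys;
  if type = 'S' then do;           /* start record: keep the baseline */
    s_time = time;
    s_user = usercpu;
    s_sys  = syscpu;
  end;
  else if type = 'P' then do;      /* stop record: compute the deltas */
    elapsed  = time - s_time;
    user_cpu = usercpu - s_user;
    sys_cpu  = syscpu - s_sys;
    output;
  end;
  keep txnclass elapsed user_cpu sys_cpu;
  datalines;
S,1682537590.426000,2,1,1,1.060806,1.341608,327491731,7266304,7532544,6,6
P,1682537590.504000,2,1,1,1.123207,1.357208,0,335645285,7266304,7532544,6,6
S,1682537590.504000,2,2,2,1.123207,1.357208,335674088,7266304,7532544,6,6
P,1682537591.845000,2,2,2,1.653610,1.575610,0,340575257,11984896,11984896,6,6
;
run;

proc print data=work.arm_txn noobs;
run;
```

Under these assumptions the GLM transaction (Glm_Txn2) accounts for about 0.53 seconds of user CPU over roughly 1.34 seconds of elapsed time.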
OVERVIEW
ENVIRONMENT AND INTRODUCTION
• Sample Environment
  • RHEL Linux 5.6, Intel Xeon 2.67 GHz, 32 cores, 256 MB; SAS 9.3
  • Oracle table, 44 columns, 10 million records
• SAS Language Reference (cost, benefit and considerations)
  • Understanding SAS Indexes
  • Understanding Integrity Constraints
• Use EXISTS (0:04.6) rather than IN (0:05.2)
  • For example,
    select * from table_a a
    where exists (select * from orders o
                  where a.prod_id=o.prod_id);
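A self-contained version of the EXISTS/IN comparison, with the IN form written out for reference. The tiny WORK.TABLE_A and WORK.ORDERS tables are hypothetical stand-ins for the slide's Oracle table, so they only show that both forms return the same rows; the timings above came from the 10-million-row table. PROC SQL's _METHOD option writes the chosen plan to the log, which is one way to see how each form was processed.

```sas
/* Hypothetical sample tables standing in for table_a and orders. */
data work.table_a;
  input prod_id name $;
  datalines;
1 alpha
2 beta
3 gamma
;
run;

data work.orders;
  input order_id prod_id;
  datalines;
101 1
102 1
103 3
;
run;

proc sql _method;   /* _METHOD writes the query plan to the log */
  /* EXISTS form: only asks whether at least one match exists. */
  create table work.exists_rows as
    select *
    from work.table_a a
    where exists (select * from work.orders o
                  where a.prod_id = o.prod_id);

  /* IN form: compares each prod_id with the subquery's values. */
  create table work.in_rows as
    select *
    from work.table_a
    where prod_id in (select prod_id from work.orders);
quit;
```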
INDEXES
USING INDEXES FOR PERFORMANCE OPTIMIZATION
• INDEX considerations (TANSTAAFL: there ain't no such thing as a free lunch)
  • Data file size: small tables are better suited to sequential processing
  • Change rate of the data and choice of key variables (NAME versus GENDER)
  • Generally used when subsetting the data; a subset of 25% or less is typical
  • Sort by the key variables; ordered data improves index behavior
• Some operators and conditions are not optimized with an INDEX
  • Arithmetic, variable-to-variable, sounds-like operator
  • CONTAINS, IS NULL or IS MISSING, TRIM, SUBSTR*
  • where amount != 0;    0:28.0 (Minutes:Seconds.Tenths)
  • where amount > 0;     0:26.0 (Minutes:Seconds.Tenths)
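A sketch of checking whether an index is actually used. MSGLEVEL=I makes SAS write INFO notes about index selection to the log; WORK.ACCOUNTS and its ACCOUNT_ID index are hypothetical. The first WHERE clause is a simple equality on the indexed key, while the second wraps the key in arithmetic, one of the conditions listed above that is not optimized with an index.

```sas
options msglevel=i;   /* write INFO notes, including index usage, to the log */

/* Hypothetical table with a simple index on ACCOUNT_ID. */
data work.accounts (index=(account_id));
  do account_id = 1 to 100000;
    amount = mod(account_id, 7) - 3;   /* values -3 to 3 */
    output;
  end;
run;

/* Small subset on the indexed key: eligible for the index. */
data work.one_account;
  set work.accounts;
  where account_id = 42;
run;

/* Arithmetic on the key: not optimized with the index,  */
/* so the table is read sequentially.                    */
data work.scaled_match;
  set work.accounts;
  where account_id * 2 = 84;
run;
```

Comparing the log of the two steps (INFO note present or absent, plus FULLSTIMER figures if enabled) shows the difference on your own data.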
PROC SQL
OPTIMIZING PROC SQL
• HAVING versus WHERE
  • HAVING is applied after grouping, so every row is read and grouped first
  • Use HAVING on summary operations, after a restrictive WHERE clause
  • Order the clauses so rows are filtered or selected before grouping
  • select state
    from order
    group by state
    having state = 'nc';
    01:50
  • select state
    from order
    where state = 'nc'
    group by state;
    01:31
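A runnable sketch of the recommended ordering: WHERE removes rows before grouping, and HAVING is kept for conditions on summary values. WORK.STATE_ORDERS is a hypothetical table; the slide's table name, ORDER, is also an SQL keyword, so a different name is used here.

```sas
/* Hypothetical order rows; STATE values as on the slide. */
data work.state_orders;
  input state $ amount;
  datalines;
nc 10
nc 5
va 7
sc 3
;
run;

proc sql;
  /* WHERE discards non-matching rows before grouping; */
  /* HAVING then filters the summarized groups.        */
  create table work.nc_summary as
    select state, sum(amount) as total
    from work.state_orders
    where state = 'nc'
    group by state
    having sum(amount) > 0;
quit;
```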
PROC SQL
OPTIMIZING PROC SQL
• Nested (sub-)queries
  • Minimize nested queries with a small number of tables
  • SUBQUERY versus JOIN
  • select ename
    from employees emp
    where exists (select price from prices
                  where prod_id = emp.prod_id and prices.class = 'j');
    >05:00 minutes (terminated with prejudice)
  • select ename
    from prices pr, employees emp
    where pr.prod_id = emp.prod_id and pr.class = 'j';
    01:40
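The subquery and the join are not always interchangeable: the join returns one row per matching PRICES row, while EXISTS returns each employee at most once. A sketch with hypothetical WORK.EMPLOYEES and WORK.PRICES tables in which product 1 has two class 'j' prices, so the join returns ANN twice; add DISTINCT (or a separate dedup step) to the join when that matters.

```sas
/* Hypothetical tables; product 1 has two class 'j' price rows. */
data work.employees;
  input ename $ prod_id;
  datalines;
ann 1
bob 2
;
run;

data work.prices;
  input prod_id class $ price;
  datalines;
1 j 9.5
1 j 8.0
2 k 4.0
;
run;

proc sql;
  /* Correlated subquery: at most one output row per employee. */
  create table work.by_subquery as
    select emp.ename
    from work.employees emp
    where exists (select p.price from work.prices p
                  where p.prod_id = emp.prod_id and p.class = 'j');

  /* Join: one output row per matching PRICES row. */
  create table work.by_join as
    select emp.ename
    from work.prices pr, work.employees emp
    where pr.prod_id = emp.prod_id and pr.class = 'j';
quit;
```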
PROC SQL
OPTIMIZING PROC SQL
• TABLE order
  • The order of tables within the SQL statement affects performance
  • List the table with the greatest number of rows first (leftmost) in the query
  • SQL processing scans the last table listed and merges all of its rows
• Assuming TAB1 has 20,000 rows and TAB2 has 10 rows
  • select count(*) from tab2, tab1
    0.61
  • select count(*) from tab1, tab2
    0.52
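A sketch for timing the two table orders on your own system. WORK.TAB1 and WORK.TAB2 are hypothetical tables sized as on the slide; the PROC SQL STIMER option asks for timing per statement rather than for the whole procedure (it relies on the STIMER or FULLSTIMER system option), and _METHOD shows the plan. Results will depend on the SAS release and host.

```sas
options fullstimer;

/* Hypothetical tables sized as on the slide. */
data work.tab1;
  do id = 1 to 20000;
    output;
  end;
run;

data work.tab2;
  do id = 1 to 10;
    output;
  end;
run;

/* STIMER: timing note after each statement; _METHOD: query plan. */
proc sql noprint stimer _method;
  select count(*) into :n_small_first from work.tab2, work.tab1;
  select count(*) into :n_large_first from work.tab1, work.tab2;
quit;

%put NOTE: small first=&n_small_first large first=&n_large_first;
```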
PROC SQL
OPTIMIZING PROC SQL
• EXISTS versus DISTINCT for table join
  • select distinct date, name
    from sales s, employee emp
    where s.prod_id = emp.prod_id;
    > 7 minutes
  • select date, name
    from sales s
    where exists (select 'x' from employee emp
                  where emp.prod_id = s.prod_id);
    0:11 (including post distinct step)
SAS 9.3 SQL Procedure User's Guide
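The "post distinct step" can be a separate PROC SORT NODUPKEY after the EXISTS query. A sketch with hypothetical WORK.SALES and WORK.EMPLOYEE tables; it assumes NAME is a column of SALES, which the EXISTS form on the slide requires.

```sas
/* Hypothetical tables; SALES is assumed to carry NAME itself. */
data work.sales;
  input date prod_id name $;
  datalines;
1 10 ann
1 10 ann
2 20 bob
3 30 cat
;
run;

data work.employee;
  input prod_id;
  datalines;
10
20
20
;
run;

proc sql;
  /* EXISTS keeps each qualifying SALES row once, however many */
  /* EMPLOYEE rows match, so the join does not fan out.        */
  create table work.sales_emp as
    select date, name
    from work.sales s
    where exists (select 'x' from work.employee emp
                  where emp.prod_id = s.prod_id);
quit;

/* The post distinct step: remove remaining duplicate rows. */
proc sort data=work.sales_emp nodupkey;
  by date name;
run;
```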
SAS MACRO OPTIONS AND CONSIDERATIONS
• Use MLOGIC, MPRINT & SYMBOLGEN in the development phase
• Do NOT use MLOGIC, MPRINT & SYMBOLGEN in production
• Stored Compiled Macro Facility
  • Permanent SAS catalog
  • Protects intellectual property
  • Both AUTOCALL and SESSION macros are available
  • Override compiled macros with session instances or AUTOCALL semantics
• Minimize nesting of macro definitions
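A sketch of the development/production switch and of the stored compiled macro facility. MSTORED and SASMSTORE= enable stored compiled macros, and the STORE and SOURCE options on %MACRO compile the macro into the SASMACR catalog of that library and keep its source. The MACSTORE libref and its directory (created under WORK with DCREATE only so the sketch runs anywhere) are hypothetical; a real deployment would point at a permanent library.

```sas
/* Development: trace macro logic, generated code and resolution. */
options mprint mlogic symbolgen;

/* Production: turn the tracing back off. */
options nomprint nomlogic nosymbolgen;

/* Hypothetical MACSTORE library in a new subdirectory of WORK. */
%let macdir = %sysfunc(dcreate(macstore, %sysfunc(pathname(work))));
libname macstore "&macdir";
options mstored sasmstore=macstore;

/* STORE compiles into MACSTORE.SASMACR; SOURCE keeps the source. */
%macro hello / store source;
  %put NOTE: Hello from a stored compiled macro.;
%mend hello;

%hello
```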
SAS MACRO
NESTING MACRO INSTANCE
• Avoid nesting macros where possible
  • /* nested macro */
    %macro m1;
       %macro m2;
       %mend m2;
    %mend m1;
    02.81
  • %macro m1;
       <macro 1 code goes here>
    %mend m1;
    %macro m2;
       <macro 2 code goes here>
    %mend m2;
    02.45
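The usual reason nesting costs time is that an inner %MACRO definition is compiled again every time the outer macro runs. A sketch of the flat alternative, where M1 calls M2 instead of defining it; the M2_RAN macro variable is hypothetical and exists only to show that M2 ran.

```sas
/* Define each macro once, at the top level ... */
%macro m2;
  %global m2_ran;
  %let m2_ran = 1;
%mend m2;

%macro m1;
  /* ... and call M2 from M1 rather than defining it inside M1. */
  /* A definition nested here would be recompiled on every run. */
  %m2
%mend m1;

%m1
```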
SAS DATA STEP
A FEW EXAMPLES TO CONSIDER
• Missing values may perturb performance
  • "." is propagated across all calculations
  • total=t4+(x*b)+c*(abc);
    01:03 (63 seconds)
  • total=(x*b)+c*(abc) + t4;
    00:59
• Superior practice: check for "." before the expression
  • if <operand> ne . then do; <expression>; end;
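A sketch of the guard described above, using NMISS to test all operands at once. When arithmetic meets a missing value, SAS sets the result to missing and writes a "Missing values were generated" note to the log; skipping the expression avoids that work. The variables follow the slide, and the two input rows are hypothetical.

```sas
data work.calc;
  input t4 x b c abc;
  /* Evaluate only when every operand is present; otherwise set */
  /* TOTAL to missing explicitly instead of propagating "."     */
  /* through the arithmetic.                                    */
  if nmiss(t4, x, b, c, abc) = 0 then total = t4 + (x*b) + c*abc;
  else total = .;
  datalines;
1 2 3 4 5
. 2 3 4 5
;
run;
```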
SAS DATA STEP
A FEW EXAMPLES TO CONSIDER
• PROC FORMAT: user-defined formats associated with variables
  • Details in the Base SAS 9.3 Procedures Guide
  • Reference the format throughout the code; this simplifies logic and support
  • if educ = 0 then neweduc = "< 3 yrs old";
    else if educ = 1 then neweduc = "no school";
    else if educ = 2 then neweduc = "nursery school";
    10:54
  • proc format;
       value educf
          0 = "< 3 yrs old"
          1 = "no school"
          2 = "nursery school";
    run;
    … neweduc = put(educ, educf.); …
    10:32
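A runnable version of the PROC FORMAT lookup. The OTHER range ("unknown") is an added assumption so unexpected codes still get a label; the slide's format had only the three values. WORK.EDUC and its input values are hypothetical.

```sas
proc format;
  value educf
    0 = "< 3 yrs old"
    1 = "no school"
    2 = "nursery school"
    other = "unknown";   /* hypothetical catch-all label */
run;

data work.educ;
  input educ;
  length neweduc $ 14;
  /* One format lookup replaces the IF/ELSE-IF chain. */
  neweduc = put(educ, educf.);
  datalines;
0
2
7
;
run;
```

Because the labels live in one format, a new code only needs a new line in PROC FORMAT rather than another branch in every DATA step.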
SAS DATA STEP
A FEW EXAMPLES TO CONSIDER
• Using the IN operator versus OR conditions
  • OR: all the conditions are checked
  • IN: matching stops at the first occurrence
  • if x=8 or x=9 or x=23 or x=45 then do; end;
    01:04
  • if x in (8,9,23,45) then do; end;
    00:58
SAS USER FEEDBACK: "IN" VERSUS "OR" VALIDATION
• Thanks to Bruce Gilsen at the Federal Reserve for independent validation
  • Bruce's Optimization Validation
• 1,000,000 OBS, 100 VARIABLES with RANGE VALUES 1 to 100
  • Independent DATA steps, using IN versus OR
• IN    8.15 / 7.88 seconds (REAL / CPU)
• OR   21.75 / 21.73 seconds (REAL / CPU)

data two;
   set one;
   array vall (*) v1-v100;
   drop i;
   do i = 1 to 100;
      if vall(i) in (1 2 3 4 5 6 7 8 9 10 … 99)
         then vall(i) = vall(i) + 100;
   end;
run;

data two;
   set one;
   array vall (*) v1-v100;
   drop i;
   do i = 1 to 100;
      if vall(i)= 1 or vall(i) = 2 or vall(i) = 3 or vall(i) = 4
         … vall(i) = 99 then vall(i) = vall(i) + 1000;
   end;
run;
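The slides do not show how data set ONE was built. A hypothetical reconstruction matching the description (1,000,000 rows, V1-V100 holding integers 1 to 100), followed by a smaller IN versus OR pair using the values from the previous slide; with FULLSTIMER on, compare the real and CPU times of the last two steps. The seed and the NROWS macro variable are arbitrary choices; lower NROWS for a quick trial.

```sas
/* Hypothetical reconstruction of data set ONE. */
%let nrows = 1000000;   /* lower for a quick trial */

data work.one;
  call streaminit(2013);                       /* arbitrary seed */
  array vall (*) v1-v100;
  drop i row;
  do row = 1 to &nrows;
    do i = 1 to 100;
      vall(i) = ceil(100 * rand('uniform'));   /* integers 1 to 100 */
    end;
    output;
  end;
run;

options fullstimer;   /* report real and CPU time per step */

/* IN operator: count matching values across all 100 variables. */
data work.hits_in (keep=hits);
  set work.one end=last;
  array vall (*) v1-v100;
  do i = 1 to 100;
    if vall(i) in (8, 9, 23, 45) then hits + 1;
  end;
  if last then output;
run;

/* Equivalent chain of OR conditions. */
data work.hits_or (keep=hits);
  set work.one end=last;
  array vall (*) v1-v100;
  do i = 1 to 100;
    if vall(i) = 8 or vall(i) = 9 or vall(i) = 23 or vall(i) = 45
      then hits + 1;
  end;
  if last then output;
run;
```

Both steps count the same values, so any difference in the FULLSTIMER notes comes from how the condition is evaluated.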
CASE STUDY - SAS REVENUE OPTIMIZATION SOLUTION
• Big Data Introduction
• SAS Language Performance Tuning
  • SAS System Facilities
  • SQL, MACRO and DATA STEP examples
• Case Study - SAS Revenue Optimization Solution
  • History and Tuning Techniques
  • High Performance Revenue Optimization – GRID Environment
• SAS Emerging Big Data Technologies
SOLUTIONS ONDEMAND ADVANCED ANALYTICS LAB
• Over a petabyte of data, 400+ customers
• Customer Profiles
  • Variety of industry sectors, private as well as public
  • Multi-tier deployments: client, mid-tier, analytic tier and RDBMS
  • Daily and weekly ETL feed requirements
  • PROD, QA, DEV environments and data synchronization
  • Disparate analytic processing (batch) schedules
  • Backup and restore processing that minimizes performance impacts
  • 99.5% uptime service level agreements
CASE STUDY
SAS REVENUE OPTIMIZATION SOLUTION
• Problem Statement: 33 hours of processing time for one batch component using 30% of the projected data. Linear projection: approximately 110 hours, or 4½ days, of processing time.
• Requirement to fit the batch into a 40-hour window
• AIX 6.1+, Power7, 64-bit, attached to EMC SAN arrays
  • 7 CPUs, SMT=4, 128 GB RAM, 3700 IOPS, CPU 45%
  • Approximately 1.2 TB of data, target 1.6 TB primary warehouse
• Focus on the most significant issues and then repeat as new issues arise
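The projection on this slide scales linearly from the 30% sample. Written out and set against the 40-hour window, the same linear assumption implies that the batch had to run roughly 2.75 times faster at full volume:

```latex
T_{\text{full}} \approx \frac{33\ \text{h}}{0.30} = 110\ \text{h} \approx 4.6\ \text{days},
\qquad
\frac{T_{\text{full}}}{T_{\text{window}}} = \frac{110\ \text{h}}{40\ \text{h}} = 2.75
```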
• SAS WORK volume
  • Eight-way stripe with eight paths
• Warehouse
  • Fixed Tier 1 EMC storage; 80 x 100 GB disk arrays
  • Moved support directories off of the volume
Weekly Performance
• Parallel executions: 16 processes to 54 processes
• IO/sec: 8.5K to 15.3K
• CPU idle time: 42% to 13%
• Weekly batch time: 60 hours to 43 hours
• GEO_PRODS: 67 million to 92 million
SAS GRID
SAS REVENUE OPTIMIZATION SOLUTION
• Initial RO versions used SAS/CONNECT parallel processing
  • Single host deployments with concurrent analytics
  • Flat data warehouse structure, non-partitioned SAS tables
• SAS High Performance Revenue Optimization enhancements
  • SAS TK GRID architecture: distributed processing across grid nodes
  • SAS data partitions distributed across grid nodes
  • ETL processes, daily and weekly, to distribute data across partitions
  • Grid Captain to manage the processing and analytics across grid nodes
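The SAS/CONNECT parallel processing used by the initial RO versions follows the general pattern sketched below (often called MP CONNECT): several SAS sessions on one host, each running an RSUBMIT block asynchronously. It assumes SAS/CONNECT is licensed; the task names, the split of the 1 to 1,000,000 sum, and the PART1/PART2 macro variables are illustrative only, not the RO implementation.

```sas
/* Assumes SAS/CONNECT is licensed. SASCMD="!sascmd" starts each */
/* task as another SAS session on this host (MP CONNECT).        */
options sascmd="!sascmd";

signon task1;
signon task2;

/* WAIT=NO returns control immediately, so both tasks run at once. */
rsubmit task1 wait=no;
  data _null_;
    do i = 1 to 500000;
      part + i;                    /* running sum of this task's range */
    end;
    call symputx('part', part);
  run;
  %sysrput part1=&part;            /* copy the result to the client */
endrsubmit;

rsubmit task2 wait=no;
  data _null_;
    do i = 500001 to 1000000;
      part + i;
    end;
    call symputx('part', part);
  run;
  %sysrput part2=&part;
endrsubmit;

/* Wait until both tasks finish, then close the sessions. */
waitfor _all_ task1 task2;
signoff _all_;

%put NOTE: Combined total is %sysevalf(&part1 + &part2);
```

The grid architecture on the following slides replaces these hand-managed sessions with data partitions and work distributed across grid nodes.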
SAS GRID
SAS REVENUE OPTIMIZATION NON GRID
SAS GRID
SAS HIGH PERFORMANCE REVENUE OPTIMIZATION
SAS GRID & EMERGING TECHNOLOGIES
• SAS Grid Manager: distributed SAS processing
  • Scheduling, workload balancing, high availability & management
• SAS In-Database: queries, aggregations, analytics within the DBMS
  • 9.2M3: DB2, EDW & Oracle; 9.3: Netezza
• Hadoop
  • Scalable, fault-tolerant, distributed file system
  • SAS integration includes access, analysis and management
• SAS In-Memory Analytics
  • Distributed analytics, from descriptive and inferential to visualization
  • Visual Analytics and Visual Analytics HPA
SAS TECHNICAL SUPPORT
SAS BIG-DATA HOME PAGE
SUMMARY CONSIDERATIONS – PERFORMANCE IMPROVEMENT IS A CONTINUAL PROCESS
• Focus on the most severe hotspots within the SAS program and operating environment
• Use indexes where appropriate
• Exploit SAS OPTIONS tuning
• Consider SAS Grid products
• Evaluate SAS Visual Analytics and Visual Analytics HPA
SAS SOLUTION ON DEMAND
ADVANCED ANALYTICS LAB
GARY.CIAMPA@SAS.COM
www.SAS.com