Forthcoming Changes in SAS
Paul Kent
VP SAS Platform Research & Development
<kent@sas.com>
Copyright © 2004, SAS Institute Inc. All rights reserved.
Where do I come from?
New Hill, North Carolina: Y’all
Johannesburg, South Africa: Julle
Fareham, England: ???
R & D :: Loyal Employees
R & D groups, and where I come from
 Platform
 Clients
 Solutions
• With Analytics
What do we programmers do?
 Gather Data
 Organise Data
 Arrange Data for consumption
 Facilitate said consumption
 Create understanding of Data
 Promote understanding of said Data
Value
Who do we programmers do it for?
[Figure: Audience Continuum; Information Delivery Framework]
Audience: Information Consumers (large %) through Domain Experts (small %)
Roles: Power User, Business Analyst, Info Tech
Reporting: Web Report Viewing, Web Reporting, Power Reporting, Analytic Reporting
Axis: Value
Forthcoming Improvements in the SAS Foundation
 ODS (and the new ODS statistical graphics)
 SAS Database Storage capabilities
 The Data Step and Proc SQL
 Grid Computing Capabilities
 Bits and Pieces
ODS Statistical Graphics
Survival Plot Using PROC LIFETEST in SAS 8
 J. Zhou, NESUG 2002
 Three-page SAS program with macros
 Use GPLOT and GREPLAY for graphics
[Figure callouts: Statistical Metadata, Overlaid Curves]
Statistical Graphics
 Essential for modern data analysis
 Difficult to create in SAS prior to SAS 9
• Context lost when statistical procedure terminates
• Programmer must recreate context, metadata
 Statistical procedures should automatically create graphics
 Follow the 80-20 rule – 20% of these might need further tweaking, but for the most part…
Life Is Easier in SAS 9 …
ods graphics on;                     /* enable ODS statistical graphics */
ods html file="lifetest.htm";        /* send output to an HTML file */
proc lifetest data=surv;
   time surv*censor(1);              /* censor=1 marks a censored time */
   survival plots=(survival hwb);    /* survival curve and Hall-Wellner bands */
   strata trt;
   id patient;
run;
ods html close;
ods graphics off;
[Figure: LIFETEST Procedure – Survival Plot]
[Figure: LIFETEST Procedure – HWB plot]
Usage of ODS Statistical Graphics in SAS 9
 Experimental in 30 SAS/STAT and SAS/ETS procedures in SAS 9.1
 Automates creation of commonly used graphical displays for a particular analysis
 Production in SAS 9.2
[Figure slides: sample graphs from PROC GLM, PROC GLM (ANCOVA), GAM, HPF, KDE (two slides), LOGISTIC, MIXED (two slides), PHREG, PLS, PRINCOMP, REG, TIMESERIES, and UCM]
Integration with ODS Styles
 Over 30 different styles
 New style elements for statistical graphics
• Fitted line
• Confidence lines and bands
• Prediction Lines
• Outliers
• Classification groups
Style Demonstration
ods html file="robustreg.htm" style=journal;
ods graphics on;
title "Journal Style";
proc robustreg data=mydata plot=all;
   model y = x1 x2 x3;
run;
ods html close;
[Figure: output shown in the Journal, Analysis, Default, and Statistical styles (only Summary Statistics and Residual Histogram output shown)]
Summary
 Goal is to automate creation of graphics by statistical procedures
• Minimum work for user
• Maximum built-in functionality
 Experimental in SAS 9.1
 Production in SAS 9.2
SAS Transactional Storage
(aka SAS Database Capabilities)
 Demo Time
 1. Color_table
• Remember to start your TableServer
 2. Customers
• Remember to start your AppServer (tomcat5)
SAS Transactional Storage
(aka SAS Database Capabilities)
 A more traditional Database Capability
 From SAS (not Oracle, IBM, or Microsoft)
 Based on open-source “Firebird”
 Real Datatypes – INT, MONEY, VARCHAR
 Real Connectors – JDBC, ODBC, SAS Libname
 Real Transactions – Rollback and Commit
 MultiUser Server
What’s New in SAS Grid Automation
Cheryl Doninger, R&D Director, Grid Development
Roger Thompson, Relationship Manager
Merry Rabb, Product Manager, Grid
Grid Computing Market Size & Growth
Rapid Adoption of Grid Computing Based on Benefits
Grid Adoption is Increasing
2/3 of firms surveyed are using or considering grid technology
A high percentage of firms using analytical applications are considering grid
Benefits of Grid Computing
 Faster results
 More executions – more data
 Time to recover from errors
 Better use of resources
 Virtualize resources
 Incremental IT spend
Types of Applications Suitable for Grid
 Long running
 Many replicate runs of same fundamental task
• simulation (what if analysis)
• optimization (testing lots of scenarios)
• BY GROUP processing
• data segmentation
 Independent tasks running against large data sources
• scoring – risk analysis
• multiple procedures and data steps
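The "many replicate runs" pattern listed above can be illustrated outside SAS. The Python sketch below is only an illustration, not SAS Grid code: it splits a small table by a BY variable and runs the same task once per group, which is the independence property that makes such jobs suitable for a grid. The sample rows, the `summarize` function, and the use of threads in place of grid nodes are all assumptions made for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: (by_group, value) rows standing in for a SAS data set.
ROWS = [("east", 10), ("west", 4), ("east", 6), ("north", 7), ("west", 1)]

def split_by_group(rows):
    """Partition rows by their BY variable; each partition is an independent task."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return dict(groups)

def summarize(values):
    """Stand-in for the per-group analysis (here: count and mean)."""
    return {"n": len(values), "mean": sum(values) / len(values)}

def run_by_groups(rows, workers=3):
    """Run the same task once per BY group; threads stand in for grid nodes."""
    groups = split_by_group(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {key: pool.submit(summarize, vals) for key, vals in groups.items()}
        return {key: fut.result() for key, fut in futures.items()}

results = run_by_groups(ROWS)
print(results["east"])  # {'n': 2, 'mean': 8.0}
```

Because no group needs another group's result, the work can be spread over as many workers as there are groups.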
SAS Grid Strategy
 Infrastructure benefits SAS applications
• large data / complex algorithms
 Focus areas
• Development
• Run-time
• System management
 Incremental Releases
SAS Grid Roadmap
Phase I
 SAS 8.2 functionality
• %Distribute
• SAS/CONNECT
• SAS log
SAS Grid Success Stories
Texas Tech University
Statistics Canada
Large Pharmaceutical Company
SAS Grid Roadmap
Phase II
 SAS 9.1.3 Q3/2005 functionality
• smarter engines for SAS IDEs
• SAS/Platform integration
• SASMC monitoring
[Figure: Business Analytics - Enterprise Miner on SMP]
[Figure: Business Analytics - Enterprise Miner on Grid]
[Figure: Data Integration – ETL Studio on SMP/Grid (two slides)]
Business Intelligence – Enabled on SMP/Grid
[Figure labels: Web Services, SAS Stored Process, SAS Program, ETL Studio, Enterprise Miner]
[Figure: Grid Manager Plugin – job view]
[Figure: Grid Manager Plugin – host view]
SAS 9 Grid Computing Components
SAS 9 Grid Computing (NEW September 2005)
Grid Manager Plug-in
• Grid Monitoring
• Grid Management
• Job Termination
Platform Suite for SAS
• Dynamic Load Balancing
• Job, Queue & Host Management
Multi-Processor SAS
SAS Connect
• Piping
• Distribution
• Session Spawning
SAS Applications
 Enterprise Miner
 Stored Processes
 Data Integration
• Grid Enabled Code Generation
Multiple Components Working Together to Provide Grid Computing
General Layout of a SAS Grid
[Figure labels: SAS Grid; Grid Node (LSF, SAS ETL, SAS EM, SAS Foundation); Grid Node (LSF); …; Client Machine; Metadata Server Machine; Grid Mgr plugin; Grid Control Machine; Platform Suite for SAS]
Grid Work Flow
grdsvc_enable(p1, "resource=SASMain");
signon p1;
session  resource  sascmd                wl options
p1       SASMain   sas -noobjectserver
LSF Cluster File:
Node1 ! ! 1 () (SASMain)
Node2 ! ! 1 () ()
Node3 ! ! 1 () (SASMain)
…
SAS Metadata:
SASMain – Server Context
Platform Server Component
sas -noobjectserver
…
[Figure labels: Metadata Server, Workspace Server, Connect Client, LSF, SAS Servers, Node1, Node2, Node3, ETL Studio, Enterprise Miner, SAS MC]
Partitioning the Grid
grdsvc_enable(p1, "resource=SASMain, workload=ETL");
signon p1;
session  resource  sascmd                wl options
p1       SASMain   sas -noobjectserver   ETL
LSF Cluster File:
Node1 ! ! 1 () (SASMain,EM)
Node2 ! ! 1 () (SASMain,EM,ETL)
Node3 ! ! 1 () (SASMain,ETL)
…
SAS Metadata:
SASMain – Server Context
Platform Server Component
sas -noobjectserver
…
EM, ETL
[Figure labels: Metadata Server, Workspace Server, Connect Client, LSF, SAS Servers, Node1, Node2, Node3, EM grid, ETL grid, ETL Studio, Enterprise Miner, SAS MC]
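The partitioning on this slide can be read as matching a session's requested resource and workload against the resource names listed for each node in the LSF cluster file. The Python sketch below illustrates only that matching idea; it is not how LSF schedules work, and the helper name `eligible_nodes` is hypothetical. The node-to-resource table is copied from the cluster file excerpt on the slide.

```python
# Resource names per node, copied from the slide's LSF cluster file excerpt.
CLUSTER = {
    "Node1": {"SASMain", "EM"},
    "Node2": {"SASMain", "EM", "ETL"},
    "Node3": {"SASMain", "ETL"},
}

def eligible_nodes(cluster, resource, workload=None):
    """Return nodes that list the requested resource and, if given, the workload.

    Hypothetical helper: it illustrates set matching only, not LSF scheduling.
    """
    required = {resource} if workload is None else {resource, workload}
    return sorted(node for node, names in cluster.items() if required <= names)

etl_nodes = eligible_nodes(CLUSTER, "SASMain", workload="ETL")
em_nodes = eligible_nodes(CLUSTER, "SASMain", workload="EM")
any_nodes = eligible_nodes(CLUSTER, "SASMain")
print(etl_nodes)  # ['Node2', 'Node3']
print(em_nodes)   # ['Node1', 'Node2']
```

Under this reading, Node2 belongs to both partitions, so the "EM grid" and the "ETL grid" overlap on one machine.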
Grid Provides: Speed and Efficiency
Analytics are working, so people…
 Build more models
• For successively refined segments of customers
 Use more data in those models
 Integrate the results into operational systems
• <near real time>
 A SAS 9.2 DATA step movie
Implications
 More multithreading enablement within SAS
 Yes, even the DATA STEP
 Saved Programs
 Multi Threaded Server Capabilities
• Same model, parallel data for throughput
• Many models, same data – one-off scores in operational systems
 Models Management can deploy models to “score servers” without restarting them
Bits and Pieces
 Reverse Engineer SAS jobs
 Checkpoint and Restart SAS jobs
 Encode (and protect) your SAS jobs
 ZIP functions
 CRC …
Protect your IP
 PROC SCRAMBLE
     file='myfile.sas'
     outfile='secret.sas'
     <expire=> <site=> …
     ;
 Send secret.sas to your customers
 %include 'secret.sas';
• Implies nosource; your macros can reset NOMPRINT…
Checkpoint/Restart and Parallelization Features in the Core Supervisor
Rick Langston, Core Systems Department
Checkpoint/Restart
 Craig R.’s request, on behalf of the user community
 Job fails – want to restart where it left off
 ETL Studio also wanted a restart facility
A simple solution
 Record a checkpoint number, save it in WORK
 If restarting, skip PROC / DATA steps to there
 Tokenize everything
 Execute all global statements
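The points above can be sketched as a small Python model. This is an illustration under stated assumptions, not the SAS supervisor's implementation: steps are modelled as Python functions, a JSON file in a directory stands in for the checkpoint number saved in WORK, and the names `run_job` and `checkpoint.json` are hypothetical.

```python
import json
import os
import tempfile

def run_job(steps, work_dir, restart=False):
    """Run steps in order, recording the last completed step number in work_dir.

    steps: list of (kind, func) pairs where kind is "global" or "step".
    On restart, numbered steps up to the checkpoint are skipped, while
    "global" entries (standing in for global statements) always execute.
    """
    ckpt_path = os.path.join(work_dir, "checkpoint.json")  # hypothetical file name
    last_done = 0
    if restart and os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            last_done = json.load(f)["last_done"]

    executed = []
    step_no = 0
    for kind, func in steps:
        if kind == "global":
            func()
            executed.append(func.__name__)
            continue
        step_no += 1
        if step_no <= last_done:
            continue  # completed in an earlier run: skip it
        func()  # may raise; the checkpoint then still names the prior step
        executed.append(func.__name__)
        with open(ckpt_path, "w") as f:
            json.dump({"last_done": step_no}, f)
    return executed

# Demo: step2 fails on the first run and succeeds on the restart.
state = {"fail": True}

def set_options():
    pass

def step1():
    pass

def step2():
    if state["fail"]:
        raise RuntimeError("simulated failure")

def step3():
    pass

steps = [("global", set_options), ("step", step1), ("step", step2), ("step", step3)]
work = tempfile.mkdtemp()  # stands in for a permanent WORK directory
try:
    run_job(steps, work)
except RuntimeError:
    pass
state["fail"] = False
rerun = run_job(steps, work, restart=True)
print(rerun)  # ['set_options', 'step2', 'step3']
```

On the restart, the global entry runs again, step1 is skipped because the checkpoint already covers it, and execution resumes at the step that failed.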
To set up for checkpointing
 Use NOWORKINIT, NOWORKTERM
 Have WORK refer to a permanent directory
 Use the CHECKPOINT option
Subsequent restarting
 Again use NOWORKINIT, NOWORKTERM
 Again point WORK to the permanent directory
 Use the RESTART option
 Job will restart as of the last successful step
Is this what users want?
 We can’t do this without the user being proactive
 data temp / set temp issues
 skipped steps may need to be executed
 Output files (flat files – DISP=MOD, databases…)
EXECUTE_ALWAYS
 CHECKPOINT / EXECUTE_ALWAYS;
 Use it for a step that must be executed
 For example, SYMPUT and CALL EXECUTE
Example
 Using options debug='checkpoint-implicit';
 Option names still to be decided
data temp1; x=1; run;
data temp2; x=2; run;
data temp3; x=3; run;
data _null_;
   /* abort here when the job is invoked with -sysparm 1 */
   if "&sysparm."="1" then abort abend 999;
run;
data temp4; x=4; run;
 Invoke once with checkpoint-implicit
 Then reinvoke with restart-implicit
Additional info
 Planned for 9.2
 Option names still being decided
 Additional input wanted
Parallelization Efforts
 Reading in arbitrary SAS code
 Producing metadata in comments
 This could be post-processed by ETL Studio
 This could be post-processed by Grid Computing
Parallelization Efforts
 Researching so far
 Hooks in dependency opens
 Catalogs, flat files, SAS data sets, etc.
 Emitting info in comments
 Example of use
Exposure to User
 New option, such as DEPMETA=fileref
 SAS program with comments written to this file
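To show how such a file might be post-processed, here is a Python sketch. The slides do not give the comment layout, so the `/* DEPMETA step=... reads=... writes=... */` format below is invented purely for illustration, as are the function names. The sketch parses the comments and groups steps into "waves" whose members do not depend on one another and could therefore run in parallel.

```python
import re

# Hypothetical annotated program: the comment layout is invented for this sketch.
ANNOTATED = """
/* DEPMETA step=1 reads= writes=work.a */
data a; x=1; run;
/* DEPMETA step=2 reads= writes=work.b */
data b; y=2; run;
/* DEPMETA step=3 reads=work.a,work.b writes=work.c */
data c; merge a b; run;
"""

PATTERN = re.compile(r"/\* DEPMETA step=(\d+) reads=(\S*) writes=(\S*) \*/")

def _names(field):
    """Split a comma-separated field into a set, ignoring empty entries."""
    return {name for name in field.split(",") if name}

def parse_steps(text):
    """Extract {step number: (reads, writes)} from the metadata comments."""
    return {int(num): (_names(reads), _names(writes))
            for num, reads, writes in PATTERN.findall(text)}

def parallel_waves(steps):
    """Group steps into waves; every input of a step comes from an earlier wave.

    Simplification: each data set is assumed to be written by at most one step.
    """
    produced_by = {name: n for n, (_, writes) in steps.items() for name in writes}
    done, waves = set(), []
    while len(done) < len(steps):
        wave = [n for n, (reads, _) in sorted(steps.items())
                if n not in done
                and all(r not in produced_by or produced_by[r] in done for r in reads)]
        if not wave:
            raise ValueError("cyclic dependency between steps")
        waves.append(wave)
        done.update(wave)
    return waves

waves = parallel_waves(parse_steps(ANNOTATED))
print(waves)  # [[1, 2], [3]]
```

Steps 1 and 2 read nothing produced elsewhere, so they form the first wave; step 3 waits for both.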
Questions/comments?
Ideas for the Future!
 How can the software learn?
 So the user doesn’t have to learn about the software; they can learn the business!
 Some future ETL Studio job
• Remembers data volumes from last week’s run
• Uses that memory to choose a better strategy
Your Turn!!
 You tell me the next time SAS forgets something it should have remembered
 And why remembering that would help SAS improve next time
<Paul.Kent@sas.com>
Thanks for listening!