Forthcoming Changes in SAS
Paul Kent
VP SAS Platform Research & Development
<kent@sas.com>
Copyright © 2004, SAS Institute Inc. All rights reserved.

Where do I come from?
New Hill, North Carolina: Y’all
Johannesburg, South Africa: Julle
Fareham, England: ???

R & D :: Loyal Employees

R & D groups, and where I come from
Platform
Clients
Solutions
• With Analytics

What do we programmers do?
Gather Data
Organise Data
Arrange Data for consumption
Facilitate said consumption
Create understanding of Data
Promote understanding of said Data
Value

Who do we programmers do it for?
Audience Continuum: from Large% Information Consumers to Small% Domain Experts
Power User, Business Analyst, Info Tech
Web Report Viewing, Web Reporting, Power Reporting, Analytic Reporting
Value
Information Delivery Framework

Forthcoming Improvements in the SAS Foundation
ODS (and the new ODS statistical graphics)
SAS Database Storage capabilities
The Data Step and Proc SQL
Grid Computing Capabilities
Bits and Pieces

ODS Statistical Graphics

Survival Plot Using PROC LIFETEST in SAS 8
J. Zhou, NESUG 2002
Three-page SAS program with macros
Use GPLOT and GREPLAY for graphics
Statistical Metadata
Overlaid Curves
Statistical Graphics
Essential for modern data analysis
Difficult to create in SAS prior to SAS 9
• Context lost when statistical procedure terminates
• Programmer must recreate context, metadata
Statistical procedures should automatically create graphics
Follow the 80-20 rule – 20% of these might need further tweaking, but for the most part…

Life Is Easier in SAS 9 …
    ods graphics on;
    ods html file="lifetest.htm";
    proc lifetest data=surv;
       time surv*censor(1);
       survival plots=(survival hwb);
       strata trt;
       id patient;
    run;
    ods html close;
    ods graphics off;

LIFETEST Procedure – Survival Plot

LIFETEST Procedure – HWB plot

Usage of ODS Statistical Graphics in SAS 9
Experimental in 30 SAS/STAT and SAS/ETS procedures - SAS 9.1
Automates creation of commonly used graphical displays for a particular analysis
Production in SAS 9.2

PROC GLM

PROC GLM (ANCOVA)

GAM Procedure

HPF Procedure

KDE Procedure

LOGISTIC Procedure

MIXED Procedure

PHREG Procedure

PLS Procedure

PRINCOMP Procedure
REG Procedure

TIMESERIES Procedure

UCM Procedure

Integration with ODS Styles
Over 30 different styles
New style elements for statistical graphics
• Fitted line
• Confidence lines and bands
• Prediction Lines
• Outliers
• Classification groups

Style Demonstration
    ods html file="robustreg.htm" style=journal;
    ods graphics on;
    title "Journal Style";
    proc robustreg data=mydata plot=all;
       model y = x1 x2 x3;
    run;
    ods html close;
Styles shown: Journal, Analysis, Default, Statistical
(only Summary Statistics and Residual Histogram output shown)

Summary
Goal is to automate creation of graphics by statistical procedures
• Minimum work for user
• Maximum built-in functionality
Experimental in SAS 9.1
Production in SAS 9.2

SAS Transactional Storage (aka SAS Database Capabilities)
Demo Time
1. Color_table
• Remember to start your TableServer
2. Customers
• Remember to start your AppServer (tomcat5)

SAS Transactional Storage (aka SAS Database Capabilities)
A more traditional Database Capability
From SAS (not Oracle, IBM, or Microsoft)
Based on OpenSource “Firebird”
Real Datatypes – INT, MONEY, VARCHAR
Real Connectors – JDBC, ODBC, SAS Libname
Real Transactions – Rollback and Commit
MultiUser Server

What’s New in SAS Grid Automation
Cheryl Doninger, R&D Director, Grid Development
Roger Thompson, Relationship Manager
Merry Rabb, Product Manager, Grid

Grid Computing Market Size & Growth
Rapid Adoption of Grid Computing Based on Benefits

Grid Adoption is Increasing
2/3 of firms surveyed are using or considering grid technology
A high percentage of firms using analytical applications are considering grid

Benefits of Grid Computing
Faster results
More executions – more data
Time to recover from errors
Better use of resources
Virtualize resources
Incremental IT spend

Types of Applications Suitable for Grid
Long running
Many replicate runs of same fundamental task
• simulation (what if analysis)
• optimization (testing lots of scenarios)
• BY GROUP processing
• data segmentation
Independent tasks running against large data sources
• scoring – risk analysis
• multiple procedures and data steps

SAS Grid Strategy
Infrastructure benefits SAS applications
• large data / complex algorithms
Focus areas
• Development
• Run-time
• System management
Incremental Releases

SAS Grid Roadmap
Phase I
SAS 8.2 functionality
• %Distribute
• SAS/CONNECT
• SAS log

SAS Grid Success Stories
Texas Tech University
Statistics Canada
Large Pharmaceutical Company

SAS Grid Roadmap
Phase II
SAS 9.1.3 Q3/2005 functionality
• smarter engines for SAS IDEs
• SAS/Platform integration
• SASMC monitoring

Business Analytics - Enterprise Miner on SMP

Business Analytics - Enterprise Miner on Grid

Data Integration – ETL Studio on SMP/Grid
Business Intelligence – Enabled on SMP/Grid
Web Services
SAS Stored Process
SAS Program
ETL Studio
Enterprise Miner

Grid Manager Plugin – job view

Grid Manager Plugin – host view

SAS 9 Grid Computing Components
SAS 9 Grid Computing: NEW September 2005
Grid Manager Plug-in: Grid Monitoring, Grid Management, Job Termination
Platform Suite for SAS: Dynamic Load Balancing; Job, Queue & Host Management
SAS Connect: Multi-Processor SAS, Piping, Distribution, Session Spawning
SAS Applications: Enterprise Miner, Stored Processes, Data Integration, Grid Enabled Code Generation
Multiple Components Working Together to Provide Grid Computing

General Layout of a SAS Grid
Diagram labels: Client Machine; SAS ETL; SAS EM; SAS Foundation; SAS Grid of Grid Nodes, each running LSF; Metadata Server Machine; Grid Mgr plugin; Grid Control Machine; Platform Suite for SAS

Grid Work Flow
Metadata Server:
    session   resource   sascmd                 wl options
    -------   --------   --------------------   ----------
    p1        SASMain    sas -noobjectserver
LSF Cluster File:
    Node1 ! ! 1 () (SASMain)
    Node2 ! ! 1 () ()
    Node3 ! ! 1 () (SASMain)
    …
Connect Client:
    grdsvc_enable(p1, "resource=SASMain");
    signon p1;
SAS Metadata: SASMain – Server Context, Platform Server Component, sas -noobjectserver …
Diagram labels: Workspace Server, SAS Servers, LSF, Node1, Node2, Node3, ETL Studio, Enterprise Miner, SAS MC

Partitioning the Grid
Metadata Server:
    session   resource   sascmd                 wl options
    -------   --------   --------------------   ----------
    p1        SASMain    sas -noobjectserver    ETL
LSF Cluster File:
    Node1 ! ! 1 () (SASMain,EM)
    Node2 ! ! 1 () (SASMain,EM,ETL)
    Node3 ! ! 1 () (SASMain, ETL)
    …
Connect Client:
    grdsvc_enable(p1, "resource=SASMain, workload=ETL");
    signon p1;
SAS Metadata: SASMain – Server Context, Platform Server Component, sas -noobjectserver …, EM, ETL
Diagram labels: Workspace Server, SAS Servers, LSF, Node1, Node2, Node3, EM grid, ETL grid, ETL Studio, Enterprise Miner, SAS MC

Grid Provides: Speed and Efficiency

Analytics are working, so people…
Build more models
• For successively refined segments of customers
Use more data in those models
Integrate the results into operational systems
• <near real time>
A SAS9.2 datastep movie

Implications
More multi-thread enablement within SAS
Yes, even the DATA STEP
Saved Programs
Multi Threaded Server Capabilities
• Same model, parallel data for throughput
• Many models, same data – one off scores in operational systems
Models Management can deploy models to “score servers” without restarting them

Bits and Pieces
Reverse Engineer SAS jobs
Checkpoint and Restart SAS jobs
Encode (and protect) your SAS jobs
ZIP functions
CRC
…

Protect your IP
    PROC SCRAMBLE file='myfile.sas' outfile='secret.sas' <expire=> <site=> … ;
Send secret.sas to your customers
    %include 'secret.sas';
• Implies nosource; your macros can reset NOMPRINT…

Checkpoint/Restart and Parallelization Features in the Core Supervisor
Rick Langston, Core Systems Department

Checkpoint/Restart
Craig R.’s request as per user community
Job fails – want to restart where it left off
ETL Studio also wanted a restart facility
A simple solution
Record a checkpoint number, save it in WORK
If restarting, skip PROC / DATA steps to there
Tokenize everything
Execute all global statements

To set up for checkpointing
Use NOWORKINIT, NOWORKTERM
Have WORK refer to a permanent directory
Use the CHECKPOINT option

Subsequent restarting
Again use NOWORKINIT, NOWORKTERM
Again use WORK to the permanent directory
Use the RESTART option
Job will restart as of the last successful step

Is this what users want?
We can’t do this without user being proactive
data temp / set temp issues
skipped steps may need to be executed
Output files (flat files – DISP=MOD, databases…)

EXECUTE_ALWAYS
    CHECKPOINT / EXECUTE_ALWAYS;
Use it for a step that must be executed
For example, SYMPUT and CALL EXECUTE

Example
Using options debug='checkpoint-implicit';
Option names still to be decided

    data temp1; x=1; run;
    data temp2; x=2; run;
    data temp3; x=3; run;
    data _null_;
       if "&sysparm."="1" then abort abend 999;
    run;
    data temp4; x=4; run;

Invoke once with checkpoint-implicit
Then reinvoke with restart-implicit

Additional info
Planned for 9.2
Option names still being decided
Wanting additional input
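The skip-to-checkpoint idea from "A simple solution" can be illustrated outside SAS. What follows is a minimal Python simulation under stated assumptions, not the SAS supervisor's implementation: the names `run_job`, `build_job`, `StepFailed`, the JSON state file, and the three item kinds (`global`, `step`, `always`) are all hypothetical and chosen only for this sketch. It mirrors the five-step example above: the first run aborts in the fourth DATA step, and the restart re-executes the global statement, skips the steps already recorded as done, and resumes at the step that failed.

```python
import json
import os
import tempfile


class StepFailed(Exception):
    """Raised by a step to simulate a failure such as ABORT ABEND."""


def run_job(items, state_path, restart=False):
    """Run items in order and record the number of the last item that succeeded.

    items: list of (kind, name, action) tuples, where kind is
      "global" - a global statement, always executed, even on restart
      "step"   - a DATA/PROC step, skipped on restart if already completed
      "always" - a step flagged to execute on every run (hypothetical analogue
                 of EXECUTE_ALWAYS)
    Returns (names_executed, succeeded).
    """
    last_done = 0
    if restart and os.path.exists(state_path):
        with open(state_path) as fh:
            last_done = json.load(fh)["last_done"]

    executed = []
    for number, (kind, name, action) in enumerate(items, start=1):
        # Only ordinary steps at or before the checkpoint are skipped.
        if kind == "step" and number <= last_done:
            continue
        try:
            action()
        except StepFailed:
            # The state file still names the previous successful item.
            return executed, False
        executed.append(name)
        if number > last_done:
            last_done = number
            with open(state_path, "w") as fh:
                json.dump({"last_done": last_done}, fh)
    return executed, True


def build_job(sysparm):
    """Build a job shaped like the slide example; sysparm "1" forces a failure."""

    def noop():
        pass

    def maybe_abort():
        if sysparm == "1":
            raise StepFailed("abort abend 999")

    return [
        ("global", "options", noop),
        ("step", "temp1", noop),
        ("step", "temp2", noop),
        ("step", "temp3", noop),
        ("step", "maybe_abort", maybe_abort),
        ("step", "temp4", noop),
    ]


def demo():
    """First run fails in the fifth item; the restart resumes from there."""
    state_path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    first = run_job(build_job("1"), state_path)
    second = run_job(build_job("0"), state_path, restart=True)
    return first, second


if __name__ == "__main__":
    for result in demo():
        print(result)
```

The sketch also makes the "Is this what users want?" concerns concrete: a skipped step leaves no trace in the restarted run, so anything it would have set up in memory has to come from a global statement or from a step marked to execute always.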
Parallelization Efforts
Reading in arbitrary SAS code
Producing metadata in comments
This could be post-processed by ETL Studio
This could be post-processed by Grid Computing

Parallelization Efforts
Researching so far
Hooks in dependency opens
Catalogs, flat files, SAS data sets, etc.
Emitting info in comments
Example of use

Exposure to User
New option, such as DEPMETA=fileref
SAS program with comments written to this file

Questions/comments?

Ideas for the Future!
How can the software learn?
So the user doesn’t have to learn about the software; they can learn the business!
Some future ETL Studio job
• Remembers data volumes from last week’s run
• Uses that memory to choose a better strategy

Your Turn!!
You tell me next time SAS forgets something it should have remembered
And why remembering that would help SAS improve next time
<Paul.Kent@sas.com>
Thanks for listening!