Optimizing SAS System Performance: A Platform Perspective
Patrick McDonald
Scryer Analytics, LLC
June 3, 2010
Copyright © 2010, Scryer Analytics, LLC. All rights reserved.

Presentation Overview
After this presentation you will know:
• How your SAS code interacts with the hardware it runs on.
• The different hardware configurations SAS may run on in your organization.
• How to help your IT organization diagnose and correct performance problems.
You probably won't gain:
• Any new SAS programming tips.
• More than a very brief overview of efficient programming techniques.

An Easy Question
What does this program do?

    proc sql;
      connect to db2 (database=mydatabase);   /* connects to DB2 */
      create table Table1 as                  /* SAS table of db2table */
        select * from connection to db2
          (select * from db2table);
      disconnect from db2;                    /* disconnects from DB2 */
    quit;

    data View1 / view=View1;                  /* creates x as the previous y */
      set Table1;
      retain x;
      output;
      x = y;
    run;

    proc summary data=View1 nway;             /* calculates mean and N, outputs data */
      var _numeric_;
      class c1 c2 c3;
      output out=p.mymeans mean=M n=COUNT;
    run;

What controls system performance?
Resource relationships: the trade-offs between programmer time and computer resources (CPU time, memory, I/O, and storage), all bounded by the hardware.

Efficient Programming Practices
Writing efficient code:
• Necessary statements
• Passes through data
• Essential reads/writes
• Permanent SAS data
• Necessary procedures
• Sorting, duplicates, etc.
• SAS views
• DBMS optimization
Configuring/tuning options:
• Buffer allocation
• Memory allocation
• Multithreading

Resource Model: CPU, RAM, I/O, and Disk

CPU
What is a CPU?
• Number of sockets
• Number of chips
• Number of cores
• Number of co-processors
• Clock speed
• Etc.
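The configuring/tuning levers listed under Efficient Programming Practices (buffers, memory, multithreading) and the core count on the CPU slide correspond to SAS system options. The fragment below is a minimal sketch, not a recommended configuration: the SORTSIZE=256M, BUFNO=4, and MEMSIZE 2G values are illustrative assumptions to be sized together with your IT organization, and MEMSIZE appears only as a comment because it must be set when SAS starts (configuration file or command line), not in an OPTIONS statement.

```sas
/* Report how many processors SAS detects on this host. */
%put NOTE: SAS detects &sysncpu processor(s).;

/* Show the current values of the performance (buffer, threading) and memory options. */
proc options group=performance;
run;
proc options group=memory;
run;

/* Write real time, CPU time, and memory usage for every step to the log. */
options fullstimer;

/* Let threaded procedures (for example SORT and SUMMARY) use all detected cores. */
options threads cpucount=actual;

/* Illustrative values only (assumptions): memory available to sorts, */
/* and the number of I/O buffers used when a data set is opened.      */
options sortsize=256M bufno=4;

/* MEMSIZE cannot be changed in an OPTIONS statement; set it at startup, */
/* for example with a line such as the following in sasv9.cfg:           */
/*   -memsize 2G                                                         */
```

With FULLSTIMER on, the log notes after each step report real time, user and system CPU time, and memory, which gives an IT group concrete numbers to line up against host-level monitoring.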
CPU benchmarks: SPECfp (floating point), SPECint (integer)

RAM
• RAM per core
• RAM per session
• RAM for the operating system

I/O
Types of storage:
• Network Attached Storage (NAS)
• Local disk
• Storage Area Network (SAN)
The disk is the slowest part of the system: roughly 10-60 MB/s read/write speeds.
Throughput per session:
• 15-25 MB/s
• 50-75+ MB/s

A Little More About Storage
Storage options:
• HBAs
• LUNs
• RAID
• Disks (disk speed, disk size)
File systems for SAS users:
• Temporary work space
• Permanent data storage
• Utility (UTILLOC)

RAID Configurations in SAS Environments

Operating System Limitations
Windows (32-bit), Enterprise Edition (32-bit):
• ~2 GB of RAM practical limit
• 5 GB data set size practical limit (file cache contention)
Windows (x64), Enterprise Edition for x64:
• Support issues (SAS 9.1)
• 5 GB data set size practical limit (file cache contention)
Windows (Itanium), Enterprise Edition (Itanium):
• 10 GB data set size practical limit (file cache contention)
Unix (64-bit): HP-UX, Solaris, AIX, etc.
• Limited by hardware only
• Access to additional memory
• No file cache contention issues

Architecture Limitations
Hardware bottlenecks:
• CPU (number, speed, etc.)
• I/O
• RAM
• Backplane
• Cache
• Configuration/tuning (hyperthreading)
SAN bottlenecks:
• Host bus adaptors (HBAs)
• Paths to disk
• Ethernet (2 GB/s)
• Disks
  − RAID
  − Disk speed
  − Number of disks
  − Disk size
• LUNs & file systems

Redux: what does this program do?
(The same program as in "An Easy Question".)
Think like hardware?

PROC SQL step: what resources are used?
The pass-through query pulls every row of db2table from DB2 across the network, and CREATE TABLE writes all of it to Table1 in the WORK library, so this step is dominated by network and disk I/O.

DATA step: what resources are used?
Because View1 is a DATA step view, defining it writes no data. The step actually runs when PROC SUMMARY reads View1: it reads Table1 back from disk and spends a little CPU on each row.

PROC step: what resources are used?
PROC SUMMARY pulls every row through View1, holds the statistics for each c1*c2*c3 combination in memory, computes the means in floating point (CPU), and writes p.mymeans to disk.

Typical BI/SAS Solution Architecture

BI Architecture: Web Server
• Loads: CPU-intensive integer calculations
• Rack servers: pooled, load balanced
• ~100 concurrent sessions per core (CPU)
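The ~100 concurrent sessions per core figure supports a quick back-of-the-envelope sizing of a pooled tier. A worked example, assuming a hypothetical peak of 850 concurrent sessions (not a number from this presentation) and treating 100 sessions per core as a rough rule of thumb rather than a guarantee:

```latex
\text{cores} \approx \left\lceil \frac{N_{\text{sessions}}}{100} \right\rceil
  = \left\lceil \frac{850}{100} \right\rceil
  = \lceil 8.5 \rceil
  = 9
```

Because the servers are pooled and load balanced, those 9 cores can be spread across several rack servers.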
Web server typical load: small text files.

BI Architecture: Application Server
• Loads: CPU-intensive integer calculations
• Rack servers: pooled, load balanced
• ~100 concurrent sessions per core (CPU)
Typical load: small text files.

BI Architecture: SAS Metadata Server
• Memory intensive: metadata is stored in memory for speed
• Generally 2 CPUs, except for very large implementations
Typical load: metadata held in an in-RAM database.

BI Architecture: SAS BI Servers
• CPU and/or I/O intensive
• Heavy floating point (CPU)
• Heavy I/O, depending on the number of sessions and the volume of data
• Heavy memory use (depending on the type of problem and the number of concurrent sessions)
Typical load: large volumes of data.

BI Architecture: SPD Server/RDBMS
• I/O intensive
• SAN storage (75+ MB/s sustained I/O throughput per session)
Typical load: large volumes of data.

Questions

SIMPLICITY BEYOND COMPLEXITY