An Exercise in Improving SAS Performance on Mainframe Processors
SAS BLKSIZE and BUFSIZE Options

Foreword
• At the last KCASUG meeting, George Hurley presented “Customizing Your SAS Initialization II.”
• In this presentation, George suggested that it is possible to save CPU in SAS jobs by tuning the BUFSIZE parameter.
• With our current interest in saving CPU and stretching the life of mainframe equipment, I decided to investigate what kind of savings were possible in our environment.

Background
• In the 1990s and earlier, disk storage for mainframes consisted of a stack of 14-inch platters arranged in what was called a disk drive.
• There was a separate read/write head for each surface
  – All read/write heads were aligned at the same relative position and moved together
• Disk drives were organized into tracks and cylinders.
  – A track represented the data that could be accessed from one surface with one revolution of the disk
  – A cylinder was all the tracks that could be accessed from the same relative location of the read/write heads.
• Data was stored with gaps between records in a count key data (CKD) format

Background (continued)
• 3390s were the final generation of IBM classical disk drives
  – Each track could hold up to 56,664 bytes
  – The largest size record was 32,767 bytes
  – While larger records were possible, records were rarely larger than 27,998 bytes
    • This is the largest record size that allowed 2 records per track
    • Record sizes approaching 27,998 bytes provided optimal use of disk storage on 3390 devices
  – This is commonly referred to as a “half track” record size

Background (continued)
• When modern storage controllers started replacing classical mainframe storage, the storage controllers emulated classical storage devices, particularly the 3390
• While data is actually stored in stripes with multiple layers of virtualization, access to the data still follows the protocol of classical mainframe storage

Mainframe SAS Files
• Two factors have the most influence on the performance of I/Os for SAS datasets
  – BLKSIZE: the size of the block (physical record)
  – BUFSIZE: the size of the storage buffer
    • Should be a multiple of BLKSIZE

SAS BLKSIZE
• BLKSIZE
  – Larger block sizes are more efficient
    • With smaller block sizes, there is additional overhead in SAS to manage each block
  – SAS files can have any BLKSIZE up to 32,760
  – The optimal BLKSIZE for SAS files is 27,648
    • Largest “half-track” size for SAS files
    • Provides optimal balance of performance and disk storage utilization

SAS BUFSIZE
• BUFSIZE
  – When SAS schedules an I/O for a SAS dataset, it builds a single I/O command that transfers as much data as will fit in the buffer
    • This saves the operating system overhead related to managing multiple I/Os
    • SAS uses its own channel programming (EXCPs) for SAS files, not normal operating system access methods
  – For example, with a BLKSIZE of 27,648 and a BUFSIZE of 110,592, SAS would build I/O commands to transfer 4 blocks with each I/O command

SAS BUFSIZE (continued)
• BUFSIZE
  – Buffer sizes between 110,592 and 221,184 tend
    to be fairly efficient
  – MEMSIZE may need to be increased when BUFSIZE is increased

Controlled Tests
• Performed some controlled tests
• One controlled test
  – Wrote 250,000 records to a SAS file (each about 1.6K of data)
  – In a separate step, read the records (in a _NULL_ data step, SET the input to the file just created)
  – Varied BLKSIZE and BUFSIZE in each run

Controlled Tests (continued)
• Tests showed that a BLKSIZE of 27,648 performed better than a BLKSIZE of 6,144 for similar buffer sizes
  – A BLKSIZE of 6,144 was the old standard in our shop
• Tests also suggested limited improvement in CPU and run times once buffer sizes exceeded the 110,592 to 221,184 range
  – In fact, sometimes performance appeared to deteriorate with larger buffer sizes

[Chart: Test - Write 250,000 Records, CPU per Run. Y-axis: CPU (seconds), 0.00 to 2.50. X-axis: BLKSIZE / BUFSIZE.]
[Chart: Test - Read 250,000 Records, CPU per Run. Y-axis: CPU (seconds), 0.00 to 2.50. X-axis: BLKSIZE / BUFSIZE.]
[Chart: Test - Write 250,000 Records, Run Time. Y-axis: Run Time (seconds), 0 to 50. X-axis: BLKSIZE / BUFSIZE.]
[Chart: Test - Read 250,000 Records, Run Time. Y-axis: Run Time (seconds), 0 to 50. X-axis: BLKSIZE / BUFSIZE.]
BLKSIZE / BUFSIZE combinations shown in each chart: 6,144 / 6,144; 6,144 / 12,288; 6,144 / 24,576; 6,144 / 49,152; 6,144 / 110,592; 6,144 / 221,184; 27,648 / 27,648; 27,648 / 55,296; 27,648 / 110,592; 27,648 / 221,184; 27,648 / 331,776; 27,648 / 442,368.

Production Pilots
• Identified the jobs that were using the largest total amount of CPU
• Ran pilots on 2 of the top 5 jobs to explore potential benefits with real jobs
  – Changed BLKSIZE from 6,144 to 27,648
  – Increased BUFSIZE to 221,184
• Ran several parallel runs of the MXG job with various BLKSIZE and BUFSIZE values (MXG is a common SAS-based mainframe tool to capture and manage mainframe performance data)
  – Experimented with various block sizes
  – Have not placed changes to MXG in production yet
• Rewrote one job in another language

Pilot Results
• Pilot results were quite favorable
  – Job using largest amount of CPU (runs many times each day; see charts for Job 1)
    • 6% reduction in CPU
    • 25% improvement in run time
  – Job using 5th largest amount of CPU (runs many times each day; see charts for Job 2)
    • 9% reduction in CPU
    • 43% improvement in run time
  – MXG (2nd largest user of CPU; runs once daily)
    • 5% reduction in CPU
    • ~10% improvement in run time

[Chart: Changes in CPU Time, Job 1. Average CPU time (seconds) and total I/Os by date, 3/7/2010 to 4/8/2010. Annotation: 6% reduction in CPU.]
[Chart: Changes in Run Time, Job 1. Average run time (seconds) and total I/Os by date, 3/7/2010 to 4/8/2010. Annotation: 25% improvement in average run time.]
[Chart: Changes in CPU Time, Job 2. Average CPU time (seconds) and total I/Os by date, 3/28/2010 to 4/27/2010. Annotation: 9% reduction in CPU.]
[Chart: Changes in Run Time, Job 2. Average run time (seconds) and total I/Os by date, 3/28/2010 to 4/27/2010. Annotation: 43% improvement in average run time.]

Production Implementation
• Changed BLKSIZE to 27,648
  – Changed both CONFIG member and SAS PROC
• Changed BUFSIZE to 221,184
  – Changed CONFIG member
• Made changes to ensure jobs would not fail with memory issues
  – MEMSIZE parameter removed from CONFIG
    • Defaults to 0 (no limitation on memory)
  – Changed REGION to 0M in SAS PROC
  – Made mass change to production SAS jobs to remove REGION parameter overrides

Implementation Results
• Measured results based on production jobs that ran daily
  – Compared results on job / weekday basis
• For jobs that ran during the day:
  – 10% average reduction in CPU
    • Varied from no gain to 15-20% improvement
  – 30% average improvement in run times
    • Varied considerably from job to job
• For jobs that ran at night:
  – 3% reduction in CPU
  – 10% improvement in run times

Issues and Opportunities
• Many production jobs reuse the same SAS files without ever deleting and recreating them
  – BLKSIZE remains at the smaller size
• Many production jobs use their own customized SAS PROCs or CONFIG members
  – Cannot easily take advantage of changes
• Will need to look for opportunities to tune these jobs later

Thinking Outside the Box
• One very large SAS job runs daily
  – Job would read 10-12 million rows
  – Sort data on 4 keys
  – Summarize 32 columns using PROC UNIVARIATE
• Rewrote job in
    another language
  – Took advantage of the partial natural order of the data and used a hashing algorithm to organize the data
• Initial level summary done in summary program
  – Summarized data was then input to SAS

Changes in Rewritten Job
• Reduced CPU 95%
• Improved run time 97%

It is worth noting that I could find only two large SAS jobs that could take advantage of this technique. All other SAS jobs that I looked at were far too complex to consider doing this.

[Chart: Changes in CPU Time, Rewritten Job. CPU time (seconds) and total I/Os by date, 3/15/2010 to 4/22/2010. Annotation: 95% reduction in CPU.]
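
The slides do not show the rewritten job or name the language it was written in. Purely as an illustration of the idea described under "Thinking Outside the Box", the following Python sketch shows how a hash table can stand in for a full sort when the input is already partly ordered. Everything specific here is an assumption: that rows sharing the first of the 4 key parts arrive together (one possible reading of "partial natural order"), the sample key values, and the choice of statistics (count, sum, minimum, maximum, mean), which is far less than PROC UNIVARIATE reports.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

Key = Tuple[str, str, str, str]  # the 4 grouping keys (names are hypothetical)


@dataclass
class Running:
    """Running summary of one numeric column."""
    n: int = 0
    total: float = 0.0
    low: float = float("inf")
    high: float = float("-inf")

    def add(self, x: float) -> None:
        self.n += 1
        self.total += x
        self.low = min(self.low, x)
        self.high = max(self.high, x)

    @property
    def mean(self) -> float:
        return self.total / self.n


def summarize(
    rows: Iterable[Tuple[Key, List[float]]]
) -> Iterator[Tuple[Key, List[Running]]]:
    """Yield one summary per 4-part key without sorting the input rows.

    Assumption: rows that share the first key part arrive contiguously;
    the other three parts may appear in any order within that run.
    """
    current = None                         # first key part of the run in progress
    groups: Dict[Key, List[Running]] = {}  # hash table for the current run only
    for key, values in rows:
        if key[0] != current:
            # The leading key changed, so the previous run is complete.
            # Emit it (ordering only the distinct keys, not the rows)
            # and start a fresh table to keep memory bounded.
            yield from sorted(groups.items(), key=lambda item: item[0])
            groups = {}
            current = key[0]
        stats = groups.get(key)
        if stats is None:
            stats = groups[key] = [Running() for _ in values]
        for stat, x in zip(stats, values):
            stat.add(x)
    # Emit the final run.
    yield from sorted(groups.items(), key=lambda item: item[0])


# Tiny made-up input: grouped by date, unordered within each date.
rows = [
    (("2010-04-01", "SYSA", "JOB1", "STEP1"), [1.0, 10.0]),
    (("2010-04-01", "SYSB", "JOB2", "STEP1"), [2.0, 20.0]),
    (("2010-04-01", "SYSA", "JOB1", "STEP1"), [3.0, 30.0]),
    (("2010-04-02", "SYSA", "JOB1", "STEP1"), [5.0, 50.0]),
]
for key, stats in summarize(rows):
    print(key, [(s.n, s.mean, s.low, s.high) for s in stats])
```

In a design like this, each input row costs one hash lookup instead of taking part in a 4-key sort, and only the distinct keys of the current run are held in memory. The much smaller summarized output could then be passed to SAS, as the slide describes.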