Macrotize! A Beginner’s Guide to Designing and Writing Macros
Stacey D. Phillips, i3 Statprobe, Madison, WI
Copyright © 2009 i3 | CONFIDENTIAL

Summary of Presentation
Introduction to Macro Design and Writing
Description of Real World Problem using Quality of Life Data (QOL)
Overview of 5 Steps to Designing and Writing Macros
5 Steps In-Depth using QOL data
Additional example of the 5 Steps using Vital Signs data
Conclusion
Questions

Introduction to Designing and Writing Macros
Learning to write macro code is a valuable skill for a SAS programmer, though it can be intimidating to beginning programmers.
This presentation and the corresponding paper focus on the concepts of macro design and writing rather than on specific macro code details.
There are many ways to think about macro design. I will present the five steps I use when developing a new macro.
The paper demonstrates the five steps using a real-world example involving Quality of Life data. This presentation includes an additional example that uses the five steps.

Why Use Macros?
The macro facility is a tool for extending and customizing SAS and for reducing the amount of text you must enter to do a particular task.
SAS macros reduce the amount of code you have to write by automating similar operations that are likely to be repeated.
In general, it is best to use macro language when you find yourself writing the same or very similar code over and over.
Macro code can be more difficult to debug in the short term, but it may save you time in the long run.

Some reasons you might consider using macro code…
Macros allow you to make a change in one location of your program so that SAS can cascade the change throughout your program.
Macros allow you to write a section of code once and use it over and over again in the same program or even in different programs.
Macros allow you to make programs data driven, letting SAS decide what to do based on actual data values.

Some examples of where macro code might be useful…
You find yourself doing the same statistical calculation in multiple programs, so you decide to create and store an external macro (a macro contained in a completely separate program) in the macro library that can be called whenever these calculations are necessary.
– Example: A macro named %count01 is created to count subjects, calculate percentages and format the data into character variables appropriate for display in any RTF table.
You are producing similar output (e.g. a data set, table or graph) for different groups of a certain population.
– Example: A macro named %by_state is created to generate an economic profile table for each of the 50 states.
You want to run multiple statistical analyses with only slight modifications to the inputs.
– Example: A macro named %reg is created to run a series of regression models and includes parameters to specify both the independent and dependent variables for each individual model without retyping the whole PROC REG code.

A Real-world Example Using Quality of Life (QOL) Regression Data
A statistician on my project ran many regression models and output each individual statistic into 225 separate SAS datasets.
The datasets needed to be accessed, manipulated and reformatted based on the table shell provided (see Handout reference #1).
The original task required six tables, but this presentation will demonstrate only three, as the process is the same whether three or six tables are created.
In the interest of simplicity, I will include only partial sections of the code needed for the various steps. The complete code can be found in Step 5 of the paper and in Handout reference #5.
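Before moving into the QOL example, the %reg idea mentioned among the examples above can be made concrete. The following is only a minimal sketch: the macro body, the parameter names (data=, dep=, indep=) and the use of the SASHELP.CLASS sample dataset are illustrative assumptions, not the macro actually used on the project.

```sas
/* Hypothetical sketch of a %reg-style macro.                         */
/* Parameter names DATA=, DEP= and INDEP= are assumptions.            */
%macro reg(data=, dep=, indep=);
    proc reg data=&data;
        model &dep = &indep;   /* dependent = list of independents */
    run;
    quit;
%mend reg;

/* Each call fits one model without retyping the PROC REG code. */
%reg(data=sashelp.class, dep=weight, indep=height)
%reg(data=sashelp.class, dep=weight, indep=height age)
```

Only the inputs that change from model to model are exposed as parameters; everything else stays in one place, which is the "write once, use many times" benefit described above.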
Some Things to Keep in Mind
The paper is designed to be concept-based rather than code-based. Don’t get too bogged down in the specific code… but do ask questions if you have them.
Code examples are excerpts in the interest of simplicity. The complete code is in the paper.
The ultimate goal of this example is to write a dynamic macro that is as data independent as possible.
Because the PROC REPORT portion of the generated table is not relevant to the macro writing shown here, I’ve left out that code.

Table Shell (see Handout Reference #1)

Table 1.1 Summary of Statistical Analysis of Time-Adjusted PRO AUC by Scale
EORTC QLQ - C30
QOL Analysis Set

                                                   Treatment Comparison
                                                                                    95% CI
Scale                Treatment  n*  Mean Estimate  Comparison  Comparison Estimate  Lower  Upper
Appetite loss scale  A          x   xx.xx          A-C         x.xx                 x.xxx  x.xxx
                     B          x   xx.xx          B-C         x.xx                 x.xxx  x.xxx
                     C          x   xx.xx
Constipation scale   A          x   xx.xx          A-C         x.xx                 x.xxx  x.xxx
                     B          x   xx.xx          B-C         x.xx                 x.xxx  x.xxx
                     C          x   xx.xx
Diarrhoea scale      A          x   xx.xx          A-C         x.xx                 x.xxx  x.xxx
                     B          x   xx.xx          B-C         x.xx                 x.xxx  x.xxx
                     C          x   xx.xx

Additional scales to be included in subsequent pages…
Page X of Y
Footnote 1
Footnote X
Program: xxx.sas   Output: xxx.rtf   Source Data: QOL

Five Easy Steps to Designing and Writing Macros
Step 1 - Know where you are and where you’re going.
Step 2 - Pretend you’re not writing a macro.
Step 3 - Look for patterns.
Step 4 - Vary one parameter at a time.
Step 5 - Put it all together.

Step 1 - Know where you are and where you are going.
Do you even need to write a macro?
Sometimes utilizing macro code is unnecessary and cumbersome.
Macro code is more difficult to debug than normal SAS code, so macro language should only be used when it will be advantageous.
The benefits of using SAS macro language include its adaptability and portability, which make it ideal for reducing the amount of code necessary when you find yourself repeating code over and over.
How do we know when using a macro is a good idea? I usually start to think about writing a macro when what I’m working on involves repetition or pattern.

Step 1 - Know where you are and where you are going. (See Handout Reference #2)
Where are we?
Where are we going?

Annotated Table Shell Based on QOL Data

                                                    Treatment Comparison
                                                                                     95% CI
Scale                Treatment   n    Mean Estimate  Comparison  Comparison Estimate  Lower    Upper    Data
                     (cohortcd)
Appetite loss scale  A           n.n  m.estimate     A-C         d.estimate           d.lower  d.upper  <n/m/d>appetite
                     B           n.n  m.estimate     B-C         d.estimate           d.lower  d.upper
                     C           n.n  m.estimate
Constipation scale   A           n.n  m.estimate     A-C         d.estimate           d.lower  d.upper  <n/m/d>constipation
                     B           n.n  m.estimate     B-C         d.estimate           d.lower  d.upper
                     C           n.n  m.estimate
Diarrhoea scale      A           n.n  m.estimate     A-C         d.estimate           d.lower  d.upper  <n/m/d>diarrhoea
                     B           n.n  m.estimate     B-C         d.estimate           d.lower  d.upper
                     C           n.n  m.estimate

Additional scales to be included in subsequent pages…
Page X of Y
Footnote 1
Footnote X
Program: xxx.sas   Output: xxx.rtf   Source Data: QOL

Step 2 - Pretend you’re not writing a macro.
Ignore the fact that you want to write a macro and write out your code in regular SAS code without any macro language.
Why would I suggest this?
– It’s important to make sure your basic SAS code works properly before you “macrotize.”
– While using macro code will make your work more efficient in the long run, in the short term it can make the task more complex if you don’t have your basic code hammered out.
– Sometimes writing out the code in its basic form helps to identify additional patterns that will be useful for macro writing.
Step 2 - Pretend you’re not writing a macro.
QOL example: I want to SET the required data for each scale together, then MERGE the “N”, “mean” and “estimate/CI” data together.

    /* EXAMPLE: Setting the "N" datasets together to get the N statistics */
    libname c30auc "/project/data/stats/output/PRO/C30/AUC";

    data N;
        set C30AUC.NAPPETITE C30AUC.NCONSTIPATION C30AUC.NDIARRHOEA
            C30AUC.NEMOTIONAL C30AUC.NFATIGUE C30AUC.NGLOBAL
            C30AUC.NINSOMNIA C30AUC.NNAUSEA C30AUC.NPAIN
            C30AUC.NPHYSICAL C30AUC.NROLE;
    run;

    proc sort data=N;
        by scale cohort;
    run;

    /* Merge the "N", "mean" and "estimate/CI" data together to get the
       final dataset ALL used in generating the table */
    data ALL;
        merge N    /* "N" data */
              M    /* "mean" data */
              D;   /* "estimate/confidence interval" data */
        by scale cohort;
    run;

Using the ALL data I can now go ahead and use PROC REPORT to generate my table.
While this code is simple enough, I’m really interested in getting SAS to figure out which datasets I need based on what’s in the directory, so the individual scale datasets don’t need to be specified.

Step 2 - Pretend you’re not writing a macro.
I am interested in generating the tables independent of the input data. In other words, I don’t want to specify the individual datasets for each scale (e.g. NAPPETITE, DPAIN) but want SAS to figure it out for me.
The following code creates a variable that holds the list of relevant datasets from a specified directory using SASHELP tables.
SASHELP tables are outside the scope of this paper, but if you’re interested I’ve included the reference to a paper I wrote several years ago on this topic.
Try not to get bogged down in the specific code, but notice the patterns that I’ve highlighted, which will help write our macro later.
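The DATA steps that follow read from a dataset called VSTABLE, whose creation is not shown in these slides. As a hedged illustration only, one way such a list might be pulled from the SASHELP views is sketched below. The filter on names starting with "N" and the exact columns kept are assumptions; the author's actual code is in the paper.

```sas
/* Assumption: VSTABLE holds one row per dataset in the C30AUC library   */
/* whose name starts with "N". The C30AUC libref must already be         */
/* assigned. SASHELP.VTABLE stores LIBNAME and MEMNAME in uppercase.     */
proc sql;
    create table vstable as
    select libname, memname
    from sashelp.vtable
    where libname = 'C30AUC'
      and memtype = 'DATA'
      and memname like 'N%'
    order by libname, memname;   /* sorted so BY libname memname works */
quit;
```

Changing the 'N%' pattern to 'M%' or 'D%' would give the corresponding lists for the mean and estimate/CI datasets, which is exactly the kind of single point of variation the later steps turn into a macro parameter.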
Get “N” Population Data

    data N_data;
        length N_data $200;
        set vstable;
        by libname memname;
        retain N_data;
        /* Start the list with the first dataset, then append the rest */
        if first.libname then do;
            N_data=trim(left(libname))||"."||trim(left(memname));
        end;
        else do;
            N_data=trim(left(N_data))||" "||trim(left(libname))||"."||trim(left(memname));
        end;
        /* Store the list so far in the macro variable N_DATA */
        call symput("N_data",N_data);
        /* Keep only the final row, which holds the complete list */
        if last.libname;
    run;

Step 2 - Pretend you’re not writing a macro.
The PROC PRINT of the newly created N_DATA variable containing the dataset list is below:

    N_data
    C30AUC.NAPPETITE C30AUC.NCONSTIPATION C30AUC.NDIARRHOEA C30AUC.NEMOTIONAL C30AUC.NFATIGUE C30AUC.NGLOBAL C30AUC.NINSOMNIA C30AUC.NNAUSEA C30AUC.NPAIN C30AUC.NPHYSICAL C30AUC.NROLE

Note how this list matches the SET statement we wrote out manually earlier in Step 2:

    data N;
        set C30AUC.NAPPETITE C30AUC.NCONSTIPATION C30AUC.NDIARRHOEA
            C30AUC.NEMOTIONAL C30AUC.NFATIGUE C30AUC.NGLOBAL
            C30AUC.NINSOMNIA C30AUC.NNAUSEA C30AUC.NPAIN
            C30AUC.NPHYSICAL C30AUC.NROLE;
    run;

Step 3 - Look for patterns and variability.
Now that you know your code is working well, it is time to start thinking about introducing macro code.
First, let’s revisit the annotated shell to look for patterns and variability in dataset and variable naming conventions.
Next, revisit the code from Step 2 for the N data and compare it to the same code for the M data. The places where “N” and “M” differ are great places to add a macro parameter.
Further, by noting the similarities in variable naming conventions (e.g. mean, estimate) we know the data can be set together easily without the need to rename variables.

Step 3 – Look for patterns and variability.
Returning to the annotated shell, note the patterns in the datasets (e.g. scales are named the same thing in the N/M/D datasets) as well as the variability (e.g. each scale is prefixed differently).
(The annotated table shell from Step 1 is shown again here; see Handout Reference #2.)

Get “N” Population Data

    data N_data;
        length N_data $200;
        set vstable;
        by libname memname;
        retain N_data;
        if first.libname then do;
            N_data=trim(left(libname))||"."||trim(left(memname));
        end;
        else do;
            N_data=trim(left(N_data))||" "||trim(left(libname))||"."||trim(left(memname));
        end;
        call symput("N_data",N_data);
        if last.libname;
    run;

Get “M” Population Data

    data M_data;
        length M_data $200;
        set vstable;
        by libname memname;
        retain M_data;
        if first.libname then do;
            M_data=trim(left(libname))||"."||trim(left(memname));
        end;
        else do;
            M_data=trim(left(M_data))||" "||trim(left(libname))||"."||trim(left(memname));
        end;
        call symput("M_data",M_data);
        if last.libname;
    run;

Apart from the input list, the two steps differ only in the “N” versus “M” prefix.

Step 4 - Vary one parameter at a time.
It’s best to vary one parameter at a time and check your work before adding more parameters.
By varying only one parameter at a time, the code will be easier to debug should there be problems.
See Handout Reference #3 for the macro %AUC.
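To illustrate what varying a single parameter might look like for the N and M steps above, here is a minimal sketch. The macro name %get_list and the parameter prefix= are hypothetical, CATX and CALL SYMPUTX are swapped in for the original TRIM/LEFT concatenation and CALL SYMPUT, and the sketch assumes VSTABLE already contains only the datasets for the requested prefix. The author's own version is the %AUC macro in the handout.

```sas
/* Hypothetical sketch: the prefix (N or M) is the only parameter varied. */
/* Assumes VSTABLE is sorted by LIBNAME MEMNAME and already limited to    */
/* the datasets for this prefix.                                          */
%macro get_list(prefix=);
    data &prefix._data;
        length &prefix._data $200;
        set vstable;
        by libname memname;
        retain &prefix._data;
        if first.libname then
            &prefix._data = catx(".", libname, memname);
        else
            &prefix._data = catx(" ", &prefix._data, catx(".", libname, memname));
        /* "G" stores the macro variable globally so it survives the macro */
        call symputx("&prefix._data", &prefix._data, "G");
        if last.libname;
    run;
%mend get_list;

%get_list(prefix=N)
%put NOTE: N_data=&N_data;
```

Once this version reproduces the hand-written N_DATA list, a second parameter (for example the library) could be introduced and checked in the same way.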
Remember that we’re generating not one but three similar tables. The titles are below:
Table 1.1 Summary of Statistical Analysis of Time-Adjusted PRO AUC by Scale, EORTC QLQ - C30, QOL Analysis Set
Table 1.2 Summary of Statistical Analysis of Time-Adjusted PRO AUC by Scale, EORTC QLQ – LC13, QOL Analysis Set
Table 1.3 Summary of Statistical Analysis of Time-Adjusted PRO AUC by Scale, EORTC QLQ – TSQM, QOL Analysis Set

Step 4 - Vary one parameter at a time.
Similar in concept to varying one parameter at a time is working from the “inside to the outside.” When working with “inner” and “outer” macros (often referred to as “nested”), it is best to get the inner macros working first and then the outer macros, to reduce complexity in debugging.
See Handout Reference #4 for an example of nested macros.
Note that %AUC generates the interior of the table (e.g. columns, rows) and is the inner macro.
Note that %PRO generates each individual table (e.g. t_pro_auc_c30) and is the outer macro.

Step 5 - Put it all together.
1. Do a complete run of your program and fix any errors, warnings, etc. in the log file.
2. Review the results and compare them back to the original assignment. Have you included all of the necessary components?
3. Identify additional places of variability and, if applicable, adjust the code and start Step 5 again.
4. Once you’re convinced the output is what is desired, review the macro code and clean up as necessary. Check indentation for better readability and document with comments throughout the macro, especially for large, nested macros.
5. See Handout Reference #5 for complete code.

Another Example Using the Five Steps: Vital Sign Data
The Task:
1. Create two datasets containing vital sign data:
   a. Blood Pressure Measurements (Systolic and Diastolic BP)
   b. Non-Blood Pressure Measurements (Pulse, Respiration and Temperature)
2. Add unit labels to each vital sign parameter (e.g. mmHg for BP measurements).

Five Easy Steps to Designing and Writing Macros
Step 1 - Know where you are and where you’re going.
Step 2 - Pretend you’re not writing a macro.
Step 3 - Look for patterns.
Step 4 - Vary one parameter at a time.
Step 5 - Put it all together.

Step 1 – Know where you are and where you’re going: Where are we?

    SUBJECT  TEST   RESULT
    1        PULSE   96.000
    1        RESP    18.000
    1        TEMP    36.800
    1        SYSBP   92.000
    1        DIABP   60.000
    2        PULSE   96.000
    2        RESP    18.000
    2        TEMP    36.700
    2        SYSBP  129.000
    2        DIABP   84.333

Step 1 – Know where you are and where you’re going: Where are we going?
Data set with Blood Pressure Parameters:

    SUBJECT  TEST   RESULT   UNIT
    1        SYSBP   92.000  mmHg
    2        SYSBP  129.000  mmHg
    3        SYSBP   90.000  mmHg
    4        SYSBP  120.667  mmHg
    5        SYSBP  144.000  mmHg
    1        DIABP   60.000  mmHg
    2        DIABP   84.333  mmHg
    3        DIABP   60.667  mmHg
    4        DIABP   73.333  mmHg
    5        DIABP   84.000  mmHg

Step 1 – Know where you are and where you’re going: Where are we going?
Data set with non-Blood Pressure Parameters:

    SUBJECT  TEST   RESULT  UNIT
    1        PULSE  96.00   Beats per minute
    2        PULSE  96.00   Beats per minute
    3        PULSE  77.00   Beats per minute
    4        PULSE  74.00   Beats per minute
    5        PULSE  80.00   Beats per minute
    1        RESP   18.00   Breaths per minute
    2        RESP   18.00   Breaths per minute
    3        RESP   18.00   Breaths per minute
    4        RESP   16.00   Breaths per minute
    5        RESP   18.00   Breaths per minute
    1        TEMP   36.80   Degrees Celsius
    2        TEMP   36.70   Degrees Celsius
    3        TEMP   37.06   Degrees Celsius
    4        TEMP   36.60   Degrees Celsius
    5        TEMP   36.89   Degrees Celsius

Step 2 - Pretend you’re not writing a macro.

    data pulse;
        length UNIT $50;
        set vitals_v;
        if test='PULSE';
        UNIT='Beats per minute';
    run;

    data resp;
        length UNIT $50;
        set vitals_v;
        if test='RESP';
        UNIT='Breaths per minute';
    run;

Step 3 - Look for patterns.
The two steps are identical except for the output dataset name, the TEST value and the UNIT value.

    data pulse;                     /* varies: dataset name */
        length UNIT $50;
        set vitals_v;
        if test="PULSE";            /* varies: TEST value */
        UNIT="Beats per minute";    /* varies: UNIT value */
    run;

    data resp;                      /* varies: dataset name */
        length UNIT $50;
        set vitals_v;
        if test="RESP";             /* varies: TEST value */
        UNIT="Breaths per minute";  /* varies: UNIT value */
    run;

Step 4 - Vary one parameter at a time.

    %macro inner(test=);
        data &test;                 /* Vary the TEST parameter first */
            length UNIT $50;
            set vitals_v;
            if test="&test";
            UNIT="Beats per minute";    /* Still hard-coded; vary the UNIT parameter next */
        run;
    %mend inner;

    %inner(test=PULSE)
    %inner(test=RESP)

Step 5 - Put it all together.
    %macro outer(name=,data=,label=);   /* LABEL= is not used in this excerpt */

        /* Inner macro: one dataset per vital sign test, with its unit */
        %macro inner(test=,unit=);
            data &test;
                length UNIT $50;
                set vitals_v;
                if test="&test";
                UNIT="&unit";
            run;
        %mend inner;

        %inner(test=RESP,unit=Breaths per minute);
        %inner(test=PULSE,unit=Beats per minute);
        %inner(test=TEMP,unit=Degrees Celsius);
        %inner(test=SYSBP,unit=mmHg);
        %inner(test=DIABP,unit=mmHg);

        /* Outer step: stack the requested test datasets into one dataset */
        data &name;
            set &data;
        run;
    %mend outer;

    %outer(name=BP,data=sysbp diabp,label=Blood Pressure Parameters);
    %outer(name=NON_BP,data=pulse resp temp,label=non-Blood Pressure Parameters);

Step 5 – Put it all together.
Data set with Blood Pressure Parameters:

    SUBJECT  TEST   RESULT   UNIT
    1        SYSBP   92.000  mmHg
    2        SYSBP  129.000  mmHg
    3        SYSBP   90.000  mmHg
    4        SYSBP  120.667  mmHg
    5        SYSBP  144.000  mmHg
    1        DIABP   60.000  mmHg
    2        DIABP   84.333  mmHg
    3        DIABP   60.667  mmHg
    4        DIABP   73.333  mmHg
    5        DIABP   84.000  mmHg
Data set with non-Blood Pressure Parameters:

    SUBJECT  TEST   RESULT  UNIT
    1        PULSE  96.00   Beats per minute
    2        PULSE  96.00   Beats per minute
    3        PULSE  77.00   Beats per minute
    4        PULSE  74.00   Beats per minute
    5        PULSE  80.00   Beats per minute
    1        RESP   18.00   Breaths per minute
    2        RESP   18.00   Breaths per minute
    3        RESP   18.00   Breaths per minute
    4        RESP   16.00   Breaths per minute
    5        RESP   18.00   Breaths per minute
    1        TEMP   36.80   Degrees Celsius
    2        TEMP   36.70   Degrees Celsius
    3        TEMP   37.06   Degrees Celsius
    4        TEMP   36.60   Degrees Celsius
    5        TEMP   36.89   Degrees Celsius

Conclusion
Learning to write macro code is a useful skill for the SAS programmer.
Macros can make your programs more efficient and reduce the amount of code writing that needs to be done.
By learning a few basic concepts of macro writing, the beginning SAS programmer can easily get started in designing and writing macros.