Macrotize! A Beginner`s Guide to Designing and Writing Macros

advertisement
Macrotize! A Beginner’s Guide to
Designing and Writing Macros
Stacey D. Phillips, i3 Statprobe, Madison, WI
Copyright © 2009 i3 | CONFIDENTIAL
1
Summary of Presentation
 Introduction to Macro Design and Writing
 Description of Real World Problem using Quality of
Life Data (QOL)
 Overview of 5 Steps to Designing and Writing
Macros
 5 Steps In-Depth using QOL data
 Additional example of the 5 Steps using Vital Signs
data
 Conclusion
 Questions
Copyright © 2009 i3 | CONFIDENTIAL
2
Introduction to Designing and Writing
Macros
 Learning to write macro code is a valuable skill for a SAS
programmer though it can be intimidating to beginning
programmers.
 This presentation and the corresponding paper focuses
more on the concepts of macro design/writing rather than
specific macro code details.
 There are many ways to think about macro design. I will
present the five steps I use when developing a new macro.
 The paper demonstrates the five steps using a real-world
example involving Quality of Life data.
 This presentation will include an additional example that
uses the five steps.
Copyright © 2009 i3 | CONFIDENTIAL
3
Why Use Macros?
 The macro facility is a tool for extending and customizing
SAS and for reducing the amount of text you must enter to
do a particular task.
 SAS macros are used to reduce the amount of code one has
to write by automating similar operations that are likely to be
repeated.
 In general, it is best to use macro language when you find
yourself writing the same or very similar code over and over.
Macro code can be more difficult to debug in the short term
but it may save you time in the long run.
Copyright © 2009 i3 | CONFIDENTIAL
4
Some reasons you might consider
using macro code…



Macros allow you to make a change in one location of your
program so that SAS can cascade the change throughout
your program.
Macros allow you to write a section of code once and use it
over and over again in the same program or even in
different programs.
Macros allow you to make programs data driven letting
SAS decide what to do based on actual data values.
Copyright © 2009 i3 | CONFIDENTIAL
5
Some examples of where macro code might
be useful…

You find yourself doing the same statistical calculation in
multiple programs so decide to create and store an external
macro (macro contained in a completely separate program)
in the macro library that can be called whenever these
calculations are necessary.
– Example: A macro named %count01 is created to count
subjects, calculate percentages and format the data into
character variables appropriate for display in any RTF
table.
Copyright © 2009 i3 | CONFIDENTIAL
6
Some examples of where macro code might
be useful…


You are producing the similar output (eg. a data set or table
or graph, etc.) for different groups of a certain population.
– Example: A macro named %by_state is created to
generate an economic profile table for each of the 50
states.
You want to run multiple statistical analyses with only slight
modifications to the inputs.
– Example: A macro named %reg is created to run a
series of regression models and includes parameters to
specify both the independent and dependent variables
for each individual model without retyping the whole
PROC REG code.
Copyright © 2009 i3 | CONFIDENTIAL
7
A Real-world Example Using Quality of
Life (QOL) Regression Data
 A statistician on my project ran many regression models and
output each individual statistic into 225 separate SAS
datasets.
 The datasets needed to be accessed, manipulated and
reformatted based on the table shell provided (see Handout
reference #1).
 The original task required six tables be created though this
presentation with demonstrate only three tables as the
process is the same whether three or six tables are created.
 In the interest of simplicity, I will include only partial sections
of the code needed for the various steps. The complete
code can be found in Step 5 of the paper and in Handout
reference #5.
Copyright © 2009 i3 | CONFIDENTIAL
8
Some Things to Keep in Mind
 Paper is designed to be concept-based rather than codebased.
 Don’t get too bogged down in the specific code…but do ask
questions if you have them.
 Code examples are excerpts in the interest of simplicity.
The complete code is in the paper.
 The ultimate goal of this example is to write a dynamic
macro that will be as data independent as possible.
 Because the PROC REPORT portion of the generated table
is not relevant to the included macro writing, I’ve left out the
code.
Copyright © 2009 i3 | CONFIDENTIAL
9
Table Shell (see Handout Reference #1)
Table 1.1 Summary of Statistical Analysis of Time-Adjusted PRO AUC by Scale
EORTC QLQ - C30
QOL Analysis Set
Treatment Comparison
95% CI
Scale
Appetite loss scale
Treatment
A
n*
x
Mean
Estimate
xx.xx
Comparison
A-C
Comparison
Estimate
x.xx
Lower
x.xxx
Upper
x.xxx
B
x
xx.xx
B-C
x.xx
x.xxx
x.xxx
C
x
xx.xx
A
x
xx.xx
A-C
x.xx
x.xxx
x.xxx
B
x
xx.xx
B-C
x.xx
x.xxx
x.xxx
C
x
xx.xx
A
x
xx.xx
A-C
x.xx
x.xxx
x.xxx
B
x
xx.xx
B-C
x.xx
x.xxx
x.xxx
C
x
xx.xx
Constipation scale
Diarrhoea scale
Additional scales to be included in subsequent pages…
Page X of Y
Footnote 1
Footnote X
Program: xxx.sas
Output: xxxrtf Source Data: QOL
Copyright © 2009 i3 | CONFIDENTIAL
10
Five Easy Steps to Designing and
Writing Macros
 Step 1 - Know where you are and
where you’re going.
 Step 2 - Pretend you’re not writing a
macro.
 Step 3 - Look for patterns.
 Step 4 - Vary one parameter at a time.
 Step 5 - Put it all together.
Copyright © 2009 i3 | CONFIDENTIAL
11
Step 1 - Know where you are and where you
are going.




Do you even need to write a macro? Sometimes utilizing
macro code is unnecessary and cumbersome.
Macro code is more difficult to debug than normal SAS
code so macro language should only be used when it will
be advantageous.
The benefits of using SAS macro language include its
adaptability and portability which make it ideal for reducing
the amount of code necessary when you find yourself
repeating code over and over.
How do we know when using a macro is a good idea? I
usually start to think about writing a macro when I what I’m
working on involves repetition or pattern.
Copyright © 2009 i3 | CONFIDENTIAL
12
Step 1 - Know where you are and where you
are going. (See Handout Reference #2)
Where are we?
Copyright © 2009 i3 | CONFIDENTIAL
13
Step 1 - Know where you are and where you
are going. (See Handout Reference #2)
Where are we going?
Annotated Table Shell Based on QOL Data
Treatment Comparison
95% CI
Treatment
cohortcd
A
n
n.n
Mean
Estimate
m.estimate
Comparison
A-C
Comparison
Estimate
d.estimate
Lower
d.lower
Upper
d.upper
Data:
B
n.n
m.estimate
B-C
d.estimate
d.lower
d.upper
<n/m/d>appetite
C
n.n
m.estimate
Constipation scale
A
n.n
m.estimate
A-C
d.estimate
d.lower
d.upper
Data:
B
n.n
m.estimate
B-C
d.estimate
d.lower
d.upper
<n/m/d>constipation
C
n.n
m.estimate
Diarrhoea scale
A
n.n
m.estimate
A-C
d.estimate
d.lower
d.upper
<n/m/d>diarrhoea
B
n.n
m.estimate
B-C
d.estimate
d.lower
d.upper
C
n.n
m.estimate
Scale
Appetite loss scale
Additional scales to be included in subsequent pages…
Page X of Y
Footnote 1
Footnote X
Program: xxx.sas
Output: xxx.rtf Source Data: QOL
Copyright © 2009 i3 | CONFIDENTIAL
14
Step 2 - Pretend you’re not writing a macro.
 Ignore the fact that you want to write a macro and write out
your code in regular SAS code without any macro language.
 Why would I suggest this?
– It’s important to make sure your basic SAS code works
properly before you “macrotize.”
– While using macro code will make your work more
efficient in the long-run, in the short-term it can make the
task more complex if you don’t have your basic code
hammered out.
– Sometimes writing out the code in its basic form helps to
identify additional patterns that will be useful for macro
writing.
Copyright © 2009 i3 | CONFIDENTIAL
15
Step 2 - Pretend you’re not writing a macro.
QOL example: I want to SET the required data for each scale
together then MERGE the “N”, “mean” and “estimate/CI”
data together.
/*EXAMPLE: Setting the “N” datasets together to get the N statistics*/
libname c30auc "/project/data/stats/output/PRO/C30/AUC";
data N;
set C30AUC.NAPPETITE
C30AUC.NCONSTIPATION
C30AUC.NDIARRHOEA
C30AUC.NEMOTIONAL
C30AUC.NFATIGUE
C30AUC.NGLOBAL
C30AUC.NINSOMNIA
C30AUC.NNAUSEA
C30AUC.NPAIN
C30AUC.NPHYSICAL
C30AUC.NROLE;
run;
proc sort;
by scale cohort;
run;
Copyright © 2009 i3 | CONFIDENTIAL
16
/*Merge the “N”, “mean” and “estimate/CI” data together to get final
dataset ALL used in generating the table*/
data ALL;
merge N /* “N” data */
M /* “mean” data*/
D; /* “estimate/confidence interval” data */
by scale cohort;
run;
 Using the ALL data I can now go ahead and use PROC
REPORT to generate my table.
 While this code is simple enough, I’m really interested in getting
SAS to figure out which datasets I need based on what’s in the
directory so the individual scale datasets don’t need to be
specified.
Copyright © 2009 i3 | CONFIDENTIAL
17
Step 2 - Pretend you’re not writing a macro.
 I am interested in generating the tables independent of the
input data. In other words, I don’t want to specify the individual
datasets for each scale (eg. NAPPETITE, DPAIN) but want
SAS to figure it out for me.
 The following code creates a variable that includes the list of
relevant datasets from a specified directory using SASHELP
tables.
 SASHELP tables are outside the scope of this paper but if
you’re interested I’ve included the reference to a paper I wrote
several years ago regarding this topic.
 Try not to get bogged down in the specific code but notice the
patterns that I’ve highlighted which will help write our macro
later.
Copyright © 2009 i3 | CONFIDENTIAL
18
Get “N” Population Data
data N_data;
length N_data $200;
set vstable;
by libname memname;
retain N_data;
if first.libname then do;
N_data=trim(left(libname))||"."||trim(left(memname));;
end;
else do;
N_data=trim(left(N_data))||"
"||trim(left(libname))||"."||trim(left(memname));
end;
call symput("N_data",N_data);
if last.libname;
run;
Copyright © 2009 i3 | CONFIDENTIAL
19
Step 2 - Pretend you’re not writing a macro.
The PROC PRINT of the newly created N_DATA variable
containing the dataset lists is below:
N_data
C30AUC.NAPPETITE C30AUC.NCONSTIPATION
C30AUC.NDIARRHOEA C30AUC.NEMOTIONAL
C30AUC.NFATIGUE C30AUC.NGLOBAL C30AUC.NINSOMNIA
C30AUC.NNAUSEA C30AUC.NPAIN C30AUC.NPHYSICAL
C30AUC.NROLE
Note how this list matches the SET statement we previously
wrote out manually earlier in Step 2:
data N;
set C30AUC.NAPPETITE
C30AUC.NDIARRHOEA
C30AUC.NFATIGUE
C30AUC.NINSOMNIA
C30AUC.NPAIN
C30AUC.NROLE;
run;
Copyright © 2009 i3 | CONFIDENTIAL
C30AUC.NCONSTIPATION
C30AUC.NEMOTIONAL
C30AUC.NGLOBAL
C30AUC.NNAUSEA
C30AUC.NPHYSICAL
20
Step 3 - Look for patterns and variability.
 Now that you know your code is working well, it is time to
start thinking about introducing macro code.
 First, let’s revisit the annotated shell to look for patterns and
variability in dataset and variable naming conventions.
 Next, revisit the code from Step 2 for the N data and
compare it to the same code for the M data.
 The locations of the “N” and “M” differences demonstrate a
great place to add a macro parameter.
 Further, by noting the similarities in variable naming
conventions (eg. mean, estimate, etc.) we know the data
can be set easily together without the need to rename
variables.
Copyright © 2009 i3 | CONFIDENTIAL
21
Step 3 – Look for patterns and variability.
Returning to the annotated shell, note the patterns in the dataset (eg. scales named the
same thing in N/M/D datasets) as well as the variability (eg. each scale is prefixed
differently).
Annotated Table Shell Based on QOL Data
Treatment Comparison
95% CI
Treatment
cohortcd
A
n
n.n
Mean
Estimate
m.estimate
Comparison
A-C
Comparison
Estimate
d.estimate
Lower
d.lower
Upper
d.upper
Data:
B
n.n
m.estimate
B-C
d.estimate
d.lower
d.upper
<n/m/d>appetite
C
n.n
m.estimate
Constipation scale
A
n.n
m.estimate
A-C
d.estimate
d.lower
d.upper
Data:
B
n.n
m.estimate
B-C
d.estimate
d.lower
d.upper
<n/m/d>constipation
C
n.n
m.estimate
Diarrhoea scale
A
n.n
m.estimate
A-C
d.estimate
d.lower
d.upper
<n/m/d>diarrhoea
B
n.n
m.estimate
B-C
d.estimate
d.lower
d.upper
C
n.n
m.estimate
Scale
Appetite loss scale
Additional scales to be included in subsequent pages…
Page X of Y
Footnote 1
Footnote X
Program: xxx.sas
Output: xxx.rtf Source Data: QOL
Copyright © 2009 i3 | CONFIDENTIAL
22
Get “N” Population Data
data N_data;
length N_data $200;
set vstable;
by libname memname;
retain N_data;
if first.libname then do;
N_data=trim(left(libname))||"."||trim(left(memname));;
end;
else do;
N_data=trim(left(N_data))||"
"||trim(left(libname))||"."||trim(left(memname));
end;
call symput("N_data",N_data);
if last.libname;
run;
Copyright © 2009 i3 | CONFIDENTIAL
23
Get “M” Population Data
data M_data;
length M_data $200;
set vstable;
by libname memname;
retain M_data;
if first.libname then do;
M_data=trim(left(libname))||"."||trim(left(memname));;
end;
else do;
M_data=trim(left(M_data))||"
"||trim(left(libname))||"."||trim(left(memname));
end;
call symput(“M_data",M_data);
if last.libname;
run;
Copyright © 2009 i3 | CONFIDENTIAL
24
Step 4 - Vary one parameter at a time.
 It’s best to vary one parameter at a time and check your
work before continuing to add additional parameters.
 By varying only one parameter at a time the code will be
easier to debug should there be problems.
 See Handout Reference #3 for the macro %AUC.
 Similar in concept to varying one parameter at a time is to
work from the “inside to the outside.”
 When working with “inner” and “outer” macros (often
referred to as “nested”) it is best to work on the inner macros
and then the outer macros to reduce complexity in
debugging.
Copyright © 2009 i3 | CONFIDENTIAL
25
Step 4 - Vary one parameter at a time.
Remember that we’re generating not one but three similar
tables. The titles are below:
Table 1.1 Summary of Statistical Analysis of TimeAdjusted PRO AUC by Scale
EORTC QLQ - C30
QOL Analysis Set
Table 1.2 Summary of Statistical Analysis of TimeAdjusted PRO AUC by Scale
EORTC QLQ – LC13
QOL Analysis Set
Table 1.3 Summary of Statistical Analysis of TimeAdjusted PRO AUC by Scale
EORTC QLQ – TSQM
QOL Analysis Set
Copyright © 2009 i3 | CONFIDENTIAL
26
Step 4 - Vary one parameter at a time.
 Similar in concept to varying one parameter at a time is to
work from the “inside to the outside.”
 When working with “inner” and “outer” macros (often
referred to as “nested”) it is best to work on the inner macros
and then the outer macros to reduce complexity in
debugging.
 See Handout Reference #4 for example of nested macros.
 Note that %AUC generates the interior of the table (eg.
columns, rows, etc.) and is the inner macro.
 Note that %PRO generates each individual table (eg.
t_pro_auc_c30) and is the outer macro.
Copyright © 2009 i3 | CONFIDENTIAL
27
Step 5 - Put it all together.
1. Do a complete run of your program and fix any errors,
warnings, etc. in the log file.
2. Review results and compare back to the original
assignment. Have you included all of the necessary
components?
3. Identify additional places of variability and if applicable,
adjust the code and start step #5 again.
4. Once you’re convinced the output is what is desired, review
the macro code and clean-up as necessary. Check for
indentation for better readability and document with
comments throughout the macro especially for large,
nested macros.
5. See Handout Reference #5 for complete code.
Copyright © 2009 i3 | CONFIDENTIAL
28
Another Example Using the Five
Steps: Vital Sign Data
The Task:
1. Create two datasets containing vital sign data.
1. Blood Pressure Measurements (Systolic and Diastolic BP)
2. Non-Blood Pressure Measurements (Pulse, Respiration and
Temperature).
1. Add unit labels to each vital sign parameters (eg. mmHG
for BP measurements).
Copyright © 2009 i3 | CONFIDENTIAL
29
Five Easy Steps to Designing and
Writing Macros
 Step 1 - Know where you are and
where you’re going.
 Step 2 - Pretend you’re not writing a
macro.
 Step 3 - Look for patterns.
 Step 4 - Vary one parameter at a time.
 Step 5 - Put it all together.
Copyright © 2009 i3 | CONFIDENTIAL
30
Step 1 – Know where you are and where
you’re going: Where are we?
SUBJECT
1
1
1
1
1
2
2
2
2
2
Copyright © 2009 i3 | CONFIDENTIAL
RESULT
96.000
18.000
36.800
92.000
60.000
96.000
18.000
36.700
129.000
84.333
TEST
PULSE
RESP
TEMP
SYSBP
DIABP
PULSE
RESP
TEMP
SYSBP
DIABP
31
Step 1 – Know where you are and where
you’re going: Where are we going?
Data set with Blood Pressure Parameters:
SUBJECT TEST
RESULT
1
SYSBP
92.000
2
SYSBP
129.000
3
SYSBP
90.000
4
SYSBP
120.667
5
SYSBP
144.000
1
DIABP
60.000
2
DIABP
84.333
3
DIABP
60.667
4
DIABP
73.333
5
DIABP
84.000
Copyright © 2009 i3 | CONFIDENTIAL
UNIT
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
32
Step 1 – Know where you are and where
you’re going: Where are we going?
Data set with non-Blood Pressure Parameters:
SUBJECT
TEST
RESULT
1
PULSE
96.00
2
PULSE
96.00
3
PULSE
77.00
4
PULSE
74.00
5
PULSE
80.00
1
RESP
18.00
2
RESP
18.00
3
RESP
18.00
4
RESP
16.00
5
RESP
18.00
1
TEMP
36.80
2
TEMP
36.70
3
TEMP
37.06
4
TEMP
36.60
Copyright © 2009 i3 | CONFIDENTIAL
5
TEMP
36.89
UNIT
Beats per minute
Beats per minute
Beats per minute
Beats per minute
Beats per minute
Breaths per minute
Breaths per minute
Breaths per minute
Breaths per minute
Breaths per minute
Degrees Celsius
Degrees Celsius
Degrees Celsius
Degrees Celsius
33
Degrees Celsius
Step 2 - Pretend you’re not writing a macro.
data pulse;
length UNIT $50;
set vitals_v;
if test='PULSE';
UNIT='Beats per minute';
run;
data resp;
length UNIT $50;
set vitals_v;
if test='RESP';
UNIT='Breaths per minute';
run;
Copyright © 2009 i3 | CONFIDENTIAL
34
Step 3 - Look for patterns.
data pulse;
length UNIT $50;
set vitals_v;
if test=“PULSE”;
UNIT=“Beats per minute”;
run;
data resp;
length UNIT $50;
set vitals_v;
if test=“RESP”;
UNIT=“Breaths per minute”;
run;
Copyright © 2009 i3 | CONFIDENTIAL
35
Step 4 - Vary one parameter at a time.
%macro inner(test=);
data &test;
/*Vary the TEST parameter first*/
length UNIT $50;
set vitals_v;
if test=“&test”;
UNIT=“Beats per minute”; /*Vary the UNIT parameter next*/
run;
%mend inner;
%inner(test=PULSE)
%inner(test=RESP)
Copyright © 2009 i3 | CONFIDENTIAL
36
Step 5 - Put it all together.
%macro outer(name=,data=,label=);
%macro inner(test=,unit=);
data &test;
length UNIT $50;
set vitals_v;
if test="&test";
UNIT="&unit";
run;
%mend inner;
%inner(test=RESP,unit=Breaths per minute);
%inner(test=PULSE,unit=Beats per minute);
%inner(test=TEMP,unit=Degrees Celsius);
%inner(test=SYSBP,unit=mmHg);
%inner(test=DIABP,unit=mmHg);
data &name;
set &data;
run;
%mend outer;
%outer(name=BP,data=sysbp diabp,label=Blood Pressure Parameters);
%outer(name=NON_BP,data=pulse resp temp,label=non-Blood Pressure Parameters);
Copyright © 2009 i3 | CONFIDENTIAL
37
Step 5 - Put it all together.
%macro outer(name=,data=,label=);
%macro inner(test=,unit=);
data &test;
length UNIT $50;
set vitals_v;
if test="&test";
UNIT="&unit";
run;
%mend inner;
%inner(test=RESP,unit=Breaths per minute);
%inner(test=PULSE,unit=Beats per minute);
%inner(test=TEMP,unit=Degrees Celsius);
%inner(test=SYSBP,unit=mmHg);
%inner(test=DIABP,unit=mmHg);
data &name;
set &data;
run;
%mend outer;
%outer(name=BP,data=sysbp diabp,label=Blood Pressure Parameters);
%outer(name=NON_BP,data=pulse resp temp,label=non-Blood Pressure Parameters);
Copyright © 2009 i3 | CONFIDENTIAL
38
Step 5 – Put it all together.
Data set with Blood Pressure Parameters:
SUBJECT TEST
RESULT
1
SYSBP
92.000
2
SYSBP
129.000
3
SYSBP
90.000
4
SYSBP
120.667
5
SYSBP
144.000
1
DIABP
60.000
2
DIABP
84.333
3
DIABP
60.667
4
DIABP
73.333
5
DIABP
84.000
Copyright © 2009 i3 | CONFIDENTIAL
UNIT
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
mmHg
39
Step 5 – Put it all together.
Data set with non-Blood Pressure Parameters:
SUBJECT
TEST
RESULT
1
PULSE
96.00
2
PULSE
96.00
3
PULSE
77.00
4
PULSE
74.00
5
PULSE
80.00
1
RESP
18.00
2
RESP
18.00
3
RESP
18.00
4
RESP
16.00
5
RESP
18.00
1
TEMP
36.80
2
TEMP
36.70
3
TEMP
37.06
4
TEMP
36.60
5
TEMP
36.89
Copyright © 2009 i3 | CONFIDENTIAL
UNIT
Beats per minute
Beats per minute
Beats per minute
Beats per minute
Beats per minute
Breaths per minute
Breaths per minute
Breaths per minute
Breaths per minute
Breaths per minute
Degrees Celsius
Degrees Celsius
Degrees Celsius
Degrees Celsius
Degrees Celsius
40
Conclusion
 Learning to write macro code is a useful skill for the SAS
programmer.
 Macros can make your programs more efficient and reduce
the amount of code writing that needs to be done.
 By learning a few basic concepts of macro writing, the
beginning SAS programmer can easily get started in
designing and writing macros.
Copyright © 2009 i3 | CONFIDENTIAL
41
Download