Let SAS Do the Coding for You

advertisement
Let SAS Do the Coding for You!
Robert Williams
Business Info Analyst Sr.
WellPoint Inc.
Overview
• Coding Dilemma: Direct coding or “Robo-Coding”?
• Step 1: Setting up SAS Data Set for “Robo-Coding”
• Step 2a: Technique #1 – Macro variable
• Generate SAS code using DATA _NULL_ and CALL SYMPUT to a macro
• Step 2b: Technique #2 – External SAS code file
• Generate SAS code using DATA _NULL_ and PUT statements to write to
an external SAS program file
• Step 3: Run the auto-generated SAS codes
• Extra Example and Conclusion
2
Coding Dilemma: Claims Data Extract Request
• Example data extraction asking for a subset of claims
based on the following diagnosis and the corresponding
diagnosis categories
• Muscle problems that are of a disabling nature, limited to:
356.1, 359.0-359.2, 045.9, 728.11
• Hemophilia:
286.0-286.4
• Acquired or congenital heart disease:
745.0-747.9; 424.0-429.9
3
Coding Dilemma: 1-digit & 2-digit DX sub-codes
• Diagnosis (DX) codes can be cumbersome to code into
the WHERE statement due to having to account for each
1-digit and 2-digit sub-codes.
• Example: To account for all medical DX codes from
359.0 to 359.2 (www.icd9data.com)
4
Coding Dilemma: Write the code the hard way
data WORK.CLAIMS_SUBSET_DX_cat;
set WORK.SESUG_samp_claims;
where (IDCD_ID like "0459%" or IDCD_ID like "3561%" or IDCD_ID like "3590%"
or IDCD_ID like "3591%" or IDCD_ID like "3592%" or IDCD_ID like "72811%"
or IDCD_ID like "2860%" or IDCD_ID like "2861%" or IDCD_ID like "2862%"
or IDCD_ID like "2863%" or IDCD_ID like "2864%" or IDCD_ID like "424%"
or IDCD_ID like "425%" or IDCD_ID like "426%" or IDCD_ID like "427%"
or IDCD_ID like "428%" or IDCD_ID like "429%" or IDCD_ID like "745%"
or IDCD_ID like "746%" or IDCD_ID like "747%");
format DX_category $150.;
select;
when (substr(IDCD_ID,1,4) = '0459') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3561') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3590') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3591') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3592') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,5) = '72811') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '2860') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2861') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2862') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2863') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2864') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,3) = '424') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '425') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '426') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '427') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '428') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '429') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '745') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '746') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '747') DX_category = 'Heart disease';
OTHERWISE DX_category = "WHAT CATEGORY?!";
END;
run;
Code all the
LIKE
statements
with %
Code all the
WHEN
statements
with
different
sizes for
SUBSTR
function
5
Step 1: Setting up SAS Data File
• Create a SAS data set with key data. It can be created in
a DATA step or imported from Excel.
DX codes for
WHERE statements
and WHEN
statements
(without decimals)
DX_list
DX_CATEGORY
0459
Muscle problems
3561
Muscle problems
3590
Muscle problems
3591
Muscle problems
3592
Muscle problems
72811
Muscle problems
2860
Hemophilia
2861
Hemophilia
2862
Hemophilia
2863
Hemophilia
2864
Hemophilia
424
Heart disease
425
Heart disease
426
Heart disease
427
Heart disease
428
Heart disease
429
Heart disease
745
Heart disease
746
Heart disease
747
Heart disease
Categories
for the
WHEN
statements
Muscle problems:
356.1, 359.0-359.2, 045.9,
728.11
Hemophilia:
286.0-286.4
Heart disease:
745.0-747.9; 424.0-429.9
6
Step 2a: Technique #1 – Macro Variable
• Using the SAS data set from previous slide (Step 1), this
technique uses DATA _NULL_ to generate a data string
for the WHERE statement like this
“WHERE IDCD_ID like '747%' or IDCD_ID like
'0459%' or …”
• Uses the RETAIN statement so the data string is retained
for each DATA step iteration
• Uses the END statement and CALL SYMPUT function to
output the data string to a macro variable. The CALL
SYMPUT is invoked at the last iteration of the DATA step
7
Step 2a: Technique #1 – The Robo-Coder
• DATA _NULL_ step code to generate the data string
data _null_;
set WORK.SESUG_DX_LISTING END=last;
format where_string $5000.;
retain where_string;
/* This is the first line for the where statement */
if _n_ = 1 then where_string = 'IDCD_ID like "'||trim(DX_LIST)||'%"';
/* subsequent lines that adds the "OR" line for the where statement */
else where_string = trim(where_string)||' '||'OR IDCD_ID like
"'||trim(DX_LIST)||'%"';
/* At the end, this outputs entire where statement to the macro variable */
if last then call symput("IDCD_ID_where_statement",where_string);
run;
At _n_=1, start
the string. Then
At _n_>1,
“OR…” is added
to the data
string
At _n_ = 1,
where_string = IDCD_ID like "0459%"
At the last iteration, the DATA _NULL_ outputs the where_string value
as a SAS macro variable named: “IDCD_ID_WHERE_STATEMENT"
At _n_ = 2,
where_string = IDCD_ID like "0459%"
OR IDCD_ID like "3561%"
8
Step 2a: Technique #1 – The Robo-Coder
• DATA _NULL_ step code to generate the data string
data _null_;
set WORK.SESUG_DX_LISTING END=last;
format where_string $5000.;
retain where_string;
/* This is the first line for the where statement */
if _n_ = 1 then where_string = 'IDCD_ID like "'||trim(DX_LIST)||'%"';
/* subsequent lines that adds the "OR" line for the where statement */
else where_string = trim(where_string)||' '||'OR IDCD_ID like
"'||trim(DX_LIST)||'%"';
/* At the end, this outputs entire where statement to the macro variable */
if last then call symput("IDCD_ID_where_statement",where_string);
run;
_n_ = 1 starts
the string, after
that, the “OR…”
is added to the
data string
• Using the %PUT statement, the macro variable will look
like this in the SAS log
%put &IDCD_ID_where_statement;
IDCD_ID like "0459%" or IDCD_ID like "3561%" or IDCD_ID like "3590%" or IDCD_ID like
"3591%" or IDCD_ID like "3592%" or IDCD_ID
like "72811%" or IDCD_ID like "2860%" or IDCD_ID like "2861%" or IDCD_ID like "2862%"
or IDCD_ID like "2863%" or IDCD_ID like
"2864%" or IDCD_ID like "424%" or IDCD_ID like "425%" or IDCD_ID like "426%" or
IDCD_ID like "427%" or IDCD_ID like "428%" or
IDCD_ID like "429%" or IDCD_ID like "745%" or IDCD_ID like "746%" or IDCD_ID like
"747%"
9
Step 2b: Technique #2 – External Program File
• Using the SAS data set from Step 1, this technique uses
DATA _NULL_ to write the series of SELECT/WHEN
statements to an external SAS program file using the
PUT statements
• Uses FILENAME statement to establish external SAS file
reference i.e. “tmpwhen” to be used with FILE statement
• Uses LENGTH function to determine the size of the
DX_LIST field of the SAS data set (from step 1) i.e.
“str_length = put(length(DX_LIST),1.0);”
• Each DATA step iteration writes the each WHEN
statement for the DX_ Category
10
Step 2b: Technique #2 – External Program File
• Uses the END statement (inside the SET statement) to
write the final the final two lines of the series of
SELECT/WHEN statements i.e.
“OTHERWISE DX_category="WHAT CATEGORY?!";
END;”
• When the DATA _NULL_ is completed, the external SAS
program file is created with SELECT/WHEN statement
block generated by all the PUT statements inside the
DATA _NULL_
11
Step 2b: Technique #2 – The Robo-Coder
• DATA _NULL_ step code to write to external program
filename tmpwhen "C:\SESUG\tempwhenstatements.sas"; /* Makes a SAS program file */
data _null_;
set WORK.SESUG_DX_LISTING END=last;
format when_string $250.;
file tmpwhen; /* This is the file the PUT statements writes to */
/* This sets the length for the SUBSTR function in the WHEN statement */
str_length = put(length(DX_LIST),1.0);
/* Write first line which is just the SELECT statement */
if _n_ = 1 then put @1 'select;';
/* Write each WHEN statement for each DX with the corresponding category */
when_string = "when (substr(IDCD_ID,1,"||trim(str_length)||") =
'"||trim(DX_LIST)||"') DX_category = '"||trim(DX_CATEGORY)||"';";
put @5 when_string;
/* Finish writing the last two statements in the SELECT/WHEN statement block */
if last then do;
put @5 'OTHERWISE DX_category = "WHAT CATEGORY?!";';
put @1 'END;';
end;
run;
At _n_
== 1,
it statement
writes
the
SASanother
file these
two
lines
Filename
established
an line
external
SAS
filefile
with
_n_ last
2 and
so on,to
itwrites
writes
toto
the
SAS
like this
At
the
iteration,
it
these
two
lines
SAS
file
LENGTH
function to get the size of DX_LIST field.the
It changes
for
select;
when
(substr(IDCD_ID,1,4)
'3561')
OTHERWISE
DX_category = "WHAT=CATEGORY?!";
reference
name
”tmpwhen”.
whenstep
(substr(IDCD_ID,1,4)
= '0459')
each
DATA
iteration.
DX_category
= 'Muscle problems';
END;
DX_category = 'Muscle problems';
12
Step 2b: Technique #2 – The Robo-Coder
• Opening up the SAS program name
“tempwhenstatements.sas”, you will see this block of
statements
select;
when (substr(IDCD_ID,1,4) = '0459') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3561') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3590') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3591') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3592') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,5) = '72811') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '2860') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2861') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2862') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2863') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2864') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,3) = '424') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '425') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '426') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '427') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '428') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '429') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '745') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '746') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '747') DX_category = 'Heart disease';
OTHERWISE DX_category = "WHAT CATEGORY?!";
END;
Notice how the
length size
changes for the
SUBSTR function
13
Step 3: Run the auto-generated SAS codes
• Invoke the macro variable inside the WHERE statement
using the “&” sign or ampersand symbol
• Insert external SAS program using the %INCLUDE
statement. Note: this line, “options source2;” will
allow the external SAS code to display in the log.
/* Source2 option will allow the external SAS code to display in the log */
options source2;
data WORK.CLAIMS_SUBSET_DX_cat;
set WORK.SESUG_samp_claims;
/* The macro variable will put the WHERE conditions we created */
where (&IDCD_ID_where_statement);
format DX_category $150.;
/* Inserts the external SAS code that we created the WHEN statements */
%include tmpwhen;
Invoke macro
here
run;
Insert external SAS
code as referenced
with FILENAME
14
Step 3: What the auto-generated code looks like
data WORK.CLAIMS_SUBSET_DX_cat;
set WORK.SESUG_samp_claims;
where (IDCD_ID like "0459%" or IDCD_ID like "3561%" or IDCD_ID like "3590%"
or IDCD_ID like "3591%" or IDCD_ID like "3592%" or IDCD_ID like "72811%"
or IDCD_ID like "2860%" or IDCD_ID like "2861%" or IDCD_ID like "2862%"
or IDCD_ID like "2863%" or IDCD_ID like "2864%" or IDCD_ID like "424%"
or IDCD_ID like "425%" or IDCD_ID like "426%" or IDCD_ID like "427%"
or IDCD_ID like "428%" or IDCD_ID like "429%" or IDCD_ID like "745%"
or IDCD_ID like "746%" or IDCD_ID like "747%");
format DX_category $150.;
select;
when (substr(IDCD_ID,1,4) = '0459') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3561') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3590') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3591') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '3592') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,5) = '72811') DX_category = 'Muscle problems';
when (substr(IDCD_ID,1,4) = '2860') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2861') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2862') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2863') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,4) = '2864') DX_category = 'Hemophilia';
when (substr(IDCD_ID,1,3) = '424') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '425') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '426') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '427') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '428') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '429') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '745') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '746') DX_category = 'Heart disease';
when (substr(IDCD_ID,1,3) = '747') DX_category = 'Heart disease';
OTHERWISE DX_category = "WHAT CATEGORY?!";
END;
run;
It looks
exactly the
same as
slide 6 if I
hard coded
it manually.
15
Step 3: Result of the claims data extraction
• Here is the results of the claims data extraction as well as
the DX categories.
Claim_ID
DOS
IDCD_ID
DX_category
521959269
01MAR20132863
Hemophilia
620558593
01MAR20133591
Muscle problems
834446494
01MAR201342511
Heart disease
350650440
01MAR20132860
Hemophilia
999400510
04MAR20137467
Heart disease
295583464
04MAR201374510
Heart disease
294778600
04MAR20137452
Heart disease
725175228
06MAR20132860
Hemophilia
576261199
06MAR20134280
Heart disease
325277861
06MAR201335921
Muscle problems
702352846
07MAR20134254
Heart disease
575425830
07MAR20134281
Heart disease
740288358
07MAR20137464
Heart disease
245189238
08MAR20132860
Hemophilia
930080969
11MAR201342821
Heart disease
411880687
11MAR20133591
Muscle problems
931851740
14MAR20133591
Muscle problems
305813261
16MAR20133590
Muscle problems
16
Extra Example
• Suppose, we get data with the patients’ diagnoses. We
want to transpose the patients diagnosis to put the most
severe diagnoses in the first column based on diag scores
NOTE: Some
patient has
only one diag
while others
has four diags
(or maybe
more.)
DX 055 has a
higher diag score
than DX 058. So,
it will be first for
the first patient.
17
Extra Example
SAS code to sort each patient ID with diag_score in
descending sort before transposing the diags into columns.
/* Sort the diag data with the highest diag score on top for each
patient */
proc sort data=work.vasug_sample_diag;
by patient_id descending diag_score;
run;
/* Transpose the diag for each patient */
proc transpose data=work.vasug_sample_diag out=work.trans_diags
(drop= _name_ _label_) prefix=diag_;
var DIAG;
by patient_id;
run;
18
Extra Example
SAS code to sort each patient ID with diag_score in
descending sort before transposing the diags into columns.
/* Sort the diag data with the highest diag score on top for each
patient */
proc sort data=work.vasug_sample_diag;
by patient_id descending diag_score;
run;
/* Transpose the diag for each patient */
proc transpose data=work.vasug_sample_diag out=work.trans_diags
(drop= _name_ _label_) prefix=diag_;
var DIAG;
by patient_id;
run;
19
Extra Example
It was nice to transpose the diags but the care managers
don’t know who the patient is. We join with member
demographics but we don’t know how many diag columns.
So, use PROC CONTENTS (with OUT= and VARNUM
options) to see all the diag columns from the PROC
TRANSPOSE. Looks like at-most 5 diag columns
proc contents data=work.trans_diags out=work.trans_diag_var_list
varnum noprint;
run;
20
Extra Example
Take advantage of the output from PROC CONTENTS to
create the variable list for PROC SQL when joining to
member table. Use marco variable method (technique #1)
data _null_;
format string_diag_var_list $500.; /* Set it large enough */
set work.trans_diag_var_list end=last;
retain string_diag_var_list " ";
if _n_ = 2 then string_diag_var_list = NAME; /* This will be
the first one */
else string_diag_var_list = trim(string_diag_var_list)||",
"||trim(NAME);
if last then call
symput("diag_var_list",trim(string_diag_var_list));
run;
%put &DIAG_var_list; /* See what it looks like */
21
Extra Example
Take advantage of the output from PROC CONTENTS to
create the variable list for PROC SQL when joining to
member table. Use marco variable method (technique #1)
data _null_;
format string_diag_var_list $500.; /* Set it large enough */
set work.trans_diag_var_list end=last;
retain string_diag_var_list " ";
if _n_ = 2 then string_diag_var_list = NAME; /* This will be
the first one */
else string_diag_var_list = trim(string_diag_var_list)||",
"||trim(NAME);
if last then call
symput("diag_var_list",trim(string_diag_var_list));
run;
%put &DIAG_var_list; /* See what it looks like */
22
Extra Example
Put the macro variable into the PROC SQL
proc sql;
create table work.Final_member_ordered_DIAGS
as select distinct a.MEMBER_ID, a.MEMBER_NAME, a.AGE, a.GENDER,
&DIAG_var_list
from VASUG_sample_MEMBERS a left join work.trans_diags b
on a.member_id=b.patient_id;
quit;
PROC SQL after invoking the macro substitution
proc sql;
create table work.Final_member_ordered_DIAGS
as select distinct a.MEMBER_ID, a.MEMBER_NAME, a.AGE, a.GENDER,
DIAG_1, DIAG_2, DIAG_3, DIAG_4, DIAG_5
from VASUG_sample_MEMBERS a left join work.trans_diags b
on a.member_id=b.patient_id;
quit;
23
Conclusion
• These two techniques saved us a significant amount of
manual coding labor.
• This example may not seem like much coding but the
idea will be useful if we have a large number of repetitive
SAS statements to code in a dynamical process
• This idea is not limited to just making WHERE
statements or SELECT/WHEN statements.
• The Robo-Coder can be expanded to include other SAS
PROCs as well as dozen or hundreds of routine
weekly/monthly reports (See my VASUG Spring 2011
presentation on the PROC REPORT example)
24
Questions?
• Questions?
• SAS code and sample SAS data set is available. Send me
an email.
• Contact:
Robert Williams
WellPoint Inc.
4425 Corporation Ln.
Virginia Beach, VA 23462
Robert.Williams@Amerigroup.com
25
References
• SAS Certification Prep Guide: Base Programming for
SAS 9 Third Edition Cary, NC: SAS Institute Inc. 2011
• SAS Certification Prep Guide: Advanced Programming
for SAS 9 Third Edition Cary, NC: SAS Institute Inc.
2011
• The Web's Free 2013 Medical Coding Reference.
(http://www.icd9data.com/)
• Robo-SAS Coding? Yes! We can auto-generate SAS
codes from a SAS data set. (http://vasug.org/about2/past-presentations/)
26
Download