Let SAS Do the Coding for You!
Robert Williams
Business Info Analyst Sr.
WellPoint Inc.

Overview
• Coding Dilemma: Direct coding or “Robo-Coding”?
• Step 1: Setting up SAS Data Set for “Robo-Coding”
• Step 2a: Technique #1 – Macro variable
  • Generate SAS code using DATA _NULL_ and CALL SYMPUT to a macro variable
• Step 2b: Technique #2 – External SAS code file
  • Generate SAS code using DATA _NULL_ and PUT statements to write to an external SAS program file
• Step 3: Run the auto-generated SAS code
• Extra Example and Conclusion

Coding Dilemma: Claims Data Extract Request
• Example data extraction asking for a subset of claims based on the following diagnoses and the corresponding diagnosis categories
• Muscle problems that are of a disabling nature, limited to: 356.1, 359.0-359.2, 045.9, 728.11
• Hemophilia: 286.0-286.4
• Acquired or congenital heart disease: 745.0-747.9; 424.0-429.9

Coding Dilemma: 1-digit & 2-digit DX sub-codes
• Diagnosis (DX) codes can be cumbersome to code into the WHERE statement because every 1-digit and 2-digit sub-code has to be accounted for.
• Example: To account for all medical DX codes from 359.0 to 359.2 (www.icd9data.com)

Coding Dilemma: Write the code the hard way
• Code all the LIKE conditions with %
• Code all the WHEN statements with different sizes for the SUBSTR function

    data WORK.CLAIMS_SUBSET_DX_cat;
      set WORK.SESUG_samp_claims;
      where (IDCD_ID like "0459%" or IDCD_ID like "3561%" or IDCD_ID like "3590%" or
             IDCD_ID like "3591%" or IDCD_ID like "3592%" or IDCD_ID like "72811%" or
             IDCD_ID like "2860%" or IDCD_ID like "2861%" or IDCD_ID like "2862%" or
             IDCD_ID like "2863%" or IDCD_ID like "2864%" or
             IDCD_ID like "424%" or IDCD_ID like "425%" or IDCD_ID like "426%" or
             IDCD_ID like "427%" or IDCD_ID like "428%" or IDCD_ID like "429%" or
             IDCD_ID like "745%" or IDCD_ID like "746%" or IDCD_ID like "747%");
      format DX_category $150.;
      select;
        when (substr(IDCD_ID,1,4) = '0459') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3561') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3590') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3591') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3592') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,5) = '72811') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '2860') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2861') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2862') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2863') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2864') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,3) = '424') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '425') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '426') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '427') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '428') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '429') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '745') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '746') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '747') DX_category = 'Heart disease';
        OTHERWISE DX_category = "WHAT CATEGORY?!";
      END;
    run;

Step 1: Setting up SAS Data File
• Create a SAS data set with key data. It can be created in a DATA step or imported from Excel.
• DX_LIST holds the DX codes for the WHERE and WHEN statements (without decimals).
• DX_CATEGORY holds the categories for the WHEN statements.

    DX_LIST   DX_CATEGORY
    0459      Muscle problems
    3561      Muscle problems
    3590      Muscle problems
    3591      Muscle problems
    3592      Muscle problems
    72811     Muscle problems
    2860      Hemophilia
    2861      Hemophilia
    2862      Hemophilia
    2863      Hemophilia
    2864      Hemophilia
    424       Heart disease
    425       Heart disease
    426       Heart disease
    427       Heart disease
    428       Heart disease
    429       Heart disease
    745       Heart disease
    746       Heart disease
    747       Heart disease

  Source request:
    Muscle problems: 356.1, 359.0-359.2, 045.9, 728.11
    Hemophilia: 286.0-286.4
    Heart disease: 745.0-747.9; 424.0-429.9

Step 2a: Technique #1 – Macro Variable
• Using the SAS data set from the previous slide (Step 1), this technique uses DATA _NULL_ to generate a data string for the WHERE statement like this: “WHERE IDCD_ID like '747%' or IDCD_ID like '0459%' or …”
• Uses the RETAIN statement so the data string is retained across DATA step iterations
• Uses the END= option on the SET statement and the CALL SYMPUT routine to output the data string to a macro variable.
  The CALL SYMPUT is invoked at the last iteration of the DATA step.

Step 2a: Technique #1 – The Robo-Coder
• DATA _NULL_ step code to generate the data string

    data _null_;
      set WORK.SESUG_DX_LISTING END=last;
      format where_string $5000.;
      retain where_string;
      /* This is the first line for the where statement */
      if _n_ = 1 then where_string = 'IDCD_ID like "'||trim(DX_LIST)||'%"';
      /* Subsequent lines add the "OR" condition to the where statement */
      else where_string = trim(where_string)||' '||'OR IDCD_ID like "'||trim(DX_LIST)||'%"';
      /* At the end, this outputs the entire where statement to the macro variable */
      if last then call symput("IDCD_ID_where_statement",where_string);
    run;

• At _n_ = 1, the string is started. At _n_ > 1, “OR …” is appended to the data string.
  • At _n_ = 1, where_string = IDCD_ID like "0459%"
  • At _n_ = 2, where_string = IDCD_ID like "0459%" OR IDCD_ID like "3561%"
• At the last iteration, the DATA _NULL_ step outputs the where_string value as a SAS macro variable named IDCD_ID_WHERE_STATEMENT.
• Using the %PUT statement, the macro variable will look like this in the SAS log:

    %put &IDCD_ID_where_statement;

    IDCD_ID like "0459%" OR IDCD_ID like "3561%" OR IDCD_ID like "3590%" OR
    IDCD_ID like "3591%" OR IDCD_ID like "3592%" OR IDCD_ID like "72811%" OR
    IDCD_ID like "2860%" OR IDCD_ID like "2861%" OR IDCD_ID like "2862%" OR
    IDCD_ID like "2863%" OR IDCD_ID like "2864%" OR IDCD_ID like "424%" OR
    IDCD_ID like "425%" OR IDCD_ID like "426%" OR IDCD_ID like "427%" OR
    IDCD_ID like "428%" OR IDCD_ID like "429%" OR IDCD_ID like "745%" OR
    IDCD_ID like "746%" OR IDCD_ID like "747%"

Step 2b: Technique #2 – External Program File
• Using the SAS data set from Step 1, this technique uses DATA _NULL_ to write the series of SELECT/WHEN statements to an external SAS program file using PUT statements
• Uses the FILENAME statement to establish an external SAS file reference, i.e. “tmpwhen”, to be used with the FILE statement
• Uses the LENGTH function to determine the size of the DX_LIST field of the SAS data set (from Step 1), i.e. “str_length = put(length(DX_LIST),1.0);”
• Each DATA step iteration writes one WHEN statement for the DX category
• Uses the END= option (on the SET statement) to write the final two lines of the SELECT/WHEN block, i.e. “OTHERWISE DX_category="WHAT CATEGORY?!"; END;”
• When the DATA _NULL_ step is completed, the external SAS program file contains the SELECT/WHEN statement block generated by all the PUT statements inside the DATA _NULL_ step

Step 2b: Technique #2 – The Robo-Coder
• DATA _NULL_ step code to write to the external program

    filename tmpwhen "C:\SESUG\tempwhenstatements.sas"; /* Makes a SAS program file */
    data _null_;
      set WORK.SESUG_DX_LISTING END=last;
      format when_string $250.;
      file tmpwhen; /* This is the file the PUT statements write to */
      /* This sets the length for the SUBSTR function in the WHEN statement */
      str_length = put(length(DX_LIST),1.0);
      /* Write first line which is just the SELECT statement */
      if _n_ = 1 then put @1 'select;';
      /* Write each WHEN statement for each DX with the corresponding category */
      when_string = "when (substr(IDCD_ID,1,"||trim(str_length)||") = '"||trim(DX_LIST)||"') DX_category = '"||trim(DX_CATEGORY)||"';";
      put @5 when_string;
      /* Finish writing the last two statements in the SELECT/WHEN statement block */
      if last then do;
        put @5 'OTHERWISE DX_category = "WHAT CATEGORY?!";';
        put @1 'END;';
      end;
    run;

• The FILENAME statement establishes an external SAS file with the reference name “tmpwhen”.
• The LENGTH function gets the size of the DX_LIST field. It changes for each DATA step iteration.
• At _n_ = 1, it writes these two lines to the SAS file:
    select;
        when (substr(IDCD_ID,1,4) = '0459') DX_category = 'Muscle problems';
• At _n_ = 2 and so on, it writes another line to the SAS file like this:
        when (substr(IDCD_ID,1,4) = '3561') DX_category = 'Muscle problems';
• At the last iteration, it also writes these two lines to the SAS file:
        OTHERWISE DX_category = "WHAT CATEGORY?!";
    END;

Step 2b: Technique #2 – The Robo-Coder
• Opening the SAS program named “tempwhenstatements.sas”, you will see this block of statements. Notice how the length changes for the SUBSTR function.

    select;
        when (substr(IDCD_ID,1,4) = '0459') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3561') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3590') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3591') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3592') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,5) = '72811') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '2860') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2861') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2862') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2863') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2864') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,3) = '424') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '425') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '426') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '427') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '428') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '429') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '745') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '746') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '747') DX_category = 'Heart disease';
        OTHERWISE DX_category = "WHAT CATEGORY?!";
    END;

Step 3: Run the auto-generated SAS code
• Invoke the macro variable inside the WHERE statement by prefixing its name with an ampersand (&)
• Insert the external SAS program using the %INCLUDE statement. Note: the line “options source2;” allows the external SAS code to display in the log.
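Before the final step is run, it can help to confirm that both generated pieces actually exist. The following is a minimal sketch, not part of the original presentation: it assumes the macro variable name from Step 2a and the fileref from Step 2b, and the macro name check_robo_pieces is hypothetical.

```sas
/* Hypothetical helper: reports whether the Step 2a macro variable and
   the Step 2b fileref are ready before the final DATA step is run.   */
%macro check_robo_pieces(mvar=IDCD_ID_where_statement, fref=tmpwhen);

  /* %SYMEXIST returns 1 when the named macro variable exists */
  %if %symexist(&mvar) %then
    %put NOTE: &mvar = &&&mvar;
  %else
    %put WARNING: Macro variable &mvar does not exist. Run Step 2a first.;

  /* FILEREF returns 0 only when the fileref is assigned and the file exists */
  %if %sysfunc(fileref(&fref)) = 0 %then
    %put NOTE: Fileref &fref points to an existing file.;
  %else
    %put WARNING: Fileref &fref is unassigned or its file is missing. Run Step 2b first.;

%mend check_robo_pieces;

%check_robo_pieces()
```

If the log should show the included code for one include only, the SOURCE2 option can also be given on the statement itself, as in “%include tmpwhen / source2;”, instead of setting the global option.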
    /* Source2 option will allow the external SAS code to display in the log */
    options source2;
    data WORK.CLAIMS_SUBSET_DX_cat;
      set WORK.SESUG_samp_claims;
      /* Invoke the macro variable here: it supplies the WHERE conditions we created */
      where (&IDCD_ID_where_statement);
      format DX_category $150.;
      /* Insert the external SAS code (the WHEN statements we created), referenced by the FILENAME fileref */
      %include tmpwhen;
    run;

Step 3: What the auto-generated code looks like

    data WORK.CLAIMS_SUBSET_DX_cat;
      set WORK.SESUG_samp_claims;
      where (IDCD_ID like "0459%" OR IDCD_ID like "3561%" OR IDCD_ID like "3590%" OR
             IDCD_ID like "3591%" OR IDCD_ID like "3592%" OR IDCD_ID like "72811%" OR
             IDCD_ID like "2860%" OR IDCD_ID like "2861%" OR IDCD_ID like "2862%" OR
             IDCD_ID like "2863%" OR IDCD_ID like "2864%" OR
             IDCD_ID like "424%" OR IDCD_ID like "425%" OR IDCD_ID like "426%" OR
             IDCD_ID like "427%" OR IDCD_ID like "428%" OR IDCD_ID like "429%" OR
             IDCD_ID like "745%" OR IDCD_ID like "746%" OR IDCD_ID like "747%");
      format DX_category $150.;
      select;
        when (substr(IDCD_ID,1,4) = '0459') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3561') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3590') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3591') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '3592') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,5) = '72811') DX_category = 'Muscle problems';
        when (substr(IDCD_ID,1,4) = '2860') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2861') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2862') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2863') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,4) = '2864') DX_category = 'Hemophilia';
        when (substr(IDCD_ID,1,3) = '424') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '425') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '426') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '427') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '428') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '429') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '745') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '746') DX_category = 'Heart disease';
        when (substr(IDCD_ID,1,3) = '747') DX_category = 'Heart disease';
        OTHERWISE DX_category = "WHAT CATEGORY?!";
      END;
    run;

Apart from the uppercase OR written by the generator, this is the same code as the hand-written version in “Coding Dilemma: Write the code the hard way”.

Step 3: Result of the claims data extraction
• Here are the results of the claims data extraction, including the DX categories.

    Claim_ID    DOS        IDCD_ID  DX_category
    521959269   01MAR2013  2863     Hemophilia
    620558593   01MAR2013  3591     Muscle problems
    834446494   01MAR2013  42511    Heart disease
    350650440   01MAR2013  2860     Hemophilia
    999400510   04MAR2013  7467     Heart disease
    295583464   04MAR2013  74510    Heart disease
    294778600   04MAR2013  7452     Heart disease
    725175228   06MAR2013  2860     Hemophilia
    576261199   06MAR2013  4280     Heart disease
    325277861   06MAR2013  35921    Muscle problems
    702352846   07MAR2013  4254     Heart disease
    575425830   07MAR2013  4281     Heart disease
    740288358   07MAR2013  7464     Heart disease
    245189238   08MAR2013  2860     Hemophilia
    930080969   11MAR2013  42821    Heart disease
    411880687   11MAR2013  3591     Muscle problems
    931851740   14MAR2013  3591     Muscle problems
    305813261   16MAR2013  3590     Muscle problems

Extra Example
• Suppose we get data with the patients’ diagnoses. We want to transpose each patient’s diagnoses so that the most severe diagnosis, based on diag score, is in the first column.
• NOTE: Some patients have only one diag while others have four diags (or maybe more).
• DX 055 has a higher diag score than DX 058, so it will come first for the first patient.

Extra Example
SAS code to sort each patient’s records by descending diag_score before transposing the diags into columns.
    /* Sort the diag data with the highest diag score on top for each patient */
    proc sort data=work.vasug_sample_diag;
      by patient_id descending diag_score;
    run;

    /* Transpose the diag for each patient */
    proc transpose data=work.vasug_sample_diag
                   out=work.trans_diags (drop= _name_ _label_)
                   prefix=diag_;
      var DIAG;
      by patient_id;
    run;

Extra Example
It was nice to transpose the diags, but the care managers don’t know who the patient is. We want to join with member demographics, but we don’t know how many diag columns there are. So, use PROC CONTENTS (with the OUT= and VARNUM options) to see all the diag columns from the PROC TRANSPOSE. It looks like there are at most 5 diag columns.

    proc contents data=work.trans_diags
                  out=work.trans_diag_var_list varnum noprint;
    run;

Extra Example
Take advantage of the output from PROC CONTENTS to create the variable list for PROC SQL when joining to the member table. Use the macro variable method (Technique #1).

    data _null_;
      format string_diag_var_list $500.; /* Set it large enough */
      set work.trans_diag_var_list end=last;
      retain string_diag_var_list " ";
      /* Row 2 restarts the list, so the name read at row 1 is dropped;
         the resolved list shown later contains only DIAG_1 to DIAG_5 */
      if _n_ = 2 then string_diag_var_list = NAME; /* This will be the first one */
      else string_diag_var_list = trim(string_diag_var_list)||", "||trim(NAME);
      if last then call symput("diag_var_list",trim(string_diag_var_list));
    run;

    %put &DIAG_var_list; /* See what it looks like */

Extra Example
Put the macro variable into the PROC SQL:

    proc sql;
      create table work.Final_member_ordered_DIAGS as
      select distinct a.MEMBER_ID, a.MEMBER_NAME, a.AGE, a.GENDER, &DIAG_var_list
      from VASUG_sample_MEMBERS a
      left join work.trans_diags b
        on a.member_id=b.patient_id;
    quit;

PROC SQL after the macro substitution:

    proc sql;
      create table work.Final_member_ordered_DIAGS as
      select distinct a.MEMBER_ID, a.MEMBER_NAME, a.AGE, a.GENDER, DIAG_1, DIAG_2, DIAG_3, DIAG_4, DIAG_5
      from VASUG_sample_MEMBERS a
      left join work.trans_diags b
        on a.member_id=b.patient_id;
    quit;

Conclusion
• These two techniques saved us a significant amount of manual coding labor.
• This example may not seem like much coding, but the idea is useful whenever a large number of repetitive SAS statements have to be coded in a dynamic process.
• This idea is not limited to just making WHERE statements or SELECT/WHEN statements.
• The Robo-Coder can be expanded to include other SAS PROCs as well as dozens or hundreds of routine weekly/monthly reports (see my VASUG Spring 2011 presentation on the PROC REPORT example).

Questions?
• SAS code and a sample SAS data set are available. Send me an email.
• Contact:
  Robert Williams
  WellPoint Inc.
  4425 Corporation Ln.
  Virginia Beach, VA 23462
  Robert.Williams@Amerigroup.com

References
• SAS Certification Prep Guide: Base Programming for SAS 9, Third Edition. Cary, NC: SAS Institute Inc., 2011.
• SAS Certification Prep Guide: Advanced Programming for SAS 9, Third Edition. Cary, NC: SAS Institute Inc., 2011.
• The Web's Free 2013 Medical Coding Reference. (http://www.icd9data.com/)
• Robo-SAS Coding? Yes! We can auto-generate SAS codes from a SAS data set. (http://vasug.org/about2/past-presentations/)
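
Readers without the sample data may want to try Technique #1 on a toy input. The following is a minimal sketch and not the author's code: the data set names work.toy_dx_listing, work.toy_claims and work.toy_subset, the macro variable toy_where, and all data values are made up for illustration. It also swaps in the CATS/CATX functions and CALL SYMPUTX in place of the TRIM-and-concatenate logic and CALL SYMPUT used in the slides.

```sas
/* Toy stand-in for the Step 1 data set (hypothetical names and values) */
data work.toy_dx_listing;
  length DX_LIST $5 DX_CATEGORY $20;
  infile datalines dsd;
  input DX_LIST $ DX_CATEGORY $;
  datalines;
3591,Muscle problems
2860,Hemophilia
428,Heart disease
;
run;

/* Toy claims to filter (hypothetical values) */
data work.toy_claims;
  length Claim_ID 8 IDCD_ID $7;
  input Claim_ID IDCD_ID $;
  datalines;
1 3591
2 2860
3 4019
4 42821
;
run;

/* Technique #1: build the WHERE conditions and store them in a macro variable */
data _null_;
  set work.toy_dx_listing end=last;
  length where_string $500;
  retain where_string;
  if _n_ = 1 then
    where_string = cats('IDCD_ID like "', DX_LIST, '%"');
  else
    where_string = catx(' ', where_string, 'or',
                        cats('IDCD_ID like "', DX_LIST, '%"'));
  /* CALL SYMPUTX strips leading and trailing blanks before storing */
  if last then call symputx('toy_where', where_string);
run;

%put &toy_where;

/* Step 3: use the generated conditions */
data work.toy_subset;
  set work.toy_claims;
  where (&toy_where);
run;

proc print data=work.toy_subset;
run;
```

With these toy rows, the generated conditions should keep claims 1, 2 and 4 and drop claim 3, whose code 4019 matches none of the three prefixes.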