Nearest neighbor matching
USING THE GREEDY MATCH MACRO

Note: Much of the code was originally written by Lori Parsons
http://www2.sas.com/proceedings/sugi26/p214-26.pdf

This code has been written with simplicity as a primary concern. If you do not have a large number of controls, you may want to modify it.

    /* Define the library for formats */
    LIBNAME saslib "G:\oldpeople\sasdata\" ;
    OPTIONS NOFMTERR FMTSEARCH = (saslib) ;

    /* Define the library for study data */
    LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;

Include the Macro

    %INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearest macro.sas' ;

%propen (libname, dsname, idvariable, dependent, propensity)

    LIBNAME    = library (libref) for data sets
    DSNAME     = dataset with study data
    IDVARIABLE = subject ID variable
    DEPENDENT  = dependent variable
    PROPENSITY = propensity score produced in logistic regression

FOR EXAMPLE

    %propen(study,allpropen,id,athome,prob);

Remember, we already have the study.allpropen dataset with the propensity score (prob) from the PROC LOGISTIC we just did.

Explaining the macro

A Challenge

    %macro propen(lib,dsn,id,depend,prob);
    Data in5 ;
      set &lib..&dsn ;

Creates a temporary data set.
Propensity scores are rounded to 5, then 4, 3, 2 and 1 decimals.

    %Do countr = 1 %to 5 ;
      %let digits = %eval(6 - &countr) ;
      %let roundto = %eval(10**&digits) ;
      %let roundto = %sysevalf(1/&roundto) ;
      %let nextin = %eval(&digits - 1) ;

MACRO NOTES

    %Do countr = 1 %to 5 ; /* Starts %DO loop */

Use the %EVAL function to do integer arithmetic

    %let digits = %eval(6 - &countr) ;

Use the %SYSEVALF function to do non-integer arithmetic

Create 2 data sets

    /* Output control to one data set, intervention to another */
    /* Create random number to sort within group */
    DATA yes1 (KEEP = &prob id_y depend_y randnum)
         no1  (KEEP = &prob id_n depend_n randnum) ;
      SET in&digits ;

We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places.
We only keep four variables.

Assignment statements

      randnum = RANUNI(0) ;
      &prob =
      ROUND(&prob,&roundto) ;

Create a random number and round the propensity score to a set number of digits.

Output to case data set …

      IF &depend = 1 THEN DO ;
        id_y = &id ;
        depend_y = &depend ;
        OUTPUT yes1 ;
      END ;

We need to rename the dependent & id variables or they’ll get overwritten when cases and controls are merged.

… or output to control data set

      ELSE IF &depend = 0 THEN DO ;
        id_n = &id ;
        depend_n = &depend ;
        OUTPUT no1 ;
      END ;

Notice the data sets were named no1 and yes1. It becomes evident why shortly.

    /* Runs through control and experimental and matches up to 20 subjects with identical propensity score */
    %Do i = 1 %to 20 ;
      %let j = %eval(&i +1) ;
      proc sort data = yes&i ;
        by &prob randnum ;
      data yes&i yes&j ;
        set yes&i ;
        by &prob ;
        if first.&prob then output yes&i ;
        else output yes&j ;

NOTE: Matching without replacement

Same thing for controls

      proc sort data = no&i ;
        by &prob randnum ;
      data no&i no&j ;
        set no&i ;
        by &prob ;
        if first.&prob then output no&i ;
        else output no&j ;

The randnum ensures that subjects with matching scores are pulled at random.

Merge matches, end loop

      DATA match&i ;
        MERGE yes&i (in= ina) no&i (in= inb) ;
        BY &prob ;
        IF ina AND inb ;
      run ;
    %END ;

Concatenate all data sets with matches (N=20)

    /* Adds all matches into a single data set */
    DATA allmatches ;
      SET
      %DO k = 1 %TO 20 ;
        match&k
      %END ;
      ; /* this semicolon ends the SET statement; the one after %END only ends %END */

Create two data sets with IDs

    DATA allyes (RENAME = (id_y = &id depend_y = &depend))
         allno  (RENAME = (id_n = &id depend_n = &depend)) ;
      SET allmatches ;

Create one file of all matched IDs

    DATA matchfile ;
      SET allyes allno ;

And sort it …

    proc sort data = matchfile ;
      by &id &depend ;
    proc sort data = in&digits ;
      by &id &depend ;

    /* Creates a data set of all subjects with n-digit match */
    /* Creates a second data set of subjects with no match */
    DATA MATCHES&DIGITS IN&NEXTIN ;
      MERGE IN&DIGITS (IN = INA) MATCHFILE (IN = INB) ;
      BY &ID &DEPEND ;
      IF INA AND INB THEN OUTPUT MATCHES&DIGITS ;
      ELSE OUTPUT IN&NEXTIN ;

JUST A GOOD HABIT TO CHECK AS THE LOOP RUNS THROUGH

    TITLE "MATCHES &ROUNDTO " ;
    PROC FREQ DATA = MATCHES&DIGITS
      ; TABLES &DEPEND ;
    RUN ;
    %END ;

End loop. Now match to 4 decimal places, etc.

    /* Adds 1- to 5-digit matches into a single data set */
    data &lib..finalset ;
      set
      %do m = 1 %to 5 ;
        matches&m
      %end ;
      ; /* this semicolon ends the SET statement; the one after %end only ends %end */

One final check & done!

    Title "Distribution of Dependent Variable in &lib..finalset " ;
    proc freq data = &lib..finalset ;
      tables &depend ;
    run;
    %mend propen;
    run ;

Did it work?

                 QUINTILES                        NEAREST NEIGHBOR
    Variable     At Home   Not Home   Prob        At Home   Not Home   Prob
    Age          79.2      79.3       .60         79.1      79.1       .76
    ER visits    4.5****   3.8****    .0001       4.2       4.2        .88
    Female       52%                  .36         50%       50%        .74
    Race         54%                  .97                              .67

    ** P < .01   **** P < .0001

Model Comparison

    TEST               Without Matching   Quintile Matching   Nearest Neighbor
    Likelihood Ratio   643.1              180.8               186.6
    Score              582.4              176.0               181.4
    Wald               485.6              165.7               170.4

Odds ratio

    No Match   Quintiles   Nearest Neighbor
    .154       .281        .269
    6.5 : 1    3.6 : 1     3.7 : 1

How near?

    Decimals     5     4     3     2     1
    # Matches    902   14    143   101   38
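
Checking balance yourself

The slides do not show which procedures produced the "Did it work?" comparison, so the following is only a sketch of one way to run a similar check on the matched file. It assumes the macro was called as %propen(study,allpropen,id,athome,prob), so that the matched data are in study.finalset and athome is the 0/1 dependent variable. The covariate names age, ervisits, female and race are hypothetical placeholders; substitute the names in your own data.

```sas
/* Sketch only: age, ervisits, female and race are hypothetical variable names. */
/* study.finalset is the matched file written by the %propen call shown earlier. */

/* Continuous covariates: compare means for athome = 1 versus athome = 0 */
PROC TTEST DATA = study.finalset ;
  CLASS athome ;
  VAR age ervisits ;
RUN ;

/* Categorical covariates: compare percentages across the two groups */
PROC FREQ DATA = study.finalset ;
  TABLES (female race) * athome / CHISQ ;
RUN ;
```

If matching worked, the group differences on these covariates should be small and non-significant, as in the nearest neighbor columns of the table above.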