psm2 - SAS Halifax Regional User Group

Nearest neighbor matching
USING THE GREEDY MATCH MACRO
Note: Much of the code was originally written by Lori Parsons:
http://www2.sas.com/proceedings/sugi26/p214-26.pdf
This code has been written with simplicity as a primary concern.
If you do not have a large number of controls, you may want to modify it.
/* Define the library for formats */
LIBNAME saslib "G:\oldpeople\sasdata\" ;
OPTIONS NOFMTERR FMTSEARCH = (saslib) ;
/* Define the library for study data */
LIBNAME study "C:\Users\AnnMaria\Documents\shrug\" ;
Include the Macro
%INCLUDE 'C:\Users\AnnMaria\Documents\shrug\nearest macro.sas' ;
%propen(libname, dsname, idvariable, dependent, propensity)
LIBNAME = directory for data sets
DSNAME = dataset with study data
IDVARIABLE = subject ID variable
DEPENDENT = dependent variable
PROPENSITY = propensity score produced in logistic regression
FOR EXAMPLE
%propen(study,allpropen,id,athome,prob);
Remember, we already have the study.allpropen dataset with the propensity score (prob) from the PROC LOGISTIC we just did.
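For readers who do not have that earlier step in front of them, a step of roughly the following shape could produce study.allpropen. This is only a sketch: the input data set study.allvars and the covariate names (age, ervisits, female, race, all assumed numeric) are hypothetical and are not taken from the earlier slides.

```sas
/* Hypothetical sketch: study.allvars and the covariate names are assumed */
PROC LOGISTIC DATA = study.allvars DESCENDING ;
  /* DESCENDING models the probability that athome = 1 */
  MODEL athome = age ervisits female race ;
  /* PROB= names the variable that holds the predicted probability */
  OUTPUT OUT = study.allpropen PROB = prob ;
RUN ;
```

The only things the macro relies on are that the output data set contains the ID, the 0/1 dependent variable and the predicted probability.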
Explaining the macro: A Challenge
%macro propen(lib,dsn,id,depend,prob);
Data in5 ;
set &lib..&dsn ;
Creates a temporary data set
Propensity scores are rounded to 5, then 4, 3, 2 and 1 decimals
%Do countr = 1 %to 5 ;
%let digits = %eval(6 - &countr) ;
%let roundto = %eval(10**&digits) ;
%let roundto = %sysevalf(1/&roundto) ;
%let nextin = %eval(&digits - 1) ;
MACRO NOTES
%Do countr = 1 %to 5 ;
/* Starts %DO loop */
Use %EVAL function to do integer arithmetic
%let digits = %eval(6 - &countr) ;
Use %SYSEVALF function to do non-integer arithmetic
%let roundto = %sysevalf(1/&roundto) ;
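To see what the loop variables resolve to on each pass, the same %LET statements can be wrapped in a small throwaway macro that writes them to the log. The macro name showround is made up for this illustration and is not part of the matching macro.

```sas
%macro showround ;
  %do countr = 1 %to 5 ;
    %let digits  = %eval(6 - &countr) ;     /* 5, 4, 3, 2, 1 */
    %let roundto = %eval(10**&digits) ;     /* integer power of ten */
    %let roundto = %sysevalf(1/&roundto) ;  /* its reciprocal, a fraction */
    %let nextin  = %eval(&digits - 1) ;     /* 4, 3, 2, 1, 0 */
    %put countr=&countr digits=&digits roundto=&roundto nextin=&nextin ;
  %end ;
%mend showround ;
%showround
```

%SYSEVALF is needed for the reciprocal because %EVAL does integer arithmetic only: %EVAL(1/100000) resolves to 0.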
/* Output control to one data set, intervention to another */
/* Create random number to sort within group */
Create 2 data sets
DATA yes1 (KEEP = &prob id_y depend_y randnum)
     no1 (KEEP = &prob id_n depend_n randnum) ;
SET in&digits ;
We go through this loop 5 times and create data sets of records matching to 5, 4, 3, 2 and 1 decimal places
We only keep four variables
Assignment statements
randnum = RANUNI(0) ;
&prob = ROUND(&prob,&roundto) ;
Create a random number and round the propensity score to a set number of digits
Output to Case Data set …
IF &depend = 1 THEN DO ;
id_y = &id ;
depend_y = &depend ;
OUTPUT yes1 ;
END ;
We need to rename the dependent & id variables or they’ll get overwritten when the case and control data sets are merged
… Or output control data set
ELSE IF &depend = 0 THEN DO ;
id_n = &id ;
depend_n = &depend ;
OUTPUT no1 ;
END ;
Notice the data sets were named no1 and yes1.
The reason becomes evident shortly.
/* Runs through control and experimental and matches
   up to 20 subjects with identical propensity score */
%Do i = 1 %to 20 ;
%let j = %eval(&i +1) ;
proc sort data = yes&i ;
by &prob randnum ;
data yes&i yes&j ;
set yes&i ;
by &prob ;
if first.&prob then output yes&i ;
else output yes&j ;
NOTE: Matching without replacement
Same thing for controls
proc sort data = no&i ;
by &prob randnum ;
data no&i no&j ;
set no&i ;
by &prob ;
if first.&prob then output no&i ;
else output no&j ;
The randnum ensures that subjects with matching scores are pulled at random
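The sort-and-split step can be tried outside the macro on a toy data set. In this sketch the IDs and scores are invented and the variable is called prob rather than &prob. Three subjects share the score 0.31 and one has 0.47.

```sas
/* Toy case data: IDs and scores are made up for illustration */
DATA yes1 ;
  INPUT id_y prob ;
  randnum = RANUNI(0) ;
  DATALINES ;
101 0.31
102 0.31
103 0.31
104 0.47
;
RUN ;

/* Random order within each score */
PROC SORT DATA = yes1 ;
  BY prob randnum ;
RUN ;

/* First subject per score stays in yes1, the rest go on to yes2 */
DATA yes1 yes2 ;
  SET yes1 ;
  BY prob ;
  IF FIRST.prob THEN OUTPUT yes1 ;
  ELSE OUTPUT yes2 ;
RUN ;
```

After this runs, yes1 holds one randomly chosen subject per score (two rows here) and yes2 holds the other two subjects with score 0.31, which is what the next pass of the %DO i loop picks up. Because a subject is written to only one of the two data sets, nobody can be matched twice.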
Merge matches, end loop
DATA match&i ;
MERGE yes&i (in= ina) no&i (in= inb) ;
BY &prob ;
IF ina AND inb ;
run ;
%END ;
/* Adds all matches into a single data set */
DATA allmatches ;
SET
%DO k = 1 %TO 20 ;
match&k
%END ;
; /* ends the SET statement; the semicolon after %END only ends %END */
RUN ;
Concatenate all data sets with matches (N=20)
Create two data sets with IDs
DATA allyes (RENAME = (id_y = &id depend_y = &depend))
     allno (RENAME = (id_n = &id depend_n = &depend)) ;
SET allmatches ;
Create one file of all matched IDs
DATA matchfile ;
SET allyes allno ;
And sort it …
proc sort data = matchfile ;
by &id &depend ;
proc sort data = in&digits ;
by &id &depend ;
/* Creates a data set of all subjects with n-digit match */
/* Creates a second data set of subjects with no match */
DATA MATCHES&DIGITS IN&NEXTIN ;
MERGE IN&DIGITS (IN = INA)
MATCHFILE (IN= INB) ;
BY &ID &DEPEND ;
IF INA AND INB THEN OUTPUT MATCHES&DIGITS ;
ELSE OUTPUT IN&NEXTIN ;
JUST A GOOD HABIT TO CHECK AS THE LOOP RUNS THROUGH
TITLE "MATCHES &ROUNDTO " ;
PROC FREQ DATA = MATCHES&DIGITS ;
TABLES &DEPEND ;
RUN ;
%END ;
End loop. Now match to 4 decimal places, etc
/* Adds 1- to 5-digit matches into a single data set */
data &lib..finalset ;
set
%do m = 1 %to 5 ;
matches&m
%end ;
; /* ends the SET statement; the semicolon after %end only ends %end */
run ;
One final check & done !
Title "Distribution of Dependent Variable in &lib..finalset " ;
proc freq data = &lib..finalset ;
tables &depend ;
run;
%mend propen;
run ;
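Because matching is done without replacement, no subject should appear in the final data set more than once. One way to confirm that after the macro has run is sketched below. It assumes the macro was called as %propen(study,allpropen,id,athome,prob); the work data set names checkids and dupids are made up for this check.

```sas
/* Assumes the call %propen(study,allpropen,id,athome,prob) has been run */
/* NODUPKEY keeps one row per id; DUPOUT= collects any extra rows        */
PROC SORT DATA = study.finalset OUT = checkids NODUPKEY DUPOUT = dupids ;
  BY id ;
RUN ;

TITLE "Subjects appearing more than once in study.finalset" ;
PROC PRINT DATA = dupids ;
RUN ;
```

If dupids is empty, every subject was matched at most once. The PROC FREQ at the end of the macro gives the complementary check, since the two values of the dependent variable should have equal counts.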
Did it work?

                      QUINTILES                     NEAREST NEIGHBOR
Variable      At Home    Not Home   Prob       At Home   Not Home   Prob
Age           79.2       79.3       .60        79.1      79.1       .76
ER visits     4.5 ****   3.8 ****   .0001      4.2       4.2        .88
Female             52%              .36        50%       50%        .74
Race               54%              .97                             .67

** P < .01   **** P < .0001
Model Comparison

Test               Without Matching   Quintile Matching   Nearest Neighbor
Likelihood Ratio   643.1              180.8               186.6
Score              582.4              176.0               181.4
Wald               485.6              165.7               170.4
Odds ratio

                   Odds ratio   Inverse
No Match           .154         6.5 : 1
Quintiles          .281         3.6 : 1
Nearest Neighbor   .269         3.7 : 1
How near?

Decimals   # Matches
5          902
4          14
3          143
2          101
1          38