Using Proc IML

Statistical Computing
Spring 2014
What is IML?

SAS vs R
  SAS: procedures (PROCs) and datasets
  R: functions/operations and matrices/vectors
Proc IML
  IML = Interactive Matrix Language
  R-like programming inside of SAS
  Pros: more flexible
  Cons: programs are not validated
Applications




Simulate data
Matrix algebra (e.g. contrasts, algorithms)
Many things you could normally only do in R
Graphics
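
As a minimal sketch of the "Simulate data" application, the session below draws a sample from a standard normal distribution and summarizes it. The seed value, the sample size, and the variable names are illustrative assumptions, not part of the original slides.

```sas
proc iml;
call randseed(2014);               /* fix the seed so the draws are reproducible */
x = j(100, 1);                     /* allocate a 100 x 1 vector to hold the sample */
call randgen(x, "Normal", 0, 1);   /* fill x with draws from N(0,1) */
xbar = mean(x);                    /* sample mean */
s    = std(x);                     /* sample standard deviation */
print xbar s;
quit;
```

Allocating the vector first and filling it with one RANDGEN call avoids looping over observations.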
The Matrix

A matrix is a collection of numbers ordered by rows
and columns.


Matrices are characterized by the number of rows and
columns
The elements in a matrix are referred to first by their row
then column
$$X = \begin{bmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{bmatrix}$$
Special Matrices

A 1 x 1 matrix is also known as a scalar
$$X = \begin{bmatrix} x_{11} \end{bmatrix}$$

r x 1 or 1 x c matrices are known as vectors
$$X = \begin{bmatrix} x_{11} & x_{12} \end{bmatrix}, \qquad X = \begin{bmatrix} x_{11} \\ x_{21} \end{bmatrix}$$

A diagonal matrix is a square matrix where the off-diagonal elements are zero
$$X = \begin{bmatrix} x_{11} & 0 \\ 0 & x_{22} \end{bmatrix}$$

An identity matrix is a diagonal matrix where the diagonal
elements are 1. These are also denoted by I_c, where c is the
dimension of the matrix
$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Creating Matrices in IML
PROC IML;
A = 1;             /* CREATE A SCALAR */
B = {1 2 3};       /* CREATE A ROW VECTOR OF LENGTH 3 */
C = {4,
     5,
     6};           /* CREATE A COLUMN VECTOR OF LENGTH 3 */
D = {1 2,
     3 4,
     5 .};         /* CREATE A 3 BY 2 MATRIX WHERE THE 3,2 ELEMENT IS MISSING */
PRINT A B C D;     /* DISPLAY THE MATRICES IN THE OUTPUT */
QUIT;
* You can assign characters instead of numbers, but matrix algebra won't work on character matrices
Manipulating Matrices

Using brackets inside the specification allows you to
request repeats
  A={[2] 'Yes', [2] 'No'} is equivalent to A={'Yes' 'Yes', 'No' 'No'}
  SAS: {[# Repeats] Value}, R: rep(value, number of times)
Select a single element
  A={1 2, 3 4}
  To select the number 3: A2=A[2,1]
Select a row or column
  To select the first row: A3=A[1, ]
  To select the first column: A4=A[ ,1]
Select a submatrix
  B={1 2 0 0, 3 4 0 0}
  To select the A matrix from within B:
  A_new=B[1:2,1:2] or B[,{1 2}]
Manipulating Matrices (cont.)

To define row and column labels, first create a vector with
the labels
  PRINT B[rowname=name label vector]
  Can also use colname, format, and label in this way
  To permanently assign use mattrib matrix rowname= colname=
  This then allows you to index using the matrix attributes (e.g.
  A["True",])
Selecting elements with logical arguments
  Instead of listing the specific elements use a logical argument
  A={1 2 3 4}, B=A[loc(A>2)] yields B={3 4}
Replace elements
  Option 1: reassign specific elements
    A[2]=7 will yield A={1 7 3 4}
  Option 2: reassign by a rule
    A[loc(A>2)]=0 will yield A={1 2 0 0}
Manipulating Matrices in IML
PROC IML;
REPEAT_O1={[2] "YES" [2] "NO"};       /* USING REPETITION FACTORS TO FILL THE MATRIX */
REPEAT_O2={"YES" "YES" "NO" "NO"};    /* REPEATING ELEMENTS MANUALLY */
PRINT REPEAT_O1 REPEAT_O2;
A={1 2,
   3 4};                  /* DEFINE MATRIX */
A1=A[2,1];                /* SELECT THE ELEMENT IN THE 2ND ROW, FIRST COLUMN: A1 SHOULD EQUAL 3 */
A2=A[1,];                 /* SELECT THE FIRST ROW: A2 SHOULD EQUAL THE 1 X 2 VECTOR {1 2} */
A3=A[,1];                 /* SELECT THE FIRST COLUMN: A3 SHOULD EQUAL THE 2 X 1 VECTOR {1,3} */
B={1 2 0 0, 3 4 0 0};     /* DEFINE A MATRIX B WITH TWO SUBMATRICES: A AND A 2 X 2 NULL MATRIX */
A_NEW=B[1:2,1:2];         /* RECOVER THE A MATRIX FROM B */
A_NEW2=B[,{1 2}];         /* RECOVER THE A MATRIX FROM B, ANOTHER WAY TO WRITE IT */
C_ROWNM={M F};            /* SET ROW NAMES FOR MATRIX C */
C_COLNM={TRUE FALSE};     /* SET COL NAMES FOR MATRIX C */
C={10 25,9 18};
/* MODIFYING PRINTED OUTPUT FOR MATRIX C */
PRINT A A1 A2 A3 B A_NEW
      C[ROWNAME=C_ROWNM COLNAME=C_COLNM FORMAT=6.1 LABEL="MY MATRIX"];
C_NEW=C;                  /* CREATING A DUPLICATE MATRIX */
MATTRIB C_NEW ROWNAME=C_ROWNM COLNAME=C_COLNM
        FORMAT=6.1 LABEL="MY MATRIX";   /* PERMANENTLY CHANGING OUTPUT FORMAT */
PRINT C C_NEW;            /* COMPARING DIFFERENT APPROACHES */
D=A[LOC(A>1)];            /* SELECTING ONLY ELEMENTS THAT MEET RULE, NOTE THAT MATRIX STRUCTURE NOT RETAINED */
PRINT A D;
E_TEMP=A;                 /* CREATING A DUPLICATE MATRIX */
E_TEMP[1,1]=25;           /* CHANGING A SINGLE ELEMENT */
PRINT E_TEMP;
E_TEMP[LOC(E_TEMP>5)]=.;  /* SETTING ALL ELEMENTS MEETING RULE TO MISSING */
PRINT E_TEMP;
QUIT;
Creating Special Matrices

Identity Matrix
  I(r): Identity matrix of size r
Dummy Matrix
  j(nrow,ncol,x)
  nrow=number of rows, ncol=number of columns, x=fill value
Diagonal matrix
  diag(vector)
  diag(matrix)
  Note you can also accomplish this by using a Kronecker
  product ( @ ) for multiplying the desired matrix by an identity
  matrix
Creating Special Matrices (cont.)

Block diagonal matrix
  Block(M1, M2, …)
Repeat(matrix,nrow,ncol)
  Repeats the specified matrix for the number of rows and columns
  given
Shape(vector,nrow,ncol)
  Repeats the given vector row-wise for the number of rows and
  number of columns given. Note that the number of cells to repeat
  must be a multiple of the vector length
Generate a sequence
  Do(start,finish,by) creates a vector using the specified skip pattern.
  For example do(-1,0,0.5) would return [-1 -0.5 0].
  In R you can use seq(start,finish,by)
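
The functions listed on the two slides above can be tried together in one short session. This is only a sketch; the matrix names and the values used are arbitrary choices for illustration.

```sas
proc iml;
I3   = I(3);                   /* 3 x 3 identity matrix */
ones = j(2, 3, 1);             /* 2 x 3 matrix filled with 1 */
Dg   = diag({1 2 3});          /* diagonal matrix with 1, 2, 3 on the diagonal */
M    = {1 2, 3 4};
Bk   = block(M, I(2));         /* 4 x 4 block diagonal matrix: M, then I(2) */
Rp   = repeat(M, 1, 2);        /* M repeated side by side: 2 x 4 */
Sh   = shape({1 2 3}, 2, 3);   /* {1 2 3, 1 2 3}: the vector cycled row-wise */
Sq   = do(-1, 0, 0.5);         /* row vector {-1 -0.5 0} */
Kr   = I(2) @ M;               /* Kronecker product: M repeated down the diagonal */
print I3 ones Dg, Bk Rp, Sh Sq Kr;
quit;
```

Here `I(2) @ M` gives the same result as `block(M, M)`, which is one way to read the Kronecker product note.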
Brief Introduction to Matrix Algebra
Matrix Addition and Subtraction


To add or subtract two matrices, they both must
have the same number of rows and columns.
The addition or subtraction is element wise
$$R = A \pm B \;\Rightarrow\; r_{ij} = a_{ij} \pm b_{ij} \quad \forall\, i, j$$

Example:
$$\begin{bmatrix} 1 & 3 \\ 2 & 5 \end{bmatrix} + \begin{bmatrix} -5 & 2 \\ 7 & 0 \end{bmatrix} = \begin{bmatrix} -4 & 5 \\ 9 & 5 \end{bmatrix}$$
Matrix Multiplication and Division

Scalar by matrix multiplication and division are
element-wise operations; scalar multiplication is commutative.
$$R = aB = Ba \;\Rightarrow\; r_{ij} = a\,b_{ij}$$

Multiplication of vectors and matrices
  Not commutative (AB ≠ BA)
  Requires that the number of columns in A equals the
  number of rows in B
  The resulting matrix R will have dimension equal to rows of
  A and columns of B
$$A_{r \times x}\, B_{x \times c} = R_{r \times c}$$
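Multiplication and Division (cont.)
$$R_{i \times j} = A_{i \times x}\, B_{x \times j}, \quad \text{where } r_{ij} = \sum_{h=1}^{x} a_{ih} b_{hj}$$

$$A = \begin{bmatrix} 2 & 3 \\ 4 & 5 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 6 \\ 2 & 0 \end{bmatrix}$$

$$AB = \begin{bmatrix} 2 \cdot 1 + 3 \cdot 2 & 2 \cdot 6 + 3 \cdot 0 \\ 4 \cdot 1 + 5 \cdot 2 & 4 \cdot 6 + 5 \cdot 0 \end{bmatrix} = \begin{bmatrix} 8 & 12 \\ 14 & 24 \end{bmatrix}$$

$$BA = \begin{bmatrix} 26 & 33 \\ 4 & 6 \end{bmatrix}$$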
Special Properties

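Transpose: A' = (a_ji)
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \qquad A' = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$$

Inverse (indicated with a -1 superscript): the inverse of a
number is that number which, when multiplied by the
original number, gives a product of 1. Likewise, a matrix
multiplied by its inverse gives the identity matrix
  Must be a square matrix
$$AA^{-1} = A^{-1}A = I$$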
IML Commands for Special Matrices
Function       IML Code
Transpose      ` (or T(matrix))
Determinant    Det(matrix)
Inverse        Inv(matrix)
Trace          Trace(matrix)
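
A short sketch applying these functions to the 2 x 2 matrix A from the multiplication example. The variable names are illustrative; the built-in IML function for the trace is TRACE.

```sas
proc iml;
A    = {2 3, 4 5};
At   = A`;            /* transpose; T(A) is equivalent */
d    = det(A);        /* 2*5 - 3*4 = -2 */
Ainv = inv(A);        /* exists because det(A) is nonzero */
tr   = trace(A);      /* 2 + 5 = 7 */
chk  = A * Ainv;      /* should be the 2 x 2 identity, up to rounding */
print At d Ainv tr chk;
quit;
```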
Matrix Algebra in IML
Matrix Operators: Arithmetic
Operation                       IML Code
Addition                        +
Subtraction                     -
Division, element wise          /
Multiplication, element wise    #
Multiplication, matrix          *
Power, element wise             ##
Power, matrix                   **
Matrix Algebra in IML
PROC IML;
*MATRIX ADDITION;
A={1 3, 2 5}; /*DEFINE MATRIX*/
B={-5 2, 7 0}; /*DEFINE MATRIX*/
C=A+B; /* ADD A AND B*/
PRINT C;
*MATRIX MULTIPLICATION;
A={2 3,4 5}; /*DEFINE MATRIX*/
B={1 6,2 0}; /*DEFINE MATRIX*/
AB=A*B; /*MULTIPLY A BY B*/
BA=B*A; /* MULTIPLY B BY A*/
PRINT A B AB BA; /* NOTE THAT MULTIPLICATION
IS NOT COMMUTATIVE, AB DOESN'T
EQUAL BA*/
QUIT;
Matrix Operators: Comparison


Element wise comparison of matrices, result is a
matrix of 0 (False) and 1 (True)
Comparisons
  Less than (<), less than or equal to (<=)
  Greater than (>), greater than or equal to (>=)
  Equal to (=), Not equal to (^=)
Can create compound arguments using logical
operators
  And (&)
  Or (|)
  Not (^)
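
To make the 0/1 results concrete, here is a small sketch comparing two 2 x 2 matrices. The matrices and variable names are made up for illustration.

```sas
proc iml;
A = {1 2, 3 4};
B = {1 5, 0 4};
isEq   = (A = B);              /* {1 0, 0 1} */
isGt   = (A > B);              /* {0 0, 1 0} */
both   = (A >= 2) & (A < 4);   /* {0 1, 1 0} */
either = (A < 2) | (A > 3);    /* {1 0, 0 1} */
notEq  = ^(A = B);             /* {0 1, 1 0} */
print isEq isGt both either notEq;
quit;
```

Passing one of these 0/1 matrices to LOC gives the positions of the elements that satisfy the condition, which is how the logical indexing shown earlier works.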
Solving Systems of Equations

Solve the following system of equations
$$\begin{aligned} 3x + 2y - 4z &= 11 \\ 5x - 4y &= 9 \\ 3y + 10z &= 42 \end{aligned}$$

When the problem is rewritten in terms of a matrix
$$\begin{bmatrix} 3 & 2 & -4 \\ 5 & -4 & 0 \\ 0 & 3 & 10 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 11 \\ 9 \\ 42 \end{bmatrix}$$
Solving Systems of Equations (cont)

To solve, we can
rearrange
$$AX = B \;\Rightarrow\; X = A^{-1}B$$

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 3 & 2 & -4 \\ 5 & -4 & 0 \\ 0 & 3 & 10 \end{bmatrix}^{-1} \begin{bmatrix} 11 \\ 9 \\ 42 \end{bmatrix}$$
PROC IML;
A={3 2 -4,
5 -4 0,
0 3 10};
B={11,9,42};
OPT1=SOLVE(A,B);
OPT2=INV(A)*B;
PRINT OPT1 OPT2;
QUIT;
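
As a check on what either option should print, the system can be solved by hand: the solution is x = 5, y = 4, z = 3. Substituting back into the three original equations confirms it.

```latex
\begin{aligned}
3(5) + 2(4) - 4(3) &= 15 + 8 - 12 = 11 \\
5(5) - 4(4)        &= 25 - 16 = 9 \\
3(4) + 10(3)       &= 12 + 30 = 42
\end{aligned}
```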
Working with SAS Datasets
Opening a SAS Dataset

Before you can access a SAS dataset, you must first
submit a command to open it.

To simply read from an existing data set, submit a USE
statement.
  USE <SAS Dataset> VAR <Variable Names> WHERE (expression);

To read and write to an existing data set, use the EDIT
statement.
  In addition to READ you can also REPLACE, DELETE, and PURGE
  observations from a dataset that has been opened using EDIT

Each dataset must only be opened once
Reading in Datasets

Create matrices from a SAS dataset
  Create a vector for each variable
  Create a matrix containing multiple variables
  Select all observations or a subset
To transfer data from a SAS dataset to a matrix
  SETIN
    Specifies an open dataset as the current input dataset
  READ
    Transforms dataset into matrix
    READ <range> VAR operand <WHERE (expression)>
    INTO name;
    READ ALL VAR {VAR1} WHERE (VAR1>80) INTO MYMAT;

Comparison Operators
Operation                                        IML Code
Less than                                        <
Less than or equal to                            <=
Equal to                                         =
Greater than                                     >
Greater than or equal to                         >=
Not equal to                                     ^=
Contains a given string                          ?
Does not contain a given string                  ^?
Begins with a given string                       =:
Sounds like or is spelled like a given string    =*
Sorting SAS Datasets



First close the dataset
SORT dataset out=new_dataset by var_name;
Can use the keyword DESCENDING to denote the
alternative sort order
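
A minimal sketch of the SORT statement using SASHELP.CLASS. The output dataset name Work.ClassSorted is a hypothetical choice; the input dataset is not open in this session, so it can be sorted directly.

```sas
proc iml;
/* sort by Height, tallest first, writing the result to a new dataset */
sort Sashelp.Class out=Work.ClassSorted by descending Height;
use Work.ClassSorted;          /* open the sorted copy for reading */
read all var {Name Height};    /* creates the vectors Name and Height */
close Work.ClassSorted;
print Name Height;
quit;
```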
Creating Datasets from Matrices

When you create a dataset
  Columns become variables
  Rows become observations
CREATE
  Opens a new SAS dataset for I/O
  CREATE SAS-data-set FROM matrix
  <[COLNAME=column-name ROWNAME=row-name]>;
  CREATE SAS-data-set VAR variable-names;
APPEND
  Writes to the dataset
  APPEND FROM matrix-name;
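
Both forms of CREATE can be sketched as follows. The dataset names (Work.FromMat, Work.FromVars) and the variable names ID and SCORE are hypothetical.

```sas
proc iml;
/* Form 1: one matrix, columns become variables named by COLNAME= */
X  = {1 10, 2 20, 3 30};
cn = {"ID" "SCORE"};
create Work.FromMat from X[colname=cn];
append from X;                 /* each row of X becomes an observation */
close Work.FromMat;

/* Form 2: one vector per variable, listed in the VAR clause */
ID    = {1, 2, 3};
SCORE = {10, 20, 30};
create Work.FromVars var {ID SCORE};
append;                        /* writes the current values of ID and SCORE */
close Work.FromVars;
quit;
```

Closing each dataset after APPEND makes it available to later steps, such as a PROC called from a SUBMIT block.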
Data Management Commands
Command          Description
APPEND           Adds observations to the end of a SAS dataset
CLOSE            Closes a SAS dataset
CREATE           Creates and opens a new SAS dataset for input and output
DELETE           Marks observations for deletion in a SAS dataset
EDIT             Opens an existing SAS dataset for I/O
FIND             Finds observations
READ             Reads observations into IML variables
REPLACE          Writes observations back into a SAS dataset
RESET DEFLIB     Names default libname
SETIN            Selects an open SAS dataset for input
SETOUT           Selects an open SAS dataset for output
SHOW CONTENTS    Shows contents of the current input SAS dataset
SHOW DATASETS    Shows SAS datasets currently open
SORT             Sorts a SAS dataset
SUMMARY          Produces summary statistics for numeric variables
USE              Opens an existing SAS dataset for input
Reading in SAS data with IML
*CREATING A SAS DATASET TO WORK WITH;
DATA MYDATA;
   SET SASHELP.CARS;
RUN;
PROC IML;
USE MYDATA VAR {MSRP MPG_CITY MPG_HIGHWAY};            /* OPEN DATASET */
READ ALL VAR _ALL_ WHERE (MSRP<12000) INTO CAR_MAT;    /* READ DATASET */
Z=NROW(CAR_MAT);                                       /* FIGURE OUT HOW MANY ROWS */
PRINT Z CAR_MAT[COLNAME={MSRP CITY HWY}];              /* LOOK AT DATA */
QUIT;
Analyzing Data & Writing Programs
Subscript Operations

Select a single element, row,
column, or submatrix

Reduction operators
  Commands that can be applied
  to obtain summary statistics on
  matrices
  Similar to the APPLY function in R
    Addition +
    Multiplication #
    Mean :
    Sum of Squares ##
    Maximum <>
    Minimum ><
    Index of maximum <:>
    Index of minimum >:<

SUMMARY produces summary
statistics on the numeric
variables of a SAS data set. If you
want them by subgroup use the
CLASS option.
  SUMMARY VAR {variable list}
  <CLASS {by variables}> STAT
  {desired stats} <OPT {SAVE}>;

Additional Operators
  Concatenation: Horizontal ||, Vertical //
  Number of rows: nrow(matrix),
  Number of columns: ncol(matrix)
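
The reduction operators go inside the subscript brackets, in the row position, the column position, or alone. A small sketch on a 2 x 2 matrix, with illustrative variable names:

```sas
proc iml;
A = {1 2, 3 4};
colSum  = A[+, ];     /* reduce over rows: column sums {4 6} */
rowMean = A[, :];     /* reduce over columns: row means {1.5, 3.5} */
total   = A[+];       /* sum of all elements: 10 */
ssq     = A[##];      /* sum of squares: 1 + 4 + 9 + 16 = 30 */
rowMax  = A[, <>];    /* row maxima {2, 4} */
idxMin  = A[>:<];     /* position of the minimum, counting row by row: 1 */
H = A || A;           /* horizontal concatenation: 2 x 4 */
V = A // A;           /* vertical concatenation: 4 x 2 */
dims = nrow(V) || ncol(V);   /* {4 2} */
print colSum rowMean total ssq rowMax idxMin dims;
quit;
```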
Types of Statements

Control Statements
  Direct the flow of execution
  E.g. IF-THEN/ELSE statement
Functions and CALL statements
  Perform special tasks or user-defined operations
Command statements
  Perform special processing such as setting options,
  displaying windows, and handling input and output
Control Statements
Statement             Description
PROC IML; QUIT;       Initiates and ends an IML session
DO; END;              Specifies a group of statements
Iterative DO; END;    Defines an iteration loop
IF-THEN; ELSE;        Conditionally routes execution
START; FINISH;        Defines a module
RUN;                  Executes a module
IF-THEN/ELSE statements

IF expression THEN
statement-one; ELSE
statement-two;

IML processes the
expression and uses this
to decide whether
statement one or
statement two is executed.
You may also nest IF-THEN/ELSE statements

PROC IML;
A={12 22 33};
IF MAX(A)<20
   THEN P=1;
   ELSE P=0;
PRINT P;
QUIT;
DO groups

Several statements can be grouped
together into a compound statement to
be executed as a unit.
  DO; Statements; END;

You can combine DO groups with
IF/ELSE
  IF (X<Y) THEN DO; Z=X+Y; END;
  ELSE DO; Z=X-Y; END;

The iterative DO <WHILE/UNTIL
expression> repeats a set of statements
a number of times defined by the
index.

If DO WHILE is used, the expression is
evaluated at the beginning of each loop,
with iterations continuing until the
expression is false. If the expression begins
false, the loop does not run.

If DO UNTIL is used, the expression is
evaluated at the end of the loop. This
means that the loop will always execute at
least once.
PROC IML;
Y=0;
DO I=1 TO 3;
   Y=Y+1;
   PRINT Y;
END;
QUIT;

PROC IML;
COUNT=1;
DO WHILE(COUNT<3);     /* CHECKED BEFORE EACH PASS: PRINTS TWICE */
   COUNT=COUNT+1;
   PRINT "WHILE";
END;
COUNT=1;
DO UNTIL(COUNT>3);     /* CHECKED AFTER EACH PASS: PRINTS THREE TIMES */
   COUNT=COUNT+1;
   PRINT "UNTIL";
END;
QUIT;
Interacting with Procs




Option One
  Write the data to a SAS data set by using the CREATE and APPEND statements
  Use the SUBMIT statement to call a SAS procedure that analyzes the data
  Read the results of the analysis into IML matrices using USE and READ statements
Option Two
  Do what can only be done in IML
  Write the data back out to a SAS dataset
  Call PROCs normally
ODS TRACE ON; / ODS TRACE OFF;
  Placed before and after a proc, these print the names of the various output objects to the log.
  Useful for requesting/saving specific parts of the analysis.
To use PROCs: SUBMIT; Statements; ENDSUBMIT;
  Like macros, you can list variables already existing in IML that you would like to use in the
  proc. Then inside the submit block refer to these variables using &Varname
  Substitutions take place before the block is processed so no macro variable is created
  If you use SUBMIT *, you indicate a wildcard so that any of the existing variables can be
  referred to
  Any variable inside the submit block that is referenced (&var) but not created in the IML
  procedure does not get substituted. This is used for creating true macros.
Interacting with Procs
PROC IML;
Q={2 5 7 9};
CREATE MYDATA VAR{Q};
APPEND;
CLOSE MYDATA;
*Table="Moments";
SUBMIT;
*SUBMIT table;
PROC UNIVARIATE DATA=MYDATA;
VAR Q;
ODS OUTPUT MOMENTS=MOMENTS;
* ODS OUTPUT MOMENTS=&Table;
RUN;
ENDSUBMIT;
USE MOMENTS;
READ ALL VAR{NVALUE1 LABEL1};
CLOSE MOMENTS;
LABL ="MY OUTPUT";
PRINT NVALUE1[ROWNAME=LABEL1 LABEL=LABL];
QUIT;
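
The parameter substitution described on the previous slide can be sketched with a smaller example. The IML variable names DSName and VarName are hypothetical; their values are pasted into the block wherever &DSName and &VarName appear.

```sas
proc iml;
DSName  = "Sashelp.Class";     /* dataset name to substitute */
VarName = "Height";            /* analysis variable to substitute */
submit DSName VarName;         /* list the IML variables to pass in */
   proc means data=&DSName mean std;
      var &VarName;
   run;
endsubmit;
quit;
```

Because the substitution is plain text replacement done by IML, no macro variables named DSName or VarName exist after the block runs.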
Modules

Modules are used for two purposes
  To create a user-defined subroutine or function.
  To define variables that are local to the module.
  START MODULE-NAME OPTIONS; STATEMENTS; FINISH MODULE-NAME;
To execute the module use
  RUN MODULE-NAME; looks for a user-defined module first, then a built-in subroutine
  CALL MODULE-NAME; looks for a built-in subroutine first, then a user-defined module
A function is a special type of module that only returns a specific value.
  START MODULE; STATEMENTS; RETURN(VARIABLE); FINISH MODULE;
  Any variables created inside the module but not mentioned in the return
  statement will not be retained for future use.
Possible to store and load modules (like a macro library or SOURCE in
R)
  STORE MODULE=MODULE-NAME;
  LOAD MODULE=MODULE-NAME;
  These will retain a program after IML has exited
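
A minimal sketch of the two kinds of module. The module names AddOne and Center are hypothetical; the first is a function that returns a value, the second is a subroutine that hands its result back through its first argument.

```sas
proc iml;
/* function module: returns a value */
start AddOne(x);
   y = x + 1;              /* y is local to the module */
   return(y);
finish AddOne;

/* subroutine module: the result comes back through the argument out */
start Center(out, x);
   out = x - x[:];         /* subtract the mean of x */
finish Center;

a = AddOne({1 2 3});       /* a = {2 3 4} */
run Center(c, {1 2 3});    /* c = {-1 0 1} */
print a c;
quit;
```

After the function call, y does not exist in the main session, which illustrates the point about local variables.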
Creating a Permanent Module Library

Permanent libraries maintain functions for multiple
users. Equivalent to datasets stored in a permanent
library vs. work folder
LIBNAME LIBRARY 'PATH';
PROC IML;
START FUNC1(X); RETURN(X+1); FINISH;
START FUNC2(X); RETURN(X**2); FINISH;
RESET STORAGE=LIBRARY.SOURCEFILE;   /* LIBREF.CATALOG */
STORE MODULE=_ALL_;
QUIT;
Command Statements
Statement    Description
FREE         Frees memory associated with a matrix
LOAD         Loads a matrix or module from a storage library
MATTRIB      Associates printing attributes with matrices
PRINT        Prints a matrix or message
RESET        Sets various system options
REMOVE       Removes a matrix or module from library storage
SHOW         Displays system information
STORE        Stores a matrix or module in the storage library
Using R
Calling R from within IML



Check to see if R has permission for your SAS
  PROC OPTIONS OPTION=RLANG;
  If not, you will have to add the -RLANG option to startup
Similar to calling procs
  SUBMIT / R; ENDSUBMIT;
Export
  ExportDataSetToR: SAS dataset -> R data frame
  ExportMatrixToR: IML matrix -> R matrix
Import
  ImportDataSetFromR: R expression -> SAS dataset
  ImportMatrixFromR: R expression -> IML matrix
R objects tend to be complex, so you can only
transfer something that has been coerced to a data
frame
SAS to R and back again
proc iml;
/* Comparison of matrix operations in IML and R */
print "---------- SAS/IML Results -----------------";
x = 1:3; /* vector of sequence 1,2,3 */
m = {1 2 3, 4 5 6, 7 8 9}; /* 3 x 3 matrix */
q = m * t(x); /* matrix multiplication */
print q;
print "------------- R Results --------------------";
submit / R;
rx <- matrix( 1:3, nrow=1) # vector of sequence 1,2,3
rm <- matrix( 1:9, nrow=3, byrow=TRUE) # 3 x 3 matrix
rq <- rm %*% t(rx) # matrix multiplication
print(rq)
endsubmit;
quit;

proc iml;
use Sashelp.Class;
read all var {Weight Height};
close Sashelp.Class;
/* send matrices to R */
call ExportMatrixToR(Weight, "w");
call ExportMatrixToR(Height, "h");
submit / R;
Model <- lm(w ~ h, na.action="na.exclude") # a
ParamEst <- coef(Model) # b
Pred <- fitted(Model)
Resid <- residuals(Model)
endsubmit;
/* bring the parameter estimates back and predict in IML */
call ImportMatrixFromR(pe, "ParamEst");
print pe[r={"Intercept" "Height"}];
ht = T( do(55, 70, 5) );
A = j(nrow(ht),1,1) || ht;
pred_wt = A * pe;
print ht pred_wt;
/* fragment: p and est must already exist in the R session */
submit / R;
hist(p, freq=FALSE) # histogram
lines(est) # kde overlay
endsubmit;
/* pass IML variables into the R block; the data frame Class must exist in R */
call ExportDataSetToR("Sashelp.Class", "Class");
YVar = "Weight";
XVar = "Height";
submit XVar YVar / R;
Model <- lm(&YVar ~ &XVar, data=Class, na.action="na.exclude")
print (Model$call)
endsubmit;
quit;