TYPE=CORR Data Sets in SAS
There are several special types of data sets available in SAS. One of these is
the TYPE=CORR data set. A TYPE=CORR data set may be created from a standard
data set by using PROC CORR. Here is an example using my gradebook from PSYC
6430 (December, 1996):
PROC CORR NOMISS DATA=KLW OUTP=SOL;
VAR CLASSWRK MIDTERM FINAL_EX;
PROC PRINT;
RUN;
(The PROC PRINT output follows:)
OBS   _TYPE_   _NAME_      CLASSWRK   MIDTERM   FINAL_EX
 1    MEAN                  94.5333   87.4000    89.0000
 2    STD                    5.4493    7.7993     9.3121
 3    N                     15.0000   15.0000    15.0000
 4    CORR     CLASSWRK      1.0000    0.6585     0.7615
 5    CORR     MIDTERM       0.6585    1.0000     0.5557
 6    CORR     FINAL_EX      0.7615    0.5557     1.0000
The first line in the resulting TYPE=CORR data set has the automatic
variable _TYPE_ = “MEAN”, and the variables classwrk, midterm, and final_ex have
values equal to the means of those variables. The second line has _TYPE_ = “STD” and
classwrk, midterm, and final_ex equal to the standard deviations of those variables.
The third has _TYPE_ = “N” and classwrk, midterm, and final_ex equal to the sample sizes.
The next three lines contain the correlation matrix. Line 4 has _TYPE_ = “CORR”,
_NAME_ = “CLASSWRK”, and classwrk, midterm, and final_ex equal to r11, r12, and r13.
Lines 5 and 6 complete the correlation matrix, with appropriate changes in the _NAME_
for each “observation.”
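Because a TYPE=CORR data set is an ordinary SAS data set with the extra variables _TYPE_ and _NAME_, its rows can be selected like any others. Here is a minimal sketch, assuming the SOL data set created by the PROC CORR step above; the data set name CORRONLY is made up for illustration.

```sas
* Keep only the correlation rows of SOL. CORRONLY is a hypothetical name;
DATA CORRONLY(TYPE=CORR);
  SET SOL;
  WHERE _TYPE_ = 'CORR';
RUN;

PROC PRINT DATA=CORRONLY;
RUN;
```

The TYPE=CORR data set option is repeated on the new data set so that it is still marked as a correlation data set rather than as raw data.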
Many SAS procedures compute the correlation matrix (or something very close
to it) as the first step in their data analysis. Often this is the most computationally
expensive part of the procedure. If you are working with very large data sets, you can
save processing time by inputting the correlation matrix rather than the raw data. For
example, suppose I want first to obtain all the bivariate correlations for variables
X1 - X50 and then to do several multiple regressions involving these variables. I first use
PROC CORR to get the correlations from the raw data and to output the TYPE=CORR
data set. I then use the TYPE=CORR data set as the input data set for the multiple
regression analyses. You can save the output correlation file in a SAS system file by
giving it a two-level name, for example, “outp=duh.sol”. Here “duh” would first have to
be defined as a SAS library, which is a pointer to a location where SAS files are
stored. A saved SAS data file is brought back into SAS with the SET statement.
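The workflow just described might look like the sketch below. The folder path, the raw data set name RAW, and the particular regression models are assumptions made for illustration; only the library name “duh”, the data set name “sol”, and the variables X1 - X50 come from the text.

```sas
* Define the library. The path is hypothetical, so point it at your own folder;
LIBNAME duh 'C:\mydata';

* Compute the correlations once and save them permanently;
* RAW is an assumed name for the raw data set;
PROC CORR NOMISS NOPRINT DATA=raw OUTP=duh.sol;
  VAR X1-X50;
RUN;

* Later regressions read the saved correlations, not the raw data;
PROC REG DATA=duh.sol;
  MODEL X1 = X2-X10;
  MODEL X2 = X3 X4 X5;
RUN;
QUIT;
```

NOPRINT simply suppresses the printed correlation matrix, which would be very long for 50 variables.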
When using PROC CORR to output a type=corr data set, you will generally need
to use the NOMISS option in PROC CORR, which results in the deletion of data from
any subject that is missing data on any of the variables. This is highly recommended if
you are going to input the correlation matrix to another PROC such as PROC REG.
Otherwise the various correlations in the matrix may be based on different subsets of
the entire data set, which may lead to biased results or worse. PROC REG will accept
the default “pairwise” correlation matrix produced if you do not specify NOMISS, printing
a warning in the log about unequal n’s in the matrix (unless the n’s happen to come out
equal despite some variables having missing data), but the results may be biased.
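As a sketch of the recommended approach, using the gradebook data set from the first example (the regression model itself is only an illustration, not an analysis from this handout):

```sas
* NOMISS = listwise deletion, so every correlation is based on the same subjects;
PROC CORR NOMISS DATA=KLW OUTP=SOL;
  VAR CLASSWRK MIDTERM FINAL_EX;
RUN;

* Feed the TYPE=CORR data set, rather than the raw data, to PROC REG;
PROC REG DATA=SOL;
  MODEL FINAL_EX = CLASSWRK MIDTERM / STB;
RUN;
QUIT;
```

Dropping NOMISS from the PROC CORR statement gives the pairwise matrix discussed above.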
Creating Your Own Type=Corr Data Set
Sometimes you may have the correlation matrix but not the raw data. For
example, you may find a correlation matrix in an article or a book and want to do some
analysis on it. You can type the correlations, N’s, means, and standard deviations into
a Type=Corr data set of your own creation and then use it in SAS.
If you don’t specify the N’s, SAS will assume 10,000 (at least that was the default
the last time I did not specify N -- the default has changed with versions), and if you
don’t specify means and standard deviations for the variables, SAS will assume mean 0
and standard deviation 1.
Here is a little program that uses the same correlation matrix that was presented
in the handout “Using Matrix Algebra to do Multiple Regression.” Copy it into the SAS
editor and submit it.
options pageno=min nodate formdlim='-';
DATA SOL(TYPE=CORR);
LENGTH _NAME_ $ 11;
*Specify the length of the longest variable name, in this case, misanthropy, 11;
INPUT _TYPE_ $ _NAME_ $ idealism relativism misanthropy gender attitude;
CARDS;
corr idealism     1.0000 -0.0870 -0.1395 -0.1011  0.0501
corr relativism  -0.0870  1.0000  0.0525  0.0731  0.1581
corr misanthropy -0.1395  0.0525  1.0000  0.1504  0.2259
corr gender      -0.1011  0.0731  0.1504  1.0000 -0.1158
corr attitude     0.0501  0.1581  0.2259 -0.1158  1.0000
N    .            153     153     153     153     153
mean .            3.64926 3.35810 2.32157 1.18954 2.37276
std  .            0.53439 0.57596 0.67560 0.39323 0.52979
;
*Note the use of a period for missing value for _NAME_ on row of _TYPE_ N, mean, and std.;
PROC REG; MODEL attitude = idealism -- gender / STB SCORR2; run;
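Before trusting the regression output, it may be worth checking that SAS read the typed-in matrix as intended. A minimal sketch, assuming the SOL data set created by the program above:

```sas
* The PROC CONTENTS listing reports the data set type, which should be CORR;
PROC CONTENTS DATA=SOL;
RUN;

* Print the rows to check for values that landed in the wrong column;
PROC PRINT DATA=SOL;
RUN;
```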
Copyright 2008, Karl L. Wuensch, All Rights Reserved