TYPE=CORR Data Sets in SAS
There are several special types of data sets available in SAS. One of these is the
TYPE=CORR data set. A TYPE=CORR data set may be created from a standard data set by using
PROC CORR. Here is an example using my gradebook from PSYC 6430 (December, 1996):
PROC CORR NOMISS DATA=KLW OUTP=GRADES;
VAR CLASSWRK MIDTERM FINAL_EX;
PROC PRINT DATA=GRADES;
RUN;
(The PROC PRINT output follows:)

OBS  _TYPE_  _NAME_      CLASSWRK   MIDTERM  FINAL_EX
 1   MEAN                 94.5333   87.4000   89.0000
 2   STD                   5.4493    7.7993    9.3121
 3   N                    15.0000   15.0000   15.0000
 4   CORR    CLASSWRK      1.0000    0.6585    0.7615
 5   CORR    MIDTERM       0.6585    1.0000    0.5557
 6   CORR    FINAL_EX      0.7615    0.5557    1.0000
The first line in the resulting TYPE=CORR data set has the automatic variable _TYPE_ =
“MEAN”, and variables classwrk, midterm, and final_ex have values equal to the means of those
variables. The second line has _TYPE_ = “STD” and classwrk, midterm, and final_ex equal to the
standard deviations of those variables. The third has _TYPE_ = “N” and classwrk, midterm, and
final_ex equal to the sample sizes. The next three lines contain the correlation matrix. Line 4 has
_TYPE_ = “CORR”, _NAME_ = “CLASSWRK”, and classwrk, midterm, and final_ex equal to r11, r12,
and r13. Lines 5 and 6 complete the correlation matrix, with _NAME_ changed appropriately for
each “observation.”
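Because every row is tagged by _TYPE_ (and the correlation rows also by _NAME_), you can pull out one part of the TYPE=CORR data set with an ordinary WHERE statement. A minimal sketch, assuming the GRADES data set created by the PROC CORR example above; the _TYPE_ values are the uppercase ones shown in the printout.

```sas
* Print only the correlation-matrix rows of GRADES;
PROC PRINT DATA=GRADES;
  WHERE _TYPE_ = 'CORR';
  VAR _NAME_ CLASSWRK MIDTERM FINAL_EX;
RUN;

* Print only the means and standard deviations;
PROC PRINT DATA=GRADES;
  WHERE _TYPE_ IN ('MEAN', 'STD');
  VAR _TYPE_ CLASSWRK MIDTERM FINAL_EX;
RUN;
```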
By default, SAS assumes that character variables read with list input (including “_NAME_”) have
no more than eight characters. Suppose that I had named “classwrk” with nine characters, that is,
“classwork.” SAS would read only the first eight characters, “classwor,” and later, when I referenced
variable “classwork,” SAS would (cryptically) complain that there is no such variable: SAS had stored
“classwor,” not “classwork.” To avoid this problem, use names with no more than eight characters or
precede your INPUT statement with a LENGTH statement. In the code below, I set the length of
_NAME_ to 11 to accommodate the name “NoTrendSmth.”
Data climate(type=CORR);
* The LENGTH statement must come before INPUT so that _NAME_ can hold 11 characters;
Length _NAME_ $11;
Input _Type_ $ _NAME_ $ NoTrendSmth NAO_Index M_Min T_precip ;
Cards;
N     .             431       431      431      431
Mean  .        -0.10824   0.00174     32.0  3.44093
STD   .         0.16032   0.98801   5.2788  1.54558
CORR  NoTrendSmth     1  -0.11832  0.18098 -0.09132
CORR  NAO_Index -0.11832        1  0.16708  0.01369
CORR  M_Min     0.18098   0.16708        1  0.11090
CORR  T_precip -0.09132   0.01369  0.11090        1
;
run;
Many SAS procedures compute the correlation matrix (or something very close to it) as the
first step in their data analysis. Often this is the most computationally expensive part of the
procedure. If you are working with very large data sets, you can save processing time by inputting
the correlation matrix rather than the raw data. For example, suppose I want first to obtain all the
bivariate correlations for variables X1-X50 and then to run several multiple regressions involving
these variables. I first use PROC CORR to get the correlations from the raw data and to output the
TYPE=CORR data set. I then use the TYPE=CORR data set as the input data set for the multiple
regression analyses, so the correlations are computed from the raw data only once. You can save the
output correlation file as a permanent SAS data set by giving it a two-level name, for example,
“outp=duh.sol”. The libref “duh” must first be defined as a SAS library (a SAS library is a pointer to a
location where SAS files are stored). A saved SAS data set is brought back into SAS with the SET
statement.
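The workflow just described can be sketched as follows. This is a minimal sketch under stated assumptions: the folder C:\MyFolder, the raw data set RAW, and the two MODEL statements are hypothetical placeholders; only the libref “duh” and the data set name “sol” come from the text.

```sas
* Hypothetical folder; point the libref duh at a real location on your system;
LIBNAME duh 'C:\MyFolder';

* Compute the correlations once, with listwise deletion (NOMISS),
  and save them permanently as duh.sol. NOPRINT suppresses the
  very large printed matrix. RAW is a hypothetical raw data set;
PROC CORR NOMISS NOPRINT DATA=RAW OUTP=duh.sol;
  VAR X1-X50;
RUN;

* Reuse the saved matrix for several regressions. In a later
  session, submit the LIBNAME statement again first.
  The MODEL statements are hypothetical examples;
PROC REG DATA=duh.sol;
  MODEL X1 = X2-X10;
  MODEL X1 = X2-X50;
RUN;
QUIT;

* Bring the saved data set back into a DATA step with SET.
  TYPE=CORR is stated explicitly, to be safe, so the copy is
  still treated as a correlation matrix;
DATA solcopy(TYPE=CORR);
  SET duh.sol;
RUN;
```

Running PROC CORR only once and pointing each later PROC at duh.sol is what saves the processing time: the raw data are read a single time, however many models you fit.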
When using PROC CORR to output a TYPE=CORR data set, you will generally need to use the
NOMISS option in PROC CORR, which deletes the data from any subject that is missing data on any
of the variables (listwise deletion). This is highly recommended if you are going to input the correlation
matrix to another PROC such as PROC REG. Otherwise the various correlations in the matrix may
be based on different subsets of the entire data set, which may lead to biased results or worse (for
example, a matrix that is not positive definite, which no single complete data set could produce).
PROC REG will accept the default “pairwise” correlation matrix produced if you do not specify
NOMISS, printing a warning in the log about unequal n’s in the matrix (unless the n’s happen to come
out equal despite some variables having missing data), but the results may be biased.
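Before relying on NOMISS, it may help to see how many subjects each variable is missing, since listwise deletion drops a subject who is missing any one of them. A minimal sketch, assuming the KLW gradebook data set from the first example; N and NMISS are standard PROC MEANS statistic keywords.

```sas
* Count nonmissing (N) and missing (NMISS) values for each variable;
PROC MEANS DATA=KLW N NMISS;
  VAR CLASSWRK MIDTERM FINAL_EX;
RUN;

* After PROC CORR NOMISS, the N row of the output data set
  should show one common sample size for every variable;
PROC PRINT DATA=GRADES;
  WHERE _TYPE_ = 'N';
RUN;
```

If NMISS is large for one variable, listwise deletion may shrink the sample noticeably; that is the cost of getting a matrix in which every correlation rests on the same subjects.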
Creating Your Own Type=Corr Data Set
Sometimes you may have the correlation matrix but not the raw data. For example, you may
find a correlation matrix in an article or a book and want to do some analysis on it. You can type the
correlations, N’s, means, and standard deviations into a Type=Corr data set of your own creation and
then use it in SAS.
If you don’t specify the N’s, SAS will assume 10,000 (at least that was the default the last time
I did not specify N -- the default has changed with versions), and if you don’t specify means and
standard deviations for the variables, SAS will assume mean 0 and standard deviation 1.
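To avoid depending on whatever default N your version of SAS uses, supply an N row even when you have nothing but correlations. A minimal sketch, using the gradebook correlations and N = 15 retyped from the PROC PRINT output above; the data set name GRADES2 is hypothetical. Because no MEAN or STD rows are given, SAS should treat each variable as having mean 0 and standard deviation 1, as described above, so the coefficients should come out in standardized form.

```sas
* N and correlations only: no MEAN or STD rows;
DATA GRADES2(TYPE=CORR);
  INPUT _TYPE_ $ _NAME_ $ CLASSWRK MIDTERM FINAL_EX;
  CARDS;
N    .        15     15     15
CORR CLASSWRK 1.0000 0.6585 0.7615
CORR MIDTERM  0.6585 1.0000 0.5557
CORR FINAL_EX 0.7615 0.5557 1.0000
;
* Predict the final exam from classwork and the midterm;
PROC REG DATA=GRADES2;
  MODEL FINAL_EX = CLASSWRK MIDTERM;
RUN;
QUIT;
```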
Here is a little program that uses the same correlation matrix that was presented in the
handout “Using Matrix Algebra to do Multiple Regression.” Copy it into the SAS editor and submit it.
options pageno=min nodate formdlim='-';
DATA SOL(TYPE=CORR);
LENGTH _NAME_ $ 11;
*Specify the length of the longest variable name, in this case, misanthropy, 11;
INPUT _TYPE_ $ _NAME_ $ idealism relativism misanthropy gender attitude;
CARDS;
corr idealism     1.0000  -0.0870  -0.1395  -0.1011   0.0501
corr relativism  -0.0870   1.0000   0.0525   0.0731   0.1581
corr misanthropy -0.1395   0.0525   1.0000   0.1504   0.2259
corr gender      -0.1011   0.0731   0.1504   1.0000  -0.1158
corr attitude     0.0501   0.1581   0.2259  -0.1158   1.0000
N    .           153      153      153      153      153
mean .           3.64926  3.35810  2.32157  1.18954  2.37276
std  .           0.53439  0.57596  0.67560  0.39323  0.52979
;
*Note the use of a period for missing value for _NAME_ on row of _TYPE_ N, mean, and std.;
PROC REG; MODEL attitude = idealism -- gender / STB SCORR2; run;
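As a cross-check on the STB column, the standardized coefficients can be computed directly from the correlation matrix, in the spirit of the matrix-algebra handout: beta = inverse(Rxx) times rxy, where Rxx holds the predictor intercorrelations and rxy the predictor-criterion correlations. A minimal sketch, assuming SAS/IML is licensed at your site; the correlations are retyped from the SOL data step above, in the order idealism, relativism, misanthropy, gender, attitude.

```sas
proc iml;
  /* Full 5 x 5 correlation matrix; attitude is the last row/column */
  R = { 1.0000 -0.0870 -0.1395 -0.1011  0.0501,
       -0.0870  1.0000  0.0525  0.0731  0.1581,
       -0.1395  0.0525  1.0000  0.1504  0.2259,
       -0.1011  0.0731  0.1504  1.0000 -0.1158,
        0.0501  0.1581  0.2259 -0.1158  1.0000 };
  Rxx  = R[1:4, 1:4];         /* correlations among the four predictors */
  rxy  = R[1:4, 5];           /* correlations of predictors with attitude */
  beta = solve(Rxx, rxy);     /* standardized coefficients, Rxx * beta = rxy */
  R2   = t(rxy) * beta;       /* squared multiple correlation */
  print beta R2;
quit;
```

Using solve() rather than inv() gives the same coefficients without forming the inverse explicitly; the values printed should agree with the STB column and R-Square reported by PROC REG, apart from rounding.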
Copyright 2013, Karl L. Wuensch, All Rights Reserved