TYPE=CORR Data Sets in SAS

There are several special types of data sets available in SAS. One of these is the TYPE=CORR data set. A TYPE=CORR data set may be created from a standard data set by using PROC CORR. Here is an example using my gradebook from PSYC 6430 (December, 1996):

PROC CORR NOMISS DATA=KLW OUTP=SOL;
   VAR CLASSWRK MIDTERM FINAL_EX;
PROC PRINT;
RUN;

The PROC PRINT output follows:

OBS   _TYPE_   _NAME_     CLASSWRK   MIDTERM   FINAL_EX
 1    MEAN                 94.5333   87.4000    89.0000
 2    STD                   5.4493    7.7993     9.3121
 3    N                    15.0000   15.0000    15.0000
 4    CORR     CLASSWRK     1.0000    0.6585     0.7615
 5    CORR     MIDTERM      0.6585    1.0000     0.5557
 6    CORR     FINAL_EX     0.7615    0.5557     1.0000

The first line in the resulting TYPE=CORR data set has the special variable _TYPE_ = “MEAN”, and the variables CLASSWRK, MIDTERM, and FINAL_EX hold the means of those variables. The second line has _TYPE_ = “STD”, with CLASSWRK, MIDTERM, and FINAL_EX equal to the standard deviations of those variables. The third has _TYPE_ = “N”, with CLASSWRK, MIDTERM, and FINAL_EX equal to the sample sizes. The next three lines contain the correlation matrix. Line 4 has _TYPE_ = “CORR”, _NAME_ = “CLASSWRK”, and CLASSWRK, MIDTERM, and FINAL_EX equal to r11, r12, and r13. Lines 5 and 6 complete the correlation matrix, with the appropriate change in _NAME_ for each “observation.”

Many SAS procedures compute the correlation matrix (or something very close to it) as the first step in their data analysis. Often this is the most computationally expensive part of the procedure. If you are working with very large data sets, you can save processing time by inputting the correlation matrix rather than the raw data, so that the matrix is computed only once. For example, suppose I want first to obtain all the bivariate correlations for variables X1-X50 and then to do several multiple regressions involving these variables. I first use PROC CORR to get the correlations from the raw data and to output the TYPE=CORR data set.
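A minimal, runnable sketch of this correlate-once, analyze-later workflow follows. It substitutes the SASHELP.CLASS sample data set that ships with SAS (numeric variables Age, Height, and Weight) for the hypothetical X1-X50 data; the output data set name CLASSCORR and the two models are illustrative choices, not part of the original example.

```sas
/* Step 1: compute the correlations once from the raw data.        */
/* NOMISS keeps only observations with no missing values on the    */
/* VAR variables (SASHELP.CLASS has none, so it changes nothing).  */
/* NOPRINT suppresses the printed output; OUTP= is still created.  */
PROC CORR DATA=sashelp.class NOMISS NOPRINT OUTP=classcorr;
   VAR Age Height Weight;
RUN;

/* Inspect the TYPE=CORR data set: MEAN, STD, and N rows,          */
/* followed by one CORR row per variable                            */
PROC PRINT DATA=classcorr;
RUN;

/* Step 2: fit regressions from the correlation data set alone;    */
/* PROC REG reads CLASSCORR as a TYPE=CORR data set, not raw data  */
PROC REG DATA=classcorr;
   MODEL Weight = Height;
   MODEL Weight = Height Age;
RUN;
QUIT;
```

Any later procedure run can read CLASSCORR instead of making another pass through the raw data, which is where the savings come from with very large files.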
I then use the TYPE=CORR data set as the input data set for the multiple regression analyses.

You can save the output correlation data set as a permanent SAS data set by giving it a two-level name, for example, “outp=duh.sol”. “duh” would first have to be defined as a SAS library (with a LIBNAME statement); a SAS library is a pointer to a location where SAS files are stored. A saved SAS data set is brought back into SAS with the SET statement.

When using PROC CORR to output a TYPE=CORR data set, you will generally need to use the NOMISS option in PROC CORR, which drops from the analysis any subject that is missing data on any of the variables. This is highly recommended if you are going to input the correlation matrix to another PROC such as PROC REG. Otherwise the various correlations in the matrix may be based on different subsets of the entire data set, which may lead to biased results or worse, for example a matrix that no single data set could have produced (one that is not positive semidefinite). PROC REG will accept the default “pairwise” correlation matrix produced when you do not specify NOMISS, printing a warning in the log about unequal n’s in the matrix (unless the n’s happen to come out equal despite some variables having missing data), but the results may be biased.

Copyright 2008, Karl L. Wuensch, All Rights Reserved

Creating Your Own Type=Corr Data Set

Sometimes you may have the correlation matrix but not the raw data. For example, you may find a correlation matrix in an article or a book and want to do some analysis on it. You can type the correlations, N’s, means, and standard deviations into a Type=Corr data set of your own creation and then use it in SAS. If you don’t specify the N’s, SAS will assume 10,000 (at least that was the default the last time I did not specify N; the default has changed across versions), and if you don’t specify means and standard deviations for the variables, SAS will assume mean 0 and standard deviation 1.
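The standard least-squares relations below are offered as background, not as a description of PROC REG’s internal code. They show why each kind of row matters: the correlations alone determine the standardized coefficients and R², the standard deviations and means convert those into raw-score coefficients and an intercept, and N enters the error degrees of freedom used for standard errors and significance tests. Here R_xx is the predictor intercorrelation matrix, r_xy holds the predictor-criterion correlations, p is the number of predictors, s denotes a standard deviation, and a bar denotes a mean.

```latex
\boldsymbol{\beta} = \mathbf{R}_{xx}^{-1}\,\mathbf{r}_{xy},
\qquad
R^{2} = \mathbf{r}_{xy}^{\top}\boldsymbol{\beta}

b_j = \beta_j \,\frac{s_y}{s_j},
\qquad
b_0 = \bar{y} - \sum_{j=1}^{p} b_j \,\bar{x}_j

df_{\text{error}} = N - p - 1
```

With the default means of 0 and standard deviations of 1, these relations give b_j = β_j and b_0 = 0, so the “unstandardized” output simply repeats the standardized solution.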
Here is a little program that uses the same correlation matrix that was presented in the handout “Using Matrix Algebra to do Multiple Regression.” Copy it into the SAS editor and submit it.

options pageno=min nodate formdlim='-';
DATA SOL(TYPE=CORR);
   LENGTH _NAME_ $ 11; *Specify the length of the longest variable name, in this case, misanthropy, 11;
   INPUT _TYPE_ $ _NAME_ $ idealism relativism misanthropy gender attitude;
CARDS;
corr idealism 1.0000 -0.0870 -0.1395 -0.1011 0.0501
corr relativism -0.0870 1.0000 0.0525 0.0731 0.1581
corr misanthropy -0.1395 0.0525 1.0000 0.1504 0.2259
corr gender -0.1011 0.0731 0.1504 1.0000 -0.1158
corr attitude 0.0501 0.1581 0.2259 -0.1158 1.0000
N . 153 153 153 153 153
mean . 3.64926 3.35810 2.32157 1.18954 2.37276
std . 0.53439 0.57596 0.67560 0.39323 0.52979
;
*Note the use of a period for the missing value of _NAME_ on the rows with _TYPE_ N, mean, and std.;
PROC REG DATA=SOL;
   MODEL attitude = idealism -- gender / STB SCORR2;
run;
quit;
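Once the program above has created SOL, a permanent copy can be kept with a two-level name, as described earlier for OUTP=. The sketch below assumes a hypothetical libref DUH pointing to a hypothetical folder C:\sasdata that already exists; edit the path for your own system. TYPE=CORR is specified again on each DATA statement so that the copy is still treated as a correlation matrix.

```sas
/* Hypothetical libref and folder: the folder must already exist */
LIBNAME duh 'C:\sasdata';

/* Save the TYPE=CORR data set permanently as DUH.SOL */
DATA duh.sol(TYPE=CORR);
   SET sol;
RUN;

/* In a later session, after re-running the LIBNAME statement, */
/* bring the data set back in with SET                          */
DATA sol2(TYPE=CORR);
   SET duh.sol;
RUN;

/* A procedure can also read the permanent data set directly */
PROC REG DATA=duh.sol;
   MODEL attitude = idealism relativism / STB;
RUN;
QUIT;
```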