SAS--Proc Import and Export

UNC-Wilmington
Department of Economics and Finance
ECN 377
Dr. Chris Dumas
SAS – Proc Import and Proc Export
Importing and Exporting Data Files in SAS (Proc Import and Proc Export)
Several commonly used data file formats can be imported into and exported from SAS, including Microsoft Excel
spreadsheet files (.xls files), comma-delimited files (.csv files), space- or tab-delimited text files (.prn or .txt),
Microsoft Access database files (.mdb files), and SAS data files (.sas7bdat files). (SAS can also
import and export STATA and SPSS format files.) In this handout, we’ll consider Microsoft Excel data files.
"Import" means: bring data that are located in a file outside of SAS into SAS's memory.
"Export" means: send data that are in SAS's memory to a file located outside of SAS.
The SAS procedures Proc Import and Proc Export can be used to import and export Microsoft Excel spreadsheet
data files. In addition, Proc Contents can be used to obtain summary information about the variables in a dataset
that has just been imported into SAS. We’ll review Proc Contents in the next handout.
Proc Import--Importing Data from Excel into SAS
First, from inside Excel, convert the Excel data file to the "Excel 97-2003 Workbook" format. You can do
this by opening the Excel data file in Excel and then using "Save As" to save the file in the "Excel 97-2003
Workbook" format.
When the Excel file is in the "Excel 97-2003 Workbook" format, it will have a ".xls" filename extension (not
".xlsx" or ".xlsm" or ".xlsb" or any other filename extension).
IMPORTANT: Before you attempt to import an Excel file into SAS, you need to close the file in
Excel. SAS will not import an Excel file that is open in Excel. SAS will give you an error message if
you attempt to import the Excel file into SAS while the Excel file is open in Excel.
For example, suppose you want to import a (hypothetical) Excel data file called "mydata.xls" into SAS, and
suppose the file is located on the V: drive in a folder named ECN422. You could use the following Proc Import
command in SAS:
proc import datafile="v:\ECN422\mydata.xls" dbms=xls out=dataset01 replace;
run;
In the “proc import” command above, the datafile="v:\ECN422\mydata.xls" tells SAS the location of the data
file on your computer.
The "dbms" means "database management system" and tells SAS what kind of file you are trying to import; in
this case, an Excel file, which is designated an "xls" file in SAS, so we put "dbms=xls".
The "out=dataset01" tells SAS what name to give to the new data set when it is stored in SAS's memory. We
are calling the new data set "dataset01", although we could call it whatever we like. The original data file
mydata.xls remains unchanged on the v: drive. A copy of dataset mydata.xls has been created and stored in
SAS’s memory as the dataset named dataset01.
The "replace" tells SAS to replace any data set named "dataset01" that it might have in its memory with the new
"dataset01" that is being created as we import the data from mydata.xls. Using "replace" is not necessary, but it's
a good thing to do to ensure that we are creating a new, clean data set.
[Figure: Proc Import copies the dataset “mydata.xls” from the V: drive of your computer into SAS’s memory as the dataset “dataset01”.]
By default, Proc Import will look for variable names in the first row of the data set and use them if they are
present. If variable names are not present in the first row of the data set, Proc Import will assign names VAR1,
VAR2, VAR3, etc., to the variables as it reads them.
Proc Import scans the first 20 rows of data and assigns a variable type, either numeric or character, to each
variable based on the data values in those first 20 rows. Check to make sure that the first 20 values of each
variable are not zero (or unrepresentative in some other way) before using Proc Import.
If a numeric data value is missing, Proc Import will assign a period "." as the missing value; a missing character
value is stored as a blank.
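After importing, it can help to check that the variable names and variable types came through as expected. The commands below are a minimal sketch, assuming the data were imported into a data set named dataset01 as in the example above. Proc Contents lists each variable with its type (numeric or character), and Proc Print with the "obs=10" option shows only the first 10 rows.

```sas
/* List the variables in dataset01 and their types (numeric or character) */
proc contents data=dataset01;
run;

/* Print only the first 10 rows to inspect the imported values */
proc print data=dataset01 (obs=10);
run;
```

If a variable that should be numeric shows up as character in the Proc Contents output, look for unrepresentative values in the first rows of that column in the original file.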
Proc Export--Exporting Data from SAS to Excel
Now suppose that after working for a while with data in a data set named "dataset02" inside SAS's memory, we
are ready to save the data to an Excel file. We can name the Excel file whatever we like, so let's call it
"important_data.xls". Also, suppose we want to put this file in folder ECN422 on the V: drive. We could use the
following Proc Export command to accomplish this:
proc export data=dataset02 outfile="v:\ECN422\important_data.xls" dbms=EXCEL5
replace;
run;
The "data=dataset02" tells SAS which data set in its memory you would like to export (sometimes SAS may be
holding more than one data set in its memory).
The "outfile=" tells SAS which drive, folder and filename you want to use when you export the data.
The "dbms" and "replace" command words do the same things they did in the Proc Import command; however,
we need to use dbms=EXCEL5 in Proc Export (whereas we used dbms=xls in Proc Import).
In this course, we will work with data files in Excel format; however, SAS commands for importing and exporting
other types of data files (non-Excel files) are provided below.
Comma-Delimited ".csv" Data Files
Importing Data
For comma-delimited files, use dbms=csv :
proc import datafile="v:\ECN422\mydata.csv"
dbms=csv out=dataset01 replace;
run;
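For delimited files such as .csv files, two optional statements can be placed between the Proc Import command line and "run;" to control how names and types are assigned. The sketch below reuses the hypothetical file mydata.csv from above; the value 500 is only an illustration, not a recommendation.

```sas
proc import datafile="v:\ECN422\mydata.csv"
    dbms=csv out=dataset01 replace;
    getnames=yes;      /* read variable names from the first row (the default) */
    guessingrows=500;  /* scan the first 500 rows, instead of the default 20, to decide variable types */
run;
```

If the file has no row of variable names, "getnames=no;" can be used instead, and the variables are then named VAR1, VAR2, VAR3, etc.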
Exporting Data
proc export data=dataset02 outfile="v:\ECN422\important_data.csv" dbms=csv
replace;
run;
Space or Tab-Delimited ".txt", ".prn", ".dat" Data Files
Importing Data
For space-delimited data files, use dbms=dlm :
proc import datafile="v:\ECN422\mydata.txt"
dbms=dlm out=dataset01 replace;
run;
For tab-delimited data files, use dbms=tab :
proc import datafile="v:\ECN422\mydata.txt"
dbms=tab out=dataset01 replace;
run;
Exporting Data
For space-delimited data files, use dbms=dlm :
proc export data=dataset02 outfile="v:\ECN422\important_data.txt" dbms=dlm
replace;
run;
or, for tab-delimited data files,
proc export data=dataset02 outfile="v:\ECN422\important_data.txt" dbms=tab
replace;
run;
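With dbms=dlm, SAS assumes a space delimiter unless told otherwise. If a text file uses some other character to separate values, a "delimiter=" statement can be added. The sketch below assumes a hypothetical file in which values are separated by the vertical bar character "|"; the file names are the same illustrative names used above.

```sas
/* Import a text file whose values are separated by "|" */
proc import datafile="v:\ECN422\mydata.txt"
    dbms=dlm out=dataset01 replace;
    delimiter='|';
run;

/* Export a data set to a text file using the same delimiter */
proc export data=dataset02 outfile="v:\ECN422\important_data.txt"
    dbms=dlm replace;
    delimiter='|';
run;
```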
Microsoft Access Database Files
Importing and exporting Microsoft Access data files require a little modification of the basic Proc Import
and Proc Export commands.
Importing Data
proc import datatable='table name' dbms=access out=dataset01 replace;
database="v:\ECN422\mydata.mdb";
run;
Notice that when importing Access data files, we need to specify the datatable within the Access data file.
Also, instead of using a "datafile=..." statement inside the Proc Import command line, we need to use a
"database=..." command outside the Proc Import command line. The "dbms=access" statement works for
Access 2000 and later formats. If the data files are in the older Access 97 format, then use "dbms=access97"
instead of "dbms=access".
Exporting Data
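proc export data=dataset02 outtable='table name' dbms=access replace;
database="v:\ECN422\important_data.mdb";
run;
As with importing, we name the table inside the Proc Export command line (here with "outtable=...") and give the
location of the Access file in a separate "database=..." command. If the data files are in the older Access 97
format, then use "dbms=access97" instead of "dbms=access".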
SAS Database Files
Importing SAS database files does not use the Proc Import command. Instead, the Data command and a
Set command are used to import data from a SAS database file. The following commands read a SAS
database file named CountyRev2 located in the ECN422 folder on the V: drive into SAS's memory and
name it dataset01. In the ECN422 folder on the V: drive, the SAS database file actually has the name
"CountyRev2.sas7bdat", but you do not need to type the .sas7bdat file name extension, as SAS
automatically adds the ".sas7bdat" file name extension as it reads the file. (Aside: The ".sas7bdat" refers
to SAS version 7; SAS 9.2 uses the same data file format as SAS 7.0, so they kept the filename extension
the same.)
Importing Data
data dataset01;
set 'v:\ecn422\CountyRev2';
run;
Exporting Data
The Proc Export command is not used to export SAS database files. Instead, the Data command and a Set
command are used to export data to a SAS database file. The following commands create a SAS database
file named CountyRev3 in the ECN422 folder on the V: drive and Set the data from dataset01 into the
CountyRev3 file. In the ECN422 folder on the V: drive, the SAS database file will have the name
"CountyRev3.sas7bdat". SAS automatically adds the ".sas7bdat" file name extension as it creates the file.
data 'v:\ECN422\CountyRev3';
set dataset01;
run;
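An alternative to typing the full file path in quotes is to give the folder a short nickname with a Libname command and then refer to files in that folder as nickname.filename. The sketch below assumes the same ECN422 folder and file names as above; the nickname "ecn" is a hypothetical choice (any valid name of up to 8 characters would do).

```sas
/* Give the folder v:\ECN422 the nickname "ecn" */
libname ecn 'v:\ECN422';

/* Read CountyRev2.sas7bdat from the folder into SAS's memory as dataset01 */
data dataset01;
    set ecn.CountyRev2;
run;

/* Save dataset01 back to the folder as CountyRev3.sas7bdat */
data ecn.CountyRev3;
    set dataset01;
run;
```

This approach is convenient when several SAS database files in the same folder are read or written in one program, since the folder path is typed only once.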