Chapter 5

Creating SAS Data Sets from
Raw Files and Excel Worksheets
Overview
[Figure: data entry, raw data in external files, and Excel and other types of data are read by the DATA step to create a SAS data set.]
Examine the Raw Data File and File Layout
Partial listing of Sales.dat raw data file:
----+----10---+----20---+----30
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
SMITH   FEB 225983.09  12250.00
DAVIS   FEB  88456.23  55564.00
Each field represents a variable. To read the data set, define a variable
name for each field:
Field Name    Start Column   End Column   Data Type   SAS Variable Name
Last Name     1              7            Character   L_name
Month         9              11           Character   month
Residential   13             21           Numeric     residential
Commercial    23             31           Numeric     commercial
Steps to Create a SAS Data Set from a
Raw Data File
This is accomplished in the DATA Step, which requires
program statements for conducting the tasks:
1. Provide a physical location for the new SAS data
set to be stored.
2. Identify the location and name of the external file
3. Define a name for the new SAS data set
4. Provide a reference to identify the external file
5. Define and describe the variables and data values
to be read
6. Conduct any additional data manipulations for
processing the data
A Summary of SAS Statements for Accomplishing the Required Tasks
Task                                   SAS statement
Reference SAS data library             LIBNAME statement
Reference external file                FILENAME statement
Name SAS data set                      DATA dataset-name;
Identify external file                 INFILE statement
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement
The DATA Step to read external data:
LIBNAME libref '__________________';
FILENAME fileref '__________________';
DATA _______________ ;
INFILE fileref;
INPUT _________ ;
.
.
RUN;
Or:
DATA _______________ ;
INFILE '__________________';
INPUT _________ ;
.
.
RUN;
NOTE: If you copy a program from PowerPoint or Word into SAS,
you MUST retype the quotation marks in SAS. The curly quotation
marks those programs produce are different characters from the
straight quotation marks SAS expects.
Example
Our objective is to read the raw data file salesdata.dat,
stored on the C-drive in the folder with the path:
C:\math707\RawData\RawData_dat
and create a SAS data set Sales_sasdata, then store
this SAS data set in the Sales folder on the C-drive with the path:
C:\math707\Sales
The Sales folder must be created before you run your program.
Two SAS program statements are required in your
SAS program before reading the file:
• A statement to reference the folder for the SAS
data set.
• A statement to reference the external data file.
Reference SAS Data Library
LIBNAME saleslib 'C:\math707\Sales';
This statement defines a SAS data library
saleslib referring to the folder Sales, which
will be used to store the new SAS data set to
be created.
Reference the External Raw Data File
FILENAME sal_dat
'C:\math707\RawData\RawData_dat\salesdata.dat' ;
NOTE: This defines sal_dat as the reference name (fileref) of the
external raw data file, which is located on the hard drive at the
path given above.
NOTE: The rules for an external file reference name are
the same as for a library reference name:
• 1-8 characters, starting with a letter or
underscore, containing only letters, numbers or
underscores.
More on FILENAME statement
• It is a global statement.
• It can reference ONE external raw data file or a folder of
external data files.
NOTE: LIBNAME references a folder of SAS data sets, not ONE
SAS data set.
Syntax to reference ONE external data file:
FILENAME fileref 'path-to-the_external_datafile_Name';
NOTE: The fileref will be used later in the INFILE statement to tell
SAS exactly which external raw data file to read.
Ex: FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat' ;
FILENAME for Referencing a GROUP
of External Data files
Syntax to reference a GROUP of external raw data
files:
FILENAME fileref 'path-to-the_external_datafile_Folder';
Ex:
FILENAME EXT_DAT 'C:\math707\RawData\RawData_dat' ;
An example: Read salesdata.dat
Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat' ;
Name SAS data set                      DATA dataset-name;
Identify external file                 INFILE statement
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement
Define the SAS data set name in DATA
Step to read the external data set
DATA sas_data_setName;
A SAS DATA step reads data files into SAS for further
processing and creates a new SAS data set. The new
SAS data set that holds the data read from the external
raw file needs a name, which is given in the DATA statement.
For the example of reading salesdata.dat, we can call
the new SAS data set: Sales_sasdata.
Ex: DATA SALESLIB.Sales_Sasdata;
This creates a SAS data set sales_sasdata, which is
stored in the SAS library SALESLIB.
An example: Read salesdata.dat
Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat' ;
Name SAS data set                      DATA Saleslib.Sales_sasdata;
Identify external file                 INFILE statement
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement
Identify the External Data Set to be
INPUT into SAS system
In order to read the external raw data set, SAS
will need two statements to accomplish this:
• One is to inform SAS system where to find
the External raw data set. The statement is
INFILE statement.
• One is to read variables in each record
correctly. The Statement is INPUT statement.
INFILE Statement
General syntax:
INFILE file-specification <options>;
The file-specification depends on how the FILENAME
statement defines the fileref.
• If the fileref references exactly ONE
external raw data file, then the file-specification is
the fileref.
Ex:
FILENAME sal_dat
'C:\math707\RawData\RawData_dat\salesdata.dat';
INFILE sal_dat;
INFILE Statement Continued
If the fileref references a folder of external raw
data files (an aggregated group of raw data
files),
then the file-specification must point to the
exact data file using:
INFILE fileref(data-set-name.file_extension);
Example for INFILE statement When
FILEREF references to an aggregated
group of Raw data sets
Ex:
FILENAME EXT_DAT
'C:\math707\RawData\RawData_dat' ;
INFILE ext_dat(salesdata.dat);
The fileref is EXT_DAT, which references the entire
folder of external raw data files. The raw data file
in the folder to be read is salesdata.dat.
An example: Read salesdata.dat
Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat' ;
Name SAS data set                      DATA saleslib.Sales_sasdata;
Identify external file                 INFILE sal_dat;
Describe and read data                 INPUT statement
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
Execute final program step             RUN statement
Describe and Read the Raw External
Data: Fixed Column INPUT using
The INPUT Statement
Now, we have informed SAS where to get the raw data
and where to store the new SAS data.
The next step is to describe the variables and read their data
values from the raw data set. SAS uses the INPUT statement
to accomplish this.
SAS needs to know exactly the formats of variables in
the data set. Different INPUT statements are needed to
handle different types of formats in the data set.
In this chapter, we will focus on the variables with
STANDARD and FIXED format.
Determine Variable Type:
Numeric Vs. Character Data Types
Based on examining the raw data file or the file
layout, every SAS variable can be one of two
types:
• character
• numeric.
Character data Type
A variable is considered to be character if the data values of the
variable contains any combination of the following:
• letters (A - Z, a - z)
• numbers (0-9)
• special characters (!, @, #, %, and so on).
NOTE: characters are case-sensitive. ‘Tom’ is different from
‘tom’ or ‘TOM’.
NOTE: Character data is displayed left-adjusted.
Examples:
Mr. John Doe
126 Apt. A
$34,540
583
Numeric Data Type
A variable is considered to be numeric if it
contains
• numbers (0-9).
It may also contain
• a decimal point (.)
• a minus sign (-)
• a letter E to indicate scientific notation.
NOTE: Numeric data is displayed right-adjusted
Examples:
25.6
543
-5.7
4.12E5
[This is 4.12 × 10^5]
Standard Vs. Nonstandard Numeric Data
Standard numeric data can contain only
• Numbers
• Decimal places
• Numbers in scientific or E-notation (ex, 4.2E3)
• Plus or minus signs
Nonstandard numeric data includes
• Values contain special characters, such as %, $,
comma (,), etc.
• Date and time values
• Data in fractions, integer binary, real binary,
hexadecimal forms, etc.
Determine whether each of the following numeric data values is
standard or nonstandard:
Value        Answer
345.12       Standard
$345.12      Nonstandard
3,456.12     Nonstandard
20DEC2010    Nonstandard date
12/20/2010   Nonstandard date
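As a minimal sketch of how standard numeric values are read, the step below uses column input on in-stream data. The data set name work.std_demo and the variable x are hypothetical, chosen only for illustration. The nonstandard values listed above would need other input techniques and are not attempted here.

```sas
/* Hypothetical demo: read standard numeric values from columns 1-8 */
data work.std_demo;
   input x 1-8;
   put x=;            /* write each value to the SAS log */
   datalines;
345.12
543
-5.7
4.12E5
;
run;
```

The value written in E-notation, 4.12E5, is stored as the number 412000.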
Fixed Format Vs. Free format
Fixed format means a variable occupies in a
fixed range of columns from observation to
observation.
Free format means the data values are not in a
fixed range of columns.
Ex:
Fixed format:
12345678901234567890
--------------------
HIGH    340  12.5  F
LOW    5630   7.5  F
MEDIAN  674 26.73  M
Free format:
12345678901234567890
--------------------
HIGH 340 12.5 F
LOW 5630 7.5 F
MEDIAN 674 26.73 M
Column INPUT
SAS can read a variety of different and complicated
standard and nonstandard data values. This chapter
focuses on reading raw data sets with FIXED columns
and in STANDARD format.
The Column INPUT statement describes the columns
in each observation of the raw data set to SAS.
Each variable defined in the INPUT statement
• provides a name to represent each variable in the
data set
• indicates a type of character or numeric
• indicates the starting column and ending column.
The Column INPUT Statement
General form of the Column INPUT statement:
INPUT variable $ start - end . . . ;
variable   is a valid SAS variable name.
$          indicates a character variable.
start      identifies the starting position.
end        identifies the ending position.
The Column INPUT Statement
There are various ways to read data in the INPUT
statement. The following is 'column input'.
For the Salesdata Example:
input last_name   $ 1-7
      month       $ 9-11
      residential 13-21
      commercial  23-31 ;
An example: Read salesdata.dat
Task                                   Program statement in the DATA step
Reference SAS data library             LIBNAME saleslib 'C:\math707\Sales';
Reference external file                FILENAME sal_dat 'C:\math707\RawData\RawData_dat\salesdata.dat' ;
Name SAS data set                      DATA saleslib.Sales_sasdata;
Identify external file                 INFILE sal_dat;
Describe and read data                 INPUT last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
Manipulate variables and data values   Depends on the objectives. More will be discussed later.
Execute DATA step                      RUN statement
List the data                          PROC PRINT statement
Analyze and report                     Depends on the objectives. Different reporting and data analysis procedures will be needed.
The INPUT Statement
This way of describing the input raw data record
to SAS is called column input because it
defines the starting and ending positions of each
field.
This implies that each field in a raw data record
is in the same position in every record of the file.
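The column input described above can be tried without any external file. The sketch below is only an illustration: it places the five sample sales records in the program itself with DATALINES and uses a hypothetical temporary data set name, work.sales_demo.

```sas
/* Sketch: column input applied to in-stream copies of the sample records */
data work.sales_demo;             /* hypothetical temporary data set */
   input last_name   $ 1-7
         month       $ 9-11
         residential 13-21
         commercial  23-31;
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
SMITH   FEB 225983.09  12250.00
DAVIS   FEB  88456.23  55564.00
;
run;

proc print data=work.sales_demo;
run;
```

Because each numeric field is read from a fixed range of columns, a value such as 98654.32 may sit anywhere inside columns 13-21.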
Data Step to read external raw data
WITH FILENAME Statement
When reading a raw data set, one must inform SAS
system where to read the raw data.
The approach we discussed is to use the following
statements (WITH FILENAME statement):
FILENAME sal_dat
'C:\math707\RawData\RawData_dat\salesdata.dat' ;
DATA saleslib.sales_sasdata;
INFILE sal_dat;
INPUT last_name $ 1-7 month $ 9-11 residential 13-21
commercial 23-31;
RUN;
NOTE: You can refer to the same raw data set in other
DATA steps by using the fileref.
Data Step to read external raw data
WITHOUT FILENAME Statement
To inform SAS where the raw data set is located, we
can omit the FILENAME statement and put
the path directly in the INFILE statement:
DATA saleslib.Sales_sasdata;
INFILE 'C:\math707\RawData\RawData_dat\salesdata.dat' ;
INPUT last_name $ 1-7 month $ 9-11 residential 13-21
commercial 23-31;
RUN;
NOTE: No Fileref is defined in the above statements to read the
salesdata.dat.
The DATA Step
General form for the complete DATA step without
FILENAME statement:
DATA SAS_data_set_name;
INFILE 'path-to-input-raw-data-file';
INPUT variable $ start - end . . .;
RUN;
General form for the complete DATA step with
FILENAME statement (FILENAME is a global statement, so it is
usually placed before the DATA statement):
FILENAME Fileref 'path-to-input-raw-data-file';
DATA SAS_data_set_name;
INFILE Fileref ;
INPUT variable $ start - end . . .;
RUN;
The order of the variables in the
INPUT statement
When column input is used to read a fixed-column data set, the variables
do not need to be listed in the order in which they appear in the raw data set.
For example:
INPUT residential 13-21 commercial 23-31 Last_name $ 1-7
Month $ 9-11;
will read the variable residential from columns 13-21 and commercial
from columns 23-31, then move the column pointer back to column 1 to
read columns 1-7 for Last_name and then columns 9-11 for Month.
The output SAS data set will have the variables in the order
Residential, Commercial, Last_name, Month.
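A short sketch of this behavior, using two in-stream sample records and a hypothetical data set name work.reorder_demo. The VARNUM option of PROC CONTENTS lists the variables in the order they were created, which makes the effect of the INPUT order visible.

```sas
/* Sketch: variables are stored in the order they are named in INPUT */
data work.reorder_demo;           /* hypothetical data set name */
   input residential 13-21 commercial 23-31
         last_name $ 1-7 month $ 9-11;
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
;
run;

proc contents data=work.reorder_demo varnum;  /* list variables in creation order */
run;
```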
Use the option OBS= in the INFILE statement
When the data set is very large, it is not a good idea
to run a draft program on the entire data set.
First make sure there is no syntax error, and reduce
potential data errors, before processing the entire data set.
There are two ways to do this:
(1) Use the SYSTEM OPTIONS introduced in the previous
chapter (options are separated by spaces, not commas):
OPTIONS firstobs=n1 obs=n2;
(2) Use OBS= as an option inside the INFILE statement:
INFILE fileref obs=n;
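A minimal sketch of the OBS= option, assuming in-stream data so that no external file is needed. INFILE DATALINES lets INFILE options be applied to the data lines in the program; the data set name work.first3 is hypothetical.

```sas
/* Sketch: read only the first 3 raw records while testing a program */
data work.first3;                 /* hypothetical data set name */
   infile datalines obs=3;        /* stop reading after the third record */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
SMITH   FEB 225983.09  12250.00
DAVIS   FEB  88456.23  55564.00
;
run;

proc print data=work.first3;
run;
```

With obs=3 the resulting data set holds the first three records only.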
Use _NULL_ as the SAS Data Set Name in Data Step
INFILE fileref OBS=n; keeps SAS from processing the entire data
set until the program is correct. In the same spirit, you can
avoid creating any SAS data set (including a temporary SAS data
set) while testing. This can be done by using:
FILENAME fileref '__________________';
DATA _NULL_ ;
/* data set name _NULL_ means 'Do not create any
SAS data set in this data step' */
INFILE fileref OBS = n ;
INPUT _________ ;
RUN;
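The sketch below combines DATA _NULL_ with OBS= on in-stream data. Since no data set is created, a PUT statement (an addition for illustration, not part of the template above) writes the values to the SAS log so the INPUT statement can be checked.

```sas
/* Sketch: check an INPUT statement without creating a data set */
data _null_;
   infile datalines obs=2;        /* only the first two records are read */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
   put last_name= month= residential= commercial=;  /* show values in the log */
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
;
run;
```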
Assignment statements in DATA Step
An assignment statement modifies, transforms or redefines existing
variables, or creates new variables.
The general syntax is
Variable = expression ;
/*For the Sales data example, the following assignment statement
computes the total sales in the Salesdata: */
Data work.sales;
INFILE '__________________';
INPUT _________ ;
Totalsale = residential + commercial;
/*The following assignment statements compute the average of the
residential and commercial sales. Month cannot be used as the divisor
in this data set because it is a character variable (JAN, FEB, ...): */
AvgSale1 = (residential+commercial)/2;
AvgSale2 = totalsale/2;
/* AvgSale2 statement must appear after the Totalsale statement */
RUN;
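A runnable sketch of assignment statements on two in-stream sales records. The variable res_share and the data set name work.sales_total are hypothetical and serve only to show that a statement using totalsale must follow the statement that creates it.

```sas
/* Sketch: assignment statements are executed in order for each record */
data work.sales_total;            /* hypothetical data set name */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
   totalsale = residential + commercial;
   res_share = residential / totalsale;   /* uses totalsale, so it comes after it */
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
;
run;

proc print data=work.sales_total;
run;
```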
Exercise
The following data admitfix.dat is posted on the class website.
Variable   Type   Start   End   Description
ID         num    1       4     patient ID number
Name       char   5       20    patient name
Sex        char   22      22    sex (F or M)
Age        num    28      29    age in years
Date       num    32      36    day of admission
Height     num    42      43    height in inches
Weight     num    49      51    weight in pounds
ActLevel   char   52      55    activity level (LOW, MOD, HIGH)
Fee        num    59      63    clinic admission fee
Partial listing of the data:
ID     Name             Sex   Age   Date    Height   Weight   ActLevel   Fee
2458   Murray, W        M     27    18251   72       168      HIGH       85.20
2462   Almers, C        F     34    18253   66       172      HIGH       124.80
2501   Bonaventure, T   F     31    18267   61       123      LOW        149.75
Write a SAS program to perform the
following tasks:
• Read the data set admitfix.dat using column
format, and create the SAS data set
admit_sasdata in the Work library
• Compute BMI using the formula:
$$\mathrm{BMI} = \frac{\text{Weight (lb)} \times 703}{(\text{Height (in)})^{2}}$$
• Use PROC CONTENTS to see the variable attributes
• Use PROC PRINT to print the admit_sasdata, and
use the date9. to display the date variable.
Save the program as c5_colInp to your SASEx folder
Solution
filename adm_fix
'C:\math707\RawData\RawData_dat\admitfix.dat';
data admit_sasdata;
infile adm_fix;
input ID 1-4 name $ 5-20 sex $ 22 age 28-29 date 32-36
height 42-43 weight 49-51 actlevel $ 52-55 fee 59-63;
bmi=weight*703/(height**2);
run;
proc contents data=admit_sasdata; run;
proc print data=admit_sasdata;
format date date9.;
run;
Date Constants
As discussed in Chapter 4, SAS stores a date as a numeric value.
SAS defines the date 01/01/1960 as date value 0. A later date is
stored as the number of days after 01/01/1960, and an earlier date
as the negative of the number of days before it.
Here are some examples:
Actual Date   SAS Stored Date Value
01/01/1960    0
01/25/1960    24
12/25/1959    -7
01/01/1961    366
SAS also provides various formats to display dates (as discussed
in Chapter 4, DATE9. and MMDDYY10. are two common date
display formats). In addition, SAS lets you write a specific
date in a program as a date constant:
'ddmmmyy'd , 'ddmmmyyyy'd
or the same forms with double quotation marks.
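The stored values in the table can be checked with a short DATA _NULL_ step. This is a sketch for illustration; the variable names d0 to d3 are hypothetical.

```sas
/* Sketch: date constants are stored as days since 01JAN1960 */
data _null_;
   d0 = '01jan1960'd;
   d1 = '25jan1960'd;
   d2 = '25dec1959'd;
   d3 = "01jan1961"d;              /* double quotation marks also work */
   put d0= d1= d2= d3=;            /* raw numeric values in the log */
   put d1= date9. d3= mmddyy10.;   /* the same values with display formats */
run;
```

The first PUT statement writes d0=0 d1=24 d2=-7 d3=366 to the log, matching the table.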
Date Constant (continued)
Here are some examples:
Actual Date     In terms of SAS date constant
3/25/2007       '25mar2007'd
9/8/2009        '08sep2009'd
Today's date    TODAY()
NOTE: TODAY() is a SAS function that returns today's date.
You can obtain today's date with an assignment statement:
Today_date = today();
The result is a numeric date value for today's date, counted
from 01/01/1960.
To print (display) the date properly, use a date format such as
DATE9. or MMDDYY10. (see Chapter 4).
Time constant, Date-Time constant in SAS
In addition to date constants, SAS also provides
TIME constants for a time of day on any date:
'hour:minutes't            for up to the minute.
'hour:minutes:seconds't    for up to the second.
Example: Duetime = '23:59't ;
DATETIME constants for a time on a SPECIFIC date:
'ddmmmyyyy:hour:minute:second'dt
Example: DueDate = '09sep2009:23:59:59'dt ;
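A brief sketch of how these constants are stored. A time value is the number of seconds since midnight, and a datetime value is the number of seconds since midnight of 01/01/1960. The DATEPART and TIMEPART functions, which split a datetime value into its date and time parts, are not covered in these slides and are used here only for illustration.

```sas
/* Sketch: time and datetime constants are stored as seconds */
data _null_;
   duetime   = '23:59't;                  /* seconds since midnight */
   duedate   = '09sep2009:23:59:59'dt;    /* seconds since 01JAN1960 00:00:00 */
   day_part  = datepart(duedate);         /* SAS date value inside the datetime */
   time_part = timepart(duedate);         /* SAS time value inside the datetime */
   put duetime=;                          /* raw numeric value */
   put duetime= time5. duedate= datetime20.;
   put day_part= date9. time_part= time8.;
run;
```

The raw value of duetime is 86340, that is, 23 × 3600 + 59 × 60 seconds.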
Exercise
Write a program to practice the following:
• Find out today's date using the TODAY() SAS function.
• Define July 4th, 2011 as a date constant.
• Define the begin time and end time for Math 707 using time constants.
Begin time is 17:00:00, end time is 19:45:00.
• Define the first second of the year 2011 using a datetime constant.
• Print the date constants using DATE9., print the time constants using
TIME10., and print the datetime constant using DATETIME25.
NOTE: DATE9. displays dates, TIMEw. displays times, and DATETIMEw.
displays datetimes. w is the width used to display the values. It should
be large enough to show the whole value.
Solution
data datetime;
today_D=today();
d_july4='04jul2011'd;
bg_time_S575 = '17:00:00't;
en_time_s575 = '19:45:00't;
dt_jan_2011 = '01jan2011:00:00:01'dt;
run;
proc print data=datetime;
format today_d d_july4 date9. bg_time_s575
en_time_s575 time10. dt_jan_2011 datetime25.;
run;
Subsetting data cases using conditional
IF statement
In Chapter 4, we used the WHERE statement in a PROC step, such as
PROC PRINT, to select cases.
In this chapter, we introduce the conditional IF
statement for case selection in the DATA step.
Later chapters discuss the difference
between WHERE and IF further.
In the DATA step, we can use the statement:
IF expression;
to select only the cases that satisfy the IF expression.
NOTE: Cases that do not satisfy the IF condition are not
kept in the output SAS data set.
Conditional IF to select observations
General Syntax:
IF condition;
NOTE: When the condition is true, the observation
is selected; otherwise, it is not selected.
For example: IF sex = 'M';
will select only subjects whose sex is 'M'.
NOTE: If the data value is 'm', it is not selected,
since data values are case sensitive.
Example: Using IF to select only the months Jan, Feb and Mar
for the Salesdata
Data work.sale;
INFILE 'C:\math707\RawData\RawData_dat\Salesdata.dat';
input last_name $ 1-7 month $ 9-11
residential 13-21 commercial 23-31;
If Month in ('JAN', 'FEB', 'MAR');
run;
Proc print;
Run;
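The same idea can be tried on in-stream data. In this sketch, which uses a hypothetical data set name work.jan_only, the subsetting IF keeps only the January records of the five sample sales records.

```sas
/* Sketch: a subsetting IF keeps only records that satisfy the condition */
data work.jan_only;               /* hypothetical data set name */
   input last_name $ 1-7 month $ 9-11 residential 13-21 commercial 23-31;
   if month = 'JAN';              /* records with other months are dropped */
   datalines;
SMITH   JAN 140097.98 440356.70
DAVIS   JAN 385873.00 178234.00
JOHNSON JAN  98654.32 339485.00
SMITH   FEB 225983.09  12250.00
DAVIS   FEB  88456.23  55564.00
;
run;

proc print data=work.jan_only;
run;
```

Three of the five records have month JAN, so work.jan_only contains three observations.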
Can we read data from within the SAS
program?
The answer is YES.
Here are the general steps:
DATA …. ;
INPUT ………….;
……….
DATALINES;
(actual data values, entered according to the layout stated in
the INPUT statement)
………
;
RUN;
NOTE: CARDS; also works in place of DATALINES;
Example – data are within the SAS program
The following are the scores for the quizzes, test 1, test 2 and the final of a class.
Name Q1  Q2  Q3  Q4  Q5  T1  T2  Final
----+----1----+----2----+----3----+----4
CSA  17  18  15  19  18  85  92  145
DB    .  16  14  18  16  72  76  120
QC   20  18  19  16  20  92  95  143
DC   18  15   .  15  20  82  79  125
E    20  18  15  15  18  80  82  135
F    16  16  15  15  16  72  75  116
GC   20  16  17  16  17   .  87  139
HD   18  15  15   .  19  85  79  115
IM   17  18  19  20  20  95  92  145
WB   13  16  14  15  16  72  66  110
Write a SAS program to read the data by having the data
included in the SAS program.
/*Program Statements */
DATA scores;
INPUT
Name $ 1-5 Q1 6-7 Q2 10-11 Q3 14-15 Q4 18-19 Q5 22-23
TEST1 25-27 TEST2 29-31 Final 33-36;
/* You can also use DATALINES; in place of CARDS; */
CARDS;
CSA  17  18  15  19  18  85  92  145
DB    .  16  14  18  16  72  76  120
QC   20  18  19  16  20  92  95  143
DC   18  15   .  15  20  82  79  125
E    20  18  15  15  18  80  82  135
F    16  16  15  15  16  72  75  116
GC   20  16  17  16  17   .  87  139
HD   18  15  15   .  19  85  79  115
IM   17  18  19  20  20  95  92  145
WB   13  16  14  15  16  72  66  110
;
RUN;
PROC PRINT;
RUN;
Creating Raw Data File
So far, we have introduced how to READ an external raw data set
(such as a .txt file or .dat file) that has fixed columns for each
variable.
By the same method, we can create a raw data set and save it
to an external location using SAS.
To INPUT an external raw data set, we use the
INFILE and INPUT statements.
To create a raw data set and PUT the data in an external
location, we use the
FILE and PUT statements.
The FILE statement defines the location where the raw data set will
be saved.
The PUT statement defines how the variables will be written.
Syntax for FILE and PUT statements
FILE 'path of the raw data set location in the storage space' ;
PUT var1 start_col - end_col
    var2 start_col - end_col
    ……. ;
When we create a raw data set to be saved as an
external file, we do not need to create any SAS
data set (including temporary data set), therefore,
we can create a _NULL_ Data Step for this
purpose.
The following is an example of creating a raw data set
from the SAS data set admit in the
library mylib, which we created previously.
Read the SAS data set Admit from the library mylib, and
create a raw data set adagegt30.dat for AGE > 30, saving the
variables in the following order and columns:
NAME (1-20), SEX (22), AGE (24-26), Fee (28-35), Weight
(37-40), Height (42-45)
to the RawData_dat folder on the C-drive.
Libname mylib 'C:\math707\SASData';
Data _NULL_;
Set mylib.admit;
IF Age > 30;
FILE 'C:\math707\RawData\RawData_dat\adagegt30.dat';
PUT name 1-20 sex 22 age 24-26 fee 28-35 weight 37-40
height 42-45;
RUN;
Just as we use FILENAME, INFILE and INPUT to read, we
can use the FILENAME, FILE and PUT statements to
create a new external raw data set:
Libname mylib 'C:\math707\SASData';
FILENAME age
'C:\math707\RawData\RawData_dat\adagegt30.dat' ;
Data _NULL_;
Set mylib.admit;
IF Age > 30;
FILE age;
PUT name 1-20 sex 22 age 24-26 fee 28-35 weight 37-40
height 42-45;
RUN;
Describing data using PUT Statement
Unlike the INPUT statement, the PUT statement does NOT need $ to
distinguish character variables from numeric variables. Once SAS
writes the data values to the file, no further processing
requires SAS to know whether a variable is numeric
or character.
NOTE: Usually the FILE statement is given
before the PUT statement. If no FILE
statement is given before the PUT statement, SAS writes the data
values to the SAS LOG. Using LOG as the fileref in the FILE
statement also writes the raw data to the SAS LOG:
FILE LOG;
PUT ………........;
NOTE: Using PRINT as the fileref in the FILE statement:
FILE PRINT;
PUT ……….. ;
writes the raw data lines to the SAS OUTPUT window.
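A small self-contained sketch of FILE LOG with column-style PUT. The data set work.demo and its two records are hypothetical, created in the program only so that the second step has something to write.

```sas
/* Sketch: create a tiny data set, then write it as raw lines to the log */
data work.demo;                   /* hypothetical data set */
   input name $ 1-8 age 10-11;
   datalines;
Murray   27
Almers   34
;
run;

data _null_;
   set work.demo;
   file log;                      /* FILE PRINT would use the output window instead */
   put name 1-8 age 10-11;        /* no $ is needed for the character variable */
run;
```

Replacing LOG with a quoted path or a fileref sends the same lines to an external file.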
Exercise
Write a SAS program to read the admit data in your Mylib library, and
create an external raw data set with only the observations
whose Actlevel = 'HIGH'.
Put the external data in the folder
c:\math707\Rawdata\Rawdata_dat, and call the raw data set
adm_high.dat.
Include the following variables
(Start Col. - End Col.):
Name (1-20), Age (22-24), Height (26-28), Weight (30-33),
Fee (35-45), and Actlevel (47-50)
Solution
data _null_;
set mylib.admit;
file 'c:\math707\Rawdata\Rawdata_dat\adm_high.dat';
if Actlevel = 'HIGH';
put name 1-20 Age 22-24 Height 26-28 Weight
30-33 Fee 35-45 Actlevel 47-50;
*put name age height weight fee actlevel; /*Free
format with space as delimiter */
run;
proc print; run;
Other methods of reading external
raw data
You have learned how to read raw data using column
input.
Column input requires that the data be
in standard format and in fixed
columns.
Much real-world data is not prepared this way.
Various other INPUT techniques will be
discussed in later chapters.
Reading Microsoft Excel Data
SAS can also read data created with other
software. The key is how the data
set is referenced. You can read
Excel data using
• SAS/ACCESS LIBNAME
• IMPORT Wizard
Recall: the SAS LIBNAME statement creates a SAS data library
by defining a LIBREF that references the folder of SAS
data sets.
The SAS/ACCESS LIBNAME statement is similar. Its libref
references an Excel workbook, and each worksheet in the
workbook is treated like a data set in that library.
Steps for Reading Excel Data
The program must provide the following
instructions to SAS:
• A libref pointing to the location of the Excel
workbook to be read.
• The name of the new SAS data set, with a libref pointing
to the location where it will be stored.
• The name of the Excel worksheet to be read.
Tasks and Corresponding SAS Statements to Accomplish the Tasks
Task                          SAS program statement
Reference Excel workbook      LIBNAME libref 'location-of-Excel-workbook' <options>;
New SAS data set to be created   DATA sas-data-set-name;
Read in an Excel worksheet    SET libref.Excel-work-sheet ;
Execute Data Step             RUN;
Define SAS/ACCESS LIBNAME
The SAS/ACCESS LIBNAME statement has the syntax:
LIBNAME libref 'location-of-Excel-workbook' <options>;
Ex:
LIBNAME Exc_lib 'C:\math707\RawData\RawData_XLS\admit.xls' ;
Exc_lib references the Excel workbook admit.xls.
NOTE:
It is possible that an Excel workbook consists of more than one
Excel Worksheet. Each of the Worksheets will be read as a
separate SAS data set.
SAS 9.2 can read both .XLS (Excel 2003) and .XLSX (Excel 2007)
How SAS Defines a Valid Excel Worksheet Name
NOTE: When SAS lists the worksheets of a workbook, every worksheet
name has a special character ($) at the end. Such a name is not a
valid data set name in SAS. To refer to the worksheet, write its name
in quotation marks followed by the letter n (or N), which makes it a
SAS name literal.
Ex: Suppose Exercise is a worksheet in an .xlsx workbook. A worksheet
name that SAS recognizes is
'Exercise$'n
or 'exercise$'N
Suppose Exercise.xlsx is stored in the C:\Exceldata folder. The
following statements read the Exercise worksheet into SAS:
LIBNAME TEST 'C:\exceldata\Exercise.xlsx';
DATA exer_sasdata;
SET TEST.'exercise$'n ;
RUN;
Named Ranges in Excel Worksheet
A named range is a range of cells within a
worksheet that you define in Excel and assign
a name to.
For a named range tests_week_1, the valid name
shown in the SAS Explorer is tests_week_1, not
tests_week_1$.
NOTE: It is a common practice to use PROC
CONTENTS or PROC DATASETS procedures to
view the data set and variable attributes of
the SAS data sets created from Excel data
sets.
LIBNAME Statement Options for Reading an Excel File
LIBNAME libref 'location-of-Excel_workbook' <options>;
A variety of options may be useful when referencing an Excel workbook:
• DBMAX_TEXT=n : indicates that the length of the longest character
string is n, which is between 256 and 32767. The default is n =
1024. (Use PROC CONTENTS to check it.)
• GETNAMES=YES|NO : determines whether SAS will use the first
row in an Excel worksheet as column names. Default = YES.
NOTE: It is common that an Excel sheet does not include variable
names in the first row. If this is the case, use GETNAMES=NO.
• MIXED=YES|NO : whether to import data with both character and
numeric values and convert all data to character. Default = NO,
which reads character as character and numeric as numeric. A
value of the wrong data type is read as missing.
• SCANTEXT=YES|NO : whether to read the entire data column and
use the length of the longest string as the SAS column width.
Default = YES. If it is NO, then the column width is 255.
MORE Options in LIBNAME statement
• SCANTIME=YES|NO : whether to scan all row values in a
date/time column and automatically determine the TIME
format if only time values exist. Default = NO.
If YES is specified, the format will be TIME8.
If NO is specified, the format will be DATE9.
• USEDATE=YES|NO : whether to use the DATE9. format for
date/time values in Excel workbooks. Default = YES.
If NO is specified, the format is DATETIME.
Creating Excel Work-sheets from SAS Data Sets
Besides reading Excel worksheets, SAS can also create Excel
worksheets from SAS data sets.
This is accomplished by defining the new Excel workbook in a
LIBNAME statement and using the LIBREF in the new data
set name in the DATA step:
Ex:
LIBNAME Ex_out 'c:\math707\Exercise_out.xlsx';
DATA Ex_out.High_exer;
Set work.exercise;
IF level='HIGH';
Run;
This creates a new Excel worksheet high_exer from the SAS data set
Exercise in the WORK library and saves it in the new Excel file
Exercise_out.xlsx.
An Example – reading Excel sheet,
Admit.xls from external location
/* Read Excel worksheet */
libname ex_adm
'C:\math707\RawData\RawData_XLS\admit.xls'
mixed=Yes GETNAMES = NO;
data ex_admit;
set ex_adm.'admit$'n ;
proc print; run;
Partial list of the output:
The SAS System                  16:19 Monday, September 20, 2010   1
Obs   F1     F2               F3   F4   F5   F6   F7    F8     F9
1     2458   Murray, W        M    27   1    72   168   HIGH   85.20
2     2462   Almers, C        F    34   3    66   152   HIGH   124.80
3     2501   Bonaventure, T   F    31   17   61   123   LOW    149.75
4     2523   Johnson, R       F    43   31   63   137   MOD    149.75
How to Rename the Variable Names
from Excel Sheet?
If GETNAMES=NO, the default variable names are F1, F2, F3,
…..
Two ways to have the correct variable names:
1. Add variable names to the Excel sheet before reading
it, and use GETNAMES=YES (default).
2. Rename the variables in the SAS program, in the
DATA statement, using the RENAME= option:
For example:
Libname ex_adm 'C:\math707\RawData\RawData_XLS\admit.xls'
getnames=NO MIXED=Yes;
DATA Ex_admit (RENAME = (F1=ID F2=NAME F3=Sex F4=Age
F5=Date F6=Height F7=Weight F8=Actlevel F9=Admit_fee));
set ex_adm.'admit$'n ;
PROC PRINT; RUN;
Exercise: Read EXCEL data
Write a SAS program to
• Read the Excel file diabetes.xls, which has 20 cases, located at
c:\math707\rawdata\rawdata_xls
• Use PROC CONTENTS to see the data and variable attributes
• Use PROC PRINT to see the data set. Observe the data set:
(a) Variable names are F1, F2, …., F8
(b) The number of observations is not 20, but 19.
Solution
libname dia_lib
'C:\math707\RawData\RawData_XLS\diabetes.xls';
data diab;
set dia_lib.'diabetes$'n;
proc contents; run;
proc print; run;
Exercise: revise the program reading
Diabetes.xls data to perform the
following tasks:
• Use GETNAMES=NO so that no variable names are read
from the Excel sheet, and MIXED=YES so that character
and numeric values are read as they are.
• Rename the variables: F1=ID, F2=SEX, F3=AGE,
F4=HEIGHT, F5=WEIGHT, F6=PULSE, F7=FASTGLUC,
F8=POSTGLUC
Answer
libname dia_lib
'C:\math707\RawData\RawData_XLS\diabetes.xls'
getnames=NO mixed = YES;
data diab (rename=(F1=ID F2=SEX F3=AGE F4=HEIGHT
F5=WEIGHT F6=PULSE F7=FASTGLUC F8=POSTGLUC));
set dia_lib.'diabetes$'n;
proc contents; run;
proc print; run;
Exercise: Create Excel file using SAS
Write a SAS program to read diabetes data from
mylib library and perform the following tasks:
•Select only individuals with age >= 50
•Create an excel file, diab_senior.xls, and save it
to c:\math707\rawdata folder.
Solution
libname dia_out
'C:\math707\rawdata\diab_senior1.xlsx';
data dia_out.diab_high;
set mylib.diabetes;
if age >=50;
run;
proc print; run;
NOTE: You cannot open the Excel file
immediately after running the program. To access
the Excel file you just created, you first need to leave the
current program by exiting SAS or by running
another program.
Use IMPORT Wizard to Read Excel
worksheet or other types of Raw Data
In addition to writing SAS program statements, you
can also use the SAS Pull-down menu :
Go to File, Import Data and follow the
instructions to read a variety of data, including:
• dBase file (.dbf)
• Excel 2007 or earlier (.xlsx, .xls, .xlsb, .xlsm)
• Microsoft Access tables (.mdb, .accdb)
• Delimited files ( *.*)
• Comma-separated files (*.csv)
• Text files (.txt, .tab, .asc, .dat)
Use Export Wizard to Write the SAS
Data Set as an External Excel Data Set
One can also use SAS Pull-down menu to write
SAS data sets to an external source:
Go to File, Export, then follow the instructions to
create external data sets from SAS data sets.
Save SAS Code from the Import
and Export Pull-down Menus
When using the SAS pull-down menu, you can also
save the SAS program code that the Import
and Export wizards generate.
Once you save the generated SAS programs,
you can edit them and keep them for
future needs.