Lecture 3 Getting Your Data into the SAS system

Getting Your Data Into SAS
(Chapter 2 in the Little SAS Book)
Animal Science 500
Lecture No. 3
September 7, 2010
IOWA STATE UNIVERSITY
Department of Animal Science
Arithmetic Operators
Arithmetic operators indicate that an arithmetic calculation is performed, as shown in the following table:
Symbol   Operation                        Example   Result
+        addition                         5 + 3     add two numbers together
-        subtraction                      5 - 3     subtract 3 from 5 (or use two variables: ending wt. - beginning wt.)
*        multiplication (table note 1)    2*y       multiply 2 by the value of Y
/        division                         var/5     divide the value of VAR by 5 (or use variables: weight gain / days on test)
**       exponentiation                   a**2      raise A to the second power
Table note 1: The * is always required; you cannot write 2(y) or 2y.
Note: ** is the exponentiation operator. In the DATA step, ^ is a NOT symbol (as in ^= for "not equal to"), so a^2 is not a valid substitute for a**2.
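A minimal sketch of these operators in a DATA step. It reads two made-up pigs from DATALINES instead of an external file so that it runs on its own; the variable names ONWT, OFFWT, and DAYS are hypothetical.

```sas
data gain;
  input id $ onwt offwt days;       /* $ marks ID as a character variable */
  adg     = (offwt - onwt) / days;  /* subtraction, then division */
  total   = onwt + offwt;           /* addition */
  twodays = 2*days;                 /* the * is required: 2days is invalid */
  sq      = days**2;                /* exponentiation uses ** */
  datalines;
P1 50 250 80
P2 55 235 90
;
run;

proc print data=gain;
run;
```

For pig P1 the average daily gain is (250 - 50) / 80 = 2.5.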
Comparison Operators
Comparison operators set up a comparison, operation, or calculation with two variables, constants, or expressions within the data set being used.
If the comparison is true, the result is 1.
If the comparison is false, the result is 0.
Comparison operators can be expressed as symbols or with their mnemonic equivalents, which are shown in the following table:
Comparison Operators
Symbol   Mnemonic Equivalent   Definition                 Example
=        EQ                    equal to                   a=3
^=       NE                    not equal to               a ne 3
¬=       NE                    not equal to
~=       NE                    not equal to
>        GT                    greater than               num>5
<        LT                    less than                  num<8
>=       GE                    greater than or equal to   sales>=300
<=       LE                    less than or equal to      sales<=100
         IN                    equal to one of a list     num in (3, 4, 5)
Note: ^=, ¬= and ~= are interchangeable ways to write NE; which of these symbols you can type depends on your keyboard and operating system.
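A short sketch showing that each comparison evaluates to 1 (true) or 0 (false) and can be stored in a variable. The data values and the variable names IS3, NOT3, INLIST, and BIG are made up for illustration.

```sas
data compare;
  input a num sales;
  is3    = (a = 3);              /* inside parentheses, = compares: 1 or 0 */
  not3   = (a ne 3);             /* mnemonic form of ^= */
  inlist = (num in (3, 4, 5));   /* IN: equal to one of a list */
  big    = (sales >= 300);
  datalines;
3 4 350
5 8 100
;
run;
```

The first observation gets 1, 0, 1, 1 and the second gets 0, 1, 0, 0.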
Logical (Boolean) Operators and Expressions
Logical operators, also called Boolean operators, are usually used in expressions to link sequences of comparisons.
Symbol   Mnemonic Equivalent   Example
&        AND                   (a>b & c>d)
|        OR                    (a>b or c>d)
!        OR
¦        OR
¬        NOT                   not(a>b)
^        NOT
~        NOT
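A sketch of AND, OR, and NOT linking two comparisons, using made-up values for the hypothetical variables A, B, C, and D.

```sas
data logic;
  input a b c d;
  both    = (a > b & c > d);       /* AND: both comparisons true */
  either  = (a > b or c > d);      /* OR: at least one comparison true */
  neither = not (a > b or c > d);  /* NOT reverses the result */
  datalines;
5 2 9 1
5 2 1 9
1 2 1 9
;
run;
```

Only the first line makes BOTH equal 1; only the third line makes NEITHER equal 1.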
Finding your data
Most of the time your "raw" data files will be saved as external files:
1. Text files (for example, saved as plain text from Word, WordPerfect, Writer, etc.)
2. Spreadsheets (Excel, Lotus, Quattro Pro, etc.)
3. Files from other systems (Unix, OpenVMS, etc.)
Reading external files into SAS
The files containing your stored data will typically be stored:
1. On the hard drive of the computer that you will ultimately use to analyze the data with SAS
2. Externally, for example on a USB memory stick (flash memory) or an external hard drive
You must get your data from "storage" into SAS to conduct the analyses.
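Reading external files into SAS
Use the INFILE statement within a DATA step:
Data mytrial;
Infile 'c:\mydocument\trial.csv';
Input statement: list the names of the variables to read.
Remember to put a $ after the name of each character variable.
You may have to tell SAS in which columns the individual variables are found and where to place the decimal point.
INFILE reads raw data stored as text (for example .txt or .csv files). An Excel workbook (.xls) is not a text file: save it as a .csv or tab-delimited text file first, or bring it in with the Import Wizard.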
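A sketch of column input showing the $ for character variables, the column ranges, and decimal placement. To stay self-contained it reads DATALINES; in a real program the INFILE statement points INPUT at your file instead. The column positions and the variable names ID, BREED, and WEIGHT are hypothetical. The .1 after a column range tells SAS to insert a decimal point one digit from the right when the raw value has none.

```sas
data mytrial;
  /* name, optional $, start-end columns, optional .decimals */
  input id $ 1-4 breed $ 6-13 weight 15-18 .1;
  datalines;
A101 Duroc    2455
A102 Landrace 2380
;
run;
```

The raw value 2455 is stored as 245.5. If a raw value already contains a decimal point, that point is used instead.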
Reading external files into SAS
Data mytrial;
Infile 'c:\mydocument\trial.csv' DLM=',';
There are many options to assist you when using the INFILE statement.
DLM=
Used to specify the delimiter that separates the variables in your raw data file. For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma-separated, .csv, file).
Or, dlm='09'x indicates that tabs are used to separate your variables (e.g., a tab-separated file).
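A minimal sketch of DLM= on comma-separated values. It uses INFILE DATALINES so the data sit inside the program; the variable names and values are made up.

```sas
data mytrial;
  infile datalines dlm=',';   /* commas separate the values */
  input id $ weight adg;
  datalines;
A101,245.5,1.8
A102,238.0,2.1
;
run;
```

For a tab-separated file on disk, the same INPUT statement would follow something like infile 'c:\mydocument\trial.txt' dlm='09'x; (the path is hypothetical).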
Reading external files into SAS
Other options
DSD
The DSD option has two functions.
First, it treats two consecutive delimiters as a missing value.
For example, if your file contained the line 20,30,,50, SAS would read it as 20 30 50 without the DSD option, but with the DSD option SAS reads it as 20 30 . 50, which is probably what you intended.
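A sketch comparing the line 20,30,,50 read with and without DSD. The first step also uses MISSOVER (another INFILE option in this lecture) so that SAS does not look for the fourth value on a new line; the variable names W, X, Y, and Z are hypothetical.

```sas
/* Without DSD: the two commas count as one delimiter */
data nodsd;
  infile datalines dlm=',' missover;
  input w x y z;
  datalines;
20,30,,50
;
run;

/* With DSD: the empty field between the commas becomes missing */
data withdsd;
  infile datalines dsd;
  input w x y z;
  datalines;
20,30,,50
;
run;
```

In NODSD, Y is 50 and Z is missing; in WITHDSD, Y is missing and Z is 50.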
Reading external files into SAS
Other options
Second, the DSD option allows the delimiter to appear within quoted strings. For example, you would want to use the DSD option if you had a comma-separated file and your data included values like "George Bush, Jr.". With the DSD option, SAS recognizes that the comma in "George Bush, Jr." is part of the name and not a separator indicating a new variable. SAS also removes the enclosing quotation marks.
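A sketch of DSD keeping a quoted comma inside one value. The LENGTH statement gives NAME room for the full text; the variable names and values are illustrative.

```sas
data people;
  infile datalines dsd;
  length name $ 20;      /* list input would otherwise default to 8 characters */
  input name $ id;
  datalines;
"George Bush, Jr.",1
"Smith, Ann",2
;
run;
```

NAME for the first observation is George Bush, Jr. (without the quotation marks).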
Reading external files into SAS
Other options
FIRSTOBS=
Tells SAS on which line you want it to start reading your raw data file (default = 1).
If the first record(s) contain header information such as variable names, set firstobs=n, where n is the record number where the data actually begin.
Example: assume you are reading a comma-separated or tab-separated file with the variable names on the first line.
Use firstobs=2 to tell SAS to begin reading at the second line (this skips the first line, which holds the variable names).
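A sketch of FIRSTOBS=2 skipping a header line of variable names. DATALINES stands in for an external .csv file; the names and values are made up.

```sas
data withhead;
  infile datalines dsd firstobs=2;   /* line 1 holds the variable names */
  input id $ weight;
  datalines;
id,weight
A101,245.5
A102,238.0
;
run;
```

The data set gets two observations; the header line is never read as data.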
Reading external files into SAS
Other options
MISSOVER
This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data.
For example, you may be reading a space-delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values.
Without the MISSOVER option, SAS will look for the 10th value on the next line of data.
With MISSOVER, SAS sets the variables that have no values to missing when it reads a short line.
Reading external files into SAS
Other options
MISSOVER
If each line of raw data is supposed to hold exactly one observation, reading ahead onto the next line can cause errors throughout the rest of your data set. For a raw data file that has one record per line, MISSOVER is a prudent way to keep such errors from cascading through the rest of your data file.
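A sketch of the problem MISSOVER prevents: the second line has only two of three values. The hypothetical variables A, B, and C are read once with the default behavior and once with MISSOVER.

```sas
/* Default: the short line borrows C from the next line */
data flowed;
  infile datalines;
  input a b c;
  datalines;
1 2 3
4 5
7 8 9
;
run;

/* MISSOVER: the short line gets a missing value for C instead */
data short;
  infile datalines missover;
  input a b c;
  datalines;
1 2 3
4 5
7 8 9
;
run;
```

FLOWED ends up with only two observations, the second being 4 5 7, and the values 8 and 9 are lost. SHORT keeps all three lines, with C missing for the second.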
Reading external files into SAS
Other options
OBS=
Indicates which line in your raw data file should be treated as the last record to be read by SAS.
This is a good option to use for testing your program. For example, you might use obs=100 to read in just the first 100 lines of data while you are testing your program.
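A sketch of OBS= limiting a test run to the first two lines of (made-up) data.

```sas
data firsttwo;
  infile datalines obs=2;   /* stop after line 2 while testing */
  input a;
  datalines;
10
20
30
;
run;
```

Because OBS= is a line number rather than a count, firstobs=2 obs=3 would read lines 2 and 3.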
Reading external files into SAS
Other options
A typical INFILE statement for reading a comma-delimited file that contains the variable names in the first line would be:
INFILE "test.txt" DLM=',' DSD MISSOVER FIRSTOBS=2;
Reading external files into SAS
Other options
LRECL= (logical record length)
LRECL is especially useful for Windows users.
By default, SAS for Windows (in the release used in this course) reads external files with a logical record length of 256 characters.
If your data lines are longer than that, it may appear that SAS is not reading all of your data, or that beyond some point variables are not being read.
Reading external files into SAS
Other options
LRECL= (logical record length)
You can tell SAS exactly how long the record length is on the FILENAME statement. The option is LRECL= and it looks like this:
filename myFile "c:\some directory\some file.txt" LRECL=400;
LRECL= can also be placed on the INFILE statement.
This option is REQUIRED if the length of a data line is over 256 characters.
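A self-contained sketch of LRECL= that needs no existing file: it writes one 300-character line to a temporary file (FILENAME ... TEMP) and reads back columns 291-300. Here LRECL= is given on the FILE and INFILE statements; the FILENAME form above is an alternative. The fileref LONGREC and the variable LASTTEN are hypothetical. Newer SAS releases may use a larger default record length, in which case this also works without LRECL=.

```sas
filename longrec temp;   /* temporary file, deleted when SAS ends */

/* Write one 300-character record: 290 x's followed by 1234567890 */
data _null_;
  file longrec lrecl=400;
  length line $ 300;
  line = repeat('x', 289);               /* REPEAT returns 289+1 copies */
  substr(line, 291, 10) = '1234567890';  /* fill columns 291-300 */
  put line $char300.;
run;

/* Read columns 291-300; LRECL=400 keeps them from being cut off */
data longread;
  infile longrec lrecl=400;
  input @291 lastten $10.;
run;
```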
Knowing what Options are Available
Obviously, you can look up options in the SAS online help, in SAS manuals and books, or in other example programs.
You can also list the SAS system options and their current settings with PROC OPTIONS:
proc options;
run;
This writes the system options in effect at this point in your SAS session to the log. It lists system options (such as DATE and LINESIZE), not the options of individual statements such as INFILE.
Informats
A host of selected informats is listed on pages 46-47 of The Little SAS Book, 4th Edition, covering:
1. Different ways data can be formatted and read in SAS
2. Dates, times, and combined datetime values
3. Reading Julian dates
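A sketch of two date informats. MMDDYY10. reads a date such as 09/07/2010, and JULIAN7. reads a Julian date written as a four-digit year followed by the three-digit day of the year. The variable names BIRTH and JDATE are hypothetical. Both values below are September 7, 2010 (day 250 of 2010), stored as SAS date values and displayed with the DATE9. format.

```sas
data dates;
  /* +1 skips the blank in column 11 before the Julian date */
  input birth mmddyy10. +1 jdate julian7.;
  format birth jdate date9.;   /* display as 07SEP2010 */
  datalines;
09/07/2010 2010250
;
run;
```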
Titles and Footnotes
SAS allows up to 10 lines of text at the top (titles) and at the bottom (footnotes) of each page of output, using the TITLE and FOOTNOTE statements:
Title<n> 'text';
Footnote<n> 'text';
where n is the line number, from 1 to 10 (TITLE is the same as TITLE1).
If the text is omitted, that title or footnote (and any higher-numbered ones) is deleted.
Otherwise it remains in effect until it is redefined.
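A sketch of numbered titles and footnotes, using the SASHELP.CLASS sample data set that ships with SAS. The title and footnote text is illustrative.

```sas
title1 'Animal Science 500';
title2 'Lecture 3 example';
footnote1 'Iowa State University';

proc print data=sashelp.class(obs=3);
run;

title2;     /* deletes TITLE2 (and any higher titles); TITLE1 stays */
footnote;   /* deletes all footnotes */
```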
Titles and Footnotes
To have no titles at all, include the statement title;
By default, SAS prints the date and page number at the top of each page of output.
To remove them, put NODATE and/or NONUMBER on an OPTIONS statement: options nodate nonumber;
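A sketch of output with no titles, no date, and no page numbers, again using SASHELP.CLASS as sample data.

```sas
options nodate nonumber;   /* drop the date and page number */
title;                     /* remove all titles, including the default one */

proc print data=sashelp.class(obs=3);
run;
```

The settings stay in effect for the rest of the session unless you reset them, for example with options date number;.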
Temporary versus Permanent SAS Data Sets
Temporary SAS data set
Only exists during the current job or session.
It is erased by SAS when you finish and close down SAS.
Permanent SAS data set
Does not mean it is around forever.
It remains stored even after you close your SAS session.
If you use a data set more than once, it is more efficient to save it as a permanent SAS data set.
Temporary versus Permanent SAS Data Sets
Using a permanent SAS data set allows you to skip the step of reading in the raw data, whether you use the Import Wizard or an INFILE statement.
If you are going to modify your data set, it is likely easier to use a temporary SAS data set, for example when you:
1. Need to add more data to the "final" data set
2. Have not yet checked the "final" data set for errors
3. Have other reasons to keep changing it
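A sketch of the difference, assuming a hypothetical folder c:\mysasdata that already exists. A LIBNAME statement assigns the libref MYLIB to that folder; a two-level name (MYLIB.TRIAL) stores the data set there permanently, while a one-level name (TRIAL) goes to the temporary WORK library.

```sas
/* Hypothetical folder; it must already exist on your computer */
libname mylib 'c:\mysasdata';

/* Two-level name: saved permanently in c:\mysasdata */
data mylib.trial;
  input id $ weight;
  datalines;
A101 245.5
A102 238.0
;
run;

/* One-level name: temporary WORK data set, erased when SAS closes */
data trial;
  set mylib.trial;
run;
```

In a later session you would rerun only the LIBNAME statement and then use mylib.trial directly, with no INFILE step.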
Listing the Contents of a SAS Data Set
PROC CONTENTS
Use: proc contents data=yourdatasetname; run;
If you leave off DATA=, SAS runs PROC CONTENTS on the last data set created.
It is a good way to check whether all of your data are being read into SAS correctly before further analyses.
Listing the Contents of a SAS Data Set
Output from PROC CONTENTS:
1. Data Set Name: be sure you evaluated the correct data set
2. Observations: did the correct number of observations get read in
3. Variables: was the correct number of variables identified
4. Created: date the data set was created
5. Label: a data set label, if you provided one
Listing the Contents of a SAS Data Set
Output from PROC CONTENTS also lists the variables in alphabetical order, with the following for each variable:
1. Type: numeric or character
2. Length: storage size (in bytes)
3. Format for printing, if any (for example, a date may have been assigned the WORDDATE. format)
4. Informat for input, if any (for example, MMDDYY10. for a date)
5. Variable label, if any (e.g., date of birth, height in inches, weight in pounds)
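A sketch of a small data set whose PROC CONTENTS listing shows a character and two numeric variables, a stored format and informat, and labels. The variable names and values are made up.

```sas
data trial;
  informat birth mmddyy10.;          /* informat stored with BIRTH */
  format birth worddate.;            /* format used for printing */
  label birth  = 'Date of birth'
        weight = 'Weight in pounds';
  input id $ weight birth;           /* list input uses BIRTH's informat */
  datalines;
A101 245.5 03/15/2010
;
run;

proc contents data=trial;
run;
```

The listing should report 1 observation and 3 variables, with ID as character and WEIGHT and BIRTH as numeric.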
Processing an Existing Data Set
When you want to process an existing SAS data set, use the SET statement rather than an INFILE statement.
Each time SAS executes a SET statement, it reads one observation, containing all of its variables, from the existing data set.
Processing an Existing Data Set
Data data1;
  Set data2;
  /* average daily gain; SAS variable names cannot contain spaces */
  ADG = (offweight - onweight) / daysontest;
Run;
Again, if you do not specify a data set, SAS uses the most recently created data set.