SAS syntax, entering your own data, and Input Statements

BMTRY 789 Lecture 2
SAS Syntax, entering raw data, etc.
Readings – Chapters 1, 2, 12, & 13
Lab Problems 1.1, 1.2, 1.3, 1.5, 1.10, 12.1,
12.2, 12.6, 12.16, 13.3, 13.8
Homework Due – None
Homework for Next Week – No Class but turn
in HW1!
Lecturer: Annie N. Simpson, MSc.
Summer 2009
BMTRY 789 Intro. To SAS Programming
Parts of a SAS Program

What are the two main parts of a SAS
program?
Parts of a SAS Program

What is a SAS STATEMENT?
DATA Step

What takes place in a DATA step?
DATA Step = Do/Create Things

What takes place in a DATA step?







Input Data (what types?)
DO-END loops
IF-THEN-ELSE statements
Subset data: IF expression / IF expression THEN DELETE
Create and redefine variables
Functions
Interleave, merge, and update
PROC Step

What takes place in a PROC step?
PROC Step = Produce Results

What takes place in a PROC step?






Perform specific analysis or function
Sorting
Printing
Univariate Analysis
Analysis of variance
Regression…
PROC Step

What PROCs have you learned about in
your readings so far?
PROC Step


What PROC would you use to produce
Simple Descriptive Statistics?
What about to produce a stem-and-leaf
plot, boxplot, histogram, QQPlot, etc?
PROC Step broken down into subgroups


How do you get the Proc Means output
separately for men and women if you have
a GENDER variable?
What descriptive stats can you do on the
non-numeric data? What Proc would you
use?
PROC Step for Graphics?

What PROCs can you use to produce
graphs and charts?
PROC Step for Graphics?

What is the difference between Proc Plot
and GPlot? Proc Chart and Gchart?
DATA…How do we work with it?

What type of data is this?
Data EX1;
INPUT Group$ X Y Z;
DATALINES;
Control 12 17 19
Treat 23 . 29
Control 19 18 16
Treat 22 22 .
;
Run;
SAS INPUT & INFILE Statements
In what 2 situations do you use an INPUT
statement?
1. ________
2. ________
When is the only time that you use an INFILE
statement?
What is the INPUT statement really
accomplishing? (i.e. why does SAS need it)
SAS INPUT Statement





Before you can analyze your data with SAS software, your
data must be in a form that SAS can read
If you put raw data directly in your SAS program, then
your data are internal
You may want to do this when you have small amounts of
data, or you are testing a program with a small test data
set
INPUT is used to read data from an external source or
from internal data contained in your SAS program
The INFILE statement names an external file from which
to read the data; otherwise the CARDS (or DATALINES)
statement is used to precede the internal data
External raw data files


Usually you will want to keep your data in
external files, separating the data from the
program.
Use the INFILE statement to tell SAS the
filename and path (directory) of the external file
containing the data. The INFILE statement
follows the DATA statement and must precede
the INPUT statement. After the INFILE
keyword, the file path and name are enclosed in
single quotes.
INPUT statement example
*Reading from an external file into a SAS data set;
Data one;
INFILE 'c:\MyData\diabetes.dat';
Input a$ b c;
Run;

*Reading internal data to create SAS data set 'one';
Data one;
Input a$ b c;
cards;
8 76 5
7 43 9
1 22 2
;
Run;
*Note - SAS log



Whenever you read data from an external file,
SAS gives some very valuable information about
the file in the SAS log
Always check this information after you read a
file as it could indicate problems
A simple comparison of the number of records
read from the INFILE with the number of
observations in the SAS data set can tell you a
lot about whether or not SAS is reading your
data correctly
*Note – Long Records
In some operating environments, SAS assumes
external files have a record length of 256 or
less. (The record length is the number of
characters, including spaces, on a data line.)
If your data lines are long, and it looks like SAS
is not reading all your data, then use the
LRECL= option in the INFILE statement to
specify a record length at least as long as the
longest record in your data file.
INFILE 'c:\MyData\Diabetes.dat' LRECL=2000;

Controlling INPUT with Options in
the INFILE statement


The following options are useful for reading particular
types of data files. Place these options after the filename
in the INFILE statement.
FIRSTOBS=


This tells SAS at what line to begin reading data. This is useful if
you have a data file that contains descriptive text or header
information at the beginning and you want to skip over these lines
to begin reading the data.
OBS=

This tells SAS to stop reading when it gets to that line in the raw
data file.
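A minimal sketch of both options, using in-stream data so that it runs without an external file. INFILE DATALINES applies INFILE options to the lines that follow DATALINES; the data set name, variable names, and values are made up for illustration.

```sas
/* Hypothetical data: two header lines, then four data lines */
DATA scores;
    INFILE DATALINES FIRSTOBS=3 OBS=5;  /* start at line 3, stop after line 5 */
    INPUT ID Score;
    DATALINES;
Quiz results
ID Score
1 88
2 92
3 75
4 60
;
RUN;

PROC PRINT DATA=scores;
RUN;
```

With these settings SAS should skip the two header lines and stop after line 5, so only IDs 1 through 3 end up in the data set. Note that OBS= is the number of the last line to read, not a count of observations.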
Controlling INPUT with Options in
the INFILE statement (cont.)

MISSOVER


By default, SAS will go to the next data line to read
more data if SAS has reached the end of the data line
and there are still more variables in the INPUT
statement that have not been assigned values.
The MISSOVER option tells SAS that if it runs out of
data, don’t go to the next data line. Instead, assign
missing values to any remaining variables before
proceeding to the next line.
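As an illustration, here is a small sketch with one short data line. The data set name, variable names, and values are hypothetical, and INFILE DATALINES is used only so the example is self-contained.

```sas
/* Hypothetical data: the second line has only one of three test scores */
DATA tests;
    INFILE DATALINES MISSOVER;   /* do not jump to the next line for missing fields */
    INPUT Name $ Test1 Test2 Test3;
    DATALINES;
Ann 90 85 88
Bob 72
Cy 65 70 80
;
RUN;

PROC PRINT DATA=tests;
RUN;
```

With MISSOVER, Bob should get missing values for Test2 and Test3, and Cy is read as a separate observation. If you remove MISSOVER, SAS goes to the next line looking for Bob's remaining scores, and the log note about SAS going to a new line is the clue that something is wrong.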
Controlling INPUT with Options in
the INFILE statement (cont.)

PAD


You need this option when you are reading data using
column or formatted input and some data lines are
shorter than others. If a variable’s field extends past the
end of the data line, then, by default, SAS will go to the
next line to start reading the variable’s value.
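This option pads each short data line with blanks out to the
record length, so SAS reads the variable from the current line
(assigning a missing value if the field is blank) instead of
going to the next line. The TRUNCOVER option addresses the same
problem by reading until the end of the data line, or the last
column specified in the format or column range, whichever
comes first.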
Data Step: input statement
There are three basic forms of the input statement:
1. List input (free form) – data fields must be separated by
at least one blank. List the names of the variables; follow
the name with $ for character data
Example: Input Name $ Age;
2. Column input – follow the variable name (and $ for
character) with startingcolumn-endingcolumn
Example: Input Name $ 1-15;
3. Formatted input – Optionally precede the variable
name with @startingcolumn; follow the variable name
with a SAS informat
Example: Input @1 Name $20. @21 DOB mmddyy8.;
LIST INPUT:
Reading Raw Data Separated by Spaces



If the values in your raw data file are all separated by at least
one space, then using list input to read the data may be
appropriate
Any missing data must be indicated with a period
Character data, if present, must be simple: no embedded
spaces, and no values greater than eight characters in length.
(Use a LENGTH statement before the INPUT statement to change the length)
LENGTH Name $ 20;
If the data file contains dates or other values which need
special treatment, then list input may not be appropriate
INPUT Name$ Age Height;
The $ after Name indicates that it is a character variable,
whereas the Age and Height variables are both numeric
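Putting these pieces together, a small sketch of list input might look like the following. The data set name and the data values are invented for illustration.

```sas
/* Hypothetical data: names longer than 8 characters and two missing values */
DATA people;
    LENGTH Name $ 20;            /* must come before INPUT to take effect */
    INPUT Name $ Age Height;
    DATALINES;
Christopher 34 70.5
Li . 64
Bartholomew 51 .
;
RUN;

PROC PRINT DATA=people;
RUN;
```

Without the LENGTH statement, Name would get the default length of 8 and the longer names would be truncated. The periods stand in for the missing Age and Height values.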

COLUMN INPUT:
Reading Raw Data Separated by Columns
If each of the variable's values is always found in the same
place in the data line, then you can use column input as long
as all values are character or standard numeric
Standard numeric data contain only numbers, decimal points,
plus and minus signs, and E for scientific notation. Dates or
numbers with embedded commas, for example, are not
standard
INPUT Name$ 1-10 Age 11-13 Height 14-18;
The first variable, Name, is character and the data values are
in columns 1 through 10. The Age and Height variables are
both numeric, since they are not followed by a $, and data
values for both of these variables are in the column ranges
listed after their names
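A short sketch of the same INPUT statement inside a complete DATA step. The data set name and values are made up; the data lines are laid out so that each value falls inside its column range.

```sas
/* Hypothetical data: Name in columns 1-10, Age in 11-13, Height in 14-18 */
DATA people;
    INPUT Name $ 1-10 Age 11-13 Height 14-18;
    DATALINES;
Mary Ann   34 64.5
Jo         51 70
;
RUN;

PROC PRINT DATA=people;
RUN;
```

Because SAS reads Name by column position rather than by delimiter, the embedded blank in "Mary Ann" is kept as part of the value, which list input could not do.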

FORMATTED INPUT:
Reading Raw Data NOT in Standard Format





This is where you want to use a Formatted Input or a
Mixed Input.
Informats are useful anytime you have non-standard
data
Numbers with embedded commas or dollar signs are
examples of non-standard data
Dates are perhaps the most common non-standard
data
Using date informats, SAS will convert conventional
forms of dates into a number, the number of days
since January 1, 1960. This number is referred to as a SAS
date value (January 1, 1960 itself is date value 0)
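To see a SAS date value directly, a small sketch such as the following can help. The data set and variable names are hypothetical; MMDDYY10. and DATE9. are standard SAS date informat and format names.

```sas
/* Hypothetical data: read two dates with a date informat */
DATA visits;
    INPUT ID VisitDt : MMDDYY10.;
    DATALINES;
1 01/01/1960
2 10/31/1999
;
RUN;

/* Without a format, the stored number of days is printed */
PROC PRINT DATA=visits;
RUN;

/* With a format, the same values are displayed as dates */
PROC PRINT DATA=visits;
    FORMAT VisitDt DATE9.;
RUN;
```

In the first listing, VisitDt for ID 1 should print as 0, since that date is the reference point. The second listing shows the same stored values written as readable dates.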
Difference between INFORMAT and
FORMAT?




INFORMATs give SAS special instructions for
reading a variable
FORMATs give SAS special instructions for writing
a variable
If specified in a DATA step, the name of the
informat or format will be saved in the data set
and will be printed by PROC CONTENTS
Like the LABEL statement, these can also be used
in the PROC step to customize your reports, but
they would not be stored in the data set
Informats: 3 basic types

Character, numeric, date





Character: $informatw.
Numeric: informatw.d
Date: informatw.
The $ indicates character informats, INFORMAT is
the name of the informat, w is the total width, and d
is the number of decimal places (numeric only)
Two informats do not have names: $w., which reads
standard character data, and w.d, which reads
standard numeric data
Informats (cont.)

The period in an informat is very important
because it distinguishes an informat from a
variable name, which, by default, cannot contain
any special characters except the underscore
INPUT Name : $10. Age : 3. Height : 5.1 DOB : MMDDYY10.;
*Selected Informats can be found in
pp. 44-45 (3rd Ed) in “The Little SAS Book”.
Formatted Input Example
INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
(Score1 Score2 Score3 Score4 Score5) (4.1);

The variable Name has an informat of $16., meaning that it
is a character variable 16 columns wide. Variable Age has an
informat of 3., so it is numeric, three columns wide, and has
no decimal places. The +1 skips over one column. Variable
Type is character, and it is one column wide. Variable Date
has an informat of MMDDYY10. and reads dates in the form
10-31-1999 or 10/31/1999, each 10 columns wide. The
remaining variables, Score1 through Score5, all require the
same informat, 4.1. By putting the variables and the
informat in separate sets of parentheses, you have only to
list the informat once.
Mixing Input Styles


List style is the easiest; column style is a bit
more work; and formatted style is the hardest of
the three. However, column and formatted styles
do not require spaces (or other delimiters)
between variables and can read embedded
blanks.
Sometimes you use one style, sometimes
another, and sometimes the easiest way is to
use a combination of styles. SAS is so flexible
that you can mix and match any of the input
styles for your own convenience.
Mixing Input Styles (cont.)



With list style input, SAS automatically scans to
the next non-blank field and starts reading.
With column style input, SAS starts reading in
the exact column that you specify.
But with formatted input, SAS just starts
reading: wherever the pointer is, that is where
SAS reads. Sometimes you need to move the
pointer explicitly, and you can do that by using
the column pointer, @n, where n is the number
of the column SAS should move to.
Mixed Input example
INPUT ParkName$ 1-22 State$ Year @40 Acreage COMMA9.;
1---------------------23---------------40-------
Yellowstone           ID/MT/WY 1872 *  4,065,493
Everglades            FL 1934 *        1,398,800
Yosemite              CA 1864 *          760,917
Great Smokey MountainsNC/TN 1926 *       520,269
Wolf Trap Farm        VA 1966 *              130

INPUT ParkName$ 1-22 State$ Year Acreage COMMA9.;
Acreage would look like (It would start reading at the *):
4065
.
.
5
.
Reading Multiple Lines of Raw
Data per Observation



In a typical raw data file each line of data
represents one observation, but sometimes the
data for each observation are spread out over
more than one line.
To tell SAS when to skip to a new line, you
simply add line pointers to your INPUT
statement.
To read more than one line of raw data for a
single observation, you simply insert a slash (/)
into your INPUT statement when you want to
skip to the next line of raw data.
Reading Multiple Lines of Raw
Data per Observation (cont.)

The #n works the same as the / but it is more
flexible. The #n works by inserting the number of
the line for that observation from which you want
to read your raw data.
Nome AK
55 44
88 29
Miami FL
…
INPUT City$ State$ / NormHi NormLo #3 RecHi RecLo;
Reading Multiple Observations
per Line of Raw Data (@@)


When you have multiple observations per line of raw
data, you can use double trailing at signs (@@) at the
end of your INPUT statement.
SAS will hold that line of data, continuing to read
observations until it either runs out of data or reaches an
INPUT statement that does not end with a double trailing
@. This is also known as a “hard hold”.
Nome AK 55 44 88 29 Miami FL 72
62 105 40 Atlanta . 59 . 12
INPUT City$ State$ NormHi NormLo RecHi RecLo @@;
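For a version you can run as-is, here is a minimal sketch with fewer variables. The data set name is hypothetical, and the values for Miami and Raleigh are invented for illustration.

```sas
/* Hypothetical data: three observations spread over two lines */
DATA temps;
    INPUT City $ State $ NormHi NormLo @@;   /* @@ holds the line across observations */
    DATALINES;
Nome AK 55 44 Miami FL 90 75
Raleigh NC 88 68
;
RUN;

PROC PRINT DATA=temps;
RUN;
```

This should produce three observations from two data lines. Without the @@, SAS would read only the first four values on each line and discard the rest.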
Reading Part of a Raw Data
File (@)




You don’t have to read all the data before you tell SAS
whether to keep an observation. Instead, you can read
just enough variables to decide whether to keep the
current observation.
Similar to the @@, SAS will hold that line of data with a
single trailing @. This is known as a “soft hold”.
While the trailing @ holds that line, you can test the
observation with an IF statement to see if it’s one you
want to keep. If it is, you can then read the data for the
remaining variables with a second INPUT statement.
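Unlike the @@, the single trailing @ releases the held line
when SAS returns to the top of the DATA step, so the next
observation starts on the next line of raw data. Without a
trailing @, SAS automatically starts reading the next line of
raw data with each INPUT statement.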
Reading Part of a Raw Data
File (@) Example
Suppose you have a dataset containing heart
and lung transplant information but you are
trying to construct a dataset of only lung
transplant patients. It is a very large data set
that takes a lot of time to run so you don’t
want to read it all in first and then select out
the portion you want to keep. It would be
better to read in only those data that you
want initially.
Reading Part of a Raw Data
File (@) Example (cont.)
Heart 7823 12nov1989
Heart 6477 08sep1992
Lung 7231 22jul1995
Heart 2347 30jan1990
Lung 7842 12mar1998
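DATA Lung;
INFILE 'c:\MyData\Trnsplnt.dat';
*Read only the first variable and hold the line;
INPUT Type$ @;
*Drop heart transplant records before reading the rest;
If Type = 'Heart' then DELETE;
INPUT RecNum TranDt : Date9.;
Run;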
Reading external comma-delimited data

We have two choices when given this type of
data


We can use an editor and replace all the commas
with blanks, or
We can leave the commas in the data and use the
DLM= option in the INFILE statement
Data HtWt;
Infile 'c:\MyData\survey.txt' DLM=',';
Input ID Gender$ Age Height Weight;
Run;
Reading external comma-delimited data (cont.)


Another method besides the DLM= option is to use DSD in
the INFILE
This option performs several other functions besides
treating commas as delimiters.
- If it finds two adjacent commas, it will assign a missing
value
- It will allow text strings surrounded by quotes to be
read into a character variable and will strip the quotes
in the process
Data HtWt;
Infile 'c:\MyData\survey.txt' DSD;
Input ID Gender$ Age Height Weight;
Run;
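The two DSD behaviors described above can be checked with a small self-contained sketch. The data values are invented, and INFILE DATALINES is used here only so that no external file is needed.

```sas
/* Hypothetical data: a quoted string on line 1, adjacent commas on line 2 */
DATA HtWt;
    INFILE DATALINES DSD;        /* comma-delimited, quotes stripped, ,, = missing */
    INPUT ID Gender $ Age Height Weight;
    DATALINES;
1,"M",23,68,155
2,F,,61,102
;
RUN;

PROC PRINT DATA=HtWt;
RUN;
```

Gender for ID 1 should be stored as M without the quotation marks, and Age for ID 2 should be missing because of the two adjacent commas.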
Permanent SAS Data Sets


A permanent SAS data set has a two-level name:
LibraryName.DataSetName
A temporary SAS data set has the one-level name
that we have been using.
Temporary SAS data sets will not exist when you
shut down the instance of SAS in which they
were created.
Data new;
Set AIDS;
Run;

First define a SAS Library (Libref)
Libname Statement
Use this statement to define your SAS Library
location before using your SAS data sets
Example:

LIBNAME Annie 'C:\SASDATA';
Proc Means Data = Annie.EX4A N MEAN STD;
Var X Y Z;
Run;
Creating Permanent SAS Data Sets
Libname annie "C:\SASDATA";
Data Annie.EX1;
INPUT Group$ X Y Z;
DATALINES;
Control 12 17 19
Treat 23 . 21
Control 19 18 16
Treat 22 22 .
;
Run;
Using the Permanent SAS Data Sets
Libname xyz "C:\SASDATA";
Title "Means from EX1";
Proc Means Data=xyz.EX1;
Var X Y Z;
Run;
Now let’s try the in-class problems listed on
slide 1