CSS 590 Field Plot Technique

CROP 590 Experimental Design in Agriculture
Lab Week 1
Introduction to SAS
Recommended reading:
Cody and Smith – Pages 1-18
Part I. SAS Online Documentation
Go to the SAS online documentation website (for version 9.3):
http://support.sas.com/documentation/93/index.html
1) Explore the contents of the SAS System Help directory. Base SAS and SAS/STAT software are
most relevant for this course. Note that you can also access SAS online documentation by
using the Help function when you are running SAS.
2) Select a SAS STAT Procedure (PROC) and explore the background information and syntax
documentation.
3) Review the syntax conventions used in the SAS Help documentation.
SAS keywords, such as statement or procedure names, appear as links in all caps.
Optional arguments appear inside angle brackets (< >).
Values that you must spell as they are given in the syntax appear in uppercase type.
Values that you must supply appear in normal text or in italics.
Mutually exclusive choices are joined with a vertical bar (|).
Argument groups that you can repeat are indicated by an ellipsis (…).
For example:
PROC GLM <options> ;
CLASS variable <(REF= option)> …<variable <(REF=FIRST | LAST)>> </ globaloptions> ;
MODEL dependent-variables = independent-effects </ options> ;
Part II. The SAS Display Manager for Windows
Explore the windows that are displayed and click the help icon for further information about
them.
Editor – this is where you enter programs. Version 9.3 of SAS uses an ‘enhanced editor’ with
many new features. The older ‘program editor’ is also available, but is not recommended.
Log – shows the SAS statements that have been submitted, reports system messages, and
identifies errors in your program
Explorer – provides easy access to data sets and SAS files
Results – provides easy access to SAS output files
In SAS 9.3, the default window for results of SAS procedures and analyses is the Results Viewer.
Results are presented in HTML format. If you prefer the older Listing (text) output format, you
can choose that option by selecting Tools > Options > Preferences; you then select the
Results tab and check the box to create a Listing.
Note that you can also navigate SAS windows using the ‘Window’ and ‘View’ drop down menus.
Windows can be cleared when they are active by using the blank page icon on the toolbar. This
is a good housekeeping practice to avoid appending new log and output files to obsolete ones.
Part III. SAS Basics

All SAS statements end with a semicolon.

Case and spacing generally are not important.

Statements can extend to more than one line.

Variable names:

- begin with a letter or an underscore
- can be up to 32 characters in length
- cannot contain special characters (e.g., , ; - /), although underscores (_) are OK
- should contain no spaces
- example of a valid SAS variable name: YLD03_KG
- variables may be designated as numeric or character. Character values are case
sensitive and sensitive to leading blanks. With list input, character variables are
given a default length of 8, so longer values are truncated unless a longer length is
declared.
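The default length of 8 for list input matters for variety names such as BARONESSE and HARRINGTON, which are longer than eight characters. The following is a minimal sketch of one way to avoid truncation, using a LENGTH statement placed before the INPUT statement. The data set name LENGTH_DEMO and the width of 12 are assumptions chosen for illustration only.

```sas
DATA LENGTH_DEMO;                 /* hypothetical data set name            */
   LENGTH VARIETY $ 12;           /* assumed width; must precede INPUT     */
   INPUT VARIETY $ PLOTWT;        /* list input: values separated by blanks */
   DATALINES;
BARONESSE 2825
HARRINGTON 2236
;
RUN;

PROC PRINT DATA=LENGTH_DEMO;      /* check that the names are stored in full */
RUN;
```

Without the LENGTH statement, the two names would be stored as BARONESS and HARRINGT.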
SAS programs are divided into sections called steps:
- The DATA Step – creates a data set and modifies it as needed
- The PROC Step – specifies SAS procedures to perform (e.g., data analyses)
SAS language may be specific to the DATA Step or the PROC Step, but some
statements are universal to both. Global statements apply to all subsequent steps in
the program.
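As a rough illustration of these pieces, the sketch below contains one global statement (TITLE), one DATA step, and two PROC steps, each step closed with its own RUN statement. The data set name PLOTS, the variable names, and the three data lines are made up for this example.

```sas
TITLE 'PLOT WEIGHTS';             /* global statement: stays in effect until reset */

DATA PLOTS;                       /* DATA step: creates the data set (hypothetical name) */
   INPUT PLOT PLOTWT;
   DATALINES;
1 2355
2 2825
3 2236
;
RUN;

PROC PRINT DATA=PLOTS;            /* PROC step: lists the observations */
RUN;

PROC MEANS DATA=PLOTS;            /* PROC step: descriptive statistics */
   VAR PLOTWT;
RUN;
```

Because TITLE is global, the same heading should appear on the output of both procedures.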
Some useful SAS Procedures:
PROC FREQ – Produces frequency and contingency tables for categorical variables and
performs Chi-square tests for goodness of fit
PROC GLIMMIX – Fits statistical models to data with correlations or nonconstant variability and
where the response is not necessarily normally distributed. These models are known as
generalized linear mixed models (GLMM). (Available in version 9.2 and later)
PROC GLM – Performs Analysis of Variance for balanced and unbalanced data; can
accommodate independent variables that are categorical (class variables) as well as
continuous (as in regression)
PROC GPLOT – Creates graphs of data
PROC IML – SAS/IML is a programming language that operates on matrices
PROC LATTICE – Computes Analysis of Variance for lattice designs
PROC MEANS – Computes descriptive statistics for variables across all observations and within
groups of observations
PROC MIXED – Performs analysis of mixed models
PROC PRINT – Prints the observations in a SAS data set
PROC REG – A general-purpose procedure for linear regression
PROC SORT – Sorts observations in a SAS data set by one or more character or numeric
variables
PROC UNIVARIATE – Provides data summarization methods that produce univariate statistics
and information on the distribution of numeric variables
Example of a SAS program:
DATA EXAMPLE;
INPUT VARIETY $ PLOTM2 PLOTWT;
YLD03_KG = (PLOTWT/PLOTM2)*10;
IF VARIETY = 'MOREX' THEN DELETE;
DATALINES;
STEPTOE 4.32 2355
BARONESSE 4.28 2825
HARRINGTON 4.89 2236
MOREX 4.77 1980
STEPTOE 4.61 .
BARONESSE 4.66 2691
HARRINGTON 4.50 2100
MOREX 4.35 2206
;
/*DATA SUMMARY*/
PROC SORT DATA=EXAMPLE;
BY VARIETY;
PROC PRINT;
PROC MEANS;
TITLE 'SUMMARY OF BARLEY DATA';
VAR YLD03_KG;
BY VARIETY;
RUN;
QUIT;
Notes on the program:
- DATA EXAMPLE; – Beginning of DATA step
- INPUT VARIETY $ PLOTM2 PLOTWT; – $ indicates that variety is a character variable
- YLD03_KG = (PLOTWT/PLOTM2)*10; – Creates a new variable
- IF VARIETY = 'MOREX' THEN DELETE; – Drops specified observations from the data set; single
quotes indicate a value for a character variable
- DATALINES; – In older versions of SAS, a ‘CARDS’ statement was used rather than ‘DATALINES’
- Data lines – Each line represents one observation. Each column represents a different
variable. The period ‘.’ designates a missing value. Missing values that are input from Excel
should be left blank.
- ; – A semicolon is needed after the datalines
- /*DATA SUMMARY*/ – Comment statement (a note to yourself that does not affect the program)
- PROC SORT DATA=EXAMPLE; BY VARIETY; – Beginning of PROC Step; sorts data by variety
- PROC PRINT; – Prints the most recently created data set
- PROC MEANS; – Another PROC Step
- TITLE 'SUMMARY OF BARLEY DATA'; – A title will appear as a heading on each page of output
until it is reset
- VAR YLD03_KG; – Indicates the variables that you want to analyze
- BY VARIETY; – Requests means for each variety
- RUN; – Always end the program with a RUN statement
1) Copy and paste this program into the program editor in SAS. Note how the program is
automatically color coded to signify different types of input. What does each color
represent? Try removing some of the SAS statements in the program – what happens to the
color coding? Horizontal lines indicate the beginning and end of PROC and DATA steps. It is
not a bad idea to explicitly place a ‘RUN’ statement at the end of each step.
2) We have used the simplest form of data input known as ‘list’ input. Each value in a line is
separated by one or more spaces. SAS reads each value and assigns it to the corresponding
variable in the input statement. Many other formats can be specified, such as column input
and comma separated input.
3) Remove some of the blank lines and edit the comments and title statements as you wish.
Click on the ‘+’ and ‘-’ icons on the left side of the program editor window to compress and
expand parts of your program. Save your program.
4) Run the program using the ‘Run’ dropdown menu or the Running man icon on the toolbar.
View the information in the log and output windows.
If your program statements are cleared automatically from the editor when you submit
them, you might want to consider changing the options for the enhanced editor on the
tools menu. Usually it is more convenient to retain your program in the editor window for
further use.
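Item 2 above mentions column input and comma separated input as alternatives to list input. The sketch below shows one possible form of each, reusing a few lines of the barley data. The data set names CSV_DEMO and COL_DEMO and the column positions are assumptions made for this illustration.

```sas
/* Comma separated input: the DSD option treats commas as delimiters */
/* and reads two consecutive commas as a missing value               */
DATA CSV_DEMO;                    /* hypothetical data set name */
   INFILE DATALINES DSD;
   INPUT VARIETY $ PLOTM2 PLOTWT;
   DATALINES;
STEPTOE,4.32,2355
MOREX,,2206
;
RUN;

/* Column input: each variable is read from fixed column positions */
DATA COL_DEMO;                    /* hypothetical data set name */
   INPUT VARIETY $ 1-10 PLOTM2 12-15 PLOTWT 17-20;
   DATALINES;
STEPTOE    4.32 2355
HARRINGTON 4.89 2236
;
RUN;
```

With column input, the length of VARIETY is taken from the column range (10 here), so HARRINGTON is stored in full.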
Part IV. Working with large data sets
1) Open Lab1.xlsx and save the data set on your hard drive. Rearrange the data so that
‘locations’ can be used as a SAS variable. Ensure that the format of the file meets
requirements for use as a SAS data set. Variable names will be read by SAS from the first
row on the spreadsheet.
2) Open SAS and import the data using the file import wizard. Choose a member name such as
‘barley’ to create a data set called ‘work.barley’.
3) Write a program to summarize these data using PROC MEANS. Rather than entering the data
lines directly in the program, instruct SAS to use the data set you have already created by
naming it in the PROC statement:
PROC MEANS DATA=BARLEY;
Alternatively, you could duplicate the original data set in another DATA step:
DATA NEW;
SET BARLEY;
A copy of work.barley is made and assigned the name ‘NEW’. No input statement is
required in this case because the variable names are already defined in the data set.
4) Summarize the data by locations and by varieties.
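The steps above can also be approached in code. The sketch below is one possible way to import the spreadsheet without the wizard and to request separate summaries for locations and for varieties. The file path and the variable names LOCATION, VARIETY, and YIELD are assumptions, so substitute the names in your own file. DBMS=XLSX may not be available in every SAS installation, since it depends on the release and on the licensed products (SAS/ACCESS Interface to PC Files).

```sas
/* Hypothetical path; point DATAFILE= to where you saved Lab1.xlsx */
PROC IMPORT DATAFILE="C:\mydata\Lab1.xlsx"
            OUT=WORK.BARLEY
            DBMS=XLSX
            REPLACE;
   GETNAMES=YES;                  /* variable names come from the first row */
RUN;

PROC MEANS DATA=BARLEY N MEAN STD;
   CLASS LOCATION VARIETY;        /* assumed variable names */
   VAR YIELD;                     /* assumed variable name  */
   WAYS 1;                        /* one-way summaries: by location, then by variety */
RUN;
```

Using a CLASS statement avoids the need to sort the data first. A PROC SORT followed by a BY statement, as in the earlier example program, is an alternative.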