Statistical Methods
Lynne Stokes
Department of Statistical Science
Lecture 7: Introduction to the SAS Programming Language
Preliminaries
• Create a Folder: c:/Stat6337
– Send to the Desktop
• Access Blackboard
• Download the Eysenck Data File
• Download the lecture7Eysenck.sas File
• Download the lecture7class.sas File
• Download the lecture7SASSummary.doc File
Eysenck’s Data File
                      Recall Condition
Age Group  Counting  Rhyming  Adjective  Imagery  Intentional
old        9         7        11         12       10
old        8         9        13         11       19
old        6         6        8          16       14
old        8         6        6          11       5
old        10        6        14         9        10
old        4         11       11         23       11
old        6         6        13         12       14
old        5         3        13         10       15
old        7         8        10         19       11
old        7         7        11         11       11
young      8         10       14         20       21
young      6         7        11         16       19
young      4         8        18         16       17
young      6         10       14         15       15
young      7         4        13         18       22
young      6         7        22         16       16
young      5         10       17         20       22
young      7         6        16         22       22
Open the SAS Program
• Double-click the lecture7.sas File
– Press the Run Icon (Runner Image)
• Editor
– Create and Modify SAS Command Files
– Can Save in the Stat6337 Folder : File / Save As …
• Log
– Messages about the Compilation and Execution of the SAS Program
– Contains Error Messages (in red), if any
– Can Save in the Stat6337 Folder : File / Save As …
• Output
– Results of the Execution of the SAS Program
– Can Save in the Stat6337 Folder : File / Save As …
To Erase the Contents of the Log or Output Files: Right Click, Select “Clear All”
SAS Structure
• DATA Step
– Describe the data, provide names for variables, define new or transformed variables
• PROCs : SAS Procedures
– Descriptive Statistics: Proc Univariate, Proc Means
– Graphics: Proc Chart, Proc Plot
– Regression: Proc Reg
– Two-sample t-tests: Proc Ttest
– Analysis of Variance: Proc Anova, Proc GLM, Proc Mixed
– Specialized Data Operations: Proc Sort
– etc.
SAS Syntax
• Every command MUST end with a semicolon
– Commands can continue over two or more lines
– This WILL be Your #1, #2 & #3 Mistakes !!!!
• Variable names are 1-32 characters (1-8 in SAS Version 6 and earlier): letters, numerals, and underscores, beginning with a letter or underscore, with no blanks or special characters
– Note: values for character variables can exceed 8 characters, but list input with a plain $ keeps only the first 8 unless a Length statement or informat allows more
• Comments
– Begin with *, end with ;
– Can comment several lines: begin with /* and end with */
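The syntax rules above can be sketched in a few lines. This is only an illustration: the data set name demo and the variables x, y, and total are hypothetical, and the two data lines are made-up values.

```sas
* A one-line comment begins with an asterisk and ends with a semicolon;

/* A block comment can span
   several lines. */

data demo;              /* demo is a hypothetical data set name */
   input x
         y;             /* one statement continued over two lines */
   total = x + y;       /* forgetting this semicolon is the classic mistake */
   datalines;
1 2
3 4
;
run;

proc print data=demo;
run;
```

If a semicolon is left off, SAS reads the next statement as part of the current one, and the Log usually reports the error a line or two later than where the mistake actually is.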
Data Input in the SAS File
• Data fname ;
– creates temporary file with the data that are described in the data step
• Input name . . . name $ . . . ;
– list input: lists the variable names (1 – 32 characters); name is assumed to be a quantitative variable
– name MUST be followed by $ if name is a character variable
– alternatives: comma separated, column specified
• Datalines (or Cards) ;
– indicates that the data follow, line by line
• ;
– indicates that the last line of data has been input; the semicolon is on a line by itself
• Example: lecture7class.sas
– Open lecture7class.sas
» Change filename, if necessary
– Clear output and log files; Run lecture7class.sas
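A minimal sketch of in-stream data entry, using three rows from Eysenck's data file. The data set name recall and the short variable names (count, rhyme, adject, imagery, intent) are assumptions for illustration, not the names used in lecture7class.sas.

```sas
data recall;                                  /* hypothetical data set name */
   /* age is a character variable, so it is followed by $ */
   input age $ count rhyme adject imagery intent;
   datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
;
run;

proc print data=recall;                       /* check that the data were read as intended */
run;
```

Printing the data set right after the Data Step is a cheap way to confirm that every value landed in the intended variable.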
Data Input with Multiple Responses
on a Single Line of the Data File
• SAS Requires that Each Response Value be on a Separate Line of Data
• When n Responses are on One Line of Data
– Input y1 y2 … yn ;
– y = y1; output;
– y = y2; output;
– ...
– y = yn; output;
– Creates n Data Lines with 1 Response Value on Each Line
• If y1 … yn Represent Responses for n Levels of a Factor
– Input y1 y2 … yn ;
– factor = 'Level 1'; y = y1; output;
– factor = 'Level 2'; y = y2; output;
– ...
– factor = 'Level n'; y = yn; output;
– Creates n Data Lines with 1 Factor & Response Value on Each Line
• Example: lecture7.sas
– Data Flow2
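As a sketch of the Output technique applied to Eysenck's layout (five recall conditions per line), the step below turns each input line into five observations. The data set name recall and the variable names group and recall are assumptions; this is not the Flow2 step from lecture7.sas.

```sas
data recall;
   input age $ y1-y5;                 /* five responses on one line */
   length group $ 11;                 /* room for 'Intentional' (11 characters) */
   group = 'Counting';    recall = y1; output;
   group = 'Rhyming';     recall = y2; output;
   group = 'Adjective';   recall = y3; output;
   group = 'Imagery';     recall = y4; output;
   group = 'Intentional'; recall = y5; output;
   drop y1-y5;                        /* keep only age, group, recall */
   datalines;
old 9 7 11 12 10
young 8 10 14 20 21
;
run;

proc print data=recall;
run;
```

The two input lines produce ten observations, each holding one age, one condition, and one response. Without the Length statement, group would take its length from 'Counting' and the longer condition names would be truncated.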
Data Input from an External File
• Filename fn 'complete directory/file specification' ;
– e.g., filename eysdata 'c:/Stat6337/EysenckRecall.dat' ;
– Be Careful with Spaces in Directories and File Names !!!
• Data fname ;
– creates temporary file with the data that are described in the data step
• Infile fn ;
– input the data from the file labeled fn
• Input name . . . name $ . . . ;
– lists the variable names (1 – 32 characters); name is assumed to be a quantitative variable
– name MUST be followed by $ if name is a character variable
• Run ;
– indicates that the data step is completed
• Example: lecture7class.sas
– Data Recall
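Put together, the statements above might look like the sketch below. The file path is the one given on the slide; the variable list on the Input statement is an assumption about the file layout (age group followed by the five recall conditions) and should be adjusted to match the actual file.

```sas
filename eysdata 'c:/Stat6337/EysenckRecall.dat';

data recall;
   infile eysdata;                                  /* read from the external file */
   input age $ count rhyme adject imagery intent;   /* assumed column order */
run;

proc print data=recall;
run;
```

If the path is wrong, the Log reports that the physical file does not exist and the data set is created with zero observations, so check the Log before looking at any output.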
Program Data Vector
• One line of data is stored at a time, as indicated on the Input statement of the Data Step
• Any calculations, deletions, etc. in the Data Step are performed on that line of data
• Each time the end of the Data Step is reached, the variables in the Program Data Vector are output to a temporary (work) file, and the next line of data is read
• Can force data lines to be written at any time with the Output statement
Operations in the Data Step
• Arithmetic Operations
– x=u+v;
• Transformations
– x = log(y) ;
• Logical
– If x > 0 then z = y/x ;
• Recoding
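– If gender = 'm' then gender = 'Male';
else if gender = 'f' then gender = 'Female';
– Note: SAS sets the length of a character variable from its first use in the Data Step, so longer values assigned later are truncated
– To force a length (e.g., for a character variable), use a Length statement before that first use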
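The operations above can be combined in one Data Step. This is a sketch with made-up values; the data set name ops and the variables s, ly, z, and sex are hypothetical. Recoding into a new variable (sex) shows why the Length statement matters.

```sas
data ops;
   input gender $ x y;
   length sex $ 6;                  /* without this, sex would have length 4 (from 'Male') */
   s = x + y;                       /* arithmetic */
   if y > 0 then ly = log(y);       /* transformation: natural log */
   if x > 0 then z = y / x;         /* logical: z stays missing when x is not positive */
   if gender = 'm' then sex = 'Male';
   else if gender = 'f' then sex = 'Female';
   datalines;
m 2 10
f 0 5
;
run;

proc print data=ops;
run;
```

Removing the Length statement and rerunning is a quick way to see the truncation: 'Female' would then be stored as 'Fema'.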
Titles and Labels
• Title# '…' ;
– Up to 10 title lines: title# 'include your title here';
– Can be placed in Data Steps or Procs
– Changing Title# replaces that title and eliminates Titlex, where x>#
• Label name = '…' ;
– Can be in a Data Step or Proc Print (Proc Print shows labels only with its Label option)
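A short sketch of how titles and labels behave, using four counting-condition values from Eysenck's data. The data set name, the title text, and the label text are illustrative assumptions.

```sas
data recall;
   input age $ count;
   label count = 'Recall score, counting condition';
   datalines;
old 9
old 8
young 8
young 6
;
run;

title1 'Eysenck Recall Data';
title2 'Counting condition only';
proc print data=recall label;       /* the LABEL option prints labels as column headings */
run;

title1 'Summary Statistics';        /* replaces title1 and clears title2 */
proc means data=recall;
   var count;
run;
```

Titles stay in effect for every later procedure until they are changed, so it helps to reset title1 whenever the output changes topic.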
Some Useful PROCs
• Proc Chart
– vertical or horizontal bar charts
• Proc Freq
– frequency distributions, cross tabs
• Proc Means
– select summary statistics
• Proc Plot
– scatterplots
• Proc Print
– prints data files
• Proc Sort
– sorts data files by the values of one or more variables
• Proc Univariate
– a wide range of summary statistics, box plots
General Form of PROCs
PROC xxxx data=fname options;
by groups;
proc-specific statements;
title . . . ;
output out = fn . . . ;
run ;
Printing to the Output File
• Proc Print data = fname ;
– var . . . ;   lists the variables to be printed (can be omitted)
– run ;   indicates the print commands are complete
Group Analyses
• Sort the Groups
– Proc Sort data= … ;
– by group;
– run;
• Execute the Proc, by Group
– Proc xxx data= … ;
– by group;
– ...
– run;
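The sort-then-BY pattern can be sketched with a few rows of Eysenck's data entered out of order. The data set and variable names are assumptions; Proc Means stands in for whichever procedure is run by group.

```sas
data recall;
   input age $ count rhyme;
   datalines;
young 8 10
old 9 7
young 6 7
old 8 9
;
run;

proc sort data=recall;              /* BY processing needs the data sorted by the BY variable */
   by age;
run;

proc means data=recall;
   by age;                          /* separate summary for old and for young */
   var count rhyme;
run;
```

Skipping the sort step here would stop Proc Means with an error in the Log saying the data set is not sorted in ascending sequence.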
Summarize the Recall Data
• Calculate the average, standard deviation, minimum, and maximum to 2 decimal places
– Proc Means
• Graph a histogram of the recall data
– Proc Chart
• Calculate frequencies for each condition/group and each age
– Proc Freq
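One way these three tasks might be written is sketched below on four rows of Eysenck's data. The data set name recall and the variables group and recall are assumptions; the in-class program may organize the data differently.

```sas
data recall;
   input age $ y1-y5;
   length group $ 11;
   group = 'Counting';    recall = y1; output;
   group = 'Rhyming';     recall = y2; output;
   group = 'Adjective';   recall = y3; output;
   group = 'Imagery';     recall = y4; output;
   group = 'Intentional'; recall = y5; output;
   keep age group recall;
   datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
young 6 7 11 16 19
;
run;

proc means data=recall mean std min max maxdec=2;   /* chosen statistics, 2 decimals */
   var recall;
run;

proc chart data=recall;
   vbar recall;                                     /* vertical bar chart (histogram) */
run;

proc freq data=recall;
   tables group age;                                /* one frequency table per variable */
run;
```

Listing the statistic keywords on the Proc Means statement replaces the default set of statistics, and maxdec=2 controls the number of decimal places printed.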
Summarize the Recall Data
• Calculate descriptive statistics for each condition/group
– Proc Means, Proc Univariate
– Note: Sort First, then Use the BY Command.
• Graph Average Recall for All Combinations of Recall Condition/Group and Age
– Use a Group Identifier as the Plotting Symbol
– Proc Plot
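A sketch of the plotting task, under the assumption that cell averages are first saved with an Output statement in Proc Means and then plotted. The numeric code cond (1 to 5) is introduced here only so that the horizontal axis is numeric; it, the data set names, and the variable avg are all hypothetical.

```sas
data recall;
   input age $ y1-y5;
   length group $ 11;
   group = 'Counting';    cond = 1; recall = y1; output;
   group = 'Rhyming';     cond = 2; recall = y2; output;
   group = 'Adjective';   cond = 3; recall = y3; output;
   group = 'Imagery';     cond = 4; recall = y4; output;
   group = 'Intentional'; cond = 5; recall = y5; output;
   keep age group cond recall;
   datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
young 6 7 11 16 19
;
run;

proc sort data=recall;
   by cond age;
run;

proc means data=recall noprint;     /* save the cell averages instead of printing them */
   by cond age;
   var recall;
   output out=cellmean mean=avg;
run;

proc plot data=cellmean;
   plot avg*cond=age;               /* first character of age (o or y) is the plotting symbol */
run;
```

Each point on the plot is one condition-by-age average, so the two age groups can be compared across conditions by following the o and y symbols.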
Proc Anova
• Only for Complete Factorial Experiments in Completely Randomized Designs
– Otherwise: Proc GLM
• MUST have an Equal Number of Repeats for Each Factor-Level Combination
Proc Anova
• Proc Anova data = fn ;
– By … ;
» Separate ANOVA Fits for Each Value of the BY variable(s).
– Class … ;
» List all the factors.
– Model … / options;
» e.g., model recall = age group age*group ;
• factors: list individually; e.g. age group
• interactions: connect with asterisk(s); e.g., age*group
– Means … / options;
» e.g., means age group age*group / t bon;
– Run;
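The template above might be filled in as follows for the two-factor model shown on the slide. The sketch uses only four rows of Eysenck's data (two repeats per age-by-condition cell, so the design is balanced); the data set and variable names are assumptions.

```sas
data recall;
   input age $ y1-y5;
   length group $ 11;
   group = 'Counting';    recall = y1; output;
   group = 'Rhyming';     recall = y2; output;
   group = 'Adjective';   recall = y3; output;
   group = 'Imagery';     recall = y4; output;
   group = 'Intentional'; recall = y5; output;
   keep age group recall;
   datalines;
old 9 7 11 12 10
old 8 9 13 11 19
young 8 10 14 20 21
young 6 7 11 16 19
;
run;

proc anova data=recall;
   class age group;                        /* list all the factors */
   model recall = age group age*group;     /* main effects and interaction */
   means age group age*group / t bon;      /* averages, with comparisons for main effects */
run;
quit;                                      /* Proc Anova stays active until quit */
```

With so few observations the results are only a check that the program runs; the full data file is needed for any real conclusion.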
Eysenck’s Study of Incidental Learning
Make analysis of variance calculations, using only recall condition as the factor.
Calculate factor-level averages, with the t option.
Effect of Cocaine Usage on Newborn Infant Body Lengths
Usage Groups: First Trimester, Throughout Pregnancy, Drug-Free
Research Question: Do Mean Body Lengths (cm) Differ by Cocaine Usage?
Effect of Cocaine Usage on Newborn Infant Body Lengths
Case     First Trimester  Throughout Pregnancy  Drug-Free
1        45.1             40.2                  44.3
2        45.7             41.3                  45.3
3        45.8             41.7                  46.9
4        46.7             41.9                  47.0
5        47.3             43.4                  47.2
Average  46.12            41.70                 46.14
Assignment
• Create a Data File
• Input the Data File into a SAS Program
• Cocaine Usage Groups
– Calculate Averages and Standard Deviations
– Make Comparative Box Plots
– Test the Equality of the Group Means
• Email Me ONLY the FINAL .log File