Ray's SAS lecture

SAS for Categorical Data
Copyright © 2004 Leland Stanford Junior University. All rights reserved.
Warning: This presentation is protected by copyright law and international treaties.
Unauthorized reproduction of this presentation, or any portion of it, may result in
severe civil and criminal penalties and will be prosecuted to the maximum extent possible
under the law.
SAS
• SAS is a huge integrated data management and analysis suite. It takes years to master 20% of SAS. Most people take weeks, if not months, to get comfortable working with it.
• The course I teach has online slides that demonstrate how to do categorical data analyses as well as data management.
  - http://www.stanford.edu/class/hrp223/
  - Topic 0 has information on using SAS as a calculator.
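Here is a rough illustration of the calculator idea. It is not necessarily how Topic 0 does it. A DATA _NULL_ step computes values without creating a data set, and the PUT statement writes them to the log window. The variable names (total, ratio, root) are made up for this sketch.

```sas
/* SAS as a calculator: nothing is saved (_NULL_); results go to the log */
data _null_;
  total = 17 + 25;
  ratio = 22 / 7;
  root  = sqrt(2);
  put total= ratio= root=;   /* writes name=value pairs to the log window */
run;
```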
Using SAS
• When you start SAS in a windowing environment, you automatically have access to at least 4 windows.
  - The (enhanced) program editor is where you type instructions to SAS.
  - The log window gives you feedback on how SAS interprets your work.
  - The output window displays any printed results from your request.
  - There is also a two-paned window. One pane, Explorer, lets you look at data sets. The other, Results, acts like a hyperlinked table of contents for the output window.
Telling SAS what to do.
• You type instructions in the program editor and then push the run button.
• The instructions you will use for this class are data steps (to create data sets) and procedures (to analyze data sets).
Data steps
• In data steps you can create variables (a variable is like a box that can hold either numbers or letters). You can do math on variables, including using functions that are built into SAS.
data work.someData;
theAnswer = 1 + 1;
run;
• After you type the instructions, you have to tell SAS to actually do the work. Push the running-person (Submit) icon to do this.
Data steps
• The above code creates a data set that will exist until you quit SAS. You can view it as if it were a spreadsheet: in the SAS Explorer window, double-click Libraries, then the Work library, and finally the data set.
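Here is a sketch of an alternative to browsing in the Explorer. PROC PRINT lists the observations in the output window, and PROC CONTENTS describes the variables (names, types, lengths). The block recreates work.someData from the previous slide so that it runs on its own.

```sas
/* Recreate the one-variable data set from the previous slide */
data work.someData;
  theAnswer = 1 + 1;
run;

/* List the observations in the output window */
proc print data = work.someData;
run;

/* Describe the data set: variable names, types, and lengths */
proc contents data = work.someData;
run;
```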
Functions
• SAS has hundreds of built-in functions:
data work.blah;
numberOne = 1;
someTrigThing = sin(numberOne);
run;
• I have tried to document the ones that students frequently need in Lecture 2 of 223. Take a look at the slides labeled Frequently Used Functions.
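Here is a small sampler of Base SAS functions. These are illustrative and are not necessarily the ones on the Frequently Used Functions slides. The data set name work.functionDemo and its variable names are hypothetical.

```sas
/* A few common Base SAS functions, one per variable */
data work.functionDemo;
  x          = 2.71828;
  naturalLog = log(x);                 /* natural logarithm */
  base10Log  = log10(100);             /* base-10 logarithm: 2 */
  expBack    = exp(naturalLog);        /* EXP undoes LOG, giving back x */
  squareRoot = sqrt(16);               /* 4 */
  rounded    = round(3.14159, 0.01);   /* round to 2 decimal places: 3.14 */
  average    = mean(4, 8, .);          /* MEAN skips missing values: 6 */
run;

proc print data = work.functionDemo;
run;
```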
Finding functions
• … or you can look up the function in the SAS online documentation.
• The useful links section of the class website (http://www.stanford.edu/class/hrp223/2002f/usefulLinks.html) includes a link to the SAS online documentation.
• The URL of SAS OnLineDoc is:
http://v9doc.sas.com/sasdoc/
• If you enter a bad password 3 times, it will take you to the registration page. Access to the documentation is free.
Example of a Function
• If you roll a die 50 times, what is the chance that you will get more than 10 sixes?
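data work.pfft;
/* CDF('BINOMIAL', m, p, n) is P(X <= m) for n trials with success
   probability p, so 1 - CDF(...) is P(X > 10): more than 10 sixes. */
x = 1 - CDF('BINOMIAL', 10, 1/6, 50);
run;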
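One way to check what CDF is doing is to add up the individual binomial probabilities P(X = k) for k = 11 to 50 with PDF and compare the two results. This sketch uses hypothetical names (work.pfftCheck, viaCdf, viaPdf). The two columns should agree up to rounding error.

```sas
/* P(more than 10 sixes in 50 rolls), computed two ways */
data work.pfftCheck;
  p = 1/6;   /* chance of a six on one roll */
  n = 50;    /* number of rolls */

  /* Way 1: one minus P(X <= 10) */
  viaCdf = 1 - cdf('BINOMIAL', 10, p, n);

  /* Way 2: add up P(X = k) for k = 11, ..., 50 */
  viaPdf = 0;
  do k = 11 to n;
    viaPdf = viaPdf + pdf('BINOMIAL', k, p, n);
  end;
  drop k;
run;

proc print data = work.pfftCheck;
run;
```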
Procedures
• SAS has many built-in statistical analysis procedures. The ones you will use for this class are:
  - proc freq – contingency tables
    - See 223 topics 12 and 13
  - proc logistic – logistic regression
    - See 223 topics 14 and 15
Real data looks like this:
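data work.epi;
/* Without this LENGTH statement, character variables read this way default
   to 8 characters, and "notExposed" would be stored as "notExpos". */
length exposure $ 10 disease $ 11;
input subjectID exposure $ disease $;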
datalines;
1 exposed Diseased
2 exposed Diseased
3 exposed Diseased
4 exposed Diseased
5 exposed notDiseased
6 notExposed notDiseased
7 exposed Diseased
8 exposed Diseased
9 exposed Diseased
10 notExposed notDiseased
11 exposed notDiseased
12 exposed Diseased
13 notExposed Diseased
14 notExposed notDiseased
15 exposed Diseased
16 exposed Diseased
17 exposed notDiseased
18 notExposed notDiseased
19 exposed Diseased
20 exposed Diseased
21 exposed Diseased
22 notExposed notDiseased
23 exposed notDiseased
24 exposed Diseased
25 notExposed notDiseased
26 exposed notDiseased
27 notExposed notDiseased
28 exposed Diseased
; run;
Contingency tables
• You can get a frequency table like this:
proc freq data = work.epi;
tables exposure * disease;
run;
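Here is a sketch that assumes work.epi from the "Real data" slide exists in the current session. The NOROW, NOCOL, and NOPERCENT options strip the table down to cell counts. EXPECTED adds the counts you would expect if exposure and disease were unrelated, which is useful when judging chi-square results. OUT= saves the counts to a data set; the name work.epiCounts is hypothetical.

```sas
/* Counts plus expected counts only; also save the counts to a data set */
proc freq data = work.epi;
  tables exposure * disease / norow nocol nopercent expected
                              out = work.epiCounts;
run;
```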
Contingency tables analysis
proc freq data= epi;
tables exposure*disease /chisq;
run;
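As a cross-check between proc freq and proc logistic, a logistic regression of disease on exposure reproduces the 2 × 2 odds ratio. This is a sketch that assumes work.epi from the "Real data" slide exists. PARAM=REF with REF=LAST makes the unexposed group, which sorts last alphabetically, the reference level. EVENT='Diseased' models the probability of disease.

```sas
/* Logistic regression version of the 2 x 2 exposure-disease analysis */
proc logistic data = work.epi;
  class exposure / param = ref ref = last;   /* reference = unexposed group */
  model disease(event = 'Diseased') = exposure;
run;
```

With these 28 subjects, the exposed group has 15 diseased and 5 not diseased, and the unexposed group has 1 and 7. The odds ratio is therefore (15 × 7)/(5 × 1) = 21 from PROC LOGISTIC or from PROC FREQ with /relrisk.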
Grouped Data
• You will get grouped data in statistics classes…
• In a case-control study of 50 patients with pancreatic cancer and 50 hospital controls, 15 patients and 25 controls are non-coffee-drinkers, 15 patients and 10 controls are mid-level coffee drinkers, and 20 patients and 15 controls are high-octane coffee addicts. What are the odds ratios for the association between coffee drinking and pancreatic cancer (comparing high to low, high to none, low to none, and any to none)?
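You can work out the four odds ratios by hand before using PROC FREQ. Each one is (cases/controls in one group) ÷ (cases/controls in the comparison group). This sketch assumes that "low" in the question means the mid-level group. The data set and variable names are hypothetical.

```sas
/* Hand calculation of the four odds ratios from the counts in the question */
data work.coffeeOR;
  /* cases (pancreatic cancer) and controls in each coffee group */
  noneCases = 15;  noneControls = 25;
  midCases  = 15;  midControls  = 10;
  highCases = 20;  highControls = 15;

  /* odds ratio = (cases/controls, group 1) / (cases/controls, group 2) */
  highVsMid  = (highCases / highControls) / (midCases / midControls);
  highVsNone = (highCases / highControls) / (noneCases / noneControls);
  midVsNone  = (midCases / midControls) / (noneCases / noneControls);
  anyVsNone  = ((midCases + highCases) / (midControls + highControls))
               / (noneCases / noneControls);
run;

proc print data = work.coffeeOR;
  var highVsMid highVsNone midVsNone anyVsNone;
run;
```

The printed values should be about 0.89 (high vs. mid), 2.22 (high vs. none), 2.5 (mid vs. none), and 2.33 (any vs. none).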
Grouped Data
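data work.epi;
/* Character variables read this way default to 8 characters, so
   "notExposed" is stored as "notExpos" and "notDiseased" as "notDisea".
   This is why the later WHERE clause and format refer to "notExpos". */
input exposure $ disease $ people;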
datalines;
notExposed diseased 15
notExposed notDiseased 25
little diseased 15
little notDiseased 10
lots diseased 20
lots notDiseased 15
;
run;
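Problems…
• Without a WEIGHT statement, PROC FREQ counts each data line once, so every cell shows a frequency of 1 instead of the number of people.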
proc freq data = epi;
tables exposure * disease;
run;
Weighted data
proc freq data = epi;
weight people;
tables exposure * disease;
run;
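The WEIGHT statement gives the same table as expanding each line into one row per person. This sketch does that expansion explicitly; work.epiLong is a hypothetical name. The block first recreates the grouped data so that it runs on its own. The resulting table should match the weighted one.

```sas
/* Recreate the grouped data from the Grouped Data slide */
data work.epi;
  input exposure $ disease $ people;
  datalines;
notExposed diseased 15
notExposed notDiseased 25
little diseased 15
little notDiseased 10
lots diseased 20
lots notDiseased 15
;
run;

/* Write each line out PEOPLE times: one observation per person */
data work.epiLong;
  set work.epi;
  do i = 1 to people;
    output;
  end;
  drop i people;
run;

/* No WEIGHT statement needed: each row is now one person */
proc freq data = work.epiLong;
  tables exposure * disease;
run;
```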
Analysis of weighted data
proc freq data = epi;
weight people;
tables exposure * disease /relrisk;
where exposure in ("notExpos", "lots");
run;
Other groups
• Just copy and paste the proc freq and pick different groups.
• To get the combined groups, use a character format (if you took 223) or just add the two exposed groups by hand.
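Here is a sketch of the "add by hand" route. The exposed counts are summed (15 + 20 = 35 patients and 10 + 15 = 25 controls) and entered on one line each. The data set name work.epiAny and the anyCoffee label are hypothetical. The LENGTH statement keeps the labels from being cut to 8 characters.

```sas
/* Combined exposed group entered by hand, next to the unexposed group */
data work.epiAny;
  length exposure $ 10 disease $ 11;
  input exposure $ disease $ people;
  datalines;
anyCoffee diseased 35
anyCoffee notDiseased 25
notExposed diseased 15
notExposed notDiseased 25
;
run;

proc freq data = work.epiAny;
  weight people;
  tables exposure * disease / relrisk;
run;
```

Because anyCoffee sorts before notExposed, and diseased before notDiseased, the odds ratio in the RELRISK output compares any coffee with none. By hand it is (35 × 25)/(25 × 15) ≈ 2.33.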
Formats in Freq
proc format;
value $coffee
"lots" = "Exposed"
"little" = "Exposed"
"notExpos" = "notExpos"
;
run;
proc freq data = epi;
weight people;
format exposure $coffee.;
tables exposure * disease /relrisk;
run;
Analysis of Formatted Grouped Data