SAS Appendices - University of South Carolina

SAS and Other Packages


SAS can interact with other packages in a
variety of ways. We will briefly discuss:
SPSSX (PASW)
SUDAAN
IML
SQL will be discussed in more detail.
© Fall 2011 John Grego and the University of South Carolina
SPSSX


SPSSX is a statistics package popular in
the social sciences. It was originally more
of a programming language, but now most
users are familiar only with the menu-driven features.
I really like the way in which SPSSX
creates output labels and formats.


SAS interaction with SPSSX is fairly
simple and straightforward: it imports
SPSSX data sets.
SAS used to import only portable file
formats.



Starting with SAS 9.1.3, the Import Wizard
can import SPSSX data sets of any type.
That is, SPSSX .sav files no longer need to
be saved as .por files prior to import.
The import preserves value coding.


Unfortunately, this coding/labeling is not
preserved when the data set is saved as a
permanent SAS data set.
There are three options: reconstruct it by hand,
import the SPSSX data set each time it is needed,
or save the format catalog along with the data set.
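
As a rough sketch of the third option, the same import can be written in code rather than through the Import Wizard. This assumes the SAS/ACCESS Interface to PC Files is licensed; the folder C:\sasdata, the file survey.sav, and the names mylib, survey, and formats are all hypothetical placeholders.

```sas
/* Hypothetical folder and file names: replace with your own. */
libname mylib "C:\sasdata";

proc import datafile="C:\sasdata\survey.sav"
            out=mylib.survey
            dbms=sav
            replace;
  fmtlib=mylib.formats;   /* save the SPSSX value labels in a format catalog */
run;

/* In later sessions, point SAS at the saved catalog so the
   labels are found when mylib.survey is used. */
options fmtsearch=(mylib.formats);
```

With the catalog saved in the same permanent library as the data set, the value labels should not have to be rebuilt by hand.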

So what’s the point? It’s convenient to
import data sets into SAS if (1) we need to
take advantage of SAS’s additional
functionality or (2) we don’t have an SPSSX
license!
SUDAAN


SUDAAN is a package for analyzing
complex surveys, developed by RTI but
coordinated with SAS.
Researchers can use SUDAAN even
though they do not have an intimate
knowledge of survey sampling.


Many complex survey databases available
for public use include a set of pre-calculated
weights, and often some applicable SUDAAN code.
With the data, the weights, and some
knowledge of how the survey was
constructed, researchers are ready to go.


SAS can embed SUDAAN code in a regular
SAS program. The syntax for SUDAAN and
SAS-callable SUDAAN is so similar that
you wouldn’t distinguish them at first
glance.
SAS executes SUDAAN-style PROC steps,
but with slight name changes to avoid
confusion with existing SAS PROC steps.

In the following examples, PROC
REGRESS in SUDAAN is replaced by PROC
SURVEYREG in SAS; NEST is replaced by
STRATUM; WEIGHT is unchanged.
CLUSTER is an important statement that is
not represented here.
SUDAAN:
    proc regress data=one filetype=sas design=wr;
    nest SSTRATID;
    weight byqwt;
    model by2xmstd=byses;
    run;
SAS:
    proc surveyreg data=one;
    stratum SSTRATID;
    weight byqwt;
    model by2xmstd=byses / adjrsq anova clparm deff;
    run;
IML


The way in which SAS uses IML
(Interactive Matrix Language) is quite
different from the above two examples.
IML allows a form of object-oriented
programming in SAS. When I first started
grad school, it was one of the very few
ways to do matrix math.


IML uses some typical SAS features
(semicolons, comments, etc.), but
resembles other object-oriented languages
such as R or Minitab as well.
The basic format
PROC IML; ..IML commands.. QUIT;
is a pattern we will see repeated with PROC SQL.
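
A minimal sketch of the PROC IML ... QUIT pattern, assuming SAS/IML is licensed. The matrices A and b are made-up values used only to show the syntax.

```sas
proc iml;
  /* Enter small matrices row by row; commas separate rows */
  A = {1 2, 3 4};
  b = {5, 6};

  AtA = A` * A;        /* transpose (backquote) and matrix product */
  x   = solve(A, b);   /* solution of the linear system A*x = b */

  print AtA;
  print x;
quit;
```

Statements run as they are submitted, and QUIT (not RUN) ends the procedure.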
SQL in SAS


SQL stands for Structured Query
Language, a language suited for database
management and manipulation.
SQL can interact with all the standard
database packages.



We will focus on SQL commands in SAS,
though SAS has many other methods for
interacting with databases (PROC IMPORT,
for example).
PROC SQL is a SAS procedure that is
based on SQL statements.
We are familiar with one SQL statement
already: WHERE.



Some of the syntax is similar to the SAS
DATA step, but there are key differences;
e.g., CREATE TABLE (rather than DATA)
creates a data set.
PROC SQL is built from extended clauses,
rather than a set of discrete statements.
PROC SQL does not need a RUN;
statement to execute. PROC SQL is
typically ended with a QUIT; statement.

PROC SQL performs many of the same
tasks as the DATA step, but PROC SQL
has some advantages:
1. Faster execution speed
2. Joining tables with PROC SQL is
considered by many to be more
convenient than MERGE in a DATA step
3. SQL code can easily access external
databases (e.g., Oracle, DB2, Access)

In the examples we will study in class,
advantages in processing speed will not
be obvious
One of the simplest tasks in PROC SQL
is to select and print a data set that is
already created. An easy way to do this:

    PROC SQL;
    SELECT *
    FROM tablename;
    QUIT;

The * says to select all variables
(columns) in the table.


By default, this code prints the data set
to the output window.
We can also select only a few variables
by specifying the variable names
(separated by commas) in the SELECT
statement.
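
For instance, a short sketch using SASHELP.CLASS, a small sample table that ships with SAS (its use here is an assumption for illustration, not part of the original slides):

```sas
proc sql;
  /* Print three named columns instead of every column */
  select name, age, height
  from sashelp.class;
quit;
```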
We may wish to create a new data set from
part of a previous one. We use the
CREATE TABLE..AS statement:

    CREATE TABLE newtablename AS
    SELECT var1, var3, var4
    FROM oldtablename;

Some DATA step keywords work in
PROC SQL as well, as data set options
(DROP=, KEEP=, RENAME=)

Other tasks using SQL keywords:

DISTINCT: selects unique values of
variables that have duplicate values
ORDER BY: sorts a table by the values of
one or more variables
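
The keywords above might be combined as in the following sketch. SASHELP.CLASS is a sample table shipped with SAS; the names work.class_small and wt_lb are hypothetical.

```sas
proc sql;
  /* New table from part of an existing one, sorted by age, then name */
  create table work.class_small as
  select name, sex, age
  from sashelp.class
  order by age, name;

  /* One row per distinct value of age */
  select distinct age
  from sashelp.class
  order by age;

  /* KEEP= and RENAME= applied as data set options on the source table */
  select *
  from sashelp.class(keep=name weight rename=(weight=wt_lb));
quit;
```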


One way to create a data set from
scratch is to use the CREATE TABLE
keywords without AS.
After the CREATE TABLE line, you
specify the names and types of the
variables.
    CREATE TABLE tablename
    (var1 var1type, var2 var2type, var3 var3type);
    INSERT INTO ..;

The raw data is entered into the
table with an INSERT INTO statement.
As you can imagine, this isn’t practical
for large data sets!
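
A small sketch of building a table from scratch. The table work.scores and its three columns are made up for illustration.

```sas
proc sql;
  /* Define column names and types; no rows exist yet */
  create table work.scores
    (name char(12), section num, score num);

  /* Each VALUES clause adds one row, in column order */
  insert into work.scores
    values ('Ann', 1, 91)
    values ('Bob', 1, 78)
    values ('Cho', 2, 85);

  select * from work.scores;
quit;
```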

Subsetting in PROC SQL is typically
done with a WHERE clause.

Various calculations can be done (using
AS) to create new variables.
Calculations may be done on the whole
table, or on groups of observations
identified by some grouping variable
(use GROUP BY).
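
A brief sketch of both kinds of calculation, again assuming the SASHELP.CLASS sample table (heights in inches, weights in pounds). The new column names are hypothetical.

```sas
proc sql;
  /* Row-by-row calculation named with AS, subset with WHERE */
  select name, age, height / 12 as height_ft
  from sashelp.class
  where age >= 14;

  /* Summary calculations carried out separately for each group */
  select sex,
         count(*)    as n,
         avg(weight) as mean_weight
  from sashelp.class
  group by sex;
quit;
```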


If a calculation involves a variable not in
the original data set, but which has been
calculated in the same query, use the keyword
CALCULATED with that variable.
To “subset” based on “calculated” summary
values (such as a group mean), do not use
WHERE, but rather use the HAVING keyword.
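
The two keywords might be used as follows. This is a sketch on the SASHELP.CLASS sample table; bmi, bmi_margin, and mean_height are hypothetical column names, and the cutoffs are arbitrary.

```sas
proc sql;
  /* bmi is created in this query, so a later expression must
     refer to it as CALCULATED bmi */
  select name, height, weight,
         703 * weight / height**2 as bmi,
         calculated bmi - 18.5    as bmi_margin
  from sashelp.class;

  /* HAVING filters on a summary value computed for each group */
  select age, avg(height) as mean_height
  from sashelp.class
  group by age
  having avg(height) > 62;
quit;
```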
The PROC SQL equivalent of an
IF-THEN statement is a CASE expression:

    CASE expression
    WHEN expvalue1 THEN resvalueA
    WHEN expvalue2 THEN resvalueB
    ..
    ELSE resvalueZ END
    AS resultcolumn
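
A concrete instance of this form, assuming the SASHELP.CLASS sample table, where sex is coded 'F' or 'M'. The column name sex_label is hypothetical.

```sas
proc sql;
  select name,
         case sex
           when 'F' then 'Female'
           when 'M' then 'Male'
           else 'Unknown'
         end as sex_label
  from sashelp.class;
quit;
```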
Joining Tables in PROC SQL



Compared to merging data sets in the
DATA step, joining tables in PROC SQL
can execute faster.
In PROC SQL, the key columns (BY
variables) do not need to be sorted first.
“Many to many” merges are possible
using PROC SQL.

There are four main methods of joining
tables using PROC SQL: the inner join,
the left join, the right join, and the full
join. Other interesting options are also
available.



The FROM clause specifies source tables
and “aliases” for those source tables,
and also specifies the method of joining.
The ON clause specifies “key columns”
(like BY variables in a DATA step merge)
and possibly logical operators.
The SELECT clause contains the table
aliases as well as the variables to be
selected.


Inner join: result lists only observations
for which the values of the “key columns”
match.
Left join: result lists all observations in
the “left” table (listed first in the FROM
clause) and only the matching
observations in the “right” table (similar
to use of IN= in a SAS merge).


Right join: result lists all observations in
the “right” table (listed second in the FROM
clause) and only the matching observations in
the “left” table (similar to use of IN= in
a SAS merge).
Full join: a combination of the left and
right joins.
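
The join types above can be compared on two tiny tables. Everything in this sketch (work.students, work.grades, and their contents) is made up for illustration; id is the key column, and the aliases s and g are set in the FROM clause.

```sas
data work.students;
  input id name $;
  datalines;
1 Ann
2 Bob
3 Cho
;

data work.grades;
  input id score;
  datalines;
2 78
3 85
4 90
;

proc sql;
  /* Inner join: only ids found in both tables (2 and 3) */
  select s.id, s.name, g.score
  from work.students as s inner join work.grades as g
  on s.id = g.id;

  /* Left join: every student (1, 2, 3); score is missing for id 1 */
  select s.id, s.name, g.score
  from work.students as s left join work.grades as g
  on s.id = g.id;

  /* Right join: every grade row (2, 3, 4); name is missing for id 4 */
  select g.id, s.name, g.score
  from work.students as s right join work.grades as g
  on s.id = g.id;

  /* Full join: all rows from both tables (ids 1 through 4) */
  select s.id as student_id, g.id as grade_id, s.name, g.score
  from work.students as s full join work.grades as g
  on s.id = g.id;
quit;
```

Neither table has to be sorted by id before joining.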


These “joins” have some undesirable
effects: information on important
variables can be lost.
The COALESCE function can recover
information from the ON variables.


Creating logical indicators (much like
IN=) may prove useful too.
There are many other methods of
combining tables in SQL: you can rely
on WHERE rather than ON, and there are
additional types of “joins”.
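
A sketch of COALESCE and IN=-style indicators in a full join, followed by a join written with WHERE instead of ON. The two small tables and every name in them are hypothetical.

```sas
data work.students;
  input id name $;
  datalines;
1 Ann
2 Bob
3 Cho
;

data work.grades;
  input id score;
  datalines;
2 78
3 85
4 90
;

proc sql;
  /* COALESCE returns the first nonmissing argument, so id is
     filled in whichever table the row came from. Each comparison
     evaluates to 1 or 0, much like an IN= variable. */
  select coalesce(s.id, g.id) as id,
         s.name,
         g.score,
         (s.id is not null) as in_students,
         (g.id is not null) as in_grades
  from work.students as s full join work.grades as g
  on s.id = g.id;

  /* The same matches as an inner join, written with WHERE */
  select s.id, s.name, g.score
  from work.students as s, work.grades as g
  where s.id = g.id;
quit;
```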
Editing Tables in PROC SQL


INSERT INTO is a typical way to add
new observations to a table.
The VALUES clause specifies the values
to be added (in parentheses, separated
by commas).
Another way: use a SET keyword with
column names and newly assigned values:

    INSERT INTO tablename
    SET col1=7,
    col2='charstring',
    col3=44;


To delete observations from a table, use
the DELETE FROM statement.
To change the values of one or more
columns in a table, use the UPDATE
statement along with a SET clause.
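
A short sketch that strings these editing statements together. The table work.scores and its contents are hypothetical.

```sas
proc sql;
  create table work.scores
    (name char(12), section num, score num);

  /* Add rows by naming each column with SET */
  insert into work.scores
    set name='Ann', section=1, score=91
    set name='Bob', section=1, score=78;

  /* Change values in the rows that satisfy WHERE */
  update work.scores
    set score = score + 5
    where name = 'Bob';

  /* Remove the rows that satisfy WHERE */
  delete from work.scores
    where score < 85;

  select * from work.scores;
quit;
```

Without a WHERE clause, UPDATE and DELETE FROM act on every row of the table, so the condition is worth double-checking before running them.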


ALTER TABLE can be used to change
column formats or to delete columns
from a table (typically done with the
MODIFY and DROP keywords,
respectively).
DROP TABLE can also be used to delete
an entire table.
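
A sketch of these statements applied to a scratch copy of the SASHELP.CLASS sample table. The name work.class_copy is hypothetical, and the copy is used so that the original table is left untouched.

```sas
proc sql;
  create table work.class_copy as
  select * from sashelp.class;

  /* Change the display format of one column */
  alter table work.class_copy
    modify height format=5.1;

  /* Delete a column */
  alter table work.class_copy
    drop weight;

  /* Write the revised column definitions to the log */
  describe table work.class_copy;

  /* Delete the whole table */
  drop table work.class_copy;
quit;
```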
Other Topics in PROC SQL


The NOPRINT option suppresses printing to
the OUTPUT window:
PROC SQL NOPRINT;
Note: When a CREATE statement is
used, NOPRINT is the default.