Creating Subsets of Observations (SAS_9_10)

Data Set ARTS.ARTTOUR
OBS  CITY      NIGHTS  LANDCOST  EVENTS  DESCRIBE           GUIDE    BACKUP
1    Rome      3       750       7       4 M, 3 G           D'Amico  Torres
2    Paris     8       1680      6       5 M, 1 other       Lucas    Lucas
3    New York  6       .         8       5 M, 1 G, 2 other  Lucas    D'Amico
......
/* Deleting observations based on the condition that landcost is missing. */
IF landcost = . THEN DELETE;
/* Select records in which the value of nights is 6. */
DATA subset;
SET arts.arttour;
IF nights = 6;
RUN;
/* Select records in which nights is 6 or higher or landcost is 1500 or higher. */
DATA subset;
SET arts.arttour;
IF landcost >= 1500 OR nights >= 6;
RUN;
Selecting Observations to Multiple SAS Data Sets
OUTPUT <SAS data set>;
DATA ltour othrtour; /* othrtour receives every observation that is not written to ltour */
SET perm.arts;
IF guide='Lucas' THEN OUTPUT ltour;
ELSE OUTPUT othrtour;
(in ltour)
CITY      NIGHTS  LANDCOST  EVENTS  DESCRIBE           GUIDE    BACKUP
Paris     8       1680      6       5 M, 1 other       Lucas    Lucas
New York  6       .         8       5 M, 1 G, 2 other  Lucas    D'Amico
…
(in othrtour)
CITY      NIGHTS  LANDCOST  EVENTS  DESCRIBE           GUIDE    BACKUP
Rome      3       750       7       4 M, 3 G           D'Amico  Torres
…
More on OUTPUT statement
DATA ltour othrtour;
SET perm.arts;
days = nights + 1;
IF guide='Lucas' THEN OUTPUT ltour;
ELSE OUTPUT othrtour;
/* An OUTPUT statement tells SAS to write the observation at the moment the OUTPUT statement is
executed, not at the end of the DATA step. Therefore, any assignment statements must come before
the OUTPUT statements so that their results are stored in the new data sets. */
/* Additional IF/ELSE statements can write the same observations to other data sets. */
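The effect of statement order around OUTPUT can be seen in a small sketch. The data set names early and late and the in-line data are hypothetical, chosen only for illustration; the variables guide, nights and days follow the example above.

```sas
/* Hypothetical in-line data; only GUIDE and NIGHTS are read. */
DATA early late;
  INPUT guide $ nights;
  OUTPUT early;        /* written before DAYS is assigned, so DAYS is missing */
  days = nights + 1;
  OUTPUT late;         /* written after the assignment, so DAYS is filled in  */
  DATALINES;
Lucas 8
Torres 3
;
RUN;

PROC PRINT DATA=early;
RUN;
PROC PRINT DATA=late;
RUN;
```

Both data sets contain the variable DAYS, but only late carries its computed value, because each observation in early was written before the assignment ran.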
Working with Grouped or Sorted Observations
BY list-of-variables;
To use a BY statement, the data must meet these conditions:
1. The observations must be in a SAS data set, not an external file.
2. The variables that define the groups must appear in the BY statement.
3. All observations in a group must appear together in the data set. If they do not, sort the data set first with PROC SORT.
Before SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Japan        architecture  8       720       Express
3    Switzerland  scenery       9       734       World
4    France       architecture  8       575       World
SORT Procedure
LIBNAME save 'a';
DATA save.type;
INFILE 'touragnt data a';
INPUT country $ 1-11 tourtype $ 13-24 nights landcost vendor $;
PROC SORT DATA=save.type OUT=type2;
BY tourtype; /* The sorted observations go into data set type2. */
After SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Japan        architecture  8       720       Express
3    France       architecture  8       575       World
4    Switzerland  scenery       9       734       World
Grouping by More than One Variable
PROC SORT DATA=save.type OUT=type3;
BY tourtype vendor landcost;
After SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Japan        architecture  8       720       Express
2    Spain        architecture  10      510       World
3    France       architecture  8       575       World
4    Switzerland  scenery       9       734       World
Arranging Groups in Descending Order
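PROC SORT DATA=save.type OUT=type4;
BY DESCENDING tourtype vendor landcost;
/* DESCENDING applies only to the variable that immediately follows it (tourtype).
vendor and landcost are still sorted in ascending order within each tourtype. */
After SORT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Switzerland  scenery       9       734       World
2    Japan        architecture  8       720       Express
3    Spain        architecture  10      510       World
4    France       architecture  8       575       World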
Finding the First or Last Observation in a Group
DATA temp;
SET type3;
BY tourtype;
frsttour=FIRST.tourtype;
lasttour=LAST.tourtype;
/* The BY statement creates two variables called FIRST.tourtype and
LAST.tourtype. SAS doesn't write FIRST. and LAST. variables
to the output data set. Therefore, new variables are needed
to store their values. */
PROC PRINT DATA=temp;
VAR country tourtype frsttour lasttour;
Output of PROC PRINT
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      FRSTTOUR  LASTTOUR
1    Japan        architecture  1         0
2    Spain        architecture  0         0
3    France       architecture  0         1
4    Switzerland  scenery       1         1
PROC SORT DATA=save.type OUT=type5;
BY tourtype landcost;
RUN;
DATA lowcost;
SET type5;
BY tourtype;
IF FIRST.tourtype;
RUN;
(Before) Data set save.type
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Japan        architecture  8       720       Express
3    Switzerland  scenery       9       734       World
4    France       architecture  8       575       World
(After) Data set work.lowcost
----------------------------------------------------------------
OBS  COUNTRY      TOURTYPE      NIGHTS  LANDCOST  VENDOR
1    Spain        architecture  10      510       World
2    Switzerland  scenery       9       734       World
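The same pattern can pick the most expensive tour in each group by testing LAST. instead of FIRST. The following is a minimal sketch, not part of the original handout: the data set names tours, bycost and highcost are hypothetical, and the four tours are typed in-line with DATALINES so that the step can run without the external file.

```sas
/* Hypothetical in-line copy of the four tours shown above. */
DATA tours;
  LENGTH country $ 11 tourtype $ 12 vendor $ 8;  /* avoid truncation at 8 characters */
  INPUT country $ tourtype $ nights landcost vendor $;
  DATALINES;
Spain architecture 10 510 World
Japan architecture 8 720 Express
Switzerland scenery 9 734 World
France architecture 8 575 World
;
RUN;

/* Within each tourtype, order the tours from cheapest to most expensive. */
PROC SORT DATA=tours OUT=bycost;
  BY tourtype landcost;
RUN;

DATA highcost;
  SET bycost;
  BY tourtype;
  IF LAST.tourtype;   /* last row of each group has the highest LANDCOST */
RUN;

PROC PRINT DATA=highcost;
RUN;
```

With these four rows, highcost should hold Japan (architecture, 720) and Switzerland (scenery, 734). Note that SAS sorts missing numeric values before all other values, so a missing landcost would appear first in its group rather than last.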
Deleting Duplicated Observations
DATA save.type;
INFILE 'touragnt data a';
INPUT country $ 1-11 tourtype $ 13-24 nights landcost vendor $;
PROC SORT DATA=save.type OUT=type3 NODUPLICATES;
BY tourtype;
/* The sorted observations go into data set type3. */
Differences between IF and WHERE statements
DATA subset;
SET arts.arttour;
IF guide='Lucas';
selects the same observations as
DATA subset;
SET arts.arttour;
WHERE guide='Lucas';
Difference between IF and WHERE Statements
• The WHERE statement may be more efficient than the IF statement because it tests the
condition before the observation is brought into the temporary holding area (the program data vector).
• The WHERE statement can be used only with variables in the existing data set, whereas the IF statement
can also be used with variables read from raw data or created in the DATA step.
• WHERE is applied before BY groups are formed, so the FIRST. and LAST. variables are set from the
selected observations only. A subsetting IF runs after they are set and can delete the observation that
carries FIRST.=1 or LAST.=1.
• WHERE can be included in SAS procedures.
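The interaction between WHERE, a subsetting IF and the FIRST. variable can be illustrated with a small sketch. The data set scores, its variables grp and score, and the output names first_where and first_if are hypothetical and serve only as an illustration.

```sas
/* Hypothetical data, already in GRP order. */
DATA scores;
  INPUT grp $ score;
  DATALINES;
A 5
A 9
B 7
B 8
;
RUN;

/* WHERE: the A/5 row is filtered out before BY groups are formed, */
/* so A/9 becomes the first observation of group A and is kept.    */
DATA first_where;
  SET scores;
  WHERE score > 6;
  BY grp;
  IF FIRST.grp;
RUN;

/* Subsetting IF: FIRST.grp is set on A/5, which is then dropped, */
/* so group A contributes no observation at all.                   */
DATA first_if;
  SET scores;
  BY grp;
  IF score > 6;
  IF FIRST.grp;
RUN;
```

Here first_where should contain one row per group (A 9 and B 7), while first_if should contain only B 7.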
The following three statements are equivalent:
WHERE age GE 20 AND age LE 40;
WHERE 20 LE age LE 40;
WHERE age BETWEEN 20 AND 40;
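As a quick check of the BETWEEN form, and of WHERE used inside a procedure, here is a minimal sketch. The data set people and its values are hypothetical.

```sas
/* Hypothetical data with ages on and just outside the 20-40 range. */
DATA people;
  INPUT name $ age;
  DATALINES;
Ann 19
Bob 20
Cy 40
Dee 41
;
RUN;

/* WHERE inside a procedure: only the selected rows are printed. */
PROC PRINT DATA=people;
  WHERE age BETWEEN 20 AND 40;   /* BETWEEN includes both end points */
RUN;
```

Because BETWEEN includes both limits, the listing should show Bob and Cy but not Ann or Dee.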
CONTAINS or ?
WHERE name CONTAINS 'Mc';
WHERE name ? 'Mc';
IS MISSING or IS NULL
WHERE name IS MISSING;
WHERE name IS NULL;
LIKE
WHERE name LIKE 'BOY%';
/* Select BOY followed by anything. */
WHERE name LIKE 'A___';
/* (A followed by three underscores) */
/* Select all names of length 4, beginning with A. */
WHERE name LIKE 'A_%';
/* Select all names that begin with A and are at least two characters in length. */
Exercise:
A researcher treated three groups of rats (Groups A, B and C) and recorded the weight of each rat after one
week. The data were arranged with each GROUP and WEIGHT in pairs.
A 34   C 55   B 52   B 62
B 58   C 56   C 58   A 28
A 28   A 27   A 21   C .
B 60   B .    C 59
Write a SAS program that reads this data set (RATSORT DATA), creates a SAS data set containing only the
lightest weight (excluding missing values) in each group, and prints that data set.