90-776 Manipulation of Large Data Sets Lab 1 March 10, 1999

Major skills covered in today’s lab:
• Bringing data into SAS
• Exporting data from SAS
• Learning to program in SAS
I. Example Program
First, let’s examine the program, l:\academic\90776\programs\distance.sas.
Note: As written, this program saves some files to the “thunderbolt” directory, which
you do not have write access to. If you wish to run the program, you will have to change
the file references.
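For example, here is a minimal sketch of the changed file references. It assumes you have a writable directory a:\lab1, which is a hypothetical location; substitute your own directory, which must already exist. The reference to example.txt can stay as it is, because the program only reads from that file.

```sas
/* Hypothetical writable directory: replace a:\lab1 with your own */
LIBNAME mydisk 'a:\lab1';             /* lab1.sd2 will be saved here */

/* Unchanged: the program only reads this file */
FILENAME extext 'l:\academic\90776\data\text\example.txt';

/* The new ASCII file will be written to your own directory */
FILENAME lab1txt 'a:\lab1\lab1.txt';
```

Only the paths inside the quotes change. The aliases (mydisk, extext, lab1txt) stay the same, so the rest of the program runs unmodified.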
/* Example Program for 90-776 */
/**** DISTANCE.SAS (l:\academic\90776\programs\distance.sas) is a
program that performs some basic SAS commands.
I use data from the example in class and on the top of page 4 of the
handout.
I have named this data example.txt and saved it as
L:\academic\90776\data\text\example.txt ****/
/* The Program will create l:\academic\90776\data\lab1.sd2 */
/* Note: it is always a good idea to include your name and date and
important file locations at the top of our programs */
/* Created by: Rob Greenbaum */
/* Date: November 14, 1998 */
/* Last modified: 3/6/1999 */
/* We can set some options if we want to - here I set the page length,
width, and page number */
OPTIONS ps = 80 ls = 65 pageno = 1;
/*I want to create an alias for the directory I will eventually save my
data in */
/*LIBNAME sets up an alias for a DIRECTORY */
LIBNAME mydisk 'l:\academic\90776\data';
/* FILENAME sets up an alias for particular text files*/
/* Next, I will give a name to the location of the existing ascii data
*/
FILENAME extext 'l:\academic\90776\data\text\example.txt';
/* I also want to give a name to an ASCII file that I will create */
FILENAME lab1txt 'l:\academic\90776\data\text\lab1.txt';
/* Now I will tell SAS to create a temporary SAS data set called DIST*/
/* Temporary data sets disappear when the current SAS session ends.
DIST is temporary because I do not tell SAS to save the data to any
drive */
/* I'll then put the ascii data l:\academic\90776\data\text\example.txt
into the temporary SAS data set DIST using infile and input (see page 7
of the handout)*/
DATA dist;
INFILE extext; /*tells SAS to find
l:\academic\90776\data\text\example.txt */
INPUT name $ sex $ age distance; /*tells SAS var names and order of
vars*/
RUN;
/* Note that character variable names must be followed by a "$" in the
INPUT statement */
/* Let's see what variables SAS read in*/
PROC CONTENTS data=dist;
RUN;
/* I want to make sure that SAS read in the data properly, so let's
tell SAS to
print out all of the data*/
PROC PRINT data = dist;
RUN;
/* Let's find the mean of distance to work*/
PROC MEANS data=dist;
var distance;
RUN;
/* let's create a new variable and save the new data set as both a
permanent SAS data set and as a new ASCII data set */
/* Note, we can only create new variables inside of data steps, so we
need a new data step */
DATA mydisk.lab1; /* This will create l:\academic\90776\data\lab1.sd2
*/
SET dist; /* This brings in the temporary SAS data set dist */
FILE lab1txt; /* Analogous to INFILE, except that I want to write the
file to l:\academic\90776\data\text\lab1.txt*/
age2 = age**2; /* this creates an age squared variable */
put name $ sex $ age distance age2;
RUN;
PROC contents data = mydisk.lab1;
RUN; /* we always need to finish the program with a run statement */
II. Read in ASCII data from a file
Let’s read in the ASCII data set that the program DISTANCE.SAS created. Refer to the
above program for the variable names. To read in ASCII data, we need to use the
INFILE and INPUT commands.
1) Write the necessary code to read the ASCII file
l:\academic\90776\data\text\lab1.txt into a temporary SAS data set.
2) Check your log file to make sure that you made no errors.
3) Use the CONTENTS and PRINT procedures to make sure you made no errors.
4) Use the MEANS procedure to find the mean of age and age2. (To perform PROC
MEANS on only certain variables, we use the subcommand VAR.)
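One possible sketch of these steps follows. The alias lab1in and the data set name lab1read are hypothetical names chosen for illustration. The variable order is assumed to match the PUT statement in DISTANCE.SAS.

```sas
/* Alias for the ASCII file that DISTANCE.SAS wrote (alias name is hypothetical) */
FILENAME lab1in 'l:\academic\90776\data\text\lab1.txt';

/* One-part name, so LAB1READ is a temporary data set */
DATA lab1read;
   INFILE lab1in;
   /* Same order as the PUT statement in DISTANCE.SAS; "$" marks character variables */
   INPUT name $ sex $ age distance age2;
RUN;

/* Check the variables and the values that SAS read in */
PROC CONTENTS DATA=lab1read;
RUN;

PROC PRINT DATA=lab1read;
RUN;

/* VAR restricts PROC MEANS to the listed variables */
PROC MEANS DATA=lab1read;
   VAR age age2;
RUN;
```

After running it, check the log for the number of observations and variables read before trusting the output.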
III. Read in SAS data from a file
Let’s read in the SAS data set that was created in the program DISTANCE.SAS. To read
in an existing SAS data set, we use the SET command. We also need to use a LIBNAME
statement to tell SAS what directory the data is in.
1) Write a SAS program to read the file l:\academic\90776\data\lab1.sd2 into a
temporary SAS data set.
2) Check your log file to make sure that you made no errors.
3) Use the CONTENTS and MEANS procedures to make sure you made no errors.
4) Save your short program, log file, and output file to your disk.
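A minimal sketch of this approach follows. The alias mydisk matches the one used in DISTANCE.SAS, and the temporary data set name lab1copy is a hypothetical choice.

```sas
/* LIBNAME points at the DIRECTORY that holds lab1.sd2 */
LIBNAME mydisk 'l:\academic\90776\data';

/* One-part name on DATA = temporary; two-part name on SET = permanent source */
DATA lab1copy;
   SET mydisk.lab1;
RUN;

PROC CONTENTS DATA=lab1copy;
RUN;

/* With no VAR statement, PROC MEANS summarizes every numeric variable */
PROC MEANS DATA=lab1copy;
RUN;
```

Note that no INPUT statement is needed. A SAS data set already stores its variable names and types, so SET is enough.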
IV. Enter ASCII data right in your program. Save SAS data.
To enter data directly into your program, use the INPUT and DATALINES statements. See
the lecture notes or the book for more information.
To save a permanent SAS data set, you must use a LIBNAME and a two-part name for
the file.
1) Write the code to bring the 1999 Minnesota Timberwolves home attendance data into
a permanent SAS data set. Save this data somewhere on your own disk.
DATE   ATTENDANCE    TOTAL   AVERAGE
   1        16422    16422     16422
   2        19006    35428     17714
   3        18151    53579     17860
   4        17907    71486     17872
(Did you remember that variable names cannot exceed eight characters?)
2) Use the PRINT procedure to check your work.
3) Check your directory to confirm that you saved the SAS dataset.
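As a sketch of how INPUT, DATALINES, LIBNAME, and a two-part name fit together, one possible version is shown below. The directory a:\lab1 and the names mylib, twolves, and attend are hypothetical. ATTENDANCE is shortened to attend to respect the eight-character limit.

```sas
/* Hypothetical directory on your own disk; it must already exist */
LIBNAME mylib 'a:\lab1';

/* Two-part name (libref.name) makes the data set permanent */
DATA mylib.twolves;
   INPUT date attend total average;   /* all four variables are numeric, so no "$" */
   DATALINES;
1 16422 16422 16422
2 19006 35428 17714
3 18151 53579 17860
4 17907 71486 17872
;
RUN;

PROC PRINT DATA=mylib.twolves;
RUN;
```

The lone semicolon on its own line marks the end of the data lines.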
V. Enter ASCII data with missing observations. Save the data as ASCII data.
When some records are missing values at the end of the line, you need to include the
MISSOVER option in the INFILE statement. Without it, SAS jumps to the next line of the
file to find the values it still expects. With MISSOVER, SAS sets the variables it could
not read to missing and starts the next observation on the next line. SAS displays a
missing numeric value as a period.
The Minnesota Timberwolves data now includes information for the first eight games of
the season. This data is stored as l:\academic\90776\data\text\twolves.txt.
1) Read this ASCII data set into a temporary SAS data set. The 4 variables are the same
as before.
2) Use the PRINT procedure to look at the data. The data is reproduced in the table
below. Does your data look like the table below? Why not?
3) Fix your INFILE command so that it contains the word MISSOVER at the end of the
command (see p. 303 of the text for more help).
4) Create a new average attendance variable and call it AVG2. Hint: the average
attendance is just the total attendance divided by the number of games played.
5) Save this new data set as an ASCII file on your own disk (you will need to use the
FILE and PUT commands – see pp. 304-305 of the text for more help).
6) Use the PRINT procedure to print out only the average and avg2 variables.
Next week we will learn how to drop extra variables such as the AVERAGE variable.
1999 Minnesota Timberwolves Home Attendance
DATE   ATTENDANCE    TOTAL   AVERAGE
   1        16422    16422     16422
   2        19006    35428     17714
   3        18151    53579     17860
   4        17907    71486     17872
   5        16848    88334
   6        15374   103708     17285
   7        16219   119927
   8        14776   134703     16838
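The steps in this section could be sketched as follows. This is one possible version under stated assumptions, not the only correct answer. It assumes twolves.txt has no header line, that the values are separated by blanks, and that DATE is the game number, so it equals the number of games played. The names twin, twout, wolves, and attend, and the directory a:\lab1, are hypothetical.

```sas
FILENAME twin  'l:\academic\90776\data\text\twolves.txt';
FILENAME twout 'a:\lab1\twolves2.txt';   /* hypothetical output location */

DATA wolves;
   /* MISSOVER: if a line ends early, set the remaining variables to
      missing instead of reading them from the next line */
   INFILE twin MISSOVER;
   INPUT date attend total average;

   /* Assumes DATE counts the games played so far */
   avg2 = total / date;

   /* Write every observation to the new ASCII file */
   FILE twout;
   PUT date attend total average avg2;
RUN;

/* Print only the two average variables for comparison */
PROC PRINT DATA=wolves;
   VAR average avg2;
RUN;
```

In the printed output, AVERAGE should show a period for games 5 and 7, while AVG2 is filled in for all eight games.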