SAS Lesson 2: More Ways to Input Data

SAS Lesson 2: More Ways to Input Data In the previous lesson, the following statements were used to create a dataset. DATA oranges; INPUT state $ 1-10 early 12-14 late 16-18; DATALINES; Florida 130 90 California 37 26 Texas 1.3 .15 Arizona .65 .85 ; This is an example of column input, in which the columns of text in which data are stored are explicitly furnished to SAS. It may be impractical to follow this example for all datasets. SAS has facilities for reading data stored in many different ways; some of these facilities are described below. Reading data from a text file Suppose that the data above are stored in a folder on a drive. Blank spaces, not tabs, are used to separate the values. Also, the file is a simple ASCII text file. There are no hidden codes in the file pertaining to word processors (such as margins and font sizes) or spreadsheets (such as formulas and graphs). The following lines of code could be used to read the dataset: DATA oranges; INFILE 'D:\yields\oranges.dat' FIRSTOBS=3 OBS=6; INPUT state $ 1-10 early 12-14 late 16-18; The INFILE statement replaces DATALINES. INFILE gives the location of the external text file, including the drive name and any subdirectories. The FIRSTOBS option tells SAS to skip the first two lines of the file and to begin reading data on line 3. The OBS option tells SAS that Line 6 is the last line which contains legitimate data. If the data were edited in a text file by removing the top two lines and the bottom two lines, so that the text file contained only data. Then, the INFILE statement listed below would be sufficient. Reading data from the Internet SAS like R offer the capability to read data from files available on the Internet. For example, the eggs dataset contains data on the yearly average number of eggs produced by female king crabs near Kodiak Island, Alaska. The following statements could be used to create a SAS dataset. FILENAME kodiak url 'http://lib.stat.cmu.edu:80/crab/eggs'; DATA eggdata; INFILE kodiak; INPUT year 1-2 numeggs 4-9; In these statements, kodiak is a nickname used by SAS to refer to the longer Internet address. The statements create a dataset called eggdata which contains two variables, year and numeggs. SAS can also read data from FTP sites, but you must supply the appropriate information about the FTP site address, subdirectories, user names, and passwords. This example shows how to obtain the eggs dataset by anonymous FTP. FILENAME kodiak ftp 'eggs' cd='/crab/' user='anonymous' pass='guest' host='lib.stat.cmu.edu'; DATA eggdata; INFILE kodiak; INPUT year 1-2 numeggs 4-9; Of course, it would be easy to find the eggs dataset in a Web browser, save the file as a text file on your computer, and use the methods for reading data from a text file. However, this method would be useful if the data files are very large. It is also convenient if data files are continually updated; example include stock market data, weather data, and batting averages. Creating and reading permanent SAS datasets In all of the previous examples, the SAS datasets that have been created were temporary. They remain in working memory and can be used throughout the SAS session, but they disappear when the SAS session ends. Permanent SAS datasets can be created; these are stored on a disk and can be recalled easily in future SAS sessions. Permanent SAS datasets are convenient to use when the amount of data is large. Also, if you have to go through several steps to create a SAS dataset, you only need to do those steps once if you create a permanent dataset. A LIBNAME statement is needed to create a permanent dataset or to read one that has already been created. The LIBNAME is a surrogate name for the location on a disk where the permanent dataset is or will be stored. For example, consider the following statement: LIBNAME college 'D:'; This prepares SAS to look in Drive D for permanent datasets. The name college is called a libref (library reference). The names that you can supply for librefs follow the same rules as dataset and variable names; however, you should not use the names LIBRARY, WORK, USER, or anything starting with the three letters SAS, since these are reserved for special uses within SAS. For example, the following statements create a permanent SAS dataset. LIBNAME college 'D:'; DATA original; INPUT dept $ 1-8 count 10-13 class $ 15-21; DATALINES; FineArts 449 day Science 1411 day Music 259 evening Language 759 day ; DATA college.enrolled;SET original; IF class='evening' THEN DELETE; PROC PRINT; RUN; This creates the new file ENROLLED.SD in your drive. This is the new permanent SAS dataset. Only SAS will be able to interpret this file; you will not be able to see its contents by using a word processor or spreadsheet program. The two statements after data work like the following: "DATA name1; SET name2;" means to create a data set name1 from dataset name2. The "IF ... THEN DELETE;" statement does the obvious operation. We will spend more time on them in Lesson 5. Thus, the data set college.enrolled has only three observations with class='day'. In order to use the dataset that was just created, you must refer to it with its full name, college.enrolled. To retrieve this data set from your diskette: LIBNAME college 'D:'; DATA tempenrl;SET college.enrolled; PROC PRINT DATA=tempenrl; RUN; Delimited files With list input, blank spaces are delimiters, or special characters used to separate the values of variables in a line. SAS can also interpret other characters as delimiters. For example, suppose that the dataset in the previous example was stored in the text file A:\GRADES.TXT as follows: Ann/84/90/A-/0 Bill/78/84/B/0 Cathy/95/89/A/1 David/84/88/B+/1 Then, the following statements could be used to create the GRADES dataset. DATA grades; INFILE 'a:\grades.txt' delimiter='/'; INPUT name $ quiz test project $ absences; The phrase 'dlm=' can be used in place of 'delimiter='. This option is used if a keyboard character, such as a comma, slash, or asterisk, separates values in a line. Of course, the character used to separate variables should not appear within a data value. For example, in a comma-delimited file, the number 125,000 would have to be written as 125000; otherwise, SAS would try to break it apart into two variables with values 125 and 000. Tabs are an exception to the use of the DELIMITER option. In a tab-delimited file, the EXPANDTABS option replaces the DELIMITER option. In the example, if tabs had been used in place of slashes, the proper statement for reading the data would be: INFILE 'a:\grades.txt' expandtabs; Column pointers Suppose that the data set with student grades is stored in D:\GRADES.TXT as follows: Ann Bill Cathy David 84 78 95 84 90 84 89 88 AB A B+ 0 0 1 1 Creating a dataset from this file would be easy to read with column input and even easier with list input. However, suppose that you only needed to use the students' names and project grades. You can use column pointers to skip over undesired data. Column pointers use the @ symbol to tell SAS to begin reading data at a specified column. In this example, we need to know that the names start in Column 1; project grades, in Column 13. The following statements could be used. DATA grades; INFILE 'D:\grades.txt'; INPUT @1 name $ @13 project $; The quiz grade, test grade, and absences do not appear in this dataset. As shown above, column pointers can be used to skip over unneeded data. Mixed input You may occasionally find it necessary or convenient to use a combination of input techniques for a particular dataset. For example, suppose that the dataset of grades appears as follows: Ann Bill Catherine David 84 78 95 84 90 84 89 88 AB A B+ 0 0 1 1 Recall that list input can be used only when character variables have 8 or fewer characters, with no blanks. Catherine has 9 letters, so simple list input cannot be used. However, the absences are not neatly aligned in a column, and the last four variables would be easy to read with list input. The following statements could be used: DATA grades; INFILE 'A:\grades.txt'; INPUT name $ 1-9 quiz test project $ absences; It is also possible to use list input, column input, and column pointers simultaneously. For example, if you only needed the name, project grade, and absences, you could use the following INPUT statement: INPUT name $ 1-9 @17 project $ absences; Line pointers So far, all of the data for each observation have appeared in one line. You may occasionally encounter data in which the variables for one observation appear in two or more consecutive lines, as shown below: Ann 84 90 A- 0 Bill 78 84 B 0 Cathy 95 89 A 1 David 84 88 B+ 1 You may need to use a line pointer to read such data. A line pointer is like a column pointer, except that it specifies the line on which SAS should begin reading the data. A slash (/) tells SAS to skip to the next line, and #number tell SAS to go to that line of an observation's data to resume reading data. If the data above were stored in D:\GRADES.TXT, a dataset could be created in SAS as follows: DATA grades; INFILE 'a:\grades.txt'; INPUT name $ / quiz test project $ absences; Equivalently, you could use the following INPUT statement: INPUT name $ #2 quiz test project $ absences; Since the data consist of simple characters and numbers, the following INPUT statement could also be used. Notice that there are no line pointers. INPUT name $ quiz test project $ absences; SAS will automatically go to the next line of data to complete the set of variables listed in the INPUT statement. However, if the data are irregular (missing values, blanks in character variables, etc.), then line pointers may be necessary. Multiple observations on one line Discussed in Lesson 1 with the @@ symbol.

SAS Lesson 2: More Ways to Input Data

Related documents

Products

Support

SAS Lesson 2: More Ways to Input Data

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib