Getting Your Data Into SAS (Chapter 2 in the Little SAS Book)
Animal Science 500 Lecture No. 3 September 7, 2010
IOWA STATE UNIVERSITY Department of Animal Science

Arithmetic Operators

Symbol  Operation        Example                                Result
+       addition         5 + 3                                  add two numbers together
-       subtraction      5 - 3, or ending wt. - beginning wt.   subtract the second value from the first
*       multiplication   2*y                                    multiply 2 by the value of Y
/       division         var/5, or weight gain / days on test   divide the value of VAR by 5
**      exponentiation   a**2                                   raise A to the second power

Notes:
1) Multiplication always requires the *. You cannot write 2(y) or 2y.
2) Write exponentiation as **. In a DATA step the caret (^) is one of the NOT symbols, so a^2 is not read as a power.

Comparison Operators

Comparison operators compare two variables, constants, or expressions within the data set being used.
If the comparison is true, the result is 1.
If the comparison is false, the result is 0.
Comparison operators can be expressed as symbols or with their mnemonic equivalents, which are shown in the following table:

Symbol  Mnemonic Equivalent  Definition                 Example
=       EQ                   equal to                   a=3
^=      NE                   not equal to               a ne 3
¬=      NE                   not equal to
~=      NE                   not equal to
>       GT                   greater than               num>5
<       LT                   less than                  num<8
>=      GE                   greater than or equal to   sales>=300
<=      LE                   less than or equal to      sales<=100
        IN                   equal to one of a list     num in (3, 4, 5)

Logical (Boolean) Operators and Expressions

Logical operators, also called Boolean operators, are usually used in expressions to link sequences of comparisons.

Symbol  Mnemonic Equivalent  Example
&       AND                  (a>b & c>d)
|       OR                   (a>b or c>d)
!       OR
¦       OR
¬       NOT                  not(a>b)
ˆ       NOT
~       NOT

Finding your data

Most of the time your "raw" data files will be saved as external files:
1. Text files: Word, WordPerfect, Writer, etc.
2. Spreadsheets: Excel, Lotus, Quattro Pro, etc.
3. Other systems: Unix, OpenVMS, etc.

Reading external files into SAS

The files containing your data will typically be stored
1. On the hard drive of the computer that you will ultimately use to analyze the data with SAS
2. Externally, on a USB memory stick (flash memory) or an external hard drive

You must get your data from "storage" into SAS to conduct the analyses.

Use the INFILE statement within a DATA step:

   Data mytrial;
   Infile 'c:\mydocument\trial.xls';
   Input statement (Input followed by the variable names)

Remember to put the $ after character variables.
You may have to tell SAS in which columns individual variables are found and where to place the decimal.
INFILE reads plain text, so a spreadsheet must first be saved as delimited text (for example a .csv file) or brought in with the Import Wizard.

   Data mytrial;
   Infile 'c:\mydocument\trial.xls' DLM=',';

Many options are available to assist you when using the INFILE statement.

Other options: DLM=
DLM= specifies the delimiter that separates the variables in your raw data file.
For example, dlm=',' indicates that a comma is the delimiter (e.g., a comma separated file, .csv file).
Or, dlm='09'x indicates that tabs are used to separate your variables (e.g., a tab separated file).

Other options: DSD
The DSD option has two functions.
First, it recognizes two consecutive delimiters as a missing value. For example, if your file contained the line 20,30,,50 SAS will treat this as 20 30 50, but with the DSD option SAS will treat it as 20 30 . 50, which is probably what you intended.
Second, the DSD option allows you to include the delimiter within quoted strings.
For example, you would want to use the DSD option if you had a comma separated file and your data included values like "George Bush, Jr.". With the DSD option, SAS will recognize that the comma in "George Bush, Jr." is part of the name, and not a separator indicating a new variable.

Other options: FIRSTOBS=
Tells SAS on what line you want it to start reading your raw data file. (Default = 1)
If the first record(s) contain header information such as variable names, then set firstobs=n, where n is the record number where the data actually begin.
Example: Assume you are reading a comma separated file or a tab separated file where the variable names are on the first line. Use firstobs=2 to tell SAS to begin reading at the second line, ignoring the first line with the names of the variables.

Other options: MISSOVER
This option prevents SAS from going to a new input line if it does not find values for all of the variables in the current line of data. Instead, it sets all remaining variables to missing when reading a short line.
For example, you may be reading a space delimited file that is supposed to have 10 values per line, but one of the lines has only 9 values. Without the MISSOVER option, SAS will look for the 10th value on the next line of data.
If your data are supposed to have only one observation for each line of raw data, then reading ahead in this way could cause errors throughout the rest of your data file.
If you have a raw data file that has one record per line, this option is a prudent way to keep such errors from cascading through the rest of your data file.
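As a rough illustration of how DLM=, DSD, FIRSTOBS=, and MISSOVER work together, the sketch below reads a few in-stream records with INFILE DATALINES in place of an external file. The data set name, variable names, and records are made up for the example and are not from the lecture.

```sas
/* Sketch only: in-stream data stand in for an external comma separated file. */
data dsd_demo;
   /* dlm=','   : values are separated by commas                     */
   /* dsd       : ,, is a missing value; commas inside quotes stay   */
   /* missover  : a short line gives missing values, not a read from */
   /*             the next line                                      */
   /* firstobs=2: skip the header record                             */
   infile datalines dlm=',' dsd missover firstobs=2;
   input id name :$20. startwt endwt;
   datalines;
id,name,startwt,endwt
1,"Bush, Jr.",20,30
2,Smith,,50
3,Jones,45
;
run;

proc print data=dsd_demo;
run;
```

In this sketch the second data record has two consecutive commas, so startwt should be read as missing, and the third record is one value short, so endwt should be set to missing instead of being taken from another line.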
Other options: OBS=
Indicates which line in your raw data file should be treated as the last record to be read by SAS.
This is a good option to use for testing your program. For example, you might use obs=100 to read in just the first 100 lines of data while you are testing your program.

A typical INFILE statement for reading a comma delimited file that contains the variable names in the first line of data would be:

   INFILE "test.txt" DLM=',' DSD MISSOVER FIRSTOBS=2;

Other options: LRECL= (logical record length)
LRECL is really useful for Windows users.
By default, SAS under Windows uses a logical record length of 256.
With longer lines it may appear that SAS is not reading all of your data, or that beyond some point variables are not being read.
You can tell SAS exactly how long the record length is on the FILENAME statement. The option is lrecl= (logical record length) and it looks like this:

   filename myFile "c:\some directory\some file.txt" LRECL=400;

This option is REQUIRED if the length of a data line is over 256.

Knowing what Options are Available

You can look up options using:
SAS on-line help
SAS manuals and books
Other example programs

You can also list options using PROC OPTIONS:

   Proc Options;
   Run; Quit;

This writes the SAS system options and their current settings to the log.

Informats

A host of selected informats is listed on pages 46-47 in The Little SAS Book, 4th Edition.
Different ways data can be formatted and read in SAS:
Dates, times, and combined datetime values
Reading Julian dates

Titles and Footnotes

SAS allows up to 10 lines of text at the top (titles) and the bottom (footnotes) of each page of output using the TITLE and FOOTNOTE statements.

   Title<n> text;
   Footnote<n> text;

Here n is the line number, which ranges from 1 to 10 for each statement.
If text is omitted, the title or footnote is deleted. Otherwise it remains in effect until it is redefined.
To have no titles you can include title;
By default, SAS includes the date and page number at the top of each page of output. To get rid of these, type nodate and/or nonumber in the OPTIONS statement.

Temporary versus Permanent SAS Data Sets

Temporary SAS data set
Only exists during the current job or session
It is erased by SAS when you finish and close down SAS

Permanent SAS data set
Does not mean it is around forever
It remains stored even after you close your SAS session
If you use a data set more than once, it is more efficient to save it as a permanent SAS data set

Using a permanent SAS data set allows you to skip the step of reading the raw data, whether you use the Import Wizard or an INFILE statement.
If you are going to modify your data set, it is likely easier to use a temporary SAS data set, for example when you
Need to add more data to the "final" data set
Have not checked the "final" data set for errors
Maybe other reasons.
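A minimal sketch of the difference between the two kinds of data set follows. It assumes that the folder c:\mydocument exists and is writable; the libref mylib, the data set names, and the records are hypothetical.

```sas
/* Temporary: a one-level name is stored in the WORK library, */
/* which SAS erases when the session ends.                    */
data trial;
   input pen treatment $ gain;
   datalines;
1 A 1.8
2 B 2.1
;
run;

/* Permanent: point a libref at a folder (assumed to exist)   */
/* and use a two-level name, libref.dataset.                  */
libname mylib 'c:\mydocument';

data mylib.trial;
   set work.trial;
run;

/* In a later session, repeat only the LIBNAME statement and  */
/* use mylib.trial directly; no INFILE step is needed.        */
proc print data=mylib.trial;
run;
```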
Listing the Contents of a SAS Data Set

Proc Contents
Place in your program:

   Proc Contents data=yourdatasetname;

If you leave off the data= then SAS will perform the Proc Contents on the last data set created.
It is a good way to check whether all of your data are being correctly read into SAS for further analyses.

Output from Proc Contents:
1. Data Set Name: be sure you evaluated the correct data set
2. Observations: did the correct number of observations get read in
3. Variables: were the correct number of variables identified
4. Created: date the data set was created
5. Label: a label you might have provided

Proc Contents also lists the variables in alphabetical order. The following output is created for each variable:
1. Type: numeric or character
2. Length: storage size (in bytes)
3. Format for printing, if any (for example the date may have been converted to worddate)
4. Informat for input, if any (for example mmddyy10. for a date)
5. Variable label (e.g., date of birth, height in inches, weight in pounds)

Processing an Existing Data Set

When you want to process an existing SAS data set, use the SET statement rather than an INFILE statement.
Each time SAS encounters a SET statement, it reads one observation, with all of its variables, from the existing data set.

   Data data1;
   set data2;
   adg = (offweight - onweight) / daysontest;
   Run; Quit;

Here adg is the average daily gain. A variable name cannot contain spaces.
Again, if the user does not specify a data set for a procedure, SAS uses the most recently created data set.
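The SET statement and Proc Contents can be tried together in a self-contained sketch such as the one below. The pig identifiers and weights are invented for illustration, and the names data2 and data1 simply follow the slide.

```sas
/* Build a small existing data set to work from (values are made up). */
data data2;
   input pig $ onweight offweight daysontest;
   datalines;
A1 60 250 100
A2 55 262 105
;
run;

/* SET reads one observation at a time from data2;      */
/* the new variable adg is added to each observation.   */
data data1;
   set data2;
   adg = (offweight - onweight) / daysontest;
run;

/* Check the number of observations and the variable list. */
proc contents data=data1;
run;

proc print data=data1;
run;
```

With these two records, Proc Contents should report 2 observations and 5 variables for data1, which is the kind of quick check described above.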
Arithmetic Operators

Arithmetic operators indicate that an arithmetic calculation is performed. The table of symbols appears under Arithmetic Operators at the start of this lecture.
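A short sketch of the arithmetic, comparison, and logical operators in a DATA step is given below. The variable names and values are arbitrary choices for the example. DATA _NULL_ is used so that no data set is created and the results are written to the log with PUT.

```sas
data _null_;
   a = 5;
   b = 3;

   total = a + b;      /* addition: 8                         */
   diff  = a - b;      /* subtraction: 2                      */
   prod  = 2 * a;      /* multiplication needs the *: 10      */
   quot  = a / b;      /* division                            */
   sq    = a ** 2;     /* exponentiation: 25                  */

   is_gt   = (a > b);              /* true, so the result is 1 */
   in_list = (b in (3, 4, 5));     /* b is in the list: 1      */
   both    = (a > b and b ne 3);   /* second part is false: 0  */

   put total= diff= prod= quot= sq= is_gt= in_list= both=;
run;
```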