BMTRY 789 Lecture 2
SAS Syntax, entering raw data, etc.

Readings – Chapters 1, 2, 12, & 13
Lab Problems 1.1, 1.2, 1.3, 1.5, 1.10, 12.1, 12.2, 12.6, 12.16, 13.3, 13.8
Homework Due – None
Homework for Next Week – No Class but turn in HW1!
Lecturer: Annie N. Simpson, MSc.
Summer 2009, BMTRY 789 Intro. To SAS Programming

Parts of a SAS Program
What are the two main parts of a SAS program?

Parts of a SAS Program
What is a SAS STATEMENT?

DATA Step
What takes place in a DATA step?

DATA Step = Do/Create Things
What takes place in a DATA step?
Input Data (what types?)
DO-END loops
IF-THEN-ELSE statements
Subset data: IF expression / IF expression THEN DELETE
Create and redefine variables
Functions
Interleave, merge, and update

PROC Step
What takes place in a PROC step?

PROC Step = Produce Results
What takes place in a PROC step?
Perform specific analysis or function
Sorting
Printing
Univariate Analysis
Analysis of variance
Regression…

PROC Step
What PROCs have you learned about in your readings so far?

PROC Step
What PROC would you use to produce Simple Descriptive Statistics?
What about to produce a stem-and-leaf plot, boxplot, histogram, QQPlot, etc?

PROC Step broken down into subgroups
How do you get the Proc Means output separately for men and women if you have a GENDER variable?
What descriptive stats can you do on the non-numeric data? What Proc would you use?

PROC Step for Graphics?
What PROCs can you use to produce graphs and charts?

PROC Step for Graphics?
What is the difference between Proc Plot and GPlot? Proc Chart and Gchart?

DATA…How do we work with it?
What type of data is this?
    Data EX1;
        INPUT Group$ X Y Z;
        DATALINES;
    Control 12 17 19
    Treat 23 . 29
    Control 19 18 16
    Treat 22 22 .
    ;
    Run;

SAS INPUT & INFILE Statements
In what 2 situations do you use an INPUT statement?
1. ________
2. ________
When is the only time that you use an INFILE statement?
What is the INPUT statement really accomplishing? (i.e. why does SAS need it)

SAS INPUT Statement
Before you can analyze your data with SAS software, your data must be in a form that SAS can read.
If you put raw data directly in your SAS program, then your data are internal. You may want to do this when you have small amounts of data, or you are testing a program with a small test data set.
INPUT is used to read data from an external source or from internal data contained in your SAS program.
The INFILE statement names an external file from which to read the data; otherwise the CARDS (or DATALINES) statement is used to precede the internal data.

External raw data files
Usually you will want to keep your data in external files, separating the data from the program.
Use the INFILE statement to tell SAS the filename and path (directory) of the external file containing the data.
The INFILE statement follows the DATA statement and must precede the INPUT statement.
After the INFILE keyword, the file path and name are enclosed in single quotes.

INPUT statement example
Reading from an external file into a SAS data set:
    Data one;
        INFILE 'c:\MyData\diabetes.dat';
        Input a$ b c;
    Run;
Reading internal data to create SAS data set 'one':
    Data one;
        Input a$ b c;
        cards;
    8 76 5
    7 43 9
    1 22 2
    ;
    Run;
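As a minimal, self-contained sketch of the internal-data form, the step below reads three records with DATALINES and then prints and describes the result. The data set name demo is an illustrative assumption, not something from the lecture.

```sas
/* Sketch only: the data set name "demo" is hypothetical */
DATA demo;
    INPUT a $ b c;          /* a is character ($); b and c are numeric */
    DATALINES;
8 76 5
7 43 9
1 22 2
;
RUN;

/* List the observations that were read */
PROC PRINT DATA=demo;
RUN;

/* Show variable names, types, and lengths */
PROC CONTENTS DATA=demo;
RUN;
```

After running it, the log should report 3 observations and 3 variables for WORK.DEMO, which matches the three data lines.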

*Note - SAS log
Whenever you read data from an external file, SAS gives some very valuable information about the file in the SAS log.
Always check this information after you read a file, as it could indicate problems.
A simple comparison of the number of records read from the INFILE with the number of observations in the SAS data set can tell you a lot about whether or not SAS is reading your data correctly.

*Note – Long Records
In some operating environments, SAS assumes external files have a record length of 256 or less. (The record length is the number of characters, including spaces, on a data line.)
If your data lines are long, and it looks like SAS is not reading all your data, then use the LRECL= option in the INFILE statement to specify a record length at least as long as the longest record in your data file.
    INFILE 'c:\MyData\Diabetes.dat' LRECL=2000;

Controlling INPUT with Options in the INFILE statement
The following options are useful for reading particular types of data files. Place these options after the filename in the INFILE statement.
FIRSTOBS= This tells SAS at what line to begin reading data. This is useful if you have a data file that contains descriptive text or header information at the beginning and you want to skip over these lines to begin reading the data.
OBS= This tells SAS to stop reading when it gets to that line in the raw data file.

Controlling INPUT with Options in the INFILE statement (cont.)
MISSOVER By default, SAS will go to the next data line to read more data if SAS has reached the end of the data line and there are still more variables in the INPUT statement that have not been assigned values. The MISSOVER option tells SAS that if it runs out of data, don’t go to the next data line.
Instead, assign missing values to any remaining variables before proceeding to the next line.

Controlling INPUT with Options in the INFILE statement (cont.)
PAD You need this option when you are reading data using column or formatted input and some data lines are shorter than others. If a variable’s field extends past the end of the data line, then, by default, SAS will go to the next line to start reading the variable’s value. This option tells SAS to read data for the variable until it reaches the end of the data line, or the last column specified in the format or column range, whichever comes first.

Data Step: input statement
There are three basic forms of the input statement:
1. List input (free form) – data fields must be separated by at least one blank. List the names of the variables, follow the name with $ for character data
   Example: Input Name$ Age;
2. Column input – follow the variable name (and $ for character) with a startingcolumn – endingcolumn
   Example: Input Name$ 1-15;
3. Formatted input – Optionally precede the variable name with @startingcolumn; follow the variable name with a SAS format designation
   Example: Input @1 Name$ 20. @21 DOB mmddyy8.;

LIST INPUT: Reading Raw Data Separated by Spaces
If the values in your raw data file are all separated by at least one space, then using list input to read the data may be appropriate.
Any missing data must be indicated with a period.
Character data, if present, must be simple: no embedded spaces, and no values greater than eight characters in length.
(Use the LENGTH statement to change the length)
    LENGTH Name$ 20.;
If the data file contains dates or other values which need special treatment, then list input may not be appropriate.
    INPUT Name$ Age Height;
The $ after Name indicates that it is a character variable, whereas the Age and Height variables are both numeric.

COLUMN INPUT: Reading Raw Data Separated by Columns
If each of the variable’s values is always found in the same place in the data line, then you can use column input as long as all values are character or standard numeric.
Standard numeric data contain only numbers, decimal points, plus and minus signs, and E for scientific notation. Dates or numbers with embedded commas, for example, are not standard.
    INPUT Name$ 1-10 Age 11-13 Height 14-18;
The first variable, Name, is character and the data values are in columns 1 through 10. The Age and Height variables are both numeric, since they are not followed by a $, and data values for both of these variables are in the column ranges listed after their names.

FORMATTED INPUT: Reading Raw Data NOT in Standard Format
This is where you want to use a Formatted Input or a Mixed Input.
Informats are useful anytime you have non-standard data.
Numbers with embedded commas or dollar signs are examples of non-standard data.
Dates are perhaps the most common non-standard data.
Using date informats, SAS will convert conventional forms of dates into a number, the number of days since January 1, 1960. This number is referred to as a SAS date value (January 1, 1960 is day 0).

Difference between INFORMAT and FORMAT?
INFORMATs give SAS special instructions for reading a variable.
FORMATs give SAS special instructions for writing a variable.
If specified in a DATA step, the name of the informat or format will be saved in the data set and will be printed by PROC CONTENTS.
Like the LABEL statement, these can also be used in the PROC step to customize your reports, but they would not be stored in the data set.

Informats: 3 basic types
Character, numeric, date
Character: $informatw.
Numeric: informatw.d
Date: informatw.
The $ indicates character informats, INFORMAT is the name of the informat, w is the total width, and d is the number of decimal places (numeric only).
Two informats do not have names: $w., which reads standard character data, and w.d, which reads standard numeric data.

Informats (cont.)
The period in an informat is very important because it distinguishes an informat from a variable name, which, by default, cannot contain any special characters except the underscore.
    INPUT Name : $10. Age : 3. Height : 5.1 DOB : MMDDYY10.;
*Selected Informats can be found in pp. 44-45 (3rd Ed) in “The Little SAS Book”.

Formatted Input Example
    INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
          (Score1 Score2 Score3 Score4 Score5) (4.1);
The variable Name has an informat of $16., meaning that it is a character variable 16 columns wide.
Variable Age has an informat of 3., is numeric, three columns wide, and has no decimal places.
The +1 skips over one column.
Variable Type is character, and it is one column wide.
Variable Date has an informat MMDDYY10. and reads dates in the form 10-31-1999 or 10/31/1999, each 10 columns wide.
The remaining variables, Score1 through Score5, all require the same informat, 4.1. By putting the variables and the informat in separate sets of parentheses, you have only to list the informat once.
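The formatted INPUT statement described above can be tried as a complete DATA step. This is only a sketch: the data set name contest and both data lines are made up for illustration, and the informats are written without the colon modifier so that each one reads a fixed number of columns, as the description says. The data lines are laid out so that Name occupies columns 1-16, Age 17-19, Type 21, Date 23-32, and the five scores four columns each from column 33.

```sas
/* Sketch only: the data set name and both data lines are hypothetical */
DATA contest;
    /* Fixed-width fields: Name 1-16, Age 17-19, Type 21, Date 23-32, */
    /* then five scores of 4 columns each starting at column 33       */
    INPUT Name $16. Age 3. +1 Type $1. +1 Date MMDDYY10.
          (Score1 Score2 Score3 Score4 Score5) (4.1);
    FORMAT Date MMDDYY10.;   /* display the SAS date value as a date */
    DATALINES;
Jane Doe         12 A 10/31/1999 7.5 8.0 6.5 9.0 7.0
Sam Lee           9 B 10-30-1999 6.0 7.5 8.5 5.5 9.5
;
RUN;

PROC PRINT DATA=contest;
RUN;
```

Because every field is read by column position, the embedded blank in each name is kept. Shifting a data line by even one column would change what each informat reads, so the spacing in the data lines matters.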

Mixing Input Styles
List style is the easiest; column style is a bit more work; and formatted style is the hardest of the three.
However, column and formatted styles do not require spaces (or other delimiters) between variables and can read embedded blanks.
Sometimes you use one style, sometimes another, and sometimes the easiest way is to use a combination of styles.
SAS is so flexible that you can mix and match any of the input styles for your own convenience.

Mixing Input Styles (cont.)
With list style input, SAS automatically scans to the next non-blank field and starts reading.
With column style input, SAS starts reading in the exact column that you specify.
But with formatted input, SAS just starts reading: wherever the pointer is, that is where SAS reads.
Sometimes you need to move the pointer explicitly, and you can do that by using the column pointer, @n, where n is the number of the column SAS should move to.

Mixed Input example
    INPUT ParkName$ 1-22 State$ Year @40 Acreage COMMA9.;

    1---------------------23---------------40-------
    Yellowstone           ID/MT/WY 1872 *  4,065,493
    Everglades            FL 1934 *        1,398,800
    Yosemite              CA 1864 *          760,917
    Great Smoky Mountains NC/TN 1926 *       520,269
    Wolf Trap Farm        VA 1966 *              130

    INPUT ParkName$ 1-22 State$ Year Acreage COMMA9.;
Without the @40, Acreage would look like this (it would start reading at the *):
    4065
    .
    .
    5
    .

Reading Multiple Lines of Raw Data per Observation
In a typical raw data file each line of data represents one observation, but sometimes the data for each observation are spread out over more than one line.
To tell SAS when to skip to a new line, you simply add line pointers to your INPUT statement.
To read more than one line of raw data for a single observation, you simply insert a slash (/) into your INPUT statement when you want to skip to the next line of raw data.

Reading Multiple Lines of Raw Data per Observation (cont.)
The #n line pointer works the same as the slash (/) but it is more flexible.
The #n works by giving the number of the line, within that observation, from which you want to read your raw data.
    Nome AK
    55 44
    88 29
    Miami FL
    …

    INPUT City$ State$ / NormHi NormLo #3 RecHi RecLo;

Reading Multiple Observations per Line of Raw Data (@@)
When you have multiple observations per line of raw data, you can use double trailing at signs (@@) at the end of your INPUT statement.
SAS will hold that line of data, continuing to read observations until it either runs out of data or reaches an INPUT statement that does not end with a double trailing @.
This is also known as a “hard hold”.
    Nome AK 55 44 88 29 Miami FL 72 62 105 40 Atlanta . 59 . 12

    INPUT City$ State$ NormHi NormLo RecHi RecLo @@;

Reading Part of a Raw Data File (@)
You don’t have to read all the data before you tell SAS whether to keep an observation. Instead, you can read just enough variables to decide whether to keep the current observation.
Similar to the @@, SAS will hold that line of data with a single trailing @. This is known as a “soft hold”.
While the trailing @ holds that line, you can test the observation with an IF statement to see if it’s one you want to keep. If it is, you can then read the data for the remaining variables with a second INPUT statement.
Without the trailing @, SAS would automatically start reading the next line of raw data with each INPUT statement.
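Returning to the double trailing @@ described above, here is a minimal sketch of it as a complete DATA step. The data set name temps and the temperature values are hypothetical; only the INPUT statement follows the slide.

```sas
/* Sketch only: data set name and temperature values are hypothetical */
DATA temps;
    /* @@ holds the current line across iterations of the DATA step, */
    /* so each group of six values becomes one observation           */
    INPUT City $ State $ NormHi NormLo RecHi RecLo @@;
    DATALINES;
Nome AK 55 44 88 29 Miami FL 90 75 97 65
Raleigh NC 88 68 105 50
;
RUN;

PROC PRINT DATA=temps;
RUN;
```

The two data lines yield three observations (Nome, Miami, Raleigh). Without the @@, SAS would read only the first six values on each line and produce two observations.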

Reading Part of a Raw Data File (@) Example
Suppose you have a dataset containing heart and lung transplant information but you are trying to construct a dataset of only lung transplant patients.
It is a very large data set that takes a lot of time to run, so you don’t want to read it all in first and then select out the portion you want to keep.
It would be better to read in only those data that you want initially.

Reading Part of a Raw Data File (@) Example (cont.)
    Heart 7823 12nov1989
    Heart 6477 08sep1992
    Lung 7231 22jul1995
    Heart 2347 30jan1990
    Lung 7842 12mar1998

    DATA Lung;
        INFILE 'c:\MyData\Trnsplnt.dat';
        INPUT Type$ @;
        If Type = 'Heart' then DELETE;
        INPUT RecNum TranDt : Date9.;
    Run;

Reading external comma-delimited data
We have two choices when given this type of data:
We can use an editor and replace all the commas with blanks, or
We can leave the commas in the data and use the DLM= option in the INFILE statement
    Data HtWt;
        Infile 'c:\MyData\survey.txt' DLM=',';
        Input ID Gender$ Age Height Weight;
    Run;

Reading external comma-delimited data (cont.)
Another method besides the DLM= option is to use DSD in the INFILE statement.
This option performs several other functions besides treating commas as delimiters.
If it finds two adjacent commas, it will assign a missing value.
It will allow text strings surrounded by quotes to be read into a character variable and will strip the quotes in the process.
    Data HtWt;
        Infile 'c:\MyData\survey.txt' DSD;
        Input ID Gender$ Age Height Weight;
    Run;

Permanent SAS Data Sets
A permanent SAS data set has a two-level name: LibraryName.DataSetName
Temporary SAS data sets will not exist when you shut down the instance of SAS in which they were created.
A temporary SAS data set has the one-level name that we have been using:
    Data new;
        Set AIDS;
    Run;
First define a SAS Library (Libref)

Libname Statement
Use this statement to define your SAS Library location before using your SAS data sets.
Example:
    LIBNAME Annie 'C:\SASDATA';
    Proc Means Data = Annie.EX4A N MEAN STD;
        Var X Y Z;
    Run;

Creating Permanent SAS Data Sets
    Libname annie "C:\SASDATA";
    Data Annie.EX1;
        INPUT Group$ X Y Z;
        DATALINES;
    Control 12 17 19
    Treat 23 . 21
    Control 19 18 16
    Treat 22 22 .
    ;
    Run;

Using the Permanent SAS Data Sets
    Libname xyz "C:\SASDATA";
    Title "Means from EX1";
    Proc Means Data=xyz.EX1;
        Var X Y Z;
    Run;

Now let’s try the in-class problems listed on slide 1