Chapter 1: Getting Started
• Overview of basic components
• Data Sets
• Data Steps
• Windowing (DM) environment
• Submitting programs
• Reviewing Output
• System options

The SAS Language
• Actually, SAS contains several languages.
• SAS statements vs. SAS commands.
• All statements end with “;”.
• Free format language. Can have
  – multiple statements per line
  – multiple lines for a single statement.
  – Neither is a good idea most of the time.

SAS Names
• Used to be limited to 8 characters. With v7 the limit went to 32.
• First character must be a letter or underscore (_).
• Subsequent characters in a name can be letters, digits or underscores.
• Case is significant only for cosmetic/display purposes. SAS stores names in mixed case but will match totpop and TotPop.

Exception: librefs & filerefs
• Names associated with SAS data libraries and ASCII files are still limited to 8 characters. (Because of platform limits on MVS, CMS, others?)
• Also applies to names of SAS formats created with PROC FORMAT.

SAS Comments
• Two kinds:
  – “Statement style” comments begin with “*” and end with “;”
  – The other kind begin with “/*” and end with “*/”
• If you use statement style for your real comments, then you can use the other kind to “comment out” sections of code.

Ex. of “Commented Out” Code

    /* ===========Begin commented out code=========
    *---Step 1: Read the data--;
    data one; infile 'name_of_the_file';
    input a b c d e f g;
    if a=1 then a=0; else if b=2 then b=3; *--edit vals;
    run;
    *--Step 2: Sort and Print the Data--;
    proc sort data=one; by d e g; run;
    proc print data=one; by d; title 'Data Set One'; run;
    ================end comment================= */
    *---Step 3: Begin statistical analysis of the data--;
    proc univariate data=one ;
    ----etc-----

SAS Data Sets
• This is where SAS stores the data.
• Statistical vs. database terminology:
  – Observations = Rows
  – Variables = Columns
  – Data Sets = Tables
• The observations describe entities; the variables are attributes of those entities.
• In our environment the rows are usually geographic areas and the variables are summary statistics regarding those areas.

Variable Attributes
• Type (character or numeric).
• Length (3-8 bytes for numerics, 1-32,767 for character strings).
• Label: Up to 256 characters.
• Format: Used by default when the variable is displayed. E.g. comma9. or $mocnty.
• Informat: Format used to convert typed values entered interactively.

Date Variables
• No such thing as an explicit date variable type.
• Dates are stored as numeric values: the number of days since Jan. 1, 1960.
• Format codes are used to read and display date variables. E.g. read it with mmddyy6. and display it with date9.

Sample Program: Dates

    data dates;
    input dateval mmddyy6. sales;
    format dateval date9.;
    datalines;
    020198 1234
    122501 5678
    80199 725
    091101 1,023
    run;
    options ls=80;
    proc print data=dates;
    title 'Listing of dates';
    run;

Sample Program: Dates - Output

    Obs      dateval    sales
      1    01FEB1998     1234
      2    25DEC2001     5678
      3    01AUG1999      725
      4    11SEP2001        .

Program Steps
• Data steps and Proc(edure) steps.
• Some statements (e.g. title, options, %let) are not part of any specific step (“global statements”).
• Step boundaries:
  – Begin with data or proc statements.
  – End with a run statement, the next step, or EOF.
• Highly recommended: always use run;

How Many Steps?

    data dates2;
    input date mmddyy6. sales;
    informat sales comma.;
    format date date9.;
    datalines;
    020198 1234
    122501 5678
    10299 725
    091101 1,023
    run;
    options ls=80;
    proc print data=dates2;
    proc sort data=dates2;
    by date;

Data Step Cycles (“Built-In Loop”)
• Most data steps have 1 and only 1 data source. Usually an infile/input or a set or merge statement represents the data source.
• SAS executes the data step statements once for each input line/observation.
• The data step statements are compiled and, if there are no errors, executed -- once for each set of data.
• Variable _n_ (“automatic”) counts the cycles through the implicit loop.

SAS Windowing Environment
• AKA DM - “Display Manager”.
• You can run SAS without using it -- edit code with a text editor and use batch mode.
• It takes some getting used to, but it’s worth it.
• The Windows version is different from all the rest. Platform independence and MS software standards clashed, and MS won.

The Enhanced Editor
• Only mentioned in TLSB. It is here and it makes the PROGRAM window obsolete under Windows. (But the Program window is still needed for Unix and all other platforms.)
• It is a Windows editor. The text editor used in the Program window was modeled after the SPF editor developed by IBM in the 70s.

Major Differences
• Code does not disappear when you submit it, so it does not have to be recalled.
• Code is color-coded as you type, which serves as a serious debugging aid.
• Does not support many of the commands that the Program window does. New users won’t care.
• You can have bunches of them open at the same time.

Other Windows
• Log: see what happened with submitted code. Error messages, notes, warnings, etc.
• Output: “Printed” output goes here. Results of most SAS procs.
• Explorer and Results.
• Notepad: another text editor; for data usually.
• Keys: Define function keys. Different ones for different window types.
• Filename, Libname, Dir and Var: very handy.

Ways to Issue Commands
• Not only are there lots of different windows with lots of different commands, but there are lots of ways to specify those commands.
• Pull-down menus. (The pmenu option can be used to turn these menus on/off.)
• Toolbar icons associated with commands.
• Entering a command in the Command box.
• Function keys! (Not mentioned in TLSB.)

Accessing Windows
• To bring a window to the foreground and make it the “active” window:
  – Click within it if it is visible.
  – Enter the name of the window as a command.
  – Use the Window pull-down and select it.
  – Use a function key associated with the window name. (E.g. if F10 = “Log”, just hit F10 to go to the log window.)
  – Enter the Next command to go to the “next” window.
  – Click on the window name tab in the bar at the bottom.

Submitting Code
• Differs somewhat between the Program window and the Enhanced Editor window.
• If text is selected in the window then only the selected text is submitted. Otherwise, the entire program is submitted.
• In the Program window you need to use the Recall command to bring the submitted code back.

Viewing Results of Submit
• The log window tells you what happened. Rather detailed. Error messages are color-coded.
• If there are no errors and the code executes, “printed” results go to the Output window and/or to an HTML file (output destinations can be specified).
• The Results window is a sort of index to the Output window.

Compile & Go Phases
• Code must be compiled prior to executing. The execution phase will be skipped if there are errors at the compile phase.
• In batch runs, SAS will set “options obs=0” when it detects an error. In this mode, later steps will compile but not execute.
• Once a step fails, it can cause lots of bogus error messages in subsequent steps.

SAS System Options
• System options control all sorts of things regarding how SAS runs.
• Options can be specified in many ways at different times (at SAS startup, or during execution).
• Can be specified via:
  – config file with “-set ..” statement
  – as a parameter at invocation
  – using the options statement or the Options window.
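
As a minimal sketch of the last route (the options statement), the lines below set a few printing options, ask SAS to report one of them in the log, and temporarily cap the number of observations processed. The particular values (80, 60, 10) are illustrative assumptions, not settings taken from these slides.

```sas
* Set a few printing options for the rest of the session;
options linesize=80 pagesize=60 nodate nonumber;

* Write the current value of one option to the log;
proc options option=linesize;
run;

* Limit processing to the first 10 observations while testing;
options obs=10;

* (test steps would go here);

* Restore normal processing of all observations;
options obs=max;
```

Because options is a global statement, each setting stays in effect until it is changed again or the session ends, which is why the sketch resets obs= explicitly.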

Common Options
• Printing options:
  – linesize= ; pagesize= ; date/nodate; center/nocenter; number/nonumber
• DMS, DMR (invocation options)
• Obs= (limit # observations to process)
• (no)source (show source code in log)
• (no)mprint (show code generated by macros)

Sample SAS Code
• Follow the URL: mcdc2.missouri.edu/cgi-bin/uexplore?/pub/data/indctrs@secure
• Click on the “Tools” subdirectory and then on the mocopop.sas file.
• The direct URL for this file is: mcdc2.missouri.edu/data/indctrs/Tools/mocopop.sas

Browsing .sas, .log and .lst Files
• The Windows Registry may associate the SAS program with these 3 filetypes.
• With IE, this can cause an instance of SAS to start up when all you want to do is browse the contents of a .sas file.
• You can manually remove the registry entry.
• Netscape does not recognize the association.

mocopop.sas
• You are NOT yet expected to understand (completely) most of what’s in the program.
• It has lots of steps, and accesses a set of 5 data sources -- 4 SAS data sets from the archive and 1 dbf file.
• A common key, fipco, is kept on each data set. Such keys are critical.
• Step 5 uses a merge statement to bring all the data together into a single permanent SAS data set. Note the by fipco; statement.

mocopop.sas - 2
• Note how all data definition statements -- libname and filename statements -- are grouped at the top of the program. Not required, but a good convention.
• Note (extensive) use of only statement-style comments. In debugging this setup, we used /* - */ “commenting out” extensively.

mocopop.sas - 3
• Note the “classic” SAS data step for accessing the data archive:
  – data <set-name>;
  – set / merge <set(s)>; (often with data set options specified).
  – where statement to filter observations.
  – Assignment statements to edit data or create derived variables. Sometimes as part of if … then.
  – Keep or drop statements to specify variables to be included on the output set.

mocopop.sas - 4
• Note the ability to access a dbf file via proc dbf. Could also have used proc import.
• Note use of attrib statements in Step 5 to establish not only the attributes of the variables (labels, lengths/types and formats) but also the order of the variables on the output set.
• Note that the obs identifier variables are of type character, but all indicator variables are numeric.

mocopop.sas - 5
• The creation of indctrs.mocopopg as a SAS data step view is way too advanced for us now.
• For now, just know that there is a way to combine data sets logically rather than physically. Indctrs.mocopopg looks like a data set to SAS, but is stored as code, not actual data.

mocopop.sas - 6
• The step to aggregate the data in mocopopg to DED regions is still further beyond what we have covered so far.
• Involves use of an application macro named %agg. This macro is like an extension of the language for us.
• Aggregation of our data is a critically important capability.

mocopop.sas - 7
• Use the uexplore utility application to browse the indctrs data directory.
• Display hypercon reports for the mocopop and mocopopg data sets.
• Extract data regarding the pop change over the decade of the 90’s with components of change. Create a listing report and a csv file (opened with Excel).

mocopop.sas - Summary
• A typical “real world” SAS program.
• In a way, quite complex; but with SAS it becomes just a little long.
• Most of the processing is fairly routine once you have mastered a small subset of the SAS language.
• Organizing such applications into carefully structured and commented modules makes it easy for us to document how we got our data.

The Data Archive
• The source of most data you’ll be working with. Specialists create these sets and verify the data.
• The uexplore/xtract/hypercon tools are, for now, critical in making these data accessible to the outside world. Wide use helps ensure reliability.
• For you, access directly via SAS is much faster and more flexible.
• The key indicators data base is just one -- very important -- component of the archive.
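
To make “access directly via SAS” concrete, here is a rough sketch of the “classic” archive-access data step outlined under mocopop.sas - 3. It is not taken from mocopop.sas. The library path is a placeholder, the assumption that the merged set is stored as indctrs.mocopop is mine, and pop1990, pop2000 and pctchg are hypothetical variable names. Only the key fipco comes from the slides.

```sas
* Placeholder path: substitute the real location of the indctrs library;
libname indctrs 'path-to-the-indctrs-directory';

* Classic pattern: set, where, assignment, keep;
data work.bigcos;
  * pop1990 and pop2000 are hypothetical variable names;
  set indctrs.mocopop (keep=fipco pop1990 pop2000);
  * Filter the observations as they are read;
  where pop2000 ge 50000;
  * Derived variable, guarded against a zero base;
  if pop1990 > 0 then pctchg = 100 * (pop2000 - pop1990) / pop1990;
  * Variables to write to the output set;
  keep fipco pop2000 pctchg;
run;

proc print data=work.bigcos;
  title 'Counties with 2000 population of 50,000 or more';
run;
```

All comments in the sketch are statement-style, following the convention recommended earlier, so the whole block can still be commented out with /* and */ while debugging.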