You - Missouri Census Data Center

Chapter 1: Getting Started
• Overview of basic components
• Data Sets
• Data Steps
• Windowing (DM) environment
• Submitting programs
• Reviewing Output
• System options
The SAS Language
• Actually, SAS contains several languages.
• SAS statements vs. SAS commands
• All statements end with “;”.
• Free format language. Can have
– multiple statements per line
– multiple lines for a single statement.
– Neither is a good idea most of the time.
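A minimal sketch of what "free format" allows, assuming nothing beyond base SAS; the data set names a, b and c are made up for illustration. All three steps are legal, but only the last follows the usual layout.

```sas
*--Legal, but hard to read: several statements on one line;
data a; x = 1; y = 2; run; proc print data=a; run;

*--Also legal: one statement spread over several lines;
data
   b
   ;
   x
     = 1
   ;
run;

*--Usual layout: one statement per line, indented within the step;
data c;
   x = 1;
   y = 2;
run;
```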
SAS Names
• Used to be limited to 8 characters. With v7
the limit went to 32.
• First character a letter or underscore (_).
• Subsequent characters in name can be
letters, digits or underscores.
• Case is significant only for cosmetic/display
purposes. SAS stores names in mixed case
but will match totpop and TotPop.
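The naming rules above can be sketched as follows. The data set name_demo and its variables are hypothetical, and the values are made up.

```sas
data name_demo;
   TotPop = 1000;                     *--stored and displayed as TotPop;
   _area = 50;                        *--a leading underscore is legal;
   density_2000 = totpop / _AREA;     *--totpop matches TotPop, _AREA matches _area;
run;

proc print data=name_demo;
run;
```

The listing should show the column headings with the capitalization used when each name first appeared.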
Exception: librefs & filerefs
• Names associated with SAS data libraries
and ascii files are still limited to 8
characters. (Because of platform limits on
MVS, CMS, others?)
• Also applies to names of SAS formats
created with PROC FORMAT.
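A short sketch of names that fall under the 8-character limit. The libref mydata, the fileref rawpop and the file name counties.txt are placeholders, not names from the course materials.

```sas
*--libref: 8 characters or fewer, here pointing at the current directory;
libname mydata '.';

*--fileref: 8 characters or fewer, pointing at a hypothetical text file;
filename rawpop 'counties.txt';

*--data set and variable names may still be long;
data mydata.county_population_estimates;
   total_population_2000 = 1000;
run;
```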
SAS Comments
• Two kinds:
– “statement style” begin with “*” and end with
“;”
– Other kind begin with “/*” and end with “*/”
• If you use statement-style for your real
comments then you can use the other kind
to “comment out” sections of code.
Ex. of “Commented Out” Code
/* ===========Begin commented out code=========
*---Step 1: Read the data--;
data one; infile 'name_of_the_file';
input a b c d e f g;
if a=1 then a=0; else if b=2 then b=3; *--edit vals;
run;
*--Step 2: Sort and Print the Data--;
proc sort data=one; by d e g; run;
proc print data=one; by d; title 'Data Set One'; run;
================end comment================= */
*---Step 3: Begin statistical analysis of the data--;
proc univariate data=one ;
----etc-----
SAS Data Sets
• This is where SAS stores the data.
• Statistical vs. database terminology:
– Observations = Rows
– Variables = Columns
– Data Sets = Tables
• The observations describe entities, the
variables are attributes of those entities.
• In our environment the rows are usually
geographic areas and the variables are
summary statistics regarding those areas.
Variable Attributes
• Type (character or numeric)
• Length (3-8 bytes for numerics; 1-32,767 bytes for
character strings since v7, up from 200 in v6).
• Label: Up to 256 characters.
• Format: Used by default when the variable
is displayed. E.g. comma9. or $mocnty.
• Informat: Used by default to convert raw input
values (read from a file or typed interactively)
into stored values.
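A minimal sketch of setting these attributes with separate statements and then inspecting them. The data set attr_demo, its variables and its values are made up for illustration.

```sas
data attr_demo;
   length   county $ 25 pop2000 8;           *--type and length;
   label    county  = 'County name'
            pop2000 = 'Total population, 2000';
   format   pop2000 comma9.;                 *--used when displayed;
   informat pop2000 comma9.;                 *--used when read;
   input county $ pop2000;
   datalines;
Boone 120,000
Adair 25,000
;
run;

*--proc contents lists type, length, format, informat and label;
proc contents data=attr_demo;
run;
```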
Date Variables
• No such thing as an explicit date var type.
• Dates are stored as numeric values as the #
of days since Jan. 1, 1960.
• Format codes are used to read and display
date variables. E.g. read it with mmddyy6.
and display it with date9.
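A small sketch showing that a date is just a number: date literals are written to the log first unformatted, then with a date format. The variable names base and next are arbitrary.

```sas
data _null_;
   base = '01JAN1960'd;     *--date literal for day zero;
   next = '02JAN1960'd;
   put base= next=;         *--writes base=0 next=1 to the log;
   put base= date9.;        *--writes base=01JAN1960 to the log;
run;
```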
Sample Pgm: Dates
data dates;
input dateval mmddyy6. sales;
format dateval date9.;
datalines;
020198 1234
122501 5678
 80199 725
091101 1,023
run;
options ls=80;
proc print data=dates;
title 'Listing of dates';
run;
Sample Pgm: Dates - Output
Obs      dateval    sales
  1    01FEB1998     1234
  2    25DEC2001     5678
  3    01AUG1999      725
  4    11SEP2001        .
Program Steps
• Data steps and Proc(edure) steps.
• Some stmts (e.g. title, options, %let) are not
part of any specific step. (“global
statements”).
• Step boundaries:
– Begin with data or proc statements.
– End with run stmt or next step or EOF.
• Highly recommended: always use run;
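The step boundaries described above can be sketched as follows, with every step closed by an explicit run statement. The data set squares is hypothetical.

```sas
title 'Step demo';          *--global statement, not part of any step;
options nodate;             *--global statement;

data squares;               *--data step begins;
   do n = 1 to 3;
      sq = n * n;
      output;
   end;
run;                        *--data step ends;

proc print data=squares;    *--proc step begins;
run;                        *--proc step ends;
```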
How Many Steps?
data dates2;
input date mmddyy6. sales;
informat sales comma.;
format date date9.;
datalines;
020198 1234
122501 5678
 10299 725
091101 1,023
run;
options ls=80;
proc print data=dates2;
proc sort data=dates2; by date;
Data Step Cycles (“Built-In Loop”)
• Most data steps have 1 and only 1 data
source. Usually an infile/input or a set or
merge statement represents the data source.
• SAS executes the data step stmts once for
each input line/observation.
• The data step stmts are compiled and, if no
errors, executed -- once for each set of data.
• Variable _n_ (“automatic”) counts the
cycles through the implicit loop.
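A minimal sketch of the implicit loop. The statements between data and datalines run once per input line, and _n_ is copied into an ordinary variable (here called cycle, a made-up name) because automatic variables are not written to the output data set.

```sas
data cycles;
   input x;
   cycle = _n_;                             *--keep a copy of the counter;
   put 'Iteration ' _n_ 'read x=' x;        *--one log line per cycle;
   datalines;
10
20
30
;
run;

proc print data=cycles;
run;
```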
SAS Windowing Environment
• AKA DM - “Display Manager”
• You can run SAS without using it -- edit
code with a text editor and use batch mode.
• It takes some getting used to, but it’s worth
it.
• The Windows version is different from all
the rest. Platform independence vs. MS
software standards clashed and MS won.
The Enhanced Editor
• Only mentioned in TLSB. It is here and it
makes the PROGRAM window obsolete
under Windows. (But still needed for Unix
and all other platforms.)
• It is a Windows editor. The text editor used
in the Program window was modeled after
the SPF editor developed by IBM in the
70s.
Major Differences
• Code does not disappear and have to be
recalled when you submit it.
• Code is color-coded as you type to serve as
a serious debugging aid.
• Does not support many of the commands
that the pgm window does. New users won’t
care.
• You can have bunches of them open at the
same time.
Other Windows
• Log: see what happened with submitted code.
Error messages, notes, warnings, etc.
• Output: “Printed” output goes here. Results of
most SAS procs.
• Explorer and Results.
• Notepad: another text editor; for data usually.
• Keys: Define function keys. Different ones for
different window types.
• Filename, Libname, Dir and Var are very handy.
Ways to Issue Commands
• Not only are there lots of different windows
with lots of different commands, but there
are lots of ways to specify those commands.
• Pull down menus. (The pmenu option can be
used to turn on/off these menus.)
• Toolbar icons associated with commands.
• Entering command in the Command box.
• Function keys! (Not mentioned in TLSB).
Accessing Windows
• To bring a window to the foreground and
make it the “active” window:
– Click within it if it is visible.
– Enter the name of the window as a command.
– Use Window pull-down and select it.
– Use a function key associated with the window
name. (E.g. if F10 = “Log”, just hit F10 to go to
the log window.)
– Enter Next command to go to “next” window.
– Click on the window name tab in bar at bottom.
Submitting Code
• Differs somewhat between pgm window
and Enhanced Editor window.
• If text is selected in the window then only
selected text is submitted. Otherwise, the
entire program is submitted.
• In Program window you need to use Recall
command to bring the submitted code back.
Viewing Results of Submit
• The log window tells you what happened.
Rather detailed. Error messages color
coded.
• If no errors and code executes, “printed”
results go to Output window and/or to an
HTML file (output destinations can be
specified).
• Results window is a sort of index to the
Output window.
Compile & Go Phases
• Code must be compiled prior to executing.
The execution phase will be skipped if there
are errors at compile phase.
• In batch runs, SAS will set “options obs=0”
when it detects an error. In this mode, later
steps will compile but not execute.
• Once a step fails, it can cause lots of bogus
error messages in subsequent steps.
SAS System Options
• System opts control all sorts of things
regarding how SAS runs.
• Options can be specified in many ways at
different times (at SAS startup, or during
execution.)
• Can be specified via:
– config file with “-set ..” stmt
– as a parameter at invocation
– using options statement or Options window.
Common Options
• Printing options:
– linesize= ; pagesize= ; date/nodate;
center/nocenter; number/nonumber
• DMS, DMR (invocation options)
• Obs= (limit # observations to process)
• (no)source (show source code in log)
• (no)mprint (show code generated by
macros)
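A short sketch of setting several of these options with the options statement. The values chosen are arbitrary examples.

```sas
*--printing options;
options linesize=80 pagesize=60 nodate nonumber nocenter;

*--process at most 100 observations from each input data set;
options obs=100;

*--show source code and macro-generated code in the log;
options source mprint;

*--display the current setting of one option in the log;
proc options option=linesize;
run;

*--restore normal processing of all observations;
options obs=max;
```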
Sample SAS Code
• Follow the URL:
mcdc2.missouri.edu/cgi-bin/uexplore?/pub/data/indctrs@secure
• Click on the “Tools” subdirectory and then on the
mocopop.sas file.
• The direct URL for this file is:
mcdc2.missouri.edu/data/indctrs/Tools/mocopop.sas
Browsing .sas, .log and .lst Files
• The Windows Registry may associate the
SAS program with these 3 filetypes.
• With IE, this can cause an instance of SAS
to start up when all you want to do is
browse the contents of a .sas file.
• You can do a manual remove of the registry
entry.
• Netscape does not recognize the association.
mocopop.sas
• You are NOT yet expected to understand
(completely) most of what’s in the program.
• It has lots of steps, and accesses a set of 5
data sources -- 4 SAS data sets from the
archive and 1 dbf file.
• A common key, fipco, is kept on each data
set. Such keys are critical.
• Step 5 uses a merge stmt to bring all the
data together into a single permanent SAS
data set. Note the by fipco; statement.
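A minimal sketch of a merge on a common key. This is not the code from mocopop.sas: the data sets pop, births and combined are hypothetical and the values are made up. Only the key name fipco comes from the program described above.

```sas
data pop;
   input fipco $ pop2000;
   datalines;
29001 25000
29019 135000
;
run;

data births;
   input fipco $ births;
   datalines;
29001 300
29019 1800
;
run;

*--both inputs must be sorted by the key before a by-merge;
proc sort data=pop;    by fipco; run;
proc sort data=births; by fipco; run;

data combined;
   merge pop births;
   by fipco;              *--match observations on the common key;
run;
```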
mocopop.sas - 2
• Note how all data definition statements --
libname and filename statements -- are
grouped at the top of the program. Not
required, but a good convention.
• Note (extensive) use of only statement-style
comments. In debugging this setup, we used
/* - */ “commenting out” extensively.
mocopop.sas - 3
• Note the “classic” SAS data step for
accessing the data archive:
– data <set-name>;
– set / merge <set(s)>;
(often with data set options specified).
– where statement to filter observations.
– Assignment stmts to edit data or create derived
variables. Sometimes as part of if … then stmts.
– Keep or drop stmts to specify variables to be
included on output set.
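The pattern above can be sketched as follows. The input data set counties stands in for an archive data set; its name, variables and values are assumptions made for illustration.

```sas
*--stand-in for a data set from the archive;
data counties;
   input fipco $ pop1990 pop2000;
   datalines;
29001 24000 25000
29019 112000 135000
29510 397000 348000
;
run;

data bigcos;
   set counties (keep=fipco pop1990 pop2000);    *--data set option;
   where pop2000 >= 50000;                       *--filter observations;
   change = pop2000 - pop1990;                   *--derived variable;
   if change < 0 then lostpop = 1;               *--conditional assignment;
   else lostpop = 0;
   keep fipco pop2000 change lostpop;            *--variables on output set;
run;
```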
mocopop.sas - 4
• Note ability to access dbf file via proc dbf.
Could also have used proc import.
• Note use of attrib statements in Step 5 to
establish not only the attributes of the
variables (labels, length/types and formats)
but also the order of the variables on the
output set.
• Note that the obs identifier variables are of
type character, but all indicator variables
are numeric.
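A small sketch of how an attrib statement fixes both attributes and variable order. This is not the Step 5 code itself: the data set attr_order, the labels and the values are made up. Because attrib comes first, the variables are stored in attrib order rather than input order.

```sas
data attr_order;
   attrib fipco   length=$5  label='County FIPS code'
          pop2000 length=8   label='Total population, 2000' format=comma9.
          county  length=$25 label='County name';
   input county $ fipco $ pop2000;
   datalines;
Adair 29001 25000
Boone 29019 135000
;
run;

*--varnum lists variables in their stored order;
proc contents data=attr_order varnum;
run;
```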
mocopop.sas - 5
• The creation of indctrs.mocopopg as a sas
data step view is way too advanced for us
now.
• For now, just know that there is a way to
combine data sets logically rather than
physically. Indctrs.mocopopg looks like a
data set to SAS, but is stored as code, not
actual data.
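For the curious, a minimal sketch of a data step view, assuming a tiny made-up data set named small. This is not how indctrs.mocopopg is actually defined. The view stores the step's code, which runs each time the view is read.

```sas
data small;
   input fipco $ pop;
   datalines;
29001 25000
29019 135000
;
run;

*--the view= option stores this step as code instead of running it now;
data small_v / view=small_v;
   set small;
   pop_thou = pop / 1000;
run;

*--the view is used like any other data set;
proc print data=small_v;
run;
```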
mocopop.sas - 6
• The step to aggregate the data in mocopopg
to DED regions is still further beyond what
we have covered so far.
• Involves use of an application macro named
%agg. This macro is like an extension of
the language for us.
• Aggregation of our data is a critically
important capability.
mocopop.sas - 7
• Use the uexplore utility application to
browse the indctrs data directory.
• Display hypercon reports for the mocopop
and mocopopg data sets.
• Extract data regarding the pop change over
the decade of the 90’s with components of
change. Create a listing report and a csv
(opened with Excel) file.
mocopop.sas - Summary
• A typical “real world” SAS program.
• In a way, quite complex; but with SAS it
becomes just a little long.
• Most of the processing is fairly routine
once you have mastered a small subset of
the SAS language.
• Organizing such applications into carefully
structured and commented modules makes
it easy for us to document how we got our
data.
The Data Archive
• The source of most data you’ll be working with.
Specialists create these sets and verify the data.
• The uexplore/xtract/hypercon tools are -- for now --
critical in making these data accessible to the
outside world. Wide use helps ensure reliability.
• For you, access directly via SAS is much faster
and more flexible.
• The key indicators data base is just one -- very
important -- component of the archive.