SAS Workshop
Introduction to Enterprise Guide
Day 3 - Morning Session
IOWA STATE UNIVERSITY
May 11, 2016
Presented by: Mervyn Marasinghe
Introduction to SAS Enterprise Guide
Start SAS Enterprise Guide on a PC from the Start menu, or by double-clicking on the Enterprise
Guide icon on the desktop.
(Since you are a first-time user, you may close the welcome window that asks you to open an
existing SAS program or project.)
The Project Tree displays a hierarchical view of the active project. You can delete, rename, and
reorder the items in the project using this window.
This is the workspace area. The Process Flow window is the only window open in this area at the
start.
The Process Flow displays the objects in your project and the relationships between them.
If you click on Servers, you can find data on your local SAS server or on a remote SAS server (if
you have access to one).
When additional windows are produced in the workspace area, as a result of running a process, they
may be opened using tabs that will appear at the top of this window.
The default window layout consists of the Project Tree, the Server List, and
the workspace area.
The workspace area is the main area of the SAS EG application and is used to
display, for example, your data, task results and process flows.
At first, the Process Flow is the only window that is open in the workspace area.
When you generate reports or open data, other windows open in the workspace with
a tabbed interface.
The work that you do in the workspace is saved in projects.
A project is a collection of objects: data, tasks, and results.
You can save a project and its contents to any location by selecting
File → Save Project As… The project file is saved with the extension .egp.
The Project Tree displays a hierarchical view of the active project.
Let us execute an already written SAS program (progD1.sas) in EG.
Click on Program → Open Program and select progD1.sas from your folder. We will
see the program in the EG workspace:
Notice that the tab on top of the workspace has changed from Process Flow to
progD1. Execute the program by selecting the Run tab.
The resulting log and output(s) produce new tabs in your workspace window.
The program also produces the SAS data set named biology. The output is produced
as a SAS Report in ODS format (this is the default).
Examine each of these by selecting each of the tabs.
Click on the down arrow near the progD1 tab on top of the workspace. You will see
the Process Flow item drop down. Select this to open the Process Flow window.
This is a simple example of a Process Flow diagram. Here we see a SAS program
run to produce two output objects as the results: the SAS data set and the SAS Report.
We do not yet see a task object as it is hidden inside the SAS program. Next we shall
repeat the same process but request that the output also be directed to a PDF file.
To do this, select Tools → Options…; then, on the left panel of the resulting dialog
box, click on the Results menu item. Then check the PDF box on the right.
Now re-open the process flow diagram and click on Run Process Flow (a menu item
found under the Run tab). The entire process will be re-executed and the output
replaced. Additionally, the Results-PDF tab now appears on the progD1 window.
Open the Process Flow window as before using the progD1 tab on top of the
workspace to view the new process flow chart with the PDF output object.
Next we will read the biology data set from a text file directly and do a complete
statistical analysis within EG (as a new process). Save the current project and exit.
Start up EG. First let us import the text file and create the SAS data set needed for the
analysis. In EG, click on File → Import Data, navigate to your folder on the Desktop,
and double-click on the biology.txt file. The Import Data wizard shown below opens:
Page through the wizard using the Next> button at the bottom of the page. On page 2,
select the Fixed columns button and mark the column breaks by clicking on the
ruler.
This is the easiest method to access a fixed-width data file. As this file does not
contain field names in the first line of data, go to the next page using Next>.
Defining field attributes is the most important task in importing data. Here you get to
assign names, informats, formats, and labels to your SAS variables. For example,
click on the Modify… button (below right) and set these values for Field 1 (F1):
Name and Label are the only attributes that need to be reset. Use defaults for the others.
For example, we could change the Input Format attribute for ID to 4., but it would make
no difference. Press OK.
Next select Field 2 (F2) and click on Modify…. The next screenshot shows the values
defined for Field 2. We changed the informat to $4. using the drop-down menu …
We continue until attributes for all 6 fields are defined. Note that F7 is excluded
from the SAS data set by unchecking the corresponding Inc box on the left.
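For readers who prefer code, the wizard's work is roughly comparable to a DATA step that reads the file with formatted input. The sketch below is only an illustration under stated assumptions: the file path, the column positions, the numeric informats for Fields 3 to 6, and every variable name except ID are hypothetical placeholders, since the actual layout of biology.txt is set interactively on the ruler.

```sas
/* Minimal sketch of a fixed-column import (not the code EG generates).     */
/* ASSUMPTIONS: path, column positions, and the names Field2-Field6 are     */
/* hypothetical; only ID (Field 1) and the $4. informat for Field 2 come    */
/* from the workshop text.                                                  */
data work.biology;
    infile "biology.txt" truncover;   /* replace with the full path to your file */
    input @1  ID      4.     /* Field 1: numeric, informat 4.      */
          @5  Field2  $4.    /* Field 2: character, informat $4.   */
          @9  Field3  4.     /* Fields 3-6: assumed numeric        */
          @13 Field4  4.
          @17 Field5  4.
          @21 Field6  4.;
    /* Columns beyond Field 6 (F7 in the wizard) are simply not read. */
    label ID = "ID";         /* hypothetical label text */
run;
```

Each @n pointer plays the role of a column break marked on the ruler, and leaving a field out of the INPUT statement has the same effect as unchecking its Inc box.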
We exit from the Import Data wizard. Note that the current Process Flow is as
displayed below, with the Import Data task object in the middle.
Right-click on the Data Imported object and select Open. The SAS data set is
displayed:
Now we will use several tasks on this data set. First let us obtain a simple SAS
Report of the data. Go back to the Process Flow, click the data object (to make it
active), and select Tasks → Describe → List Data… from the menu bar.
The following dialog box opens:
We use this dialog to assign roles to each variable. Here we want all of the variables
to appear in the SAS Report. We can drag and drop each variable from the left box to
the right box (under the empty space below List variables role), one at a time.
Or shift-click on the first and last variable names on the left to select them all, and drag
them under List variables on the right pane. Click the Run button at the bottom.
Note that other options are available on the panel on the left margin.
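The List Data task corresponds closely to a PROC PRINT step. As a hedged sketch, the code below lists every variable of SASHELP.CLASS, a sample data set shipped with SAS that stands in here for the imported data; the title text is made up for the illustration.

```sas
/* Sketch of a List Data equivalent. SASHELP.CLASS is a stand-in for the   */
/* imported data set; the title is a hypothetical example.                 */
title "Listing of SASHELP.CLASS";
proc print data=sashelp.class label;
    var Name Sex Age Height Weight;   /* the "List variables" role */
run;
title;                                /* clear the title afterwards */
```

The VAR statement plays the part of the List variables role: only the variables named there appear in the report, in the order given.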
The process flow is updated with a List Data object that produces two output objects
(by default).
Double-click on either object to view the results. Or select List Data under the process
flow menu and use the tabs on top of the window:
Let us perform a statistical task on the same SAS data set. Make the data object
active and select Tasks → Describe → Summary Statistics Wizard…
On page 2 of the wizard, we get to assign variables to roles, as shown below.
Just highlight variables (on the left panel) and use the arrows to move them into the
panels on the right, or use drag-and-drop.
We use Next> to go to page 3 of the wizard, where we select statistics and some
graphics to be output.
We select the Histogram in addition to the summary statistics and modify the default
statistics to be output using the Edit button on the top right corner. This opens an
Edit Statistics window:
We select the Median from the Percentile tab, to be added to the list of summary
statistics to be computed. Click OK, then Next>, provide a title, and click Finish.
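The choices made in the wizard can also be expressed in code. The sketch below assumes the summary table comes from PROC MEANS and the histogram from PROC UNIVARIATE; it uses SASHELP.CLASS (a sample data set shipped with SAS) and its variables Height and Weight as stand-ins for the imported data, and the title is hypothetical.

```sas
/* Sketch of summary statistics plus a histogram. SASHELP.CLASS and the    */
/* variables Height and Weight stand in for the workshop data.             */
title "Summary Statistics (illustrative)";
proc means data=sashelp.class n mean std min median max;
    var Height Weight;            /* analysis variables */
run;

proc univariate data=sashelp.class noprint;   /* noprint: suppress tables */
    var Height;
    histogram Height;             /* the Histogram option in the wizard */
run;
title;
```

Listing MEDIAN among the statistic keywords mirrors adding the Median from the Percentile tab.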
Again the process flow is updated with a Summary Statistics task object that produces
a SAS Report and a PDF file. Double-click on the SAS Report object to see the output:
The final process flow diagram is:
Any portion of the process flow may be re-executed. For example, notice that there
are too many decimal places for computed statistics printed in the summary table.
We could modify the Summary Statistics task to change the number of decimal
places printed for all computed statistics to, say, 2, and then re-execute the task.
Right-click on the Summary Statistics task and select Modify Summary Statistics.
The Summary Statistics wizard appears. Go to page 3 using Next> repeatedly.
Use the Edit button on the top right corner to open the Edit Statistics window.
Under the Basic tab, change the value of Decimal places to 2 (as shown on the next
slide).
Click OK, then Finish. Select Yes at the next prompt (i.e., replace previous results).
This branch of the tree is then re-run automatically. The outputs will be regenerated,
replacing the previous results.
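In code, limiting the decimal places is a one-option change. As a sketch, and again using SASHELP.CLASS as a stand-in for the workshop data, the MAXDEC= option of PROC MEANS sets the number of decimal places printed for the computed statistics:

```sas
/* Sketch: same statistics as before, printed with at most 2 decimals.     */
/* SASHELP.CLASS, Height, and Weight are stand-ins for the workshop data.  */
proc means data=sashelp.class n mean std min median max maxdec=2;
    var Height Weight;
run;
```

Only the printed table changes; the statistics are still computed at full precision.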
The statistics table resulting from this modification is shown below.
If the entire process flow is re-run, the originally imported data files (such as text or
Excel files) will be re-accessed. So if they have been moved or renamed, the data set will
not be created.
Also, most SAS data sets created are temporary. So if you save a process to run
it at a later time, make sure that you re-create those SAS data sets by at least re-running those nodes.
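An alternative to re-running nodes is to copy a temporary data set into a permanent library. The sketch below is a suggestion rather than part of the workshop steps: the library name mylib and the folder path are hypothetical, the folder must already exist, and the code assumes the imported data is available as WORK.BIOLOGY in the current session.

```sas
/* Sketch: keep a permanent copy of a temporary (WORK) data set.           */
/* ASSUMPTIONS: the libref "mylib" and the folder path are hypothetical;   */
/* WORK.BIOLOGY is assumed to exist in the current session.                */
libname mylib "C:\path\to\your\folder";   /* folder must already exist */

data mylib.biology;      /* written to the folder and kept after the session */
    set work.biology;    /* temporary data set, deleted when the session ends */
run;
```

In a later session, reassigning the same libname makes mylib.biology available again without re-importing the text file.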
Any part of a Process Flow diagram may be re-run by highlighting the parts to be
re-run and using the Run button.
A query is a request to retrieve data from one or more data sources and perform
tasks such as joining multiple data sets, grouping, classifying, subsetting, or sorting
data. A query might also require new variables to be created and added to a
data set.
The results of queries can then be analyzed to answer statistical questions.
In EG, we create queries using the Query Builder (a topic to be discussed in the next
EG Session). This is a task available at Tasks → Data → Query Builder…
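To give a feel for what a query looks like in code, here is a small PROC SQL sketch that subsets, groups, computes a new column, and sorts. It is an illustration only: it uses SASHELP.CLASS (a sample data set shipped with SAS) rather than the workshop data, and the output table name and the age cutoff are made up.

```sas
/* Sketch of a query: subset rows, group, compute new columns, and sort.   */
/* SASHELP.CLASS is a stand-in; WORK.CLASS_SUMMARY and the cutoff of 12    */
/* are hypothetical choices for the illustration.                          */
proc sql;
    create table work.class_summary as
    select Sex,
           count(*)     as N,
           mean(Height) as MeanHeight format=8.2   /* new computed column */
    from sashelp.class
    where Age >= 12          /* subsetting */
    group by Sex             /* grouping   */
    order by Sex;            /* sorting    */
quit;
```

The WHERE, GROUP BY, and ORDER BY clauses correspond to the subsetting, grouping, and sorting operations described above.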