SAS Workshop Introduction to Enterprise Guide Day 3 - Afternoon Session

IOWA STATE UNIVERSITY
May 11, 2016
Presented by: Mervyn Marasinghe
Introduction to SAS Enterprise Guide
Start up a new project by clicking on the EG icon.
Click File > Import Data, select the Anthro.xls file from your Desktop, and click Open.
The Import Data wizard appears (we’ve seen this before). We can page through it using
Next> (and <Back if needed). We click Next> and go to page 2.
Check that we have the correct worksheet and that the proper options are checked. In our
case, they are fine, so we click Next> and go to page 3.
Here we check to make sure that SAS is importing the fields correctly. We could change the
labels and some of the output formats but it is not necessary. Note that all fields are numeric.
We decide not to change anything here,
so click Finish.
The SAS data set created is displayed
here along with the new Process Flow.
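For those who prefer to see code, a similar import can be sketched with PROC IMPORT. This is only a sketch under stated assumptions and not the code that the Import Data wizard generates: the file path and the output data set name work.anthro are made up for illustration, and DBMS=XLS assumes that SAS/ACCESS Interface to PC Files is available on your installation.

```sas
/* Hypothetical path: point this at your own copy of Anthro.xls */
proc import datafile="C:\Users\yourname\Desktop\Anthro.xls"
            out=work.anthro      /* assumed output data set name */
            dbms=xls             /* needs SAS/ACCESS Interface to PC Files */
            replace;
    getnames=yes;                /* first row holds the column names */
run;

/* List the imported columns and their types (all should be numeric) */
proc contents data=work.anthro;
run;
```

If EG is connected to a remote SAS server, the path must be one that the server can see, which is one reason the wizard is often the easier route.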
Now we’ll import an Excel file containing some personal information such as age and gender
of the participants in this study. The name of this file is AnthroPersonal.xls
Data files may also be dragged from their location and dropped onto the Process Flow
area in EG. The Import Data wizard then opens automatically.
On page 2, we select worksheet Sheet 1 and click Next> to go to page 3 of the Import Data wizard.
On page 3, we can adjust the output formats so that they are suitable for printing. Here we see
that nothing needs to be changed.
Select Next> and then Finish. We can see the data set and the modified Process Flow.
Now that we have two data sets, the first part of our analysis is to combine the information in
the two data sets. Recall that both data sets have one common variable: Id
Right click on the SAS data set imported from Anthro.xls and select Query Builder…
Now click the Add Tables button on the Query Builder wizard.
Your Project folder will open. Two icons appear that seem to have the same name. If
you select a data set (i.e., click on the icon), you will be able to identify which data set it is.
Select the data set imported from AnthroPersonal.txt. See below for a screen shot.
You can move to the appropriate Project folder if necessary (we are already there since
this is the only project open).
After selecting the file, click Open. Make sure that the File name: field says Data Imported
from AnthroPersonal.txt
The data will be added to the Query Builder. See a screen shot of the Query Builder wizard
as it should look after the above action.
Now that we have both data sets in the Query Builder, we may select variables from each
data set to create an output data set. Query Builder will automatically create Proc SQL
statements to do the join operation.
We hold down the Shift key and select the variables we want included in the join (from
one input table at a time). Using the left mouse button, drag the selected variables to the
right side pane.
Note that we select the variable Id only from one of the input data tables.
The Query Builder will now display the merged file:
You may examine the Proc SQL code by clicking on the Preview button. The code appears
in a separate window that you may close after having a look.
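The previewed code will look broadly like an inner join on Id. The following is a hand-written sketch of such a join, not the exact code EG generates: the table names work.anthro and work.anthropersonal and the output name work.anthro_merged are assumptions, and the column names Age and Gender are assumed from the description of the personal data file.

```sas
proc sql;
    /* Keep every column from the measurements table, plus the
       personal columns; Id is taken from one table only */
    create table work.anthro_merged as
    select a.*,
           p.Age,
           p.Gender
    from work.anthro as a
         inner join work.anthropersonal as p
         on a.Id = p.Id;     /* match rows on the common variable */
quit;
```

Because the rows are matched on the value of Id, the two input tables do not need to be sorted in the same order.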
Recall that we may create new variables when we create datasets in a SAS Program. We now
learn how to create a new variable using the Query Builder…
Recall that the Query Builder wizard is still open. Click on the Computed Columns button (on
the top panel on the left) and the following dialog box opens:
Click on the New button and the following New Computed Column wizard opens.
Check the last option, Advanced Expression, and click Next>. This will take you to page 2 of the
wizard, shown on the next slide.
In this window enter the expression
10000.0*Mass/Height**2
in the Enter an expression pane.
In constructing the expression, you may use
any of the symbols displayed below this pane.
Also you may click open the Functions list and
double-click on any function available there to
insert it anywhere in the expression.
Click Next> to go to page 3 and enter new values for Column Name and Label.
Then click on Change… to select an
appropriate Format. This is done on the
Formats wizard that pops up, as shown on the
next slide.
For BMI, an appropriate format would be a numeric format like w.d or bestw.d. Using the
Formats dialog box we set Best5.1 as the format, as shown above. Click Ok and, back on page
3, click Finish. Now we see that we have a new computed variable.
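In code, a computed column is simply an expression with an alias in the SELECT clause. The sketch below is an illustration under assumptions rather than the code EG writes: the table names, the label text, and the columns Age and Gender are hypothetical, and the plain 5.1 format (one of the w.d formats mentioned above) is used in place of the format picked in the dialog. The factor 10000 in the expression suggests that Mass is in kilograms and Height in centimeters, but the source does not state the units.

```sas
proc sql;
    create table work.anthro_bmi as
    select a.*,
           p.Age,
           p.Gender,
           /* Same expression as entered in the wizard */
           10000.0 * a.Mass / a.Height**2 as BMI
               label='Body mass index'   /* assumed label text */
               format=5.1                /* width 5, one decimal place */
    from work.anthro as a
         inner join work.anthropersonal as p
         on a.Id = p.Id;
quit;
```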
We can create as many as we wish but this will do for us here. Click the Close button on the
Computed Columns box (shown above). Now you will see the new variable in Query Builder:
If we wish, we may change or add attributes of any of the other columns. Select the Column Name
and click the third button on the right margin (the Properties button). This opens a
Properties for … dialog that allows you to specify labels, formats, etc., for the selected column.
Click the Run button and you will see a display of the combined data table and an updated
Process Flow diagram:
If we check the output data we will see that the two data sets are correctly merged, even though
the observations in the two input data sets are not in the same order of Id values.
If the tables to be joined do not have columns that have the same name and data type, a
warning is displayed requesting that a manual join be performed. We will illustrate manual
join later.
We can perform any statistical analysis we wish on the merged data set using the Tasks
list. For example, some summary statistics can be produced using Tasks > Describe >
Summary Statistics Wizard, which we already used earlier in Part A today.
Instead, let us perform a two sample t-test this time to test whether the BMI means are different for
the male and female populations.
Select the merged data set object on the Process Flow diagram and select Tasks > ANOVA >
t Test (see next slide for a screen shot of the result of this action).
On the resulting t Test task dialog check the Two Sample option. You may choose any of the
options on the left pane to continue the analysis.
Now select Data and drag Gender from the left pane onto Classification variable on the right
pane and similarly drag BMI to be the Analysis variable.
Select Plots and check Histogram and Normal quantile-quantile (Q-Q) plot. Click Run.
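The selections made in the task dialog correspond roughly to a PROC TTEST step like the one sketched here. The data set name work.anthro_bmi is an assumption, and the code EG generates will contain additional housekeeping statements.

```sas
ods graphics on;   /* the plots are produced through ODS Graphics */

proc ttest data=work.anthro_bmi plots=(histogram qq);
    class Gender;  /* classification variable: defines the two groups */
    var BMI;       /* analysis variable */
run;
```

For a two sample comparison, PROC TTEST reports both the pooled and the Satterthwaite versions of the test, along with a test for equality of the two variances, so you can choose the appropriate line of output.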
You can see the SAS Report produced from this analysis. The current Process Flow diagram is:
The user can customize the output produced. Right click on the t Test task object (on the
Process Flow) and select Properties.
This opens a dialog box where you first select Results and then the Customize
results… button (see next slide for a screen shot).
Under Customize results…, check the boxes that correspond to the output formats you want to
request; here we checked SAS Report and RTF. Click Ok.
Right click the t Test task object and select Run. Answer “yes” to replace the old output (on
the Process Flow diagram) with the output from this run.
A screen shot of the modified Process Flow.
Selecting properties of t Test output
Use the tabs to peruse the different outputs.
Select the merged data set object on the Process Flow diagram and select Tasks > Regression >
Linear Regression. This will open the Linear Regression task dialog.
Drag BMI and drop it as the Dependent Variable on the right pane.
Select all Anthro variables (except Id, Mass, and Height), drag them to the right pane, and
drop them on the Explanatory variables role as shown. Add the personal data variable Age as an
additional explanatory variable.
Select Model from the list in the upper left panel. See next slide for a screen shot.
From the Model selection method drop-down list, select R-squared selection method.
Check some Model fit statistics in the lower middle pane, as shown. These statistics
will be computed for each selected model.
Select Statistics from the list in the upper left panel.
Check a few items to be output, such as confidence intervals for the parameters and
variance inflation factors (VIFs).
Select Plots from the list in the upper left panel.
Select some Custom plots by checking items from the list provided, as shown above.
Next click Run. The Linear Regression task is executed. We can peruse the output
produced. Notice that the plots output are for the full model.
What we notice is that too many models are produced by the R-Square method. Is
there an option to limit the number of models considered?
Under the Linear Regression task, click on Modify Task and check for such an
option. Since there isn’t one, we need to modify the actual SAS code to make this
work.
Click the Code tab under the Linear Regression task (see under the top menus).
Scroll down to the PROC REG step in the SAS program. Add the following line after
the SELECTION=RSQUARE line: START=3 STOP=5 BEST=3 (Note the code
completion when you type!) These options restrict the search to models with 3 to 5
explanatory variables and display only the 3 best models of each size.
Note: In the current version of EG this will produce a separate SAS program called
Code for Linear Regression that you can modify. This then can be run as a
separate task.
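After the edit, the PROC REG step should look something like the following sketch. The data set name and the explanatory variable names Var1 through Var5 are placeholders, since the actual Anthro column names are not listed here, and the generated program will contain other statements as well. The point to note is that the added options belong in the MODEL statement, after the slash and before the closing semicolon.

```sas
proc reg data=work.anthro_bmi;
    /* Var1-Var5 are placeholders for the Anthro measurements */
    model BMI = Var1 Var2 Var3 Var4 Var5 Age
          / selection=rsquare
            start=3 stop=5   /* consider subsets of 3 to 5 regressors */
            best=3           /* show the 3 best subsets of each size  */
            adjrsq cp;       /* extra fit statistics for each subset  */
run;
quit;
```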
Click Run on top of the task window and check the output. We get a more
manageable output.
The final Process Flow diagram