SAS Workshop
Introduction to Enterprise Guide
Day 3 - Afternoon Session
IOWA STATE UNIVERSITY
May 11, 2016
Presented by: Mervyn Marasinghe

Introduction to SAS Enterprise Guide

Start up a new project by clicking on the EG icon. Click File > Import Data, select the Anthro.xls file from your Desktop and click Open. The Import Data wizard appears (we’ve seen this before); we can page through it using Next> (and <Back if needed). We click Next> and go to page 2. Check to make sure we have the correct worksheet and that the proper options are checked. In our case they are OK, so we click Next> and go to page 3.

On page 3 we check that SAS is importing the fields correctly. We could change the labels and some of the output formats, but it is not necessary. Note that all fields are numeric. We decide not to change anything here, so click Finish. The SAS data set created is displayed along with the new Process Flow.

Now we’ll import an Excel file containing some personal information, such as age and gender, of the participants in this study. The name of this file is AnthroPersonal.xls. Data files may also be selected in their location and dragged and dropped onto the Process Flow area in EG; an Import Data wizard then opens automatically.

We select worksheet Sheet 1 and click Next> to leave page 2 and go to page 3 of the Import Data wizard. On page 3, we can adjust the output formats to those most suitable for printing. Here we see that nothing needs to be changed. Select Next> and then Finish. We can see the data set and the modified Process Flow.

Now that we have two data sets, the first part of our analysis is to combine the information in both data sets.
Recall that both data sets have one common variable: Id. Right-click on the SAS data set imported from Anthro.xls and select Query Builder…

Now click the Add Tables button on the Query Builder wizard. Your Project folder will open. Two icons that appear to have the same name are shown. If you select a data set (i.e., click on its icon), you will be able to identify which data set it is. Select the data set imported from AnthroPersonal.txt. See below for a screen shot. You can move to the appropriate Project folder if necessary (we are already there since this is the only project open).

After selecting the file, make sure that the File name: field says Data Imported from AnthroPersonal.txt, then click Open. The data will be added to the Query Builder. See the screen shot of the Query Builder wizard as it should look after this action.

Now that we have both data sets in the Query Builder, we may select variables from each data set to create an output data set. Query Builder will automatically create Proc SQL statements to do the join operation. Hold down the Shift key and select the variables you want included in the join (from one input table at a time). Using the left mouse button, drag the selected variables to the right-side pane. Note that we select the variable Id from only one of the input data tables.

The Query Builder will now display the merged file. You may examine the Proc SQL code by clicking on the Preview button. The code appears in a separate window that you may close after having a look.

Recall that we may create new variables when we create data sets in a SAS program. We now learn how to create a new variable using the Query Builder. Recall that the Query Builder wizard is still open.
Click on the Computed Columns button (on the top panel on the left) and the Computed Columns dialog box opens. Click on the New button and the New Computed Column wizard opens. Check the last option, Advanced Expression, and click Next>. This will take you to page 2 of the wizard, shown on the next slide.

In this window enter the expression 10000.0*Mass/Height**2 in the Enter an expression pane. In constructing the expression, you may use any of the symbols displayed below this pane. You may also open the Functions list and double-click on any function available there to insert it anywhere in the expression. Click Next> to go to page 3 and enter new values for Column Name and Label. Then click on Change… to select an appropriate Format. This is done in the Formats wizard that pops up, as shown on the next slide.

For BMI, an appropriate format would be a numeric format like w.d or bestw.d. Using the Formats dialog box we set Best5.1 as the format, as shown above. Click OK and, back on page 3, click Finish. Now we see that we have a new computed variable. We can create as many as we wish, but this will do for us here.

Click the Close button on the Computed Columns box (shown above). Now you will see the new variable in Query Builder. If we wish, we may change or add attributes of any of the other columns: select the Column Name and click on the third button on the right margin (the Properties button). This opens a Properties for … dialog that allows you to specify labels, formats, etc., for the selected column.
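
For readers who prefer to see code, the join on Id and the computed BMI column built in the Query Builder correspond roughly to a Proc SQL step like the sketch below. It is not the code EG generates. The table names (work.anthro, work.anthropersonal, work.anthro_merged), the toy data values, the character type of Gender, the label text, and the 5.1 format (a plain w.d format rather than the Best5.1 chosen in the dialog) are all assumptions made for illustration. Only Id, Mass, Height, Age, Gender and the BMI expression come from the walkthrough.

```sas
/* Toy stand-ins for the two imported tables; the values are made up. */
data work.anthro;
   input Id Mass Height;
   datalines;
1 70 175
2 58 162
3 82 180
;
run;

/* Rows are deliberately in a different Id order than work.anthro. */
data work.anthropersonal;
   input Id Age Gender $;
   datalines;
3 41 M
1 29 F
2 35 F
;
run;

proc sql;
   create table work.anthro_merged as
   select a.Id,                 /* Id is taken from one input table only */
          a.Mass,
          a.Height,
          p.Age,
          p.Gender,
          /* Same expression as entered in the Advanced Expression pane */
          10000.0 * a.Mass / a.Height**2 as BMI
             label = 'Body mass index' format = 5.1
   from work.anthro as a
        inner join work.anthropersonal as p
           on a.Id = p.Id;      /* match rows on the common variable Id */
quit;
```

Because the rows are matched on the value of Id, the order of observations in the two input tables does not matter. The Preview button in the Query Builder shows the code EG actually produces, which is the authoritative version.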
Click the Run button and you will see a display of the combined data table and an updated Process Flow diagram.

If we check the output data we will see that the two data sets are correctly merged, although the observations in the two input data sets are in a different order of values for Id. If the tables to be joined do not have columns with the same name and data type, a warning is displayed requesting that a manual join be performed. We will illustrate a manual join later.

We can perform any statistical analysis we wish on the merged data set using the Tasks list. For example, some summary statistics can be produced using Tasks > Describe > Summary Statistics Wizard, which we already used earlier in Part A today. Instead, let us perform a two-sample t test this time to test whether the BMI means are different for the male and female populations.

Select the merged data set object on the Process Flow diagram and select Tasks > ANOVA > t Test (see next slide for a screen shot of the result of this action). On the resulting t Test task dialog check the Two Sample option. You may choose any of the options on the left pane to continue the analysis.

Now select Data and drag Gender from the left pane onto Classification variable on the right pane, and similarly drag BMI to be the Analysis variable. Select Plots and check Histogram and Normal quantile-quantile (Q-Q) plot. Click Run.

You can see the SAS Report produced from this analysis, along with the current Process Flow diagram. The user is able to customize the output produced. Right-click on the t Test task object (on the Process Flow) and select Properties. This opens a dialog box where you first select Results and then click the Customize results… button.
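
As an aside, the two-sample t test set up in the task dialog above can be sketched in code roughly as follows. This is a minimal illustration, not the code EG writes for the task. The table name work.anthro_merged and the simulated data are assumptions, and the BMI values are random numbers, not study data. Gender as the classification variable, BMI as the analysis variable, and the histogram and Q-Q plot requests follow the walkthrough.

```sas
/* Simulated stand-in for the merged table; values are random, not study data. */
data work.anthro_merged;
   call streaminit(2016);
   length Gender $1;
   do Id = 1 to 40;
      if mod(Id, 2) = 0 then Gender = 'F';
      else Gender = 'M';
      BMI = rand('normal', 24, 3);
      output;
   end;
run;

ods graphics on;

/* Two-sample t test of BMI by Gender, with the two plots checked in the dialog */
proc ttest data=work.anthro_merged plots(only)=(histogram qqplot);
   class Gender;   /* classification variable (two groups) */
   var BMI;        /* analysis variable */
run;
```

The CLASS variable must have exactly two levels for PROC TTEST to carry out the two-sample comparison.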
A screen shot of this dialog appears on the next slide. Under Customize results…, check the boxes that correspond to the output formats you want to request; here we checked SAS Report and RTF. Click OK. Right-click the t Test task object and select Run. Answer “yes” to replace the old output (on the Process Flow diagram) with the output resulting from this run.

Screen shots: the modified Process Flow; selecting properties of the t Test output. Use the tabs to peruse the different outputs.

Select the merged data set object on the Process Flow diagram and select Tasks > Regression > Linear Regression. This will open the Linear Regression task dialog. Drag BMI and drop it as the Dependent variable on the right pane. Select all Anthro variables (except Id, Mass, and Height), drag them to the right pane and drop them in the Explanatory variables role, as shown. Add the personal data variable Age as an additional explanatory variable.

Select Model from the list in the upper left panel. See next slide for a screen shot. From the Model selection method drop-down list, select the R-squared selection method. Check some Model fit statistics in the lower middle pane, as shown. These statistics will be computed for each selected model.

Select Statistics from the list in the upper left panel. Check a few items to be output, such as confidence intervals for the parameters and variance inflation factors (VIFs).

Select Plots from the list in the upper left panel. Select some Custom plots by checking items from the list provided, as shown above.

Next click Run. The Linear Regression task is executed, and we can peruse the output produced. Notice that the plots output are for the full model. What we notice is that too many models are produced by the R-squared method.
Is there an option to limit the number of models considered? Under the Linear Regression task, click on Modify Task and check for such an option. Since there isn’t one, we need to modify the actual SAS code to make this work. Click the Code tab under the Linear Regression task (see under the top menus). Scroll down to the PROC REG step in the SAS program. Add the following line after the SELECTION=RSQUARE line:

START=3 STOP=5 BEST=3

(Note the code completion when you type!)

Note: In the current version of EG this will produce a separate SAS program called Code for Linear Regression that you can modify. This can then be run as a separate task.

Click Run on top of the task window and check the output. We get a more manageable output.

The final Process Flow diagram.
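
To make the edit to the PROC REG step concrete, here is a hedged sketch of what a MODEL statement with the added options might look like. It is not the code EG generates for the task. The explanatory variable names (Waist, Hip, Chest, Thigh, Arm) are hypothetical placeholders for the Anthro variables, the table name and the simulated data are assumptions, and ADJRSQ and CP are just two illustrative choices of model fit statistics. With SELECTION=RSQUARE, START= and STOP= give the smallest and largest number of regressors in the subset models reported, and BEST= limits how many models are shown for each subset size.

```sas
/* Simulated stand-in for the merged table; variable names other than
   Id, Age and BMI are hypothetical, and all values are random. */
data work.anthro_merged;
   call streaminit(2016);
   do Id = 1 to 60;
      Waist = rand('normal', 85, 10);
      Hip   = rand('normal', 98, 8);
      Chest = rand('normal', 95, 9);
      Thigh = rand('normal', 55, 5);
      Arm   = rand('normal', 30, 3);
      Age   = 20 + floor(41 * rand('uniform'));
      BMI   = 0.15*Waist + 0.08*Hip + 0.03*Age + rand('normal', 0, 1.5);
      output;
   end;
run;

proc reg data=work.anthro_merged;
   model BMI = Waist Hip Chest Thigh Arm Age
         / selection=rsquare      /* all-subsets selection ranked by R-squared */
           start=3 stop=5 best=3  /* only 3- to 5-variable models, best 3 of each size */
           adjrsq cp;             /* extra fit statistics reported per model */
run;
quit;
```

With six candidate regressors, this reports at most nine models (three each for subset sizes 3, 4 and 5) instead of every possible subset.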