SAS Workshop: Introduction to Enterprise Guide
Day 3 - Morning Session
IOWA STATE UNIVERSITY
May 11, 2016
Presented by: Mervyn Marasinghe

Introduction to SAS Enterprise Guide

Start SAS Enterprise Guide (EG) on a PC from the Start menu, or by double-clicking the Enterprise Guide icon on the desktop. Since you are a first-time user, you may close the welcome window that asks you to open an existing SAS program or project.

The Project Tree displays a hierarchical view of the active project. You can delete, rename, and reorder the items in the project using this window.

The workspace area initially contains only the Process Flow window. The Process Flow displays the objects in your project and the relationships between them.

If you click Servers, you can find data on your local SAS server or on a remote SAS server (if you have access to one).

When additional windows are produced in the workspace area as a result of running a process, they can be opened using tabs that appear at the top of this window.

The default window layout consists of the Project Tree, the Server List, and the workspace area. The workspace area is the main area of the SAS EG application and is used to display, for example, your data, task results, and process flows. When you generate reports or open data, other windows open in the workspace with a tabbed interface.

The work that you do in the workspace is saved in projects. A project is a collection of objects: data, tasks, and results. You can save a project and its contents to any location by selecting File > Save Project As…; the project file is saved with the extension .egp.

Let us execute an already written SAS program (progD1.sas) in EG.
Click Program > Open Program and select progD1.sas from your folder. The program appears in the EG workspace. Notice that the tab at the top of the workspace has changed from Process Flow to progD1. Execute the program by selecting Run.

The resulting log and output(s) produce new tabs in your workspace window. The program also produces the SAS data set named biology. The output is produced as a SAS Report, which is the default result format. Examine each of these by selecting the corresponding tab.

Click the down arrow next to the progD1 tab at the top of the workspace. A drop-down list containing the Process Flow item appears; select it to open the Process Flow window. This is a simple example of a Process Flow diagram: a SAS program is run to produce two output objects, the SAS data set and the SAS Report. We do not yet see a task object, because the processing is hidden inside the SAS program.

Next, we repeat the same process but request that the output also be directed to a PDF file. To do this, select Tools > Options…; then, in the left panel of the resulting dialog box, click the Results item and check the PDF box on the right. Now reopen the process flow diagram and click Run Process Flow (found under the Run tab). The entire process is re-executed and the output replaced. Additionally, a Results - PDF tab now appears in the progD1 window. Open the Process Flow window as before, using the progD1 tab at the top of the workspace, to view the new process flow chart with the PDF output object.

Next, we will read the biology data set directly from a text file and carry out a complete statistical analysis within EG (as a new process). Save the current project, exit EG, and start it up again.
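In hand-written SAS code, sending results to a PDF file (as the PDF option under Tools > Options does) is done with an ODS destination. The sketch below is only a rough equivalent, not the code EG itself generates; the output file name progD1_output.pdf is a hypothetical placeholder, and it assumes the biology data set produced by progD1.sas is in the WORK library.

```sas
/* Minimal sketch: also route procedure output to a PDF file.              */
/* "progD1_output.pdf" is a hypothetical file name; WORK.BIOLOGY is assumed */
/* to be the data set created by progD1.sas.                              */
ods pdf file="progD1_output.pdf";

proc print data=work.biology;   /* output goes to every open ODS destination, including PDF */
run;

ods pdf close;                  /* close the destination so the PDF file is written */
```

Closing the destination matters: until ODS PDF CLOSE runs, the PDF file is not finalized.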
First, let us import the text file and create the SAS data set needed for the analysis. In EG, click File > Import Data, navigate to your folder on the Desktop, and double-click the biology.txt file. The Import Data wizard shown below opens.

Page through the wizard using the Next> button at the bottom of the page. On page 2, select the Fixed columns button and mark the column breaks by clicking on the ruler. This is the easiest way to read a fixed-width data file. Because this file does not contain field names in its first line, go to the next page using Next>.

Defining field attributes is the most important step in importing data. Here you assign names, informats, formats, and labels to your SAS variables. For example, select Field 1 (F1), click the Modify… button, and set its attributes; this field becomes the variable ID. Name and Label are the only attributes that need to be reset; use the defaults for the others. For example, we could change the Input Format attribute for ID to 4., but it would make no difference. Press OK.

Next, select Field 2 (F2) and click Modify. The next screen shot shows the values defined for Field 2: we changed the informat to $4. using the drop-down menu…

We continue until the attributes for all 6 fields are defined. Note that F7 is excluded from the SAS data set by unchecking its Inc box on the left.

We then exit the Import Data wizard. Note that the current Process Flow, displayed below, has the Import Data task object in the middle. Right-click the Data Imported object and select Open; the SAS data set is displayed.

Now we will apply several tasks to this data set. First, let us obtain a simple SAS Report listing the data.
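The Import Data wizard steps above correspond roughly to a DATA step with formatted input. In the sketch below, only the name ID and the informats 4. (F1) and $4. (F2) come from the wizard steps; the file path, the column positions, the variable names Grp and X3 to X6, and the label text are all hypothetical placeholders to be replaced with the actual layout of biology.txt.

```sas
/* Minimal sketch of a fixed-width import.                                 */
/* Path, column positions, names after ID, and label are HYPOTHETICAL.     */
data work.biology;
  infile "C:\Users\you\Desktop\biology.txt" truncover;  /* hypothetical location */
  input @1  ID   4.      /* F1: numeric, informat 4.                        */
        @6  Grp  $4.     /* F2: character, informat $4. (hypothetical name) */
        @11 X3   5.      /* F3 to F6: hypothetical positions and widths     */
        @17 X4   5.
        @23 X5   5.
        @29 X6   5.;     /* F7 is not read, like unchecking its Inc box     */
  label ID = "Identification number";   /* hypothetical label text */
run;
```

Each @n pointer plays the role of a column break marked on the wizard's ruler, and leaving a field out of the INPUT statement has the same effect as excluding it from the data set.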
Go back to the Process Flow, click the data object (to make it active), and select Tasks > Describe > List Data… from the menu bar. The List Data dialog box opens.

We use this dialog to assign roles to variables. Here we want all of the variables to appear in the SAS Report. We can drag each variable from the left pane to the List variables role in the right pane, one at a time; alternatively, shift-click the first and last variable names on the left to select them all, and drag them under List variables in the right pane. Click the Run button at the bottom. Note that other options are available in the panel on the left margin.

The process flow is updated with a List Data object that produces two output objects (by default). Double-click either object to view the results, or select List Data from the Process Flow drop-down menu and use the tabs at the top of the window.

Next, let us perform a statistical task on the same SAS data set. Make the data object active and select Tasks > Describe > Summary Statistics Wizard…. On page 2 of the wizard, we assign variables to roles, as shown below: highlight variables in the left panel and use the arrows to move them into the panels on the right, or use drag-and-drop. Use Next> to go to page 3 of the wizard, where we select the statistics and graphics to be output.

We select Histogram in addition to the summary statistics, and modify the default list of statistics using the Edit button in the top right corner. This opens an Edit Statistics window, where we select Median on the Percentile tab to add it to the summary statistics to be computed. Click OK, then Next>, provide a title, and click Finish.

Again, the process flow is updated, this time with a Summary Statistics task object that produces a SAS Report and a PDF file.
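Each EG task generates SAS code behind the scenes. As a rough, hand-written approximation (not the code EG actually generates), the List Data and Summary Statistics steps above resemble the procedures below. The data set name work.biology and the analysis variables X3 and X4 are hypothetical placeholders, and PROC SGPLOT is used here for the histogram, which may not be the procedure the task itself uses.

```sas
/* List Data: print every variable of the imported data set */
proc print data=work.biology label;   /* LABEL uses variable labels as column headings */
  var _all_;
run;

/* Summary Statistics: the usual defaults plus the median.              */
/* MAXDEC=2 mirrors a Decimal places setting of 2; omit it for defaults. */
proc means data=work.biology n mean std min max median maxdec=2;
  var X3 X4;                          /* hypothetical analysis variables */
run;

/* Histogram of one analysis variable */
proc sgplot data=work.biology;
  histogram X3;                       /* hypothetical variable */
run;
```

Listing the statistic keywords explicitly in PROC MEANS replaces its default set (N, mean, standard deviation, minimum, maximum), which is why those five are repeated alongside MEDIAN.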
Double-click the SAS Report object to see the output. The final process flow diagram is shown below.

Any portion of the process flow may be re-executed. For example, notice that too many decimal places are printed for the computed statistics in the summary table. We can modify the Summary Statistics task to print all computed statistics with, say, 2 decimal places and then re-execute the task. Right-click Summary Statistics and select Modify Summary Statistics. The Summary Statistics wizard appears; go to page 3 by clicking Next> repeatedly, and use the Edit button in the top right corner to open the Edit Statistics window. Under the Basic tab, change the value of Decimal places to 2 (as shown on the next slide).

Click OK, then Finish. Answer Yes to the next prompt (i.e., replace the previous results). This branch of the process flow is then re-run automatically, and its outputs are regenerated, replacing the previous results. The statistics table resulting from this modification is shown below.

If the entire process flow is re-run, the originally imported data files (such as text or Excel files) are re-accessed, so if they have been moved or renamed, the data set will not be created. Also, most SAS data sets created in a session are temporary. So if you save a process flow to run at a later time, make sure that you re-create those SAS data sets by at least re-running the nodes that create them. Any part of a Process Flow diagram may be re-run by highlighting the parts to be re-run and using the Run button.

A query is a request to retrieve data from one or more data sources and perform tasks such as joining multiple data sets, grouping, classifying, subsetting, or sorting data. A query might also require that new variables be created and added to a data set. The results of queries can then be analyzed to answer statistical questions.
In EG, we create queries using the Query Builder (a topic for the next EG session), a task available under Tasks > Data > Query Builder…
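The Query Builder assembles PROC SQL code from the choices made in its dialogs. As a rough illustration of the query operations listed above (subsetting, creating a new variable, sorting), a hand-written query might look like the sketch below; the output table name biology_query, the variables Grp, X3 and X4, and the filter condition are all hypothetical.

```sas
/* Minimal PROC SQL sketch; table and variable names are HYPOTHETICAL. */
proc sql;
  create table work.biology_query as
  select ID,
         Grp,
         X3,
         X4,
         X3 / X4 as Ratio        /* new computed variable */
  from work.biology
  where X4 > 0                   /* subsetting; also avoids division by zero */
  order by Grp, ID;              /* sorting */
quit;
```

PROC SQL ends with QUIT rather than RUN, because the procedure stays active and executes each statement as it is submitted.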