Chapter 5 Using SAS® ETL Studio

Section 5.1 SAS ETL Studio Overview

What Is SAS ETL Studio?
SAS ETL Studio, a Java application, is a visual design tool that helps organizations quickly build, implement, and manage ETL processes from source to destination, regardless of the data sources or platforms. Users can standardize metadata across the organization and perform in-depth transformations with minimal programming or manual work to meet enterprise data integration requirements and to support business and analytic intelligence.

SAS ETL Studio enables you to perform the following tasks:
  the Extraction of data from operational data stores
  the Transformation of this data
  the Loading of the extracted data into your data warehouse or data mart.

SAS ETL Studio is an application that enables you to manage ETL process flows by allowing
  specification of metadata for sources, such as tables in an operational system
  specification of metadata for targets (the tables and other data stores in a data warehouse)
  creation of jobs that specify how data is extracted, transformed, and loaded from a source to a target.

SAS ETL Studio: Change Management
In SAS ETL Studio, the change management facility enables multiple SAS ETL Studio users to work with the same metadata repository at the same time without overwriting each other’s changes.

SAS ETL Studio: Data Surveyor Wizards
Optional Data Surveyor wizards can be licensed that provide access to the metadata in enterprise applications, such as
  PeopleSoft
  SAP R/3
  Siebel
  Oracle Applications.

SAS ETL Studio: Metadata CWM Compliant
The metadata maintained by SAS ETL Studio is CWM (Common Warehouse Metamodel) compliant and portable to other CWM-compliant applications. Likewise, metadata from other CWM-compliant applications (for example, data modeling tools) can be imported easily into SAS ETL Studio.

SAS ETL Studio: Data Quality
SAS ETL Studio is fully integrated with the data quality software from DataFlux Corporation. Both products now use the same Quality Knowledge Base (QKB), which contains the rules, routines, and schemes necessary to integrate data quality into the ETL process.

Extending SAS ETL Studio Functionality
The SAS ETL Studio functionality is extended by Java plug-ins packaged with the product. Further extensions can be implemented by
  writing additional plug-ins (Java programming required)
  using the Transformation Generator Wizard (no Java programming required).

Server Connections and SAS ETL Studio
As a client, SAS ETL Studio must connect to a SAS Metadata Server to read or write metadata. It must connect to other servers to run SAS code, to connect to a third-party database management system, or to perform other tasks.

Interaction with SAS Application Servers
SAS ETL Studio can use different types of application servers:
  SAS Metadata Server: required to read and write metadata in a SAS metadata repository.
  SAS Workspace Server: required to execute SAS code and access data.
  SAS/CONNECT Server: required to submit generated SAS code to machines that are remote to the default SAS application server.

Section 5.2 The SAS ETL Studio Interface

SAS ETL Studio: The Interface
SAS ETL Studio is a Java client developed to control the ETL process. The interface has several “ease-of-use” features, including
  copy and paste in any text field
  multiple windows open at one time (including multiple process flow diagrams)
  Windows look and feel
  wizard-driven interfaces.

Tools, Menus, and Online Help
SAS ETL Studio takes full advantage of toolbars and pull-down menus. The icons available on the toolbar depend on which window is active within the interface.

The Shortcut Bar
One of the most significant features of SAS ETL Studio is the new process-driven functionality.
Processes are available via a Shortcut bar on the far left side of the main SAS ETL Studio window.

The Shortcut bar is populated with icons for each task an ETL user would typically perform, including:
  Source Designer: defines metadata about the source(s) for a process.
  Metadata Importer: imports metadata from other applications.
  Metadata Exporter: exports metadata to be used by other applications.
  Process Designer: defines metadata about the ETL processes.
  Target Designer: defines metadata about the target table(s) to be created by the process.
  Options: provides numerous options for the SAS ETL Studio user to customize the look and feel of the application.

Tree View
The SAS ETL Studio Tree View enables you to
  view the metadata associated with the current metadata repository
  display different views or “trees” of the current repository.

There are several tabs available in the tree view area:
  Inventory Tree: lists the metadata objects in the default metadata repository (and any dependent repositories), organized by predetermined groupings.
  Custom Tree: lists the metadata objects in the default metadata repository (and any dependent repositories), organized by user-defined groupings of objects.
  Process Library Tree: lists the available data transformations to be used in the ETL process.

Process Library Tree
The Process Library tree displays a collection of transformation templates. Four collections (folders) of templates are provided with SAS ETL Studio:
  Analysis
  Data Transforms
  Output
  Publish.

Process Designer View
The Process Designer window is the workspace for building ETL processes. The Process Designer view appears as the final step in the Process Designer wizard. Once the process is defined, the Process Designer view is populated with icons that represent the chosen processes.
The Process Designer window can be used to
  view SQL source code
  review the SAS log (from submitting jobs)
  view the resulting output from running a SAS job.

Process Designer and Overview Windows
(Figure: the Process Designer view and the Overview window.)

Overview Window
The Overview window shows you the complete process from the process view. From within the Overview window, you can control which part of the process is displayed in the Process View window.

SAS ETL Studio Wizards
Shortcuts invoke wizards that aid the user in performing various tasks with SAS ETL Studio. Some of these wizards are
  Source Designer
  Target Designer
  New Job.

Source Designer
The Source Designer is a wizard-driven interface that enables you to define the physical layout of existing tables using a data dictionary or metadata information from the source system. The result of running the Source Designer successfully is a metadata registration that describes the data source.

Target Designer
The Target Designer is a wizard that allows metadata to be entered for a target. In designing the target table, you can
  access any metadata about any source tables and columns registered in the metadata repository
  override any metadata that was imported from another source and add new columns to the target table
  create indexes on the target table being created.

The person designing the target table has full control over the type of table being built. The types of targets that can be built include
  database types that are supported by the SAS/ACCESS products
  SAS data sets (including both data files and data views)
  SAS/SHARE data sets
  SPDE tables.

New Job Wizard
The New Job wizard enables you to define the metadata necessary to run an ETL process to load data into a target or targets.

Additional Wizards
Other wizards available to provide assistance with various tasks in SAS ETL Studio include
  Metadata Importer
  Metadata Exporter
  Cube Designer
  Transformation Generator wizard.
You can also install optional data surveyor wizards, which provide access to the metadata in enterprise applications, such as PeopleSoft, SAP R/3, Siebel, and Oracle.

Options Window
The Options window can be used to define standard settings for the SAS ETL Studio interface. There are several tabs in the Options window:
  General
  Process Editor
  Metadata Tree
  SAS Server
  Data Quality.

Course Case Study Tasks
Recall the case study tasks diagram discussed earlier. Each of these tasks involves reading metadata, writing metadata, or both.

Case study tasks diagram, with Metadata as the common element:
  Define Data Libraries
  Create Stored Processes
  Register Source Tables
  View and Analyze Data
  Define Target Tables
  Create Information Maps
  Create ETL Jobs
  Create Reports
  Create OLAP Cubes
  Use the Information Delivery Portal

SAS ETL Studio Case Study Tasks
SAS ETL Studio will concentrate on four tasks, which will be performed in sequence:
  1. Define Data Libraries (+)
  2. Define Source Tables Metadata
  3. Define Target Tables Metadata
  4. Define and Run Jobs

SAS ETL Studio Case Study – Setup Tasks
Step 1, Define Data Libraries (+), covers two setup tasks in the demonstrations and exercises:
  Build Custom Tree Groupings: Libraries, Jobs, Source Tables, Target Tables
  Define Additional Library Definitions: Target Tables Library, Source Tables Library

SAS ETL Studio Case Study – Define Sources
Step 2, Define Source Tables Metadata: the Source Designer defines metadata for the source tables.
  Orders (demo)
  Order_Item (exercises)
  Product_List (exercises)

SAS ETL Studio Case Study – Define Targets
1. Define Data Libraries (+)
2.
Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs

The Target Designer defines metadata for the target tables.
  OrderFact (demo*)
  ProductDim (exercises)
* Some derived columns for OrderFact are completed in the exercises.

SAS ETL Studio Case Study – Define Jobs
Step 4, Define and Run Jobs: the Process Designer defines metadata for jobs that contain the process flow diagrams necessary to load the target tables.
  Populate the OrderFact table (demo)
  Populate the ProductDim table (exercises)

Creating the OrderFact Table
The OrderFact table (the target table) will be created from the Orders and Order_Item tables (the source tables).
The source tables, Orders and Order_Item, will be combined using the SQL Join transformation. The SQL Join will also be used to define computed columns.
The table that is the result of the SQL Join will then be loaded into the OrderFact table by the Loader.

Creating the ProductDim Table
The ProductDim table (the target table) will be created from the Product_List table (the source table).
The Extract transformation will be used so that a computed column can be defined.
The results of the Extract transformation will then be loaded into the target table, ProductDim, by the Loader.

SAS ETL Studio Case Study – Setup Tasks
1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4.
Define and Run Jobs

Creating a Logical Grouping and Adding a Library Definition
This demonstration shows how to define a logical grouping object and create a library definition to store in the new grouping.

Exercises
This exercise creates logical grouping elements and defines two SAS libraries.

Using the Source Designer
The Source Designer is a wizard that generates metadata for one or more selected tables, based on the physical structure of the table(s). The Source Designer can be used to specify metadata for any existing table, not just tables used as data sources for ETL jobs.
The Source Designer is an easy-to-use wizard interface.

Add a Source Table Definition
This demonstration shows how to add a source table definition for the Orders table.

Exercises
These exercises add source table definitions for several source tables.

Using the Target Designer
The Target Designer is a wizard that can create new metadata about a single table that might or might not already exist in physical storage. It can also be used to create and edit metadata about an OLAP cube.
The Target Designer is an easy-to-use wizard interface.

Defining a Target Table
This demonstration shows how to create a target table definition for the OrderFact table.

Exercises
These exercises add target table definitions for several tables.

Using the Process Designer
The Process Designer invokes the New Job wizard to create metadata about a job. That metadata is used to build a process flow diagram for the job.
A job is a metadata object that specifies processes that create output. SAS ETL Studio organizes sources, targets, and transformations into jobs that can be displayed in a process flow diagram.
SAS ETL Studio uses each job to generate and/or retrieve SAS code that reads sources and creates targets on a file system.
The New Job wizard prompts for information that is used to build a template in the Process Designer.

Defining a Job
This demonstration shows how to define a job for the OrderFact target table and enter information about the extraction and transformation of data.

Exercises
This exercise shows how to create a new job and enter information about the extraction and transformation of data.

Loading the Target Tables
This demonstration shows how to specify the load process attributes and how to execute and verify a job.

Exercises
This exercise shows how to specify the load process attributes and how to execute and verify a job.
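
The OrderFact job from the case study (join Orders and Order_Item, define a computed column, load the result into the target table) can be sketched outside SAS ETL Studio. The following is only an illustration in Python with the standard sqlite3 module, not the SAS code that SAS ETL Studio generates. All column names (order_id, customer_id, product_id, quantity, unit_price, total_price) and the sample rows are assumptions made for the example; the course material does not list the actual columns.

```python
import sqlite3


def build_order_fact(conn):
    """Join Orders and Order_Item, add a computed column, and load OrderFact.

    Mirrors the three steps of the case study job: SQL Join, computed
    column, and load. Column names are hypothetical.
    """
    cur = conn.cursor()
    # Target table definition (hypothetical columns).
    cur.execute(
        """CREATE TABLE OrderFact (
               order_id INTEGER, customer_id INTEGER, product_id INTEGER,
               quantity INTEGER, unit_price REAL, total_price REAL)"""
    )
    # Join the two sources, derive total_price, and load the target.
    cur.execute(
        """INSERT INTO OrderFact
           SELECT o.order_id, o.customer_id, i.product_id,
                  i.quantity, i.unit_price,
                  i.quantity * i.unit_price AS total_price
           FROM Orders AS o
           JOIN Order_Item AS i ON o.order_id = i.order_id"""
    )
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM OrderFact").fetchone()[0]


# Hypothetical source tables with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE Order_Item (order_id INTEGER, product_id INTEGER,
                             quantity INTEGER, unit_price REAL);
    INSERT INTO Orders VALUES (1, 100), (2, 101);
    INSERT INTO Order_Item VALUES (1, 10, 2, 5.0), (1, 11, 1, 3.5), (2, 10, 4, 5.0);
    """
)
rows_loaded = build_order_fact(conn)
print(rows_loaded)
```

Verifying the job then amounts to inspecting the target table, for example by comparing the row count of OrderFact with the number of matching rows in the sources.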

Section 5.3 Advanced SAS ETL Studio Features (Self-Study)

Advanced Features
This section introduces several advanced features to review on your own, including:
  Data Quality Plug-Ins
  Importing and Exporting Metadata
  Change Management.

Data Quality Plug-Ins
SAS ETL Studio contains two data quality transformation templates in the Process Library tree:
  Create Match Code: used to create a job that creates match codes and cluster numbers for a specified source column, based on a set of criteria.
  Apply Lookup Standardization: used to create a job that standardizes the values of a source column according to the contents of a specified standardization scheme.
These templates increase the value of data through data analysis and data cleansing.

To use the data quality transformation templates,
  the SAS Data Quality Server software must be installed
  a SAS application server must be configured to access a Quality Knowledge Base
  the Quality Knowledge Base must contain the locales needed to reference data quality jobs.
When the prerequisites have been met, the data quality transformations can be dragged into process flow diagrams.

Create Match Code Plug-In
The Create Match Code plug-in is a tabbed dialog box that
  reads the Quality Knowledge Base for the specified locale
  creates match codes based on the user-specified criteria.
The match code can then be used to de-duplicate data or join data as part of the transformation step in defining the target table.

Apply Lookup Standardization Plug-In
The Apply Lookup Standardization plug-in is a tabbed dialog box that
  reads the Quality Knowledge Base for the specified locale
  loads all of the available standardization schemes.
You can then apply the scheme to one of the source columns as part of the transformation step in defining the target table.

Metadata Import Wizard
The Metadata Import Wizard is an interface for importing metadata files that are compliant with the Common Warehouse Metamodel (CWM) standard.
By using the Import Wizard to import the metadata from a previously defined data model (source tables or target tables), you do not have to enter the metadata for each table individually. You simply reference a location for the model file, which was created by a third-party modeling tool.

Which Metadata Can Be Imported?
The CWM standard for metadata was developed by the Object Management Group (OMG). More information about OMG and the CWM metadata standard can be obtained from:
http://www.omg.org
More information about Meta Integration Technology, Inc., and the purchase of MIMBs (Meta Integration Model Bridges), can be obtained from the following location:
http://www.metaintegration.net

Metadata Export Wizard
The Metadata Export Wizard is an interface for exporting metadata from within SAS ETL Studio to third-party CWM-compliant applications.
The user specifies the path and the file to create from the export of the metadata. Once the user completes the Metadata Exporter wizard, a confirmation window lists all of the selections the user has made for the export of the metadata. Upon exiting this window, the metadata is written to the external file that was specified in the wizard.

Change Management
SAS ETL Studio enables you to create metadata objects that define sources, targets, and the transformations that connect them. These objects are saved to one or more metadata repositories.
The change management feature (or more specifically, metadata source control) enables multiple SAS ETL Studio users to work with the same metadata repository at the same time without overwriting each other's changes.

Change management features in SAS ETL Studio include:
  menus that support change management operations such as check out and check in
  the Inventory tree and the Custom tree for working with metadata that is contained in a change-managed repository
  the Project tree for working with metadata that is contained in a project repository
  an audit history for each metadata object.

After an object has been checked out by one person, it is locked so that it cannot be updated by another person until the object has been checked back in.
The only people who can change the metadata in a change-managed repository are
  the person who started the metadata server
  administrators who have write access to the repository
  any users who are authorized to use a project repository for the change-managed repository.
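
The check-out and check-in behavior described above can be pictured with a small model. The sketch below is a conceptual illustration in Python, not the SAS Metadata Server implementation or any SAS API; the class name ChangeManagedRepository, its methods, and the user names are all hypothetical. It assumes only what the text states: a checked-out object is locked against updates by other users until it is checked back in, and each object has an audit history.

```python
import copy


class ChangeManagedRepository:
    """Toy model of a change-managed metadata repository (hypothetical)."""

    def __init__(self):
        self._objects = {}  # object name -> metadata dictionary
        self._locks = {}    # object name -> user who checked it out
        self.audit = []     # (action, object name, user) tuples

    def add(self, name, metadata):
        self._objects[name] = copy.deepcopy(metadata)

    def get(self, name):
        # Read access is always allowed; return a copy to protect the original.
        return copy.deepcopy(self._objects[name])

    def check_out(self, name, user):
        holder = self._locks.get(name)
        if holder is not None and holder != user:
            raise PermissionError(f"{name} is checked out by {holder}")
        self._locks[name] = user
        self.audit.append(("check_out", name, user))
        # The working copy stands in for the user's project repository.
        return self.get(name)

    def check_in(self, name, user, metadata):
        if self._locks.get(name) != user:
            raise PermissionError(f"{name} is not checked out by {user}")
        self._objects[name] = copy.deepcopy(metadata)
        del self._locks[name]
        self.audit.append(("check_in", name, user))


repo = ChangeManagedRepository()
repo.add("OrderFact", {"columns": ["order_id"]})

working = repo.check_out("OrderFact", "alice")
try:
    repo.check_out("OrderFact", "bob")  # locked while alice holds it
    blocked = False
except PermissionError:
    blocked = True

working["columns"].append("total_price")
repo.check_in("OrderFact", "alice", working)
```

After the check-in, the lock is released, so a second user could check out the same object and would see the updated column list.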