SAS® 9.1 ETL Studio

Chapter 5
Using SAS® ETL Studio
Section 5.1
SAS ETL Studio Overview
What Is SAS ETL Studio?
SAS ETL Studio, a Java application, is a visual design
tool that helps organizations quickly build, implement,
and manage ETL processes from source to destination,
regardless of the data sources or platforms.
Users can standardize metadata across the organization
and perform in-depth transformations with minimal
programming or manual work to meet enterprise data
integration requirements and to support business and
analytic intelligence.
What Is SAS ETL Studio?
SAS ETL Studio enables you to perform the following
tasks:
 the Extraction of data from operational data stores
 the Transformation of this data
 the Loading of the extracted data into your data
warehouse or data mart.
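The three tasks above can be sketched in a few lines of Python. This is only an illustration of the extract, transform, and load pattern, not how SAS ETL Studio implements it; the sales table, the column names, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

def extract(operational_rows):
    """Extract: read rows from the (hypothetical) operational data store."""
    return list(operational_rows)

def transform(rows):
    """Transform: tidy customer names and convert cents to currency units."""
    return [
        {"customer": r["customer"].strip().title(), "amount": r["amount_cents"] / 100}
        for r in rows
    ]

def load(rows, conn):
    """Load: write the transformed rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (:customer, :amount)", rows)
    conn.commit()

def run_etl(operational_rows, conn):
    """Run the three steps in order and return the loaded rows."""
    load(transform(extract(operational_rows)), conn)
    return conn.execute(
        "SELECT customer, amount FROM sales ORDER BY customer"
    ).fetchall()

if __name__ == "__main__":
    source = [{"customer": " alice ", "amount_cents": 1250},
              {"customer": "BOB", "amount_cents": 300}]
    print(run_etl(source, sqlite3.connect(":memory:")))
```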
What Is SAS ETL Studio?
SAS ETL Studio is an application that enables you to
manage ETL process flows by allowing:
 specification of metadata for sources, such as tables
in an operational system
 specification of metadata for targets – the tables and
other data stores in a data warehouse
 creation of jobs that specify how data is extracted,
transformed, and loaded from a source to a target.
SAS ETL Studio: Change Management
In SAS ETL Studio, the change management facility
enables multiple SAS ETL Studio users to work with the
same metadata repository at the same time without
overwriting each other’s changes.
SAS ETL Studio: Data Surveyor Wizards
Optional Data Surveyor wizards can be licensed that
provide access to the metadata in enterprise applications,
such as
 PeopleSoft
 SAP R/3
 Siebel
 Oracle Applications.
SAS ETL Studio: Metadata CWM Compliant
The metadata maintained by SAS ETL Studio is CWM
(Common Warehouse Metamodel) compliant and portable
to other CWM-compliant applications. Likewise, metadata
from other CWM-compliant applications (for example, data
modeling tools) can be imported easily into SAS ETL
Studio.
SAS ETL Studio: Data Quality
SAS ETL Studio is fully integrated with the data quality
software from DataFlux Corporation. Both products now
use the same Quality Knowledge Base (QKB), which
contains rules, routines, and schemes necessary to
integrate data quality into the ETL process.
Extending SAS ETL Studio Functionality
The SAS ETL Studio functionality is extended by Java
plug-ins packaged with the product.
Further extensions can be implemented by
 writing additional plug-ins
(Java programming required)
 using the Transformation Generator Wizard
(no Java programming required).
Server Connections and SAS ETL Studio
As a client, SAS ETL Studio must connect to a SAS
Metadata Server to read or write metadata. It must
connect to other servers to run SAS code, to connect to a
third-party database management system, or to perform
other tasks.
Interaction with SAS Application Servers
SAS ETL Studio can use different types of application
servers:
SAS Metadata Server: Required to read and write metadata
in a SAS metadata repository.
SAS Workspace Server: Required to execute SAS code and
access data.
SAS/CONNECT Server: Required to submit generated SAS code
to machines that are remote to the default SAS application
server.
Section 5.2
The SAS ETL Studio Interface
SAS ETL Studio: The Interface
SAS ETL Studio is a Java client developed to control the
ETL process. The interface has several “ease-of-use”
features, including
• copy and paste in any text field
• multiple windows open at one time (including
multiple process flow diagrams)
• a Windows look and feel
• wizard-driven interfaces.
Tools, Menus, and Online Help
SAS ETL Studio takes full advantage of toolbars and pull-down
menus. The icons available on the toolbar depend on which
window is active within the interface.
The Shortcut Bar
One of the most significant features of SAS ETL Studio
is the new process-driven functionality. Processes are
available via a Shortcut bar on the far left side of the
main SAS ETL Studio window.
The Shortcut bar is populated with icons for each task an
ETL user would typically perform, including:
Source Designer: defines metadata about the source(s) for
a process.
Metadata Importer: imports metadata from other
applications.
Metadata Exporter: exports metadata to be used by other
applications.
Process Designer: defines metadata about the ETL
processes.
Target Designer: defines metadata about the target
table(s) to be created by the process.
Options: provides numerous options for the SAS ETL Studio
user to customize the look and feel of the application.
Tree View
The SAS ETL Studio Tree View enables you to
• view the metadata associated with the current
metadata repository
• display different views or “trees” of the current
repository.
Tree View
There are several tabs available in the tree view area:
Inventory Tree: lists the metadata objects in the default
metadata repository (and any dependent repositories),
organized by predetermined groupings.
Custom Tree: lists the metadata objects in the default
metadata repository (and any dependent repositories),
organized by user-defined groupings of objects.
Process Library Tree: lists the available data
transformations to be used in the ETL process.
Process Library Tree
The Process Library tree displays a collection of
transformation templates.
There are four collections (folders) of templates that are
provided with SAS ETL Studio:
 Analysis
 Data Transforms
 Output
 Publish.
Process Designer View
The Process Designer window is the workspace for
building ETL processes. The Process Designer view
appears as a final step in the Process Designer wizard.
Once the process is defined, the Process Designer view
is populated with icons that represent the chosen
processes.
The Process Designer window can be used to
 view SQL source code
 review the SAS log (from submitting jobs)
 view the resulting output from running a SAS job.
Process Designer and Overview Windows
[Figure: screen capture with callouts for the Process Designer view and the Overview window]
Overview Window
The Overview window shows you the complete process
from the process view.
From within the Overview window, you can control which
part of the process is displayed in the Process View
window.
SAS ETL Studio Wizards
There are shortcuts which invoke wizards that aid the
user in performing various tasks with SAS ETL Studio.
Some of these wizards are
 Source Designer
 Target Designer
 New Job.
Source Designer
The Source Designer is a wizard-driven
interface that enables you to define the
physical layout of existing tables using a
data dictionary or metadata information
from the source system.
The result of running the Source Designer
successfully is a metadata registration
that describes the data source.
Target Designer
The Target Designer is a wizard that
allows metadata to be entered for a target.
In designing the target table, you can
 access any metadata about any
source tables and columns registered
in the metadata repository
 override any metadata that was
imported from another source and add
new columns to the target table
 create indexes on the target table
being created.
Target Designer
The person designing the target table has full control over
the type of table being built.
The types of targets that can be built include
 database types that are supported by the
SAS/ACCESS products
 SAS data sets (including both data files and data views)
 SAS/SHARE data sets
 SPDE tables.
New Job Wizard
The New Job wizard enables you to
define the metadata necessary to run an
ETL process to load data into a target or
targets.
Additional Wizards
Other wizards available to provide assistance with various
tasks in SAS ETL Studio include
 Metadata Importer
 Metadata Exporter
 Cube Designer
 Transformation Generator wizard.
You can also install optional data surveyor wizards, which
provide access to the metadata in enterprise applications,
such as PeopleSoft, SAP R/3, Siebel, and Oracle.
Options Window
The Options window can be used to define standard
settings for the SAS ETL Studio interface.
There are several tabs in the Options window:
 General
 Process
 Editor
 Metadata Tree
 SAS Server
 Data Quality.
Course Case Study Tasks
Recall the case study tasks diagram discussed earlier.
Each of these tasks involves either reading or writing (or
both) metadata.
[Diagram: case study tasks that read or write Metadata:
Define Data Libraries, Register Source Tables, Define
Target Tables, Create ETL Jobs, Create OLAP Cubes, Create
Stored Processes, View and Analyze Data, Create
Information Maps, Create Reports, Use the Information
Delivery Portal]
SAS ETL Studio Case Study Tasks
SAS ETL Studio will concentrate on the following four
case study tasks:
• Define Data Libraries
• Register Source Tables
• Define Target Tables
• Create ETL Jobs.
SAS ETL Studio Case Study
These tasks will be performed in sequence:
1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs
SAS ETL Studio Case Study – Setup Tasks
1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs
Setup tasks (Demo, Exercises):
• Build Custom Tree Groupings: Libraries, Jobs, Source
Tables, Target Tables
• Define Additional Library Definitions: Target Tables
Library, Source Tables Library
SAS ETL Studio Case Study – Define Sources
1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs
The Source Designer defines metadata for the source tables.
• Orders (Demo)
• Order_Item (Exercises)
• Product_List (Exercises)
SAS ETL Studio Case Study – Define Targets
1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs
The Target Designer defines metadata for the target tables.
• OrderFact (Demo*)
• ProductDim (Exercises)
* Some derived columns for OrderFact are completed in the
exercises.
SAS ETL Studio Case Study – Define Jobs
1. Define Data Libraries (+)
2. Define Source Tables Metadata
3. Define Target Tables Metadata
4. Define and Run Jobs
The Process Designer defines metadata for jobs that
contain the process flow diagrams necessary to load the
target tables.
• Populate the OrderFact table (Demo)
• Populate the ProductDim table (Exercises)
Creating the OrderFact Table
The OrderFact table will be created from the Orders and
Order_Item tables.
[Figure callouts: Source Tables, Target Table]
Creating the OrderFact Table
The source tables, Orders and Order_Item, will be
combined using the SQL Join transformation.
• The SQL Join will be used to define computed columns.
Creating the OrderFact Table
The table that is the result of the SQL Join will then be
loaded into the OrderFact table.
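The join, the computed column, and the load can be sketched outside SAS ETL Studio with Python and an in-memory SQLite database. This is an illustrative sketch only: the column names (order_id, quantity, unit_price, and so on), the computed column total_price, and the sample rows are assumptions, because the slides do not list the actual columns of Orders, Order_Item, or OrderFact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source tables and sample rows.
    CREATE TABLE Orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
    CREATE TABLE Order_Item (order_id INTEGER, product_id INTEGER,
                             quantity INTEGER, unit_price REAL);
    INSERT INTO Orders VALUES (1, 10, '2004-01-15'), (2, 11, '2004-01-16');
    INSERT INTO Order_Item VALUES (1, 100, 2, 9.5), (1, 101, 1, 20.0), (2, 100, 3, 9.5);
""")

# Join the two sources, define a computed column (total_price),
# and load the result into the target table in one statement.
conn.execute("""
    CREATE TABLE OrderFact AS
    SELECT o.order_id, o.customer_id, o.order_date, i.product_id, i.quantity,
           i.quantity * i.unit_price AS total_price
    FROM Orders AS o
    INNER JOIN Order_Item AS i ON o.order_id = i.order_id
""")

order_fact = conn.execute(
    "SELECT order_id, product_id, total_price FROM OrderFact "
    "ORDER BY order_id, product_id"
).fetchall()
print(order_fact)
```

In SAS ETL Studio the join and the load are separate steps in the process flow diagram; they are combined here only to keep the sketch short.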
[Figure callout: Loader]
Creating the ProductDim Table
The ProductDim table will be created from the
Product_List table.
[Figure callouts: Source Table, Target Table]
Creating the ProductDim Table
The Extract transformation will be used so that a
computed column can be defined.
[Figure callout: SAS Extract]
Creating the ProductDim Table
The results of the Extract transformation will then be
loaded into the target table, ProductDim.
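A single-source extract with a computed column, followed by a separate load into an already defined target, might look like the following Python sketch. The columns product_level and product_type and the sample rows are hypothetical; the slides do not specify which column is computed for ProductDim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source rows and an empty, predefined target table.
    CREATE TABLE Product_List (product_id INTEGER, product_name TEXT,
                               product_level INTEGER);
    INSERT INTO Product_List VALUES (100, 'Tent', 1), (200, 'Outdoors', 2);
    CREATE TABLE ProductDim (product_id INTEGER, product_name TEXT,
                             product_type TEXT);
""")

# Extract step: select from the single source and derive a computed column.
extracted = conn.execute("""
    SELECT product_id, product_name,
           CASE product_level WHEN 1 THEN 'Item' ELSE 'Group' END AS product_type
    FROM Product_List
""").fetchall()

# Load step: write the extracted rows into the existing target table.
conn.executemany("INSERT INTO ProductDim VALUES (?, ?, ?)", extracted)
product_dim = conn.execute(
    "SELECT * FROM ProductDim ORDER BY product_id"
).fetchall()
```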
[Figure callout: Loader]
Creating a Logical Grouping and
Adding a Library Definition
This demonstration shows how to define a logical
grouping object and create a library definition to
store in the new grouping.
Exercises
This exercise creates logical grouping elements and
defines two SAS libraries.
Using the Source Designer
The Source Designer is a wizard that generates metadata
for one or more selected tables, based on the physical
structure of the table(s).
The Source Designer can be used to specify metadata for
any existing table, not just tables used as data sources for
ETL jobs.
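Generating metadata from the physical structure of an existing table can be illustrated with a short Python sketch. It is an analogy, not the Source Designer's implementation: SQLite's PRAGMA table_info plays the role of the source system's data dictionary, and register_table and the layout of the returned record are hypothetical.

```python
import sqlite3

def register_table(conn, table_name):
    """Return a metadata record describing an existing table's columns.

    table_name is assumed to come from a trusted list of tables.
    """
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    columns = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    if not columns:
        raise ValueError(f"table {table_name!r} does not exist")
    return {
        "table": table_name,
        "columns": [
            {"name": name, "type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in columns
        ],
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Orders (order_id INTEGER NOT NULL, order_date TEXT)")
orders_metadata = register_table(conn, "Orders")
```

The record describes the table without copying any of its rows, which is the sense in which a registration "describes the data source".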
Using the Source Designer
The Source Designer is an easy-to-use wizard interface.
Add a Source Table Definition
This demonstration shows how to add a source
table definition for the Orders table.
Exercises
These exercises add source table definitions for
several source tables.
Using the Target Designer
The Target Designer is a wizard that can create new
metadata about a single table that might or might not
already exist in physical storage.
It can also be used to create and edit metadata about an
OLAP cube.
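Because target metadata can exist before the physical table does, the table can be built from the metadata later. The Python sketch below illustrates that idea under stated assumptions: TargetTable, its fields, and the OrderFact columns are hypothetical, and SQLite stands in for whatever storage the target uses.

```python
import sqlite3
from dataclasses import dataclass, field

@dataclass
class TargetTable:
    """Hypothetical metadata record for a target table."""
    name: str
    columns: dict                                  # column name -> SQL type
    indexes: list = field(default_factory=list)    # column names to index

    def ddl(self):
        """Build CREATE TABLE and CREATE INDEX statements from the metadata."""
        cols = ", ".join(f"{col} {sql_type}" for col, sql_type in self.columns.items())
        statements = [f"CREATE TABLE {self.name} ({cols})"]
        for col in self.indexes:
            statements.append(
                f"CREATE INDEX idx_{self.name}_{col} ON {self.name} ({col})"
            )
        return statements

order_fact = TargetTable(
    name="OrderFact",
    columns={"order_id": "INTEGER", "product_id": "INTEGER", "total_price": "REAL"},
    indexes=["order_id"],
)

# The metadata is defined first; the physical table is created from it on demand.
conn = sqlite3.connect(":memory:")
for statement in order_fact.ddl():
    conn.execute(statement)
```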
Using the Target Designer
The Target Designer is an easy-to-use wizard interface.
Defining a Target Table
This demonstration shows how to create a target table
definition for the OrderFact table.
Exercises
These exercises add target table definitions for
several tables.
Using the Process Designer
The Process Designer invokes the New Job wizard to
create metadata about a job. That metadata is used to
build a process flow diagram for the job.
A job is a metadata object that specifies processes that
create output. SAS ETL Studio organizes sources,
targets, and transformations into jobs that can be
displayed in a process flow diagram.
SAS ETL Studio uses each job to generate and/or retrieve
SAS code that reads sources and creates targets on a file
system.
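The idea that job metadata drives code generation can be sketched as follows. This is a minimal Python illustration, not the code that SAS ETL Studio actually generates: the layout of the job dictionary, the librefs srclib and tgtlib, the column names, and the function generate_sas_code are all hypothetical. The generated text is an ordinary PROC SQL step.

```python
def generate_sas_code(job):
    """Render a SAS PROC SQL step from job metadata (illustrative only)."""
    select_list = ",\n         ".join(job["columns"])
    return (
        "proc sql;\n"
        f"  create table {job['target']} as\n"
        f"  select {select_list}\n"
        f"  from {job['source']};\n"
        "quit;\n"
    )

job = {
    "name": "Load ProductDim",          # hypothetical job name
    "source": "srclib.Product_List",    # hypothetical libref.table
    "target": "tgtlib.ProductDim",      # hypothetical libref.table
    "columns": ["product_id", "product_name"],
}
code = generate_sas_code(job)
print(code)
```

Keeping the job as metadata and rendering code from it means the same definition can be displayed as a process flow diagram or submitted to a server.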
Using the Process Designer
The New Job wizard prompts for information that is used
to build a template in the Process Designer.
Defining a Job
This demonstration shows how to define a job for the
OrderFact target table and enter information about
the extraction and transformation of data.
Exercises
This exercise shows how to create a new job and enter
information about the extraction and transformation of
data.
Loading the Target Tables
This demonstration shows how to specify the load
process attributes and how to execute and verify
a job.
Exercises
This exercise shows how to specify the load process
attributes and how to execute and verify
a job.
Section 5.3
Advanced SAS ETL Studio Features (Self-Study)
Advanced Features
This section introduces several advanced features to
review on your own, including:
 Data Quality Plug-Ins
 Importing and Exporting Metadata
 Change Management.
Data Quality Plug-Ins
SAS ETL Studio contains two data quality transformation
templates in the Process Library tree:
Create Match Code: Used to create a job that creates match
codes and cluster numbers for a specified source column,
based on a set of criteria.
Apply Lookup Standardization: Used to create a job that
standardizes the values of a source column according to
the contents of a specified standardization scheme.
These templates increase the value of data through data
analysis and data cleansing.
Data Quality Plug-Ins
To use the data quality transformation templates,
 the SAS Data Quality Server software must be installed
 a SAS application server must be configured to access
a Quality Knowledge Base
 the Quality Knowledge Base must contain the locales
needed to reference data quality jobs.
When the prerequisites have been met, the data quality
transformations can be dragged into process flow
diagrams.
Create Match Code Plug-In
The Create Match Code plug-in is a tabbed dialog box
that
• reads the Quality Knowledge Base for the specified
locale
• creates match codes based on the user-specified
criteria.
The match code can then be used to de-duplicate data or
join data as part of the transformation step in defining the
target table.
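How match codes support de-duplication can be shown with a toy example in Python. The match_code function below is a deliberately simple stand-in (uppercase, letters only, vowels dropped after the first letter); it is not the algorithm used by the Quality Knowledge Base, and the sample names are made up.

```python
import re
from collections import defaultdict

def match_code(value):
    """Toy match code: uppercase, keep letters only, drop vowels after the first letter."""
    letters = re.sub(r"[^A-Z]", "", value.upper())
    if not letters:
        return ""
    return letters[0] + re.sub(r"[AEIOU]", "", letters[1:])

def cluster(values):
    """Group values that share a match code; each group is one cluster."""
    clusters = defaultdict(list)
    for value in values:
        clusters[match_code(value)].append(value)
    return dict(clusters)

names = ["Smith, John", "SMITH JOHN", "Smyth, Jon", "Jones, Ann"]
clusters = cluster(names)
```

Values in the same cluster are candidates for de-duplication, and the match code can also serve as a join key between tables whose raw values differ in spelling or punctuation.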
Apply Lookup Standardization Plug-In
The Apply Lookup Standardization plug-in is a tabbed
dialog box that
 reads the Quality Knowledge Base for the specified
locale
 loads all of the available standardization schemes.
You can then apply the scheme to one of the source
columns as part of the transformation step in defining the
target table.
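Applying a standardization scheme to a column amounts to a lookup from variant values to one standard value. The Python sketch below assumes a scheme can be modeled as a dictionary; the scheme contents, the normalization (trim and lowercase), and the function apply_scheme are hypothetical and do not reflect the file format of real schemes.

```python
def apply_scheme(values, scheme):
    """Replace each value with its standard form; leave unmatched values unchanged."""
    return [scheme.get(value.strip().lower(), value) for value in values]

# Hypothetical scheme: variant spelling (normalized to lowercase) -> standard value.
company_scheme = {
    "sas": "SAS Institute Inc.",
    "sas institute": "SAS Institute Inc.",
    "sas inst.": "SAS Institute Inc.",
}

standardized = apply_scheme(["SAS", " SAS Institute ", "Acme Corp"], company_scheme)
```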
Metadata Import Wizard
The Metadata Import Wizard is an interface for importing
metadata files that are compliant with the Common
Warehouse Metamodel (CWM) standard.
By using the Import Wizard to import the metadata from
a previously defined data model (source tables or target
tables), you do not have to enter the metadata for each
table individually. You simply reference a location for the
model file, which was created by a third-party modeling tool.
Which Metadata Can Be Imported?
The CWM standard for metadata was developed by the Object
Management Group (OMG).
More information about OMG and the CWM metadata
standard can be obtained from: http://www.omg.org
More information about Meta Integration Technology, Inc.,
and the purchase of Meta Integration Model Bridges (MIMBs),
can be obtained from the following location:
http://www.metaintegration.net
Metadata Export Wizard
The Metadata Export Wizard is an interface for exporting metadata
from within SAS ETL Studio to third-party CWM-compliant
applications.
The user can specify the path and name of the file that the export
creates.
Once the user completes the Metadata Exporter wizard, a
confirmation window verifies all of the selections the user has made
for the export of the metadata. Upon exiting this window, the
metadata is written to the external file that was specified in the
wizard.
Change Management
SAS ETL Studio enables you to create metadata objects
that define sources, targets, and the transformations that
connect them. These objects are saved to one or more
metadata repositories.
The change management feature (or more specifically,
metadata source control) enables multiple SAS ETL
Studio users to work with the same metadata repository at
the same time without overwriting each other's changes.
Change Management
Change management features in SAS ETL Studio include:
 menus that support change management operations
such as check out and check in
 the Inventory tree and the Custom tree for working
with metadata that is contained in a change-managed
repository
 the Project tree for working with metadata that is
contained in a project repository
 an audit history for each metadata object.
Change Management
After an object has been checked out by one person, it
is locked so that it cannot be updated by another person
until the object has been checked back in.
The only people who can change the metadata in a
change-managed repository are
 the person who started the metadata server
 administrators who have write access to the repository
 any users who are authorized to use a project repository
for the change-managed repository.
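The check-out and check-in behavior described above can be modeled in a few lines of Python. This is a toy model of the locking rule only, not the metadata server's implementation: the class ChangeManagedRepository, its methods, and the user names are hypothetical, and authorization rules such as repository write access are left out.

```python
class ChangeManagedRepository:
    """Toy model of check-out/check-in locking for metadata objects."""

    def __init__(self, objects):
        self._objects = dict(objects)   # object name -> definition
        self._locks = {}                # object name -> user who checked it out

    def check_out(self, name, user):
        """Lock an object for one user; fail if another user holds the lock."""
        owner = self._locks.get(name)
        if owner is not None and owner != user:
            raise PermissionError(f"{name} is checked out by {owner}")
        self._locks[name] = user
        return self._objects[name]

    def check_in(self, name, user, new_definition):
        """Store the updated definition and release the lock."""
        if self._locks.get(name) != user:
            raise PermissionError(f"{user} has not checked out {name}")
        self._objects[name] = new_definition
        del self._locks[name]

    def get(self, name):
        return self._objects[name]

repo = ChangeManagedRepository({"OrderFact": "v1"})
repo.check_out("OrderFact", "ann")
try:
    repo.check_out("OrderFact", "bob")   # blocked while ann holds the lock
    blocked = False
except PermissionError:
    blocked = True
repo.check_in("OrderFact", "ann", "v2")  # the lock is released here
```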