Informatica PowerMart / PowerCenter 7 Basics
Education Services PC6B-20030512
Informatica Corporation, 2003. All rights reserved.

Course Objectives
At the end of this course you will:
• Understand how to use all major PowerCenter 7 components
• Be able to perform basic Repository administration tasks
• Be able to build basic ETL Mappings and Mapplets
• Be able to create, run and monitor Workflows
• Understand available options for loading target data
• Be able to troubleshoot most problems

Extract, Transform, and Load
• Operational Systems (RDBMS, Mainframe, Other): transaction-level data, optimized for transaction response time; current; normalized or de-normalized data
• ETL: Extract, then Transform (cleanse data, consolidate data, apply business rules, aggregate data, de-normalize), then Load
• Decision Support Data Warehouse: aggregated data; historical data

PowerCenter 7 Architecture
(Diagram: the PowerCenter Server reads from heterogeneous sources and writes to heterogeneous targets over native connections. The client tools - Designer, Repository Manager, Repository Server Administration Console, Workflow Manager and Workflow Monitor - communicate with the Repository Server over TCP/IP; the Repository Server uses a Repository Agent, which connects natively to the Repository database. Not shown: client ODBC connections for source and target metadata.)

PowerCenter 7 Components
• PowerCenter Repository
• PowerCenter Repository Server
• PowerCenter Client: Designer, Repository Manager, Repository Server Administration Console, Workflow Manager, Workflow Monitor
• PowerCenter Server
• External components: Sources and Targets

Repository Topics
By the end of this section you will be familiar with:
• The purpose of the Repository Server and Agent
• The Repository Server Administration Console GUI interface
• The Repository Manager GUI interface
• Repository maintenance operations
• Security and privileges
• Object sharing, searching and locking
• Metadata Extensions

Repository Server
• Each Repository has an independent architecture for the management of the physical Repository tables
• Components: one Repository Server, and a Repository Agent for each Repository
• (Diagram: the Repository Manager and Repository Server Administration Console connect to the Repository Server, which runs one Repository Agent per Repository.)
• Client overhead for Repository management is greatly reduced by the Repository Server

Repository Server Features
• Manages connections to the Repository from client applications
• Can manage multiple Repositories on different machines on a network
• Uses one Repository Agent process per managed Repository to insert, update and fetch objects from the Repository database tables
• Maintains object consistency by controlling object locking
• The Repository Server runs on the same system as the Repository Agent

Repository Server Administration Console
Use the Repository Server Administration Console to administer Repository Servers and Repositories through the Repository Server. The following tasks can be performed:
• Add, edit and remove Repository configurations
• Export and import Repository configurations
• Create a Repository
• Promote a local Repository to a Global Repository
• Copy a Repository
• Delete a Repository from the database
• Back up and restore a Repository
• Start, stop, enable and disable a Repository
• View Repository connections and locks
• Close Repository connections
• Upgrade a Repository

Repository Server Administration Console Interface
(Screenshot callouts: Console Tree, Information Nodes, HTML View, hypertext links to Repository maintenance tasks)

Repository Management
• Perform all Repository maintenance tasks through the Repository Server, from the Repository Server Administration Console
• Create the Repository configuration
• Select a Repository configuration and perform maintenance tasks: Create, Delete, Backup, Copy from, Disable, Export Connection, Make Global, Notify Users, Propagate, Register, Restore, Un-Register, Upgrade

Repository Manager
Use the Repository Manager to navigate through multiple folders and Repositories, and to perform the following tasks:
• Manage the Repository: launch the Repository Server Administration Console for this purpose
• Implement Repository security: manage users and user groups
• Perform folder functions: create, edit, copy and delete folders
• View metadata: analyze Source, Target, Mapping and Shortcut dependencies

Repository Manager Interface
(Screenshot callouts: Navigator Window, Main Window, Dependency Window, Output Window)

Users, Groups and Repository Privileges
Steps:
1. Create groups
2. Create users
3. Assign users to groups
4. Assign privileges to groups
5. Assign additional privileges to users (optional)

Managing Privileges
• Check-box assignment of privileges

Folder Permissions
• Assign one user as the folder owner for first-tier permissions
• Select one of the owner's groups for second-tier permissions
• All users and groups in the Repository are assigned the third-tier permissions

Object Locking
• Object locks preserve Repository integrity
• Use the Edit menu for viewing locks and unlocking objects

Object Searching (Menu: Analyze | Search)
• Keyword search: limited to keywords previously defined in the Repository (via the Warehouse Designer)
• Search all: filter and search objects

Object Sharing
• Reuse existing objects: enforces consistency and decreases development time
• Share objects by using copies and shortcuts
• COPY: copies the object to another folder; changes to the original object are not captured; duplicates space; can copy from a shared or unshared folder
• SHORTCUT: links to an object in another folder; dynamically reflects changes to the original object; preserves space; must be created from a shared folder
Required security settings for sharing objects:
• Repository privilege: Use Designer
• Originating folder permission: Read
• Destination folder permissions: Read/Write

Adding Metadata Extensions
• Allows developers and partners to extend the metadata stored in the Repository
• Accommodates the following metadata types:
  • Vendor-defined: metadata lists created by third-party application vendors; for example, applications such as Ariba or PowerConnect for Siebel can add information such as contacts, version, etc.
  • User-defined: PowerCenter/PowerMart users can define and create their own metadata
• Requires the Administer Repository or Super User Repository privilege

Sample Metadata Extensions
• Sample user-defined metadata, e.g. contact information, business user
• Reusable Metadata Extensions can also be created in the Repository Manager

Design Process
1. Create Source definition(s)
2. Create Target definition(s)
3. Create a Mapping
4. Create a Session Task
5. Create a Workflow from Task components
6. Run the Workflow
7. Monitor the Workflow and verify the results

Source Object Definitions
By the end of this section you will:
• Be familiar with the Designer GUI interface
• Be familiar with Source Types
• Be able to create Source Definitions
• Understand Source Definition properties
• Be able to use the Data Preview option

Source Analyzer
(Screenshot callouts: Designer tools, Analyzer Window, Navigation Window)

Methods of Analyzing Sources
Bring source definitions into the Source Analyzer, which stores them in the Repository:
• Import from Database (relational: table, view or synonym)
• Import from File (flat file)
• Import from COBOL file
• Import from XML file
• Create manually

Analyzing Relational Sources
• The Source Analyzer connects to the relational source (table, view or synonym) via ODBC; the definition is written to the Repository through the Repository Server and Repository Agent
• Source Definition properties can be edited after import

Analyzing Flat File Sources
• The Source Analyzer reads the flat file (fixed-width or delimited) from a mapped drive, NFS mount or local directory; the definition is written to the Repository

Flat File Wizard
• Three-step wizard
• Columns can be renamed within the wizard
• Text, Numeric and Datetime datatypes are supported
• The wizard 'guesses' the datatype

XML Source Analysis
• The Source Analyzer reads the .DTD file from a mapped drive, NFS mount or local directory
• In addition to a DTD file, an XML Schema or XML file can be used as a Source Definition

Analyzing VSAM Sources
• The Source Analyzer reads the .CBL (COBOL copybook) file from a mapped drive, NFS mount or local directory
• Supported numeric storage options: COMP, COMP-3, COMP-6

VSAM Source Properties
(Screenshot of the VSAM source definition property sheet)

Target Object Definitions
By the end of this section you will:
• Be familiar with Target Definition types
• Know the supported methods of creating Target Definitions
• Understand individual Target Definition properties

Creating Target Definitions
Methods of creating Target Definitions:
• Import from Database
• Import from an XML file
• Manual creation
• Automatic creation

Automatic Target Creation
• Drag-and-drop a Source Definition into the Warehouse Designer workspace

Import Definition from Database
• Can "reverse engineer" existing object definitions from a database system catalog or data dictionary
• The Warehouse Designer connects to the database (table, view or synonym) via ODBC; the definition is written to the Repository

Manual Target Creation
1. Create an empty definition
2. Add the desired columns
3. Finished target definition
(ALT-F can also be used to create a new column)

Target Definition Properties
(Screenshots of the Target Definition property sheets)

Creating Physical Tables
• LOGICAL: Repository target table definitions; PHYSICAL: target database tables; execute SQL via the Designer to create tables that do not already exist in the target database
• Steps: Connect (connect to the target database); Generate SQL file (create DDL in a script file); Edit SQL file (modify the DDL script as needed); Execute SQL file (create the physical tables in the target database)
• Use Preview Data to verify the results (right-mouse-click on the object)

Transformation Concepts
By the end of this section you will be familiar with:
• Transformation types and views
• Transformation calculation error treatment
• Null data treatment
• Informatica datatypes
• Expression transformation
• Expression Editor
• Informatica functions
• Expression validation

Transformation Types
Informatica PowerCenter 7 provides 23 objects for data transformation:
• Aggregator: performs aggregate calculations
• Application Source Qualifier: reads application object sources such as ERP
• Custom: calls a procedure in a shared library or DLL
• Expression: performs row-level calculations
• External Procedure (TX): calls compiled code for each row
• Filter: drops rows conditionally
• Input: defines Mapplet input rows; available in the Mapplet Designer
• Joiner: joins heterogeneous sources
• Lookup: looks up values and passes them to other objects
• Normalizer: reorganizes records from VSAM, relational and flat file sources
• Output: defines Mapplet output rows; available in the Mapplet Designer
• Rank: limits records to the top or bottom of a range
• Router: splits rows conditionally
• Sequence Generator: generates unique ID values
• Sorter: sorts data
• Source Qualifier: reads data from flat file and relational sources
• Stored Procedure: calls a database stored procedure
• Transaction Control: defines commit and rollback transactions
• Union: merges data from multiple pipelines
• Update Strategy: tags rows for insert, update, delete or reject
• XML Generator: reads data from one or more input ports and outputs XML through a single output port
• XML Parser: reads XML from a single input port and outputs data through one or more output ports
• XML Source Qualifier: reads XML data

Transformation Views
A transformation has three views:
• Iconized: shows the transformation in relation to the rest of the mapping
• Normal: shows the flow of data through the transformation
• Edit: shows transformation ports and properties; allows editing

Edit Mode
Allows users with folder "write" permission to change or create transformation ports and properties:
• Define port-level handling
• Define transformation-level properties
• Enter comments
• Make reusable
• Switch between transformations

Expression Transformation
Performs calculations using non-aggregate functions (row level)
• Passive Transformation; Connected
• Ports: mixed; variables allowed
• Create an expression in an output or variable port (click the expression field to invoke the Expression Editor)
• Usage: perform the majority of data manipulation

Expression Editor
• An expression formula is a calculation or conditional statement
• Used in the Expression, Aggregator, Rank, Filter, Router and Update Strategy transformations
• Performs calculations based on ports, functions, operators, variables, literals, constants and return values from other transformations

Informatica Functions - Character Functions
ASCII, CHR, CHRCODE, CONCAT, INITCAP, INSTR, LENGTH, LOWER, LPAD, LTRIM, RPAD, RTRIM, SUBSTR, UPPER, REPLACESTR, REPLACECHR
• Used to manipulate character data
• CHRCODE returns the numeric value (ASCII or Unicode) of the first character of the string passed to it
• CONCAT is for backwards compatibility only; use || instead
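As a sketch of how these functions combine in an Expression transformation output port (the port names FIRST_NAME, LAST_NAME and PHONE are hypothetical; -- begins an expression comment):

    -- Standardize a name: trim blanks, then capitalize each word
    INITCAP( LTRIM( RTRIM( FIRST_NAME )))

    -- Strip dashes from a phone number (CaseFlag 0 = case-insensitive)
    REPLACECHR( 0, PHONE, '-', NULL )

    -- Concatenate with the || operator rather than CONCAT
    LTRIM( RTRIM( FIRST_NAME )) || ' ' || LTRIM( RTRIM( LAST_NAME ))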
Informatica Functions - Conversion and Date Functions
Conversion functions: TO_CHAR (numeric), TO_DATE, TO_DECIMAL, TO_FLOAT, TO_INTEGER, TO_NUMBER
• Used to convert datatypes
Date functions: ADD_TO_DATE, DATE_COMPARE, DATE_DIFF, GET_DATE_PART, LAST_DAY, ROUND (date), SET_DATE_PART, TO_CHAR (date), TRUNC (date)
• Used to round, truncate or compare dates; extract one part of a date; or perform arithmetic on a date
• To pass a string to a date function, first use TO_DATE to convert it to a date/time datatype

Informatica Functions - Numeric and Scientific Functions
Numeric functions: ABS, CEIL, CUME, EXP, FLOOR, LN, LOG, MOD, MOVINGAVG, MOVINGSUM, POWER, ROUND, SIGN, SQRT, TRUNC
• Used to perform mathematical operations on numeric data
Scientific functions: COS, COSH, SIN, SINH, TAN, TANH
• Used to calculate geometric values of numeric data

Informatica Functions - Special, Test and Encoding Functions
Special functions: ERROR, ABORT, DECODE, IIF
• Used to handle specific conditions within a session, search for certain values, and test conditional statements
• IIF(Condition, True, False)
Test functions: ISNULL, IS_DATE, IS_NUMBER, IS_SPACES
• Used to test if a lookup result is null, and to validate data
Encoding functions: SOUNDEX, METAPHONE
• Used to encode string values
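A sketch pairing the test, conversion and special functions above (ORDER_DATE_STR and STATUS_CODE are hypothetical ports):

    -- Validate before converting: NULL out unparseable date strings
    IIF( IS_DATE( ORDER_DATE_STR, 'MM/DD/YYYY' ),
         TO_DATE( ORDER_DATE_STR, 'MM/DD/YYYY' ),
         NULL )

    -- DECODE as a multi-way translation of literal values
    DECODE( STATUS_CODE,
            'A', 'Active',
            'I', 'Inactive',
            'Unknown' )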
Expression Validation
The Validate (or 'OK') button in the Expression Editor will:
• Parse the current expression, including remote port searching (resolves references to ports in other transformations)
• Parse transformation attributes, e.g. filter condition, lookup condition, SQL query
• Parse default values
• Check spelling, the correct number of arguments in functions, and other syntax errors

Variable Ports
• Use to simplify complex expressions, e.g. create and store a depreciation formula to be referenced more than once (see the sketch below)
• Use in another variable port or in an output port expression
• Local to the transformation (a variable port cannot also be an input or output port)
• Available in the Expression, Aggregator and Rank transformations
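A minimal sketch of the depreciation idea above, assuming hypothetical ports PURCHASE_COST, SALVAGE_VALUE, USEFUL_LIFE_YRS and YRS_IN_SERVICE; ports evaluate in order (input, then variable, then output), so the variable is computed once per row and reused:

    -- Variable port v_ANNUAL_DEPR (straight-line depreciation)
    v_ANNUAL_DEPR:     (PURCHASE_COST - SALVAGE_VALUE) / USEFUL_LIFE_YRS

    -- Output ports referencing the variable port
    OUT_MONTHLY_DEPR:  v_ANNUAL_DEPR / 12
    OUT_REMAINING:     PURCHASE_COST - (v_ANNUAL_DEPR * YRS_IN_SERVICE)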
Informatica Datatypes
NATIVE DATATYPES
• Specific to the source and target database types
• Displayed in source and target tables within the Mapping Designer
TRANSFORMATION DATATYPES
• PowerMart / PowerCenter internal datatypes, based on ANSI SQL-92
• Displayed in transformations within the Mapping Designer
• Transformation datatypes allow mix-and-match of source and target database types
• When connecting ports, native and transformation datatypes must be compatible (or must be explicitly converted)

Datatype Conversions
Implicit conversions (no function necessary):

  From \ To   Integer  Decimal  Double  Char  Date  Raw
  Integer        X        X       X      X
  Decimal        X        X       X      X
  Double         X        X       X      X
  Char           X        X       X      X     X
  Date                                   X     X
  Raw                                                X

• All numeric data can be converted to all other numeric datatypes, e.g. integer, double and decimal
• All numeric data can be converted to string, and vice versa
• Date can be converted only to date and string, and vice versa
• Raw (binary) can only be linked to raw
• Other conversions not listed above are not supported

Mappings
By the end of this section you will be familiar with:
• Mapping components
• Source Qualifier transformation
• Mapping validation
• Data flow rules
• System variables
• Mapping Parameters and Variables

Mapping Designer
(Screenshot callouts: Transformation Toolbar, Mapping List, iconized Mapping)

Pre-SQL and Post-SQL Rules
• Can use any command that is valid for the database type; no nested comments
• Can use Mapping Parameters and Variables in SQL executed against the source
• Use a semicolon (;) to separate multiple statements
• The Informatica Server ignores semicolons within single quotes, double quotes or within /* ... */ comments
• To use a semicolon outside of quotes or comments, 'escape' it with a backslash (\)
• The Workflow Manager does not validate the SQL
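To illustrate the separator and escape rules, a sketch with generic SQL (table names are hypothetical, and the exact escape behavior should be verified against the target database):

    -- Two statements, separated by an unquoted semicolon
    DELETE FROM stage_orders;
    UPDATE stage_notes SET txt = 'step 1; step 2' WHERE id = 1
    -- The semicolon inside 'step 1; step 2' is NOT treated as a separator.
    -- A literal semicolon outside quotes or comments must be escaped: \;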
Data Flow Rules
• Each Source Qualifier starts a single data stream (a dataflow)
• Transformations can send rows to more than one transformation (split one data flow into multiple pipelines)
• Two or more data flows can meet -- if (and only if) they originate from a common active transformation; you cannot add an active transformation into the mix
• (Diagram: an ALLOWED merge of two flows through passive transformations below a common active transformation, vs. a DISALLOWED merge where an additional active transformation feeds the join.)
• The same rules hold with a Normalizer in lieu of a Source Qualifier; exceptions are the Mapplet Input and Joiner transformations

Connection Validation
Examples of invalid connections in a Mapping:
• Connecting ports with incompatible datatypes
• Connecting output ports to a Source
• Connecting a Source to anything but a Source Qualifier or Normalizer transformation
• Connecting an output port to another output port, or an input port to another input port
• Connecting more than one active transformation to another transformation (invalid dataflow)

Mapping Validation
Mappings must:
• Be valid for a Session to run
• Be end-to-end complete and contain valid expressions
• Pass all data flow rules
Mappings are always validated when saved, and can be validated without being saved. The Output Window always displays the reason for invalidity.

Workflows
By the end of this section, you will be familiar with:
• The Workflow Manager GUI interface
• Workflow Schedules
• Setting up Server Connections: relational, FTP and External Loader
• Creating and configuring Workflows
• Workflow properties
• Workflow components
• Workflow Tasks

Workflow Manager Interface
(Screenshot callouts: Task Toolbar, Workflow Designer tools, Workspace, Navigator Window, Output Window, Status Bar)

Workflow Manager Tools
• Workflow Designer: maps the execution order and dependencies of Sessions, Tasks and Worklets for the Informatica Server
• Task Developer: creates Session, Shell Command and Email tasks; tasks created in the Task Developer are reusable
• Worklet Designer: creates objects that represent a set of tasks; Worklet objects are reusable

Workflow Structure
• A Workflow is a set of instructions for the Informatica Server to perform data transformation and load
• Combines the logic of Session Tasks, other types of Tasks, and Worklets
• The simplest Workflow is composed of a Start Task, a Link and one other Task (Start Task -> Link -> Session Task)

Workflow Scheduler Objects
• Set up reusable schedules to associate with multiple Workflows
• Used in Workflows and Session Tasks

Server Connections
Configure Server data access connections, used in Session Tasks. Connection types:
1. Relational
2. MQ Series
3. FTP
4. Custom
5. External Loader

Relational Connections (Native)
• Create a relational (database) connection
• Instructions to the Server to locate relational tables
• Used in Session Tasks

Relational Connection Properties
Define a native relational (database) connection:
• User name / password
• Database connectivity information
• Rollback segment assignment (optional)
• Optional environment SQL (executed with each use of the database connection)

FTP Connection
• Create an FTP connection: instructions to the Server to FTP flat files
• Used in Session Tasks

External Loader Connection
• Create an External Loader connection: instructions to the Server to invoke database bulk loaders
• Used in Session Tasks

Task Developer
Create basic reusable "building blocks" to use in any Workflow. Reusable Tasks:
• Session: a set of instructions to execute Mapping logic
• Command: specify OS shell / script command(s) to run during the Workflow
• Email: send email at any point in the Workflow

Session Task
• Server instructions to run the logic of ONE specific Mapping, e.g. source and target data location specifications, memory allocation, optional Mapping overrides, scheduling, processing and load instructions
• Becomes a component of a Workflow (or Worklet)
• If configured in the Task Developer, the Session Task is reusable (optional)

Command Task
• Specify one or more Unix shell or DOS (NT, Win2000) commands to run at a specific point in the Workflow
• Becomes a component of a Workflow (or Worklet)
• If configured in the Task Developer, the Command Task is reusable (optional)
• Commands can also be referenced in a Session through the Session "Components" tab, as pre- or post-session commands

Additional Workflow Components
• Two additional components are Worklets and Links
• Worklets are objects that contain a series of Tasks
• Links are required to connect objects in a Workflow

Developing Workflows
• Create a new Workflow in the Workflow Designer
• Customize the Workflow name
• Select a Server

Workflow Properties
• Customize Workflow properties: Workflow log displays; select a Workflow schedule (optional), which may be reusable or non-reusable
• Create a user-defined Event, which can later be used with the Raise Event Task
• Define Workflow Variables that can be used in later Task objects (example: Decision Task)

Building Workflow Components
• Add Sessions and other Tasks to the Workflow
• Connect all Workflow components with Links
• Save the Workflow
• Start the Workflow (Sessions in a Workflow can be independently executed)

Workflow Designer - Links
• Links are required to connect Workflow Tasks
• Links can be used to create branches in a Workflow
• All links are executed, unless a link condition is used which makes a link false (see the sketch below)
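For example, a link condition can test a predefined task variable so that each branch runs only for the matching outcome (a sketch; the Session name s_LoadOrders is hypothetical):

    -- Follow this link only if the upstream Session succeeded
    $s_LoadOrders.Status = SUCCEEDED

    -- An alternate branch for the failure case
    $s_LoadOrders.Status = FAILED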
Session Tasks
After this section, you will be familiar with:
• How to create and configure Session Tasks
• Session Task properties
• Transformation property overrides
• Reusable vs. non-reusable Sessions
• Session partitions

Session Task
• Created to execute the logic of a mapping (one mapping only)
• Session Tasks can be created in the Task Developer (reusable) or in the Workflow Designer (Workflow-specific)
• Steps to create a Session Task: select the Session button from the Task Toolbar, or select menu Tasks | Create

Session Task Tabs
(Screenshots: General, Properties, Config Object, Sources, Targets, Transformations, Partitions)
• The Transformations tab allows overrides of some transformation properties; overriding does not change the properties in the Mapping

Monitor Workflows
By the end of this section you will be familiar with:
• The Workflow Monitor GUI interface
• Monitoring views
• Server monitoring modes
• Filtering displayed items
• Actions initiated from the Workflow Monitor
• Truncating Monitor logs

Monitor Workflows
• The Workflow Monitor is the tool for monitoring Workflows and Tasks
• Review details about a Workflow or Task in two views: Gantt Chart view and Task view

Monitoring Workflows
Perform operations in the Workflow Monitor:
• Restart: restart a Task, Workflow or Worklet
• Stop: stop a Task, Workflow or Worklet
• Abort: abort a Task, Workflow or Worklet
• Resume: resume a suspended Workflow after a failed Task is corrected
• View Session and Workflow logs
Abort has a 60-second timeout: if the Server has not completed processing and committing data during the timeout period, the threads and processes associated with the Session are killed. Stopping a Session Task means the Server stops reading data.

Monitoring Workflows - Task View
(Screenshot callouts: Status Bar; Task, Workflow and Worklet rows; Start Time and Completion Time columns; Start, Stop, Abort and Resume actions)

Monitor Window Filtering
• Task view provides filtering: monitoring filters can be set using drop-down menus, minimizing the items displayed in Task view
• Right-click on a Session to retrieve the Session log (copied from the Server to the local PC client)

Debugger
By the end of this section you will be familiar with:
• Creating a Debug Session
• Debugger windows and indicators
• Debugger functionality and options
• Viewing data with the Debugger
• Setting and using Breakpoints
• Tips for using the Debugger

Debugger Features
The Debugger is a wizard-driven tool:
• View source / target data
• View transformation data
• Set breakpoints and evaluate expressions
• Initialize variables
• Manually change variable values
The Debugger is session-driven:
• Data can be loaded or discarded
• The debug environment can be saved for later use

Debugger Interface
(Screenshot callouts: Debugger Mode indicator; Current Transformation indicator - solid yellow arrow; SQL indicator - flashing yellow; Transformation Instance Data window; Debugger Log tab; Session Log tab; Target Data window)

Filter Transformation
Drops rows conditionally
• Active Transformation; Connected
• Ports: all input/output
• Specify a Filter condition
• Usage: filter rows from flat file sources; single-pass source(s) into multiple targets
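A sketch of a Filter condition (ORDER_STATUS and ORDER_AMT are hypothetical ports); rows for which the condition does not evaluate to TRUE are dropped:

    -- Keep only shipped orders above a threshold
    ORDER_STATUS = 'SHIPPED' AND ORDER_AMT > 100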
Aggregator Transformation
Performs aggregate calculations
• Active Transformation; Connected
• Ports: mixed; variables allowed; Group By allowed
• Create expressions in output or variable ports
• Usage: standard aggregations

Informatica Functions - Aggregate Functions
AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE
• Return summary values for non-null data in selected ports
• Use only in Aggregator transformations, and in output ports only
• Calculate a single value (and row) for all records in a group
• Only one aggregate function can be nested within an aggregate function
• Conditional statements can be used with these functions

Aggregate Expressions
• Aggregate functions are supported only in the Aggregator transformation
• Conditional aggregate expressions are supported; conditional SUM format: SUM(value, condition)
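A sketch with hypothetical ORDER_AMT and ORDER_STATUS ports, showing the conditional form and an equivalent IIF guard:

    -- Sum only the rows that satisfy the condition (per Group By group)
    SUM( ORDER_AMT, ORDER_STATUS = 'SHIPPED' )

    -- Equivalent guarded form; non-matching rows contribute 0
    SUM( IIF( ORDER_STATUS = 'SHIPPED', ORDER_AMT, 0 ))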
Aggregator Properties
• Sorted Input property: instructs the Aggregator to expect the data to be sorted
• Set Aggregator cache sizes (on the Informatica Server machine)

Sorted Data
• The Aggregator can handle sorted or unsorted data; sorted data can be aggregated more efficiently, decreasing total processing time
• With sorted input, the Server caches data from each group and releases the cached data upon reaching the first record of the next group
• Data must be sorted according to the order of the Aggregator "Group By" ports
• The performance gain will depend upon varying factors

Incremental Aggregation
• Triggered in Session Properties, Performance tab; example use: an MTD (month-to-date) calculation
• The cache is saved into $PMCacheDir as aggregatorname.DAT and aggregatorname.IDX; upon the next run, the files are overwritten with new cache information
• Example: when triggered, the PowerCenter Server saves the new MTD totals; on the next run (new totals), the Server subtracts the old totals and passes the difference forward
• Best practice is to copy these files in case a rerun of data is ever required; reinitialize when no longer needed, e.g. at the beginning of new-month processing

Joiner Transformation
By the end of this section you will be familiar with:
• When to use a Joiner transformation
• Homogeneous joins
• Heterogeneous joins
• Joiner properties
• Joiner conditions
• Nested joins

Homogeneous Joins
Joins that can be performed with a SQL SELECT statement:
• The Source Qualifier contains a SQL join
• Tables are on the same database server (or are synonyms)
• The database server does the join "work"
• Multiple homogeneous tables can be joined

Heterogeneous Joins
Joins that cannot be done with a SQL statement:
• An Oracle table and a Sybase table
• Two Informix tables on different database servers
• Two flat files
• A flat file and a database table

Joiner Transformation
Performs heterogeneous joins on records from different databases or flat file sources
• Active Transformation; Connected
• Ports: all input or input/output; "M" denotes a port that comes from the master source
• Specify the join condition
• Usage: join two flat files; join two tables from different databases; join a flat file with a relational table

Joiner Conditions
• Multiple join conditions are supported

Joiner Properties
• Join types: "Normal" (inner), Master outer, Detail outer, Full outer
• Set Joiner cache sizes
• The Joiner can accept sorted data (configure the join condition to use the sort origin ports)

Mid-Mapping Join
The Joiner does not accept input in the following situations:
• Both input pipelines begin with the same Source Qualifier
• Both input pipelines begin with the same Normalizer
• Both input pipelines begin with the same Joiner
• Either input pipeline contains an Update Strategy

Sorter Transformation
• Can sort data from relational tables or flat files
• The sort takes place on the Informatica Server machine
• Multiple sort keys are supported
• The Sorter transformation is often more efficient than a sort performed on a database with an ORDER BY clause

Lookup Transformation
By the end of this section you will be familiar with:
• Lookup principles
• Lookup properties
• Lookup conditions
• Lookup techniques
• Caching considerations

How a Lookup Transformation Works
• For each Mapping row, one or more port values are looked up in a database table
• If a match is found, one or more table values are returned to the Mapping; if no match is found, NULL is returned

Lookup Transformation
Looks up values in a database table and provides data to other components in a Mapping
• Passive Transformation; Connected / Unconnected
• Ports: mixed; "L" denotes a Lookup port; "R" denotes a port used as a return value (unconnected Lookup only)
• Specify the Lookup condition
• Usage: get related values; verify whether records exist or whether data has changed

Lookup Properties
• Override Lookup SQL option
• Toggle caching
• Native database connection object name
Additional properties: set the cache directory, make the cache persistent, set Lookup cache sizes

Lookup Conditions
• Multiple conditions are supported

To Cache or Not to Cache?
Caching can significantly impact performance
• Cached: Lookup table data is cached locally on the Server; Mapping rows are looked up against the cache; only one SQL SELECT is needed
• Uncached: each Mapping row needs one SQL SELECT
• Rule of thumb: cache if the number (and size) of records in the Lookup table is small relative to the number of Mapping rows requiring lookup

Target Options
By the end of this section you will be familiar with:
• Row type indicators
• Row operations at load time
• Constraint-based loading considerations
• Rejected row handling options

Target Properties
• In the Session Task, Properties tab: select the target instance, row loading operations, and error handling

Constraint-based Loading
Maintains referential integrity in the targets
• (Diagram: three targets chained by primary/foreign keys: pk1 -> fk1, pk2 -> fk2.)
• Example 1: with only one Active source, rows for Targets 1-3 are loaded properly and maintain referential integrity
• Example 2: with two Active sources, it is not possible to control whether rows for Target 3 are loaded before or after those for Target 2
• The following transformations are 'Active sources': Advanced External Procedure, Source Qualifier, Normalizer, Aggregator, Sorter, Joiner, Rank, Mapplet (containing any of the previous transformations)

Update Strategy Transformation
By the end of this section you will be familiar with:
• Update Strategy functionality
• Update Strategy expressions
• Refresh strategies
• Smart aggregation

Update Strategy Transformation
Used to specify how each individual row will be used to update target tables (insert, update, delete, reject)
• Active Transformation; Connected
• Ports: all input/output
• Specify the Update Strategy expression
• Usage: updating slowly changing dimensions; IIF or DECODE logic determines how to handle the record (see the sketch below)
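A sketch of typical Update Strategy expressions using the built-in DD_ row-type constants (the ports lkp_CUSTOMER_KEY, SRC_STATUS, SRC_CHECKSUM and lkp_CHECKSUM are hypothetical):

    -- Flag as insert when the dimension key is not found, else update
    IIF( ISNULL( lkp_CUSTOMER_KEY ), DD_INSERT, DD_UPDATE )

    -- DECODE(TRUE, ...) form covering all four row types
    DECODE( TRUE,
            ISNULL( lkp_CUSTOMER_KEY ),    DD_INSERT,
            SRC_STATUS = 'DELETED',        DD_DELETE,
            SRC_CHECKSUM <> lkp_CHECKSUM,  DD_UPDATE,
            DD_REJECT )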
Target Refresh Strategies
• Single snapshot: target truncated, new records inserted
• Sequential snapshot: new records inserted
• Incremental: only new records are inserted; records already present in the target are ignored
• Incremental with update: only new records are inserted; records already present in the target are updated

Router Transformation
Sends rows to multiple filter conditions
• Active Transformation; Connected
• Ports: all input/output; specify a filter condition for each Group
• Usage: link source data in one pass to multiple filter conditions

Router Transformation in a Mapping
(Screenshot of a Router in a Mapping)

Parameters and Variables
By the end of this section you will understand:
• System variables
• Creating Parameters and Variables
• Features and advantages
• Establishing values for Parameters and Variables

System Variables
• SYSDATE: provides the current datetime on the Informatica Server machine; not a static value
• $$$SessStartTime: returns the session start time as a string, taken from the system clock on the machine hosting the Informatica Server; the string format is database-type dependent; used in SQL overrides; has a constant value throughout the session
• SESSSTARTTIME: returns the session start time as a date/time value on the Informatica Server; used with any function that accepts transformation date/time datatypes; not to be used in a SQL override; has a constant value

Mapping Parameters and Variables
• Apply to all transformations within one Mapping
• Represent declared values
• Variables can change in value during run-time; Parameters remain constant during run-time
• Provide increased development flexibility
• Declared in the Designer Mappings menu: user-defined names, the appropriate aggregation type, and an optional initial value
• Format: $$VariableName or $$ParameterName

Functions to Set Mapping Variables
• SetCountVariable: counts the number of evaluated rows and increments or decrements a mapping variable for each row
• SetMaxVariable: evaluates the value of a mapping variable to the higher of two values
• SetMinVariable: evaluates the value of a mapping variable to the lower of two values
• SetVariable: sets the value of a mapping variable to a specified value
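A sketch of the incremental-extract pattern these functions support, assuming a hypothetical mapping variable $$MaxOrderDate declared with the Max aggregation type:

    -- In an Expression port: track the newest order date seen this run
    SETMAXVARIABLE( $$MaxOrderDate, ORDER_DATE )

    -- On the next run, reference the saved value in the Source Qualifier
    -- source filter (quoting/conversion depends on the database type):
    ORDERS.ORDER_DATE > '$$MaxOrderDate'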
Unconnected Lookup
• Physically "unconnected" from other transformations: there can be NO data flow arrows leading to or from an unconnected Lookup
• The Lookup function can be set within any transformation that supports expressions
• Lookup data is called from the point in the Mapping that needs it (for example, a function in an Aggregator calls the unconnected Lookup)

Conditional Lookup Technique
Two requirements:
• Must be an unconnected (or "function mode") Lookup
• The Lookup function is used within a conditional statement:
  IIF ( ISNULL(customer_id), :lkp.MYLOOKUP(order_no))
  (condition; row key passed to the Lookup; Lookup function)
• The conditional statement is evaluated for each row, but the Lookup function is called only under the pre-defined condition

Conditional Lookup Advantage
• Data lookup is performed only for those rows that require it; substantial performance can be gained
• EXAMPLE: a Mapping will process 500,000 rows. For two percent of those rows (10,000), the item_id value is NULL. Item_ID can be derived from SKU_NUMB:
  IIF ( ISNULL(item_id), :lkp.MYLOOKUP (sku_numb))
  The condition is true for 2 percent of all rows; the Lookup is called only when the condition is true. Net savings: 490,000 lookups

Connected vs. Unconnected Lookups
• CONNECTED LOOKUP: part of the mapping data flow; returns multiple values (by linking output ports to another transformation); executed for every record passing through the transformation; more visible, shows where the lookup values are used; default values are used
• UNCONNECTED LOOKUP: separate from the mapping data flow; returns one value (by checking the Return (R) port option for the output port that provides the return value); executed only when the lookup function is called; less visible, as the lookup is called from an expression within another transformation; default values are ignored

Heterogeneous Targets
By the end of this section you will be familiar with:
• Heterogeneous target types
• Heterogeneous target limitations
• Target conversions

Definition: Heterogeneous Targets
Supported target definition types:
• Relational database
• Flat file
• XML
• ERP (SAP BW, PeopleSoft, etc.)
A heterogeneous target is where the target types are different, or the target database connections are different, within a single Session Task

Step One: Identify Different Target Types
• Example: two Oracle tables that are either in two different databases or require different (schema-specific) connect strings, plus one flat file target

Step Two: Identify Different Database Connections
• The two database connections WILL differ; the flat file requires separate location information

Target Type Override (Conversion)
• Example: a Mapping has SQL Server target definitions; the Session Task can be set to load Oracle tables instead, using an Oracle database connection
• Only the following overrides are supported: relational target to flat file target; relational target to any other relational database type; SAP BW target to a flat file target
• CAUTION: if the target definition datatypes are not compatible with the datatypes in the newly selected database type, modify the target definition

Mapplet Designer
(Screenshot callouts: Mapplet Designer tool, Mapplet transformation icons, Mapplet Output transformation)

Mapplet Advantages
• Useful for repetitive tasks / logic
• Represents a set of transformations
• Mapplets are reusable: use an 'instance' of a Mapplet in a Mapping
• Changes to a Mapplet are inherited by all instances
• The Server expands the Mapplet at runtime

Active and Passive Mapplets
• Passive Mapplets contain only passive transformations
• Active Mapplets contain one or more active transformations
• CAUTION: changing a passive Mapplet into an active Mapplet may invalidate Mappings that use that Mapplet; do an impact analysis in the Repository Manager first

Using Active and Passive Mapplets
• Multiple Passive Mapplets can populate the same target instance
• Multiple Active Mapplets, or Active and Passive Mapplets together, cannot populate the same target instance

Reusable Transformations
By the end of this section you will be familiar with:
• Reusable transformation advantages
• Reusable transformation rules
• Promoting transformations to reusable
• Copying reusable transformations

Reusable Transformations
Define once, reuse many times
• Can be a copy or a shortcut
• Edit ports only in the Transformation Developer
• Properties can be edited in the Mapping
• Instances dynamically inherit changes
• Be careful: it is possible to invalidate Mappings by changing reusable transformations
Transformations that cannot be made reusable:
• Source Qualifier
• ERP Source Qualifier
• Normalizer used to read a COBOL data source

Promoting a Transformation to Reusable
• Place a check in the "Make reusable" box; this action is not reversible

Sequence Generator Transformation
Generates unique keys for any port on a row
• Passive Transformation; Connected
• Ports: two predefined output ports, NEXTVAL and CURRVAL; no input ports allowed
• Usage: generate sequence numbers; shareable across mappings

Sequence Generator Properties
• Number of Cached Values

Dynamic Lookup
By the end of this section you will be familiar with:
• Dynamic lookup theory
• Dynamic lookup advantages
• Dynamic lookup rules

Additional Lookup Cache Options
• Make the cache persistent
• Cache file name prefix: reuse a cache by name for another similar business purpose
• Recache from database: overrides other settings; Lookup data is refreshed
• Dynamic Lookup cache: allows a row to know about the handling of a previous row

Persistent Caches
• By default, Lookup caches are not persistent: when the Session completes, the cache is erased
• The cache can be made persistent with the Lookup properties: when the Session completes, the persistent cache is stored in files on the server hard disk
• The next time the Session runs, the cached data is loaded fully or partially into RAM and reused
• Can improve performance, but "stale" data may pose a problem

Dynamic Lookup Cache Advantages
• When the target table is also the Lookup table, the cache is changed dynamically as the target load rows are processed in the mapping
• New rows to be inserted into the target, or rows for update to the target, affect the dynamic Lookup cache as they are processed
• Subsequent rows will know the handling of previous rows
• The dynamic Lookup cache and target load rows remain synchronized throughout the Session run

Update Dynamic Lookup Cache
• NewLookupRow port values: 0 = static lookup, cache is not changed; 1 = insert row into the Lookup cache; 2 = update row in the Lookup cache
• Does NOT change the row type: use an Update Strategy transformation before or after the Lookup to flag rows for insert or update to the target
• Ignore NULL property (per port): ignore NULL values from the input row and update the cache using only non-NULL values from the input

Example: Dynamic Lookup Configuration
• The Router Group Filter Condition should be: NewLookupRow = 1
• This allows isolation of insert rows from update rows
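A sketch of how the NewLookupRow value can drive the downstream logic described above (assuming the port is linked out of the dynamic Lookup into a Router and an Update Strategy):

    -- Router group filter conditions, one group per action
    NewLookupRow = 1    -- rows to insert
    NewLookupRow = 2    -- rows to update

    -- Update Strategy expression downstream of the dynamic Lookup
    IIF( NewLookupRow = 1, DD_INSERT,
         IIF( NewLookupRow = 2, DD_UPDATE, DD_REJECT ))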
Concurrent and Sequential Workflows
By the end of this section you will be familiar with:
• Concurrent Workflows
• Sequential Workflows
• Scheduling Workflows
• Stopping, aborting and suspending Tasks and Workflows

Multi-Task Workflows - Sequential
• Tasks can be run sequentially
• The Tasks shown are all Sessions, but they can also be other Tasks, such as Command, Timer or Email Tasks

Multi-Task Workflows - Concurrent
• Tasks can be run concurrently
• The Tasks shown are all Sessions, but they can also be other Tasks, such as Command, Timer or Email Tasks

Multi-Task Workflows - Combined
• Tasks can be run in a combined concurrent and sequential pattern within one Workflow
• The Tasks shown are all Sessions, but they can also be other Tasks, such as Command, Timer or Email Tasks

Additional Transformations
By the end of this section you will be familiar with:
• The Rank transformation
• The Normalizer transformation
• The Stored Procedure transformation
• The External Procedure transformation
• The Advanced External Procedure transformation

Rank Transformation
Filters the top or bottom range of records
• Active Transformation; Connected
• Ports: mixed; one predefined output port, RANKINDEX; variables allowed; Group By allowed
• Usage: select a top/bottom number of records

Normalizer Transformation
Normalizes records from relational or VSAM sources
• Active Transformation; Connected
• Ports: input/output or output
• Usage: required for VSAM Source definitions; normalize flat file or relational source definitions; generate multiple records from one record

Normalizer Transformation - Example
Turns one row per account:
  YEAR,ACCOUNT,MONTH1,MONTH2,MONTH3, ... MONTH12
  1997,Salaries,21000,21000,22000,19000,23000,26000,29000,29000,34000,34000,40000,45000
  1997,Benefits,4200,4200,4400,3800,4600,5200,5800,5800,6800,6800,8000,9000
  1997,Expenses,10500,4000,5000,6500,3000,7000,9000,4500,7500,8000,8500,8250
into multiple rows (one row per YEAR, ACCOUNT and month)

Stored Procedure Transformation
Calls a database stored procedure
• Passive Transformation; Connected / Unconnected
• Ports: mixed; "R" denotes the port that returns a value from the stored function to the next transformation
• Usage: perform transformation logic outside PowerMart / PowerCenter

External Procedure Transformation (TX)
Calls a passive procedure defined in a dynamic link library (DLL) or shared library
• Passive Transformation; Connected / Unconnected
• Ports: mixed; "R" designates the return value port of an unconnected transformation
• Usage: perform transformation logic outside PowerMart / PowerCenter
• Option to allow partitioning

Advanced TX Transformation
Calls an active procedure defined in a dynamic link library (DLL) or shared library
• Active Transformation; Connected mode only
• Ports: mixed
• Usage: perform transformation logic outside PowerMart / PowerCenter, e.g. sorting, aggregation
• Option to allow partitioning

Transaction Control Transformation
Allows custom commit types (source- or target-based) and user-defined conditional commits
• Active Transformation; Connected mode only
• Ports: input and output
• Properties: Continue, Commit Before, Commit After, Rollback Before, Rollback After

Transaction Control Functionality
Commit types:
• Target-based commit: commits based on an "approximate" number of records written to the target
• Source-based commit: ensures that a source record is committed in all targets
• User-defined commit: uses the Transaction Control transformation to specify commits and rollbacks in the mapping, based on conditions
Set the commit type (and other specifications) in the Transaction Control condition.
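A sketch of a user-defined commit condition (INVOICE_NO and the variable port v_PREV_INV, holding the prior row's value, are hypothetical; the TC_ constants are the transformation's built-in actions):

    -- Start a new transaction whenever the invoice number changes
    IIF( INVOICE_NO <> v_PREV_INV,
         TC_COMMIT_BEFORE,
         TC_CONTINUE_TRANSACTION )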
Versioning
• View object version properties
• Track changes to an object
• Check objects "in" and "out"
• Delete or purge an object version
• Apply labels and run queries
• Deployment Groups

Informatica Business Analytics Suite
• Modular plug-&-play approach
• Packaged analytic solutions
• Custom-built analytic solutions

Informatica Warehouses / Marts
The Informatica Warehouse™ spans:
• Customer Relationship: Sales, Marketing, Service, Web
• Finance: G/L, Receivables, Payables, Profitability
• Human Resources: Compensation, Scorecard
• Supply Chain: Planning, Sourcing, Inventory, Quality
Common dimensions: Customer, Product, Organization, Supplier, Time, Geography, Employee

Inside the Informatica Warehouse
• Business Adapters™ (Extract): data source connectivity (SAP, ORCL, i2, SEBL, PSFT, Custom) with minimal load; structural/functional knowledge of sources
• Analytic Bus™ (Transform): transaction consolidation and standardization; source-independent interface
• Warehouse Loader™ (Load): Type I and II slowly changing dimensions; history and changed-record tracking
• Analytic Data Model: industry best-practice metrics; process-centric model and conformed dimensions
• Advanced Calculation Engine: pre-aggregations for rapid query response; complex calculation metrics (e.g. statistical)
• Business Intelligence tools sit on top of the Informatica Warehouse™

PowerConnect Products
Family of enterprise software products that allow companies to directly source and integrate ERP, CRM, real-time message queue, mainframe, AS/400, remote data and metadata with other enterprise data:
• PowerConnect for MQSeries (real time)
• PowerConnect for TIBCO (real time)
• PowerConnect for PeopleSoft
• PowerConnect for SAP R/3
• PowerConnect for SAP BW
• PowerConnect for Siebel
• PowerConnect for Mainframe
• PowerConnect for AS/400
• PowerConnect for Remote Data
• PowerConnect SDK