Function | Usage
---------|------
AlNum | Checks whether the given string contains only alphanumeric characters
Alpha | Returns TRUE if the string is completely alphabetic
CompactWhiteSpace | Reduces all consecutive whitespace to a single space
Compare | Compares two strings for sorting
CompareNoCase | Compares two strings, ignoring case
CompareNum | Compares the first n characters of two strings
CompareNumNoCase | Compares the first n characters of two strings, ignoring case
Convert | Replaces characters in a string with the given characters
Count | Counts the number of times a given substring occurs in a string
Dcount | Returns the count of delimited fields in a string
DownCase | Changes all uppercase letters in a string to lowercase
DQuote | Encloses a string in double quotation marks
Field | Returns one or more delimited substrings
Index | Finds the starting character position of a substring
Left | Returns the leftmost n characters of a string
Len | Returns the length of the string (total number of characters)
Num | Returns 1 if the string can be converted to a number
PadString | Returns the string padded with the optional pad character to the optional length
Right | Returns the rightmost n characters of a string
Soundex | Returns a code that identifies a set of words that are phonetically similar
Space | Returns a string of n space characters
SQuote | Encloses a string in single quotation marks
Str | Repeats a string
StripWhiteSpace | Returns the string after removing all whitespace
Trim | Removes all leading and trailing spaces and tabs, and reduces internal runs of spaces and tabs to one
TrimB | Removes all trailing spaces and tabs
TrimF | Removes all leading spaces and tabs
Trim | Returns a string with leading and trailing whitespace removed
UpCase | Changes all lowercase letters in a string to uppercase

Category | DataStage 7.x | DataStage 8.1 | DataStage 8.5 | DataStage 8.7
---------|---------------|---------------|---------------|--------------
Logging | File system | Database | Database | Database
Metadata | File system | Database | Database | Database
Lookup | No range lookup | Range lookup | Range lookup | Range lookup
Connection Objects | No | Yes | Yes | Yes
Client | Separate Manager client | Merged with Designer client | Merged with Designer client | Merged with Designer client
Stage | No SCD stage | SCD stage present | SCD stage present | SCD stage present
Stage | No multi reference in Join stage | Multi reference in Join stage | Multi reference in Join stage | Multi reference in Join stage
Stage | No vertical pivoting | No vertical pivoting | Horizontal and vertical pivoting | Horizontal and vertical pivoting
Stage | No looping in Transformer stage | No looping in Transformer stage | Looping possible in Transformer | Looping possible in Transformer
Stage | No last record detection in Transformer | No last record detection in Transformer | Last record detection in Transformer | Last record detection in Transformer
Stage | Explicit handling of null values | Explicit handling of null values | Nulls can be handled in any expression | Nulls can be handled in any expression
Utilities | No job comparisons | Jobs can be compared | Jobs can be compared | Jobs can be compared
Utilities | No advanced search | Advanced search available | Advanced search available | Advanced search available
Web Console | No | Yes | Yes | Yes
Source Control | No | No | Yes | Yes
Administration | Cannot copy roles from an existing project | Cannot copy roles from an existing project | Can copy roles from an existing project | Can copy roles from an existing project
Administration | No auto creation of environment variables if not already present | No auto creation of environment variables if not already present | Auto creation of environment variables if not already present | Auto creation of environment variables if not already present
Debugging | No debugger available | No debugger available | No debugger available | Debugger available
Reporting | No dashboard reporting for DataStage operations | No dashboard reporting for DataStage operations | No dashboard reporting for DataStage operations | Dashboard reporting available

DataStage EE environment variables

The default environment variable settings are provided during the DataStage installation and are common to all users. Users have a few options to override the default settings with the DataStage client applications:

With DataStage Administrator - project-wide defaults for general environment variables, set per project in the Projects tab under Properties -> General Tab -> Environment...
With DataStage Designer - settings at the job level in Job Properties
With DataStage Director - settings per run, which override all other settings and are very useful for testing and debugging.

The DataStage environment variables are grouped, and each variable falls into one of the categories. The default values set up during installation are reasonable, and in most cases there is no need to modify them.

[Figure: Setting environment variables for parallel execution in DataStage Administrator]

Environment variables overview

Listed below are only the environment variables that are candidates for adjustment in real-life project deployments. Please refer to the DataStage help for details on variables not listed here.

General variables

LD_LIBRARY_PATH - specifies the location of dynamic libraries on Unix
PATH - Unix shell search path
TMPDIR - temporary directory

Parallel properties

APT_CONFIG_FILE - the parallel job configuration file. It points to the active configuration file on the server. Please refer to the DataStage EE configuration guide for more details on creating a config file.
APT_DISABLE_COMBINATION - prevents operators (stages) from being combined into one process. Used mainly for benchmarks.
APT_ORCHHOME - home path for parallel content.
APT_STRING_PADCHAR - defines the pad character used when a varchar is converted to a fixed-length string

Operator specific

The operator-specific variables under parallel properties are stage-specific settings and are usually set during installation. The settings apply to the supported parallel database engines (DB2, Oracle, SAS and Teradata).

APT_DBNAME - default DB2 database name to use
APT_RDBMS_COMMIT_ROWS - RDBMS commit interval

Reporting

The reporting variables control logging options and take True/False values only.

APT_DUMP_SCORE - shows operators, datasets, nodes, partitions, combinations and processes used in a job.
APT_RECORD_COUNTS - helps detect and analyze load imbalance. It prints the number of records consumed by getRecord() and produced by putRecord().
OSH_PRINT_SCHEMAS - shows unformatted metadata for all stages (interface schema) and datasets (record schema). Set this variable to verify that runtime schemas match the job design column definitions (especially from Oracle).
OSH_DUMP - shows an OSH script and produces a verbose description of a step before executing it
APT_NO_JOBMON - disables performance statistics and process metadata reporting in Designer.

Compiler

APT_COMPILER - path to the C++ compiler needed to compile transformer stages

DataStage EE provides a number of environment variables to control how jobs operate on a UNIX system. In addition to providing required information, environment variables can be used to enable or disable various DataStage features, and to tune performance settings.

DataStage Environment Variable Settings for All Jobs

Ascential recommends the environment variable settings listed in the table "Environment Variable Settings For All Jobs" for all Enterprise Edition jobs. These settings can be made at the project level, or may be set on an individual basis within the properties for each job.
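The layering of settings described in this document (installation defaults, project-wide defaults in Administrator, job-level settings in Designer, per-run settings in Director) can be pictured as a lookup that prefers the most specific level. The following is a minimal sketch, not DataStage code: the function name `effective_settings` and all example values are hypothetical. The text only states that per-run settings override everything else; that job-level settings take precedence over project-wide defaults is an assumption made here.

```python
from collections import ChainMap


def effective_settings(install_defaults, project=None, job=None, run=None):
    """Resolve environment variables across the four setting levels.

    ChainMap searches its mappings from left to right, so the most
    specific level (per-run) is listed first and wins on conflicts.
    Assumption: job-level settings override project-wide defaults.
    """
    layers = [run or {}, job or {}, project or {}, install_defaults]
    return dict(ChainMap(*layers))


# Hypothetical values, for illustration only.
install_defaults = {
    "APT_DUMP_SCORE": "False",
    "APT_RECORD_COUNTS": "False",
    "TMPDIR": "/tmp",
}
project_level = {"APT_DUMP_SCORE": "True"}   # set in Administrator
job_level = {"APT_RECORD_COUNTS": "True"}    # set in Designer (Job Properties)
run_level = {"APT_DUMP_SCORE": "False"}      # set in Director for one run

resolved = effective_settings(install_defaults, project_level, job_level, run_level)
for name in sorted(resolved):
    print(name, "=", resolved[name])
```

In this example the per-run value of `APT_DUMP_SCORE` hides the project-wide one, while `TMPDIR` falls through to the installation default because no other level sets it.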
Environment Variable Settings For All Jobs

Environment Variable | Setting | Description
---------------------|---------|------------
$APT_CONFIG_FILE | filepath | Specifies the full pathname to the EE configuration file.
$APT_DUMP_SCORE | 1 | Outputs the EE score dump to the DataStage job log, providing detailed information about actual job flow including operators, processes, and datasets. Extremely useful for understanding how a job actually ran in the environment. (see section 10.1 Reading a Score Dump)
$OSH_ECHO | 1 | Includes a copy of the generated osh in the job’s DataStage log. Starting with v7, this option is enabled when the “Generated OSH visible for Parallel jobs in ALL projects” option is enabled in DataStage Administrator.
$APT_RECORD_COUNTS | 1 | Outputs record counts to the DataStage job log as each operator completes processing. The count is per operator per partition.
$APT_PM_SHOW_PIDS | 1 | Places entries in the DataStage job log showing the UNIX process ID (PID) for each process started by a job. Does not report PIDs of DataStage “phantom” processes started by Server shared containers.
$APT_BUFFER_MAXIMUM_TIMEOUT | 1 | Maximum buffer delay in seconds.
$APT_THIN_SCORE (DataStage 7.0 and earlier) | 1 | Only needed for DataStage v7.0 and earlier. Setting this environment variable significantly reduces memory usage for very large (>100 operator) jobs.

Additional Environment Variable Settings

Ascential recommends setting the following environment variables on an as-needed basis. These variables can be used to tune the performance of a particular job flow, to assist in debugging, and to change the default behavior of specific EE stages.

NOTE: The environment variable settings in this section are only examples. Set values that are optimal for your environment.

Sequential File Stage Environment Variables

Environment Variable | Setting | Description
---------------------|---------|------------
$APT_EXPORT_FLUSH_COUNT | [nrows] | Specifies how frequently (in rows) the Sequential File stage (export operator) flushes its internal buffer to disk. Setting this value to a low number (such as 1) is useful for realtime applications, but there is a small performance penalty from increased I/O.
$APT_IMPORT_BUFFER_SIZE, $APT_EXPORT_BUFFER_SIZE | [Kbytes] | Define the size of the I/O buffer for Sequential File reads (imports) and writes (exports) respectively. Default is 128 (128K), with a minimum of 8. Increasing these values on heavily-loaded file servers may improve performance.
$APT_CONSISTENT_BUFFERIO_SIZE | [bytes] | In some disk array configurations, setting this variable to a value equal to the read / write size in bytes can improve performance of Sequential File import/export operations.
$APT_DELIMITED_READ_SIZE | [bytes] | Specifies the number of bytes the Sequential File (import) stage reads ahead to get the next delimiter. The default is 500 bytes, but this can be set as low as 2 bytes. Set a lower value when reading from streaming inputs (e.g. socket, FIFO) to avoid blocking.
$APT_MAX_DELIMITED_READ_SIZE | [bytes] | By default, Sequential File (import) will read ahead 500 bytes to get the next delimiter. If it is not found the importer looks ahead 4*500=2000 (1500 more) bytes, and so on (4X) up to 100,000 bytes. This variable controls the upper bound, which is by default 100,000 bytes. When more than 500 bytes of read-ahead is desired, use this variable instead of APT_DELIMITED_READ_SIZE.

Oracle Environment Variables

Environment Variable | Setting | Description
---------------------|---------|------------
$ORACLE_HOME | [path] | Specifies the installation directory for the current Oracle instance. Normally set in a user’s environment by Oracle scripts.
$ORACLE_SID | [sid] | Specifies the Oracle service name, corresponding to a TNSNAMES entry.
$APT_ORAUPSERT_COMMIT_ROW_INTERVAL, $APT_ORAUPSERT_COMMIT_TIME_INTERVAL | [num], [seconds] | These two environment variables work together to specify how often target rows are committed for target Oracle stages with Upsert method. Commits are made whenever the time interval period has passed or the row interval is reached, whichever comes first. By default, commits are made every 2 seconds or 5000 rows.
$APT_ORACLE_LOAD_OPTIONS | [SQL*Loader options] | Specifies Oracle SQL*Loader options used in a target Oracle stage with Load method. By default, this is set to OPTIONS(DIRECT=TRUE, PARALLEL=TRUE)
$APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM | 1 | When set, a target Oracle stage with Load method will limit the number of players to the number of datafiles in the table’s tablespace.
$APT_ORA_WRITE_FILES | [filepath] | Useful in debugging Oracle SQL*Loader issues. When set, the output of a target Oracle stage with Load method is written to files instead of invoking the Oracle SQL*Loader. The filepath specified by this environment variable specifies the file with the SQL*Loader commands.
$DS_ENABLE_RESERVED_CHAR_CONVERT | 1 | Allows DataStage to handle Oracle databases which use the special characters # and $ in column names.

Job Monitoring Environment Variables

Environment Variable | Setting | Description
---------------------|---------|------------
$APT_MONITOR_TIME | [seconds] | In v7 and later, specifies the time interval (in seconds) for generating job monitor information at runtime. To enable size-based job monitoring, unset this environment variable, and set $APT_MONITOR_SIZE below.
$APT_MONITOR_SIZE | [rows] | Determines the minimum number of records the job monitor reports. The default of 5000 records is usually too small. To minimize the number of messages during large job runs, set this to a higher value (e.g. 1000000).
$APT_NO_JOBMON | 1 | Disables job monitoring completely. In rare instances, this may improve performance. In general, this should only be set on a per-job basis when attempting to resolve performance bottlenecks.
$APT_RECORD_COUNTS | 1 | Prints record counts in the job log as each operator completes processing. The count is per operator per partition.
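The switch from time-based to size-based job monitoring described for $APT_MONITOR_TIME and $APT_MONITOR_SIZE amounts to two changes to a job's environment: remove the first variable and set the second. The sketch below illustrates only that step on a plain dictionary of environment variables. The helper name `size_based_monitoring_env` is hypothetical and not part of DataStage, and how the resulting environment reaches a job (project, job, or run level) is left out.

```python
def size_based_monitoring_env(base_env, min_rows):
    """Return a copy of base_env set up for size-based job monitoring.

    Follows the rule from the table: unset APT_MONITOR_TIME, then set
    APT_MONITOR_SIZE to the minimum number of records to report.
    The input mapping is not modified.
    """
    if min_rows <= 0:
        raise ValueError("min_rows must be a positive number of records")
    env = dict(base_env)
    env.pop("APT_MONITOR_TIME", None)        # time-based monitoring must be unset
    env["APT_MONITOR_SIZE"] = str(min_rows)  # environment values are strings
    return env


# Hypothetical starting environment, for illustration only.
current_env = {"APT_MONITOR_TIME": "5", "APT_CONFIG_FILE": "/path/to/default.apt"}
new_env = size_based_monitoring_env(current_env, 1000000)
print(sorted(new_env.items()))
```

Returning a copy rather than editing the mapping in place keeps the original settings available, which suits the advice that monitoring changes are best tried on a per-job basis.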