Data Stage Environment Variable Settings for

Function | Usage
AlNum | Checks whether the given string is alphanumeric
Alpha | TRUE if the string is completely alphabetic
CompactWhiteSpace | Reduces all consecutive whitespace to a single space
Compare | Compares two strings for sorting
CompareNoCase | Compares two strings, ignoring case
CompareNum | Compares the first n characters of two strings
CompareNumNoCase | Compares the first n characters of two strings, ignoring case
Convert | Replaces a character in a string with the given character
Count | Counts the number of times a given substring occurs in a string
Dcount | Returns the count of delimited fields in a string
DownCase | Changes all uppercase letters in a string to lowercase
DQuote | Encloses a string in double quotation marks
Field | Returns one or more delimited substrings
Index | Finds the starting character position of a substring
Left | Returns the leftmost n characters of a string
Len | Returns the length of the string (total number of characters)
Num | Returns 1 if the string can be converted to a number
PadString | Returns the string padded with the optional pad character and optional length
Right | Returns the rightmost n characters of a string
Soundex | Returns a string that identifies a set of words that are phonetically similar
Space | Returns a string of n space characters
Squote | Encloses a string in single quotation marks
Str | Repeats a string
StripWhiteSpace | Returns the string after removing all whitespace
Trim | Removes all leading and trailing spaces and tabs, and reduces internal occurrences of spaces and tabs to one
TrimB | Removes all trailing spaces and tabs
TrimF | Removes all leading spaces and tabs
Trim | Returns a string with leading and trailing whitespace removed
Upcase | Changes all lowercase letters in a string to uppercase
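To make a few of the descriptions above concrete, here is a rough Python sketch of what CompactWhiteSpace, Dcount, Field and Trim are described as doing. These are approximations written for illustration, not DataStage code; the Python function names, the argument order, and the treatment of an empty string in `dcount` are assumptions.

```python
import re

def compact_whitespace(s: str) -> str:
    """Reduce each run of consecutive whitespace to a single space."""
    return re.sub(r"\s+", " ", s)

def dcount(s: str, delimiter: str) -> int:
    """Count delimited fields: one more than the number of delimiters.
    Assumption: an empty string is treated as having zero fields."""
    if s == "":
        return 0
    return s.count(delimiter) + 1

def field(s: str, delimiter: str, occurrence: int, number: int = 1) -> str:
    """Return `number` delimited substrings, starting at the 1-based `occurrence`."""
    parts = s.split(delimiter)
    return delimiter.join(parts[occurrence - 1 : occurrence - 1 + number])

def trim(s: str) -> str:
    """Strip leading and trailing spaces and tabs, then reduce internal runs to one space."""
    return re.sub(r"[ \t]+", " ", s.strip(" \t"))
```

For example, `field("a,b,c,d", ",", 2, 2)` picks two fields starting at the second one, and `trim` both strips the ends and collapses internal runs, which is the difference from TrimB and TrimF described in the table.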
Category | DataStage 7.x | DataStage 8.1 | DataStage 8.5 | DataStage 8.7
Logging | File System | Database | Database | Database
Metadata | File System | Database | Database | Database
Lookup | No Range Lookup | Range Lookup | Range Lookup | Range Lookup
Connection Objects | No | Yes | Yes | Yes
Client | Separate Manager Client | Merged with Designer Client | Merged with Designer Client | Merged with Designer Client
Stage | No SCD Stage | SCD Stage Present | SCD Stage Present | SCD Stage Present
Stage | No multi reference in Join Stage | Multi reference in Join Stage | Multi reference in Join Stage | Multi reference in Join Stage
Stage | No Vertical Pivoting | No Vertical Pivoting | Horizontal and Vertical Pivoting | Horizontal and Vertical Pivoting
Stage | No looping in Transformer Stage | No looping in Transformer Stage | Looping possible in Transformer | Looping possible in Transformer
Stage | No last record detection in Transformer | No last record detection in Transformer | Last record detection in Transformer | Last record detection in Transformer
Stage | Explicit handling of Null values | Explicit handling of Null values | Nulls can be handled in any expression | Nulls can be handled in any expression
Utilities | No job comparisons | Jobs can be compared | Jobs can be compared | Jobs can be compared
Utilities | No Advanced Search | Advanced Search Available | Advanced Search Available | Advanced Search Available
Web Console | No | Yes | Yes | Yes
Source Control | No | No | Yes | Yes
Administration | Cannot copy Roles from an existing project | Cannot copy Roles from an existing project | Can copy Roles from an existing project | Can copy Roles from an existing project
Administration | No auto creation of environment variables if not already present | No auto creation of environment variables if not already present | Auto creation of environment variables if not already present | Auto creation of environment variables if not already present
Debugging | No Debugger available | No Debugger available | No Debugger available | Debugger available
Reporting | No Dashboard reporting for DataStage operations | No Dashboard reporting for DataStage operations | No Dashboard reporting for DataStage operations | Dashboard reporting available
DataStage EE environment variables
The default environment variable settings are provided during the DataStage installation (common for all users).
Users have a few options to override the default settings with DataStage client applications:
• With DataStage Administrator - project-wide defaults for general environment variables, set per project in the Projects tab under Properties -> General Tab -> Environment...
• With DataStage Designer - settings at the job level in Job Properties
• With DataStage Director - settings per run, which override all other settings and are very useful for testing and debugging.
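The override order can be pictured as layered dictionaries in which later layers win. The following is a minimal Python sketch of that idea, not DataStage code: the function name `resolve_environment` and the sample values in the usage note are hypothetical, and the assumption that job-level settings take precedence over project-wide defaults is inferred from the list above rather than stated outright.

```python
def resolve_environment(install_defaults, project=None, job=None, run=None):
    """Merge settings so that each later layer overrides the earlier ones.

    Order (assumed): installation defaults < project (Administrator)
    < job (Designer) < run (Director).
    """
    resolved = dict(install_defaults)  # copy so the defaults are not mutated
    for layer in (project, job, run):
        if layer:
            resolved.update(layer)
    return resolved
```

With hypothetical values, `resolve_environment({"APT_DUMP_SCORE": "0"}, project={"APT_DUMP_SCORE": "1"}, run={"APT_DUMP_SCORE": "0"})` yields `"0"` for the variable, because the per-run value is applied last.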
The DataStage environment variables are grouped, and each variable falls into one of the categories. The default values set up during installation are reasonable, and in most cases there is no need to modify them.
Setting environment variables for parallel execution in Datastage Administrator
Environment variables overview
Listed below are only the environment variables that are candidates for adjustment in real-life project deployments. Please refer to the DataStage help for details on variables not listed here.
General variables
• LD_LIBRARY_PATH - specifies the location of dynamic libraries on Unix
• PATH - Unix shell search path
• TMPDIR - temporary directory
Parallel properties
• APT_CONFIG_FILE - the parallel job configuration file. It points to the active configuration file on the server. Please refer to the DataStage EE configuration guide for more details on creating a config file.
• APT_DISABLE_COMBINATION - prevents operators (stages) from being combined into one process. Used mainly for benchmarks.
• APT_ORCHHOME - home path for parallel content.
• APT_STRING_PADCHAR - defines a pad character which is used when a varchar is converted to a fixed-length string
Operator specific
The operator-specific variables under parallel properties are stage-specific settings and are usually set during installation. The settings apply to the supported parallel database engines (DB2, Oracle, SAS and Teradata).
• APT_DBNAME - default DB2 database name to use
• APT_RDBMS_COMMIT_ROWS - RDBMS commit interval
Reporting
The reporting variables control logging options and take True/False values only.
• APT_DUMP_SCORE - shows operators, datasets, nodes, partitions, combinations and processes used in a job.
• APT_RECORD_COUNTS - helps detect and analyze load imbalance. It prints the number of records consumed by getRecord() and produced by putRecord()
• OSH_PRINT_SCHEMAS - shows unformatted metadata for all stages (interface schema) and datasets (record schema). The OSH_PRINT_SCHEMAS environment variable should be set to verify that runtime schemas match the job design column definitions (especially from Oracle).
• OSH_DUMP - shows an OSH script and produces a verbose description of a step before executing it
• APT_NO_JOBMON - disables performance statistics and process metadata reporting in Designer.
Compiler
• APT_COMPILER - path to the C++ compiler needed to compile transformer stages
DataStage EE provides a number of environment variables to control how jobs operate on a
UNIX system. In addition to providing required information, environment variables can be
used to enable or disable various DataStage features, and to tune performance settings.
Data Stage Environment Variable Settings for All Jobs
Ascential recommends the following environment variable settings for all Enterprise Edition
jobs. These settings can be made at the project level, or may be set on an individual basis
within the properties for each job.
Environment Variable Settings For All Jobs
Environment Variable | Setting | Description
$APT_CONFIG_FILE | filepath | Specifies the full pathname to the EE configuration file.
$APT_DUMP_SCORE | 1 | Outputs EE score dump to the DataStage job log, providing detailed information about actual job flow including operators, processes, and datasets. Extremely useful for understanding how a job actually ran in the environment. (see section 10.1 Reading a Score Dump)
$OSH_ECHO | 1 | Includes a copy of the generated osh in the job’s DataStage log. Starting with v7, this option is enabled when “Generated OSH visible for Parallel jobs in ALL projects” option is enabled in DataStage Administrator.
$APT_RECORD_COUNTS | 1 | Outputs record counts to the DataStage job log as each operator completes processing. The count is per operator per partition.
$APT_PM_SHOW_PIDS | 1 | Places entries in DataStage job log showing UNIX process ID (PID) for each process started by a job. Does not report PIDs of DataStage “phantom” processes started by Server shared containers.
$APT_BUFFER_MAXIMUM_TIMEOUT | 1 | Maximum buffer delay in seconds
$APT_THIN_SCORE (DataStage 7.0 and earlier) | 1 | Only needed for DataStage v7.0 and earlier. Setting this environment variable significantly reduces memory usage for very large (>100 operator) jobs.
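One way to check a job's environment against the settings listed above is a small audit helper. The Python sketch below is illustrative only: `audit_environment` and `RECOMMENDED` are hypothetical names, the function works on any dictionary of variable names to values, and it does not talk to DataStage. $APT_THIN_SCORE is left out because the table limits it to v7.0 and earlier.

```python
# Recommended values taken from the table above (names without the leading "$").
RECOMMENDED = {
    "APT_DUMP_SCORE": "1",
    "OSH_ECHO": "1",
    "APT_RECORD_COUNTS": "1",
    "APT_PM_SHOW_PIDS": "1",
    "APT_BUFFER_MAXIMUM_TIMEOUT": "1",
}

def audit_environment(env):
    """Return {name: (expected, actual)} for every setting that differs."""
    problems = {}
    # APT_CONFIG_FILE has no fixed value; it only needs to be a non-empty path.
    if not env.get("APT_CONFIG_FILE"):
        problems["APT_CONFIG_FILE"] = ("<filepath>", env.get("APT_CONFIG_FILE"))
    for name, expected in RECOMMENDED.items():
        actual = env.get(name)
        if actual != expected:
            problems[name] = (expected, actual)
    return problems
```

Calling `audit_environment(dict(os.environ))` (after `import os`) inside a wrapper script would list the variables that are missing or set differently; an empty result means everything matches.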
Additional Environment Variable Settings
Ascential recommends setting the following environment variables on an as-needed basis.
These variables can be used to tune the performance of a particular job flow, to assist in
debugging, and to change the default behavior of specific EE stages.
NOTE: The environment variable settings in this section are only examples. Set values that
are optimal to your environment.
Sequential File Stage Environment Variables
Environment Variable | Setting | Description
$APT_EXPORT_FLUSH_COUNT | [nrows] | Specifies how frequently (in rows) the Sequential File stage (export operator) flushes its internal buffer to disk. Setting this value to a low number (such as 1) is useful for realtime applications, but there is a small performance penalty from increased I/O.
$APT_IMPORT_BUFFER_SIZE, $APT_EXPORT_BUFFER_SIZE | [Kbytes] | Defines size of I/O buffer for Sequential File reads (imports) and writes (exports) respectively. Default is 128 (128K), with a minimum of 8. Increasing these values on heavily-loaded file servers may improve performance.
$APT_CONSISTENT_BUFFERIO_SIZE | [bytes] | In some disk array configurations, setting this variable to a value equal to the read / write size in bytes can improve performance of Sequential File import/export operations.
$APT_DELIMITED_READ_SIZE | [bytes] | Specifies the number of bytes the Sequential File (import) stage reads ahead to get the next delimiter. The default is 500 bytes, but this can be set as low as 2 bytes. This setting should be set to a lower value when reading from streaming inputs (e.g. socket, FIFO) to avoid blocking.
$APT_MAX_DELIMITED_READ_SIZE | [bytes] | By default, Sequential File (import) will read ahead 500 bytes to get the next delimiter. If it is not found the importer looks ahead 4*500=2000 (1500 more) bytes, and so on (4X) up to 100,000 bytes. This variable controls the upper bound, which is by default 100,000 bytes. When more than 500 bytes read-ahead is desired, use this variable instead of APT_DELIMITED_READ_SIZE.
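The read-ahead growth described for $APT_DELIMITED_READ_SIZE and $APT_MAX_DELIMITED_READ_SIZE can be worked through with a short Python sketch. It only reproduces the arithmetic stated above (start at 500 bytes, multiply by 4, stop at the upper bound); the function name `read_ahead_sizes` is hypothetical, and clamping the final step to exactly the upper bound is an assumption about how the importer behaves.

```python
def read_ahead_sizes(initial=500, maximum=100_000, factor=4):
    """List the successive look-ahead sizes (in bytes) tried for a delimiter.

    initial -> value of APT_DELIMITED_READ_SIZE (default 500)
    maximum -> value of APT_MAX_DELIMITED_READ_SIZE (default 100,000)
    """
    sizes = []
    size = initial
    while size < maximum:
        sizes.append(size)
        size *= factor
    sizes.append(maximum)  # assumption: the last attempt is capped at the bound
    return sizes
```

With the defaults this gives 500, 2000, 8000, 32000 and finally 100000 bytes, which shows why a long record without a delimiter can block on a streaming input unless the initial size is lowered.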
Oracle Environment Variables
Environment Variable | Setting | Description
$ORACLE_HOME | [path] | Specifies installation directory for current Oracle instance. Normally set in a user’s environment by Oracle scripts.
$ORACLE_SID | [sid] | Specifies the Oracle service name, corresponding to a TNSNAMES entry.
$APT_ORAUPSERT_COMMIT_ROW_INTERVAL, $APT_ORAUPSERT_COMMIT_TIME_INTERVAL | [num], [seconds] | These two environment variables work together to specify how often target rows are committed for target Oracle stages with Upsert method. Commits are made whenever the time interval period has passed or the row interval is reached, whichever comes first. By default, commits are made every 2 seconds or 5000 rows.
$APT_ORACLE_LOAD_OPTIONS | [SQL*Loader options] | Specifies Oracle SQL*Loader options used in a target Oracle stage with Load method. By default, this is set to OPTIONS(DIRECT=TRUE, PARALLEL=TRUE)
$APT_ORA_IGNORE_CONFIG_FILE_PARALLELISM | 1 | When set, a target Oracle stage with Load method will limit the number of players to the number of datafiles in the table’s tablespace.
$APT_ORA_WRITE_FILES | [filepath] | Useful in debugging Oracle SQL*Loader issues. When set, the output of a Target Oracle stage with Load method is written to files instead of invoking the Oracle SQL*Loader. The filepath specified by this environment variable specifies the file with the SQL*Loader commands.
$DS_ENABLE_RESERVED_CHAR_CONVERT | 1 | Allows DataStage to handle Oracle databases which use the special characters # and $ in column names.
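The "whichever comes first" rule for the two upsert commit intervals can be expressed as a single condition. The Python sketch below is illustrative only and is not how the Oracle stage is implemented; `should_commit` is a hypothetical name, and the defaults mirror the 5000 rows and 2 seconds stated above.

```python
def should_commit(rows_since_commit, seconds_since_commit,
                  row_interval=5000, time_interval=2.0):
    """Return True when either interval has been reached.

    row_interval  -> APT_ORAUPSERT_COMMIT_ROW_INTERVAL
    time_interval -> APT_ORAUPSERT_COMMIT_TIME_INTERVAL (seconds)
    """
    return (rows_since_commit >= row_interval
            or seconds_since_commit >= time_interval)
```

A slow trickle of rows therefore commits on the time limit, while a fast load commits on the row limit, so raising only one of the two values may not change the observed commit frequency.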
Job Monitoring Environment Variables
Environment Variable | Setting | Description
$APT_MONITOR_TIME | [seconds] | In v7 and later, specifies the time interval (in seconds) for generating job monitor information at runtime. To enable size-based job monitoring, unset this environment variable, and set $APT_MONITOR_SIZE below.
$APT_MONITOR_SIZE | [rows] | Determines the minimum number of records the job monitor reports. The default of 5000 records is usually too small. To minimize the number of messages during large job runs, set this to a higher value (e.g. 1000000).
$APT_NO_JOBMON | 1 | Disables job monitoring completely. In rare instances, this may improve performance. In general, this should only be set on a per-job basis when attempting to resolve performance bottlenecks.
$APT_RECORD_COUNTS | 1 | Prints record counts in the job log as each operator completes processing. The count is per operator per partition.
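How the three monitoring variables interact can be summarized in a few lines of Python. This is a sketch of the rules stated above, not DataStage behavior verified against the product: `monitor_mode` is a hypothetical name, treating $APT_NO_JOBMON as active only when its value is "1" is an assumption, and what happens when none of the variables is set is left as "default" because the table does not say.

```python
def monitor_mode(env):
    """Classify job monitoring from a dict of environment variables."""
    if env.get("APT_NO_JOBMON") == "1":  # assumption: "1" means enabled
        return ("disabled", None)
    if env.get("APT_MONITOR_TIME"):
        # Time-based monitoring wins while APT_MONITOR_TIME is set.
        return ("time", int(env["APT_MONITOR_TIME"]))
    if env.get("APT_MONITOR_SIZE"):
        # Size-based monitoring requires APT_MONITOR_TIME to be unset.
        return ("size", int(env["APT_MONITOR_SIZE"]))
    return ("default", None)
```

For instance, an environment with both `APT_MONITOR_TIME` and `APT_MONITOR_SIZE` set is still classified as time-based, which matches the instruction to unset the time variable before relying on the size setting.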