tNetezzaNzLoad-documentation

tNetezzaNzLoad Documentation
Description:
This component invokes Netezza’s nzload utility to insert records into a Netezza database. It can
be used either in standalone mode, loading from an existing data file, or connected to an input
row, loading data from the connected component. Both scenarios are illustrated by the sample jobs
described later.
Component Family
Databases/Netezza
Function
The tNetezzaNzLoad component inserts data into a Netezza database table using Netezza’s
nzload utility.
Purpose
To bulk-load data into a Netezza table from an existing data file, an input flow, or a
named-pipe.
Basic Settings
Property Type
Choose between a built-in or repository mode to specify the
database connection parameters
Host
Database host address.
Port
Database port to connect to.
Database
Netezza database name to connect to.
User
Database user name.
Password
Database password.
Table
Database table to insert data into.
Action on Table
Used to create or truncate the table before loading data.
Drop and Create: executes a drop and a create table statement
before loading data.
Create Table: executes only a create table statement before loading data.
Create table if not exists: executes a create table statement
only if the table doesn’t already exist in the database.
Drop table if exists and create: drops the table only if it
already exists in the database and creates it again.
Clear table: executes a delete statement prior to loading data
to clear the entire content of the table.
Truncate table: executes a truncate statement prior to loading
data to clear the entire content of the table.
Schema
Component’s schema.
Data file
Full path to the data file to use. If this component is used by
itself (not connected to another component with an input flow),
this is the name of an existing data file to load into the
database. If this component is connected to another component
by an input flow, this is the name of the file to generate: the
incoming data is written to it and later loaded into the
database with nzload.
Use named-pipe
Check this option to use a named-pipe instead of a data file.
This option can only be used when this component is connected
to another component by an input flow. With this option
checked, no data file is generated and the data is passed to
nzload through a named-pipe. This option is strongly
recommended to improve performance on both Linux and Windows.
To configure the named-pipe option on Windows platforms, read
the instructions below.
Named-pipe name
Specify a name for the named-pipe to use for loading. Make
sure to enter a valid name.
Advanced Settings
Use existing control file
Check this option to provide a control file to be used with the
nzload utility instead of specifying all the options explicitly
through the component. When this option is specified, the data
file and all other nzload-related options are ignored. Refer to
Netezza’s nzload manual for details on creating a control file.
Control file
The name of the nzload control file to use. This option is passed
to the nzload utility via the -cf argument.
Field Separator
The delimiter to use while loading data. This is nzload’s -delim
argument. If you do not quote values (see Require quotes around
fields), you must make sure that the delimiter does not occur in
the data that is inserted into the database. The default value is
\t (TAB). To improve performance you can keep the default value.
Require quotes around fields
This is nzload’s -quotedValue argument. This option is only
applied to columns of type String, Byte, Byte[], Char, and
Object. Valid options are:
None: (NO) do not wrap column values in quotes.
Single quote: (SINGLE) wrap column values in single quotes (').
Double quote: (DOUBLE) wrap column values in double quotes (").
Refer to the description below to load data with quoted values
or carriage returns.
Escape char
Escape character used to escape quotes in the data. The only
valid option is backslash (\). This option must be specified if
the previous option is set to either Single quote or Double quote.
Advanced options (nzload’s arguments)
-lf
Name of the log file to generate. Logs are appended if the log
file already exists. If this parameter is not specified, the
default name of the log file is ‘<table_name>.<db_name>.nzlog’,
and it is generated in the current working directory where the
job is running.
-bf
Name of the bad file to generate. The bad file contains all the
records that could not be loaded due to an internal Netezza
error. Records are appended if the bad file already exists. If
this parameter is not specified, the default name of the bad
file is ‘<table_name>.<db_name>.nzbad’, and it is generated in
the current working directory where the job is running.
-outputDir
Directory path where the log file and the bad file are
generated. If this parameter is not specified, the files are
generated in the current directory where the job is running.
-logFileSize
Maximum size of the log file, in MB. The default value is 2000
(2 GB). To save hard disk space, specify a smaller value if your
job runs often.
-compress
Specify this option if the data file is compressed. This option
is only valid if this component is used by itself and not
connected to another component via an input flow. Valid values
are "TRUE" or "FALSE". The default value is "FALSE".
-skipRows <n>
Number of rows to skip from the beginning of the data file.
This option should only be used if this component is used by
itself and not connected to another component via an input
flow. Set the value to "1" to skip the header row of the data
file. The default value is "0".
-maxRows <n>
Maximum number of rows to load from the data file. This option
should only be used if this component is used by itself and not
connected to another component via an input flow.
-maxErrors
Maximum number of error records to allow before terminating
the load process. The default value is "1".
-ignoreZero
Binary zero bytes in the input data generate errors by default.
Set this option to "NO" to generate errors or to "YES" to ignore
zero bytes. The default value is "NO".
-requireQuotes
This option requires all the values to be wrapped in quotes.
The default value is "FALSE".
Note: this option currently does not work with an input flow.
Use this option only in standalone mode with an existing file.
-nullValue <token>
Specify the token that indicates a null value in the data file.
The default value is "NULL". To slightly improve performance,
you can set this value to an empty field by specifying the value
of the option as two single quotes: "\'\'".
-fillRecord
Treat missing trailing input fields as null. You do not need to
specify a value for this option in the value field of the table.
This option is not turned on by default, so by default the input
fields must match all the columns of the table exactly.
Note: trailing input fields must be nullable in the database.
-ctrlChar
Accept control chars in char/varchar fields (NUL, CR and LF
must be escaped). You do not need to specify a value for this
option in the value field of the table. This option is turned
off by default.
-crInString
Accept un-escaped CR in char/varchar fields (LF becomes the only
end of row). You do not need to specify a value for this option
in the value field of the table. This option is turned off by
default.
-truncString
Truncate any string value that exceeds its declared
char/varchar storage. You do not need to specify a value for
this option in the value field of the table. This option is
turned off by default.
-dateStyle
Specify the date format in which the input data is written.
Valid values are: "YMD", "Y2MD", "DMY", "DMY2", "MDY",
"MDY2", "MONDY", "MONDY2". The default value is "YMD".
Note: the date format of the column in the component’s
schema must match the value specified here. For example, if
you want to load a DATE column, specify the date format in
the component schema as "yyyy-MM-dd" and the -dateStyle
option as "YMD".
For more details on loading date and time fields, see the
additional notes below.
-dateDelim
Delimiter character between date parts. The default value is
"-" for all date styles except "MONDY[2]", for which it is " "
(a single space).
Note: the date format of the column in the component’s
schema must match the value specified here.
-y2Base
First year expressible using a two-digit year (Y2) dateStyle.
-timeStyle
Specify the time format in which the input data is written.
Valid values are: "24HOUR" and "12HOUR". The default value
is "24HOUR". For slightly better performance, keep the default
value.
Note: the time format of the column in the component’s
schema must match the value specified here. For example, if
you want to load a TIME column, specify the date format in the
component schema as "HH:mm:ss" and the -timeStyle option
as "24HOUR".
For more details on loading date and time fields, see the
additional notes below.
-timeDelim
Delimiter character between time parts. The default value is
":".
Note: the time format of the column in the component’s
schema must match the value specified here.
-timeRoundNanos
Allow, but round, non-zero digits with smaller than microsecond
resolution.
-boolStyle
Specify the format in which Boolean data is written in the data.
Valid values are: "1_0", "T_F", "Y_N", "TRUE_FALSE",
"YES_NO". The default value is "1_0". For slightly better
performance, keep the default value.
-allowRelay
Allow the load to continue after one or more SPUs reset or fail
over. By default this is not allowed.
-allowRelay <n>
Specify the number of allowable continuations of a load. The
default value is "1".
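The advanced options listed above are passed to nzload as command-line arguments. As a rough illustration of what a standalone load amounts to, the sketch below assembles an nzload argument list in Python. It is not the component's generated code: the connection flags (-host, -db, -u, -pw, -t, -df) are taken from the nzload command-line tool rather than from this document, the function name build_nzload_command is hypothetical, and all host, user, table, and file values are placeholders.

```python
def build_nzload_command(host, database, user, password, table, data_file,
                         options=None):
    """Return an nzload argument list for a standalone load (illustrative only)."""
    cmd = [
        "nzload",            # assumes nzload is on the PATH
        "-host", host,       # connection flags: assumed from the nzload CLI
        "-db", database,
        "-u", user,
        "-pw", password,
        "-t", table,
        "-df", data_file,
    ]
    # Advanced options: a value of None marks a flag that takes no value,
    # such as -fillRecord or -truncString.
    for flag, value in (options or {}).items():
        cmd.append(flag)
        if value is not None:
            cmd.append(str(value))
    return cmd


# Placeholder values, chosen only for the example.
cmd = build_nzload_command(
    host="nzhost", database="sales", user="loader", password="secret",
    table="orders", data_file="/tmp/orders.dat",
    options={"-delim": "|", "-skipRows": 1, "-maxErrors": 10, "-fillRecord": None},
)
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run(cmd, check=True) on a machine where nzload is installed. Building a list instead of a single string avoids shell quoting problems with delimiters such as "|".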
Encoding
Encoding of the data file.
Specify nzload path
Check this option to specify the full path to the nzload executable.
You must check this option if the nzload path is not included in the
PATH environment variable.
Full path to nzload executable
Full path to the nzload executable on the machine running the job.
It is recommended to add the nzload path to the PATH environment
variable instead of specifying this option.
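The Field Separator, Require quotes around fields, and Escape char settings together determine how each row must look in the data passed to nzload. The following Python sketch shows one plausible way to render a row with double-quoted values and a backslash escape. It is an assumption-laden illustration, not the component's actual serializer: it treats every non-null value as a string column, writes the null token unquoted, and also escapes the backslash itself. The function names format_field and format_row are hypothetical.

```python
def format_field(value, quote='"', escape="\\", null_token="NULL"):
    """Render one column value for a quoted, delimited data file."""
    if value is None:
        return null_token  # assumed: the -nullValue token is written unquoted
    text = str(value)
    # Escape the escape character first, then the quote character.
    text = text.replace(escape, escape + escape)
    text = text.replace(quote, escape + quote)
    return quote + text + quote


def format_row(values, delim="\t", **kwargs):
    """Join the rendered fields with the field separator (-delim)."""
    return delim.join(format_field(v, **kwargs) for v in values)


row = format_row(["O'Brien", 'say "hi"', None], delim="|")
print(row)  # "O'Brien"|"say \"hi\""|NULL
```

Because the values are quoted, the delimiter may appear inside a value without being mistaken for a field boundary, which is the constraint described under Field Separator.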
Configuring Named-pipe Option on Windows Platforms
In named-pipe mode, this component uses a JNI interface to create and write to a named-pipe on any
Windows platform. The path to the associated JNI DLL must therefore be configured in the Java
library path. The component comes with two DLLs, one for 32-bit and one for 64-bit operating
systems: namedpipe_jni.dll and namedpipe_jni_64bit.dll.
There are many ways to include these DLLs in the Java library path. The easiest is to copy the
correct library file for your platform into the C:\WINDOWS directory. Note that for 64-bit systems
you must first rename the library to namedpipe_jni.dll and then place it in the Windows directory.
You can also specify the Java library path explicitly on the JVM command line by adding the
following argument to the JVM call:
-Djava.library.path="<your directory>"
where <your directory> is the directory that contains namedpipe_jni.dll.
For more information, read the README-WIN.txt file included with the component.
Loading DATE, TIME, and TIMESTAMP columns
When this component is used with an input flow, the date format specified in the component’s
schema must match the values specified for the -dateStyle, -dateDelim, -timeStyle, and -timeDelim
options. Refer to the following example:
Data type   Schema date format      -dateStyle  -dateDelim  -timeStyle  -timeDelim
DATE        "yyyy-MM-dd"            "YMD"       "-"         n/a         n/a
TIME        "HH:mm:ss"              n/a         n/a         "24HOUR"    ":"
TIMESTAMP   "yyyy-MM-dd HH:mm:ss"   "YMD"       "-"         "24HOUR"    ":"
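To make the example values above concrete, the sketch below renders a DATE, a TIME, and a TIMESTAMP value in the text form that matches -dateStyle "YMD" with -dateDelim "-" and -timeStyle "24HOUR" with -timeDelim ":". It uses Python's strftime codes as stand-ins for the schema patterns "yyyy-MM-dd" and "HH:mm:ss". The FORMATS mapping and the render function are hypothetical names used only for illustration.

```python
from datetime import date, datetime, time

# strftime equivalents of the schema date formats in the example above.
FORMATS = {
    "DATE": "%Y-%m-%d",                # "yyyy-MM-dd": YMD with "-" delimiter
    "TIME": "%H:%M:%S",                # "HH:mm:ss": 24HOUR with ":" delimiter
    "TIMESTAMP": "%Y-%m-%d %H:%M:%S",  # date part and time part combined
}


def render(data_type, value):
    """Format a value the way the data file is expected to contain it."""
    return value.strftime(FORMATS[data_type])


print(render("DATE", date(2010, 3, 7)))                     # 2010-03-07
print(render("TIME", time(14, 5, 9)))                       # 14:05:09
print(render("TIMESTAMP", datetime(2010, 3, 7, 14, 5, 9)))  # 2010-03-07 14:05:09
```

If the schema pattern and the nzload options disagree, for example a "dd/MM/yyyy" pattern with -dateStyle "YMD", the written text no longer matches what nzload is told to expect.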
Sample scenarios
The attached zip file, tNetezzaNzLoad_sample.zip, contains three examples that use this component
in three different modes:
- test4 job: uses this component with an input flow, writing a data file that is then loaded with
  nzload.
- test5 job: uses this component with an input flow, passing the data to nzload through a
  named-pipe.
- test6 job: uses this component in standalone mode, with no incoming input flow, loading an
  existing data file with nzload.
Please use Talend version 3.2.3 or higher to import the zip file.