DLM Appendix C: Partitioned Index for Very Large Database

Appendix C
Partitioned Index for Very Large Database
Overview
Until Release L1G of BASIS, an index could be stored in only one host file. This limited
the size of the index to 2 gigabytes (or less) on most host systems. This limitation was a
problem for very large databases and for databases having most of their index terms
generated from a single field. To alleviate this problem, the AREA parameter of the
SDM INDEX definition has been expanded to let you specify either an area name or an
index distribution list. An index distribution list enables the index space to be distributed
across several host files according to index term value.
This appendix describes the expanded syntax of the INDEX AREA parameter, provides
examples of DDL using the new parameter, describes how the partitioned index is
enabled, and describes how a partitioned index is used during loading and indexing. The
appendix also describes the effect of a partitioned index on the DMR SHOW action and
on the HVU_MRG_SPLITS option that enables the splitting and merging of sort work
files for numeric indexes.
Partitioned Index
Partitioned indexing allows a large index to be distributed across several host files. The
files in which the index terms of a partitioned index are stored are determined by the index
distribution list in the INDEX definition of the Structural Data Model.
INDEX Definition Syntax:
INDEX index, TYPE=EXACT | INCLUSIVE | UNIQUE
   {,AREA=area_name | index_distribution_list}
   {,TERMS=({,INDEX_NULLS=NO | YES}
            {,INSERT_METHOD=RANDOM | SEQUENTIAL | MOSTLY_SEQUENTIAL}
            {,SIZE=n | m:n})}
   {,REFERENCES=({,INSERT_METHOD=RANDOM | SEQUENTIAL | MOSTLY_SEQUENTIAL}
                 {,PROXIMITY=WORD | CONTEXT | NONE})};
Parameter:
Only the AREA parameter as expanded to accommodate partitioned indexing is described
below. For information about all other parameters of the INDEX definition, see
Database Definition and Development, “SDM Definitions.”
AREA=area_name | index_distribution_list   (Optional)

   area_name                 : identifier (1 to 15 characters long)
   index_distribution_list   : (index_dist_spec {,index_dist_spec}0:48
                                ,ALL_OTHER_TERMS=area_name)
   index_dist_spec           : high_key=area_name
   high_key                  : numeric | char_cons (less than or equal to 250 characters)
Identifies the area in which the index is placed or the list of areas across which the index
is partitioned. Valid characters for area names, whether given as area_name or within an
index_distribution_list, are letters, digits, and underscores.
If the AREA parameter is omitted, the index is placed in the DEFAULT_INDEX_AREA.
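As an illustration of the AREA syntax rules above, the following Python sketch assembles an AREAS=(...) clause from a mapping of high keys to area names and checks the stated limits: area names of 1 to 15 letters, digits, or underscores; one to 49 index_dist_spec entries (reading {,index_dist_spec}0:48 as zero to 48 entries after the first); character high keys of at most 250 characters; and no area referenced twice. It is not a BASIS tool. The function name build_areas_clause is hypothetical, and it rejects character keys containing quotes because the quoting rule for char_cons is not covered here. It emits AREAS=, the spelling used in this appendix's examples.

```python
import re

# Limits taken from the AREA syntax description.
AREA_NAME = re.compile(r"[A-Za-z0-9_]{1,15}")  # 1 to 15 letters, digits, underscores
MAX_SPECS = 49        # one index_dist_spec plus up to 48 more ({...}0:48)
MAX_KEY_CHARS = 250   # character high keys are at most 250 characters


def build_areas_clause(high_keys, all_other_terms):
    """Build an AREAS=(...) clause from {high_key: area_name}.

    Hypothetical helper. String keys are written as quoted character
    constants; numeric keys (int, float, Decimal) are written with str().
    """
    areas = list(high_keys.values()) + [all_other_terms]
    if not 1 <= len(high_keys) <= MAX_SPECS:
        raise ValueError("expected 1 to 49 high_key=area_name entries")
    for area in areas:
        if not AREA_NAME.fullmatch(area):
            raise ValueError(f"invalid area name: {area!r}")
    if len(set(areas)) != len(areas):
        raise ValueError("an area may be referenced only once in the list")

    specs = []
    for key, area in high_keys.items():
        if isinstance(key, str):
            # Quoting rules for embedded quotes are not covered; reject them.
            if len(key) > MAX_KEY_CHARS or "'" in key:
                raise ValueError(f"unsupported character high key: {key!r}")
            specs.append(f"'{key}'={area}")
        else:
            specs.append(f"{key}={area}")
    specs.append(f"ALL_OTHER_TERMS={all_other_terms}")
    return "AREAS=(" + ", ".join(specs) + ")"


print(build_areas_clause({"MZZZ": "A11", "SZZZ": "A12"}, "A13"))
```

The printed clause, AREAS=('MZZZ'=A11, 'SZZZ'=A12, ALL_OTHER_TERMS=A13), carries the same distribution list as the TEST1 example later in this appendix.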
Key Points
• Each entry in an index distribution list specifies the highest term value (high_key)
  for a partition (area) in which index nodes will be stored and names the area.
• The ALL_OTHER_TERMS parameter in an index distribution list identifies the area in
  which index nodes will be stored for all terms other than those covered by the
  specified high keys.
• When the high key of a character index is shorter than a term, the key is padded
  with blanks for comparison purposes. Depending on the collating sequence, this could
  lead to unexpected results. For example, if a partition is to contain all terms
  beginning with J and SIZE=6, the key 'JZZZZZ' would be sufficient if Z is the highest
  character in the collating sequence among the characters used in the terms. A key of
  'J' would not be satisfactory: padded with blanks, it sorts below every longer term
  beginning with J whenever a blank collates lower than the other characters, as it
  does in ASCII. Even 'JZZZZZ' would not be satisfactory if Z is not the highest
  character in the collating sequence in use, or if the index terms are not RAISEd
  (z is higher than Z in ASCII tables).
• An area may not be referenced more than once in the index distribution list.
• The order of entries in the list is unimportant; the index build routines will order it.
• Each area must be assigned to a different file.
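The blank-padding rule in the Key Points can be sketched in Python to show why 'J' and 'JZZZZZ' behave differently. This is a minimal model, not BASIS code: partition_for_term is a hypothetical name, Python's code-point ordering stands in for an ASCII collating sequence, and padding the term as well as the key (so both have the same length) is an assumption.

```python
def partition_for_term(term, high_keys, all_other_terms):
    """Return the area for a character index term.

    high_keys maps high_key -> area_name. Keys are tried in ascending order
    (the build routines order the list), and the first key that the term
    does not exceed selects the area. Python's code-point order stands in
    for an ASCII collating sequence.
    """
    for key in sorted(high_keys):
        width = max(len(key), len(term))
        # Pad both values with blanks to the same length before comparing.
        if term.ljust(width) <= key.ljust(width):
            return high_keys[key]
    return all_other_terms


keys = {"JZZZZZ": "AREA1", "RZZZZZ": "AREA2"}
print(partition_for_term("JONES", keys, "AREA3"))            # AREA1
print(partition_for_term("jones", keys, "AREA3"))            # AREA3: not RAISEd
print(partition_for_term("JONES", {"J": "AREA1"}, "AREA2"))  # AREA2: 'J' padded is too low
```

In the last call, 'J' is compared as 'J' followed by four blanks, and because a blank sorts below 'O', the term 'JONES' falls outside the first partition.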
Restrictions
Attempts to partition the following types of indexes will be blocked at APPLY time:
• Partitioned numeric string indexes are not supported.
• The primary key index for a record with an indirect index structure cannot be
  partitioned.
• Partitioned index support allows an existing index to be partitioned without
  dump/reload if a DMR DROP/CREATE can be performed on that index. Note, however, that
  DMR DROP/CREATE cannot be performed on indirect primary key indexes.
• An index for a system-generated key field (for example, USAGE=SYSTEM_KEY or
  USAGE=DATE_KEY) cannot be partitioned.
• Partitioned index definition is supported only in DMDBA statement mode.
• Partitioned index parameters can be updated only in DMDBA statement mode.
Examples
1. Create an inclusive index for the ABSTRACT field in the REF record. Make three
partitions: the first for terms up through JZZZZZ, the second for other terms up
through RZZZZZ, and the third for all other terms.
INDEX=REF.ABSTRACT, TYPE=INCLUSIVE,+
AREAS=('JZZZZZ'=AREA1,+
'RZZZZZ'=AREA2,+
ALL_OTHER_TERMS=AREA3) . . .;
Index terms (and their references) that are less than or equal to the key 'JZZZZZ' will be
stored in AREA1. Terms greater than 'JZZZZZ' and less than or equal to 'RZZZZZ' (in
practice, terms beginning with the letters K through R) will be stored in AREA2. Terms
greater than the key 'RZZZZZ' will be stored in AREA3. This example assumes that a blank
collates lower than "Z" and that term values are RAISEd to uppercase.
2. Create an exact index for the SALARY field in the EMPLOYEE record. Make four
partitions:
INDEX=EMPLOYEE.SALARY, TYPE=EXACT,+
AREAS=(1000.00=AREA1,+
2000.00=AREA2,+
3000.00=AREA3,+
ALL_OTHER_TERMS=AREA4) . . .;
This results in salaries of
0000.00:1000.00 going into AREA1
1000.01:2000.00 going into AREA2
2000.01:3000.00 going into AREA3
3000.01:upwards going into AREA4
The index precision is applied to numeric values in determining the index partition.
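The salary ranges above can be reproduced with a short Python sketch that first brings a value to the index precision and then compares it with the high keys. The precision of two decimal places matches the example; whether BASIS rounds or truncates to that precision is not stated here, so the round-half-up rule is an assumption, and numeric_partition is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP


def numeric_partition(value, high_keys, all_other_terms, places=2):
    """Return the area for a numeric index term.

    The value is first brought to the index precision (`places` decimal
    digits; round-half-up is an assumption), then compared with the high
    keys in ascending numeric order.
    """
    quantum = Decimal(10) ** -places  # 0.01 when places=2
    term = Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)
    for key in sorted(high_keys, key=lambda k: Decimal(str(k))):
        if term <= Decimal(str(key)):
            return high_keys[key]
    return all_other_terms


salary_keys = {"1000.00": "AREA1", "2000.00": "AREA2", "3000.00": "AREA3"}
for salary in ("1000.00", "1000.004", "1000.01", "3000.01"):
    print(salary, numeric_partition(salary, salary_keys, "AREA4"))
```

Decimal is used instead of float so that boundary values such as 1000.01 compare exactly; 1000.004 lands in AREA1 only because it is brought to 1000.00 before the comparison.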
Creating a Partitioned Index
The following steps describe how to create a partitioned index:
1. Back up the entire database; that is, create save copies of the Definition Database
(DDB) file, Record Database (RDB) files, and journal files.
2. If the index currently exists, use DMR to drop the index.
3. Use DMDBA in statement mode to change the INDEX definition. Also define any new
areas or database files needed for the index partitions.
4. If the index was dropped in step 2, use DMR to create the index. The amount of work
space required to create the index can be reduced by creating the index by partition
number; for information, see “DMR Restructure” in the Utilities Reference section. The
time required for this step may also be reduced by the Merge/Split functionality
described in the “Disk Space Considerations for DMQ¹, DMR, and HVU” section. For
information about the Merge/Split functionality, see “Utilities Reference.”
5. Back up the entire database; that is, create save copies of the Definition Database
(DDB) file, Record Database (RDB) files, and journal files.
¹ DMQ is not applicable to Windows.
Example:
UNIX, VMS
1. Given the following DDL segment for an existing database, TEST1, partition the
ABSTRACT index. Move the index into its own set of three areas, and split the index so
that terms that come before or are equal to MZZZ are in one partition, terms that come
after MZZZ through SZZZ are in a second partition, and terms that come after SZZZ are
in a third partition.
ACTUAL_DATA_MODEL;
*
RECORD=REC1, STYLE=CONTINUOUS, PRIMARY_KEY=KEY;
*
FIELD=KEY, UNIQUE=YES, OCCURS=1:1, TYPE=INTEGER;
*
FIELD=ABSTRACT, SORT_SIZE=200, OCCURS=0:1, +
CONTEXT_PARSER=SENTENCE,USAGE=TEXT_STREAM;
*
*
STRUCTURAL_DATA_MODEL;
*
AREA=DMAREA, ALLOCATION=DYNAMIC, +
DEFAULT_RECORD_AREA=YES, DEFAULT_INDEX_AREA=YES, +
DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
*
AREA=DMQAREA, ALLOCATION=DYNAMIC, +
DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
DEFAULT_QUEUE_AREA=YES, PAGES_IN_AREA=1048575;
*
FILE(1)='/patha/directory/filename.type', +
AREAS=(DMAREA), BACKUP_CYCLES=10;
FILE(63)='/pathq/directory/filename.type', +
AREAS=(DMQAREA), BACKUP_CYCLES=10;
*
INDEX=REC1.KEY, TYPE=UNIQUE, +
TERMS=(INSERT_METHOD=RANDOM), AREA=DMAREA;
*
INDEX=REC1.ABSTRACT, TYPE=INCLUSIVE,+
TERMS=(INSERT_METHOD=RANDOM,SIZE=1:200),+
REFERENCES=(INSERT_METHOD=RANDOM,PROXIMITY=WORD),+
AREA=DMAREA;
*
RECORD_STORAGE=REC1, AREAS=DMAREA, +
QUEUE_AREA=DMQAREA, UPDATE=IMMEDIATE;
*
*
This process takes five steps:
A. Back up the entire database.
B. Because the ABSTRACT index currently exists, use DMR to drop it.
C. Update the TEST1 database definition so that it looks like this:
*
STRUCTURAL_DATA_MODEL;
*
* Add three new areas for the index partitions
AREA=A11, ALLOCATION=DYNAMIC, +
DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
AREA=A12, ALLOCATION=DYNAMIC, +
DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
AREA=A13, ALLOCATION=DYNAMIC, +
DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
*
* The index partitions cannot be within the same file.
* Define three new files.
FILE(11)='/pathb/directory/filename.type', +
AREAS=(A11), BACKUP_CYCLES=10;
FILE(12)='/pathc/directory/filename.type', +
AREAS=(A12), BACKUP_CYCLES=10;
FILE(13)='/pathd/directory/filename.type', +
AREAS=(A13), BACKUP_CYCLES=10;
*
* Update the ABSTRACT index with 3 partitions
INDEX=REC1.ABSTRACT, TYPE=INCLUSIVE,+
TERMS=(INSERT_METHOD=RANDOM,SIZE=1:200),+
REFERENCES=(INSERT_METHOD=RANDOM,PROXIMITY=WORD),+
AREAS=('MZZZ'=A11, 'SZZZ'=A12, +
ALL_OTHER_TERMS=A13);
*
D. Using DMR, create the partitioned index for ABSTRACT.
E. Back up the entire new database.
Associated Functionality
Partitioned indexing affects what is displayed when DMR is run with ACTION=SHOW, and
affects the activity of the HVU_MRG_SPLITS option used in OPEN/DIRECT operations of
DMR, DMQ¹, and HVU.
DMR SHOW
With ACTION=SHOW and TRACE=FULL, DMR will describe the term and reference
statistics of each partition defined for the index specified with the INDEX parameter.
DMR CREATE
The INDEX= parameter supports creating partitioned indexes by partition number. This
can reduce the work space required to create large partitioned indexes.
Merge/Splits
The HVU_MRG_SPLITS option used in OPEN/DIRECT operations of DMR, DMQ¹, and HVU has
been extended to allow splitting and merging of sort work files for numeric indexes.
This feature improves sorting and may be necessary if a sort work file would exceed the
2-gigabyte file size limit. Typically, numeric terms do not present a problem during HVU
and DMQ loading; numeric merge/splits are expected to be used when building or
rebuilding a numeric index with DMR. For more information about merging and splitting
sort work files, see “Utilities Reference.”
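Whether a split is needed comes down to simple arithmetic: if the sort records for a numeric index add up to more than the file size limit, one sort work file cannot hold them. The Python sketch below estimates the smallest bucket count under two assumptions not stated in this appendix: that terms spread evenly across buckets, and that the caller knows the size of one sort record (bytes_per_sort_record is a hypothetical input). It reads the 2-gigabyte limit as 2 GiB and uses the maximum of 10 buckets given under Usage; buckets_needed is a hypothetical name.

```python
import math

FILE_LIMIT = 2 * 1024 ** 3  # "2 gigabytes" read as 2 GiB (an assumption)
MAX_BUCKETS = 10            # maximum number of numeric buckets


def buckets_needed(term_count, bytes_per_sort_record, file_limit=FILE_LIMIT):
    """Smallest bucket count that keeps every sort work file at or under
    file_limit, assuming terms are spread evenly across the buckets."""
    total_bytes = term_count * bytes_per_sort_record
    buckets = max(1, math.ceil(total_bytes / file_limit))
    if buckets > MAX_BUCKETS:
        raise ValueError(f"{buckets} buckets needed; the maximum is {MAX_BUCKETS}")
    return buckets


print(buckets_needed(100_000_000, 40))  # 4,000,000,000 bytes -> 2 buckets
```

Real term distributions are rarely even, so an estimate like this is a lower bound rather than a safe setting.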
Restrictions
• Because only one type of merge/split may be specified, you cannot use numeric
  merge/splits and character splits at the same time.
• You also cannot specify numeric splits for more than one data type.
• Individual DMR ACTION=CREATE steps may be necessary to create all the indexes for a
  record using the merge/split functionality.
Usage
Assignment of the environment variable HVU_MRG_SPLITS:
The value is a pair of items delimited by a comma. The first item is a storage code, and
the second is the number of buckets to be allocated. For example, the value
E4,4
creates 4 buckets that contain 4-byte Exact Binary terms. The maximum number of
buckets is 10.
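A value of HVU_MRG_SPLITS can be checked before a long DMR run with a small Python sketch like the one below. It validates only the shape described above (a storage code, a comma, and a bucket count no greater than 10); the two-character code length comes from the dd field of the file naming pattern, the lower bound of 1 bucket is an assumption, and parse_mrg_splits is a hypothetical name.

```python
MAX_BUCKETS = 10  # "The maximum number of buckets is 10."


def parse_mrg_splits(value):
    """Split an HVU_MRG_SPLITS value such as "E4,4" into (storage_code, buckets).

    Only the shape is checked: a two-character storage code, a comma, and a
    bucket count from 1 to 10 (the lower bound of 1 is an assumption).
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError("expected storage_code,buckets")
    code, count_text = parts[0].strip(), parts[1].strip()
    if len(code) != 2 or not code.isalnum():
        raise ValueError(f"unexpected storage code: {code!r}")
    buckets = int(count_text)  # raises ValueError if not an integer
    if not 1 <= buckets <= MAX_BUCKETS:
        raise ValueError(f"bucket count must be 1 to {MAX_BUCKETS}")
    return code, buckets


print(parse_mrg_splits("E4,4"))  # ('E4', 4)
```

In a wrapper script the value would typically be read with os.environ.get("HVU_MRG_SPLITS"); how the BASIS utilities themselves read the variable is covered in BASIS Reference, not here.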
Note: For information about environment variables, see BASIS Reference,
“Environment Variables.”
Like character merge/split files, numeric buckets may be allocated to different devices to
distribute I/O load or space requirements. Numeric merge/split file naming conventions
are as follows:
The sort work input and output file names created for numeric merge/split buckets
have the patterns HVLSIdd? and HVLSOdd?, where:
   I and O   "I" identifies sort input files, and "O" identifies sort output files.
   dd        The storage code that identifies the data type, as listed below.
   ?         The bucket sequence number (0, 1, 2, ... 9).

Storage Code   Data Type                        Contains
IV             Exact Binary (integer)           4 or 8 bytes *
E8             Exact Binary (long integer)      8 bytes
RV             Real Precision                   4 or 8 bytes *
DV             Double Precision                 8 or 16 bytes *
AX             Extended Precision               16 bytes *
P4             Exact Decimal (4-byte packed)    4 bytes
P8             Exact Decimal (8-byte packed)    8 bytes
PX             Exact Decimal (16-byte packed)   16 bytes

* Byte lengths differ on 32- and 64-bit machines; where two lengths are shown, they
  apply to 32-bit and 64-bit machines, respectively.
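To see which files a given setting produces, the naming pattern can be expanded with a short Python sketch. It assumes the buckets are numbered consecutively from 0, as the bucket sequence number description suggests; directories and any file extensions are not covered here, and bucket_file_names is a hypothetical name.

```python
def bucket_file_names(storage_code, buckets):
    """Return (input_names, output_names) for numeric merge/split buckets,
    following the HVLSIdd? / HVLSOdd? pattern: dd is the two-character
    storage code and ? is the bucket sequence number 0-9."""
    if len(storage_code) != 2:
        raise ValueError("storage code must be two characters")
    if not 1 <= buckets <= 10:
        raise ValueError("bucket count must be 1 to 10")
    inputs = [f"HVLSI{storage_code}{n}" for n in range(buckets)]
    outputs = [f"HVLSO{storage_code}{n}" for n in range(buckets)]
    return inputs, outputs


ins, outs = bucket_file_names("P8", 3)
print(ins)   # ['HVLSIP80', 'HVLSIP81', 'HVLSIP82']
print(outs)  # ['HVLSOP80', 'HVLSOP81', 'HVLSOP82']
```

Listing the names ahead of time makes it easier to plan which devices each bucket's input and output files should go to when distributing I/O load.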