Appendix C Partitioned Index for Very Large Database

Overview

Until Release L1G of BASIS, an index could be stored in only one host file. This limited the size of the index to 2 gigabytes (or less) on most host systems, which was a problem for very large databases and for databases having most of their index terms generated from a single field.

To alleviate this problem, the AREA parameter of the SDM INDEX definition has been expanded to let you specify either an area name or an index distribution list. An index distribution list enables the index space to be distributed across several host files according to index term value.

This appendix describes the expanded syntax of the INDEX AREA parameter, provides examples of DDL using the new parameter, describes how the partitioned index is enabled, and describes how a partitioned index is used during loading and indexing. The appendix also describes the effect of a partitioned index on the DMR SHOW action and on the HVU_MRG_SPLITS option that enables the splitting and merging of sort work files for numeric indexes.

Partitioned Index

Partitioned indexing allows a large index to be distributed across several host files. The files in which the index terms of a partitioned index are stored are determined by the index distribution list defined in the INDEX definition in the Structural Data Model.

INDEX Definition

Syntax:

    INDEX index, TYPE=EXACT | INCLUSIVE | UNIQUE
        {,AREA=area_name | index_distribution_list}
        {,TERMS=({,INDEX_NULLS=NO | YES}
                 {,INSERT_METHOD=RANDOM | SEQUENTIAL | MOSTLY_SEQUENTIAL}
                 {,SIZE=n | m:n})}
        {,REFERENCES=({,INSERT_METHOD=RANDOM | SEQUENTIAL | MOSTLY_SEQUENTIAL}
                      {,PROXIMITY=WORD | CONTEXT | NONE})};

Parameter:

Only the AREA parameter, as expanded to accommodate partitioned indexing, is described below.
For information about all other parameters of the INDEX definition, see Database Definition and Development, "SDM Definitions."

AREA=area_name | index_distribution_list (Optional)

    area_name               : identifier (1 to 15 characters long)
    index_distribution_list : (index_dist_spec {,index_dist_spec}0:48 ,ALL_OTHER_TERMS=area_name)
    index_dist_spec         : high_key=area_name
    high_key                : numeric | char_cons (less than or equal to 250 characters)

Identifies the area in which the index is placed or the list of areas into which the index is being partitioned. Valid characters for an area name, whether given alone or within an index distribution list, are letters, digits, and underscores. If the AREA parameter is omitted, the index is placed in the DEFAULT_INDEX_AREA.

Key Points

Each entry in an index distribution list specifies the highest term value (high_key) for a partition (area) in which index nodes will be stored, and names that area.

The ALL_OTHER_TERMS parameter in an index distribution list identifies the area in which index nodes will be stored for all terms not covered by the high_keys specified.

When the high key of a character index is shorter than a term, the key is padded with blanks for comparison purposes. Depending on the collating sequence, this could lead to unexpected results. For example, suppose a partition is to contain all terms beginning with J and SIZE=6. A key of 'J' would not be satisfactory: it is padded with blanks, and a term such as 'JONES' collates after 'J' followed by blanks. The key 'JZZZZZ' would be sufficient if Z is the highest character in the collating sequence among the characters that can occur in the terms. It would not be satisfactory if Z is not the highest character in the collating sequence in use or if the index terms are not RAISEd (z is higher than Z in ASCII tables).

An area may not be referenced more than once in the index distribution list. The order of entries in the list is unimportant; the index build routines will order it. Each area must be assigned to a different file.
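The comparison rules above can be tried out with a small Python sketch. It is not part of BASIS: `validate` and `choose_area` are hypothetical helpers, Python's ordinal (ASCII) string ordering stands in for the collating sequence in use, and the limit of 49 high keys is read from the `{,index_dist_spec}0:48` notation in the syntax (one entry plus up to 48 more). The sketch sorts the entries, compares each blank-padded high key with the term, and falls back to ALL_OTHER_TERMS when no key is high enough. Padding the term as well as the key is an assumption of the sketch; the text above only describes padding a shorter key.

```python
import re
from typing import Sequence, Tuple

# Hypothetical model of an index distribution list: (high_key, area_name)
# pairs plus a separate ALL_OTHER_TERMS area. Python's ordinal (ASCII)
# string ordering stands in for the collating sequence in use.
DistList = Sequence[Tuple[str, str]]

AREA_NAME = re.compile(r"[A-Za-z0-9_]{1,15}")  # letters, digits, underscores


def validate(dist_list: DistList, all_other: str) -> None:
    """Check limits read from the syntax: 1 to 49 high keys, valid names, no reused area."""
    if not 1 <= len(dist_list) <= 49:
        raise ValueError("expected 1 to 49 high_key entries")
    areas = [area for _, area in dist_list] + [all_other]
    for area in areas:
        if not AREA_NAME.fullmatch(area):
            raise ValueError(f"invalid area name: {area!r}")
    if len(set(areas)) != len(areas):
        raise ValueError("an area may be referenced only once")
    for high_key, _ in dist_list:
        if len(high_key) > 250:
            raise ValueError("high_key longer than 250 characters")


def choose_area(term: str, dist_list: DistList, all_other: str) -> str:
    """Return the area of the first high key >= term, else the ALL_OTHER_TERMS area."""
    for high_key, area in sorted(dist_list):  # entry order in the list is unimportant
        width = max(len(term), len(high_key))
        # Blank-pad both values to a common width before comparing (assumption).
        if term.ljust(width) <= high_key.ljust(width):
            return area
    return all_other


if __name__ == "__main__":
    dist = [("RZZZZZ", "AREA2"), ("JZZZZZ", "AREA1")]
    validate(dist, "AREA3")
    print(choose_area("JONES", dist, "AREA3"))  # AREA1
    print(choose_area("KELLY", dist, "AREA3"))  # AREA2
    print(choose_area("SMITH", dist, "AREA3"))  # AREA3
    print(choose_area("jones", dist, "AREA3"))  # AREA3: 'j' sorts after 'Z'
    print(choose_area("JONES", [("J", "AREA1")], "AREA2"))  # AREA2: 'J' is blank-padded
```

The last two calls reproduce the pitfalls described in the key points: a term that is not RAISEd collates past 'RZZZZZ', and a one-character key of 'J' does not capture 'JONES'.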
Restrictions

Attempts to partition the following types of indexes will be blocked at APPLY time:

Partitioned numeric string indexes are not supported.

The primary key index for a record with an indirect index structure cannot be partitioned. Partitioned index support allows an existing index to be partitioned without a dump/reload if a DMR DROP/CREATE can be performed on that index. Note, however, that DMR DROP/CREATE cannot be performed on indirect primary key indexes.

An index for a system-generated key field (for example, USAGE=SYSTEM_KEY or USAGE=DATE_KEY) cannot be partitioned.

In addition, a partitioned index can be defined, and its partitioning parameters updated, only in DMDBA statement mode.

Examples

1. Create an inclusive index for the ABSTRACT field in the REF record. Make three partitions: the first for terms up through JZZZZZ, the second for other terms up through RZZZZZ, and the third for all other terms.

    INDEX=REF.ABSTRACT, TYPE=INCLUSIVE, AREAS=('JZZZZZ'=AREA1, 'RZZZZZ'=AREA2, ALL_OTHER_TERMS=AREA3) . . .;

Index terms (and their references) that are less than or equal to the key 'JZZZZZ' will be stored in AREA1. Terms greater than 'JZZZZZ' up through 'RZZZZZ' (terms starting with the letters K through R) will be stored in AREA2. Terms greater than the key 'RZZZZZ' will be stored in AREA3. This example assumes that a blank collates lower than "Z" and that term values are RAISEd to uppercase.

2. Create an exact index for the SALARY field in the EMPLOYEE record. Make four partitions:

    INDEX=EMPLOYEE.SALARY, TYPE=EXACT,+
    AREAS=(1000.00=AREA1,+
    2000.00=AREA2,+
    3000.00=AREA3,+
    ALL_OTHER_TERMS=AREA4) . . .;

This results in salaries being distributed as follows:

    0000.00:1000.00    going into AREA1
    1000.01:2000.00    going into AREA2
    2000.01:3000.00    going into AREA3
    3000.01:upwards    going into AREA4

The index precision is applied to numeric values in determining the index partition.
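The effect of applying the index precision in Example 2 can be sketched in Python. The helper `choose_numeric_area` is hypothetical, and it assumes that the precision is two decimal places and is applied by half-up rounding; this appendix does not say whether BASIS rounds or truncates. Under those assumptions, 1000.004 is compared as 1000.00 and stays in AREA1, while 2999.999 is compared as 3000.00 and goes to AREA3. `decimal.Decimal` is used so that binary floating-point error cannot move a value across a partition boundary.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Sequence, Tuple


def choose_numeric_area(value: str, dist_list: Sequence[Tuple[str, str]],
                        all_other: str, precision: str = "0.01") -> str:
    """Pick the partition for a numeric term after applying the index precision.

    Assumption: the precision is applied by rounding half-up to `precision`;
    the exact BASIS rule is not stated in this appendix.
    """
    term = Decimal(value).quantize(Decimal(precision), rounding=ROUND_HALF_UP)
    # Compare against the high keys in ascending numeric order.
    for high_key, area in sorted(dist_list, key=lambda entry: Decimal(entry[0])):
        if term <= Decimal(high_key):
            return area
    return all_other


salary_list = [("1000.00", "AREA1"), ("2000.00", "AREA2"), ("3000.00", "AREA3")]
for salary in ["1000.00", "1000.004", "1000.01", "2999.999", "3000.01"]:
    print(salary, choose_numeric_area(salary, salary_list, "AREA4"))
# 1000.00 AREA1
# 1000.004 AREA1
# 1000.01 AREA2
# 2999.999 AREA3
# 3000.01 AREA4
```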
Creating a Partitioned Index

The following steps describe how to create a partitioned index:

1. Back up the entire database; that is, create save copies of the Definition Database (DDB) file, Record Database (RDB) files, and journal files.

2. If the index currently exists, use DMR to drop the index.

3. Use DMDBA in statement mode to change the INDEX definition. Also define any new areas or database files needed for the index partitions.

4. If the index was dropped in step 2, use DMR to create the index. The amount of work space required to create the index can be reduced by creating the index by partition number. For information, see "DMR Restructure" in the Utilities Reference section. The time required for this step may be reduced by the Merge/Split functionality described in the "Disk Space Considerations for DMQ, DMR, and HVU" section (DMQ is not applicable to Windows). For information about the Merge/Split functionality, see "Utilities Reference."

5. Back up the entire database; that is, create save copies of the Definition Database (DDB) file, Record Database (RDB) files, and journal files.

Example: UNIX, VMS

1. Given the following DDL segment for an existing database, TEST1, partition the ABSTRACT index. Move the index into its own set of three areas, and split the index so that terms that come before or are equal to MZZZ are in one partition, terms that come after MZZZ through SZZZ are in a second partition, and terms that come after SZZZ are in a third partition.
    ACTUAL_DATA_MODEL;
    *
    RECORD=REC1, STYLE=CONTINUOUS, PRIMARY_KEY=KEY;
    *
    FIELD=KEY, UNIQUE=YES, OCCURS=1:1, TYPE=INTEGER;
    *
    FIELD=ABSTRACT, SORT_SIZE=200, OCCURS=0:1, +
    CONTEXT_PARSER=SENTENCE,USAGE=TEXT_STREAM;
    *
    *
    STRUCTURAL_DATA_MODEL;
    *
    AREA=DMAREA, ALLOCATION=DYNAMIC, +
    DEFAULT_RECORD_AREA=YES, DEFAULT_INDEX_AREA=YES, +
    DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
    *
    AREA=DMQAREA, ALLOCATION=DYNAMIC, +
    DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
    DEFAULT_QUEUE_AREA=YES, PAGES_IN_AREA=1048575;
    *
    FILE(1)='/patha/directory/filename.type', +
    AREAS=(DMAREA), BACKUP_CYCLES=10;
    FILE(63)='/pathq/directory/filename.type', +
    AREAS=(DMQAREA), BACKUP_CYCLES=10;
    *
    INDEX=REC1.KEY, TYPE=UNIQUE, +
    TERMS=(INSERT_METHOD=RANDOM), AREA=DMAREA;
    *
    INDEX=REC1.ABSTRACT, TYPE=INCLUSIVE,+
    TERMS=(INSERT_METHOD=RANDOM,SIZE=1:200),+
    REFERENCES=(INSERT_METHOD=RANDOM,PROXIMITY=WORD),+
    AREA=DMAREA;
    *
    RECORD_STORAGE=REC1, AREAS=DMAREA, +
    QUEUE_AREA=DMQAREA, UPDATE=IMMEDIATE;
    *
    *

This process takes five steps, as described below:

A. Back up the entire database.

B. Because the ABSTRACT index currently exists, use DMR to drop it.

C. Update the TEST1 database definition so that it looks like this:

    *
    STRUCTURAL_DATA_MODEL;
    *
    * Add three new areas for the index partitions
    AREA=A11, ALLOCATION=DYNAMIC, +
    DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
    DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
    AREA=A12, ALLOCATION=DYNAMIC, +
    DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
    DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
    AREA=A13, ALLOCATION=DYNAMIC, +
    DEFAULT_RECORD_AREA=NO, DEFAULT_INDEX_AREA=NO, +
    DEFAULT_QUEUE_AREA=NO, PAGES_IN_AREA=1048575;
    *
    * The index partitions cannot be within the same file.
    * Define three new files.
    FILE(11)='/pathb/directory/filename.type', +
    AREAS=(A11), BACKUP_CYCLES=10;
    FILE(12)='/pathc/directory/filename.type', +
    AREAS=(A12), BACKUP_CYCLES=10;
    FILE(13)='/pathd/directory/filename.type', +
    AREAS=(A13), BACKUP_CYCLES=10;
    *
    * Update the ABSTRACT index with 3 partitions
    INDEX=REC1.ABSTRACT, TYPE=INCLUSIVE,+
    TERMS=(INSERT_METHOD=RANDOM,SIZE=1:200),+
    REFERENCES=(INSERT_METHOD=RANDOM,PROXIMITY=WORD),+
    AREAS=('MZZZ'=A11, 'SZZZ'=A12, +
    ALL_OTHER_TERMS=A13);
    *

D. Using DMR, create the partitioned index for ABSTRACT.

E. Back up the entire new database.

Associated Functionality

Partitioned indexing affects what is displayed when DMR is run with ACTION=SHOW, and it affects the activity of the HVU_MRG_SPLITS option used in OPEN/DIRECT operations of DMR, DMQ (not applicable to Windows), and HVU.

DMR SHOW

With ACTION=SHOW and TRACE=FULL, DMR will describe the term and reference statistics of each partition defined for the index specified with the INDEX parameter.

DMR CREATE

The INDEX= parameter supports creating partitioned indexes by partition number. This can reduce the work space required for the creation of large partitioned indexes.

Merge/Splits

The HVU_MRG_SPLITS option used in OPEN/DIRECT operations of DMR, DMQ (not applicable to Windows), and HVU has been extended to allow splitting and merging of sort work files for numeric indexes. This feature improves sorting and may be necessary if a sort work file would otherwise exceed the 2-gigabyte limit on files. Numeric terms typically do not present a problem during HVU and DMQ loading. Use of numeric merge/splits is expected when building or rebuilding a numeric index with DMR. For more information about merging and splitting sort work files, see "Utilities Reference."

Restrictions

Because only one type of merge/split may be specified, you cannot use numeric merge/splits and character splits at the same time.
You also cannot specify numeric splits for more than one data type. Individual DMR ACTION=CREATE steps may be necessary to create all the indexes for a record using the merge/split functionality.

Usage

Assign the environment variable HVU_MRG_SPLITS a pair of values delimited by a comma. The first value in the pair is a storage code, and the second is the number of buckets to be allocated. For example, E4,4 creates 4 buckets that contain 4-byte Exact Binary terms. The maximum number of buckets is 10.

Note: For information about environment variables, see BASIS Reference, "Environment Variables."

Like character merge/split files, numeric buckets may be allocated to different devices to distribute I/O load or space requirements.

Numeric merge/split file naming conventions are as follows. The sort work input and output file names created for numeric merge/split buckets have the patterns HVLSIdd? and HVLSOdd?, where:

    I and O   "I" identifies sort input files, and "O" identifies sort output files.
    dd        The storage code, which identifies a data type as listed below.
    ?         The bucket sequence number (0, 1, 2, ... 9).

    Storage Code   Data Type                         Contains
    IV             Exact Binary (integer)            4 or 8 bytes *
    E8             Exact Binary (long integer)       8 bytes
    RV             Real Precision                    4 or 8 bytes 1
    DV             Double Precision                  8 or 16 bytes 1
    AX             Extended Precision                16 bytes 1
    P4             Exact Decimal (4-byte packed)     4 bytes
    P8             Exact Decimal (8-byte packed)     8 bytes
    PX             Exact Decimal (16-byte packed)    16 bytes

    1 Byte lengths differ respectively on 32- and 64-bit machines.
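The HVU_MRG_SPLITS value format and the file naming pattern can be sketched in Python. `parse_mrg_splits` and `sort_work_names` are hypothetical helpers, not BASIS tools. They produce only the base names implied by the HVLSIdd? and HVLSOdd? patterns and make no assumption about directories, extensions, or where HVU actually places the files. The storage code is checked only for its two-character length, not against the table, and a bucket count above the documented maximum of 10 is rejected.

```python
from typing import List, Tuple

MAX_BUCKETS = 10  # documented maximum; bucket sequence numbers run 0 through 9


def parse_mrg_splits(setting: str) -> Tuple[str, int]:
    """Split a numeric HVU_MRG_SPLITS value such as 'P8,3' into (code, buckets)."""
    parts = [part.strip() for part in setting.split(",")]
    if len(parts) != 2:
        raise ValueError("expected 'storage_code,bucket_count'")
    code, count_text = parts
    if len(code) != 2:
        raise ValueError("storage code must be two characters (dd)")
    count = int(count_text)
    if not 1 <= count <= MAX_BUCKETS:
        raise ValueError(f"bucket count must be 1 to {MAX_BUCKETS}")
    return code, count


def sort_work_names(code: str, count: int) -> List[str]:
    """Build HVLSIdd? (input) and HVLSOdd? (output) base names, one pair per bucket."""
    names = []
    for bucket in range(count):
        names.append(f"HVLSI{code}{bucket}")
        names.append(f"HVLSO{code}{bucket}")
    return names


code, count = parse_mrg_splits("P8,3")
print(sort_work_names(code, count))
# ['HVLSIP80', 'HVLSOP80', 'HVLSIP81', 'HVLSOP81', 'HVLSIP82', 'HVLSOP82']
```

Because the bucket sequence number is a single position (?) in the name, a maximum of 10 buckets (0 through 9) fits the pattern exactly.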