CA IDMS Index Orphans

One of the more misunderstood aspects of CA IDMS indexing is the creation and impact of orphan records. This document describes how orphans are created when SR8 records are split, how they affect index performance, how the orphan condition is resolved, and ways to minimize the creation of orphans in high-activity indexes.

Splitting SR8 Records

Understanding why index orphans are created requires a basic knowledge of the way CA IDMS inserts data records into a sorted index structure. To describe this process we will consider a system-owned index on an EMPLOYEE record where the index is sorted on the employee’s last name.

The index structure is headed by an owner record known as an SR7 record. The index’s SR7 record is stored in a database area as a CALC record, with the name assigned to the index used as the record’s calckey. The SR7 record owns SR8 records. SR8 records for sorted index sets are linked to the owning SR7 record occurrence through standard NEXT, PRIOR, and OWNER pointers and are also maintained in a binary tree structure used to search for data record occurrences based on the data record’s index key as defined for the index. The following diagram is a simple representation of an SR8 record for an index defined with an order of SORTED.

Diagram 1

    Fixed portion
    ENTRY 1
    ENTRY 2
    ENTRY 3
    CUSHION

The ‘fixed portion’ of an SR8 contains the NEXT, PRIOR, and OWNER pointers that connect the SR8 to the index’s owner, which in our example is an SR7 record. It also contains a number of flags defining some of the characteristics of the index set and numeric fields describing the contents of the SR8 occurrence. Included in this portion of the SR8 is a field known as the orphan count. We will examine how this field is used a little later in this document.
The boxes below the ‘fixed portion’ represent entries that contain information about records lower in the binary tree structure of the index. The number of entries within an SR8 is determined by the Index Block Count (IBC) defined for the index. In the above example the IBC for the index would have been defined as 3. When a record is inserted into an index, the target SR8 may already be full. In that case an additional entry known as the ‘cushion’ is used during the insertion process; it contains data only during the split of the target SR8.

The contents of these entries depend on whether the SR8 is a Level-0 or an intermediate SR8. A Level-0 SR8, also known as a low-level SR8, is an SR8 record occurrence that points to the data record occurrences being indexed. For Level-0 SR8s an entry contains the symbolic index key from the data record occurrence and a dbkey pointer to that record. All other SR8 records in the structure are intermediate SR8s. Entries in an intermediate SR8 point to another SR8 occurrence that resides in a lower level of the index structure. The symbolic index key in the entry is the highest or lowest key value in the lower-level SR8, depending on whether the index is sorted as ascending or descending respectively. The dbkey pointer in the entry is the dbkey of the lower-level SR8.

Diagram 2 represents our index set after 3 EMPLOYEE records have been added to the structure. For the sake of clarity only the index’s NEXT pointer and the pointers contained within the SR8’s entries have been included.

Diagram 2

    SR7
    SR8 A          entries: ARNE, CRUZ, WATS
    Data records:  ARNE, CRUZ, WATS

Notice in the above diagram that an additional set of pointers, drawn in red in the original figure, has been added. These pointers are the INDEX pointers, more commonly referred to as UP pointers.
They are optional on system-owned index sets but are required for any user-owned index. Each record’s UP pointer is intended to point to the SR8 record occurrence in which that record’s entry resides. Since the SR8 records are maintained in a binary tree structure, each SR8 also contains an UP pointer to the SR8 on the next highest level in which its entry resides. It is the presence of these UP pointers that results in orphan records being created.

Our index structure now contains one Level-0 SR8, which is full because the IBC is 3 and the Level-0 SR8 contains 3 entries. A fourth record is to be added with a key of ‘Dunn’. To accomplish this CA IDMS uses the SR8’s cushion to insert the record temporarily, resulting in SR8 A containing 4 entries. Since this exceeds the IBC of 3, the entries for ARNE and CRUZ remain in SR8 A and a new Level-0 SR8, B, is created to contain the entries for DUNN and WATS. The creation of the second Level-0 SR8 also requires CA IDMS to create an intermediate SR8 on level 1, which we will designate as SR8 C. Diagram 3 shows the results of the insertion of the fourth data record. The UP pointers within the SR8 records are omitted from the structure to simplify the diagram.

Diagram 3

    SR7
    SR8 C (level 1)   O=0   entries: CRUZ (to SR8 A), WATS (to SR8 B)
    SR8 A (level 0)   O=1   entries: ARNE, CRUZ
    SR8 B (level 0)   O=0   entries: DUNN, WATS
    Data records:     ARNE, CRUZ, DUNN, WATS (the UP pointer of WATS still points to SR8 A)

When CA IDMS splits an SR8 record it does not update the UP pointer in each record occurrence whose index entry was moved to a new SR8. Doing so would create a considerable performance problem. In the preceding diagram notice that the UP pointer in the data record with the key WATS still points to SR8 A even though that data record’s index entry now resides in SR8 B. The record with the key value of WATS is now considered to be an orphan record because it does not point at the SR8 containing its index entry.

Also added into the diagram is the orphan count (O=) for each SR8 record in the structure.
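The split in this example can be mimicked with a toy model. The following Python sketch is illustrative only and is not how CA IDMS is implemented: it models just the Level-0 SR8s as sorted lists, ignores the intermediate SR8 and page storage, and every name in it (`SR8`, `insert`, `up`, `build_example`) is hypothetical.

```python
from dataclasses import dataclass, field

IBC = 3  # Index Block Count used in the example


@dataclass
class SR8:
    name: str
    entries: list = field(default_factory=list)  # sorted index keys
    orphan_count: int = 0


def insert(chain, up, key):
    """Add key to the Level-0 chain, splitting the target SR8 if it overflows."""
    # Target: first SR8 whose highest key is >= key, otherwise the last SR8.
    target = next((s for s in chain if s.entries and s.entries[-1] >= key), chain[-1])
    target.entries.append(key)       # the cushion temporarily holds entry IBC + 1
    target.entries.sort()
    up[key] = target.name            # UP pointer of the new data record
    if len(target.entries) <= IBC:
        return
    # Split: the lower half stays, the upper half moves to a new SR8.
    half = len(target.entries) // 2
    new = SR8(name=chr(ord("A") + len(chain)), entries=target.entries[half:])
    target.entries = target.entries[:half]
    chain.insert(chain.index(target) + 1, new)
    for moved in new.entries:
        if moved == key:
            up[moved] = new.name       # the record being stored points at its SR8
        else:
            target.orphan_count += 1   # moved entries keep their old UP pointer


def build_example():
    """Replay the four insertions from the example (ARNE, CRUZ, WATS, then DUNN)."""
    chain, up = [SR8("A")], {}
    for key in ("ARNE", "CRUZ", "WATS", "DUNN"):
        insert(chain, up, key)
    return chain, up
```

Calling `build_example()` leaves ARNE and CRUZ in SR8 A and DUNN and WATS in SR8 B, with the UP pointer for WATS still naming SR8 A and an orphan count of 1 on SR8 A, which matches Diagram 3.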
This orphan count is maintained within each SR8’s header portion. SR8 A now has an orphan count of 1 since one record in the index structure points to it but the corresponding index entry does not reside within that SR8.

It is conceivable that over the course of time subsequent processing will result in the data record occurrences with the keys ARNE and CRUZ being deleted from the database. Such actions would leave SR8 A with no entries, but it would still have an orphan count of 1 within its header. In this case SR8 A would remain within the index structure until all of its orphans have their UP pointers resolved. In this type of situation SR8 A would be referred to as an ‘orphan-only’ SR8, and only the header portion of the SR8 would be retained within the database. Orphan-only SR8 occurrences are never used to hold future records inserted into the index. Diagram 4 shows what the database would look like given this scenario.

Diagram 4

    SR7
    SR8 C (level 1)   O=0   entries: WATS (to SR8 B)
    SR8 A (level 0)   O=1   no entries (orphan-only)
    SR8 B (level 0)   O=0   entries: DUNN, WATS
    Data records:     DUNN, WATS (the UP pointer of WATS still points to SR8 A)

Performance Impact of Orphans

In many instances the presence of orphans has no impact on the performance of the DBMS against the index structure. For instance, if an attempt is made to read a data record using the record’s index key, the search for the record starts at the top of the index and uses a binary search to pinpoint the SR8 in which the data record’s index entry resides. In this case the fact that the desired record may be an orphan has no direct impact on the search.

Performance is negatively impacted when an orphan record is accessed through some other means and then an attempt is made to walk the index. Assuming the data records in our example are stored using a location mode of CALC, let’s examine what the DBMS must do to perform the following sequence of commands when the first command accesses the data record containing the value WATS, using the database structure represented by Diagram 3.

    OBTAIN CALC EMPLOYEE.
    OBTAIN NEXT EMPLOYEE WITHIN EMPLOYEE-INDEX.
During the processing of the OBTAIN CALC command CA IDMS uses whatever fields are defined as the calckey for the record type to retrieve the record. Currencies for the sets in which the record occurrence participates are established based on the dbkeys within the record’s prefix. Since the EMPLOYEE record participates in the EMPLOYEE-INDEX set, the current data record of that set is set to the dbkey of the WATS occurrence, and the UP pointer in the record’s prefix is used to establish the current SR8 for the set. Since the WATS occurrence is an orphan, the dbkey used is the dbkey of SR8 A.

When the OBTAIN NEXT command is issued to walk the index set, CA IDMS must first locate the index entry for the current data record of the set. To accomplish this it uses the SR8 dbkey saved as the current SR8, which in our example is the dbkey of SR8 A. SR8 A is then searched for the entry for data record WATS, which is not found. The orphan count in the header is interrogated and found to be larger than 0, which means that a record someplace in the database points to SR8 A but its entry resides in another SR8. CA IDMS then uses the SR8’s NEXT pointer to locate the next SR8 in the index, which in our example is SR8 B. SR8 B is then searched and the entry for the record with the key value of WATS is located. The DBMS then tries to get the next entry in the index, finds that none exists, and returns an error status of 0307 (End-Of-Set) to the requesting program.

In this case the added overhead to determine that we were at the end of the set was an extra read of SR8 A. However, it is possible that SR8 B may have been subsequently split, moving the entry for WATS to an SR8 record later in the structure. In that case the DBMS would have had to read SR8 A and SR8 B before finding the SR8 containing the entry for WATS.
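The cost of locating an orphan’s index entry can be pictured with a small counting model. This Python sketch is only an illustration under simplifying assumptions: the Level-0 SR8s are a plain list in NEXT-pointer order, the function name `sr8_reads_to_locate` is hypothetical, and the keys EVANS and SMITH and the SR8 named D are invented for the second scenario.

```python
def sr8_reads_to_locate(chain, up_pointer, key):
    """Count the SR8s read to find key's entry, starting at the UP pointer.

    chain is the list of (name, entries) pairs in NEXT-pointer order.
    """
    names = [name for name, _ in chain]
    start = names.index(up_pointer)
    for reads, (_, entries) in enumerate(chain[start:], start=1):
        if key in entries:
            return reads
    raise LookupError(f"no index entry found for {key}")


# Diagram 3: WATS is an orphan whose UP pointer still names SR8 A.
diagram3 = [("A", ["ARNE", "CRUZ"]), ("B", ["DUNN", "WATS"])]

# Hypothetical later state: SR8 B has split again and WATS moved to SR8 D.
after_second_split = [
    ("A", ["ARNE", "CRUZ"]),
    ("B", ["DUNN", "EVANS"]),
    ("D", ["SMITH", "WATS"]),
]

print(sr8_reads_to_locate(diagram3, "A", "WATS"))            # 2: SR8 A, then SR8 B
print(sr8_reads_to_locate(diagram3, "B", "DUNN"))            # 1: not an orphan
print(sr8_reads_to_locate(after_second_split, "A", "WATS"))  # 3: SR8 A, B, then D
```

Each additional split that moves the entry further along the chain adds one more SR8 read for every walk that starts from the stale UP pointer.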
In indexes with a high volume of random insertions the number of SR8s that must be accessed can quickly multiply. However, CA IDMS does attempt to resolve orphan conditions, which can result in additional run-time overhead.

Resolving Orphans

It is the responsibility of the DBMS to clean up orphan records whenever possible. We will again visit the scenario created by the following sequence of instructions.

    OBTAIN CALC EMPLOYEE.
    OBTAIN NEXT EMPLOYEE WITHIN EMPLOYEE-INDEX.

Once again the record occurrence accessed by the OBTAIN CALC command is the EMPLOYEE record whose index key is WATS. However, this time the database transaction has readied its database areas in an UPDATE mode. The processing described earlier remains the same until the DBMS determines that the index entry for the data record does not exist within SR8 A. CA IDMS identifies that the areas are readied in update and decrements the orphan count in SR8 A by 1. It then continues walking the set of SR8 records, searching for the SR8 record containing the index entry for the WATS record occurrence. When the entry is found in SR8 B, the DBMS updates the UP pointer in the data record occurrence so that it now points to SR8 B, resolving the orphan condition. If the database had the orphan-only structure described by Diagram 4, the DBMS would have recognized that the orphan count of SR8 A was now zero and that no index entries existed within the SR8. SR8 A would then be erased from the database.

This processing is transparent to the program issuing the OBTAIN NEXT command. However, although the DBMS is cleaning up an undesirable situation, the process does add overhead to transactions running in an update mode. The other drawback is that orphans are being resolved one at a time.
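The update-mode processing described above can be sketched as follows. This is a simplified Python model rather than the actual DBMS logic: SR8s are dictionaries in NEXT-pointer order, `up` maps each data record key to the SR8 named in its UP pointer, and the function name `locate_and_adopt` is hypothetical.

```python
def locate_and_adopt(chain, up, key, update_mode):
    """Find the SR8 holding key's entry; in update mode also adopt the orphan.

    chain: list of dicts {"name", "entries", "orphans"} in NEXT-pointer order.
    up:    dict mapping each data record key to the SR8 name in its UP pointer.
    """
    idx = next(i for i, s in enumerate(chain) if s["name"] == up[key])
    start = chain[idx]
    if key in start["entries"]:
        return start["name"]             # not an orphan, nothing to resolve
    if update_mode:
        start["orphans"] -= 1            # one fewer record points here in error
    for sr8 in chain[idx + 1:]:
        if key in sr8["entries"]:
            if update_mode:
                up[key] = sr8["name"]    # repair the UP pointer
                if start["orphans"] == 0 and not start["entries"]:
                    chain.remove(start)  # erase the orphan-only SR8
            return sr8["name"]
    raise LookupError(f"no index entry found for {key}")


def diagram4():
    """Build the orphan-only state of Diagram 4 (Level-0 SR8s only)."""
    chain = [
        {"name": "A", "entries": [], "orphans": 1},
        {"name": "B", "entries": ["DUNN", "WATS"], "orphans": 0},
    ]
    up = {"DUNN": "B", "WATS": "A"}
    return chain, up
```

In retrieval mode the call only reports where the entry was found. In update mode it also moves the UP pointer for WATS to SR8 B and, because SR8 A is then empty with an orphan count of zero, removes SR8 A from the chain.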
It is very possible that in high-volume indexes the number of orphans being created may exceed the number of times that existing orphans are being cleaned up, creating a significant negative performance impact.

There are a few methods that can be used to provide a more extensive clean-up of the orphans within an index structure. Some sites have created user-written programs that navigate the database using logic that forces the DBMS into orphan adoption scenarios. This usually involves a loop driven by an area sweep of the indexed data records. The following is an example of some pseudocode that might be used for this purpose.

    * Sweep the area; for each EMPLOYEE, walk the index once so the
    * DBMS must locate the record's index entry and adopt it if orphaned.
     LOOP1.
         FIND NEXT EMPLOYEE WITHIN EMP-DEMO-REGION.
         IF ERROR-STATUS = ‘0307’ GO TO LOOP-EXIT.
         MOVE DBKEY TO SAVE-DBKEY.
         FIND NEXT EMPLOYEE WITHIN EMPLOYEE-INDEX.
    * Re-establish area currency on the swept record before continuing.
         FIND EMPLOYEE DB-KEY IS SAVE-DBKEY.
         GO TO LOOP1.
     LOOP-EXIT.

A user-written program will provide a good reduction in the number of data records that have been orphaned, but it will typically not reduce the number of SR8 records that may have been orphaned at various levels of the index structure. If run against the database in a SHARED UPDATE mode within a CV, it will also generate significant record locks that may lead to contention issues if run with transactions concurrently attempting to access the same database. This could lead to slowed processing or an increase in deadlock situations. Finally, this method has little impact on reducing the number of levels maintained by the index or resolving any other performance issues that may exist within the structure.

As a result, sites with indexes that exhibit significant orphan issues typically turn to CA IDMS utilities to improve the performance characteristics of their indexes. Historically this was done by periodically running the MAINTAIN INDEX utility with the REBUILD FROM INDEX option.
MAINTAIN INDEX would extract the symbolic index keys and dbkeys from the existing index, delete the existing index structure, and rebuild the index from scratch. Each data record that contained an UP pointer would have that pointer set to the SR8 occurrence in which its entry resided so that no orphans would exist at the conclusion of the process. This process would also result in a minimal number of levels of SR8 records being created. The major drawback of this utility is that it required exclusive control of the index structures and would have to be performed at some time when no other processing was occurring against the database.

Starting with Release 17.0, the TUNE INDEX utility was enhanced to provide extensive statistics concerning the state of an index, including the number of records that are orphaned. A number of tuning options have also been introduced. Regardless of the options selected, any execution of TUNE INDEX will first adopt all orphaned records, whether they are data records or SR8 records within the index structure. TUNE INDEX was also designed with features that allow index tuning to run through a Central Version concurrently with other transactions using the same indexes, with minimal impact on them. Because of this ability, many sites have implemented the TUNE INDEX utility as part of regularly scheduled processing without having to worry about outage times for the target databases.

Minimizing Orphans

The creation of orphan records when an SR8 record within a sorted index is split cannot be avoided. However, many sites have implemented a simple procedure that delays the onset of orphan creation for a time. This procedure can be implemented when using either the MAINTAIN INDEX or TUNE INDEX utilities to tune an index or when the index is initially loaded. For this discussion we will assume that our target index will have an IBC of 100 entries per SR8.
This is the value that has been determined to be required at run time. Indexing is very flexible when it comes to specifying an index’s IBC, and the IBC specification can be changed at any time without having to rebuild the entire index. If an index’s IBC is increased after the index is built, CA IDMS will simply continue expanding the size of each SR8 until the new IBC value is exceeded or additional space is not available on the SR8 record’s page.

Our index that requires an IBC of 100 has been scheduled to be rebuilt for some reason, which may be that excessive orphan counts have developed over time. If the DBA were to rebuild the index using an IBC of 100, all Level-0 SR8s would contain 100 entries except for possibly the last SR8 in the structure. As a result, the first time a new data record occurrence is added to the index a split will occur and approximately 50 orphans will be created. This can be avoided by reducing the IBC used to rebuild the index and applying a PAGE RESERVE to the area in which SR8s reside during the rebuild process. This does require that a separate schema/subschema and/or DMCL be created and used for the rebuild of the index.

For this example we opt to define our rebuild environment to use an IBC of 80. Let’s also assume that the index has been sized to allow for 4 full-size SR8 records per page. We would determine the difference in size between an SR8 containing 100 entries and an SR8 containing 80 entries. Since we expect 4 SR8s per page, we would multiply the calculated difference by 4. The resulting value would be used as the PAGE RESERVE specification in the new DMCL used to rebuild the index. After the rebuild is completed, the run-time environment would continue to use the schema, subschema, and DMCL that define the IBC as 100 and that do not have a PAGE RESERVE assigned to the index’s area.
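The PAGE RESERVE arithmetic above can be written out as a short calculation. The Python sketch below is a worked illustration only. The entry size of 24 bytes is a hypothetical figure (for example, a 20-byte symbolic key plus a 4-byte dbkey) and must be replaced with the real entry size for the index being rebuilt; the function name `page_reserve_bytes` is likewise invented for this example.

```python
def page_reserve_bytes(runtime_ibc, rebuild_ibc, entry_bytes, sr8s_per_page):
    """Size difference between a run-time SR8 and a rebuild SR8, times SR8s per page."""
    if rebuild_ibc > runtime_ibc:
        raise ValueError("rebuild IBC should not exceed the run-time IBC")
    # The fixed portion is the same size in both cases, so only the entries differ.
    per_sr8_difference = (runtime_ibc - rebuild_ibc) * entry_bytes
    return per_sr8_difference * sr8s_per_page


ENTRY_BYTES = 24  # hypothetical entry size, not taken from the source document

# Run-time IBC 100, rebuild IBC 80, 4 SR8s per page, as in the example.
print(page_reserve_bytes(100, 80, ENTRY_BYTES, 4))  # 1920
```

With these assumed sizes each SR8 is built 480 bytes smaller than its run-time maximum, so 1920 bytes per page are held back during the rebuild and become available for SR8 growth afterwards.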
By doing this, you will have created an index structure where up to 20 new records can be stored into each SR8 in the structure before a split occurs. By delaying the splitting of SR8s you will also be delaying the creation of orphan records within the database.

Creation of an alternate schema, subschema, or DMCL is necessary when using the MAINTAIN INDEX utility to rebuild the index or when doing an initial load with a user-written program. One of the enhancements made to TUNE INDEX is the ability to specify an alternate IBC and PAGE RESERVE as parameters to the utility, to be used only during the tuning operation. This eliminates the need to define any alternate schemas, subschemas, or DMCLs for the tuning process, greatly simplifying the procedure.

Conclusion

The creation of orphaned indexed records is the result of ensuring that the updating of indexes is as efficient as possible. It eliminates the need to update numerous data records’ UP pointers when the index entries for those record occurrences are moved to a new SR8 record as a result of an SR8 split. The cost of this performance feature is a potential increase in processing overhead when an index is serially walked. Indexes that do not experience a high number of record insertions typically do not experience significant performance problems as a result of the presence of some orphaned records. However, indexes that do have a large number of record insertions are prone to the creation of large numbers of orphan records. To minimize the possibility of experiencing related performance problems, these indexes should be monitored closely and periodically rebuilt or tuned to adopt orphaned records.