CA IDMS Index Orphans

One of the more misunderstood aspects of CA IDMS indexing is the creation and impact of orphan
records. This document describes how orphans are created when SR8 records are split, how they
impact index performance, how the orphan condition is resolved, and ways to minimize the
creation of orphans within high-activity indexes.
Splitting SR8 Records
To understand the reason index orphans are created requires a basic knowledge of the way CA IDMS
inserts data records into a sorted index structure. To describe this process we will consider a system-owned index on an EMPLOYEE record where the index is sorted on the employee’s last name. The index
structure is composed of an owner record known as an SR7 record. The index’s SR7 record is stored into
a database area as a CALC record where the name assigned to the index is used as the record’s calckey.
The SR7 record owns SR8 records. SR8 records for sorted index sets are linked to the owning SR7 record
occurrence through standard NEXT, PRIOR, and OWNER pointers and are also maintained in a binary
tree structure used to search for data record occurrences based on the data record’s index key as
defined for the index. The following diagram is a simple representation of an SR8 record for an index
defined with an order of SORTED.
Diagram 1
[Figure: an SR8 record drawn as a fixed portion followed by four entry boxes labeled ENTRY 1, ENTRY 2, ENTRY 3, and CUSHION.]
The ‘fixed portion’ of an SR8 contains the NEXT, PRIOR, and OWNER pointers to connect the SR8 to the
index’s owner which in our example will be an SR7 record. It also contains a number of flags defining
some of the characteristics of the index set and numeric fields defining the contents of the SR8
occurrence. Included in this portion of the SR8 is a field known as the orphan count. We will examine
how this field is used a little later in this document.
The boxes below the ‘fixed portion’ represent entries that will contain information about records
considered lower in the binary tree structure of the index. The number of entries within an SR8 is
determined by the Index Block Count (IBC) as defined for the index. In the above example the IBC for
the index would have been defined as 3. When inserting a record into an index, the target SR8 may
already be full. In that case an additional entry known as the ‘cushion’ is used during the insertion
process; it contains data only during the split of the target SR8. The contents of the entries
depend on whether the SR8 is a Level-0 or an intermediate SR8.
A Level-0 SR8, also known as a low-level SR8, is an SR8 record occurrence that points to the data
record occurrences that are being indexed. For Level 0 SR8s an entry contains the symbolic index key
contained within the data record occurrence and a dbkey pointer to that record. All other SR8 records
in the structure are considered to be intermediate SR8s. Entries in an intermediate SR8 point to another
SR8 occurrence that resides in a lower level of the index structure. The symbolic index key in the entry
will be the highest or lowest key value in the lower level SR8 depending on whether the index is sorted
as ascending or descending respectively. The dbkey pointer in the entry is the dbkey assigned to the
lower level SR8.
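The layout described above can be pictured with a small Python model. This is only an illustrative sketch: the class names, field names, and dbkey values are hypothetical and do not reflect the physical CA IDMS record format, and the flags and numeric fields of the fixed portion are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    key: str    # symbolic index key
    dbkey: int  # Level-0: dbkey of the data record; intermediate: dbkey of a lower-level SR8


@dataclass
class SR8:
    dbkey: int
    level: int = 0                 # 0 means the entries point at data records
    next: Optional[int] = None     # fixed portion: NEXT pointer
    prior: Optional[int] = None    # fixed portion: PRIOR pointer
    owner: Optional[int] = None    # fixed portion: OWNER pointer (the SR7)
    orphan_count: int = 0          # fixed portion: orphan count
    entries: List[Entry] = field(default_factory=list)

    def is_full(self, ibc: int) -> bool:
        """True when the SR8 already holds as many entries as the IBC allows."""
        return len(self.entries) >= ibc


def intermediate_key(lower: SR8, ascending: bool = True) -> str:
    """Key an intermediate entry carries for `lower`: highest if ascending, lowest if descending."""
    keys = [e.key for e in lower.entries]
    return max(keys) if ascending else min(keys)


# Hypothetical dbkeys, chosen only for illustration.
sr8_a = SR8(dbkey=1001, entries=[Entry("ARNE", 5001), Entry("CRUZ", 5002), Entry("WATS", 5003)])
full = sr8_a.is_full(ibc=3)
high_key = intermediate_key(sr8_a)
```

With an IBC of 3 the model reports the SR8 as full, and an intermediate entry for it in an ascending index would carry its highest key.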
Diagram 2 represents our index set after 3 EMPLOYEE records have been added to the structure. For
the sake of clarity only the index’s NEXT pointer and the pointers contained within the SR8’s entries
have been included.
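Diagram 2
[Figure: the SR7 points to SR8 A, which holds entries ARNE, CRUZ, and WATS. Each entry points to its data record (ARNE, CRUZ, WATS), and a red line shows each data record's pointer back to SR8 A.]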
Notice in the above diagram that an additional set of pointers represented by the red line have been
added. These pointers are the INDEX pointers or more commonly referred to as UP pointers. They are
optional on system-owned index sets but are required for any user-owned index. Each record’s UP
pointer is intended to point to the SR8 record occurrence in which that record’s entry resides. Since the
SR8 records are maintained in a binary tree structure each SR8 also contains an UP pointer to the SR8 on
the next highest level in which its entry resides. It is the presence of these UP pointers that results in
orphan records being created.
Our index structure now contains one Level-0 SR8 which can be considered to be full because we said
our IBC was 3 and the Level-0 SR8 contains 3 entries. A fourth record is to be added with a key of ‘DUNN’.
To accomplish this CA IDMS will use the SR8’s cushion to insert the record temporarily resulting in SR8 A
containing 4 entries. Since this exceeds the IBC of 3 the entries for ARNE and CRUZ will remain in SR8 A
and a new Level-0 SR8, B, will be created to contain the entries for DUNN and WATS. The creation of
the second Level-0 SR8 also requires CA IDMS to create an intermediate SR8 on level 1 which we will
designate as SR8 C. Diagram 3 shows the results of the insertion of the fourth data record. The UP
pointers within the SR8 records are omitted from the structure to simplify the diagram.
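Diagram 3
[Figure: the SR7 points to the level-1 SR8 C (O=0), whose entries CRUZ and WATS point to SR8 A and SR8 B. SR8 A (O=1) holds entries ARNE and CRUZ; SR8 B (O=0) holds entries DUNN and WATS. Each Level-0 entry points to its data record (ARNE, CRUZ, DUNN, WATS). O is the orphan count of each SR8.]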
When CA IDMS splits an SR8 record it does not update the UP pointer in each record occurrence whose
index entry was moved to a new SR8. To do this would create a considerable performance problem. In
the preceding diagram notice that the UP pointer in the data record with the key WATS still points to
SR8 A even though that data record’s index entry now resides in SR8 B. The record with the key value of
WATS is now considered to be an orphan record because it does not point at the SR8 containing its
index entry. Also added into the diagram is the orphan count for each SR8 record in the structure. This
orphan count is maintained within each SR8’s header portion. SR8 A now has an orphan count of 1
since one record in the index structure points to it but the corresponding index entry does not reside
within that SR8.
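The split and the resulting orphan count can be reproduced with a short Python model. This is a simplified sketch, not CA IDMS internals: the `SR8` class, the `insert_with_split` function, and the `up` dictionary (which maps each data record key to the SR8 named by its UP pointer) are hypothetical, and the even split point is an assumption chosen to match the ARNE/CRUZ and DUNN/WATS example.

```python
from dataclasses import dataclass, field


@dataclass
class SR8:
    name: str
    entries: list = field(default_factory=list)  # keys held by this Level-0 SR8, ascending
    orphan_count: int = 0


def insert_with_split(sr8, up, key, ibc, new_name):
    """Insert `key` into `sr8`; split into a new SR8 when the IBC is exceeded.

    Returns the new SR8 if a split happened, otherwise None.
    """
    sr8.entries.append(key)       # the cushion temporarily holds the extra entry
    sr8.entries.sort()
    up[key] = sr8.name
    if len(sr8.entries) <= ibc:
        return None
    half = len(sr8.entries) // 2  # assumed split point
    new = SR8(new_name, sr8.entries[half:])
    sr8.entries = sr8.entries[:half]
    for moved in new.entries:
        if moved == key:
            up[moved] = new.name   # the new record points at the SR8 holding its entry
        else:
            sr8.orphan_count += 1  # existing record keeps its old UP pointer: an orphan
    return new


sr8_a = SR8("A", ["ARNE", "CRUZ", "WATS"])
up = {k: "A" for k in sr8_a.entries}
sr8_b = insert_with_split(sr8_a, up, "DUNN", ibc=3, new_name="B")
```

After the call, SR8 A keeps ARNE and CRUZ with an orphan count of 1, SR8 B holds DUNN and WATS, and the UP pointer recorded for WATS still names SR8 A.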
It is conceivable that over the course of time subsequent processing will result in the data record
occurrences with the keys ARNE and CRUZ being deleted from the database. Such actions would leave
SR8 A with no entries but it would still have an orphan count of 1 within its header. In this case SR8 A
would remain within the index structure until all of its orphans have their UP pointer resolved. In this
type of situation SR8 A would be referred to as an ‘orphan-only’ SR8 and only the header portion of the
SR8 would be retained within the database. Orphan-only SR8 occurrences are never used to hold future
records inserted into the index. Diagram 4 shows what the database would look like given this scenario.
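Diagram 4
[Figure: the SR7 points to the level-1 SR8 C (O=0), which holds an entry WATS pointing to SR8 B. SR8 A (O=1) has no entries. SR8 B (O=0) holds entries DUNN and WATS, which point to data records DUNN and WATS.]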
Performance Impact of Orphans
In many instances the presence of orphans has no impact on the performance of the DBMS against the
index structure. For instance, if an attempt is made to read a data record using the record’s index key
the search for the record starts at the top of the index and uses a binary search to pinpoint the SR8 in
which the data record’s index entry resides. In this case the fact that the desired record may be an
orphan has no direct impact on the search. Performance is negatively impacted when an orphan record
is accessed through some other means and then an attempt is made to walk the index.
Assuming the data records in our example are stored using a location mode of CALC let’s examine what
the DBMS must do to perform the following sequence of commands when the first command accesses
the data record containing the value WATS using the database structure as represented by Diagram 3.
OBTAIN CALC EMPLOYEE.
OBTAIN NEXT EMPLOYEE WITHIN EMPLOYEE-INDEX.
During the processing of the OBTAIN CALC command CA IDMS will use whatever fields are defined as
the calckey for the record type and retrieve the record. Currencies for sets in which the record
occurrence participates will be established based on the dbkeys within the record’s prefix. Since the
EMPLOYEE record participates within the EMPLOYEE-INDEX set, the current data record of that set will
be set to the dbkey of the WATS occurrence and the UP pointer in the record’s prefix will be used to
establish the current SR8 for the set. Since the WATS occurrence is an orphan the dbkey that will be
used will be the dbkey of SR8 A.
When the OBTAIN NEXT command is issued to walk the index set CA IDMS must first locate the index
entry for the current data record of the set. To accomplish this it will use the SR8 dbkey saved as the
current SR8 which in our example would be the dbkey for SR8 A. SR8 A will then be searched to locate
the entry for data record WATS which will not be found. The orphan count in the header is interrogated
and found to be greater than 0, which means that some record in the database points to SR8 A while
its entry resides in another SR8. CA IDMS will then use the SR8’s NEXT pointer to locate the next SR8
in the index which in our example is SR8 B. SR8 B is then searched and the entry for the record with the
key value of WATS is located. The DBMS then tries to get the next entry in the index and realizes that
none exists and an error status of 0307 (End-Of-Set) is returned to the requesting program.
In this case the added overhead to determine that we were at the end of the set was an extra read of
SR8 A. However it is possible that SR8 B may have been subsequently split moving the entry for WATS
to an SR8 record later in the structure. In that case the DBMS would have had to read SR8 A and SR8 B
before finding the SR8 containing the entry for WATS. In indexes with a high volume of random
insertions the number of SR8s that must be accessed can quickly multiply. However CA IDMS does attempt
to resolve orphan conditions which can result in additional run-time overhead.
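The extra reads can be counted with a small Python model of the SR8 chain. This is a sketch under simplifying assumptions: `locate_entry` and the dictionary layout are hypothetical, each SR8 visited is counted as one read, and the orphan-count check that precedes following the NEXT pointer is omitted.

```python
def locate_entry(chain, start, key):
    """Follow NEXT pointers from `start` until the SR8 holding `key` is found.

    `chain` maps an SR8 name to (entries, name of the next SR8 or None).
    Returns (name of the SR8 holding the entry, number of SR8s read).
    """
    reads = 0
    name = start
    while name is not None:
        entries, next_name = chain[name]
        reads += 1
        if key in entries:
            return name, reads
        name = next_name
    raise LookupError(key)


# Diagram 3: the WATS record's UP pointer still names SR8 A.
diagram_3 = {"A": (["ARNE", "CRUZ"], "B"), "B": (["DUNN", "WATS"], None)}
orphan_walk = locate_entry(diagram_3, "A", "WATS")

# Hypothetical later state: SR8 B has split again and WATS now lives in a third SR8, D.
after_second_split = {
    "A": (["ARNE", "CRUZ"], "B"),
    "B": (["DUNN"], "D"),
    "D": (["WATS"], None),
}
longer_walk = locate_entry(after_second_split, "A", "WATS")
```

In the first case two SR8s are read instead of one; after the second split the same UP pointer costs three reads.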
Resolving Orphans
It is the responsibility of the DBMS to clean up orphan records whenever possible. We will again visit
the scenario created by the following sequence of instructions.
OBTAIN CALC EMPLOYEE.
OBTAIN NEXT EMPLOYEE WITHIN EMPLOYEE-INDEX.
Once again the record occurrence to be accessed using the OBTAIN CALC command will be the
EMPLOYEE record whose index key is WATS. However this time the database transaction will have
readied its database areas in an UPDATE mode. The processing described earlier remains the same until
the DBMS determines that the index entry for the data record does not exist within SR8 A. CA IDMS
identifies that the areas are readied in update and then decrements the orphan count in SR8 A by 1. It
then continues walking the set of SR8 records searching for the SR8 record containing the index entry
for the WATS record occurrence. When the entry is found in SR8 B, the DBMS will update the UP
pointer in the data record occurrence so that it now points to SR8 B resolving the orphan condition. If
the database had the orphan-only structure described by Diagram 4, the DBMS would have recognized
the fact that the orphan count of SR8 A was now zero and no index entries existed with the SR8. SR8 A
would then be erased from the database.
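The adoption steps can be sketched in Python as follows. The function name, the dictionary layout, and the way the erased SR8 is unlinked are all hypothetical simplifications intended only to mirror the sequence described above, not the actual DBMS logic.

```python
def resolve_orphan(chain, up, key, update_mode=True):
    """Locate the SR8 holding `key`, adopting the record when running in update mode.

    `chain` maps an SR8 name to {"entries": [...], "next": name or None, "orphans": int}.
    `up` maps a data record key to the SR8 named by its UP pointer.
    """
    start = up[key]
    if key in chain[start]["entries"]:
        return start                      # UP pointer is accurate: not an orphan
    if update_mode:
        chain[start]["orphans"] -= 1      # decrement the stale SR8's orphan count
    name = chain[start]["next"]
    while key not in chain[name]["entries"]:
        name = chain[name]["next"]        # keep walking the SR8 chain
    if update_mode:
        up[key] = name                    # repoint the record's UP pointer
        stale = chain[start]
        if stale["orphans"] == 0 and not stale["entries"]:
            # Orphan-only SR8 with nothing left to wait for: unlink and erase it.
            for sr8 in chain.values():
                if sr8["next"] == start:
                    sr8["next"] = stale["next"]
            del chain[start]
    return name


# Diagram 4: SR8 A is orphan-only and WATS still points at it.
chain = {
    "A": {"entries": [], "next": "B", "orphans": 1},
    "B": {"entries": ["DUNN", "WATS"], "next": None, "orphans": 0},
}
up = {"DUNN": "B", "WATS": "A"}
found_in = resolve_orphan(chain, up, "WATS", update_mode=True)
```

Run against the Diagram 4 state in update mode, the WATS record ends up pointing at SR8 B and SR8 A disappears from the chain. With `update_mode=False` the entry is still found but nothing is changed.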
This processing is transparent to the program issuing the OBTAIN NEXT command. However although the
DBMS is cleaning up an undesirable situation, the process does add additional overhead into
transactions running in an update mode. The other drawback is that orphans are being resolved one at
a time. It is very possible that in high volume indexes the number of orphans being created may exceed
the number of times that existing orphans are being cleaned up creating a significant negative
performance impact.
There are a few methods that can be used to provide a more extensive clean-up of the orphans within
an index structure. Some sites have created user written programs that navigate the database using
logic that would force the DBMS into orphan adoption scenarios. This usually would involve a loop that
was driven by an area sweep of the indexed data records. The following is an example of some pseudocode that might be used for this purpose.
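LOOP1.
*   Sweep the area for the next EMPLOYEE occurrence.
    FIND NEXT EMPLOYEE WITHIN EMP-DEMO-REGION.
    IF ERROR-STATUS = '0307' GO TO LOOP-EXIT.
    MOVE DBKEY TO SAVE-DBKEY.
*   Walking the index from this record makes the DBMS locate its index
*   entry, which adopts the record if it is an orphan and the area is
*   readied in update mode. A status of 0307 here only means the record
*   is the last one in the index.
    FIND NEXT EMPLOYEE WITHIN EMPLOYEE-INDEX.
*   Re-establish area currency before continuing the sweep.
    FIND EMPLOYEE DB-KEY IS SAVE-DBKEY.
    GO TO LOOP1.
LOOP-EXIT.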
A user-written program will provide a good reduction of the number of data records that have been
orphaned but it will typically not reduce the number of SR8 records that may have been orphaned at
various levels of the index structure. If run against the database in a SHARED UPDATE mode within a CV,
it will also generate significant record locks that may lead to contention issues if run with transactions
concurrently attempting to access the same database. This could lead to slowed processing or an
increase in deadlock situations. Finally this method has little impact on reducing the number of levels
maintained by the index or resolving any other performance issues that may exist within the structure.
As a result sites with indexes that exhibit significant orphan issues typically turn to CA IDMS utilities to
improve the performance characteristics of their indexes. Historically this was done by periodically
using the MAINTAIN INDEX utility using the REBUILD FROM INDEX option. MAINTAIN INDEX would
extract the symbolic index keys and dbkeys from the existing index, delete the existing index structure,
and rebuild the index from scratch. Each data record that contained an UP pointer would have that
pointer set to the SR8 occurrence in which its entry resided so that no orphans would exist at the
conclusion of the process. This process would also result in a minimal number of levels of SR8 records
being created. The major drawback of this utility is that it required exclusive control of the index
structures and would have to be performed at some time when no other processing was occurring
against the database.
Starting with Release 17.0, the TUNE INDEX utility was enhanced to provide extensive statistics
concerning the state of an index including the numbers of records that are orphaned. A number of
tuning options have also been introduced. Regardless of the options selected, any execution of TUNE
INDEX will first adopt all orphaned records whether they are data records or SR8 records within the
index structure. TUNE INDEX was also designed to contain many features that allowed the tuning of
indexes to be run concurrently through a Central Version with other transactions using the indexes
being processed with minimal impact. Because of this ability, many sites have implemented the TUNE
INDEX utility as part of regularly scheduled processing without having to worry about outage times for
the target databases.
Minimizing Orphans
The creation of orphan records when an SR8 record within a sorted index is split cannot be avoided.
However many sites have implemented a simple procedure that delays the onset of orphan creation for
a time. This procedure can be implemented when using either the MAINTAIN INDEX or TUNE INDEX
utilities to tune an index or when the index is initially loaded.
For this discussion we will assume that our target index will have an IBC of 100 entries per SR8. This is
the value that has been determined to be required at run time. Indexing is very flexible when it comes
to specifying an index’s IBC and the IBC specification can be changed at any time without having to
rebuild the entire index. If an index’s IBC is increased after the index is built CA IDMS will simply
continue expanding the size of each SR8 until the new IBC value is exceeded or additional space is not
available on the SR8 record’s page.
Our index that requires an IBC of 100 has been scheduled to be rebuilt for some reason which may be
that excessive orphan counts have developed over time. If the DBA were to rebuild the index using an
IBC of 100, all level-0 SR8s would contain 100 entries except for possibly the last SR8 in the structure. As
a result the first time a new data record occurrence is added to the index a split will occur and
approximately 50 orphans will be created. This can be avoided by reducing the IBC used to rebuild the
index and applying a PAGE RESERVE to the area in which SR8s reside during the rebuild process. This
does require that a separate schema/subschema and/or DMCL be created and used for the rebuild of
the index.
For this example we opt to define our rebuild environment to use an IBC of 80. Let’s also assume that
the index has been sized to allow for 4 full size SR8 records per page. We would determine the
difference in the size of an SR8 containing 100 entries and an SR8 that contains 80 entries. Since we
expect 4 SR8s per page we would multiply the calculated difference by 4. The resulting value would be
used as the PAGE RESERVE specification in the new DMCL used to rebuild the index. After the rebuild is
completed, the run-time environment would continue to use the schema, subschema, and DMCL that
define the IBC as 100 and that does not have a PAGE RESERVE assigned to the index’s area. By doing
this, you will have created an index structure where up to 20 new records can be stored into each SR8 in
the structure before a split occurs. By delaying the splitting of SR8s you will also be delaying the
creation of orphan records within the database.
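The PAGE RESERVE arithmetic can be written out as a short Python calculation. The entry size used here (24 bytes) is purely a placeholder, since the actual size of an SR8 entry depends on the index key length and is not given in this document. The function name is also hypothetical. Because the fixed portion of the SR8 is the same size at either IBC, only the difference in entry count matters.

```python
def page_reserve_bytes(runtime_ibc, rebuild_ibc, entry_bytes, sr8s_per_page):
    """Bytes to reserve per page so each SR8 can later grow back to the run-time IBC."""
    # Size difference between one SR8 at the run-time IBC and one at the rebuild IBC.
    growth_per_sr8 = (runtime_ibc - rebuild_ibc) * entry_bytes
    # Every SR8 expected on the page needs that much room to grow.
    return growth_per_sr8 * sr8s_per_page


# IBC 100 at run time, IBC 80 for the rebuild, 4 SR8s per page,
# and an assumed (hypothetical) entry size of 24 bytes.
reserve = page_reserve_bytes(runtime_ibc=100, rebuild_ibc=80, entry_bytes=24, sr8s_per_page=4)
```

Under these assumed numbers each SR8 needs room for 20 more entries (480 bytes), so the rebuild DMCL would reserve 1,920 bytes per page. Substitute the real entry size for the index being rebuilt.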
Creation of an alternate schema, subschema, or DMCL is necessary when using the MAINTAIN INDEX
utility to rebuild the index or when doing an initial load with a user-written program. One of the
enhancements made to the TUNE INDEX is the ability to specify an alternate IBC and PAGE RESERVE as
parameters to the utility to only be used during the tuning operation. This eliminates the need to define
any alternate schemas, subschemas, or DMCLs for the tuning process, greatly simplifying the procedure.
Conclusion
The creation of orphaned indexed records is the result of ensuring that the updating of indexes is as
efficient as possible. It eliminates the need to update numerous data records’ UP pointers when the
index entries for those record occurrences are moved to a new SR8 record as a result of an SR8 split.
The cost of this performance feature is a potential increase in processing overhead when an index is
serially walked. Indexes that do not experience a high number of record insertions typically do not
experience significant performance problems as a result of the presence of some orphaned records.
However those indexes that do have a large number of record insertions are prone to the creation of
large numbers of orphan records. To minimize the possibility of experiencing related performance
problems these indexes should be monitored closely and periodically rebuilt or tuned to adopt
orphaned records.