Voyager bibliographic duplicate detection profiles and bulk import

VOYAGER BIBLIOGRAPHIC DUPLICATE DETECTION PROFILES AND BULK IMPORT RULES 12/19/12spg, rev. 8/20/15
Bib Dup Detection Profiles and Bulk Import Rules appear under the Cataloging section of System Administration. Refer to
the Voyager System Administration User’s Guide “Cataloging Configuration” chapter as needed.
BIB DUP DETECTION PROFILES
Create a Bib Dup Profile (if none exists already) before creating the corresponding Bulk Import Rule. The former controls
how dup bibs are handled. It specifies the Voyager index(es) against which incoming records are matched, and also what
to do with dup records. Below is the Profile tab of our default bib dup profile BIB (Add/Replace/Merge 0350):
In this case the Bib Dup Profile has the same code and name as the Bib Bulk Import Rule, which is not always the case
because a particular Bib Dup Profile can be used in multiple Bulk Import Rules.
Only left-anchored single field indexes can be used. You must specify at least one index, even if the records you’re
loading are all new to the catalog.
When records are imported into Voyager the 001 field of the incoming records is replaced by the Voyager bib ID and the
001 data goes into an 035. If there is an 003 in the incoming record, that data goes into the 035 as a prefix in
parentheses. For OCLC records, two 035’s are generated:
035 __ |a (OCoLC)ocm01358480
035 __ |a (OCoLC)1358480
If the record also carries cancelled OCLC numbers in an 019, each one is added to the second 035 as a |z:
035 __ |a (OCoLC)ocm01358480
019 __ |a 2498502 |a 732719088
035 __ |a (OCoLC)1358480 |z (OCoLC)2498502 |z (OCoLC)732719088
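The 001/003 handling shown above can be sketched in Python. This is only an illustration of the resulting fields, not Voyager's own code; the function name build_035_fields and the plain-string layout are assumptions made for the example.

```python
import re


def build_035_fields(f001, f003=None, f019=()):
    """Return display strings for the 035 fields described above.

    f001 -- data from the incoming 001
    f003 -- data from the incoming 003, if any
    f019 -- cancelled OCLC numbers from the incoming 019 |a subfields
    """
    prefix = f"({f003})" if f003 else ""
    # The 001 data moves to an 035, with the 003 value as a parenthetical prefix.
    fields = [f"035 __ |a {prefix}{f001}"]
    if f003 == "OCoLC":
        # Second form: drop the alphabetic prefix (e.g. "ocm") and leading zeros.
        number = re.sub(r"^[A-Za-z]+0*", "", f001)
        second = f"035 __ |a (OCoLC){number}"
        # Each cancelled number from the 019 becomes a |z subfield.
        for cancelled in f019:
            second += f" |z (OCoLC){cancelled}"
        fields.append(second)
    return fields


for line in build_035_fields("ocm01358480", "OCoLC", ["2498502", "732719088"]):
    print(line)
```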
►When reloading records exported from Voyager, the 001 (bib ID) is copied to an 035 by default. To prevent this, be sure to select:
Don't create 035 during import of bib ?:
optional -- default: create 035 when importing bib
This is a fairly recent option, so there are plenty of records in the catalog that have an 035 with the Voyager bib ID. This
is unfortunate because it can lead to mistaken matches later on when records are loaded using the 035 indexes.
►Use the BBID index when reloading Voyager records as with bulk import rule BIBR001 (Replace) or BIB001
(Add/replace).
Profile tab specifies the Profile Name and Profile Code. When creating a profile name and code be as specific about the
function as you can within the number of characters allowed, and maintain consistency with existing codes. Avoid
creating a profile that is exactly the same as another profile; if needed, rename the existing profile to be more inclusive.
Duplicate Handling:
• Add-Unconditional adds incoming records even if dups are detected.
• Replace replaces existing records. If there is no match, the incoming record is added as a new record unless Discard
incoming records that do not match existing records is chosen.
• Add-Conditional adds incoming records so long as there are no dups; if dups are detected the system alerts the
operator (in the Cataloging client) or writes the records to a discard file (in bulk import).
• Merge adds incoming records so long as there are no dups; if dups are detected they are replaced but selected
local fields are retained. If there is no match, the incoming record is added as a new record unless Discard incoming
records that do not match existing records is chosen.
• Bi-Directional Merge is not something we use.
Profiles named “Replace” have the specification Discard incoming records that do not match existing records
because only replaces are expected. If there are any non-matches they are counted as discards in the log.imp file,
with the records themselves in discard.imp. They of course should be followed up. Profiles named “Add/Replace,”
on the other hand, do not specify Discard incoming records that do not match existing records because non-matches
(adds) are expected along with the replaces.
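The Duplicate Handling choices above amount to a small decision table for bulk import. The sketch below restates that table in Python as an aid to reading it; handle_incoming and its return strings are hypothetical names, and Bi-Directional Merge is left out because we do not use it.

```python
def handle_incoming(mode, has_match, discard_nonmatching=False):
    """Return the action taken for one incoming record during bulk import.

    mode                -- Duplicate Handling setting on the Profile tab
    has_match           -- True if a dup was detected in the catalog
    discard_nonmatching -- True if "Discard incoming records that do not
                           match existing records" is chosen
    """
    if mode == "Add-Unconditional":
        return "add"  # added even if dups are detected
    if mode == "Add-Conditional":
        # In bulk import, dups are written to the discard file.
        return "discard" if has_match else "add"
    if mode in ("Replace", "Merge"):
        if has_match:
            return "replace" if mode == "Replace" else "merge"
        return "discard" if discard_nonmatching else "add"
    raise ValueError(f"unsupported mode: {mode}")


# A "Replace" profile discards non-matches; an "Add/Replace" profile adds them.
print(handle_incoming("Replace", has_match=False, discard_nonmatching=True))
print(handle_incoming("Replace", has_match=False, discard_nonmatching=False))
```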
Cancellation is always set to None
Duplicate Replace is the level at or above which a single match may be automatically replaced or merged. To date
we have always specified 100, an exact match.
Duplicate Warn is the level at or above which the system warns the user of matches that are close but should be
reviewed. The value must be less than or equal to the Duplicate Replace value. We have always specified 100, an
exact match.
Field Definitions tab specifies the Voyager index(es) to be used for matching. For the BIB dup profile it specifies
the 0350 index, which matches 035’s in incoming records against 035’s in existing records:
If there is no match on the first 035 field in an incoming record, it will look at other 035 fields in succession, if any. If
there is still no match, the record is not a duplicate.
The 0350 matches against the entire 035 field, but some indexes match against only part of a field. The 035A index,
for example, matches existing 035’s minus their parenthetical data; as such it’s a poor choice for matching.
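A short Python sketch shows why dropping the parenthetical data makes for risky matching. The two key functions are simplified stand-ins for the 0350 and 035A indexes as described above; Voyager's actual normalization is not reproduced here, and the function names are assumptions.

```python
import re


def key_0350(value):
    """Match key for the whole 035, parenthetical prefix included."""
    return value.strip()


def key_035a(value):
    """Match key with a leading parenthetical prefix removed."""
    return re.sub(r"^\([^)]*\)", "", value.strip())


# An existing 035 holding a bare Voyager bib ID, and an incoming OCLC number
# that happens to have the same digits.
existing = "1358480"
incoming = "(OCoLC)1358480"

print(key_0350(existing) == key_0350(incoming))  # whole-field keys differ
print(key_035a(existing) == key_035a(incoming))  # stripped keys collide
```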
Voyager can also match incoming records against two or more indexes in succession (not all at once). The order of
indexes is important. For Bibnotes, records are matched against the 0350 index. If unsuccessful, they are matched
against the 019A (cancelled OCLC number) index.
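Matching in succession can be pictured as the loop below: each index is tried in the order listed, each candidate value from the incoming record is tried in turn, and the search stops at the first hit. This is a minimal sketch with the catalog modeled as plain dictionaries; find_matches, the data layout, and the bib IDs 101 and 202 are all made up for illustration.

```python
def find_matches(incoming_keys, catalog_indexes, index_order):
    """Try each index in order and return (index, bib_ids) for the first hit.

    incoming_keys   -- index code -> list of values the incoming record
                       offers for that index, in field order
    catalog_indexes -- index code -> {value: set of existing bib IDs}
    index_order     -- index codes in the order given on Field Definitions
    """
    for index in index_order:
        entries = catalog_indexes.get(index, {})
        for key in incoming_keys.get(index, []):
            hits = entries.get(key)
            if hits:
                return index, sorted(hits)
    return None, []  # no match on any index: not a duplicate


catalog = {
    "0350": {"(OCoLC)1358480": {101}},
    "019A": {"2498502": {202}},
}
record = {"0350": ["(OCoLC)999"], "019A": ["2498502"]}
print(find_matches(record, catalog, ["0350", "019A"]))
```

The order of index_order matters for the same reason the order of indexes matters in the profile: a hit on an earlier index ends the search.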
You are not limited to matching a particular field against a particular index. Field and Sub Field Override allow for
matching any field against any index(es):

Field Override + Sub Field Override change the default field in an index to another field specified by you. For
example, if you wanted to match a Voyager bib ID against an incoming field+subfield other than 001 you would
specify BBID index with an override of the desired field and subfield.

 Field Weight allows assigning a greater weight to one or more indexes in order to give them priority over other
indexes. To date we’ve only used 100. Each match that is performed receives a score. When a match occurs
between an incoming record and an existing record, that match receives the points specified for that field in the
Field Weight option. If the fields do not match, no points are received. The points for each field are added up to
determine the score for that match. These points are used to determine the relative importance of each match
and to determine whether an automatic match may be made. A match is only automatically established if there
is one candidate that has a score that is equal to or above the value specified in Duplicate Replace (on the
Profile tab). Since we use a Duplicate Replace value of 100, this ensures an exact match. But that may not be
flexible enough in some situations.
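The scoring described above can be sketched in Python as follows. It is a reading aid under stated assumptions rather than Voyager's algorithm: the function names are invented, the 60/40 weights in the usage lines are hypothetical (we have only ever used 100), and the "review" outcome for cases other than a single qualifying candidate is an assumption.

```python
def score_candidate(matched_indexes, field_weights):
    """Add up the Field Weight points for every index that matched."""
    return sum(w for idx, w in field_weights.items() if idx in matched_indexes)


def classify(candidates, field_weights, replace_at=100, warn_at=100):
    """Decide what happens to one incoming record.

    candidates -- existing bib ID -> set of index codes that matched
    replace_at -- Duplicate Replace value
    warn_at    -- Duplicate Warn value (must be <= replace_at)
    """
    scores = {bib: score_candidate(m, field_weights)
              for bib, m in candidates.items()}
    at_replace = [bib for bib, s in scores.items() if s >= replace_at]
    if len(at_replace) == 1:
        # Only a single candidate at or above Duplicate Replace is automatic.
        return "replace", at_replace[0]
    at_warn = sorted(bib for bib, s in scores.items() if s >= warn_at)
    if at_warn:
        return "review", at_warn
    return "no match", None


# Our usual setup: one index weighted 100, thresholds at 100.
print(classify({101: {"0350"}}, {"0350": 100}))
# Hypothetical split weights: a partial match scores 60 and falls short of 100.
print(classify({101: {"0350"}}, {"0350": 60, "019A": 40}))
```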
Quality Hierarchy tab is not something we use
Merge Fields tab only appears if Duplicate Handling = Merge. It determines which fields are retained when a bib is
replaced. For us these are normally certain 9xx fields and fields with |5 CtW (see guidelines for LOCAL FIELDS). While
specific indicators may be denoted, to date we have used asterisks (**) to encompass any indicator.
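As a rough picture of what a merge does, the Python sketch below replaces a bib with the incoming record while carrying over the protected local fields. Records are modeled as lists of (tag, subfields) pairs and indicators are ignored, which mirrors the ** setting. The function names are assumptions, and the retained tag 949 in the usage lines is a hypothetical stand-in for whichever 9xx fields the profile actually lists.

```python
def is_local(field, retained_tags):
    """True if the field is protected: a listed tag, or any field with |5 CtW."""
    tag, subfields = field
    return tag in retained_tags or ("5", "CtW") in subfields


def merge_replace(existing, incoming, retained_tags):
    """Return the incoming record plus the local fields kept from the old bib."""
    kept = [f for f in existing if is_local(f, retained_tags)]
    # Stable sort by tag so the retained fields fall into tag order.
    return sorted(incoming + kept, key=lambda f: f[0])


old_bib = [
    ("245", [("a", "Old title")]),
    ("590", [("a", "Gift copy"), ("5", "CtW")]),
    ("949", [("a", "local data")]),
]
new_bib = [
    ("245", [("a", "New title")]),
    ("650", [("a", "Topic")]),
]
for tag, subfields in merge_replace(old_bib, new_bib, {"949"}):
    print(tag, subfields)
```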
BULK IMPORT RULES
Bulk Import Rules allow customization of record import in the Cataloging client and for bulk loading. While bulk import can
load bibs+MFHDs, we normally load only bibs. Below is the Rules tab of our default rule BIB (Add/Replace/Merge 0350)
that adds new records and replaces (overlays) existing records; when replacing records, local fields are merged with
those in the incoming record.
Rule Name tab includes the Code and Name for the rule, in this case BIB = Add/Replace/Merge 0350. When creating a
rule code and name be as specific about the function as you can within the number of characters allowed, and maintain
consistency with existing codes. Avoid creating a rule that is exactly the same as another rule; if needed, rename the
existing rule to be more inclusive. As a password for the new rule we used to simply repeat the ID; Voyager now wants 8
characters minimum so use wesleyan for all rules. Select This password never expires.
Rules tab (shown above) is self-explanatory except for:
• Bib Dup Profile specifies how duplicate bibs are handled (see BIB DUP DETECTION PROFILES above).
• Loc Field, Loc Subfield, etc. indicate which field+subfield in the bib record, if any, contains a location code to be
used in creating MFHDs or MFHDs+items.
• Load Bib/Auth Only, etc. tells Voyager what to do. Just load bibs? Load bibs and create MFHDs? And so on.
For import rule BIB, no matches in the catalog = new bibs/MFHDs/items created; matches in the catalog =
existing bibs replaced.
• Create MFHD for existing Bibs, etc. is not something we’ve done so far.
• Orders button opens a separate screen that specifies the source of data to create a PO. It’s used for EOCRs
and YBP ebook batch loads.
Item Type indicates which field(s)+subfield(s), if any, contain an itemtype code to be used in creating item records. If the
code can be found in more than one field, indicate the hierarchy of fields to check. To date we’ve never done this.
Mapping tab allows you to create a single Voyager location+itemtype map to be applied to all incoming records (e.g.
Online+Electronic Book), or multiple maps to be applied according to location data in the bibs (as with EOCRs and
PromptCat). For the latter, each location is assigned a default itemtype, e.g. Olin=Book and Sci. Ref.=Reference Book.
Mapping can also indicate the source of the call number data, if any, to be inserted into the MFHD. In the list below,
“MARC” refers to the bib record.
• MARC Item Type = Enter an itemtype that appears as a text string in the incoming bibs, if any. We’ve never done
that. Instead, we specify an asterisk (*) to default to Voyager Item Type.
• MARC Location Code = Enter the location code that appears as a text string in the incoming bibs, if any. We’ve
never done that. Instead, we specify an asterisk (*) in this field to default to Voyager Location.
• Voyager Item Type assigns a specific Voyager itemtype to all records.
• Voyager Location assigns a specific Voyager location to all records.
• Call Number Hierarchy determines the bib field(s)/subfield(s) from which call number data is pulled into MFHD
852 |h and |i. For import rule BIB it looks first at 050 |a and |b; if none, it looks at 090. For most e-resource
batchloads it looks only at 090 |a and |b.
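The Call Number Hierarchy for import rule BIB can be read as a simple fallback, sketched below in Python. The function name and record layout are assumptions, as is the mapping of bib |a to 852 |h and bib |b to 852 |i; the sketch only illustrates the "050 first, then 090" order.

```python
def call_number_from_bib(fields, hierarchy=("050", "090")):
    """Return 852 |h and |i values from the first field in the hierarchy.

    fields    -- list of (tag, [(subfield code, value), ...]) pairs
    hierarchy -- bib tags to check, in priority order
    """
    for wanted in hierarchy:
        for tag, subfields in fields:
            if tag != wanted:
                continue
            part_a = next((v for c, v in subfields if c == "a"), None)
            part_b = next((v for c, v in subfields if c == "b"), "")
            if part_a is not None:
                return {"h": part_a, "i": part_b}
    return None  # no call number data found in any listed field


bib = [
    ("090", [("a", "QA76.73.P98"), ("b", "M37 2015")]),
    ("050", [("a", "QA76.73.P98"), ("b", "L88 2015")]),
]
print(call_number_from_bib(bib))                      # 050 wins for rule BIB
print(call_number_from_bib(bib, hierarchy=("090",)))  # e-resource batchloads
```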
Barcode tab indicates which field+subfield in the bib record, if any, is inserted into the barcode field of the item record,
e.g. 949 |i for PromptCat.
To duplicate an existing rule, highlight the rule to be duplicated and click Duplicate. Enter the code and name for the new
rule. Make any changes. Click Save.