Post-load checks

POST-LOAD CHECKS
12/9/10spg, rev.2/3/16
Always check the log.imp file after a load. Preserve all your files, including any OCLC Connexion files, until you are
absolutely sure records have processed correctly.
WebAdmin
If you use WebAdmin, you receive an email when the job starts and another when it completes. The log.imp file name is in
the form log.imp.<job number><date>.<job#>. As of Voyager 9 the file may not show in the email; however, the attached
.zip file--which is actually empty--has the file name. To copy the file name, click the attachment once, then copy the
name from “File name:”
In WebAdmin, press Ctrl-F and paste in the file name.
If you run several processes very close together, the log.imp file accumulates the information from previous runs.
A large file takes longer to process. You can check a log.imp file as it builds by clicking Refresh.
If you do not get an email, find the file in WebAdmin in this way:
(1) Do Ctrl-F and type log.imp into the Find box
(2) In the box, highlight log.imp and type in the desired date, e.g. 20140211
(3) Scroll to find a likely match
On the server
To view the log.imp file see instructions in EXPORTING AND IMPORTING FILES USING LINUX COMMANDS. The file
you will see is the same as the one viewed in WebAdmin.
Log.imp file
If all records were processed the expected number of records will be reported at the bottom. Voyager record numbers are
indicated so you can check a few records in the catalog if you wish. Example of a successful load to replace bib records:
I am 29741. I will be doing all of '/m1/voyager/tmp/cgipost-ImportFile-28405.tmp' for you.
The import code is "BIBREPU8" for this run.
The bib dup profile is "Replace " for this run.
This import is using a rule that does not allow creation of MFHDs or Items.
Tue Jul 12 18:41:02 2011
Expecting Marc21 UTF-8 Records
1(1):Duplicate Bibs above threshold: replace 1, warning 0.
BibID & rank
1850307 - 100
REPLACE Existing DB Bib record replaced.
<snip>
Processed: 902
Added: 0
Discarded: 0
Rejected: 0
Errored: 0
Replaced: 902
Merged: 0
Deleted: 0
Mfhds created: 0
Items created: 0
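The summary check described above can also be done with a short script. The following is a minimal Python sketch, not part of Voyager; it assumes you have saved the log.imp text locally and that the summary lines look like those in the example. The function names are invented for illustration.

```python
import re

# Labels of the summary lines shown at the bottom of the example log.
SUMMARY_KEYS = {
    "Processed", "Added", "Discarded", "Rejected", "Errored",
    "Replaced", "Merged", "Deleted", "Mfhds created", "Items created",
}
SUMMARY_LINE = re.compile(r"^\s*([A-Za-z ]+?):\s*(\d+)\s*$")


def read_summary(log_text):
    """Collect the summary counts from the text of a log.imp file.

    If the log holds several runs, later values overwrite earlier ones,
    so the counts returned are those of the last run in the file.
    """
    counts = {}
    for line in log_text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match and match.group(1) in SUMMARY_KEYS:
            counts[match.group(1)] = int(match.group(2))
    return counts


def check_summary(counts, expected):
    """Return a list of things to follow up; an empty list means none found."""
    problems = []
    if counts.get("Processed") != expected:
        problems.append(f"Processed is {counts.get('Processed')}, expected {expected}")
    for key in ("Discarded", "Rejected", "Errored"):
        if counts.get(key, 0) > 0:
            problems.append(f"{key}: {counts[key]}")
    return problems
```

For the example above, passing the log text to read_summary and then calling check_summary with an expected count of 902 would return an empty list.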
If you see the message Record does not match format for '<character set>' change your import rule, the character
encoding of the first record does not match that in your bulk import rule, and the process halts. Change the rule and
try again.
MarcEdit changes all records to UTF-8. If loading a file not processed through MarcEdit it is possible to have a mixture of
character sets, typically UTF-8 and MARC-8. For these files, if records other than the first have a different character set
from the bulk import rule, the ones that match the rule will load and the others will be reported as errors.
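One way to spot a mixed file before loading is to look at the character coding scheme in each record's leader. In MARC 21, leader position 09 is "a" for UCS/Unicode and blank for MARC-8. The sketch below is a hedged illustration in Python: it assumes a binary .mrc file whose records each begin with a 24-character leader starting with the 5-digit record length, and it assumes (without confirmation from Voyager documentation) that this leader value is a reasonable proxy for what the bulk import rule checks. The function name is invented.

```python
from collections import Counter


def leader_encodings(mrc_bytes):
    """Tally Leader/09 (character coding scheme) across raw MARC records."""
    tally = Counter()
    pos = 0
    while pos + 24 <= len(mrc_bytes):
        leader = mrc_bytes[pos:pos + 24]
        try:
            length = int(leader[0:5])  # record length, leader positions 00-04
        except ValueError:
            length = 0
        if length < 24:
            # Cannot find the next record reliably; stop here.
            tally["unreadable"] += 1
            break
        code = leader[9:10]
        if code == b"a":
            tally["Unicode"] += 1
        elif code == b" ":
            tally["MARC-8"] += 1
        else:
            tally["other"] += 1
        pos += length
    return tally
```

If the tally shows both "Unicode" and "MARC-8", the file is mixed and some records can be expected to error against any single rule. Note that a leader value can itself be wrong, so this is a first check only.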
Similarly, records with unparsable characters (not recognized as MARC) are reported as errors. MarcEdit validation does
not catch all of them.* These characters usually load correctly: {acute}, {tilde}, {caron} and other spelled-out diacritics in
curly brackets. Make sure they are appropriate to the language.
*Validation should have detected an LDR error, reported as an incorrect number of characters because {90}, {92}, etc. are present rather than 0, 2, etc.
This typically occurs in older records exported from Voyager. Use Find All with the value {9 to check. Change individually or use Find and Replace.
Examples, with the corrected ending at right:
=LDR 02220cam a2200385Ia 45{90}0    4500
=LDR 01992cam a2200373Ia 45{92}0    4520
The records themselves appear in a corresponding err.imp file.
Errors
If the log.imp file is large or complicated you can copy the text and paste it into Word or Notepad and work from that.
It is often easier to correct the errors in your .mrk/.mrc file and reload the whole thing, rather than extracting the problem
records to work on separately. However, if there are only a few errors, avoid reloading the entire file. Instead, copy/paste
the problem records into Notepad, save as .mrk, and work from there.
It may be helpful to view the problematic record(s) in OCLC to help diagnose the problem, e.g. a special character or
diacritic.
In the log.imp file an error appears as:
: ERROR: Unparseable record written to error file:
There is no indication of what the error is, but since the file is in the same order as your .mrk/.mrc file you can figure out
which records are the problems.
If it is not clear what’s wrong, retrieve the err.imp file that corresponds with the log.imp file and open it as a Word
document. (If the data is one long string, copy it from the err.imp file and paste it into Word.) The file contains the errored
records, but not in MARC format; it cannot be saved as .mrc or even .mrk, as it won’t compile correctly into .mrc. Here’s
an example of an error in the err.imp file. It appears to show an illegal character or characters preceding indicators 08,
but the field is not identified:
 08iTitle is part of eBook package:dDe GruytertHUP eBook Package
When the problem record was examined in the .mrk file there were illegal characters preceding indicators 08 in the 773
field:
=773  {A0}08$iTitle is part of eBook package:$dDe Gruyter
(This was in fact reported during MarcEdit validation but not recognized as a problem that would affect the load.) Since
there were many 773’s, Find--Replace was used to correct them. Since there were so many records involved, the file was
reloaded in full. If there had been only a few errors, the records would have been copied/pasted into Notepad and saved
as .mrk.
Curly brackets sometimes indicate an error, but certainly not always. These have been noted as errors in the past:
Code                 Should be
{80}, {81}, etc.     superscript 0, superscript 1, etc.
{85}                 ellipsis (...)
{90}, {91}, etc.     subscript 0, subscript 1, etc.
{91}{92}             superscript 10
{9C}                 superscript hyphen
{92}                 apostrophe
{93}                 opening quote (doesn’t matter whether opening or closing; for our purposes it’s just “)
{94}                 closing quote
{95}                 <not sure>
{96}                 dash (-)
{97}                 double dash (--)
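To find such codes before loading, the .mrk file can be scanned for them. This is a minimal Python sketch under stated assumptions: the pattern covers only the kinds of codes named in this document ({80} through {9F}, plus {A0}), and the function name is invented. Spelled-out diacritics such as {acute} are deliberately not matched.

```python
import re

# Two-character codes in curly brackets of the kinds noted above:
# {80}-{9F} and {A0}. Mnemonics like {acute} or {uuml} do not match.
SUSPECT_CODE = re.compile(r"\{(?:[89][0-9A-Fa-f]|[Aa]0)\}")


def find_suspect_codes(mrk_text):
    """Return (line number, field tag, code) for each suspect code found."""
    hits = []
    for number, line in enumerate(mrk_text.splitlines(), start=1):
        for match in SUSPECT_CODE.finditer(line):
            # The first four characters of a .mrk line are the tag, e.g. =773
            hits.append((number, line[:4], match.group(0)))
    return hits
```

A hit is only a candidate for review; as noted above, curly brackets do not always indicate an error.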
If you are completely stumped as to what the problem is, import one or more of the MARC records via the cataloging
module and see what Voyager validation can tell you. To do this, copy and paste the record(s) into Notepad and save as
a .dat file. If it won’t import, turn off validation. Once the record is in the catalog, turn on validation and save it to the
database again. Remember to delete the record before batchloading.
If you still cannot determine the problem, you can try these:
(1) This is the simpler option, especially if there are not too many records. In the .mrk file, change questionable codes to
something that will load and be easily found in the catalog afterward. Inputting your initials can help with that, e.g.:
[SPGBRACKETED82]
[SPGSUPERSCRIPT2]
[SPGDEGREESYMBOL]
Once the records are loaded, do a keyword search in the catalog to find them and make corrections.
(2) In the .mrk file, delete any oddball characters. The records will load without them, of course, but at least they are in
the catalog.
(3) Make corrections in the MarcEdit file. If the file was originally not UTF-8, opening and saving it changes the records to
UTF-8. The UTF-8 character set table under “Edit” is fairly easy to use. You may still see some leftover MARC-8
characters. The MARC-8 ALA character set table seems inaccurate/incomplete, but if you type in a curly bracket a little
window pops up and you can choose from that. To add an umlaut to the u in “uber,” for example: a) position the cursor at
the insert point or highlight the text to replace; b) in the little popup table, double-click {uuml} so that you have {uuml}ber,
which should load as über.
(4) Ask for help on MarcEdit-L or Voyager-L.
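Option (1) can be done by hand with Find and Replace, or sketched in Python as below. This is an illustration only: the function name is invented, the pattern covers only {80} through {9F} and {A0}, and the placeholder follows the [SPGBRACKETED82] style shown above. LDR lines are left alone, on the assumption that a placeholder in the leader would itself cause an error; fix those by hand.

```python
import re

# Capture the two-character code inside the curly brackets.
SUSPECT_CODE = re.compile(r"\{((?:[89][0-9A-Fa-f]|[Aa]0))\}")


def tag_codes(mrk_text, initials):
    """Replace suspect codes with searchable placeholders such as [SPGBRACKETED82]."""
    def placeholder(match):
        return f"[{initials}BRACKETED{match.group(1).upper()}]"

    out = []
    for line in mrk_text.splitlines(keepends=True):
        if line.startswith("=LDR"):
            out.append(line)  # leave the leader untouched
        else:
            out.append(SUSPECT_CODE.sub(placeholder, line))
    return "".join(out)
```

After loading, a keyword search on the initials plus BRACKETED should find the records to correct.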
Discards and Replaces
Aside from log.imp and err.imp there are two other report files that may come into play:

Discard.imp - The discard file holds records that have been discarded in accordance with the bulk import rule used.
One reason a record may end up there is that it matches multiple records. A second reason involves AddConditional, which
specifies that dups are not added but are written to the discard.imp file. A third is a bulk import rule
that specifies Discard incoming records that do not match existing records.
► Profiles named “Replace” have the specification Discard incoming records that do not match existing records
because only replaces are expected. If there are any non-matches they are counted as discards in the log.imp file,
with the records themselves in discard.imp. They of course should be followed up. Profiles named “Add/Replace,” on
the other hand, do not specify Discard incoming records that do not match existing records because non-matches
(adds) are expected along with the replaces.

Replace.imp - When replaced by an incoming record, the existing record is copied to this file. This is helpful if a record
has been overlaid by mistake.
Reloading
When reloading records for any reason, specify the right bulk import rule. For record sets, if there’s even the slimmest
possibility of existing records in Voyager, use a rule that adds and replaces records.
For record sets, use the operator ID of the set, not your personal ID.
When a job dies partway through
When processing is interrupted the log.imp file reports where it left off:
I am 21390. I will be doing all of '/m1/voyager/tmp/cgipost-ImportFile-13006.tmp' for you.
The import code is "EBADD" for this run.
The bib dup profile is "Add " for this run.
This import is using a rule that allows for creating Bibs, MFHDs and Items.
Mon Jul 11 19:00:34 2011
Expecting 'U' as character set
1(1):Duplicate Bibs above threshold: replace 0, warning 0.
Adding Bib record 1992874.
2(2):Duplicate Bibs above threshold: replace 0, warning 0.
Adding Bib record 1992877.
<snip>
1803(1803):Duplicate Bibs above threshold: replace 0, warning 0.
Adding Bib record 1998277.
1804(1804):Duplicate Bibs above threshold: replace 0, warning 0.
Adding Bib record 1998281.
No MFHD created.
UpdateMfhdRec return an unknown error code '1'. Check voyager log.
1805(1805):Duplicate Bibs above threshold: replace 0, warning 0.
Adding Bib record 1998284.
No MFHD created.
UpdateMfhdRec return an unknown error code '1'. Check voyager log.
1806(1806):Duplicate Bibs above threshold: replace 0, warning 0.
Adding Bib record 1998287.
Bulkimport terminated by signal 15.
In this case, "signal 15" indicates that the process was deliberately terminated when Voyager reset itself on Tue Jul 12 at
approximately 03:18:24 2011. Voyager resets itself around 3 a.m. each day.
The number to the left of each action is the sequence in the import file, not a Voyager number.
Follow up in the catalog for records that were partially created. Then re-import the original file but indicate where the
sequence should begin, in this case 1807.
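Finding the restart point in a long log can be sketched in Python as follows. This is a hedged illustration, not a Voyager tool: it assumes the log text has been saved locally, that sequence lines begin with the form 1806(1806): as in the example, and that the log contains only the interrupted run. The function name is invented.

```python
import re

# Sequence lines in the example begin like "1806(1806):".
SEQUENCE = re.compile(r"^(\d+)\(\d+\):")


def resume_point(log_text):
    """Return the sequence number to restart from, or None if the run was not interrupted."""
    last_seen = None
    terminated = False
    for line in log_text.splitlines():
        match = SEQUENCE.match(line.strip())
        if match:
            last_seen = int(match.group(1))
        if "terminated by signal" in line:
            terminated = True
    if terminated and last_seen is not None:
        return last_seen + 1
    return None
```

For the example log, the last sequence line is 1806, so the function returns 1807. The records near the end of the log still need to be checked in the catalog before re-importing.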
BULK EXPORTING
File names have “exp” and are similar to those for imports. For more information see EXPORTING AND IMPORTING
RECORDS USING WEBADMIN.