the MARC leader and directory

advertisement
lis512 lecture 4
the MARC format structure, leader,
directory
MARC 21


MARC 21 is as important example of a record
format used in by the library community
Integrated Library Systems (ILSs) all



import MARC21 records into relational database
system
export MARC21 records from relational database
systems
MARC21 records describe records from library
catalogues.
documentation
• I use the documentation provided by the
library of congress at
http://www.loc.gov/marc/.
warning: the blank
• The MARC record format uses blanks as
values.
• Since blanks are not usually seen on a slide, I
may denote them here by a special sign ␢.
Most of the time ‘ ’ does the job.
• In the documentation at the LoC, they use the
# character.
MARC 21 formats
• There are different format of MARC21 records
– MARC 21 format for bibliographic data
– MARC 21 format for authority data
– MARC 21 format for holdings data
– MARC 21 format for classification data
• We are only studying the first one here and
call this the “MARC format” in what follows.
MARC format
• The MARC format is very complicated.
• The basic structure is
– leader
– directory
– control fields
– data fields
• The leader and directory are fixed fields. That
means they have fixed length.
• The control and data fields are called variable
fields. That means they have variable length.
fixed fields
• The leader is 24 characters/bytes long. It can
not be repeated.
• Each directory entry is 12 characters/bytes
long.
• Both parts can only contain ASCII characters.
• There may be many directory entries.
Therefore the total length of the directory is
not fixed.
MARC leader
• Described in
http://www.loc.gov/marc/bibliographic/bdlea
der.html
• The leader is 24 bytes long.
• Each byte houses one ASCII character, so we
can also say that it is 24 characters long.
• We count them from 0 to 23.
grouping of leader bytes
• I have grouped the bytes into “boring” and
“interesting” bytes.
• Boring bytes contain no information. Their
values can be calculated by a computer.
• Interesting bytes contain some information.
They have to be filled in by a human.
sub-classification of boring bytes
• Boring bytes can be classified as fixed boring
bytes and variable boring bytes.
• Fixed boring bytes always contain the same
value for any MARC record.
• Variable boring bytes contain potentially
different values for every MARC record.
• I write “leader/??” when I want to refer to the
byte ?? in the leader.
fixed boring byte positions
• Here are all fixed boring bytes
– leader/10 always contains ‘2’
– leader/11 always contains ‘2’
– leader/20 always contains ‘4’
– leader/21 always contains ‘5’
– leader/22 always contains ‘0’
– leader/23 always contains ‘0’
• Here is an example with fixed boring bytes
underlined: “01178cam a2200313 a 4500”
fixed boring bytes
• For fixed boring bytes, we need to be able to
see where they are.
• We do not really need to know what they
mean since we can’t change them anyway.
• For the curious, I have explanations in the
appendix.
Variable boring bytes
• These bytes contains length data about the
record.
• Of course, for most records you will find
different values.
• But the values hardly contain any useful
information.
• The boring positions are 00–04 and 12–16.
• Both are 5 positions long and we study them
in turn.
leader positions 00–04
• At that position, there appears the length of
the entire MARC record.
• This is the number of bytes, not the number
of characters!
• There are 5 numerical characters. The number
is right justified and unused characters contain
zeros, as in say “00234”
leader positions 12–16
• This contains the position of the start of the
data, after the leader and the directory.
• This is contains the length of the leader plus
the length of the directory, plus 1.
• It is encoded like positions 00–04.
• The number should be much smaller than the
number encountered at 00–04. If you subtract
one, you should find a multiple of 12.
finished with boredom
• Here is an example with fixed all boring bytes
underlined: “01178cam a2200313 a 4500”
• In the interesting bytes, we find codings that
give us information about the MARC records.
• If we create a MARC record, we need to fill
them out.
• Sometimes a system may prepopulate them
but we still need to know what they are.
interesting code positions
• From the example, you will see that the code
positions that are interesting come in a rear
part 17–19 and it a front part 05–09.
leader position 05
• The records status indicates the relationship
of the record to a set of records in a file.
– ‘a’  The record has been changed to a higher
encoding level as recorded in position 17.
– ‘c’  Corrected or revised Addition/change other
than in the encoding level code has been made to
the record.
– ‘d’ Deleted Record has been deleted.
– ‘n’ New Record is newly input.
– ‘p’  Increase in encoding level from
prepublication.
leader position 6, slide 1
• This gives the type of material. Values are
– ‘a’  language material
– ‘t’  manuscript language material
– ‘c’  notated music
– ‘d’  a manuscript notated music
– ‘e’  cartographic material
– ‘f ‘  manuscript cartographic material
– ‘g’  projected medium
leader position 6, slide 2
• further values are
– ‘I’  nonmusical sound recording
– ‘j’  musical sound recording
– ‘k’  a two-dimensional non-projectable graphic
– ‘m’  a computer file
– ‘o’  a kit
– ‘p’  mixed materials
– ‘r’  a three-dimensional artifact or naturally
occurring object
leader position 7
• Bibliographic level of the description
– ‘a’  monographic component part
– ‘b’  serial component part
– ‘c’  collection
– ‘d’  a subunit
– ‘I’  an integrating resource
– ‘m’  a monograph
– ‘s’  a serial
leader position 8
• This gives the type of control. There are two
valid values
– ‘ ’ –> not specified
– ‘a’  the item is described according to archival
descriptive rules, which focus on the contextual
relationships between items and on their
provenance rather than on bibliographic detail. All
forms of material can be controlled archivally.
leader position 9
• This field indicates the character set and
encoding scheme used in the record. There
are two valid values
– ‘ ’  MARC-8
– ‘a’  the UTF-8 encoding of UCS/Unicode
conclusion about the front part
• Normally, if we are cataloging books, and keep
changing the records, we expect something
like ‘cam a’.
• You are allowed to use that for your record
provided you don’t go into multi-part
resources.
leader position 17, slide 1
• This is the encoding level. It indicates the fullness of
the bibliographic information in the record.
– ‘ ’  “full level”. It is a complete MARC record created
from information derived from an inspection of the
physical item. For serials, at least one issue of the serial
is inspected.
– ‘1’ “full level, material not examined”  created from
information derived from an extant description of the
item, without reinspection of the item.
– “2”  “Less-than-full level, material not examined” 
created from an extant description of the material
without reinspection of the item.
leader position 17, slide 2
• More allowed values for position 17 are
– ‘3’  “abbreviated level”  meaning a
brief record that does not meet minimal
level cataloging specifications.
– ‘4’ “core level” a less-than-full but
greater-than-minimal level cataloging
record
– ‘5’  “partial (preliminary) level” a
preliminary cataloging level record that is
not considered final
leader position 17, slide 3
• More allowed values for position 17 are
– ‘7’  “minimal level”  Record that meets the
U.S. National Level Bibliographic Record minimal
level cataloging specifications and is considered
final
– ‘8’  “prepublication level”  prepublication
level record. Includes records created in cataloging
in publication programs.
– ‘u’  “unknown”
– ‘z’  “not applicable” the concept of encoding
level does not apply to the record.
leader position 18
• This field say what cataloging rules were
applied in the descriptive part. Allowed values
are
– ‘ ’  the description does not follow the
International Standard Bibliographic Description
(ISBD) cataloging and punctuation provisions.
– ‘a’  the description uses the AACR2
– ‘c’  the description follows the ISBD, but the
end of subfield punctuation is omitted.
– ‘i’  the description follows the ISBD including
end of subfield punctuation
– ‘u’  unknown
leader position 19
• This has the multipart resource record level. This
pertains to the situation where the record
describes part of a resource or a resource that has
many parts.
– ‘ ’  not specified or not applicable
– ‘a’  the record describes a set of resources
– ‘b’  the record describes a resource which is part of
a set. The resource has a title that allows it to be
independent of the set.
– ‘c’  the record describes a resource that is part of a
set. The resource does not have a title that makes it
understandable separately.
conclusion about the rear part
• Normally, when we download a MARC record,
we expect the rear part to be ‘ a ’, with two
blanks surrounding an ‘a’.
• In the record that we create for class we make
it ‘7a ’. Let’s avoid complications with multipart resources. ;-)
Thank you for your attention.
http://openlib.org/home/krichel
MARC directory
• The MARC directory follows the leader.
• The directory contains fixed-length
information about the variable-length fields.
• Each directory entry has 12 bytes numbered 0
to 11.
• For each variable field appearing in the record,
there is a directory entry.
• The MARC directory contains no information.
appendix
• This is an appendix to the slides.
• It contains a description of the MARC
directory
• It contains a partial rationale for the fixed
boring bytes.
MARC directory bytes 0–2
• Bytes 00–02 give the field name.
• The field names used in MARC are numbers.
• So you will find three ASCII numeric characters
at that place.
the MARC directory bytes 03–06
• They give the field length.
• This is 4 (four) ASCII numeric characters.
• The fact that there are four of them is
determined by character 20 in the MARC
leader.
• The number is right justified and filled by 0s if
required.
• The field length of the variable field, including
the field names, the field indicators, subfield
codes, data, and the field terminator
character.
the MARC directory bytes 07–11
• Bytes 07–11 give the starting character
position. This five ASCII numeric characters
that specify the starting character position of
the variable field relative to the base address.
• The fact that there are five of them is
determined by character 21 in the MARC
leader.
• The number is right justified and filled by 0s if
required.
leader position 10
• This is the indicator count.
• It says how many bytes are used by indicators
to a field.
• The indicator count is always 2.
leader position 11
• This indicates the number of character
positions used for each subfield code in a
variable data field.
• This always ‘2’.
• There are two characters in each subfield. First
there is the delimiter ‘$’ and then there is a
lowercase or number code for the field.
leader position 20
• This is the length of the length-of-field
portion.
• Meaning how long is the length of a field.
• It is always “4”.
leader position 21
• Length of the starting-character-position
portion.
• At this position, we always find the value “5”.
• This says that in each entry in the MARC
directory (to be covered later) the starting
character position part is 5 numbers long.
leader position 22
• This gives the length of the implementation
dependent part of the record.
• This is always 0.
• This means that the record is done according
to the specification and there is no additional
part that is custom made for use on the
information system you are working with.
• If it were there, it would have to be short!
leader position 23
• This is the last position.
• The meaning of the data at this position is
undefined.
• I suppose it could be defined later when we
find a need for it.
• This position is always occupied by a 0.
Download