Metadata Issues Paper

Museums and Images JISC-FAIR Cluster Group
Metadata Issues Paper
Most metadata regimes seem to start from the assumption that all information objects
have the same characteristics and are therefore susceptible to the same treatment
for retrieval purposes. This may well be so where the objects in question are text
documents (the term “document” is sometimes used as a synonym of “information
object” hence the qualification) but once the remit is extended to other formats this
ceases to be the case. Different kinds of information objects have different
characteristics but rather than attempt to itemise these in their entirety three
examples are offered to illustrate the point. This may result in some broad
generalisations that should ideally be qualified but the purpose is to feed the
discussion rather than provide a definitive statement.
Text documents
Each document is unique: where two documents are substantially the same but
slightly different we have developed the concept of edition. A document represents a
statement of a particular point of view, that is to say it is the materialisation of a
thought process. This means its physical format is not considered of overwhelming
importance so that reading it on a computer screen is as acceptable as perusing a
paper copy. The metadata is more or less inherent in the object and the method of
referring to the object – the “bibliographical citation” – is standardised.
1. Creator
The author (or authors) is explicitly identified on the object. This is deemed to be so
important that where the object clearly emanates from an institution but does not
identify an author it is deemed to have a “corporate author”.
2. Title
The title is fundamental to identifying the work and the only practical way of referring
to it informally. Where journal articles are concerned it also provides an indication of
the content.
3. Publisher
Explicit on physical copies; important for citation/ location purposes. May be implied
in purely electronic publications (e.g. by the website credentials).
4. Date
The date is usually explicit on physical copies; important for citation/ location
purposes. Not always clear in electronic publications.
5. Subject
Subject is a statement of “aboutness”. May be explicit in the form of keywords
supplied by the author or added later by an indexer. Could theoretically be derived
from an automated analysis of the content. Tends to be expressed in the form of
abstract nouns often mirroring academic disciplines e.g. economics, physics. May
draw on published schema such as Dewey decimal classification or specialist
thesauri.
Museums and Images JISC-FAIR Cluster Group – Metadata Issues Paper
Author: Roy McKeown, Minor edits: Shaun Osborne
Document Date: 1 Jul 2004
Artworks
Each work is unique and is a creative statement by the artist. Only the original work is
truly acceptable though photographic or digital surrogates may be adequate for some
purposes. Any copy is deemed to be a new (if unoriginal) work. The majority of works
do not have inherent metadata.
6. Creator
May be explicitly identified through the use of a signature or monogram but a signature
may require expert interpretation. Most works are unsigned so attribution relies on expert
opinion, which, of course, is open to challenge.
7. Title
Reliability varies. Modern works have a title assigned by the artist but earlier works
‘acquired’ titles or nicknames or both e.g. the Wilton Diptych is properly “Richard II
presented to the Virgin and Child by his Patron Saint John the Baptist and Saints
Edward and Edmund”. Not usually embedded in the work.
8. Publisher
Not published (except for special subsets such as prints).
9. Date
Not usually embedded in the work. A date may be assigned as part of the expert
attribution process or known from the artist’s own documentation.
10. Subject
Has “aboutness” but because of the need to express this verbally when the original is
purely visual it is very open to interpretation. May be expressed as abstract nouns but
does not readily correlate with academic disciplines. No readily applicable subject
schema? [Iconclass offers a schema for categorising the iconographic elements of a
picture but this is the “of” level rather than the “about”]
Museum objects
Objects are generic and made for a purpose rather than as a creative statement.
Most objects are not intended to be unique though only a single example may
survive. Purpose not always clear: may be assigned as an expert opinion. More than
one purpose possible e.g. a knife may be used for hunting, fighting or cooking. The
majority of objects do not have inherent metadata.
11. Creator
Not usually identified but later objects may have factory marks or hallmarks. More
modern pieces – most notably ceramics – may have designers, modellers, decorators
etc. but they are not usually identified explicitly on the item.
12. Title
Not usually assigned though important pieces may acquire a nickname. Usually have
to make do with a generic reference – knife, cup, bowl.
13. Publisher
Not applicable. Might be stretched to include manufacturer for mass-produced items
but again this would only apply to objects from the recent past.
14. Date
Not usually explicit in the object though precious metals and ceramics from the last
400 years or so frequently have hallmarks and date letters. Date is usually derived by
expert attribution using stylistic and contextual evidence. Precision at any meaningful
level only becomes possible with more recent (i.e. better documented) objects so that
dating to a period (a range of years during which certain stylistic features were
evident) is often considered sufficient.
15. Subject
Objects do not have “aboutness”. They are produced for a purpose which may be
very specific and specialised – such as a loom for weaving – but more often are
generic – a knife for cutting, a cup for drinking, a bowl for eating. It would be possible
to codify such activities to form a classification scheme such as SHIC (Social History
and Industrial Classification) but it is open to question as to how this might relate to
schemes in use for other types of information object.
Conclusion
The conclusions to be drawn from this are a matter for debate and will be heavily
influenced by the scenario(s) in which the metadata are being applied.
In the abstract, the logic of Dublin Core is that all information objects have their
metadata available online so that in theory the totality of information can be
searched. In this scenario a search for text items can be reasonably efficient if it is
based on a combination of author and title as each of these elements is reasonably
specific and should return a manageable dataset. Applied to the metadata for
museum objects, on the other hand, the result is likely to be less satisfactory. As
outlined above, the author/ title concept is pretty much absent and date is likely to be
fairly vague. Museums do not routinely apply subject descriptors at object level as
each collection is generally felt to have broad subject affiliation but at the detailed
level to be capable of supporting a range of subjects. Each object is a piece of
evidence so that for example a cup provides evidence of drinking practices, of
design, of the technology used to manufacture the cup and of the use of materials in
that operation. For museum objects, then, subject metadata may well not be present
and where it is present is unlikely to be comprehensive since it is at best difficult to
predict all possible connections. It seems likely that in this scenario a cross-domain
search would mean the service provider mapping object names/ descriptors to titles
so that a search would retrieve all the objects with the sought descriptor/ name plus
all the text documents with that word in the title. Any secondary criteria added to
increase specificity seem likely to skew the selection. For example, adding a
modern date would exclude the museum objects while adding a historic date would
exclude modern publications. This would suggest that the feasibility of cross-domain
searching is at best open to question.
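The skew described above can be illustrated with a minimal sketch. It assumes a service provider that has mapped museum object names to the title element and holds a date range for every record. The record layout, the sample titles and dates, and the search function are all invented for illustration and do not describe any existing service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    domain: str     # "text" or "museum" (assumed labels)
    title: str      # for museum objects: the object name mapped to title
    year_from: int  # start of the date range
    year_to: int    # end of the date range (same year for publications)


# Invented sample records: two modern publications and three historic objects.
RECORDS = [
    Record("text", "The Cup in Domestic Life", 1998, 1998),
    Record("text", "Cup and Saucer Design", 2001, 2001),
    Record("museum", "cup", 1700, 1750),
    Record("museum", "cup", 1500, 1600),
    Record("museum", "knife", 1600, 1700),
]


def search(records, word, year_from=None, year_to=None):
    """Return records with `word` in the title, optionally within a date range."""
    word = word.lower()
    hits = [r for r in records if word in r.title.lower().split()]
    if year_from is not None and year_to is not None:
        # Keep a record only if its date range overlaps the requested range.
        hits = [r for r in hits if r.year_from <= year_to and r.year_to >= year_from]
    return hits


all_hits = search(RECORDS, "cup")
modern = search(RECORDS, "cup", 1950, 2010)
historic = search(RECORDS, "cup", 1500, 1800)
print(len(all_hits), {r.domain for r in modern}, {r.domain for r in historic})
```

With these invented records the unfiltered search mixes both domains, the modern date range leaves only text documents and the historic range leaves only museum objects.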
Moving to a scenario where only museums’ metadata are available does not
necessarily solve all the problems. Assuming that all or most museums have their
metadata online, it becomes feasible to search on object names/ descriptors.
However, considerable mapping is likely to be required since even museums with
comparable collections use differing vocabularies while different civilisations use
different timelines. Vanilla DC could probably cope with at least part of this by
equating title with object descriptor/ name and using the date field to indicate
period of origin. However, its ability to provide sufficient parameters to support
efficient searching is moot. A suitable schema would, of course, help but I contend
that even then it would be difficult to frame a search that returned a manageable set
of records. The paradox is that the more organisations that offer metadata the more
useful the service becomes as a resource discovery tool but the less efficient it
becomes as the number of hits generated by any one search increases.
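The mapping to vanilla DC suggested above might look like the following sketch. The input field names (object_name, period_start, period_end) are hypothetical, and writing the period as a "start/end" string is only one assumed way of expressing a range in the date element. Only the five elements discussed in this paper are handled.

```python
def museum_to_dc(obj):
    """Map a museum object record (hypothetical field names) to simple DC elements."""
    dc = {
        # The object name or descriptor stands in for the title.
        "title": obj["object_name"],
        # The period of origin goes into the date element as a start/end range.
        "date": f'{obj["period_start"]}/{obj["period_end"]}',
    }
    # Creator, publisher and subject are usually absent for museum objects,
    # so they are copied across only when the source record supplies them.
    for element in ("creator", "publisher", "subject"):
        value = obj.get(element)
        if value:
            dc[element] = value
    return dc


# Invented example record.
cup = {"object_name": "cup", "period_start": 1700, "period_end": 1750}
dc_cup = museum_to_dc(cup)
print(dc_cup)
```

For a typical object only title and date are populated, which is the shortage of search parameters referred to above.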
To address this problem the use of collection-level data should be considered.
The constitution of a collection is another debating point but by bundling similar
objects into one wrapper the provision of rich metadata becomes more practical and
each collection would equate to a text document thus reducing the overall number of
hits in a cross-domain search without impoverishing the result. The collection could
then be unpacked with accompanying illustrations at the home website thus reducing
traffic.
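As a rough illustration of the collection-level approach, the sketch below bundles item records into one wrapper per museum and object name and compares the number of hits at each level. The grouping key is an assumption, since what constitutes a collection is left open above, and the item records are invented.

```python
from collections import defaultdict

# Invented item-level records from two hypothetical museums.
ITEMS = [
    {"museum": "Museum A", "object_name": "cup", "material": "ceramic"},
    {"museum": "Museum A", "object_name": "cup", "material": "silver"},
    {"museum": "Museum A", "object_name": "cup", "material": "ceramic"},
    {"museum": "Museum B", "object_name": "cup", "material": "glass"},
    {"museum": "Museum B", "object_name": "knife", "material": "steel"},
]


def bundle(items):
    """Group items into collection-level records keyed by museum and object name."""
    groups = defaultdict(list)
    for item in items:
        groups[(item["museum"], item["object_name"])].append(item)
    return [
        {
            "museum": museum,
            "object_name": name,
            "item_count": len(members),
            # Richer description gathered once for the whole wrapper.
            "materials": sorted({m["material"] for m in members}),
        }
        for (museum, name), members in groups.items()
    ]


item_hits = [i for i in ITEMS if i["object_name"] == "cup"]
collection_hits = [c for c in bundle(ITEMS) if c["object_name"] == "cup"]
print(len(item_hits), len(collection_hits))
```

The search at collection level returns one record per museum rather than one per object, and the individual items would then be unpacked at the museum's own website.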
The overall conclusion to be drawn here is that while metadata for documents is
relatively specific for retrieval purposes, metadata for museum objects tends strongly
toward the generic. In practice this means that a search for objects is likely to return
an impractically large set of matches which is only likely to be made larger once
searching is extended across a number of collections. Searches can, of course, be
filtered for date or material which will reduce the final set of hits displayed to the user
but overall it does mean that there will be a substantial amount of information moving
around the system if metadata searching is offered at item level.