Museums and Images JISC-FAIR Cluster Group
Metadata Issues Paper

Most metadata regimes seem to start from the assumption that all information objects have the same characteristics and are therefore susceptible to the same treatment for retrieval purposes. This may well be so where the objects in question are text documents (the term “document” is sometimes used as a synonym of “information object”, hence the qualification), but once the remit is extended to other formats this ceases to be the case. Different kinds of information objects have different characteristics, but rather than attempt to itemise these in their entirety, three examples are offered to illustrate the point. This may result in some broad generalisations that should ideally be qualified, but the purpose is to feed the discussion rather than provide a definitive statement.

Text documents

Each document is unique: where two documents are substantially the same but slightly different we have developed the concept of edition. A document represents a statement of a particular point of view, that is to say it is the materialisation of a thought process. This means its physical format is not considered of overwhelming importance, so that reading it on a computer screen is as acceptable as perusing a paper copy. The metadata is more or less inherent in the object, and the method of referring to the object (the “bibliographical citation”) is standardised.

1. Creator
The author (or authors) is explicitly identified on the object. This is deemed to be so important that where the object clearly emanates from an institution but does not identify an author it is deemed to have a “corporate author”.

2. Title
The title is fundamental to identifying the work and the only practical way of referring to it informally. Where journal articles are concerned it also provides an indication of the content.

3. Publisher
Explicit on physical copies; important for citation/location purposes.
May be implied in purely electronic publications (e.g. by the website credentials).

4. Date
The date is usually explicit on physical copies; important for citation/location purposes. Not always clear in electronic publications.

5. Subject
Subject is a statement of “aboutness”. May be explicit in the form of keywords supplied by the author or added later by an indexer. Could theoretically be derived from an automated analysis of the content. Tends to be expressed in the form of abstract nouns, often mirroring academic disciplines, e.g. economics, physics. May draw on published schemas such as the Dewey Decimal Classification or specialist thesauri.

Museums and Images JISC-FAIR Cluster Group – Metadata Issues Paper Author: Roy McKeown, Minor edits: Shaun Osborne Document Date: 1Jul2004

Artworks

Each work is unique and is a creative statement by the artist. Only the original work is truly acceptable, though photographic or digital surrogates may be adequate for some purposes. Any copy is deemed to be a new (if unoriginal) work. The majority of works do not have inherent metadata.

6. Creator
May be explicitly identified through the use of a signature or monogram, but a signature may require expert interpretation. Most works are unsigned, so attribution relies on expert opinion, which, of course, is open to challenge.

7. Title
Reliability varies. Modern works have a title assigned by the artist, but earlier works ‘acquired’ titles or nicknames or both, e.g. the Wilton Diptych is properly “Richard II presented to the Virgin and Child by his Patron Saint John the Baptist and Saints Edward and Edmund”. Not usually embedded in the work.

8. Publisher
Not published (except for special subsets such as prints).

9. Date
Not usually embedded in the work. A date may be assigned as part of the expert attribution process or known from the artist’s own documentation.

10. Subject
Has “aboutness”, but because of the need to express this verbally when the original is purely visual it is very open to interpretation. May be expressed as abstract nouns but does not readily correlate with academic disciplines. No readily applicable subject schema? [Iconclass offers a schema for categorising the iconographic elements of a picture, but this is the “of” level rather than the “about”.]

Museum objects

Objects are generic and made for a purpose rather than as a creative statement. Most objects are not intended to be unique, though only a single example may survive. Purpose is not always clear and may be assigned as an expert opinion. More than one purpose is possible, e.g. a knife may be used for hunting, fighting or cooking. The majority of objects do not have inherent metadata.

11. Creator
Not usually identified, but later objects may have factory marks or hallmarks. More modern pieces, most notably ceramics, may have designers, modellers, decorators etc., but they are not usually identified explicitly on the item.

12. Title
Not usually assigned, though important pieces may acquire a nickname. Usually have to make do with a generic reference: knife, cup, bowl.

13. Publisher
Not applicable. Might be stretched to include manufacturer for mass-produced items, but again this would only apply to objects from the recent past.

14. Date
Not usually explicit in the object, though precious metals and ceramics from the last 400 years or so frequently have hallmarks and date letters. Date is usually derived by expert attribution using stylistic and contextual evidence. Precision at any meaningful level only becomes possible with more recent (i.e. better documented) objects, so that dating to a period (a range of years during which certain stylistic features were evident) is often considered sufficient.

15. Subject
Objects do not have “aboutness”. They are produced for a purpose which may be very specific and specialised (such as a loom for weaving) but more often is generic: a knife for cutting, a cup for drinking, a bowl for eating. It would be possible to codify such activities to form a classification scheme such as SHIC (Social History and Industrial Classification), but it is open to question how this might relate to schemes in use for other types of information object.

Conclusion

The conclusions to be drawn from this are a matter for debate and will be heavily influenced by the scenario(s) in which the metadata are being applied. In the abstract, the logic of Dublin Core is that all information objects have their metadata available online, so that in theory the totality of information can be searched. In this scenario a search for text items can be reasonably efficient if it is based on a combination of author and title, as each of these elements is reasonably specific and should return a manageable dataset.

Applied to the metadata for museum objects, on the other hand, the result is likely to be less satisfactory. As outlined above, the author/title concept is pretty much absent and date is likely to be fairly vague. Museums do not routinely apply subject descriptors at object level, as each collection is generally felt to have a broad subject affiliation but, at the detailed level, to be capable of supporting a range of subjects. Each object is a piece of evidence, so that for example a cup provides evidence of drinking practices, of design, of the technology used to manufacture the cup and of the use of materials in that operation. For museum objects, then, subject metadata may well not be present, and where it is present it is unlikely to be comprehensive, since it is at best difficult to predict all possible connections.
It seems likely that in this scenario a cross-domain search would mean the service provider mapping object names/descriptors to titles, so that a search would retrieve all the objects with the sought descriptor/name plus all the text documents with that word in the title. Any secondary criteria added to increase specificity seem likely to skew the selection. For example, adding a modern date would exclude the museum objects, while adding a historic date would exclude modern publications. This suggests that the feasibility of cross-domain searching is at best open to question.

Moving to a scenario where only museums’ metadata are available does not necessarily solve all the problems. Assuming that all or most museums have their metadata online, it becomes feasible to search on object names/descriptors. However, considerable mapping is likely to be required, since even museums with comparable collections use differing vocabularies, while different civilisations use different timelines. Vanilla DC could probably cope with at least part of this by equating title with object descriptor/name and using the date field to indicate period of origin. However, its ability to provide sufficient parameters to support efficient searching is moot. A suitable schema would, of course, help, but I contend that even then it would be difficult to frame a search that returned a manageable set of records. The paradox is that the more organisations that offer metadata, the more useful the service becomes as a resource discovery tool, but the less efficient it becomes as the number of hits generated by any one search increases.

To address this problem, the use of collection-level data should be considered.
The constitution of a collection is another debating point, but by bundling similar objects into one wrapper the provision of rich metadata becomes more practical, and each collection would equate to a text document, thus reducing the overall number of hits in a cross-domain search without impoverishing the result. The collection could then be unpacked, with accompanying illustrations, at the home website, thus reducing traffic.

The overall conclusion to be drawn here is that while metadata for documents is relatively specific for retrieval purposes, metadata for museum objects tends strongly toward the generic. In practice this means that a search for objects is likely to return an impractically large set of matches, which is only likely to be made larger once searching is extended across a number of collections. Searches can, of course, be filtered for date or material, which will reduce the final set of hits displayed to the user, but overall it does mean that there will be a substantial amount of information moving around the system if metadata searching is offered at item level.
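
By way of illustration only, the cross-domain scenario discussed in the conclusion can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any actual service: the Record structure is a hypothetical simplification of a Dublin Core-style record, the sample records are invented, and grouping museum objects by shared descriptor is just one possible reading of "collection" (the paper itself notes that the constitution of a collection is open to debate).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical, heavily simplified Dublin Core-style record."""
    domain: str             # "text" or "museum" (illustrative label, not a DC element)
    title: str              # document title, or object name/descriptor mapped to title
    creator: Optional[str]  # usually absent for museum objects
    date_from: int          # first year of the date range
    date_to: int            # last year of the range (same as date_from if precise)


# Invented sample data, for illustration only.
RECORDS = [
    Record("text", "The knife in European cookery", "A. Author", 1998, 1998),
    Record("text", "Knife making for beginners", "B. Writer", 2003, 2003),
    Record("museum", "knife", None, 1500, 1600),
    Record("museum", "knife", None, 1750, 1800),
    Record("museum", "cup", None, 1700, 1750),
]


def search(records, word, year_from=None, year_to=None):
    """Return records with `word` in the title, optionally overlapping a year range."""
    hits = [r for r in records if word.lower() in r.title.lower().split()]
    if year_from is not None and year_to is not None:
        hits = [r for r in hits if r.date_from <= year_to and r.date_to >= year_from]
    return hits


def count_by_domain(hits):
    """Count hits per domain so the skew caused by a date filter is visible."""
    counts = {"text": 0, "museum": 0}
    for r in hits:
        counts[r.domain] += 1
    return counts


def bundle(records):
    """Collapse museum records sharing a descriptor into one collection-level record.

    Grouping by descriptor is an assumption made purely for this sketch.
    """
    collections = {}
    others = []
    for r in records:
        if r.domain != "museum":
            others.append(r)
        elif r.title not in collections:
            collections[r.title] = Record("museum", r.title, None, r.date_from, r.date_to)
        else:
            c = collections[r.title]
            c.date_from = min(c.date_from, r.date_from)
            c.date_to = max(c.date_to, r.date_to)
    return others + list(collections.values())


print(count_by_domain(search(RECORDS, "knife")))              # {'text': 2, 'museum': 2}
print(count_by_domain(search(RECORDS, "knife", 1990, 2004)))  # {'text': 2, 'museum': 0}
print(count_by_domain(search(RECORDS, "knife", 1500, 1800)))  # {'text': 0, 'museum': 2}
print(count_by_domain(search(bundle(RECORDS), "knife")))      # {'text': 2, 'museum': 1}
```

With these invented records, the unfiltered search mixes both domains, a modern date range drops every museum object, and a historic range drops every publication. Bundling the two knife records into a single collection-level record reduces the museum hits without removing the descriptor from the result set.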