Subject Metadata

Subject Analysis
• SUBJECT ANALYSIS: The process of ascertaining the “aboutness” of a document by describing its topic, the discipline in which the topic is treated, and the form of the document.
• Discipline: An area or a branch of knowledge. The discipline is distinct from the thing being studied by the discipline. A broad field of inquiry; the context in which any subject is treated
• Subject (Phenomena): Broadly, the things studied by disciplines
• Form: What the document is rather than what it contains
– Intellectual: method by which the document has been compiled: history, biography, textbook, Festschrift
– Presentation: manner in which subject content has been organized. Statistical compilation
– Physical form: Structure of the document as an artefact. Book, video.

Definitions
• Subject analysis is the part of indexing or cataloging that deals with
– the conceptual analysis of an item: what is it about? what is its form/genre/format?
– translating that analysis into a particular subject heading system
• Subject heading: a term or phrase used in a subject heading list to represent a concept, event, or name

Types of concepts to identify
• Topics
• Names of:
– Persons
– Corporate bodies
– Geographic areas
• Time periods
• Titles of works
• Form of the item

Subjects vs. forms/genres
• Subject: what the item is about
• Form: what the item is, rather than what it is about
– Physical character (video, map, miniature book)
– Type of data it contains (statistics)
– Arrangement of information (diaries, indexes)
– Style, technique (drama, romances)
• Genre: works with common theme, setting, etc.
– Mystery fiction; Comedy films

What is a Controlled Vocabulary?
• From Wikipedia: A controlled vocabulary is a carefully selected list of words and phrases … The terms are chosen and organized by trained professionals (including librarians and information scientists) who possess expertise in the subject area.
Controlled vocabulary terms can accurately describe what a given document is actually about, even if the terms themselves do not occur within the document’s text.

Controlled Vocabularies: Subject Heading lists vs. Thesauri
• Thesauri
– Created largely in indexing communities
– Made up of single terms and bound terms representing single concepts (usually called descriptors). Bound terms occur when some concepts can only be represented by two or more words (e.g. Type A Personality)
• Subject heading lists
– Created largely in library communities
– Consist of phrases and other precoordinated terms in addition to single terms

Controlled Vocabularies: Subject Heading lists vs. Thesauri
• Thesauri
– More strictly hierarchical. Because they are made up of single terms, each term usually has only one broader term
– Narrow in scope. Usually made up of terms from one specific subject area
– More likely to be multilingual. Because single terms used, easier to maintain in multiple languages
• Subject heading lists
– Not strictly hierarchical. Some headings may have no broader and/or narrower terms
– More general in scope, covering a broad subject area, or the entire scope of knowledge
– Usually not multilingual

Translating key words & concepts into controlled vocabulary
• Controlled vocabulary
– Thesauri (examples)
  • Art & Architecture Thesaurus (AAT)
  • Thesaurus for Graphic Materials I: Subject Terms (TGMI)
  • Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms (TGMII)
  • Thesaurus of Geographic Names (TGN)
– Subject heading lists (examples)
  • Library of Congress Subject Headings (LCSH)
  • Sears List of Subject Headings
  • Medical Subject Headings (MeSH)

Keywords vs.
Controlled Terms
• System should allow for both
• Keywords give access using “nonstandard” terms
• Keywords include terms not yet in vocabularies; places or names not indexed

Drawbacks to Controlled Vocabulary
• Time to assign = $$
• Need for trained catalogers = $$
• Time lag to add relevant terms
• Time lag to delete outdated terms
– … so use both keywords and controlled terms

Why use controlled vocabulary?
Controlled vocabularies:
• identify a preferred way of expressing a concept
• allow for multiple entry points (i.e., cross-references) leading to the preferred term
• identify a term’s relationship to broader, narrower, and related terms

Function of keywords
Advantages:
• provide access to the words used in bibliographic records
Disadvantages:
• cannot compensate for complexities of language and expression
• cannot compensate for context
Keyword searching is enhanced by assignment of controlled vocabulary!

Vocabulary Control
• Vocabulary control is used to improve the effectiveness of information storage and retrieval systems, Web navigation systems, and other environments that seek to both identify and locate desired content via some sort of description using language. The primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval.

Need for Vocabulary Control
• The need for vocabulary control arises from two basic features of natural language
• Two or more words or terms can be used to represent a single concept
– Example:
  • salinity/saltiness
  • VHF/Very High Frequency
• Two or more words that have the same spelling can represent different concepts
– Example:
  • Mercury (planet)
  • Mercury (metal)
  • Mercury (automobile)
  • Mercury (mythical being)

Principles of Controlled Vocabularies
• There are four important principles of vocabulary control that guide their design and development.
• eliminating ambiguity
• controlling synonyms
• establishing relationships among terms where appropriate
• testing and validation of terms

Ambiguity
• Ambiguity occurs in natural language when a word or phrase (a homograph or polyseme) has more than one meaning
• A controlled vocabulary must compensate for the problems caused by ambiguity by ensuring that each term has one and only one meaning

Synonymy
• A different problem occurs when a concept can be represented by two or more synonymous or nearly synonymous words or phrases. This is called synonymy. This means that desired content may be scattered around an information space or database because it can be described by different but equivalent terminology
• A controlled vocabulary must compensate for the problems caused by synonymy by ensuring that each concept is represented by a single preferred term. The vocabulary should list the other synonyms and variants as non-preferred terms with USE references to the preferred term.

Type of vocabulary control

Controlled Lists
A list is a simple group of terms
Example:
Alabama
Alaska
Arkansas
California
Colorado
....
Frequently used in Web site pick lists and pull down menus

What are these?
• Flying Horse
• King Fisher
• Royal Challenge
-- The meaning is not clear.
-- Need to eliminate ambiguity

What are these?
• Flying Horse
• King Fisher
• Royal Challenge
• Heineken
• Budweiser
• Miller-lite
• Bud-light

Drinks
• Flying Horse
• King Fisher
• Royal Challenge
• Taj Mahal
• Hayward’s 2000
• Heineken
• Corona
• Budweiser
• Miller-lite
• Bud-light

Synonym Rings
A synonym ring is a list of synonyms or near synonyms that are used interchangeably for retrieval purposes

Synonym Rings -- Examples
Synonym rings are usually found as sets of lists that allow users to access all content containing any of the terms. e.g., cholesterol:
Cholesterol
Blood Cholesterol
Serum Cholesterol
Good Cholesterol
Bad Cholesterol
LDL
. . .
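The cholesterol ring above can be sketched in Python as a set of interchangeable terms. This is only an illustrative sketch: the names `SYNONYM_RINGS`, `expand_query`, `search`, and the sample documents in `DOCS` are assumptions made for the example, not part of any particular retrieval system, and the matching is plain case-insensitive substring search.

```python
# Hypothetical synonym ring taken from the cholesterol example in the slides.
SYNONYM_RINGS = [
    {"cholesterol", "blood cholesterol", "serum cholesterol",
     "good cholesterol", "bad cholesterol", "ldl"},
]


def expand_query(term, rings=SYNONYM_RINGS):
    """Return every term in the ring that contains `term`, else just the term."""
    key = term.strip().lower()
    for ring in rings:
        if key in ring:
            return set(ring)
    return {key}


def search(term, documents, rings=SYNONYM_RINGS):
    """Return the documents whose text contains any term of the expanded query."""
    terms = expand_query(term, rings)
    return [doc for doc in documents
            if any(t in doc.lower() for t in terms)]


# Made-up sample documents, left in unstructured natural language.
DOCS = [
    "Diet and serum cholesterol in adults",
    "LDL levels after exercise",
    "Rice straw as animal feed",
]

matches = search("Bad Cholesterol", DOCS)
```

Here a query on any member of the ring retrieves documents containing any other member, which is the minimal form of control a synonym ring is meant to provide; nothing in the documents themselves is indexed or changed.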
Synonym rings are used …
• Synonym rings are used to expand queries for content objects.
– If a user enters any one of these terms as a query to the system, all items are retrieved that contain any of the terms in the cluster.
An example from International SEMATECH; a search for Silicon would look like this:

Synonym rings are used …
• Synonym rings are often used in systems where the underlying content objects are left in their unstructured natural language format,
– the control is achieved through the interface by drawing together similar terms into these clusters.
• Synonym rings are used in conjunction with search engines and provide a minimal amount of control of the diversity of the language found in the texts of the underlying documents.
Search: Tilenol, Result: Tylenol

Synonym rings can be used for assigning keywords in metadata fields
IBM Homepage source code:
<meta name="Keywords" content="ibm, international business machines, internet, ebusiness, ebusiness, e-business on demand, ebusiness on demand, on demand, ibm on demand, on demand business, on demand enterprise, on demand services, ondemand, ondemand, personal computer, personal system, e-commerce, ecommerce, pc, workstation, mainframe, unix, linux, technical support, homepage, home page"/>

Where to find synonyms
Search logs
Dictionaries
Existing authority files
– LC Name Authority File (NAF)
– The Union List of Artist Names (ULAN)
– The Getty Thesaurus of Geographic Names (TGN)
Lexical databases, e.g., WordNet http://www.cogsci.princeton.edu/~wn/

Taxonomies
A taxonomy is a set of preferred terms, all connected by a hierarchy or polyhierarchy
Example:
Chemistry
  Organic chemistry
    Polymer chemistry
      Nylon
Frequently used in web navigation systems
United Nations Standard Products and Services Classification

Thesauri
A thesaurus is a controlled vocabulary with multiple types of relationships
Example:
Rice
UF Paddy
BT Cereals
BT Plant products
NT Brown rice
RT Rice straw

Thesauri
Relationship types:
• Use/Used
For – indicates preferred term
• Hierarchy – indicates broader and narrower terms
• Associative – almost unlimited types of relationships may be used
The thesaurus is the most complex format for controlled vocabularies and is widely used.
National Monuments Record Thesauri-Archaeological Objects Thesaurus

Use of Controlled Vocabularies in Information Storage and Retrieval Systems

Dublin Core
Content data for some elements may be selected from a controlled vocabulary, as indicated by best practice guidelines
Content: Coverage, Description, Type, Relation, Source, Subject, Title
Intellectual Property: Contributor, Creator, Publisher, Rights
Instantiation: Date, Format, Identifier, Language

Example from LOM (Learning Object metadata)
5.2 Learning Resource Type
Explanation: Specific kind of learning object. The most dominant kind shall be first.
NOTE: --The vocabulary terms are defined as in the OED:1989 and as used by educational communities of practice.
• Controlled terms
Value Space (ordered): exercise, simulation, questionnaire, diagram, figure, graph, index, slide, table, narrative text, exam, experiment, problem statement, self assessment, lecture

Build in a pick-list for creating metadata records
Build in a thesaurus for automatic assignment of subject terms
Build in a thesaurus to assist searching
Build in an illustrated thesaurus to assist searching

Advantages and Disadvantages of Particular Structures
• Lists:
– Simple to implement, use, and maintain
– Provide little or no guidance for the user
• Synonym Rings:
– Are constructed manually and are not used in indexing
– Can be useful in retrieval as they allow synonyms and near-synonyms to be treated equally in searching.

Advantages and Disadvantages of Particular Structures
• Taxonomies
– Good information about hierarchical relationships among terms
– Useful for both indexers and searchers who need to discover the most appropriate, specific terms for their purposes
– There is no entry vocabulary, (i.e.
USE/USED FOR terms)
– Taxonomies do not indicate other types of relationships among terms
• Thesauri
– Good information about hierarchical relationships among terms
– Good information about relationships among terms
– Entry vocabulary to help users locate the correct terms
– Thesauri are useful for both indexers and searchers who need to discover the most appropriate, specific terms for their purposes
– Thesauri are time-consuming and labor intensive to develop and maintain

Typical applications of Lists, Synonym Rings, Taxonomies, and Thesauri
• Lists
– Lists are frequently used to display small sets of terms that are to be used for quite narrowly defined purposes such as a web pull-down list or list of menu choices.
• Synonym Rings
– Synonym rings are frequently used behind-the-scenes to enhance retrieval, especially in an environment in which the indexing uses an uncontrolled vocabulary and/or there is no indexing as when searching full text.
• Taxonomies
– Taxonomies are often created and used in indexing applications and for web navigation. Because of their (usually simple) hierarchical structure, they are effective at leading users to the most specific terms available in a particular domain.
• Thesauri
– Thesauri are the most typical form of controlled vocabulary developed for use in indexing and searching applications because they provide the richest structure and cross-reference environment. Thesauri can be narrow in scope and cover a limited domain or they can be broad in scope and widely applicable to many different types of content.
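To make the difference between a taxonomy and a thesaurus concrete, the Rice entry used earlier (UF Paddy, BT Cereals, BT Plant products, NT Brown rice, RT Rice straw) can be sketched as a small Python record with an entry vocabulary. The class name `Term`, the dictionary `THESAURUS`, and the helper functions are hypothetical names chosen for this sketch; real thesaurus software will differ.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One thesaurus record, keyed by its preferred term."""
    preferred: str
    used_for: list = field(default_factory=list)   # UF: non-preferred entry terms
    broader: list = field(default_factory=list)    # BT: broader terms
    narrower: list = field(default_factory=list)   # NT: narrower terms
    related: list = field(default_factory=list)    # RT: associative relationships


# The Rice example from the slides, expressed as a record.
THESAURUS = {
    "rice": Term(
        preferred="Rice",
        used_for=["Paddy"],
        broader=["Cereals", "Plant products"],
        narrower=["Brown rice"],
        related=["Rice straw"],
    ),
}


def build_entry_vocabulary(thesaurus):
    """Map every preferred and non-preferred term (lowercased) to its record."""
    index = {}
    for record in thesaurus.values():
        index[record.preferred.lower()] = record
        for variant in record.used_for:
            index[variant.lower()] = record
    return index


ENTRY = build_entry_vocabulary(THESAURUS)


def resolve(term):
    """Follow a USE reference: return the preferred term, or None if unknown."""
    record = ENTRY.get(term.strip().lower())
    return record.preferred if record else None
```

Under these assumptions, `resolve("Paddy")` leads the searcher to the preferred term Rice, which is the entry-vocabulary function that a plain taxonomy lacks; the `broader`, `narrower`, and `related` lists carry the hierarchical and associative relationships.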
Subject Analysis
• Subject analysis is the abstracting and indexing of an item’s conceptual content
• A two step process:
– ascertaining the subject
– translating the subject into controlled vocabulary
• Important considerations include: cataloger objectivity, cataloger’s background knowledge, and consistency in determining the content

Subject Analysis
• Finding (find a work of which subject is known)
• Collocating (find what repository has on subject)
• Evaluating (assist in making informed decision)
• Navigating (provide users with links to related terms)

Subject Analysis
• What is it about? (aboutness or subject)
• What is it for? (relevance or use)
• These can be the same question in some instances, but often the subject of a work can be quite separate from the use to which the searcher may put it or the reasons why the searcher considers it relevant.

There are a number of methods for determining the aboutness of an item
• The Purposive Method tries to determine the author's purpose in creating the work.
• The Figure-Ground Method tries to determine what is most central to the work (highly subjective).
• The Objective Method counts references to topics and presumes that commonly used topic words are central (this is one of the methods used by computers).
• The Appealing to Unity Method tries to determine what holds the work together.

This photograph is from the Library of Congress, and it was taken by Marion Post Wolcott in March 1940

What is this scene about?
• photo of a town covered in snow at night from 1940
• is this about winter?
• small town America?
• the introduction of electric lights?
• the depression?
The answer is it is about all of those things, and probably more. But it is a photo of a small town in the U.S. in the snow, it is a main street, we see automobiles and houses but also commercial buildings, footsteps in the snow, electric lights, and so on.
There is a fundamental difference between what an artifact is (a book or a photograph), what it is of, and what it is about. But all of those things usually get lumped together in subject headings and classifications.

Summarization for Subject Analysis
• Summarization is the process of deciding what an item is about and translating this into index terms from a subject language.
• This process should examine three distinct areas: the discipline in which the item was produced, the specific subjects or topics treated and the form of the item.

Summarization
• "Summarization" is the word used for a string of terms that describe the aboutness of an artifact.
• Discipline | Topic {Facet} | Form
• The photograph could be described as:
• Sociology | Depression; Winter; American town | Photograph
OR
• History | Winter; Small Town America | Photograph

Subject Access Points
• Serve to identify the subject of particular archival collections, series, subseries, or items, and facilitate direct topical retrieval of these materials
• Subject headings allow the user to see the entire scope of a repository’s holdings on a given topic by causing these bibliographic records to collocate, or appear side-by-side, under a subject heading in the catalog
• When LCSH are used, the archival materials will collocate with published material on the same topic

Topical Subjects
• The topical subject matter to which the records pertain is among the most important aspects of the archival materials.
Terms suggesting topics that might be employed as access points may be found in the following areas of the descriptive record:
– Title Element (2.3)
– Scope and Content Element (3.1)
– Administrative/Biographical History Element (2.7, Chapter 10)

Documentary Forms
• Terms that indicate the documentary form(s) or intellectual characteristics of the records being described (e.g., minutes, diaries, reports, watercolors, documentaries) provide the user with an indication of the content of the materials based on an understanding of the common properties of particular document types. For example, one can deduce the contents of ledgers because they are a standard form of accounting record, one that typically contains certain types of data. Documentary forms are most often noted in the following areas of the descriptive record:
– Title Element (2.3)
– Extent Element (2.5)
– Scope and Content Element (3.1)

Occupations
• The occupations, avocations, or other life interests of individuals that are documented in a body of archival material may be of significance to users. Such information is most often mentioned in the following areas of the descriptive record:
– Scope and Content Element (3.1)
– Administrative/Biographical History Element (2.7, Chapter 10)

Functions and Activities
• Terms indicating the function(s), activity(ies), transaction(s), and process(es) that generated the material being described help to define the context in which records were created. Examples of such concepts might be the regulation of hunting and fishing or the conservation of natural resources. Functions and activities are often noted in these areas of the descriptive record:
– Title Element (2.3)
– Scope and Content Element (3.1)
– Administrative/Biographical History Element (2.7, Chapter 10)

Subject Analysis for Archival Materials: Questions
• Concept of aboutness:
– How is it determined for archival materials?
– Is it of any use to information seekers?
– Should other concepts (occupation, form, genre) take precedence over topicality?
• Means of providing subject access
– Should it be LCSH or other thesauri (or a combination)

Depth of Subject Analysis
• Summary Level
– Most library materials analyzed at this level. The analysis of the collection will proceed as though it were a single entity. Reduce analysis to a single phrase that identifies its main topical theme.
– Rarely appropriate for archival materials. Depth of complexity of materials will be lost in gross generalizations

Depth of Subject Analysis
• Depth Level
– Although rarely used in library cataloging, usually will provide a more meaningful approach to archival collections
– Break collection into appropriate components and summarize each component individually
• Exhaustive Level
– Analyze every component of a collection. This is very expensive and time-consuming, so will be utilized only in special cases

Archival Management and the Depth Level
• Consider amount of processing that is being conducted on a collection at the point that description occurs
– Summary level may be appropriate for a recently acquired collection that is not yet processed and has a preliminary record
– When processing is underway and the collection has been arranged into series and subseries, the depth level might be a better choice
– The exhaustive level is probably only appropriate occasionally when some segment of records is heavily used or considered to be of central importance to the repository’s users

Subject Analysis for Archival Materials
• Discipline
• Topic
– Provenance
  • Creator, Function, Activity
– Cultural orientation
  • Chronological
  • Geographic
• Form
– Intellectual, e.g. historical sources
– Physical, e.g. diaries or correspondence
– Presentation, e.g.
statistics

Library of Congress Subject Headings (LCSH)
• Originally designed as a controlled vocabulary for representing the subject and form of books and serials in the LC collection
• Literary warrant: LC collection
• originally for use in LC catalogs
• now global standard for (i) library catalogs, (ii) bibliographic databases
• Approximately 259,000 headings
• c.10,000 new headings added each year
• Approximately 36% of headings are followed by LC Class numbers

LCSH Principles
• User and usage based
• Literary warrant
• Uniform headings
– Synonymous terms
– Spelling variants
– English vs. foreign language terms
– Scientific/technical vs. popular terms
– Currentness
• Unique headings
• Specific entry and co-extensivity
• Internal consistency
• Stability
• Precoordination: indexing terms are chosen and coordinated (“put together as a string”) at the time of cataloging

LCSH Headings can be:
• Personal names
– Individuals
– Families, dynasties, etc
– Mythological, legendary or fictitious characters
• Corporate bodies
• Historical events
• Names of animals
• Other proper names
• Languages
• Ideas, events
• Prizes, awards
• Holidays, days of the week, etc.
• Ethnic groups, tribes, nationalities, etc.
• Religious, philosophical systems
• Geographic names
– Jurisdictional headings
– Geographic features
• You name it – it can be a subject heading

LCSH Conventions for Relationships
• UF: used for: specific see reference
• BT: broader term: specific see also reference
• NT: narrower term: specific see also reference
• SA: see also: general see also reference
• RT: related term: specific see also reference

Syndetic structure: references
• Equivalence relationships
• Hierarchical relationships
• Associative relationships

Equivalence or USE/UF references
• Link terms that are not authorized to their preferred form
• Example: Baby sitting USE Babysitting

Categories of USE/UF references
• Synonyms and near synonyms
– Dining establishments USE Restaurants
• Variant spellings
– Haematology USE Hematology
• Singular/plural variants
– Salsa (Cookery) USE Salsas (Cookery)

Categories of USE/UF references
• Variant forms of expression
– Nonbank banks USE Nonbank financial institutions
• Alternate arrangement of terms
– Dogs—Breeds USE Dog breeds
• Earlier forms of headings
– Restaurants, lunch rooms, etc.
USE Restaurants

Hierarchical references: broader terms and narrower terms
• Link authorized headings
• Show reciprocal relationships
• Allow users to enter at any level and be led to the next level of either more specific or more general topics

Three types of hierarchical references
• Genus/species (or class/class member)
Dog breeds NT Shih tzus
Shih tzus BT Dog breeds
• Whole/part
Foot NT Toes
Toes BT Foot
• Instance (or generic topic/proper-named example)
Mississippi River BT Rivers—United States
Rivers—United States NT Mississippi River

Associative or related term references
• Link two headings associated in some manner other than hierarchy
• Currently made between
– Headings with overlapping meanings
  • Carpets RT Rugs
– Headings for a discipline and the focus of that discipline
  • Ornithology RT Birds
– Headings for persons and their field of endeavor
  • Physicians RT Medicine

Entry in LCSH
Automobiles (May Subd Geog)
[TL1-296.5]
UF Autos (Automobiles)
   Cars (Automobiles)
   Gasoline automobiles
   Motorcars (Automobiles)
BT Motor vehicles
   Transportation, Automotive
SA headings beginning with the word Automobile
NT A.C. Automobile
   Abarth automobiles
   Alfa Romeo automobile
   Etc.

Entry in LCSH
Librarians (May Subd Geog)
[Z682 (Personnel)]
[Z720 (Biography)]
BT Information scientists
   Library employees
RT Libraries
NT Academic librarians
   Acquisitions librarians
   Adult services librarians
   Bisexual librarians
   Etc.
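Because hierarchical references are reciprocal (every BT should be matched by an NT in the other record), a vocabulary file can be checked mechanically. The sketch below uses the three hierarchical examples above; the dictionaries `BT` and `NT` and the function `missing_reciprocals` are hypothetical names for illustration, and "--" stands in for the subdivision dash.

```python
# Broader-term (BT) references, keyed by the narrower heading.
BT = {
    "Shih tzus": {"Dog breeds"},
    "Toes": {"Foot"},
    "Mississippi River": {"Rivers--United States"},
}

# Narrower-term (NT) references, keyed by the broader heading.
NT = {
    "Dog breeds": {"Shih tzus"},
    "Foot": {"Toes"},
    "Rivers--United States": {"Mississippi River"},
}


def missing_reciprocals(bt, nt):
    """List (heading, reference, kind) for every link lacking its reciprocal.

    kind is "NT" when the broader heading is missing its NT reference,
    and "BT" when the narrower heading is missing its BT reference.
    """
    problems = []
    for child, parents in bt.items():
        for parent in parents:
            if child not in nt.get(parent, set()):
                problems.append((parent, child, "NT"))
    for parent, children in nt.items():
        for child in children:
            if parent not in bt.get(child, set()):
                problems.append((child, parent, "BT"))
    return problems


# Simulate a record where "Foot" has lost its NT reference to "Toes".
broken_nt = {heading: refs for heading, refs in NT.items() if heading != "Foot"}
report = missing_reciprocals(BT, broken_nt)
```

With the complete data the function returns an empty list; with the simulated omission it reports that Foot needs an NT reference to Toes.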
Limitations of subject access for primary sources
• Standard terminology can be too generic or heterogeneous
• Terms change over time (e.g., place names, archaic terms)
• Large number of terms needing to be assigned
• Lack of overlap in terms being assigned by different describers

Alternatives to subject access
• Provenance
• Function
• Genre or form-of-material
• Geographic coordinates
• Date

Headings and Subdivisions Useful for Archival Purposes: Correspondence
• Use for personal correspondence of individuals
• Assign the following combination of headings to collections of personal correspondence:
– 600 X0 $a [name of the letter writer(s)] $v Correspondence.
– 600 X0 $a [name of the addressee(s)] $v Correspondence.
– 650 #0 $a [class of persons, or ethnic group] $v Correspondence.
– 650 #0 $a [special topics discussed in the letters]

Example
• Title: Letters from John Smith, metallurgist, to his student, John Doe, concerning his research into zinc alloys.
– 600 10 $a Smith, John $v Correspondence.
– 600 10 $a Doe, John $v Correspondence.
– 650 #0 $a Metallurgists $z Maryland $v Correspondence.
– 650 #0 $a Zinc alloys.

Example
• Title: The exchange of correspondence between Irish American author, Mary O'Brien and her publisher, Sam Brown, during her stay in France in 1925-30.
– 600 10 $a O'Brien, Mary $v Correspondence.
– 600 10 $a Brown, Sam $v Correspondence.
– 650 #0 $a Authors, American $y 20th century $v Correspondence.
– 650 #0 $a Publishers and publishing $z New York (State) $v Correspondence.
– 651 #0 $a France $x Description and travel.
– 651 #0 $a France $x Civilization $y 1901-1945.

Headings and Subdivisions Useful for Archival Purposes: History--Sources
• Assign the free-floating subdivision History–Sources under historical headings for collections or discussions of historical source materials.
• The subdivision –Sources is used directly after headings and subdivisions that denote history or a historical event, or have an obvious historical connotation.
• The subdivision –History–Sources, or –History– [period subdivision]–Sources is used after other headings to denote historical source materials.
• Since the correspondence or diaries of an individual person may or may not be regarded as historical source material, depending on the viewpoint of the reader, do not add the subdivision –Sources or –History–Sources to the headings assigned to works of this type

Headings and Subdivisions Useful for Archival Purposes: Archives
• Archives are collections of documents or records relating to the activities, business dealings, etc., of a person, family, corporation, association, community, or nation.
• Use the free-floating subdivision –Archives as a form or topical subdivision under types of corporate bodies and educational institutions, classes of persons, and ethnic groups, and under names of individual corporate bodies, educational institutions, persons, and families, for collections or discussions of documentary material, such as manuscripts, household records, diaries, correspondence, photographs, memorabilia, etc., pertaining to these persons or institutions.
• Code –Archives as a $v subfield if the work consists of collections of documentary material. Code it as an $x subfield if the work discusses the documentary material.

Examples
• Title: The personal archives of President Calvin Coolidge.
– 600 10 $a Coolidge, Calvin, $d 1872-1933 $v Archives.
– 651 #0 $a United States $x Politics and government $y 1923-1929 $v Sources.
• Title: Documents of the State Department relating to the history of Greece from 1950 to 1954.
– 651 #0 $a Greece $x History $y 1950-1967 $v Sources.
– 610 10 $a United States. $b Dept. of State $v Archives.
• Title: Papers of the Society of American Indians [microform]
– 650 #0 $a Indians of North America $x History $v Sources.
– 610 20 $a Society of American Indians $v Archives.

Headings and Subdivisions Useful for Archival Purposes: Archives
• Use the free-floating subdivision –Archives under names of corporate bodies, including individual educational institutions, provided that the corporate body or educational institution is an authoring party in the preparation of the archive, not merely the institution that houses the archive.
• If the collection is a formally organized archive for which a name heading can be established, use that heading, as appropriate, instead of the subdivision –Archives under the name of the corporate body.

Headings and Subdivisions Useful for Archival Purposes: Manuscripts
• Because of the unique characteristics of manuscripts and works about them, it is necessary for subject catalogers to assign a complex of subject headings in order to bring out various aspects, each of which represents a possible means of retrieval.
• Included among these various aspects are the following: the topical information presented in the manuscript; the category of works to which the manuscript belongs, such as missals; the illuminations present; the name of the collection to which the manuscript belongs; etc.
• See SCM H1855 for details
• See SCM H1845 for specific instructions for genealogy and local history collections

6XX (Subject Headings)
• 600 (Personal Name Subject Heading), 610 (Corporate Name Subject Headings), 611 (Meeting Name Subject Heading)
• Use for subject access to the main entry.
• As a rule, put the name in the 100 field into the 600 field, because often the archival and manuscript material (such as letters or personal papers) is as much about the person as it is authored by the person. There are, however, exceptions to this, as when someone has written a book about someone or something else, and it is not logical to put the author into a 600 field but into the 700 field.
• Choosing names: Include persons who are significant personal or professional subjects of the correspondence or of other material significant to the collection. Also include a person when the collection holds a large volume of letters sent to that person, even if it holds none received from them.

6XX (Subject Headings)
• Although there is no limit to the number of names one can include using this field, limit your choices to only the most significant names reflected by a collection.
• Use authorized forms of names.
• 600 1st indicator
– 0 Forename 100 0 $a Liberace
– 1 Surname 100 1 $a Chiang, Kai-shek
– 3 Family name 100 3 $a Dunlop family

6XX (Subject Headings)
• 655 (Genre/Form Heading)
• Terms indicating the genre, form, and/or physical characteristics of the materials being described.
• 2nd indicator
– 0 LCSH
– 2 MeSH
– 7 Source specified in $2
• $2 Examples of thesauri used:
– aat (Art and Architecture Thesaurus)
– gmgpc (Thesaurus for Graphic Materials: TGM II, Genre and Physical Characteristic Terms)
– rbgenr (Genre Terms Created by the Bibliographic Standards Committee of RBMS)
• 655 #7 $a Diaries $2 aat

6XX (Subject Headings)
• 656 (Index Term/Occupation)
• Contains terms giving occupations and avocations reflected in the contents of the described materials. It is NOT used to list the occupations of the creator, unless they are significantly reflected in the materials themselves.
• Major sources for occupational terms and $2 codes are:
– aat Art and Architecture Thesaurus
– lcsh Library of Congress Subject Headings
– dot Dictionary of Occupational Titles (U.S. Dept. of Labor)
• 656 #7 $a Politicians. $2 lcsh

6XX (Subject Headings)
• 657 (Index Term/Function)
• An index term describing the activity or function that generated the described materials (e.g., property assessment or voter registration).
• 657 #7 Annual inventory ‡x Ladies' apparel. ‡2 [thesaurus code]

Getty Vocabularies
• Structure & content are based upon standards (e.g., ISO, CDWA)
• Are compiled resources (not comprehensive)
• Growth through collaboration, inside Getty & outside

Getty Vocabularies
• Art & Architecture Thesaurus (AAT)
• Union List of Artist Names (ULAN)
• Getty Thesaurus of Geographic Names (TGN)

Types of terms in vocabularies
• personal names: Painter of the Wedding Procession (attributed to); Nikodemos (signed, as potter)
• geographic names: Athens
• object names: storage vessels, Panathenaic amphorae
• corporate names: J. Paul Getty Museum
• iconographic subjects and themes: Nike Crowning the Victor, with Judge on right and defeated opponent on left
• genre terms: Antiquities, ceremonies
• multilingual terms: Athínaí (Greek) = Athens (English) = Athenae (Latin)

Types of terms in vocabularies
• personal names: in the Union List of Artist Names you will find "Georgia O’Keeffe"
• geographic place names: in the Getty Thesaurus of Geographic Names you will find "Botswana"
• corporate names: in the Library of Congress Name Authority File you will find "Metropolitan Museum of Art (New York, N.Y.)"
• object names: in the Art & Architecture Thesaurus you will find "scroll paintings"
• iconographic subjects and themes: in ICONCLASS you will find the "education of Cupid by Venus and Mercury"
• genre terms: in the Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms you will find "political cartoons"
• multi-lingual terms: in the Multilingual Egyptological Thesaurus you will find the term "pottery" in English, "keramik" in German, and "céramique" in French.

Getty Vocabularies
• data value standards that provide terminology for use in cataloging, indexing and documentation practice. They are most effective when used in combination with data structure standards (e.g., CDWA) and data content standards (e.g., AACR2).
• thesauri built according to standards.
They follow the rules and conventions prescribed by standards organizations such as ISO, NISO, and other codes of practice for thesaurus construction. • designed for use in both indexing and retrieval. They are intended to bridge the language of the indexer and that of the searcher. If the vocabularies are available at the time of the search query, the searcher can consult the vocabulary to see what likely terms are available for the query. Getty Vocabularies • facilitators for information-sharing among different types of collections. For example, the AAT can be used to describe subject matter for books in a library, works of art in a museum, records in an archive, or images on the Web. • application independent. The Vocabularies can be applied in the electronic environment in a variety of applications (e.g., databases and search engines) as well as in manual indexing systems, such as a card file. • evolving and growing tools. Work with contributors allows for ongoing community input and expansion of coverage in specialized subject areas. AAT • The focus of the AAT is on art and architecture, as the title suggests. • However, the AAT can provide terminology for the description, documentation, and retrieval of visual and textual surrogates for art, and for related disciplines. • The scope of the AAT is global, although currently it is richest in terminology used for art of Western Europe and North America. • The AAT is growing and expanding coverage by incorporating additional data from a variety of Getty projects and external contributors. For example, a working group from the National Museum of African Art has added terminology for African styles/periods and object names.
AAT • The AAT includes terminology related to: – works of art (e.g., painting, sculpture, mixed media) – architecture (e.g., the built and natural environment) – material culture (e.g., furniture, costume, and equipment) – forms and genre (e.g., document types, records) – cultural traditions (e.g., events) High-Backed Chair for Miss Cranston's tea rooms AAT terms in Italics • What is it? high-backed chair • What is it made of? oak, horsehair • How was it made? upholstered, stained, pierced • Who made it? Charles Rennie Mackintosh, architect • When was it made? 1898-99 • What style is it? Arts and Crafts • What is it part of? tea room • What condition is it in? reupholstered • How was it used? dining • What is it about? anthropomorphic • Where did it come from? Miss Cranston's Argyle Street Tea Rooms • Where is it? Glasgow School of Art, Glasgow AAT does not include certain types of terminology • Personal Names: Charles Rennie Mackintosh (ULAN) • Corporate Names: Glasgow School of Art (Library of Congress authority files) • Geographic Place Names: Glasgow (TGN) • Building Names: Miss Cranston's Argyle Street Tearoom (local authority) • Historic Events: Exhibition of Decorative Art, London, 1923 (Library of Congress authority files) • Iconographic themes: Venus and Cupid (ICONCLASS) Art & Architecture Thesaurus • Contains around 34,000 concepts, 131,000 terms • Records contain terms, notes, relationships, bibliography • Scope ranges from antiquity to present • Global, but preponderance of Western concepts • Terms describe Art, Architecture, Decorative Arts, Material Culture, & Archival Materials Elements of an AAT record • parent concept: furnishings; mirrors; wall mirrors • concept: object, material, activity, style, attribute... (Note: the focus of each vocabulary record is a concept, not a “term”) • scope note: Tall, narrow mirrors intended to fill the pier, the space between two windows...
• names/terms: pier glasses; pier mirrors; trumeaux • related concepts: pier tables • sources: Comstock, Helen. The Looking Glass in America, 1700-1825. Page 17. TGN • The TGN is a structured vocabulary containing around 1,000,000 names and other information about places. • The TGN includes all continents and nations of the modern political world, as well as historical places. • It includes physical features and administrative entities, such as cities and nations. • The emphasis in TGN is on places important for art and architecture. Getty Thesaurus of Geographic Names Scope and range • Records for 912,000 places, 1,106,000 names • Names, coordinates, relationships, dates & bibliography • Includes all continents and nations of modern political world, historical places • Includes physical features • Includes inhabited places, other administrative and political entities • Emphasis on places important to art & architectural history Elements of a TGN record (focus is concept) • names: Siena; Sena Julia • parent place: Italy; Tuscany; Siena province • geographic coordinates: 43 19 N, 011 21 E • notes: Founded as Etruscan hill town; later was Roman city of Sena Julia; thrived under Lombard kings; was medieval self-governing commune; was seat of Ghibelline power ... • place types: inhabited place; provincial capital • dates: settled by Etruscans (flourished 6th cen. BCE) • bibliography: Annuario Generale (1980); Dizionario Corografico Toscana (1977); Webster's Geographical Dictionary (1984); Hook, Siena (1979), 6 ff.; TCI: Toscana (1984), 479 ff.; Times Atlas of the World (1992), 183; Canby, Historic Places (1984), II, 861; Milanesi, Storia dell'Arte Senese (1969) ULAN • The ULAN is a structured vocabulary that contains around 220,000 names and other information about artists. • The coverage of the ULAN is from Antiquity to the present, and the scope is global.
• The scope of the ULAN includes any identified individual or "corporate body" (i.e., a group of people working together) involved in the design or creation of art and architecture. Union List of Artist Names Scope and Range • Scope is from Antiquity to the present • Coverage is global, preponderance Western artists • Identified individuals or groups of individuals working together (corporate bodies) • Involved in the conception or production of visual arts & architecture • ULAN contains records for 120,000 ‘artists’, 293,000 names • Records contain names, biographical information, relationships, & bibliography Elements of a ULAN record (focus is concept) • names: Dosso Dossi; Giovanni de Lutero; Dosso da Ferrara; Giovanni di Niccolò • roles: painter; draftsman • geographic location: Ferrara (Italy); Venice (Italy) • life dates: born ca. 1490, active from 1512, died 1542 • related people: student of: Lorenzo Costa di Ottavio, from 1507 • notes: Although early biographers, including Vasari, noted a birth date of ca. 1475, modern scholars agree that he cannot have been born much before 1490... • bibliography: *Bénézit; Berenson; *Bolaffi; *Encyc. world art; Gibbons, DOSSO AND BATT. DOSSI (1968); Grove Dict of Art Cataloguing Cultural Objects as a tool for subject cataloguers Aims • practical guidance for subject cataloguers, indexers • intra- and inter-indexer consistency • user–indexer consistency • retrieval effectiveness Cataloguing Cultural Objects as a tool for subject cataloguers Challenges • what does “subject” mean? -- i.e., what kinds of property of works should be indexed? • what kinds of method should be used to determine the subject(s) of works, and ... • ... to select terms that represent those subjects? • what kinds of control should be imposed on the lists of terms from which selection is made, and how should such authority control be implemented? •
what metadata elements should be established for recording subject data? Kinds of subject Subjects, objects, images, texts • subjects: e.g., people, things, events, places, concepts • objects (works) [in museums, archives]: e.g., artworks, buildings, artifacts, documents, collections – descriptive cataloguing: what the objects are – subject cataloguing: what subjects the objects are of / about Kinds of subject • images [in visual resource collections]: visual representations of objects, e.g., photographs, slides, digital files – descriptive cataloguing: what the images are; what objects the images are of – subject cataloguing: what subjects the images are about • texts [in libraries]: verbal representations of objects, e.g., books, journal articles – descriptive cataloguing: what the texts are – subject cataloguing: what objects the texts are about; what subjects the texts are about CDWA Subject • In CDWA, subject matter is analyzed according to a method based on the work of Erwin Panofsky • Panofsky identified three main levels of meaning in art: – Pre-iconographic description – Iconographic identification – Iconographic interpretation or “iconology” CDWA Subject • Three sets of subcategories under the category Subject Matter in CDWA reflect this traditional art-historical approach to subject analysis • Simplified and practical for purposes of retrieval CDWA Subject • CDWA levels of subject analysis – Subject matter–Description. A description of the work in terms of the generic elements of the image or images depicted in, on, or by it – Subject matter–Identification. The name of the subject depicted in or on a work of art: its iconography. Iconography is the named mythological, fictional, religious, or historical narrative subject matter of a work of art, or its non-narrative content in the form of persons, places, or things – Subject matter-Interpretation. The meaning or theme represented by the subject matter or iconography of a work of art. 
Mantegna’s Adoration of the Magi • Subject matter–Description: woman, baby, men, vessels, coins, turbans, etc. • Subject matter–Identification: Known iconographic subject. Based on New Testament (Matthew 2). Balthasar, Melchior, Caspar, Mary, Jesus, Joseph • Subject matter–Interpretation: Three Ages of Man (Youth, Middle Age, Old Age); Three Races of Man; Three Parts of the World Kinds of subject Representation • representational (figurative) works – narrative subjects • stories • episodes in stories, i.e., events – non-narrative subjects • people, animals, plants • objects, e.g., buildings • activities; places; periods • [work types: portraits, still lifes, landscapes, genre scenes, architectural drawings ...] Kinds of subject • non-representational works – abstract works – buildings – furniture – decorative arts – “subject” / content = • meaning (symbolic, allegorical, thematic, conceptual) • form, composition • function, purpose, use Kinds of subject Ofness and aboutness • what is the work of? – generically: description • e.g., “Nude standing woman seen from front, holding dagger in right hand” – specifically: identification • e.g., “The suicide of Lucretia” • what is the work about? – interpretation • e.g., “virtuousness” CCO recommendation #1 • subject data should be consistently given for all works, not just for representational ones – (even if those data end up overlapping with the content of other elements, e.g. Work Type) Subject analysis Ofness • who? what? where? when? – people, objects/activities, places, times • generic to specific • left to right; top to bottom; foreground to background ... Subject analysis Aboutness • what is the meaning of the work? • what is expressed by the work? • what do the objects, events, etc., depicted in the work symbolize? • how may the image be interpreted? • what was the intention of the work’s creator? • how has the work been interpreted historically?
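The three levels of subject analysis, and the ofness/aboutness distinction, can be sketched as a small data structure. The Python below is only an illustration under stated assumptions: the names SubjectType, SubjectRecord, ofness and aboutness are hypothetical and are not defined by CDWA or CCO, and the terms are taken from the Mantegna example above.

```python
from dataclasses import dataclass, field
from enum import Enum


class SubjectType(Enum):
    """Hypothetical labels for the three levels of subject analysis."""
    DESCRIPTION = "description"        # generic ofness
    IDENTIFICATION = "identification"  # specific ofness (iconography)
    INTERPRETATION = "interpretation"  # aboutness (meaning or theme)


@dataclass
class SubjectRecord:
    """Subject terms for one work, grouped by level (illustrative only)."""
    work_title: str
    terms: dict = field(default_factory=lambda: {t: [] for t in SubjectType})

    def add(self, subject_type, *new_terms):
        # Append one or more terms under the given level.
        self.terms[subject_type].extend(new_terms)

    def ofness(self):
        # What the work is "of": generic description plus specific identification.
        return (self.terms[SubjectType.DESCRIPTION]
                + self.terms[SubjectType.IDENTIFICATION])

    def aboutness(self):
        # What the work is "about": interpretation only.
        return list(self.terms[SubjectType.INTERPRETATION])


record = SubjectRecord("Adoration of the Magi")
record.add(SubjectType.DESCRIPTION,
           "woman", "baby", "men", "vessels", "coins", "turbans")
record.add(SubjectType.IDENTIFICATION,
           "Balthasar", "Melchior", "Caspar", "Mary", "Jesus", "Joseph")
record.add(SubjectType.INTERPRETATION,
           "Three Ages of Man", "Three Races of Man", "Three Parts of the World")

print(record.ofness())
print(record.aboutness())
```

Keeping the level alongside each term is one way to let a retrieval system search generic terms ("baby") separately from interpretive ones ("Three Ages of Man"); other layouts would serve equally well.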
CCO recommendation #2 • take a methodical approach to subject analysis Term selection What kinds of terms? How many terms? • factors that can’t help but affect the specificity of indexing: – quality and quantity of available scholarly information about the work – extent of indexer’s knowledge of the work – extent of indexer’s general pre-iconographic knowledge – depth of indexer’s indexing expertise – availability of time; money; human resources; technology at institution’s disposal Term selection • factors that should also affect the specificity of indexing – needs of end-users: expert and non-expert – characteristics of the collection – relative importance of the work – presence of unusual details in the work – institutional policies • number of terms to be assigned per work • method of subject analysis to be used – capabilities of system • e.g., to link NTs to BTs, preferred terms to synonyms and RTs, etc. CCO recommendation #3a • don’t be specific without the support of scholarly evidence – better to be general and accurate than specific and wrong CCO recommendation #3b • use subject terms that have been identified as “preferred” in established authority files (controlled vocabularies) Authority control Four kinds of authority file • Personal and Corporate Body Authority – preferred forms of names of real people/bodies (as artists, patrons, subjects of works) • Geographic Place Authority – preferred forms of names of real places Authority control • Concept Authority – preferred forms of genre terms • e.g. “still life,” “landscape” – preferred forms of generic subject terms • objects, materials, activities, agents, properties, styles, periods treated as subjects Authority control • Subject Authority – preferred forms of iconographical terms • proper names, uniform titles, standard labels ... • ... of characters, situations, events, themes, works (e.g., buildings) ... • ... in historical, mythological, religious, literary contexts Authority control • cf.
AAT: Art & Architecture Thesaurus – terms for describing what objects / images are – project began 1980; funded by CLR, NEH, Mellon, then Getty from 1985; sponsored by ARLIS, CAA, SAH, etc. – current: version 3.0-Web, at http://www.getty.edu/research/conducting_research/vocabularies/aat/ • cf. ICONCLASS: Iconographic Classification System – terms for describing what objects / images are of / about – an iconographic classification system (not a vocabulary per se) – a collection of circa 24,000 ready-made definitions (in English) of objects, persons, events, situations, and abstract ideas that can be the subject of a work of art (emphasis is on Western art) – 1949: van de Waal (U. Leiden) began to develop ideas that led to ICONCLASS – 1973-85: published in 17 vols. – ICONCLASS Libertas Browser (KNAW, Amsterdam): web-accessible version, at http://www.iconclass.nl/ ICONCLASS • Iconclass was developed by Henri van de Waal (1910-1972), Professor of Art History at the University of Leiden • His ideas for a systematic overview of subjects, themes and motifs in Western art, which later became the Iconclass System, took form in the early 1950s. • The complete Iconclass System was finished in the years after 1972 by a large group of scholars and was published between 1973 and 1985 by the Royal Netherlands Academy of Arts and Sciences (KNAW), of which Van de Waal was a member. ICONCLASS • Iconclass is a subject-specific classification system; it is a hierarchically ordered collection of definitions of objects, persons, events and abstract ideas that can be the subject of an image. • Art historians, researchers and curators use it to describe, classify and examine the subject of images represented in various media such as paintings, drawings and photographs. ICONCLASS • Numerous institutions across the world use Iconclass to describe and classify their collections in a standardized manner.
• In turn, users ranging from art historians to museum visitors use Iconclass to search and retrieve images from these collections. • As a research tool, Iconclass is also used to identify the significance of entire scenes or individual elements represented within an image. The three main components of Iconclass are • Classification System: 28,000 hierarchically ordered definitions divided into ten main divisions. Each definition consists of an alphanumeric classification code (notation) and the description of the iconographic subject (textual correlate). The definitions are used to index, catalogue and describe the subjects of images represented in works of art, reproductions, photographs and other sources. • Alphabetical Index: 14,000 keywords used for locating the notation and its textual correlate needed to describe and/or index an image. This index is a valuable tool for iconographers in the identification, search and retrieval of subjects and scenes. • Bibliography: 40,000 references to books and articles of iconographical interest. Authority control Kinds of source of terminology for local authority files – distinguished by structure: • hierarchical vs. non-hierarchical – by object type: • subjects vs. people/places – by scope: • domain-specific vs. interdisciplinary – by purpose: • authority control vs. end-user reference CCO recommendation #4 • link the occurrences of subject terms in work records to the authority records for those terms – (in authority files that implement synonym control and hierarchical structure) Record structure Metadata element sets • cf. CDWA: Categories for the Description of Works of Art – ed. Baca, Harpring – funded by Getty, NEH, CAA – 2000: version 2.0; on web at http://www.getty.edu/research/conducting_research/standards/cdwa/ • cf. VRA Core Categories – ed.
Lanzi, Whiteside – 2007: version 4.0; on web at http://www.vraweb.org/projects/vracore4/index.html Record structure Subject metadata elements recommended by CCO • Description [free-text; non-repeatable] • Subject [required; controlled; repeatable] • Extent – for designating the part of the work to which the subject terms are applicable • Subject Type – for distinguishing between description, identification, interpretation CCO recommendation #5 • implement separate subject elements for display and for retrieval Example • Statue of Hercules (Lansdowne Herakles) • Unknown Roman sculptor; after the School of Polykleitos • about 125 CE • marble • height: 193.5cm • J. Paul Getty Museum (Los Angeles, CA) • ©2004 J. Paul Getty Trust. Example Description: Herakles standing in contrapposto, holding his attributes, the skin of the Nemean lion and a club. This statue was found in Tivoli ca. 1790, in the ruins of Hadrian’s villa; it was in the collection of the Marquess of Lansdowne until 1951. It is related in appearance to works attributed to 4th-century BCE Greek sculptors; however, the work has an eclectic style that is purely Roman. Subject--Description: religion/mythology; human figure; male; nude; lion skin; club Subject--Identification: Hercules (Greek/Roman hero); Nemean Lion Example of a Subject Authority record Subject Names: Hercules (preferred); Herakles; Heracles; Ercole; Hercule; Hércules Hierarchical Position: Classical mythology--Greek heroic legends--Story of Hercules--Hercules Indexing Terms: Greek hero; king; strength; fortitude; perseverance; Argos; Thebes Note: Probably based on an actual historical figure, a king of ancient Argos. The legendary figure was the son of Zeus and Alcmene ... Related Subjects: Labors of Hercules; Love Affairs of Hercules; Zeus (Greek god); Alcmene (Greek heroine); Hera (Greek goddess) Dates: Story developed in Argos, but was taken over at early date by Thebes; literary sources are late, though earlier texts may be surmised. 
Earliest: -1000 Latest: 9999 Sources: ICONCLASS http://www.iconclass.nl/; Grant, Michael and John Hazel. Gods and Mortals in Classical Mythology. Springfield, MA: G & C Merriam Company, 1973. Page: 212 ff. Opportunities • integrity and longevity of data • consistent, reliable access to data • exchange, sharing, reuse of data • interoperability of systems • easy migration of data to new systems • communication, cooperation, collaboration Questions • should indexers be expected to do iconographical research to index aboutness? • should cultural-historical questions about a work’s unintended meanings be answered by indexers? • how may future users’ needs be predicted? • what role for general knowledge-organization schemes?
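Looking back at CCO recommendation #4 and the Subject Authority record for Hercules, the idea of synonym control can be sketched in a few lines of Python. This is a minimal sketch, not a CCO-prescribed structure: SubjectAuthority, build_index and resolve are hypothetical names, and case-insensitive exact matching is an assumed simplification of how a real system would match variant names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectAuthority:
    """One authority record: a preferred name, its variants, its hierarchy."""
    preferred: str
    variants: tuple
    hierarchy: tuple

    def all_names(self):
        return (self.preferred, *self.variants)


def build_index(authorities):
    """Map every name (preferred or variant, case-folded) to its record."""
    index = {}
    for authority in authorities:
        for name in authority.all_names():
            index[name.casefold()] = authority
    return index


def resolve(index, term):
    """Return the preferred form for a term, or None if it is not controlled."""
    authority = index.get(term.strip().casefold())
    return authority.preferred if authority else None


# Values copied from the example Subject Authority record above.
hercules = SubjectAuthority(
    preferred="Hercules",
    variants=("Herakles", "Heracles", "Ercole", "Hercule", "Hércules"),
    hierarchy=("Classical mythology", "Greek heroic legends",
               "Story of Hercules", "Hercules"),
)
index = build_index([hercules])

print(resolve(index, "Herakles"))  # prints: Hercules
print(resolve(index, "Zeus"))      # prints: None (no record loaded for Zeus)
```

A work record would then store a link to the authority record rather than a free-text string, so that a search on any variant name retrieves the same works.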