Knowledge Organization: Thesauri and Content Standard Principles

Thesauri

1. Introduction to Thesauri:
- Particular genre of controlled vocabularies
  - specific to info field(?) in this version?
- Controlled vocabulary = any kind of designed, finite list of options (e.g. drop-down menu of languages to choose from)

Thesauri Used
- re: information-bearing objects, institutions
- For aboutness (official subject headings)
- “Syndetic structure”
- Used to name what something is about (classification is where they fit)

Research Papers, for e.g.:
- Specifically talking about the subjects field
- Author keywords are not included

Hierarchical Classification vs. Faceted Classification vs. Thesauri
E.g. “Education”

Hierarchical Classification
- e.g. Dewey Decimal System
- Every term / level only has meaning in context of the terms / levels above it

Faceted Classification:
- E.g. three hierarchical levels: 1) Location, then 2) Discipline, 3) Level (Primary, Secondary)
- Every item only has one location / place where it lives

Thesaurus Rules / Syndetic Structure:
- E.g. ERIC Thesaurus slide put together by Dr. Bullard (rare to see thesaurus visualizations)

Syndetic Structure
- Relationships in subject headings & thesaurus systems
- Thesaurus rules

Three Principles of Organization:
1. Equivalence - which things mean the same thing
2. Hierarchy - which things are more specific variants of another thing (species of the same thing)
3. Other Relationships
   a. neither of the above
   b. but with intrinsic associations / relationships to another thing

ERIC Thesaurus (Education Resources Information Center) - has Database
- E.g. Physical Education (under / part of ‘Education’)
  - Specific kinds of physical education
  - Long list of other relationship terms (e.g. athletic coaches)
- (Increasingly) long list of terms being added, e.g. to one article by a worker at ERIC - can get messy.
- Library thesauri usually have an upper limit (# of allowed terms) per resource.

E.g. with abbreviations:
BT - Broader Term:
- Dance -----> Fine Arts
- Dance has the broader term of Fine Arts
- (unidirectional, solid line / connection)
NT - Narrower Term:
- Dance is a part of Fine Arts
- (solid line / connection, unidirectional - FA ---> D)
- Fine Arts has the narrower term of Dance
RT - Related Term:
- Dance ←- - - - - > Fine Arts (bidirectional)

Relationships Used in Thesauri:
1. Equivalence
   a. “USE” / “Used For”
   b. Equivalent Phrases (e.g. airplanes / aeroplanes)
   c. Inverted Forms (e.g. bilingual education & education, bilingual) - searches alphabetized.
   d. Acronyms & Abbreviations
   e. Antonyms
      i. Uncommonly used
      ii. Searching for “dropouts” ---> should lead to items regarding “Student retention” (always relevant to first term)
2. Hierarchical
   a. Broader term (BT)
   b. Narrower term (NT)
3. Associative
   a. Related term (RT)

Lead-in Terms:
- People using queries / their specific locational spelling etc. to find info
- Goal: direct people to the same resource
- E.g. lead-in term of “aeroplanes” -> USE “airplanes”

1. Equivalence Relationship
Upward Posting (query leads only to the broader category - e.g. ballet -> dance)
- Often because there aren’t enough resources specific (e.g. to ballet) to justify it having its own term in a given thesaurus

2. Hierarchical Relationships
Relationship between a concept & a more specific concept
1. Generic Relationship: a link to a more specific type
   a. Teachers NT School Teachers
   b. Thinking NT Reasoning
2. Instance Relationship: a link to a particular example
   a. Seas NT Baltic Sea
   b. Wars NT WW2
3. Partitive Relationship: a link to a part
   a. Canada NT British Columbia
   b. Cars NT Steering Wheel
4. Other Types of Relationships?
   a. Children NT Television and Children

Hierarchical Breadth
Hierarchical Depth: all ‘levels’ should be true (e.g. all cats are living things) AKA “transitive”

3. Associative Relationships (related terms)
- Intrinsic relationships - bring together related concepts that are not each other's BT/NT (each may have its own BT/NT relationships elsewhere)
- Not transitive
- Examples, often:
  - Operations & Instruments
  - Actions & Products (e.g. Roadmaking RT Roads)
  - Causal Relationships (accidents RT injury)
  - Field of study & objects studied (e.g. xenobiology RT alien lifeforms)
- When not to use:
  - E.g. Insects NT Bees & Flies
  - Doesn’t pass tests from previous slide
  - Bees & Flies don’t need RT to one another

Thesaurus / Activity: have the document speak for itself (avoid extra / separate text with explanations)
One common way:
- Scope Notes (SN)
  - Describes how a heading is to be used
  - E.g. “Art” here means / encompasses __xyz__, as opposed to more precise headings such as “Art Products”, “Visual Arts” (basically, go see those if you need to)

Content Standard Principles

1. Intro
Key Questions:
- Which aspects of a resource to represent?
- What are the constraints placed on those representations?
- Is there a limit on the type of word used, professional knowledge required?

Central Concepts (generally agreed upon as being important to include)
- Title
- Creator (responsible for resource)
- Version (e.g. different editions, publishing dates)

Other concepts (might want to use / include)
- What it looks like
- Resource type (text, sound recording…)
- Who published it and where
- How to ID resource (ISBN, ISSN, DOI, etc.)
- What resource is about
  - Thesaurus terms
  - Class number (libraries)

Content Schema vs. Content Standards
- Not exactly interchangeable
- Schemas become standards when regulated, managed, shared among institutions
- Schema = ad hoc, personal, idiosyncratic -> standards = professional, controlled, agreed upon

(Authority) Control:
- Of language & consistency
- Allows for trust in system, results
- Users get used to how the system works in one context and can apply it in a new one / institution.
- Ability to share / compare results & predict across records, institutions

Structure within record

Bibliographic relationships
- Structure for linking between (related) records in a reliable way

Abstraction & Specificity
- How important is it to make similar resources appear the same?
- How important is it to distinguish similar resources from each other?
- What do users need to know / have specified?

Access
- “Access points”
- In a digital environment: what is searchable, filterable?
- What will users need to find, sort, ID & evaluate a record? (e.g. probably don’t need to know that a given book is 13.27 cm wide)

IFLA’s General Principles:
- Everything we’re trying to achieve
- Complex, and principles often in tension with one another
- Often trade-offs (e.g. consistency & standardization vs. convenience of the user - might mean lacking searchable details they are interested in)

Dublin Core
- Example of Content Schema with Minimal Rules
- Even less policed than a standard content schema - to be used as (customizable) base for different orgs, environments
- Sets up common “core” of metadata elements for web-based resources
- Main goal: easier search & retrieval mechanisms
  - Across institutions, super broad - e.g. ideally able to search resources by year across a ton of institutions
- 15 elements
- Dublin Core Qualifiers:
  - Able to use a more specific version of one of the elements
  - E.g. instead of standard ‘date’: date digitized / modified / available

Metadata & Cataloguing:
- In practice, each describes a different kind of approach
- Many parallels between the two

Most Metadata:
- Created specifically to describe digital content
- Digital content often changes constantly = ‘extent’ of a website or page can be difficult to measure as content evolves & doesn’t have ‘editions’
- Harder to control = metadata plays gatekeeping / control role

Choosing / Creating Content Standard:
Consider:
- Balance functionality & simplicity (how to tell the ideal number of elements?)
- Support Human & Machine Use (learn / read etc. differently)
- Support Interoperability
  - Metadata from one institution should be readable by another = standardization
  - Metadata from one schema can be readable in another (enabled by translations, crosswalks)
- Support Extensibility
  - Metadata schema is adaptable for local needs (made specific)
  - Extended schema can be simplified for global needs (collapsed using aggregate tool)

Example: creating a crosswalk between Dublin Core and EAD fields
- e.g. “dc.title” maps onto EAD (Header) field “titleproper”

Summary
Each Schema or Standard has:
- Set of values
- Instructions on which elements are necessary
- Instructions on how to modify elements
- Instructions on how to fill out the values
- Possibly a reference to Controlled Vocabs for specific fields / professions
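
Sketch of the crosswalk idea from the Dublin Core / EAD example above, in Python. Only the “dc.title” -> “titleproper” pair comes from these notes; the other mapping entries, the sample record, and the names DC_TO_EAD and crosswalk are assumptions made up for illustration, not an official mapping or tool.

```python
# Hypothetical crosswalk table: Dublin Core element -> EAD header field.
# Only dc.title -> titleproper is taken from the notes; the rest are
# illustrative assumptions, not an official DC-to-EAD mapping.
DC_TO_EAD = {
    "dc.title": "titleproper",
    "dc.creator": "author",        # assumption
    "dc.publisher": "publisher",   # assumption
}


def crosswalk(record, mapping):
    """Translate a flat metadata record from one schema to another.

    Returns (translated, unmapped). Fields with an entry in the mapping
    are renamed; fields without one are collected separately so that
    nothing is silently dropped.
    """
    translated = {}
    unmapped = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value
        else:
            translated[target] = value
    return translated, unmapped


# Made-up sample record for illustration only.
dc_record = {
    "dc.title": "Field Notes",
    "dc.creator": "A. Author",
    "dc.date": "2001",
}

ead_record, leftover = crosswalk(dc_record, DC_TO_EAD)
print(ead_record)
print(leftover)
```

Returning the unmapped fields separately makes the interoperability trade-off visible: anything the target schema has no slot for (here the date) has to be dropped, extended locally, or mapped by hand.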