Faceted Classification
Complex subjects from simpler components

Outline
• Refresher from last week: basic classification structures and arrangement within those structures.
• Goals of faceted classification.
• Basic design of faceted classifications.
• Facet analysis of complex subjects (factoring).
• Determination of facet structure.
• Faceted browsing on the Web.

Three basic classification structures

One hierarchy. All concepts emanate via hierarchical relationships from a single root node.
Example: The enzyme hierarchy from the first day of class.

Multiple parallel hierarchies. Instead of a single root node, there are multiple root nodes. The parallel hierarchies might be of similar kinds but cover different themes (e.g., religion and science, which are both disciplines).
Examples: The Soviet library classification from the first day of class (or any library classification with separate, independent sections for various disciplines).

Faceted. A variation of multiple parallel hierarchies in which fundamental types are combined to create complex concepts. Facets are typically of different, orthogonal kinds (processes, products, actors, places).
Examples: See the Hunter and Vickery readings.

Structural refinements
In addition to the basic hierarchical relationships from broader terms to narrower ones (is-a, is-a-part-of, is-an-instance-of), we can implement additional structural refinements to clarify relationships between concepts in a single array (level) of a hierarchy.

Arrangement within arrays
Two forms:
• When multiple principles of division are used, showing the nature of the relationships between narrower concepts (“children”) and a broader concept (“parent”).
• Using the order of concepts within an array to convey relationships between “siblings.”

Showing principles of division

shoes
  high heels
  hiking boots
  mary-janes
  pumps
  running shoes
  sandals
  slingbacks
  stilettos
  wedges
  winter boots

shoes
  (by season)
    winter
    spring
  (by function)
    hiking
    running
  (by style)
    boots
    pumps
    sandals
  (by feature)
    slingbacks
    mary-janes
  (by heel type)
    stilettos
    wedges
  (by heel height)
    high heels

Another example

Furniture
  (by material)
    wooden furniture
    plastic furniture
  (by style)
    rococo furniture
    modern furniture
  (by room)
    bedroom furniture
    office furniture
  (by function)
    storage furniture
      (by form)
        bookcases
        tables
        desks
        wardrobes
        bureaus
    sleeping furniture

Uses of structural labels
The parenthetical phrases that indicate principles of division (sometimes called “node labels” or “subfacet indicators”) are typically not used for indexing. Instead, they help the user (either the indexer or the information seeker) understand the types of relationships defined by the system and apply terms accordingly.
Using the labeled shoe hierarchy above, a shoe might be indexed as: winter-boots-wedges-high heels.

Ordering concepts at each level
An “array” is a group of siblings (descriptors at the same level of a hierarchy). Order in an array provides another means to show relationships between concepts.
Possible orders:
• Chronological (art styles from Post-Impressionist to Cubist to Dada to Abstract Expressionist)
• Directional (east to west, for example, or closest to farthest)
• Increasing intensity (slowest to fastest music tempos, for example, or lightest to darkest hues)
• Increasing concreteness (from more general to more specific, such as from philosophical warrant to cultural warrant to literary warrant)
• Increasing quantity (from one to many)
• Order of a process (from plowing to planting to weeding to harvesting, for example)

Example: music tempos

(alphabetical)    (slowest to fastest)
Allegro           Largo
Andante           Andante
Largo             Moderato
Moderato          Allegro
Presto            Vivace
Vivace            Presto

And in the beginning, S. R. Ranganathan saw a Meccano set...

Motivations for faceted classification
• The sheer number of documents keeps growing.
• The subjects of the documents are both more specific and more complex.
• Knowledge itself is rapidly expanding: new subjects are constantly being created.
It’s not helpful to put huge numbers of documents in general subject categories (British History, Nuclear Physics). And yet we can’t possibly enumerate all the subjects that either currently exist or may soon exist. What to do?

Goals of faceted classification
If we can create a classification scheme that lists subject components, then we can build complex subjects out of the components as necessary.
We facilitate the construction of complex subjects by organizing the component concepts that make up our classification into facets, or potential aspects of the subject.
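To make the arithmetic behind this goal concrete, here is a minimal Python sketch. The facet names and terms are invented for illustration and do not come from any published scheme. It compares how many terms a small faceted scheme has to list with how many compound subjects it can express when a subject uses at most one term from each facet.

```python
from itertools import product

# Hypothetical facets and terms, invented for illustration only.
facets = {
    "discipline": ["history", "sociology", "economics"],
    "place": ["Japan", "Britain", "India", "Brazil"],
    "beverage": ["tea", "coffee"],
    "activity": ["drinking", "growing", "trading"],
}


def listed_terms(facets):
    """Count the terms the scheme has to list explicitly."""
    return sum(len(terms) for terms in facets.values())


def expressible_subjects(facets):
    """Enumerate every subject that uses at most one term from each facet."""
    # None stands for "this facet is not used in the subject".
    options = [[None] + terms for terms in facets.values()]
    subjects = []
    for combo in product(*options):
        chosen = tuple(term for term in combo if term is not None)
        if chosen:  # skip the empty combination
            subjects.append(chosen)
    return subjects


listed = listed_terms(facets)
subjects = expressible_subjects(facets)
print(f"{listed} listed terms, {len(subjects)} expressible subjects")
```

An enumerative scheme would need a separate entry for each of those subjects; the faceted scheme lists only the components and leaves the combinations to be built on demand.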
From compound to components
Example of a complex subject: The history of Japanese tea-drinking etiquette
Components (or isolates, or factors): History + Japan + Tea + Drinking + Etiquette
Potential fundamental categories (facets) for the components: Disciplines (History); Locations (Japan); Beverages (Tea); Activities (Drinking); Values (Etiquette)

Building subjects from components
A traditional faceted classification for libraries includes both the facet structure of components and syntax rules for combining the components into complex subjects. These rules are necessary to ensure that documents are filed consistently on shelves. (In an online environment, these rules become superfluous.)
To “mechanize” the subject-building process and simplify filing, components are given a notation (such as “soil acidity – sag”) that clarifies the component’s position within a facet.

Structure of faceted classifications
While a facet may be a simple list, components within a facet are typically arranged hierarchically (using a stricter or looser sense of hierarchy as appropriate).

Organic farming classification

Crops
  Fruits
    (by origin)
      Vines
        Grapes
      Bushes
      Trees
  Vegetables
  Herbs
Processes
  Planting
  Controlling pests
  Fertilizing
Materials
  Natural soil amendments
    Compost
    Mulch
  Natural pesticides

Designing faceted classifications
1. Decompose complex concepts (which you have gathered via your research into the subject literature) into component parts, via syntactic or semantic factoring.
2. Group the simple components into fundamental categories.
3. Organize the components in each facet (with hierarchical relationships, subfacets that indicate multiple principles of division, order within arrays, and so on).

Understanding complex concepts
There are two kinds of compounds:
• A multi-word unit (which may be a simple concept, such as stained glass, or a complex concept, such as glass cutting).
• A multi-concept unit (which may be a single word, such as sourdough).
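The combination step described under “Building subjects from components” can be sketched in Python. Everything specific here is an assumption made for illustration: the citation order, the three-letter codes (loosely modeled on the “soil acidity – sag” pattern), and the function name `build_subject`. A real scheme defines its own notation and syntax rules.

```python
# Hypothetical notation table: component -> (facet, code).
NOTATION = {
    "history": ("discipline", "his"),
    "Japan": ("location", "jap"),
    "tea": ("beverage", "tea"),
    "drinking": ("activity", "dri"),
    "etiquette": ("value", "eti"),
}

# Assumed citation order: the fixed sequence in which facets are combined.
CITATION_ORDER = ["beverage", "activity", "value", "location", "discipline"]


def build_subject(components):
    """Return a notation string with the components in citation order."""
    by_facet = {}
    for component in components:
        if component not in NOTATION:
            raise KeyError(f"unknown component: {component!r}")
        facet, code = NOTATION[component]
        if facet in by_facet:
            raise ValueError(f"more than one component from facet {facet!r}")
        by_facet[facet] = code
    # Only facets actually present in the subject contribute a code.
    return "-".join(by_facet[f] for f in CITATION_ORDER if f in by_facet)


# The same subject, described in two different orders by two indexers.
first = build_subject(["history", "Japan", "tea", "drinking", "etiquette"])
second = build_subject(["etiquette", "tea", "drinking", "history", "Japan"])
print(first, second)
```

Because the citation order is fixed, both indexers arrive at the same string, which is what keeps shelf filing consistent.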
Syntactic and semantic factoring
Syntactic factoring: A term with multiple words is divided into smaller components.
Example: rye bread into rye + bread; Irish emigration into emigration + Irish
Semantic factoring: A term is divided into multiple elementary concepts.
Example: apartment into dwelling + rental + shared building.

Semantic factoring
Most standards and authorities don’t recommend semantic factoring, and there are no fixed rules to guide it. But semantic factoring can sometimes help you discover missing concepts in your subject language.
It might be extreme to describe Passover as “holiday + Jewish + commemoration + Exodus,” but doing so might make us consider both religion and commemoration of events as aspects common to many holidays.

Parsing compounds
A compound term consists of a focus (the class of things or events) and a difference, which modifies the class and makes a subclass.
Examples:
• Car tires: Focus is tires, difference is cars.
• Opera singing: Focus is singing, difference is opera.
• Mushroom hunter: Focus is hunter, difference is mushroom.

Action/patient factoring
If the term contains an action (focus) modified by the recipient of the action (difference), factor.
But if the term refers to a material (focus) as modified by an action (difference), don’t factor.
Examples:
Hair dyeing: hair + dyeing
Bronze engraving: bronze + engraving
But don’t factor: dyed hair, engraved bronzes

Part/whole factoring
If the focus refers to a part or property, and the difference refers to the whole or the possessor of the part or property, factor.
But if the focus is the whole, and the difference is the part or property, don’t factor.
Examples:
Soil acidity: soil + acidity
Car tires: tires + cars
Don’t factor: spare tires, rain forest.
Maybe: pine forest, redwood forest.

Action/performer factoring
If the term contains an intransitive action (focus) modified by the performer (difference), factor.
If the performer (focus) is modified by its performance of an intransitive action (difference), don’t factor.
Examples:
Student meeting: students + meetings
Lemur migration: lemurs + migrations
But don’t factor: migratory birds

Determination of facet structure
Ranganathan started from the top down: describing fundamental categories (PMEST: Personality, Matter, Energy, Space, Time) for all subjects and organizing components into those universal facets.
The Classification Research Group (CRG), as described by Vickery, advocates beginning from the bottom up: reviewing components and assigning preliminary fundamental categories based on each concept’s definition within the classification’s domain, then looking for commonalities in these preliminary choices. Facets are specific to each classification.

Principles for creating facets
Some elements to consider when creating facets:
• Independence (are the facets mutually exclusive?)
• Semantic importance (do the facets represent the most important fundamental types in the domain?)
• Balance (are the facets at similar levels of abstraction?)
• Comprehensiveness (do the facets include all important subject components in the domain?)
• Hospitality (would it be easy to add more concepts to a facet?)
• Relevance (are the facets of interest to the identified user group and purpose?)

Faceted browsing on the Web
Hearst’s Flamenco is an interface that supports browsing of faceted structures on the Web. The Hearst article that you read describes how users preferred the faceted browsing interface to a search engine when exploring the collection.
(Note that the facets that Hearst used in the Flamenco system are semi-automatically generated and not, perhaps, the best that one might create...)

Your own classification design project
• Continue compiling potential concepts from source documents.
• Use your audience and purpose, as well as your subject knowledge, to refine the scope of your classification: its boundaries and its central and peripheral areas.
• Break down your candidate concepts into components: generate broader/narrower/linking concepts as appropriate.
• Define each concept’s particular meaning in the context of your classification.
• Wrangle your concepts into a classified structure.

Assignment progress check
• Bring drafts to class next week for peer feedback sessions (especially a draft of your classified structure).
• Also, everyone will have a five-minute check-in with me: you will tell me your subject, audience, and purpose in a few sentences, and explain your classified structure.
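
Supplementary sketch: factoring rules as a lookup
When you break down your own candidate concepts, the three factoring rules covered earlier (action/patient, part/whole, action/performer) can be treated as a small decision table. The Python sketch below is one hypothetical way to write that table down. The role labels, the `Compound` class, and `should_factor` are names invented here, and the roles of focus and difference still have to be assigned by a person who has parsed the compound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Compound:
    term: str
    focus_role: str        # role of the focus (the class of things or events)
    difference_role: str   # role of the difference (the modifier)


# (focus role, difference role) pairs that the three rules say to factor.
FACTOR = {
    ("action", "recipient"),               # hair dyeing
    ("part_or_property", "whole"),         # soil acidity
    ("intransitive_action", "performer"),  # lemur migration
}

# Pairs that the same rules say to leave unfactored.
KEEP = {
    ("material", "action"),                # dyed hair
    ("whole", "part_or_property"),         # rain forest
    ("performer", "intransitive_action"),  # migratory birds
}


def should_factor(compound: Compound) -> Optional[bool]:
    """Return True (factor), False (keep whole), or None (rules do not decide)."""
    pair = (compound.focus_role, compound.difference_role)
    if pair in FACTOR:
        return True
    if pair in KEEP:
        return False
    return None  # a borderline case that needs a judgment call


examples = [
    Compound("hair dyeing", "action", "recipient"),
    Compound("dyed hair", "material", "action"),
    Compound("soil acidity", "part_or_property", "whole"),
    Compound("migratory birds", "performer", "intransitive_action"),
]
for compound in examples:
    print(compound.term, should_factor(compound))
```

Cases the slides mark as “maybe” (pine forest, redwood forest) are exactly the ones where assigning the roles is the hard part, so the table does not remove the need for judgment.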