Lecture: faceted classification

Faceted Classification
Complex subjects from
simpler components
Outline
• Refresher from last week: basic classification
structures and arrangement within those
structures.
• Goals of faceted classification.
• Basic design of faceted classifications.
• Facet analysis of complex subjects (factoring).
• Determination of facet structure.
• Faceted browsing on the Web.
Three basic classification
structures
One hierarchy. All concepts emanate via
hierarchical relationships from a single root
node.
Example: The enzyme hierarchy from the first
day of class.
Multiple parallel hierarchies. Instead of a single root
node, there are multiple root nodes. The parallel hierarchies
might be of a similar kind but cover different themes (e.g.,
religion and science, which are both disciplines).
Examples: The Soviet library classification from the
first day of class, or any library classification with
separate, independent sections for various disciplines.
Faceted. A variation of multiple parallel
hierarchies in which fundamental types are
combined to create complex concepts. Facets are
typically of different orthogonal kinds
(processes, products, actors, places).
Examples: See Hunter and Vickery readings.
Structural refinements
In addition to the basic hierarchical relationships
from broader terms to narrower ones (is-a, is-a-part-of,
is-an-instance-of), we can implement
additional structural refinements to clarify
relationships between concepts in a single array
(level) of a hierarchy.
Arrangement within arrays
Two forms:
• When multiple principles of division are used,
showing the nature of the relationship between
narrower concepts (“children”) and a broader
concept (“parent”).
• Using the order of concepts within an array to
convey relationships between “siblings.”
Showing principles of division
shoes
  high heels
  hiking boots
  mary-janes
  pumps
  running shoes
  sandals
  slingbacks
  stilettos
  wedges
  winter boots
shoes
  (by season)
    winter
    spring
  (by function)
    hiking
    running
  (by style)
    boots
    pumps
    sandals
  (by feature)
    slingbacks
    mary-janes
  (by heel type)
    stilettos
    wedges
  (by heel height)
    high heels
Another example
Furniture
(by material)
wooden furniture
plastic furniture
(by style)
rococo furniture
modern furniture
(by room)
bedroom furniture
office furniture
(by function)
storage furniture
(by form)
bookcases
tables
desks
wardrobes
bureaus
sleeping furniture
Uses of structural labels
The parenthetical phrases that
indicate principles of division
(sometimes called “node labels” or
“subfacet indicators”) are typically
not used for indexing, but help the
user (either the indexer or the
information seeker) to understand
the types of relationships defined
by the system and to apply terms
accordingly.
A shoe might be indexed as:
winter-boots-wedges-high heels.
shoes
  (by season)
    winter
    spring
  (by function)
    hiking
    running
  (by style)
    boots
    pumps
    sandals
  (by feature)
    slingbacks
    mary-janes
  (by heel type)
    stilettos
    wedges
  (by heel height)
    high heels
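The split between node labels and indexable terms can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function names `indexable_terms` and `index_string` are assumptions made for this example, not part of any standard or of the readings.

```python
# Hypothetical representation: each node label (principle of division)
# maps to the terms it groups. Labels guide the user; only the terms
# themselves are assigned to items.
SHOES = {
    "(by season)": ["winter", "spring"],
    "(by function)": ["hiking", "running"],
    "(by style)": ["boots", "pumps", "sandals"],
    "(by feature)": ["slingbacks", "mary-janes"],
    "(by heel type)": ["stilettos", "wedges"],
    "(by heel height)": ["high heels"],
}


def indexable_terms(scheme):
    """Return every term usable for indexing, skipping the node labels."""
    return [term for terms in scheme.values() for term in terms]


def index_string(scheme, chosen):
    """Join the chosen terms in the order the scheme lists them."""
    allowed = indexable_terms(scheme)
    unknown = [t for t in chosen if t not in allowed]
    if unknown:
        raise ValueError(f"not in scheme: {unknown}")
    return "-".join(t for t in allowed if t in chosen)


print(index_string(SHOES, ["boots", "high heels", "winter", "wedges"]))
# prints: winter-boots-wedges-high heels
```

Because the terms are emitted in scheme order, the same shoe gets the same index string no matter what order the indexer picks the terms in.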
Ordering concepts at each level
An “array” is a group of siblings (descriptors at the same level of hierarchy).
Order in an array provides another means to show relationships between
concepts. Possible orders:
• Chronological (art styles from Post-Impressionist to Cubist to Dada to
Abstract Expressionist)
• Directional (east to west, for example, or closest to farthest)
• Increasing intensity (slowest to fastest music tempos, for example, or lightest
to darkest hues)
• Increasing concreteness (from more general to more specific, such as from
philosophical warrant to cultural warrant to literary warrant)
• Increasing quantity (from one to many)
• Order of a process (from plowing to planting to weeding to harvesting, for
example)
Example: music tempos
(alphabetical)
  Allegro
  Andante
  Largo
  Moderato
  Presto
  Vivace
(slowest to fastest)
  Largo
  Andante
  Moderato
  Allegro
  Vivace
  Presto
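In software, a meaningful array order usually has to be stored explicitly, because a default sort only gives the alphabetical order. A minimal sketch, assuming the slowest-to-fastest sequence from the example; the names `TEMPO_ORDER` and `by_tempo` are made up for illustration:

```python
# The intended array order, slowest to fastest, stored explicitly.
TEMPO_ORDER = ["Largo", "Andante", "Moderato", "Allegro", "Vivace", "Presto"]


def by_tempo(terms):
    """Sort tempo terms by their position in TEMPO_ORDER."""
    rank = {name: position for position, name in enumerate(TEMPO_ORDER)}
    return sorted(terms, key=lambda name: rank[name])


tempos = ["Presto", "Largo", "Vivace", "Andante", "Allegro", "Moderato"]
print(sorted(tempos))    # alphabetical: conveys nothing about speed
print(by_tempo(tempos))  # slowest to fastest: the order carries meaning
```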
And in the beginning,
S. R. Ranganathan saw a Meccano set...
Motivations for faceted classification
• The sheer number of documents keeps growing.
• The subjects of the documents are both more specific and more
complex.
• Knowledge itself is rapidly expanding—new subjects are
constantly being created.
It’s not helpful to put huge numbers of documents in general
subject categories (British History, Nuclear Physics). And yet we
can’t possibly enumerate all the possible subjects that either
currently exist or may soon exist. What to do?
Goals of faceted classification
If we can create a classification scheme that lists
subject components, then we can build complex
subjects out of the components as necessary.
We facilitate the construction of complex
subjects by organizing the component concepts
that make up our classification into facets, or
potential aspects of the subject.
From compound to components
Example of complex subject:
The history of Japanese tea-drinking etiquette
Components (or isolates, or factors): History + Japan + Tea +
Drinking + Etiquette
Potential fundamental categories (facets) for the components:
Disciplines (history); Locations (Japan); Beverages (Tea);
Activities (Drinking); Values (Etiquette)
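The analysis above can be sketched as a lookup from components to facets. This is a toy illustration: the `FACETS` table holds only the five components from the example, and the function names are assumptions.

```python
# Toy facet table holding only the components from the example above.
FACETS = {
    "Disciplines": {"History"},
    "Locations": {"Japan"},
    "Beverages": {"Tea"},
    "Activities": {"Drinking"},
    "Values": {"Etiquette"},
}


def facet_of(component):
    """Return the facet that lists the given component."""
    for facet, members in FACETS.items():
        if component in members:
            return facet
    raise KeyError(f"{component!r} is not in any facet")


def analyze(components):
    """Map each component of a complex subject to its facet.

    Simplification: assumes at most one component per facet.
    """
    return {facet_of(c): c for c in components}


subject = analyze(["History", "Japan", "Tea", "Drinking", "Etiquette"])
print(subject["Beverages"])  # prints: Tea
```

A component that fails the lookup is a signal that either a facet is missing a concept or the scheme needs a new facet.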
Building subjects from components
A traditional faceted classification for libraries includes both the
facet structure of components and syntax rules for combining the
components into complex subjects.
These rules are necessary to ensure that documents are filed
consistently on shelves. (In an online environment, these rules
become superfluous.)
To “mechanize” the subject-building process and simplify filing,
components are given a notation (such as “soil acidity – sag”) that
clarifies the component’s position within a facet.
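A rough sketch of how syntax rules and notation work together: a fixed facet sequence (often called a citation order) decides how components are combined, so the same subject always files in the same place. Both the sequence chosen here and every notation code are invented for illustration; they come from no real scheme.

```python
# Assumed citation order: the sequence in which facets are combined.
CITATION_ORDER = ["Beverages", "Activities", "Values", "Locations", "Disciplines"]

# Invented notation codes, for illustration only.
NOTATION = {
    "Tea": "bt",
    "Drinking": "ad",
    "Etiquette": "ve",
    "Japan": "lj",
    "History": "dh",
}


def class_mark(subject):
    """Build a filing string from a {facet: component} mapping."""
    parts = [NOTATION[subject[facet]] for facet in CITATION_ORDER if facet in subject]
    return " ".join(parts)


# The same subject, described by two indexers in different orders.
first = {"Disciplines": "History", "Locations": "Japan", "Beverages": "Tea",
         "Activities": "Drinking", "Values": "Etiquette"}
second = {"Beverages": "Tea", "Values": "Etiquette", "Activities": "Drinking",
          "Disciplines": "History", "Locations": "Japan"}

print(class_mark(first))   # prints: bt ad ve lj dh
print(class_mark(second))  # prints: bt ad ve lj dh
```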
Structure of faceted classifications
While a facet may be a simple list, components within a facet are
typically arranged hierarchically (using a stricter or looser sense
of hierarchy as appropriate).
Organic farming classification
Crops
  Fruits
    (by origin)
      Vines
        Grapes
      Bushes
      Trees
  Vegetables
  Herbs
Processes
  Planting
  Controlling pests
  Fertilizing
Materials
  Natural soil amendments
    Compost
    Mulch
  Natural pesticides
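A facet hierarchy like this one maps naturally onto nested dictionaries. The sketch below assumes one reading of the slide's layout (three facets, with the node label left out because it is not an indexable term); the function name `path_to` is hypothetical.

```python
# Assumed reading of the organic farming hierarchy; empty dicts are leaves.
CLASSIFICATION = {
    "Crops": {
        "Fruits": {"Vines": {"Grapes": {}}, "Bushes": {}, "Trees": {}},
        "Vegetables": {},
        "Herbs": {},
    },
    "Processes": {"Planting": {}, "Controlling pests": {}, "Fertilizing": {}},
    "Materials": {
        "Natural soil amendments": {"Compost": {}, "Mulch": {}},
        "Natural pesticides": {},
    },
}


def path_to(term, tree, trail=()):
    """Return the chain of broader terms leading to `term`, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == term:
            return here
        found = path_to(term, children, here)
        if found is not None:
            return found
    return None


print(" > ".join(path_to("Compost", CLASSIFICATION)))
# prints: Materials > Natural soil amendments > Compost
```

The first element of the returned path is the facet, which is what a notation needs to encode to show a component's position.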
Designing faceted classifications
1. Decompose complex concepts (which you have
gathered via your research into the subject literature)
into component parts, via syntactic or semantic
factoring.
2. Group the simple components into fundamental
categories.
3. Organize the components in each facet (with
hierarchical relationships, subfacets that indicate
multiple principles of division, order within arrays,
and so on).
Understanding complex concepts
There are two kinds of compounds:
• A multi-word unit (which may be a simple
concept, such as stained glass, or a complex
concept, such as glass cutting).
• A multi-concept unit (which may be a single
word, such as sourdough).
Syntactic and semantic factoring
Syntactic factoring: A term with multiple words is
divided into smaller components.
Example: rye bread into rye + bread; Irish emigration
into emigration + Irish
Semantic factoring: A term is divided into multiple
elementary concepts.
Example: apartment into dwelling + rental + shared
building.
Semantic factoring
Most standards/authorities don’t recommend semantic factoring,
and there aren’t rules you can use to help with it.
But semantic factoring can sometimes help you discover missing
concepts in your subject language.
It might be extreme to describe Passover as “holiday + Jewish +
commemoration + Exodus,” but doing so might make us consider
both religion and commemoration of events as aspects common to
many holidays.
Parsing compounds
A compound term consists of a focus (the class of things
or events) and a difference, which modifies the class and
makes a subclass.
Examples:
• Car tires: Focus is tires, difference is cars.
• Opera singing: Focus is singing, difference is opera.
• Mushroom hunter: Focus is hunter, difference is
mushroom.
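For two-word English compounds like the examples above, the focus is the last word and the difference is whatever precedes it. A minimal sketch of that heuristic follows; it is an assumption that holds for these examples and will fail on many real terms, and it does no singular/plural normalization (so the difference comes back as "car", not "cars").

```python
def parse_compound(term):
    """Split a multi-word term into focus (last word) and difference (the rest).

    Heuristic only: assumes the head noun comes last, as in the examples.
    """
    words = term.lower().split()
    if len(words) < 2:
        raise ValueError(f"{term!r} is not a multi-word term")
    return {"focus": words[-1], "difference": " ".join(words[:-1])}


for term in ["Car tires", "Opera singing", "Mushroom hunter"]:
    parsed = parse_compound(term)
    print(f"{term}: focus={parsed['focus']}, difference={parsed['difference']}")
```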
Action/patient factoring
If the term contains an action (focus) modified by the recipient of
the action (difference), factor.
But if the term refers to a material (focus) as modified by an
action (difference), don’t factor.
Example:
Hair dyeing: hair + dyeing
Bronze engraving: bronze + engraving
But don’t factor: dyed hair, engraved bronzes
Part/whole factoring
If the focus refers to a part or property, and the difference refers
to the whole or the possessor of the part or property, factor.
But if the focus is the whole, and the difference is the part or
property, don’t factor.
Examples:
Soil acidity: soil + acidity
Car tires: tires + cars
Don’t factor: spare tires, rain forest.
Maybe: pine forest, redwood forest.
Action/performer factoring
If the term contains an intransitive action (focus) modified by the
performer (difference), factor.
If the performer (focus) is modified by its performance of an
intransitive action (difference), don’t factor.
Examples:
Student meeting: students + meetings
Lemur migration: lemurs + migrations
But don’t factor: migratory birds
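The three factoring rules reduce to a small decision table once a person has decided what role the focus and the difference play. The sketch below encodes only that table; the role names are labels chosen for this example, and assigning roles to a real term remains a human judgment.

```python
# (role of focus, role of difference) -> factor the compound?
FACTOR_RULES = {
    ("action", "patient"): True,                 # hair dyeing
    ("patient", "action"): False,                # dyed hair
    ("part_or_property", "whole"): True,         # soil acidity
    ("whole", "part_or_property"): False,
    ("intransitive_action", "performer"): True,  # lemur migration
    ("performer", "intransitive_action"): False, # migratory birds
}


def should_factor(focus_role, difference_role):
    """Return True/False per the rules, or None if no rule applies."""
    return FACTOR_RULES.get((focus_role, difference_role))


print(should_factor("action", "patient"))                 # hair dyeing -> True
print(should_factor("performer", "intransitive_action"))  # migratory birds -> False
```

In every pair the pattern is the same: factor when the focus is the action, part, or property, and keep the compound whole when the focus is the thing itself.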
Determination of facet structure
Ranganathan started from the top down: describing fundamental
categories (PMEST) for all subjects and organizing components
into those universal facets.
The Classification Research Group (CRG), as described by
Vickery, advocates beginning from the bottom up: reviewing
components and assigning preliminary fundamental categories
based on the concept’s definition within the classification’s
domain, then looking for commonalities in these preliminary
choices. Facets are specific to each classification.
Principles for creating facets
Some elements to consider when creating facets:
• Independence (are the facets mutually exclusive?).
• Semantic importance (do the facets represent the most
important fundamental types in the domain?)
• Balance (are the facets at similar levels of abstraction?)
• Comprehensiveness (do the facets include all important subject
components in the domain?)
• Hospitality (would it be easy to add more concepts to a facet?)
• Relevance (are the facets relevant to the identified user
group and purpose?)
Faceted browsing on the Web
Hearst’s Flamenco is an interface to support browsing of faceted
structures on the Web.
The Hearst article that you read describes how users preferred the
faceted browsing interface to a search engine when exploring the
collection.
(Note that the facets that Hearst used in the Flamenco system are
semi-automatically generated and not, perhaps, the best that one
might create...)
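The core mechanics of faceted browsing can be sketched briefly: each selection narrows the result set, and the interface shows how many items remain under each value of the other facets. This is not Flamenco's implementation; the three records and the function names are invented for illustration.

```python
from collections import Counter

# Invented sample records; each key other than "name" is a facet.
ITEMS = [
    {"name": "item 1", "season": "winter", "style": "boots", "heel type": "wedges"},
    {"name": "item 2", "season": "winter", "style": "boots", "heel type": "stilettos"},
    {"name": "item 3", "season": "spring", "style": "sandals", "heel type": "wedges"},
]


def refine(items, selections):
    """Keep only the items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items fall under each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


winter = refine(ITEMS, {"season": "winter"})
print(facet_counts(winter, "heel type"))
print([item["name"] for item in refine(winter, {"heel type": "wedges"})])
# second line prints: ['item 1']
```

Because facets are independent, the user can apply selections in any order and reach the same result set.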
Your own classification design project
• Continue compiling potential concepts from source
documents.
• Use your audience and purpose, as well as your subject
knowledge, to refine the scope of your classification: its
boundaries and its central and peripheral areas.
• Break down your candidate concepts into components:
generate broader/narrower/linking concepts as appropriate.
• Define each concept’s particular meaning in the context of
your classification.
• Wrangle your concepts into a classified structure.
Assignment progress check
• Bring drafts to class next week for peer feedback
sessions (especially a draft of your classified
structure).
• Also, everyone will have a five-minute check-in with
me: you will tell me your subject, audience, and
purpose in a few sentences, and explain your
classified structure.