Faceted Classification
Complex subjects from
simpler components
INF 384 C, Fall 2009
Outline
• Prequel: Arrangement of concepts within
hierarchies.
• Goals of faceted classification.
• Basic design of faceted classifications.
• Facet analysis of complex subjects (factoring).
• Determination of facet structure.
• Faceted browsing on the Web.
• Assignment notes.
Arrangement within hierarchies
Two forms:
• Showing multiple principles of division that
relate children to their parent node (subfacets).
• Ordering via appropriate principles at each
array (Ranganathan’s canons and such).
Showing principles of division
(subfacets)
shoes
high heels
hiking boots
mary-janes
pumps
running shoes
sandals
slingbacks
stilettos
wedges
winter boots
shoes
  (by season)
    winter
    spring
  (by function)
    hiking
    running
  (by style)
    boots
    pumps
    sandals
  (by feature)
    slingbacks
    mary-janes
  (by heel type)
    stilettos
    wedges
  (by heel height)
    high heels
Ordering concepts at each level
An “array” is a group of siblings (descriptors at the
same level of hierarchy). These should be ordered, even
if you don’t need subfacets. Possible orders:
• General to specific.
• Chronological.
• Close to far away.
• Order of a process.
Example: music tempos
(alphabetical)
Allegro
Andante
Largo
Moderato
Presto
Vivace
(slowest to fastest)
Largo
Andante
Moderato
Allegro
Vivace
Presto
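A designed array order can be stored as an explicit list and used as a sort key in place of alphabetical order. The Python sketch below uses the tempo order from the slide; the names TEMPO_ORDER and sort_by_array_order are illustrative and not part of the course materials.

```python
# Array order taken from the slide: slowest to fastest.
TEMPO_ORDER = ["Largo", "Andante", "Moderato", "Allegro", "Vivace", "Presto"]


def sort_by_array_order(terms, array_order):
    """Sort terms by their position in a designed array order."""
    rank = {term: position for position, term in enumerate(array_order)}
    return sorted(terms, key=lambda term: rank[term])


alphabetical = sorted(TEMPO_ORDER)
print(alphabetical)                                    # alphabetical array
print(sort_by_array_order(alphabetical, TEMPO_ORDER))  # slowest to fastest
```

The same function works for any of the orders listed earlier (chronological, order of a process, and so on), as long as the order is written down once as a list.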
Motivations for faceted classification
• The sheer number of documents keeps growing.
• The subjects of the documents are both more specific and more
complex.
• Knowledge itself is rapidly expanding—new subjects are
constantly being created.
It’s not helpful to put huge numbers of documents in general
subject categories (British History, Nuclear Physics). And yet we
can’t possibly enumerate all the possible subjects that either
currently exist or may soon exist. What to do?
Goals of faceted classification
If we can create a classification scheme that lists
subject components, then we can build complex
subjects out of the components as necessary.
We facilitate the construction of complex
subjects by organizing the component concepts
that make up our classification into facets, or
potential aspects of the subject.
From compound to components
Example of complex subject:
The history of Japanese tea drinking etiquette
Components (or isolates, or factors): History + Japan + Tea +
Drinking + Etiquette
Potential fundamental categories (facets) for the components:
Disciplines (history); Locations (Japan); Beverages (Tea);
Activities (Drinking); Values (Etiquette)
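The tea example can be written down as a small data structure: each facet lists its components, and analyzing a complex subject means finding the facet for each component. This is a minimal Python sketch; FACETS, facet_of, and analyze are hypothetical names chosen for illustration.

```python
# Facets and their components, as listed in the example above.
FACETS = {
    "Disciplines": ["History"],
    "Locations": ["Japan"],
    "Beverages": ["Tea"],
    "Activities": ["Drinking"],
    "Values": ["Etiquette"],
}


def facet_of(component, facets):
    """Return the facet that lists the component, or None if it is unlisted."""
    for facet, components in facets.items():
        if component in components:
            return facet
    return None


def analyze(components, facets):
    """Map each component of a complex subject to its facet."""
    return {component: facet_of(component, facets) for component in components}


subject = ["History", "Japan", "Tea", "Drinking", "Etiquette"]
print(analyze(subject, FACETS))
```

A component that maps to None is a signal that the classification is missing a concept or a facet.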
Building subjects from components
A traditional faceted classification for libraries includes both the
facet structure of components and syntax rules for combining the
components into complex subjects.
These rules are necessary to ensure that documents are filed
consistently on shelves. (In an online environment, these rules
become superfluous.)
To “mechanize” the subject-building process and simplify filing,
components are given a notation (such as “soil acidity – sag”) that
clarifies the component’s position within a facet.
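One kind of syntax rule is a fixed order in which facets are combined, so that the same subject always produces the same filing string. The sketch below assumes such a rule; the facet order and every notation code in it are invented for illustration and do not come from any real scheme.

```python
# Hypothetical syntax rule: always combine facets in this fixed order.
FACET_ORDER = ["Beverages", "Activities", "Values", "Locations", "Disciplines"]

# Hypothetical notations; a real scheme assigns its own codes.
NOTATION = {
    ("Beverages", "Tea"): "b2",
    ("Activities", "Drinking"): "c1",
    ("Values", "Etiquette"): "v4",
    ("Locations", "Japan"): "l7",
    ("Disciplines", "History"): "d3",
}


def build_class_mark(components):
    """Join component notations in FACET_ORDER, whatever the input order."""
    ordered = sorted(components, key=lambda pair: FACET_ORDER.index(pair[0]))
    return " ".join(NOTATION[pair] for pair in ordered)


tea_etiquette = [
    ("Disciplines", "History"),
    ("Locations", "Japan"),
    ("Beverages", "Tea"),
    ("Activities", "Drinking"),
    ("Values", "Etiquette"),
]
print(build_class_mark(tea_etiquette))
```

Because the order comes from the rule and not from the classifier, two people who analyze the same document into the same components file it in the same place.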
Structure of faceted classifications
While a facet may be a simple list, components within a facet are
typically arranged hierarchically (using a stricter or looser sense
of hierarchy as appropriate).
Organic farming classification
Crops
  Fruits
    (by origin)
      Vines
        Grapes
      Bushes
      Trees
  Vegetables
  Herbs
Processes
  Planting
  Controlling pests
  Fertilizing
Materials
  Natural soil amendments
    Compost
    Mulch
  Natural pesticides
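A hierarchical facet structure maps naturally onto nested dictionaries. The sketch below encodes one reading of the organic farming outline (three facets: Crops, Processes, Materials) and omits the "(by origin)" subfacet label for brevity; the grouping of terms is an interpretation of the slide, and CLASSIFICATION and find_path are hypothetical names.

```python
# One reading of the outline above, stored as nested dicts (leaf = empty dict).
CLASSIFICATION = {
    "Crops": {
        "Fruits": {"Vines": {"Grapes": {}}, "Bushes": {}, "Trees": {}},
        "Vegetables": {},
        "Herbs": {},
    },
    "Processes": {"Planting": {}, "Controlling pests": {}, "Fertilizing": {}},
    "Materials": {
        "Natural soil amendments": {"Compost": {}, "Mulch": {}},
        "Natural pesticides": {},
    },
}


def find_path(tree, target, trail=()):
    """Return the path from a facet down to target, or None if absent."""
    for label, children in tree.items():
        path = trail + (label,)
        if label == target:
            return path
        found = find_path(children, target, path)
        if found:
            return found
    return None


print(" > ".join(find_path(CLASSIFICATION, "Grapes")))
```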
Designing faceted classifications
1. Decompose complex concepts (which you have
gathered via your research into the subject literature)
into component parts, via syntactic or semantic
factoring.
2. Group the simple components into fundamental
categories.
3. Organize the components in each facet (with
hierarchical relationships, subfacets that indicate
multiple principles of division, order within arrays,
and so on).
Understanding complex concepts
There are two kinds of compounds:
• A multi-word unit (which may be a simple
concept, such as stained glass, or a complex
concept, such as glass cutting).
• A multi-concept unit (which may be a single
word, such as sourdough).
Syntactic and semantic factoring
Syntactic factoring: A term with multiple words is
divided into smaller components.
Example: rye bread into rye + bread; Irish emigration
into emigration + Irish
Semantic factoring: A term is divided into multiple
elementary concepts.
Example: apartment into dwelling + rental + shared
building.
Semantic factoring
Most standards/authorities don’t recommend semantic factoring,
and there aren’t rules you can use to help with it.
But semantic factoring can sometimes help you discover missing
concepts in your subject language.
It might be extreme to describe Passover as “holiday + Jewish +
commemoration + Exodus,” but doing so might make us consider
both religion and commemoration of events as aspects common to
many holidays.
Parsing compounds
A compound term consists of a focus (the class of things
or events) and a difference, which modifies the class and
makes a subclass.
Examples:
• Car tires: Focus is tires, difference is cars.
• Opera singing: Focus is singing, difference is opera.
• Mushroom hunter: Focus is hunter, difference is
mushroom.
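For the examples above, the focus is the last word of the compound and the difference is what precedes it. The sketch below relies on that pattern as an assumption; it is a common pattern in English noun compounds, not a general rule, and parse_compound is a hypothetical helper.

```python
def parse_compound(term):
    """Split a multi-word English compound into (focus, difference).

    Assumption: the focus is the last word and everything before it is the
    difference. This holds for the examples above but not for every compound.
    """
    words = term.lower().split()
    if len(words) < 2:
        raise ValueError("not a multi-word compound: " + term)
    return words[-1], " ".join(words[:-1])


for term in ["Car tires", "Opera singing", "Mushroom hunter"]:
    focus, difference = parse_compound(term)
    print(f"{term}: focus={focus}, difference={difference}")
```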
Action/patient factoring
If the term contains an action (focus) modified by the recipient of
the action (difference), factor.
But if the term refers to a material (focus) as modified by an
action (difference), don’t factor.
Example:
Hair dyeing: hair + dyeing
Bronze engraving: bronze + engraving
But don’t factor: dyed hair, engraved bronzes
Part/whole factoring
If the focus refers to a part or property, and the difference refers
to the whole or the possessor of the part or property, factor.
But if the focus is the whole, and the difference is the part or
property, don’t factor.
Examples:
Soil acidity: soil + acidity
Library shelves: libraries + shelves
Don’t factor: acid soils, spare tires.
Action/performer factoring
If the term contains an intransitive action (focus) modified by the
performer (difference), factor.
If the performer (focus) is modified by its performance of an
intransitive action (difference), don’t factor.
Examples:
Student meeting: students + meetings
Lemur migration: lemurs + migrations
But don’t factor: migratory birds
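The three factoring rules (action/patient, part/whole, action/performer) share one shape: a pair of roles for the focus and the difference decides whether to factor. The sketch below records them as a lookup table. The role labels are hypothetical names, and a person still has to decide which role each word plays.

```python
# (focus role, difference role) pairs that the three rules say to factor.
FACTOR = {
    ("action", "patient"),                  # hair dyeing
    ("part_or_property", "whole"),          # soil acidity
    ("intransitive_action", "performer"),   # lemur migration
}

# The reversed pairs, which the rules say to leave whole.
KEEP_WHOLE = {
    ("material", "action"),                 # dyed hair
    ("whole", "part_or_property"),          # acid soils
    ("performer", "intransitive_action"),   # migratory birds
}


def should_factor(focus_role, difference_role):
    """Return True or False per the rules, or None when no rule applies."""
    pair = (focus_role, difference_role)
    if pair in FACTOR:
        return True
    if pair in KEEP_WHOLE:
        return False
    return None


print(should_factor("action", "patient"))    # hair dyeing
print(should_factor("material", "action"))   # dyed hair
```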
Determination of facet structure
Ranganathan started from the top down: describing fundamental
categories (PMEST) for all subjects and organizing components
into those universal facets.
The Classification Research Group (CRG), as described by
Vickery, advocates beginning from the bottom up: reviewing
components and assigning preliminary fundamental categories
based on the concept’s definition within the classification’s
domain, then looking for commonalities in these preliminary
choices. Facets are specific to each classification.
Principles for creating facets
Spiteri, 1998, synthesized the following facet design
principles from Ranganathan and the CRG:
• Differentiation.
• Relevance.
• Ascertainability.
• Permanence.
• Homogeneity.
• Mutual exclusivity.
Differentiation principle
When creating facets that split a group of entities,
choose a principle of division that cleanly splits the
group into component parts.
For example, dividing people by gender creates two
generally unambiguous categories. However, dividing
socks according to color can cause problems when
considering socks with multiple colors; color does not
provide the same level of differentiation for socks as
gender does for people.
More facet design principles
• Relevance: Choose facets based on the purpose of the
classification. A classification of gardening might divide terrain
by sun exposure, but a classification of cycling might divide
terrain by elevation.
• Ascertainability: When possible, choose facets that can be
reliably measured.
• Permanence: When possible, choose facets that will not
change over time.
Final facet design principles
• Homogeneity: Each facet (or subfacet) should represent a
single principle of division. For example, if we are classifying
socks, we should not see colors and patterns in the same array.
We would need to separate patterns and colors.
• Mutual Exclusivity: The contents of any two facets (or
subfacets) should not overlap (that is, they should be mutually
exclusive). If we are dividing shoes by heel height and by form,
we should not find any mixing of values for either facet (for
example, we should not see “high-heeled pumps” in the form
facet, but merely “pumps”).
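Part of the mutual exclusivity check can be automated by looking for labels that appear in more than one facet. The sketch below is a rough aid only: it catches identical labels, not blended values such as "high-heeled pumps", which still need a human eye. The sample data is invented for illustration, including "flats" and the deliberate error.

```python
from collections import defaultdict


def find_overlaps(facets):
    """Return {value: sorted facet names} for values listed in 2+ facets."""
    seen = defaultdict(set)
    for facet, values in facets.items():
        for value in values:
            seen[value].add(facet)
    return {value: sorted(names) for value, names in seen.items() if len(names) > 1}


# Illustrative data with a deliberate error: "high heels" is in both facets.
shoes = {
    "heel height": ["high heels", "flats"],
    "form": ["pumps", "sandals", "high heels"],
}
print(find_overlaps(shoes))
```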
Faceted browsing on the Web
Hearst’s Flamenco is an interface to support browsing of faceted
structures on the Web.
The Hearst article that you read describes how users preferred the
faceted browsing interface to a search engine when exploring the
collection.
(Note that the facets that Hearst used in the Flamenco system are
semi-automatically generated and not, perhaps, the best that one
might create...)
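The core mechanics of faceted browsing are narrowing a collection by selected facet values and showing counts for the values that remain. The sketch below illustrates that general idea only; it is not how Flamenco is implemented, and the items and function names are hypothetical.

```python
from collections import Counter

# Hypothetical collection: each item carries one value per facet.
ITEMS = [
    {"title": "A", "function": "hiking", "season": "winter"},
    {"title": "B", "function": "running", "season": "spring"},
    {"title": "C", "function": "hiking", "season": "spring"},
]


def narrow(items, selections):
    """Keep items that match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selections.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet."""
    return Counter(item[facet] for item in items if facet in item)


hits = narrow(ITEMS, {"function": "hiking"})
print([item["title"] for item in hits])
print(facet_counts(hits, "season"))
```

Showing the counts after each selection lets the user see which further choices will still return documents.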
Your continuing mission
• Continue narrowing down your list of potential
concepts for your classification: use your sense of the
classification’s audience and purpose, as well as your
subject knowledge, to more clearly define the scope
of your classification, its boundaries and its central
and peripheral areas.
• Continue defining each concept’s meaning in the
context of your classification.
• Continue working on a classified structure (one or
multiple hierarchies) for your concepts.
A few assignment notes
Brevity is nice for concept labels, but it’s more
important to specify the precise extent of the
concept clearly.
If you mean “taking pictures with a digital
camera,” don’t use the label “digital camera.”
Equivalence
If you’ve identified several synonymous terms for a concept,
select one term for the label. You can mention the others in a
usage note in the alphabetical structure.
Example
Cockroaches
Definition: Common household pests that may enter a home seeking food or
water. Because they can endure extremes of temperature and long periods
without sustenance, they can be difficult to eradicate once they invade a
dwelling.
Usage note: Water bugs is a synonym for this term. Class documents that refer
to water bugs here.
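The usage note amounts to a lookup from each non-preferred term to the preferred label. A minimal sketch, with USE and preferred_label as hypothetical names:

```python
# Non-preferred term -> preferred label, taken from the usage note above.
USE = {"water bugs": "Cockroaches"}


def preferred_label(term, use=USE):
    """Return the preferred label for a term; unknown terms pass through."""
    return use.get(term.strip().lower(), term)


print(preferred_label("Water bugs"))
print(preferred_label("Cockroaches"))
```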
Non-subject concepts
Don’t include document attributes that aren’t subjects, such as forms or genres
(blogs, articles, books, diaries...).
You are creating a representation of a subject that can be used to organize
documents; you are not describing the types of documents in which users might
be interested.
Include in your classification: terms for concepts that relate to gardening, such
as types of plants (grasses, cacti, shrubs).
Do not include in your classification: Document types that list such plants
(plant databases, seed catalogs). However, you might use your classification to
categorize a cactus database with the Cacti concept...
Assignment progress checks
• Drop by extra office hours this week (Monday 4:00 to
5:30; Wednesday and Thursday from 2:00 to 5:00).
(Not required but encouraged.)
• Be prepared to tell me your subject, audience, and
purpose in a few sentences, and explain how your
classified structure represents your theory of the
subject.
• Next week, bring your classified structure to class;
we’ll have a peer feedback session.