Classification basics

A classification is a system of related categories used to put things in groups. We can create
classifications for anything: biological organisms, art objects, clothing. In information studies, we are
most often creating classifications for organizing documents, and these classifications are often based on
the subject of those documents.
Classifications are typically expressed in a hierarchical structure. A hierarchy takes a broad category of
things or concepts (like “living organisms” or “the bibliographic universe” or “soil science”) and
successively divides that category into levels of more specific groups. A hierarchy shows relationships
between the categories that it defines. By using a hierarchy, we establish how, for example, cats are more
similar to dogs than to fish.
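One way to see how a hierarchy encodes similarity is to store it as child-to-parent links and ask how deep the most specific class shared by two members is. The sketch below is a minimal illustration, assuming a tiny made-up hierarchy; the class names (apart from "living organisms") and the helpers `ancestors` and `shared_depth` are hypothetical.

```python
# A minimal sketch: a hierarchy stored as child -> parent links.
# These class names are illustrative, not taken from any real taxonomy.
PARENT = {
    "animals": "living organisms",
    "mammals": "animals",
    "fish": "animals",
    "cats": "mammals",
    "dogs": "mammals",
}


def ancestors(cls):
    """Return the chain from cls up to the root, starting with cls itself."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def shared_depth(a, b):
    """Depth (root = 0) of the most specific class that contains both a and b."""
    b_chain = set(ancestors(b))
    for cls in ancestors(a):
        if cls in b_chain:
            return len(ancestors(cls)) - 1
    return -1  # no shared class: a and b sit in separate hierarchies


print(shared_depth("cats", "dogs"))  # 2 (both are mammals)
print(shared_depth("cats", "fish"))  # 1 (they share only "animals")
```

The deeper the shared class, the more the two members have in common; cats and dogs meet at "mammals," while cats and fish meet only at the broader "animals."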
A well-constructed hierarchy exhibits several characteristics:
- Classes descend from a broader class by the same principle of division.
- Classes are jointly exhaustive and mutually exclusive.
- Classes at the same level are at a similar level of abstraction.
- The progression from one level to another is gradual, not abrupt.
Example
Shoes
    Pumps
    High heels
    Running shoes
    Sandals
    Dress shoes
    Vegan shoes
    Trippen Tidy, black, size 37
    Comfy shoes
While each of these narrower classes is a smaller group of shoes, they divide the broader class by
different characteristics (form, function, material, brand, fit). As a result, the level (or array) lacks mutual
exclusivity: a pair of pumps could also be vegan and comfy. Moreover, the classes sit at very different levels
of abstraction (a single product, “Trippen Tidy, black, size 37,” appears beside broad categories such as
sandals), which makes the list harder to parse. On a more subtle level, there is no discernible order in the
array: why are pumps first and comfy shoes last?
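The failure of the shoe array can be made explicit by recording, for each narrower class, the characteristic it divides the parent by, and then checking whether the siblings share a single principle of division. This is a minimal sketch; the characteristic assigned to each class is an interpretation of the example, and the helper names are hypothetical.

```python
from collections import defaultdict

# Illustrative annotations: the characteristic each narrower class of "Shoes"
# appears to divide by. These labels are an interpretation, not a published scheme.
ARRAY = {
    "Pumps": "form",
    "High heels": "form",
    "Running shoes": "function",
    "Sandals": "form",
    "Dress shoes": "function",
    "Vegan shoes": "material",
    "Trippen Tidy, black, size 37": "brand",
    "Comfy shoes": "fit",
}


def principles_of_division(array):
    """Group sibling classes by the characteristic they divide the parent by."""
    groups = defaultdict(list)
    for cls, characteristic in array.items():
        groups[characteristic].append(cls)
    return dict(groups)


def uses_single_principle(array):
    """True only if every sibling divides the parent by the same characteristic."""
    return len(set(array.values())) == 1


print(uses_single_principle(ARRAY))  # False
print(sorted(principles_of_division(ARRAY)))
# ['brand', 'fit', 'form', 'function', 'material']
```

An array that passes this check (for example, pumps, sandals, and boots, all divided by form) is a necessary but not sufficient condition for mutual exclusivity; the other characteristics of a good hierarchy still have to be judged by hand.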
Structurally, bibliographic classifications (those used to organize documents) may be enumerative or
synthetic. An enumerative classification lists all possible classes. A synthetic classification enables some
classes to be created as documents are classified, by the combination of smaller elements according to
defined rules. One form of synthetic classification is the faceted, or analytico-synthetic, classification.
In a faceted classification, a class is formed by combining terms from multiple parallel hierarchies that
each follow a different principle of division. So if we were developing a faceted classification of
shoes, we might have the following facets:
Example
Form (sandals, pumps, boots)
Material (leather, plastic)
Heel height (flat, low, high)
Heel shape (wedge, stiletto, stacked)
Function (running, walking)
Occasion (dressy, work, casual)
Brand
Terms from each facet would be combined (again, according to a syntax) to create a specific class. We
would not need to define “low wedge work leather sandal” in advance of classifying that item. Each facet
might encompass a hierarchy of its own, by the way: hierarchical and faceted classifications are not
oppositional. Faceted and enumerative classifications are oppositional, although many largely
enumerative systems include some synthetic elements: the DDC (Dewey Decimal Classification), for example,
includes facilities for making any class specific to a particular location.
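The synthesis step can be sketched in a few lines: each facet lists its permitted terms, and a syntax fixes the order in which chosen terms are combined. This is a minimal sketch, assuming singular forms of the facet terms above and a hypothetical citation order chosen so that the output reproduces the “low wedge work leather sandal” example; `FACETS`, `CITATION_ORDER`, and `synthesize` are illustrative names, not part of any real scheme.

```python
# Facets and terms from the shoe example (singular forms); brand is open-ended.
FACETS = {
    "form": {"sandal", "pump", "boot"},
    "material": {"leather", "plastic"},
    "heel_height": {"flat", "low", "high"},
    "heel_shape": {"wedge", "stiletto", "stacked"},
    "function": {"running", "walking"},
    "occasion": {"dressy", "work", "casual"},
    "brand": None,  # None means any brand name is accepted
}

# Hypothetical syntax: the order in which chosen facet terms are combined.
CITATION_ORDER = ["brand", "heel_height", "heel_shape", "function",
                  "occasion", "material", "form"]


def synthesize(terms):
    """Validate one term per chosen facet and join the terms in citation order."""
    for facet, term in terms.items():
        if facet not in FACETS:
            raise KeyError(f"unknown facet: {facet}")
        allowed = FACETS[facet]
        if allowed is not None and term not in allowed:
            raise ValueError(f"{term!r} is not a term in facet {facet!r}")
    return " ".join(terms[f] for f in CITATION_ORDER if f in terms)


print(synthesize({"form": "sandal", "material": "leather", "heel_height": "low",
                  "heel_shape": "wedge", "occasion": "work"}))
# low wedge work leather sandal
```

None of the combined classes is listed in advance; only the facets, their terms, and the combination rule are. Facets a particular item does not need (here, function and brand) are simply left out.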
Library classifications use a notation to create a linear order of items in the collection, which is then
reproduced on the shelf. This linear order needs to reflect the information expressed by the classification
assignment. Describing rules to attain this linear order via a faceted classification is not a trivial task, as
the extended discussion in Mills regarding facet order makes apparent. In a digital environment, notation
is not an issue in the same way, although you could apply the same reasoning to order lists of
nested classes and associated items, or to sort search results (think about how such lists are currently
arranged, or not arranged).
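A toy illustration of why facet order matters for the linear order: give each facet term a one-character code, concatenate the codes in a chosen citation order to form a call number, and sort. The codes, items, and function names below are hypothetical, invented only for this sketch.

```python
# Hypothetical notation: each facet term gets a one-character code.
NOTATION = {
    "form": {"boot": "A", "pump": "B", "sandal": "C"},
    "material": {"leather": "1", "plastic": "2"},
    "occasion": {"casual": "a", "dressy": "b", "work": "c"},
}

# Hypothetical items, each described by one term per facet.
ITEMS = [
    {"form": "sandal", "material": "plastic", "occasion": "casual"},
    {"form": "boot", "material": "leather", "occasion": "work"},
    {"form": "sandal", "material": "leather", "occasion": "work"},
    {"form": "boot", "material": "plastic", "occasion": "casual"},
    {"form": "pump", "material": "leather", "occasion": "dressy"},
]


def call_number(item, citation_order):
    """Concatenate facet codes in citation order into a sortable notation."""
    return "".join(NOTATION[facet][item[facet]] for facet in citation_order)


def shelf_order(items, citation_order):
    """Return the call numbers in linear (shelf) order.

    Every code is one character, so each position in a call number always
    belongs to the same facet and plain string sorting is safe.
    """
    return sorted(call_number(item, citation_order) for item in items)


print(shelf_order(ITEMS, ["form", "material", "occasion"]))
# ['A1c', 'A2a', 'B1b', 'C1c', 'C2a']
print(shelf_order(ITEMS, ["material", "form", "occasion"]))
# ['1Ac', '1Bb', '1Cc', '2Aa', '2Ca']
```

With form cited first, the two boots sit side by side; with material cited first, all leather shoes come before all plastic ones and the boots are split apart. The items and their facet terms are identical in both runs; only the citation order, and therefore the collocation on the shelf, changes.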
Warrant
Warrant describes the basis by which concepts are included in a classification and related to each other. A
longstanding debate involves whether warrant should be based on what is written (literary warrant) or
on what constitutes knowledge, independent of any particular body of literature. As Beghtol notes, there are
many potential “warrants,” and a classification may employ several of them, consciously or not. A particular
warrant may imply a top-down (general categories first) or bottom-up (specific categories first) approach to
classification design. In practice these approaches are often mixed.
Structural labels
When discussing the organization of documents, we often call the related system of classes we make a
classification. These classifications often use hierarchical structure. A taxonomy is another name for a
hierarchical structure. Taxonomy is often associated with biology. An ontology is yet another name for a
system of classes. In computer science, an ontology often refers to a system of classes that includes more
information than is typical for a bibliographic classification: constraints on what can be a member of a
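certain class, for example. These rules can be used for inferencing: deriving facts, such as an item's class
membership, that were never explicitly stated. A folksonomy is a set of tags, or terms, typically contributed
by users of a system. Each term can be said to represent a class. A folksonomy has no structure: its terms
are not formally related to one another.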
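To make “inferencing” concrete, here is a toy sketch in which each class carries a membership rule, so an item's classes are derived from its properties rather than assigned by hand. It is not a real ontology language such as OWL; the class names, rules, and the `infer_classes` helper are hypothetical.

```python
# Toy ontology, illustrative only: each class carries a membership rule
# stated over an item's properties.
RULES = {
    "high-heeled shoe": lambda item: item.get("type") == "shoe"
    and item.get("heel_height") == "high",
    "flat shoe": lambda item: item.get("type") == "shoe"
    and item.get("heel_height") == "flat",
    "leather footwear": lambda item: item.get("type") in {"shoe", "boot"}
    and item.get("material") == "leather",
}


def infer_classes(item):
    """Return, sorted, every class whose membership rule the item satisfies."""
    return sorted(name for name, rule in RULES.items() if rule(item))


pump = {"type": "shoe", "heel_height": "high", "material": "leather"}
print(infer_classes(pump))  # ['high-heeled shoe', 'leather footwear']
```

Nobody stated that this pump belongs to either class; both memberships follow from the rules. In a bibliographic classification, by contrast, a classifier assigns the class, and in a folksonomy a user applies a tag with no rule behind it.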