ORGANIZING INFORMATION ASSETS
Understanding Taxonomies and More
As organizations purchase and create more and more digital content, employees find it
increasingly challenging to find and reuse specific pieces of information from the growing
quantities of unstructured information contained in shared drives, intranets and portal sites.
Embedded “search” functionality is expected in almost any software product, and many
search engine vendors promise automatic, dynamic classification of content. Some go so far as
to suggest that using a pre-defined taxonomy as a framework for tagging documents is
unnecessary because content can be automatically classified “on the fly.” What is the reality? What
value does a taxonomic structure provide in search efficiency and retrieval precision?
The reality is that until computers can consistently and accurately recognize concepts, in addition
to terms or character strings, using a taxonomy as a framework for categorizing documents will
aid in navigation and retrieval. Natural language searching and keyword searching return large
result sets but can miss essential content that does not contain the specific search terms, as well
as articles and documents about concepts that are described in various ways. A
taxonomy can be counted on to improve search precision, facilitate discovery when drilling down
into a subject hierarchy and provide a window into the knowledge domain of the organization.
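The difference between matching character strings and matching concepts can be illustrated with a minimal sketch. The documents, the category name and both function names below are hypothetical and are not drawn from any particular search product; the sketch assumes each document was tagged with a taxonomy category when it was indexed.

```python
# Hypothetical documents; "tags" holds taxonomy categories assigned at indexing time.
DOCS = [
    {"id": 1, "text": "Quarterly layoffs announced at the plant", "tags": {"Workforce Reduction"}},
    {"id": 2, "text": "Company plans downsizing of regional staff", "tags": {"Workforce Reduction"}},
    {"id": 3, "text": "New product launch scheduled for spring", "tags": {"Product Launches"}},
]

def keyword_search(term):
    """Return ids of documents whose text contains the literal term."""
    return [d["id"] for d in DOCS if term.lower() in d["text"].lower()]

def category_search(category):
    """Return ids of documents tagged with the given taxonomy category."""
    return [d["id"] for d in DOCS if category in d["tags"]]

print(keyword_search("layoffs"))               # only the document that uses that exact word
print(category_search("Workforce Reduction"))  # every document tagged with the concept
```

The keyword search misses the document that says "downsizing", while the category search retrieves both documents because the concept was recorded as a tag.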
In this unit of the Information Professional Resource Center, the following resources will help you
become better acquainted with the importance of taxonomies:
• Taxonomy FAQ
• Taxonomy options
• Importance of a taxonomy
Additional reading: http://www.sla.org/content/resources/infoportals/taxonomies.cfm
See also: White papers available at www.Factiva.com/collateral:
• How to Utilize Enterprise Information Architecture to Enable Enterprise Information Integration
• Making Solid Business Decisions through Intelligent Indexing Taxonomies
Information professionals bring unique skills to the area of information management and can play
a pivotal role in designing an information architecture that will help knowledge workers quickly
find the information they need for their work and leverage intellectual assets of the organization.
Unit 4 – Organizing Information Assets
Factiva – A Dow Jones & Reuters Company
Taxonomy FAQ
How is taxonomy defined?
A taxonomy is a hierarchical structure for organizing information; the term also refers to the
science of classification according to a predetermined system. It is commonly used in biology and
other natural sciences to refer to a means for classifying a living organism in relation to other
similar organisms. In a biological taxonomy, an organism occupies one specific place, while in a
lexical system a concept term can be placed in more than one category if appropriate.
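As a rough illustration of a concept term that sits in more than one category, the sketch below stores a small lexical taxonomy as a mapping from categories to child terms. The category and term names are invented for the example.

```python
# Hypothetical lexical taxonomy: each category maps to its child terms.
TAXONOMY = {
    "Energy": ["Solar Power", "Oil and Gas"],
    "Environment": ["Solar Power", "Recycling"],
}

def parents_of(term):
    """List every category under which a term is filed."""
    return sorted(cat for cat, children in TAXONOMY.items() if term in children)

print(parents_of("Solar Power"))  # filed under two categories
print(parents_of("Recycling"))    # filed under one category
```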
What is a controlled vocabulary?
A controlled vocabulary is a closed list of terms (or phrases) that is used for consistently indexing
or labeling items admitted to content repositories. The controlled vocabulary terms are also used
to increase relevance in the search process. An alphabetic list of terms, a hierarchical list of
terms, and a list of related terms are examples of controlled vocabularies. A taxonomy is a type of
controlled vocabulary, generally a hierarchical view of the controlled vocabulary. The controlled
vocabulary can be thought of as the glue that binds together related content objects produced in
various departments or business units across the organization.
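One way to picture a controlled vocabulary at work is as a lookup that maps the varied labels people use to a single preferred term from a closed list. The terms, variants and the `index_term` helper below are assumptions made for illustration only.

```python
# Hypothetical closed list: each preferred term with the variants that map to it.
PREFERRED = {
    "Mergers and Acquisitions": {"m&a", "merger", "acquisition", "takeover"},
    "Human Resources": {"hr", "personnel"},
}

# Build a lookup from every lower-cased variant (and the preferred term itself).
VARIANT_TO_PREFERRED = {}
for term, variants in PREFERRED.items():
    VARIANT_TO_PREFERRED[term.lower()] = term
    for variant in variants:
        VARIANT_TO_PREFERRED[variant] = term

def index_term(raw):
    """Map a free-text label to its preferred term, or None if it is outside the vocabulary."""
    return VARIANT_TO_PREFERRED.get(raw.strip().lower())

print(index_term(" Takeover "))
print(index_term("blockchain"))
```

Because the list is closed, a label that is not in the vocabulary returns nothing rather than creating a new indexing term.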
What is the difference between a taxonomy and a thesaurus?
As noted above, a taxonomy is a set of terms arranged hierarchically while a thesaurus typically
shows relationships between terms, including broader terms, narrower terms, related terms,
preferred terms and “use for” terms.
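The relationships a thesaurus records can be sketched as a simple record type. The field names below mirror the relationship types listed above; the sample entry and the `expand_query` helper are hypothetical, and expanding a query with "use for" and narrower terms is only one possible use of such an entry.

```python
from dataclasses import dataclass, field

@dataclass
class ThesaurusEntry:
    term: str                                      # the preferred term
    broader: list = field(default_factory=list)    # broader terms
    narrower: list = field(default_factory=list)   # narrower terms
    related: list = field(default_factory=list)    # related terms
    use_for: list = field(default_factory=list)    # non-preferred synonyms

ENTRY = ThesaurusEntry(
    term="Automobiles",
    broader=["Motor Vehicles"],
    narrower=["Electric Cars", "Sports Cars"],
    related=["Auto Parts"],
    use_for=["Cars", "Autos"],
)

def expand_query(entry):
    """Expand a search on the preferred term with its synonyms and narrower terms."""
    return [entry.term] + entry.use_for + entry.narrower

print(expand_query(ENTRY))
```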
What is metadata?
Metadata is broadly defined as ‘data about data’—including subject indexing terms and other
properties such as author, language, date of creation, etc. that will make it easy to find a
particular record or informational artifact.
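A metadata record can be pictured as a small set of named properties attached to each item. The records, property names and the `find` helper below are illustrative assumptions, not a standard schema.

```python
from datetime import date

# Hypothetical metadata records: subject indexing terms plus other properties.
RECORDS = [
    {"title": "Pricing Strategy Memo", "author": "J. Smith", "language": "en",
     "created": date(2004, 3, 1), "subjects": ["Pricing", "Strategy"]},
    {"title": "Note sur les prix", "author": "A. Martin", "language": "fr",
     "created": date(2004, 5, 12), "subjects": ["Pricing"]},
]

def find(records, subject=None, **properties):
    """Return titles of records matching a subject term and any exact property values."""
    hits = []
    for record in records:
        if subject is not None and subject not in record["subjects"]:
            continue
        if all(record.get(key) == value for key, value in properties.items()):
            hits.append(record["title"])
    return hits

print(find(RECORDS, subject="Pricing", language="en"))
```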
How is a taxonomy developed?
Developing a taxonomy is a complex effort that involves, at a high level, defining types of content
housed in enterprise systems, identifying the vocabulary used to retrieve content and
understanding hierarchies and relationships between terms. Using a manual approach to
developing a taxonomy typically involves subject matter experts:
• Examining a representative set of exemplary documents
• Extracting, classifying and organizing significant terms and concepts
• Mapping synonyms to terms and concepts
• Evaluating other term lists and selecting appropriate terms to be added to the taxonomy
• Working with content creators and users to agree on definitions
• Reaching agreement on terms to be included in the controlled vocabulary for indexing and for classification
Categorization software can speed up the process considerably by automating the document
analysis and term extraction process. Exemplary documents are used as the basis for training the
system and automatically generating rules that determine the categories into which content will
be placed.
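The idea of training on exemplary documents and generating rules can be sketched in a deliberately simplified form. The sketch below assumes that a "rule" is just a set of terms that recur in a category's exemplary documents; commercial categorization software uses far richer analysis, and the training data and function names here are hypothetical.

```python
import re
from collections import Counter

# Hypothetical exemplary documents, already grouped by subject matter experts.
TRAINING = {
    "Finance": ["Quarterly earnings beat revenue forecast",
                "Earnings and revenue guidance raised"],
    "Legal": ["Court ruling on patent lawsuit",
              "Patent lawsuit settlement approved by court"],
}

def tokenize(text):
    """Split text into lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def learn_rules(training, min_count=2):
    """For each category, keep terms seen at least min_count times in its exemplary documents."""
    rules = {}
    for category, docs in training.items():
        counts = Counter(tok for doc in docs for tok in tokenize(doc))
        rules[category] = {tok for tok, n in counts.items() if n >= min_count}
    return rules

def classify(text, rules):
    """Assign the category whose rule terms overlap most with the text, or None if none match."""
    tokens = set(tokenize(text))
    scores = {cat: len(tokens & terms) for cat, terms in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

RULES = learn_rules(TRAINING)
print(classify("Judge dismisses patent lawsuit", RULES))
```

Feeding additional exemplary documents into `learn_rules` changes the generated term sets, which is the sense in which such a system "learns" in this toy version.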
What is a faceted classification scheme?
Facets are various views or attributes of a topic or object. A faceted classification scheme
attempts to present all aspects of a subject in a way that enables the user to easily focus on the
aspect(s) of interest—for example, by clicking on a folder containing the sub-topic(s) of interest.
When searching on a company name, a large number of results might be conveniently grouped
into categories such as: products, geography, competitors, and intellectual property—depending
on the body of content being searched.
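Grouping results by facet, as in the company-name example, might look like the following sketch. The result titles are invented, and the sketch assumes each result already carries a single facet label assigned during indexing.

```python
from collections import defaultdict

# Hypothetical search results for a company name, each labeled with a facet value.
RESULTS = [
    {"title": "New tablet line unveiled", "facet": "Products"},
    {"title": "Expansion into Brazil", "facet": "Geography"},
    {"title": "Rival cuts prices", "facet": "Competitors"},
    {"title": "Patent granted for battery design", "facet": "Intellectual Property"},
    {"title": "Tablet recall announced", "facet": "Products"},
]

def group_by_facet(results):
    """Collect result titles under their facet so the user can open one group at a time."""
    groups = defaultdict(list)
    for result in results:
        groups[result["facet"]].append(result["title"])
    return dict(groups)

for facet, titles in group_by_facet(RESULTS).items():
    print(f"{facet}: {len(titles)} result(s)")
```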
What are the benefits of automated classification?
Automated classification is accomplished with software designed to rapidly scan document sets
(or other content objects) and assign the objects to categories, sometimes according to an
underlying taxonomy, although a taxonomy is not always a part of the software. Benefits are
consistency in assigning categories, speed of processing and lower costs.
What are the benefits of manual classification?
Manual classification, done by individuals who are subject matter experts, tends to be accurate
because human judgment typically overcomes the ambiguity of language. However, manual
classification is usually slow and laborious, and thus costly. Manual classification has also
been found to be surprisingly subjective.
Hybrid systems take advantage of the speed of automated classification, but use human
expertise for creating and fine tuning rules and for spot checking accuracy of the automated
systems to provide optimal results.
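One possible arrangement for a hybrid system is sketched below: expert-maintained rules classify automatically, and items the rules are unsure about are queued for a person to check. The rule terms, the confidence measure and the 0.5 threshold are all assumptions chosen for illustration; the text above does not prescribe a specific mechanism.

```python
# Hypothetical rule set of the kind subject matter experts would create and fine tune.
RULES = {
    "Finance": {"earnings", "revenue", "dividend"},
    "Legal": {"lawsuit", "patent", "court"},
}

def auto_classify(text):
    """Return (category, confidence); confidence is the share of a category's terms found."""
    tokens = set(text.lower().split())
    best, best_score = None, 0.0
    for category, terms in RULES.items():
        score = len(tokens & terms) / len(terms)
        if score > best_score:
            best, best_score = category, score
    return best, best_score

def route(text, threshold=0.5):
    """Accept confident automated labels; send everything else to a human reviewer."""
    category, confidence = auto_classify(text)
    if category is not None and confidence >= threshold:
        return ("auto", category)
    return ("human_review", category)

print(route("earnings and revenue rose"))
print(route("patent news"))
```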
What are success criteria?
How do you know if a taxonomy project has been successful? A good taxonomy allows users to
1) search the way they think and 2) quickly retrieve the information they are seeking because the
content has been analyzed, categorized and labeled according to a lexical scheme that is
meaningful for a particular business environment. A taxonomy that reflects the language of the
business should favorably impact knowledge worker productivity.
Developing a Taxonomy
Taxonomies can be custom built by information scientists and business consultants who
specialize in this area. There are also taxonomies for many knowledge domains that can be
licensed or purchased. Whether the taxonomy will be used as a search tool, as a Web site
navigation aid or for tagging content in the content management system, it is crucial to devote
time and effort to creating a robust structure that reflects the language of your business.
Think about the following options for developing a taxonomy as you plan the strategy for
managing information in your organization:
• Build your own taxonomy, with in-house staff or with consulting assistance
• Build your own taxonomy using categorization software, with fine tuning by subject matter experts
• License a taxonomy closely related to your business and industry from an organization that has already created one, such as a content management software company, publisher, trade association or content aggregator (The taxonomy structure will need to be modified or built out to match your business content.)
• Implement a taxonomy that is in the public domain after adapting it to match your business content
Most organizations will not have to “start from scratch” to build a taxonomy. It is important to learn
about efforts to organize information already underway across the enterprise, perhaps at the
departmental or functional group level. For example, purchasing departments may have a
standardized list of raw materials purchased and information professionals likely have subject
lists used for identifying external content in their collections. Classification hierarchies and other
metadata generated for content management systems are valuable, as are database query logs.
Collecting and reusing keyword lists like these will provide the foundation for an organizational
taxonomy and will shorten the development process.
Predefined taxonomies are available for many disciplines at varying levels of granularity or depth.
These taxonomies result from extensive research as well as familiarity and long experience with
content pertaining to an industry or discipline. The taxonomy builders typically employ best
practices for categorizing content. A potential drawback is that many predefined taxonomies are
built for managing collections of books or journals and thus lack vocabulary relating to business
processes. This drawback is easily overcome by selecting the portion(s) of the predefined
taxonomy applicable to your environment; you can then revise sections of the taxonomy that do
not exactly meet your needs and incorporate additional terms and concepts specific to your
organization.
Software tools for creating taxonomies learn from representative “training” documents and
suggest categories based on the content. These tools are becoming more sophisticated and
more accurate. The underlying programs crawl designated content repositories and rely on
analysis of word patterns and occurrence of terms and complex business rules to group similar
documents or exclude documents from a category. New training documents can be fed into the
categorization engine; the software learns from the training documents and makes better
decisions going forward. Ideally, the programs enable the taxonomy structure to work in tandem
with search and retrieval capabilities and integrate seamlessly with other IT applications. If there
are ECM (enterprise content management) systems in use in your organization, they may have
taxonomy creation capabilities or such modules may be easily added.
Business users should be involved in the taxonomy development efforts—whether a manual
effort or a software solution—to build awareness of the information organization efforts, to take
advantage of their knowledge of specific areas of the business, and most important—to make
sure the system developed will help them efficiently find the information they need. Some
questions that should be explored in preliminary stages include:
• What types of items are stored in enterprise repositories?
• Can audio, video and image files be classified with content management software?
• Who will use this content?
• How do different user groups name content types?
• How quickly are content repositories growing?
• Can search and classification software under consideration scale to handle rapidly growing volumes of information?
• Is any content in languages other than English? If so, can the software being considered handle non-English materials?
Companies with experience in developing and deploying a taxonomy find that not all parts of the
taxonomy are relevant to all business units. However, a central taxonomy repository allows for the
most efficient updating and maintenance. Since a taxonomy is a living entity, developing it is only
the beginning of the process. There must be a commitment to testing the taxonomy in varied
applications and refining it as content evolves and as business conditions change. Regularly
scheduled reviews to add or change sections of the taxonomy will keep it fresh and in sync with
content changes.
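A central repository that serves different business units could be sketched as a single store from which each unit's view is derived, so that updates are made in one place. The node names, unit names and the `view_for` helper are hypothetical.

```python
# Hypothetical central taxonomy: each node lists the business units it is relevant to.
CENTRAL = {
    "Raw Materials": {"units": {"Purchasing", "Manufacturing"}},
    "Competitor Pricing": {"units": {"Sales", "Marketing"}},
    "Patents": {"units": {"Legal", "Manufacturing"}},
}

def view_for(unit):
    """Return the taxonomy nodes relevant to one business unit, derived from the central store."""
    return sorted(node for node, info in CENTRAL.items() if unit in info["units"])

print(view_for("Manufacturing"))

# A maintenance update is made once, centrally, and every derived view picks it up.
CENTRAL["Patents"]["units"].add("Sales")
print(view_for("Sales"))
```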
Value of a Taxonomy
Why do organizations care about a taxonomy? There might not be significant interest in a
taxonomy per se, but there is no doubt that organizations care about being able to efficiently find,
retrieve and repurpose content from enterprise systems. There is a frequently quoted statistic that
more than 80% of information housed in corporate repositories is unstructured data. Even for
small and mid-size companies, managing this amount of textual information is challenging. It is
more challenging for large global organizations.
For companies embracing a market-driven strategy, as opposed to a product strategy, reducing
time to market is as important as innovation and quality. Systematically organizing information,
especially unstructured information, is crucial for effectively managing the volume of content
accumulating in all enterprises. A sound underlying structure is what will enable the knowledge
worker to quickly retrieve relevant information from the desktop environment. What may appear to
the user as serendipitous discovery of useful information is actually possible because of careful
preparation and processing of the content. A taxonomy for labeling content is a cornerstone of
that infrastructure. Microsoft recently reported a 40% improvement in hit rates and a doubling of
satisfaction metrics using even a relatively primitive taxonomic system. Users spent significantly
less time trying to find a given document.
(http://www.it-director.com/article_pf.php?articleid=3757)
The lack of a common vocabulary and robust taxonomy structure used across business units can
effectively undermine employee productivity and ultimately, have a negative impact on the quality
of customer service. Benefits of an enterprise taxonomy include:
• The ability of one business group to leverage product and industry expertise of other groups
• Reduced “reinventing the wheel” syndrome and duplication of effort in creating intellectual capital
• Surfacing hidden assets when the taxonomy is employed in navigation schemes as well as in search and retrieval programs
• A more unified way of serving clients
• A more satisfactory user experience
The same principles of design that are used for internal content repositories should be used for
public Web sites to ensure that clients and potential clients have a satisfactory experience and
positive impression of an organization based on clarity, usability and accurate retrieval at the Web
site.
Information professionals can make a huge contribution to taxonomy discussions by keeping the
information-seeking behavior and information needs of users at the forefront of the discussions.