TaxonomyFAQs - Taxonomy Strategies

Taxonomy Strategies LLC
Frequently Asked Questions about
Taxonomies and Metadata
Ron Daniel
Taxonomy Strategies LLC
rdaniel@taxonomystrategies.com
November 8, 2007
Copyright 2007 Taxonomy Strategies LLC. All rights reserved.
Agenda
 FAQs – Frequently Asked Questions
 SAQs – Seldom Asked Questions
 Fun Questions
Taxonomy Strategies LLC
The business of organized information
Pop Quiz
On a blank piece of paper:
• What question(s) did you want to have answered
by coming to today’s talks?
Please provide your job title, division, and either
company or company type.
You do NOT have to provide your name.
What do other people ask about?
 How to build a taxonomy?
 Definitions of terms.
 How do I sell management on a taxonomy project?
 How to govern its use and maintenance?
 How do we maintain them?
 What’s the ROI?
 What are they for?
 How do we put them to use?
 How do we link them to content?
 How do they help search?
and many more…
[Chart: questions by topic: development, definitions, governance, ROI, basic taxonomy purpose, usage, tagging, search, selling, maintenance]
What is a taxonomy – just a folder
structure or something else?
 There is no agreed definition of what a “taxonomy” is.
 When talking with someone about taxonomy, make sure you are
talking about the same things.
 When we talk about a taxonomy, we are NOT only talking about a
website navigation scheme.
  Websites change frequently; we are looking for a more durable way to
deal with content so that different navigation schemes can be used over
time.
 We look at taxonomies and metadata together.
 We typically create a metadata specification that defines fields like
Title, Description, Date, Type, Subject, etc.
 Several fields (e.g. Type and Subject) have pre-defined lists of
allowed values.
 Those lists of values, flat or hierarchical, are “facets” within the
overall taxonomy.
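The idea of a metadata specification with controlled fields can be sketched in a few lines of Python. This is only an illustration: the field names come from the slide, but the allowed values, the `METADATA_SPEC` name, and the `validate_record` helper are all hypothetical.

```python
# Hypothetical metadata specification: field name -> allowed values,
# or None when the field is free text. All values are illustrative.
METADATA_SPEC = {
    "Title": None,
    "Description": None,
    "Date": None,
    "Type": {"Policy", "Form", "News", "Report"},
    "Subject": {"Finance", "Human Resources", "Products"},
}


def validate_record(record, spec=METADATA_SPEC):
    """Return a list of problems found in one metadata record."""
    problems = []
    for field, value in record.items():
        if field not in spec:
            problems.append(f"unknown field: {field}")
            continue
        allowed = spec[field]
        # Free-text fields (None) accept anything; controlled fields do not.
        if allowed is not None and value not in allowed:
            problems.append(f"{field}: '{value}' is not an allowed value")
    return problems


good = {"Title": "Travel policy", "Type": "Policy", "Subject": "Finance"}
bad = {"Title": "Q3 memo", "Type": "Memo", "Author": "J. Doe"}

print(validate_record(good))
print(validate_record(bad))
```

Here the sets attached to Type and Subject play the role of facets: each is a separate list of allowed values that can be maintained on its own.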
Other things sometimes called taxonomy
Type
Remarks
Synonym Ring
 Connects a series of terms together
 Treats them as equivalent for search purposes
e.g. (Dog, Canine, Pooch, Mutt), (Cat, Feline, Kitty), …
Authority File
 Used to control variant names with a preferred term
 Typically used for names of countries, individuals, organizations
e.g. (IBM, Big Blue, International Business Machines Inc.)
Classification Scheme
 A hierarchical arrangement of terms
 May or may not follow strict “is-a” hierarchy rules
 Usually enumerated, e.g. LC or Dewey
Thesaurus
 Expresses semantic relationships of:
• Hierarchy (broader & narrower terms)
• Equivalence (synonyms)
• Associative (related terms)
 May include definitions
Ontology
 Resembles faceted taxonomy but uses richer semantic relationships
among terms and attributes and strict specification rules
 A model of reality, allowing inferences to be made.
How do taxonomies actually improve
search?
Input (Query) Side
 “Search” using a small set of pre-defined values instead of trying to
guess what word or words might have been used in the content.
 Providing dropdowns instead of a free-text search box improves results, but is limiting.
 Have synonyms mapped together so searches for “car” and
“automobile” return the same things.
Output (Results) Side
 Organize search results into groups of related items.
 Sorting and filtering
 Refinement
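The query-side synonym mapping can be sketched as follows. This is a minimal illustration, assuming a tiny hand-built set of synonym rings and naive whitespace tokenization; `SYNONYM_RINGS`, `expand_query`, and `search` are hypothetical names, not part of any search product.

```python
# Hypothetical synonym rings: every term in a ring is treated as equivalent.
SYNONYM_RINGS = [
    {"car", "automobile"},
    {"dog", "canine", "pooch", "mutt"},
]

# Build a lookup from each term to its whole ring.
TERM_TO_RING = {term: ring for ring in SYNONYM_RINGS for term in ring}


def expand_query(term):
    """Return the set of terms to search for, including synonyms."""
    key = term.strip().lower()
    return TERM_TO_RING.get(key, {key})


def search(term, documents):
    """Return ids of documents containing the term or any of its synonyms."""
    wanted = expand_query(term)
    return sorted(
        doc_id
        for doc_id, text in documents.items()
        if wanted & set(text.lower().split())
    )


docs = {
    1: "Used car for sale",
    2: "Automobile insurance rates",
    3: "Dog training classes",
}
print(search("car", docs))
print(search("automobile", docs))
```

Both queries return the same documents, which is the behavior the slide describes for “car” and “automobile”.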
Taxonomy in action on the results side
 Position Category
 Company
 City
 State
 Salary
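Grouping and refining results by these facets might look like the sketch below. The job listings are invented for illustration, and `facet_counts` and `refine` are hypothetical helpers; Salary is left out because it would normally be bucketed into ranges first.

```python
from collections import Counter

# Hypothetical job-search results, tagged with the facets listed above.
results = [
    {"title": "Data Analyst", "Position Category": "Analytics",
     "Company": "Acme", "City": "Austin", "State": "TX"},
    {"title": "BI Developer", "Position Category": "Analytics",
     "Company": "Globex", "City": "Dallas", "State": "TX"},
    {"title": "Backend Engineer", "Position Category": "Engineering",
     "Company": "Acme", "City": "Denver", "State": "CO"},
    {"title": "QA Engineer", "Position Category": "Engineering",
     "Company": "Globex", "City": "Austin", "State": "TX"},
]


def facet_counts(items, facets):
    """Count how many results fall under each value of each facet."""
    return {f: Counter(r[f] for r in items if f in r) for f in facets}


def refine(items, selected):
    """Keep only results that match every selected facet value."""
    return [r for r in items if all(r.get(f) == v for f, v in selected.items())]


counts = facet_counts(results, ["Position Category", "State"])
print(counts["State"])

narrowed = refine(results, {"State": "TX", "Position Category": "Analytics"})
print([r["title"] for r in narrowed])
```

The counts drive the "groups of related items" display, and `refine` is the filtering step a user triggers by clicking a facet value.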
Where do the benefits come from?
Common taxonomy ROI scenarios
 Catalog site - ROI based on increased sales through improved:
 Product findability
 Product cross-sells and up-sells
 Customer loyalty
 Call center - ROI based on cutting costs through:
 Fewer customer calls due to improved website self-service
 Faster, more accurate CSR responses through better information access
 Compliance – ROI based on:
 Avoiding penalties for breaching regulations
 Following required procedures (e.g. Medical claims)
 Knowledge worker productivity - ROI based on cutting costs through:
 Less time searching for things
 Less time recreating existing materials, with knock-on benefits of less confusion and
reduced storage and backup costs
 Executive mandate
 No ROI at the start, just someone with a vision and the budget to make it happen
For more details on taxonomy ROI, and other topics, see
http://www.taxonomystrategies.com/presentations/Taxonomy_1-2-3a.ppt
How do I sell Management on a Taxonomy
Project?
 Don’t sell “metadata” or “taxonomy”, sell the vision of
what you want to be able to do.
 Clearly understand what the problem is and what the
opportunities are.
 Calculate costs and benefits so you can explain the
ROI in a believable manner.
 Design the taxonomy (in terms of level of effort) in
relation to the value at hand.
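A cost/benefit calculation for the knowledge worker productivity scenario could be sketched like this. Every figure below is a hypothetical placeholder to be replaced with your own estimates, and `simple_roi` uses the plain (benefit - cost) / cost definition, which is only one of several ways to state ROI.

```python
def simple_roi(annual_benefit, annual_cost):
    """Return ROI as a fraction: (benefit - cost) / cost."""
    return (annual_benefit - annual_cost) / annual_cost


# All figures below are hypothetical placeholders, not benchmarks.
workers = 500                          # knowledge workers affected
hours_saved_per_worker_per_week = 0.5  # less time searching and recreating
loaded_hourly_rate = 60.0              # cost of one worker hour, in dollars
working_weeks = 48

annual_benefit = (
    workers * hours_saved_per_worker_per_week * loaded_hourly_rate * working_weeks
)
annual_cost = 400_000.0  # tagging, software, and maintenance (assumed)

print(f"Annual benefit: ${annual_benefit:,.0f}")
print(f"ROI: {simple_roi(annual_benefit, annual_cost):.0%}")
```

Keeping each assumption as a named variable makes it easy to show management how the result changes when an estimate is challenged.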
Who should build the taxonomy?
 The taxonomy (and metadata specification) should be
produced by a cross-functional team which includes
business, technical, information management, and
content creation stakeholders.
 The team should plan on maintaining the taxonomy
as well as building it.
 Maintenance will not (usually) be anyone’s full-time job.
 Exact mix of people on team will change.
 It should be built in an iterative fashion, with more
content and broader review for each iteration.
How Do We Build a Taxonomy?
1. Know the ROI case – what is the benefit you want
and what can you afford in the way of tagging,
software, and other expenses.
2. Know the content to be categorized and the people
who will use it. Have an idea of the UI they will use
to access the content.
3. Get the team together.
4. Go through the process, in an iterative manner.
How do we build a Taxonomy: Process
Overview
Week: 1 through 12
1 Identify Objectives: Conduct interviews
2 Inventory Resources: Identify, gather & review resources
3 Specify Metadata: Define fields & purpose
4 Model Content: Define content chunks & XML DTDs
5 Specify Vocabularies: Compile controlled vocabularies
6 Specify Procedures: Develop workflow, rules & procedures
7 Test & Train: Manually tag sample
Building a Taxonomy: Which fields need
controlled values?
Scale: (Virtually) Mandatory, Highly Likely, Maybe, Highly Unlikely, (Virtually) Impossible
Field: source of values
Language: RFC 3066
Format: IMT
Coverage: ISO 3166
Type: DCMI Type?
Subject: Custom
Creator: LDAP?
Publisher: Custom
Rights
Contributor: LDAP?
Identifier: Custom
Date: W3C DTF
Title
Relation
Source
Description
These five elements are the ones that take the most thought when defining a
metadata spec.
These 15 fields are the Dublin Core – the starting point for most modern
metadata specs.
How big should the taxonomy be?
 Consultant’s answer – “It depends”
 How much content do you need to organize?
 How fine-grained does the categorization need to be?
 Overly-simplistic method:
 Nterms = # items / desired bucket size
 (1 M documents, 100 documents / bucket => 10k buckets)
 Bad method – documents don’t distribute evenly
 Second method:
 # facets ≈ log10(# items) ± 2
 (1 M items => 5..7 facets)
 Sum of terms across all facets < 1200 in most cases
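Both sizing rules of thumb can be worked through in a few lines. This sketch assumes the logarithm is base 10, since 1 M items then centers on 6 facets; `terms_by_bucket` and `facet_range` are hypothetical helper names. The `spread` parameter is exposed because a spread of 1 reproduces the 5..7 range quoted for 1 M items, while the stated ± 2 gives 4..8.

```python
import math


def terms_by_bucket(n_items, bucket_size):
    """Overly simplistic estimate: number of items / desired bucket size."""
    return n_items // bucket_size


def facet_range(n_items, spread=2):
    """Rule of thumb: number of facets is about log10(items) +/- spread."""
    center = round(math.log10(n_items))
    return max(1, center - spread), center + spread


print(terms_by_bucket(1_000_000, 100))
print(facet_range(1_000_000))
print(facet_range(1_000_000, spread=1))
```

The first estimate assumes documents spread evenly across buckets, which is why the slide calls it a bad method.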
How do we know we have a good
taxonomy?
Method | Process | Who | Requires | Validation
Walk-thru | Show & explain | Taxonomist, SME, Team | Rough taxonomy | Approach; Appropriateness to task
Walk-thru | Check conformance to editorial rules | Taxonomist | Draft taxonomy; Editorial Rules | Consistent look and feel
Usability Testing | Contextual analysis (card sorting, scenario testing, etc.) | Users | Rough taxonomy; Tasks & Answers | Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Users | Rough taxonomy; UI Mockup; Search prototype | Reaction to taxonomy; Reaction to new interface; Reaction to search results
Tagging Samples | Tag sample content with taxonomy | Taxonomist, Team, Indexers | Sample content; Rough taxonomy (or better) | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
For much more on “Testing Your Taxonomy”, see
http://www.taxonomystrategies.com/presentations/Taxonomy_Testing-2006-11-03.ppt
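One simple quantitative check that tagging samples make possible is measuring how consistently two indexers tag the same content. The sketch below uses Jaccard overlap, which is an assumption on our part and not a method named in the slides; the sample tags and the `jaccard` and `mean_consistency` helpers are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two tag sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty tag sets agree trivially
    return len(a & b) / len(a | b)


def mean_consistency(tags_a, tags_b):
    """Average Jaccard overlap over the documents both indexers tagged."""
    shared = sorted(tags_a.keys() & tags_b.keys())
    if not shared:
        raise ValueError("the two indexers have no documents in common")
    return sum(jaccard(tags_a[d], tags_b[d]) for d in shared) / len(shared)


# Hypothetical sample: Subject tags assigned by two indexers.
indexer_a = {"doc1": {"Finance", "Policy"}, "doc2": {"News"}}
indexer_b = {"doc1": {"Finance"}, "doc2": {"News"}}

print(mean_consistency(indexer_a, indexer_b))
```

Low scores on particular terms often point to categories whose scope is unclear and needs an editorial rule.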
What if I have to do it solo?
Realize:
 It’s not totally solo – IT help,
Graphics & UI help, Business
Goals help, Funding help, Review
& QA help…
 You are the general contractor
 It needs to be part of your
objectives
 Limit the objectives to what can be
achieved by you, and by your
organization
Concentrate:
 Resource allocation
 (i.e. Manage your time)
 Fundamental processes
 Query log examination
 Error correction procedure
 Communications!!!
 Cherry-pick from Roles on a
larger team:
 Business Lead – align with
organization goals, get needed
resources, make cost/benefit
decisions, report upstairs
 IT Liaison – Work with IT
specialists to get software
installed, logs gathered, content
harvested, etc. Consider impact
of changes on tools and data
 Taxonomy / Search Specialist –
analyze behavior and suggest
changes. Implement changes
which pass cost/benefit muster
 Website/User Representative –
consider impact of changes on
users and job performance
Agenda
 FAQs – Frequently Asked Questions
 SAQs – Seldom Asked Questions
 Your Questions
What should I be thinking about at the
start of a taxonomy project?
Taxonomy development is not the most important
problem:

 The Taxonomy Problem: How are we going to maintain the lists of predefined values that can go into some of the metadata elements?
 The Tagging Problem: How are we going to populate metadata elements
with complete and consistent values?
  What can we expect to get from automatic classifiers? What kind of error detection
and error correction procedures do we need? What fields do we need?
 The ROI (Return On Investment) Problem: How are we going to use
content, metadata, and vocabularies in applications to obtain business
benefits?
  More sales? Lower support costs? Greater productivity? Risk avoidance?
  How much content? How big an operating budget? How to expose to users?
Business Goals and Cultural Factors are major influences
on tagging and taxonomy. These must be acknowledged
at the start to avoid rework.
What must change when the Taxonomy
changes?
There’s more to maintaining the Taxonomy than maintaining just the taxonomy.
 The master copy of the taxonomy.
 Announcements for stakeholders!
 The information sent to downstream users of the taxonomy.
  The versions and formats of the taxonomy distributed to others.
  The list of changes.
 The data tagged with the taxonomy?
 The user interface which uses the taxonomy?
 Backend system software which uses the taxonomy?
 The training set for automatic classifiers?
 The educational material for users, catalogers, programmers, etc.?
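Producing the list of changes, and finding the tagged data it affects, can be sketched as follows. The sketch assumes each taxonomy version is stored as a simple term-to-parent mapping; the terms, documents, and the `diff_taxonomy` and `records_to_retag` helpers are hypothetical.

```python
def diff_taxonomy(old, new):
    """Compare two versions given as {term: parent} mappings.

    A parent of None marks a top-level term.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    moved = sorted(t for t in old.keys() & new.keys() if old[t] != new[t])
    return {"added": added, "removed": removed, "moved": moved}


def records_to_retag(tagged, removed_terms):
    """Return ids of records that still carry a term that was removed."""
    removed = set(removed_terms)
    return sorted(doc for doc, tags in tagged.items() if removed & set(tags))


# Hypothetical versions of a small product taxonomy.
v1 = {"Products": None, "Laptops": "Products",
      "Netbooks": "Laptops", "Printers": "Products"}
v2 = {"Products": None, "Hardware": "Products", "Laptops": "Hardware",
      "Printers": "Products", "Tablets": "Hardware"}

changes = diff_taxonomy(v1, v2)
print(changes)

tagged = {"doc1": ["Netbooks"], "doc2": ["Printers"],
          "doc3": ["Laptops", "Netbooks"]}
print(records_to_retag(tagged, changes["removed"]))
```

The change list is what goes to downstream users, and the retag list is a first estimate of how much tagged data a release touches.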
Agenda
 FAQs – Frequently Asked Questions
 SAQs – Seldom Asked Questions
 Your Questions
Fun Questions
 Examples of good and bad taxonomies
The animals are divided into:
(a) belonging to the emperor,
(b) embalmed, (c) tame, (d) sucking pigs,
(e) sirens, (f) fabulous, (g) stray dogs,
(h) included in the present classification,
(i) frenzied, (j) innumerable, (k) drawn with
a very fine camelhair brush, (l) et cetera,
(m) having just broken the water pitcher,
(n) that from a long way off look like flies.
This was created
to be as bad a
classification as
possible. What
makes it so bad?
Jorge Luis Borges, "THE ANALYTICAL
LANGUAGE OF JOHN WILKINS"
Works in 3 volumes (in Russian). St.
Petersburg, "Polaris", 1994. V. 2: 87.
Backup Slides
Why do we usually recommend faceted
taxonomies?
 Categorize in multiple, independent categories.
 Allow combinations of categories to narrow the choice of items.
 4 independent categories of 10 nodes each have the same discriminatory
power as one hierarchy of 10,000 nodes (10^4)
 Easier to maintain
 Easier to reuse existing material
 Can be easier to navigate, if software supports it
Example facets:
Main Ingredients: Chocolate, Dairy, Fruits, Grains, Meat & Seafood, Nuts, Olives, Pasta, Spices & Seasonings, Vegetables
Meal Type: Breakfast, Brunch, Lunch, Supper, Dinner, Snack
Cuisines: African, American, Asian, Caribbean, Continental, Eclectic/Fusion/International, Jewish, Latin American, Mediterranean, Middle Eastern, Vegetarian
Cooking Methods: Advanced, Bake, Broil, Fry, Grill, Marinade, Microwave, No Cooking, Poach, Quick, Roast, Sauté, Slow Cooking, Steam, Stir-fry
42 values to maintain (10+6+11+15)
9900 combinations (10x6x11x15)
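The arithmetic for the recipe facets can be checked directly. The facet values below are the ones on this slide; the two sample recipes and the `narrow` helper are hypothetical and only show how combining facet values narrows the choice of items.

```python
from math import prod

FACETS = {
    "Main Ingredients": ["Chocolate", "Dairy", "Fruits", "Grains",
                         "Meat & Seafood", "Nuts", "Olives", "Pasta",
                         "Spices & Seasonings", "Vegetables"],
    "Meal Type": ["Breakfast", "Brunch", "Lunch", "Supper", "Dinner", "Snack"],
    "Cuisines": ["African", "American", "Asian", "Caribbean", "Continental",
                 "Eclectic/Fusion/International", "Jewish", "Latin American",
                 "Mediterranean", "Middle Eastern", "Vegetarian"],
    "Cooking Methods": ["Advanced", "Bake", "Broil", "Fry", "Grill",
                        "Marinade", "Microwave", "No Cooking", "Poach",
                        "Quick", "Roast", "Sauté", "Slow Cooking", "Steam",
                        "Stir-fry"],
}

# Values to maintain add up; distinguishable combinations multiply.
values_to_maintain = sum(len(values) for values in FACETS.values())
combinations = prod(len(values) for values in FACETS.values())
print(values_to_maintain, combinations)

# Hypothetical tagged recipes, one value per facet.
recipes = [
    {"name": "Grilled vegetable pasta", "Main Ingredients": "Pasta",
     "Meal Type": "Dinner", "Cuisines": "Mediterranean",
     "Cooking Methods": "Grill"},
    {"name": "Fruit muffins", "Main Ingredients": "Fruits",
     "Meal Type": "Breakfast", "Cuisines": "American",
     "Cooking Methods": "Bake"},
]


def narrow(items, selected):
    """Keep items matching every selected facet value."""
    return [i for i in items if all(i.get(f) == v for f, v in selected.items())]


print([r["name"] for r in narrow(recipes, {"Meal Type": "Dinner"})])
```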
What could possibly go wrong with a little
edit?
 ERP (Enterprise Resource Planning) team made a change to
the product line data element in the product hierarchy.
 They did not know this data was used by downstream
applications outside of ERP.
 An item data standards council discovered the error.
 If the error had not been identified and fixed, the company’s
sales force would not be correctly compensated.
“Lack of the enterprise data standards process in
the item subject area has cost us at least 30 person
days of just ‘category’ rework.”
Source: Danette McGilvray, Granite Falls Consulting, Inc.
When should we NOT use facets?
 When you have to work
with software that can’t
handle them.
 Remember, software is
replaced but data is migrated.
 When you need to use an
existing standard
taxonomy.
Example hierarchy:
…
By Content Type
  Calendars & Events
    Top Links…
      Holidays
      Upcoming Events
    Federal Reserve System…
      Beige Book
      Board of Governors
      FOMC
    More Calendars & Events…
      ERAC
      Officer Availability
      Staff Conference
      Toastmasters
      Tours
  Directories
  Documentation
  Forms
  News
  Policies & Procedures
By Organization
  Federal Reserve System
  FRB Atlanta
    Board of Directors
    Executive Office
    Management Committee
    Research Division
    S&R Division
Facets can help you build a useful hierarchy. This one is a mix of
content type and organization.
Questions?
Ron Daniel
925-368-8371
rdaniel@taxonomystrategies.com
November 8, 2007
Copyright 2007 Taxonomy Strategies LLC. All rights reserved.