Taxonomy Strategies LLC
May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Agenda
1:30 Welcome & Introductions
1:45 Exercise: Taxonomy Revisions
2:15 Fundamental Processes
2:30 Governance Team Roles and Structures
3:00 Tools
3:05 Break
3:15 Exercise: Organizational Self-Assessment
3:30 Maturity Model
3:40 Designing and Building Maintainable Taxonomies & Metadata
4:00 Additional Processes
4:20 Q&A
4:30 Adjourn
Taxonomy Strategies LLC The business of organized information
Who we are: Joseph Busch
Over 25 years in the business of organized information
Founder, Taxonomy Strategies
Director, Solutions Architecture, Interwoven
VP, Infoware, Metacode Technologies
Program Manager, Getty Foundation
Manager, Pricewaterhouse
Metadata and taxonomies community leadership
President, American Society for Information Science & Technology
Director, Dublin Core Metadata Initiative
Adviser, National Research Council Computer Science and Telecommunications Board
Reviewer, National Science Foundation Division of Information and Intelligent Systems
Founder, Networked Knowledge Organization Systems/Services
Who we are: Ron Daniel, Jr.
Over 15 years in the business of metadata & automatic classification
Principal, Taxonomy Strategies
Standards Architect, Interwoven
Senior Information Scientist, Metacode Technologies
Technical Staff Member, Los Alamos National Laboratory
Metadata and taxonomies community leadership
Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group
Acting chair: XML Linking working group
Member: RDF working groups
Co-editor: PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.
Recent & current projects
Government
Commodity Futures Trading Commission
Defense Intelligence Agency
ERIC
Federal Aviation Administration
Federal Reserve Bank of Atlanta
Forest Service
GSA Office of Citizen Services (www.firstgov.gov)
Head Start
Infocomm Development Authority of Singapore
NASA (nasataxonomy.jpl.nasa.gov)
Small Business Administration
Social Security Administration
USDA Economic Research Service
USDA e-Government Program (www.usda.gov)
Commercial
Allstate Insurance
Blue Shield of California
Debevoise & Plimpton
Halliburton
Hewlett Packard
Motorola
PeopleSoft
Pricewaterhouse Coopers
Siderean Software
Sprint
Time Inc.
Commercial subcontracts
Agency.com – Top financial services
Critical Mass – Fortune 50 retailer
Deloitte Consulting – Big credit card
Gistics/OTB – Direct selling giant
NGOs
CEN
IDEAlliance
IMF
OCLC
Participant Introductions
Who are you?
What do you do?
What brings you here today?
Taxonomy Governance Overview
Is “Taxonomy Governance” synonymous with “Taxonomy Maintenance”?
What kinds of changes can be made, and what are their costs?
What kinds of information are needed to determine the changes?
What kind of group should maintain the taxonomy?
What kinds of rules should the group follow to decide on changes?
What should the group do beyond maintaining the taxonomy?
Exercise: Taxonomy Modifications
Divide into small groups
Review assigned sample taxonomy
Discuss changes you would make
In 10 minutes, a spokesperson will speak for the group and briefly:
Tell us something good about the taxonomy
Characterize the short-term changes your group would make
Characterize the questions your group would like answered before making other changes
Exercise Notes
Team Members:
Something good about the taxonomy:
Short term changes:
Questions for other changes:
Group 1 Sample Taxonomy
Group 2 Sample Taxonomy
Top Level
Business / Accounting / Firms / Directories
Business / Biotechnology & Pharmaceuticals / Education & Training
Business / Employment / By Industry
Business / Healthcare / Employment / Regional
Random Samples of Detailed Categories
Business / Small Business / Finance / Accounting
Reference / Education / Colleges & Universities / North America / United States / Maryland / Columbia Union College / Athletics
Reference / Education / K-12 / Home Schooling / Unschooling / Chats and Forums
Regional / Europe / Ireland / Business & Economy / Employment / Health & Medical
Science / Math / Academic Departments / South America / Colombia
Science / Social Sciences / Linguistics / Translation / Associations
Society / People / Women / Science & Technology / Mathematics
Group 3 Sample Taxonomy
Top Level
Detail in Auto Products Category
Source: http://householdproducts.nlm.nih.gov/products.htm
Predictions
Short-term changes will center on rules of style – ‘&’ vs. ‘and’, capitalization, plurals
Editorial Rules
Faceted subdivision will only be suggested by experienced practitioners, by groups given low-level details of a taxonomy, or both. People will critique the
UI Presentation
Metadata
Specification, Design for maintainability
Questions for Long-term changes will focus, in decreasing order, on:
Who are the users and what are they doing?
What is the content and how much is in the various categories?
…
What kind of money depends on the taxonomy, and what kind of maintenance expenses are justified?
How to put it into action?
User Characterization
Content and
Metadata
Maintenance
ROI
Anything else people want to cover?
Fundamental Processes
What are the two fundamental processes every organization should implement to maintain its metadata and taxonomies?
Query log / Click trail examination
Tagging Error Correction
What are the key outlooks a taxonomist should try to instill in their organization?
Fundamental Process #1 – Query Log Examination
How can we characterize users and what they are looking for?
Query Log & Click Trail Examination
Sophisticated software available, but don’t wait.
80/20 Rule – 80% of value from 20% of possible reports.
Greatest value comes from:
Identifying a person as responsible for search quality
Starting a “Measure & Improve” mindset
Greatest challenge:
Getting a person assigned (≥ 10%)
Getting logs turned back on
What to do after the obvious fixes have been made
Click Trail Packages
iWebTrack
NetTracker
OptimalIQ
SiteCatalyst
Visitorville
WebTrends
UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary
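A first pass at query log examination does not have to wait for a commercial package. The sketch below computes three of the reports listed above (top queries, queries with no results, queries with no click-through). The record layout (`query`, `result_count`, `clicked`) and the sample data are assumptions for illustration; real search logs will need their own parser.

```python
from collections import Counter

def query_log_reports(records, top_n=10):
    """Summarize search log records.

    Each record is a dict with hypothetical keys:
    query (str), result_count (int), clicked (bool).
    """
    # Normalize case and whitespace so "Benefits" and "benefits " count together.
    normalized = [(r["query"].strip().lower(), r) for r in records]
    top = Counter(q for q, _ in normalized)
    no_results = Counter(q for q, r in normalized if r["result_count"] == 0)
    no_click = Counter(
        q for q, r in normalized if r["result_count"] > 0 and not r["clicked"]
    )
    return {
        "top_queries": top.most_common(top_n),
        "no_results": no_results.most_common(top_n),
        "no_click_through": no_click.most_common(top_n),
    }

# Invented sample records, not data from any real log.
sample_log = [
    {"query": "Benefits", "result_count": 12, "clicked": True},
    {"query": "benefits ", "result_count": 12, "clicked": False},
    {"query": "benefits", "result_count": 12, "clicked": True},
    {"query": "401k rollover", "result_count": 0, "clicked": False},
    {"query": "401k rollover", "result_count": 0, "clicked": False},
    {"query": "travel policy", "result_count": 8, "clicked": False},
]

reports = query_log_reports(sample_log)
for name, rows in reports.items():
    print(name, rows)
```

Queries with no results are candidates for new synonyms or new content; queries with results but no click-through suggest ranking or labeling problems.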
Fundamental Process #2 – Tagging Error Correction
For the Taxonomy to be used, its values must be associated with content.
We will refer to this as “Tagging”.
Errors will happen, and some will be found. What are you going to do about them?
Define an error correction process.
Process will accommodate questions like:
Is it an error? What is the cost to correct or not correct? Does the correction need to be scheduled? etc.
Once an error is corrected, NEVER lose that fact.
Manually reviewed pages are vital for training automatic classifiers.
Has implications for metadata specification and review procedures.
Over time, multiple error detection methods will be defined.
e.g. Statistical sampling of newly added pages
Gradually, additional error correction processes may be defined to deal with particular types of errors.
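One way to combine statistical sampling with a correction record that is never lost is sketched below. The function names, log fields, and sample values are hypothetical; the point is that every manual review is appended to a log (including reviews that found no error) so it can later serve as classifier training data.

```python
import random

def draw_review_sample(page_ids, sample_size, seed=None):
    """Pick a simple random sample of newly added pages for manual review."""
    rng = random.Random(seed)
    pages = list(page_ids)
    return rng.sample(pages, min(sample_size, len(pages)))

def record_review(review_log, page_id, field, old_value, new_value):
    """Append a review outcome. Entries are only ever appended, never removed."""
    review_log.append({
        "page_id": page_id,
        "field": field,
        "old_value": old_value,
        "new_value": new_value,
        "was_error": old_value != new_value,
        "manually_reviewed": True,  # keep this fact for classifier training
    })

def estimated_error_rate(review_log):
    """Share of reviewed values that had to be corrected (None if no reviews)."""
    if not review_log:
        return None
    return sum(1 for entry in review_log if entry["was_error"]) / len(review_log)

# Invented example: 200 new pages, review 20 of them.
new_pages = [f"page-{i}" for i in range(200)]
sample = draw_review_sample(new_pages, 20, seed=1)

log = []
record_review(log, sample[0], "content_type", "Forms", "Manuals & Forms")
record_review(log, sample[1], "content_type", "News", "News")
rate = estimated_error_rate(log)
print(f"reviewed {len(log)} values, estimated error rate {rate:.0%}")
```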
Fundamental Outlooks
How are we going to build and maintain metadata structures and controlled vocabularies?
The taxonomy problem
How are we going to populate metadata elements with complete and consistent values?
The tagging problem
How are we then going to use metadata in applications and demonstrate benefits?
The ROI problem
Taxonomy Governance is a standards process.
Take tips from other standards efforts
Team, with comment-handling responsibilities and an appeals process
Issue Logs
Announcements
Release Schedule
Must know this to address other problems!
Foster a “Measure & Improve” Mindset
Taxonomy Business Processes
Taxonomies must change, gradually, over time if they are to remain relevant
Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions
A team will need to maintain the taxonomy on a part-time basis
Taxonomy team reports to some other steering committee
Definitions about the Controlled Vocabulary Governance Environment
1: Syndicated Terminologies change on their own schedule
2: CV Team decides when to update CVs
3: Team adds value via mappings, translations, synonyms, training materials, etc.
4: Updated versions of CVs published to consuming applications
[Diagram: Controlled Vocabulary Governance Environment. Labels: Syndicated Terminologies (ISO 3166-1, Other External); Vocabulary Management System (CVs, Other Controlled Items, Custodians); Published CVs and STs; Notifications; Change Requests & Responses; Consuming Applications (Web CMS, Archives, Intranet Search, ERMS, Intranet Nav., ERP, DAM, Other Internal)]
Other Controlled Items
Taxonomy Team will have additional items to manage:
Charter, Goals, Performance Measures
Editorial rules
Team processes
Tagger training materials (manual and automatic)
Outreach & ROI
Communication plan
Website
Presentations
Announcements
Roadmap
Taxonomy governance | Generic team charter
Taxonomy Team is responsible for maintaining:
The Taxonomy, a multi-faceted classification scheme
Associated taxonomy materials, such as:
Editorial Style Guide
Taxonomy Training Materials
Metadata Standard
Team rules and procedures (subject to CIO review)
Team evaluates costs and benefits of suggested changes
Taxonomy Team will:
Manage relationship between providers of source vocabularies and consumers of the Taxonomy
Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices
Promote awareness and use of the Taxonomy
Editorial Rules
To ensure consistent style, rules are needed
Issues commonly addressed in the rules:
Sources of Terms
Abbreviations
Ampersands
Capitalization
Continuations (More… or Other…)
Duplicate Terms
Hierarchy and Polyhierarchy
Languages and Character Sets
Length Limits
“Other” – Allowed or Forbidden?
Plural vs. Singular Forms
Relation Types and Limits
Scope Notes
Serial Comma
Spaces
Synonyms and Acronyms
Term Order (Alphabetic or …)
Term Label Order (Direct vs. Inverted)
Must also address issue of what to do when rules conflict – which are more important?
Rule Name | Editorial Rule
Use Existing Vocabularies | Other things being equal, reusing an existing vocabulary is preferred to creating a new one.
Ampersands | The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.
Special Characters | Retain accented characters in Term Labels. Example: España
Serial comma | If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’, which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.
Capitalization | Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”
… | …
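Editorial rules like the ones above are easy to check mechanically. The following is a minimal sketch covering only three of them (ampersands, serial comma, title case); the function name and messages are hypothetical, and the title-case test is a simplification that would also flag all-caps acronyms such as “ERP”.

```python
import re

# Words exempt from the capitalization check (assumption: English articles only).
ARTICLES = {"a", "an", "the"}

def check_term_label(label):
    """Return a list of editorial rule violations found in a term label."""
    problems = []
    # Ampersands rule: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("use '&' instead of 'and'")
    # Serial comma rule: the final '&' is not preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("no comma before '&'")
    # Capitalization rule: every word except articles is capitalized.
    # The pattern matches runs of letters, including accented ones.
    for word in re.findall(r"[^\W\d_]+", label):
        lower = word.lower()
        if lower in ARTICLES or lower == "and":  # 'and' is reported above
            continue
        if not (word[0].isupper() and (len(word) == 1 or word[1:].islower())):
            problems.append("use title case")
            break
    return problems

for candidate in [
    "Education, Learning & Employment",
    "Education, Learning, & Employment",
    "Manuals and Forms",
    "education, learning & employment",
    "España",
]:
    print(candidate, "->", check_term_label(candidate) or "OK")
```

A check like this fits naturally into the copy-editing step of a change workflow, leaving the team to decide what happens when rules conflict.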
Roles in Two Taxonomy Governance Teams
Executive Sponsor
Advocate for the taxonomy team
Business Lead
Keeps team on track with larger business objectives
Balances cost/benefit issues to decide appropriate levels of effort
Specialists help in estimating costs
Obtains needed resources if those in team can’t accomplish a particular task
Technical Specialist
Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.
Helps obtain data from various systems
Content Specialist
Team’s liaison to content creators
Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.
Small-scale Metadata QA Responsibility
Taxonomy Specialist
Suggests potential taxonomy changes based on analysis of query logs, indexer feedback
Makes edits to taxonomy, installs into system with aid of IT specialist
Content Owner
Reality check on process change suggestions
Team structure at a different org.
Business Lead
Custodians
Responsible for content in a specific CV.
Training Representative
Develops communications plan, training materials
Work Practices Representative
Develops processes, monitors adherence
IT Representative
Backups, admin of CV Tool
Info. Mgmt. Representative
Provides CV expertise, tie-in with larger IM effort in the organization.
Taxonomy governance | Where changes come from
Application Logic
Recommendations by Editor
1. Small taxonomy changes (labels, synonyms)
2. Large taxonomy changes (retagging, application changes)
3. New “best bets” content
Taxonomy Team
Team considerations
1. Business goals
2. Changes in user experience
3. Retagging cost
Requests from other parts of the organization
Processes
Different organizations will need to consider their own change processes.
Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.
Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.
Change process MUST also consider cost of implementing the change
Retagging data
Reconfiguring auto-classifier
Retraining staff
Changes in user expectations
Taxonomy Change Cases
Case 1. Renaming a term
Case 2. Adding a new leaf term
Case 3. Inserting a new term
Case 4. Splitting a term
Case 5. Deleting a leaf term or subtree
Case 6. Deleting a term
Case 7. Moving a subtree
Case 8. Merging terms
Case 9. Adding a CV
Case 10. Deleting a CV
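The change cases differ sharply in retagging cost. The sketch below contrasts Case 1 (renaming) with Case 8 (merging), assuming content is tagged with term IDs rather than labels. The data structures, IDs, and function names are invented for illustration and do not describe any particular taxonomy tool.

```python
def documents_tagged_with(tags, term_id):
    """List the documents that carry a given term ID."""
    return sorted(doc for doc, terms in tags.items() if term_id in terms)

def rename_term(taxonomy, term_id, new_label):
    """Case 1: only the label changes, so no documents need retagging."""
    taxonomy[term_id]["label"] = new_label
    return []

def merge_terms(taxonomy, tags, source_id, target_id):
    """Case 8: fold source into target and return the retagged documents."""
    affected = documents_tagged_with(tags, source_id)
    for doc in affected:
        tags[doc].discard(source_id)
        tags[doc].add(target_id)
    # Children of the removed term move under the surviving term.
    for term in taxonomy.values():
        if term["parent"] == source_id:
            term["parent"] = target_id
    # Keep the old label as a synonym so existing searches still match.
    taxonomy[target_id].setdefault("synonyms", []).append(taxonomy[source_id]["label"])
    del taxonomy[source_id]
    return affected

# Invented example data.
taxonomy = {
    "T1": {"label": "Beverages", "parent": None},
    "T2": {"label": "Soft Drinks", "parent": "T1"},
    "T3": {"label": "Sodas", "parent": "T1"},
}
tags = {"doc-a": {"T2"}, "doc-b": {"T3"}, "doc-c": {"T2", "T3"}, "doc-d": {"T1"}}

renamed = rename_term(taxonomy, "T1", "Drinks")
retagged = merge_terms(taxonomy, tags, "T3", "T2")
print(len(renamed), "documents retagged by the rename")
print(len(retagged), "documents retagged by the merge:", retagged)
```

Counting affected documents before approving a change gives the team a concrete retagging figure to weigh against the expected benefit.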
Taxonomy governance | Taxonomy maintenance workflow
[Workflow diagram. Steps and roles: Suggest new name/category (Analyst); Review new name (Editor); Copy edit new name (Copywriter); Add to enterprise Taxonomy (Taxonomy Sys Admin). Two “Problem?” decision points (Yes/No) appear in the flow. Other label: Taxonomy Tool.]
Taxonomy editing tools vendors
Most popular taxonomy editor? MS Excel
Immature industry – no vendors in upper-right quadrant!
Widely used, cheap, single-user
High functionality, high cost ($100k!)
[Quadrant chart; recoverable labels: Niche Players, Visionaries, Completeness of Vision]
Sample Taxonomy Editor Functionality
Standard and Custom Fields
Standard and Custom Relations
Data Typing, Restrictions, and Inference
Flexible Reporting
Flexible Importing
Multiple Vocabulary Support
Inter-Vocabulary Relations
Unique IDs
ISO Codes not sufficient
Workflow
Voting
Change Request Management
Programmability
Hierarchy Browser
Term Editing
Where do I put the metadata?
Where can I store metadata?
In the content – HTML Headers, File properties, etc.
In a centralized repository – Search index, MDDB, etc.
In multiple systems – Common case
Where should I store metadata?
Consultant’s answer – “It depends.”
If you are moving files through a process, putting it in the file keeps it from getting dropped at system borders.
If you are doing search across multiple documents, it has to be at least copied out of the files.
If you make copies of files and modify them, consistent in-file metadata will be impossible.
Real question is not where to STORE the metadata, it is how to MAINTAIN the metadata.
Web CMS as an example.
Central Metadata Database is a very advanced practice.
What Processes Should I Try to Institute?
Processes will vary from one organization to another.
Assessing the Organization’s state is the first step.
Determining the ROI and potential resources follows.
Plan on instituting processes over time, beginning with basic ones.
Search and Metadata Self-Assessment Form
Background
1) Rate your organization’s search & metadata maturity from 1 to 10.
2) What was the most recent change to your organization’s search & metadata processes?
3) What is the next step for your organization’s search & metadata processes?
Basic
4) Is there a process in place to examine query logs?
5) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?
6) Does the search engine index more than 4 repositories around the organization?
Intermediate
7) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)? If so, describe briefly.
8) Are system features and metadata fields added based on cost/benefit analysis, rather than things that are easy to do with the current tools?
9) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?
10) Are there hiring and training practices especially for metadata and taxonomy positions? If so, describe briefly.
Advanced
11) Are there established qualitative and quantitative measures of metadata quality? If so, describe briefly.
12) Can the CEO explain the ROI for search and metadata?
Optional
13) Your name:
14) Organization:
15) E-mail:
Contact information will not be used for marketing purposes. It will only be used to follow up and clarify issues around the survey.
Metadata Maturity Model
Taxonomy governance processes must fit the organization
As consultants, we notice different levels of maturity in the business processes around Content Management, Taxonomy, and Metadata
Honestly assess your organization’s metadata maturity in order to design appropriate governance processes
We are starting to define a maturity model, similar to the CMMI model in the software world.
Metadata Maturity Model
Shameless Plug: Tomorrow Morning at 9:45
Call for Data: Leave Self-Assessments with us
Process Areas: Search Capabilities; Metadata and taxonomy standards; Tools and tool selection; Staff training and hiring; Data creation and QA; Project management; Executive support and ROI
Maturity Levels: Basic; Intermediate; Advanced; Bleeding Edge
[Maturity matrix; cell entries in extraction order, level assignments not recoverable: Uniform Search Box; Query Log Exam.; System MD Stds.; Requirements, then Tools; Index Multiple; Best Bets; Simple Grouping; Organization MD Std.; Reuse ERP; Bakeoff Datasets; Intranet Facet Navigation; Improved Ranking; Multiple Repos. Comply; Taxonomy Roadmap; Budget for Bakeoffs; Highly Abstract Subject Taxonomies; Search Analyst Role; CM Introduced; Librarian Expertise; Pre-hire Testing; ROT-Elimination; Hybrid Creation Model; SME Catalogers; Adaptive Qualification; Quality Measures; Limiting Processes; Unneeded Capabil.; Tools, then Reqs.; Project Plan; External Search ROI; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination; Intranet ROI Model; CEO knows Search ROI; Use it or Lose It Budgets]
Purpose of Maturity Model
Estimating the maturity of an organization’s information management processes tells us:
How involved the taxonomy development and maintenance process should be
Overly sophisticated processes will fail
What to recommend as next steps
Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals.
Mature processes have expenses which must be justified by consequent cost savings or revenue gains.
Metadata Maturity may not be core to your business.
Overview of Best Practices in Metadata and Taxonomy
Avoid monolithic ‘subject’ taxonomies
May have a browsing taxonomy constructed from combined facets.
Use (or map to) Dublin Core for basic information.
Extend with custom elements for specific facts.
Use pre-existing, standard, vocabularies as much as possible.
Validate author names with LDAP directory
ISO country codes for locations
Product & service info from ERP system
Designate a team to manage the taxonomies and related materials
Taxonomy Editorial Rules, Processes, Training materials, Outreach & ROI
Design a Metadata QC Process
Start with an error-correction process, then get more formal on error detection.
In the future, large-scale ontologies like CYC may be valuable in automated error detection.
Factor “Subject” into smaller facets
Size
DMOZ tries to organize all web content, has more than 600k categories!
Difficulty in navigating, maintaining
Hidden facet structure
“Classification Schemes” vs. “Taxonomies”
Sources for 7 common vocabularies
Vocabulary | Definition | Potential Sources
Organization | Organizational structure. | FIPS 95-2, U.S. Government Manual, Your organizational structure, competitors, partners, regulators, etc.
Content Type | Structured list of the various types of content being managed or used. | DC Types, AGLS Document Type, AAT Information Forms, Records management policy, etc.
Industry | Broad market categories such as lines of business, life events, or industry codes. | FIPS 66, SIC, NAICS, etc.
Location | Place of operations or constituencies. | FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.
Topic | Business topics relevant to your mission and goals. | Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.
Audience | Subset of constituents to whom a piece of content is directed or intended to be used. | GEM, ERIC Thesaurus, IEEE LOM, etc.
Products and Services | Names of products/programs & services. | ERP system, Your products and services, etc.
Function | Functions and processes performed to accomplish mission and goals. | FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.
Facet Principles
Basic facets with identified items – people, places, projects, instruments, missions, organizations, … Note that these are not subjective “subjects”, they are objective “objects”.
Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable.
For example, labels like “Anarchist” or “Prime Minister” can be applied to the same person at different times (e.g. Nelson Mandela).
Iterative Development Vision (More participants and tagged content at each iteration)
Stage (Participants): Plan & Prototype (Project Team); Alpha Dev & Test (Stakeholders and SMEs); Beta D&T (Friendly Users); Final D&T (Audiences)
1 Identify Objectives: Interview core team and stakeholders; Interview alpha users; Interview beta users
2 Inventory Content: ID sources, spider assets & extract metadata; Gather additional sources, if any; Gather additional sources, if any
3 Specify Metadata: Define fields & purpose; Revise if needed, bake into alpha CMS; Modify CMS for beta; Modify for 1.0
4 Model Content: Define content chunks & XML DTDs; Revise if needed, bake into alpha CMS; Modify CMS for beta; Modify for 1.0
5 Specify Vocabularies: Compile controlled vocabularies; Revise, use in alpha CMS; Revise, use in beta CMS; Revise using team procedure
6 Specify Procedures: Start with UI sketches, off-the-shelf rules; Tailor the default materials alpha workflows in CMS; Modify & extend workflows; Finalize procedure materials
7 Train Staff: Manually tag small sample; Use alpha CMS to tag larger sample; Use beta CMS to tag larger sample; Finalize training materials & train staff
Review tagged samples, default procedures
Planning for Taxonomy Changes
Error Correction – What to do when end-users and tagging staff notice problems?
Provide for it in the Error Correction Process
Add Query Log Analysis to help detect user problems
How to answer questions re. things to add, delete, or rearrange in the taxonomy?
Keep a visible issue log
Discuss with SMEs, tag samples, use other testing methods
Per-facet changes:
Corporate reorganizations, Product lineup changes, Country splits & merges, … will happen. Prepare for them when deploying those facets
Long-term – what facets to create, when, and why
See Taxonomy Roadmap section
4:00 Additional Processes
Brief remarks on Measurements, ROI, Training, Roadmap
Measuring Metadata and Taxonomy Quality
Taxonomy development is an iterative process
Develop an organizational idea, then test it by tagging sample content
Elicit feedback via walk-throughs and card sorting exercises
Use both qualitative and quantitative methods
Time, budget, and availability of tagged data will determine what methods are possible.
Taxonomy testing | Qualitative methods
Method | Process | Validation
Walk-throughs | Show and explain | Approach; Consistency to rules; Accuracy (SME Checking); Appropriateness to task
Usability Testing | Card sorting, Contextual analysis | Repeatability of user classification; Tasks are completed successfully; Time to complete task is reduced
User Satisfaction | Survey | Reaction to new interface; Reaction to search results
Tagging samples | Tag sample content with taxonomy | Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Include sample pages in walk-throughs, not just the hierarchy.
Tagged Samples
The Taxonomy must fit the content.
How to verify this? Tag samples!
Spreadsheets are a convenient tool for this. URLs, drop-down choosers, text notes all allowed.
Team can review tagged samples when reviewing taxonomy
More sophisticated teams may test inter-cataloger agreement
Samples should appear in training materials for tagging staff
Show typical and unusual cases.
Metadata Element | Metadata Value
URL | sixbits.atl.frb.org/invoke.cfm?objectid=A01B30D1-10C2-11D6-981100508B104751&method=display
Headline | Innovation Awards
Organization | Federal Reserve Bank of Atlanta
Content Type | Honors & Awards
Subject | Salary & Compensation?
Spreadsheet columns: DOCUMENT URL, FACET A, FACET B, FACET C, FACET D, MISSING IDEAS
Samples are used to define training sets for automatic classifiers.
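Inter-cataloger agreement can be measured once two people have tagged the same sample. The sketch below computes simple percent agreement and Cohen's kappa, one common chance-corrected statistic; the choice of kappa and the sample tags are assumptions for illustration, not a method prescribed here.

```python
from collections import Counter

def percent_agreement(tags_a, tags_b):
    """Share of items to which both catalogers assigned the same value."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag lists of equal length")
    return sum(1 for a, b in zip(tags_a, tags_b) if a == b) / len(tags_a)

def cohens_kappa(tags_a, tags_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(tags_a)
    observed = percent_agreement(tags_a, tags_b)
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    # Chance agreement: product of each cataloger's category proportions.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both catalogers used a single identical category
    return (observed - expected) / (1 - expected)

# Invented content-type tags for ten sample documents.
cataloger_a = ["News", "News", "Forms", "Forms", "Policy",
               "Policy", "News", "Forms", "Policy", "News"]
cataloger_b = ["News", "Forms", "Forms", "Forms", "Policy",
               "News", "News", "Forms", "Policy", "News"]

agreement = percent_agreement(cataloger_a, cataloger_b)
kappa = cohens_kappa(cataloger_a, cataloger_b)
print(f"agreement {agreement:.0%}, kappa {kappa:.2f}")
```

Documents on which catalogers disagree are good candidates for the "unusual cases" in training materials.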
Quantitative Method | How evenly does it divide the content?
Background:
Documents do not distribute uniformly across categories
Zipf (1/x) distribution is expected behavior
80/20 rule in action (actually 70/20 rule)
[Chart: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database. Y axis: 0 to 350,000. X axis: Top 10 Content Types; labels shown include Congresses, Biography, Periodicals, Maps, Fiction, Exhibitions, Juvenile literature, Bibliography, Statistics.]
Methodology:
Part of alpha test of ‘content type’ for corporate intranet
115 URLs selected at random from search index were manually categorized.
Inaccessible files and ‘junk’ were removed
Results:
Results were slightly more uniform than the Zipf distribution, which is better than expected
[Chart: Measured and Expected Distribution of Content Types in an Intranet. Vertical axis: 0 to 25. Horizontal axis: Content Type. Series: Measured, Expected.]
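The comparison of measured and expected counts can be sketched as follows. This is a minimal illustration, assuming the expected curve is a plain 1/rank Zipf law scaled to the sample size; the slides do not give the exact formula used. The counts are hypothetical and are not the intranet results.

```python
def zipf_expected(total, k):
    """Expected counts for k ranked categories under a 1/rank (Zipf) law."""
    harmonic = sum(1.0 / rank for rank in range(1, k + 1))
    return [total * (1.0 / rank) / harmonic for rank in range(1, k + 1)]


def compare_to_zipf(counts):
    """Pair each measured count (largest first) with its Zipf expectation."""
    measured = sorted(counts.values(), reverse=True)
    expected = zipf_expected(sum(measured), len(measured))
    return list(zip(measured, expected))


# Hypothetical content-type counts from a manually categorized sample.
sample_counts = {
    "News": 22, "Policy": 17, "Form": 14, "Report": 12,
    "Presentation": 10, "FAQ": 8, "Checklist": 7, "Other": 5,
}

for measured, expected in compare_to_zipf(sample_counts):
    print(f"measured {measured:3d}   expected {expected:5.1f}")
```

If the largest measured category falls below its expected value and the smallest categories rise above theirs, the content is spread more evenly than Zipf predicts, which is the pattern the slide reports.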
Quantitative Method | How intuitive (repeatable) are the categorizations?
Methodology: Closed Card Sort
For alpha test of a grocery site
15 Testers put each of 100 bestselling products into one of 10 predefined categories
Products that fewer than 14 of 15 testers put into the same category were flagged
Results:
“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.
% of Testers | Cumulative % of Products
15/15 | 54%
14/15 | 70%
13/15 | 77%
12/15 | 83%
11/15 | 85%
<11/15 | 100%
In the trade, “Corn Tortillas” are a Dairy item!
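The flagging step of the closed card sort can be sketched as below. The 14-of-15 threshold comes from the slide; the function names and the vote splits are hypothetical, chosen only to illustrate the mechanics.

```python
from collections import Counter


def top_placement(placements):
    """Return (most chosen category, number of testers who chose it)."""
    return Counter(placements).most_common(1)[0]


def flag_products(results, threshold=14):
    """Flag products whose most popular category got fewer than `threshold` votes."""
    flagged = {}
    for product, placements in results.items():
        category, votes = top_placement(placements)
        if votes < threshold:
            flagged[product] = (category, votes, len(placements))
    return flagged


# Hypothetical placements by 15 testers (vote splits are made up).
results = {
    "Orange Juice": ["Beverages"] * 15,
    "Cocoa Drinks – Powder": ["Beverages"] * 8 + ["Grocery"] * 7,
    "Corn Tortillas": ["Bakery"] * 9 + ["Dairy"] * 4 + ["Grocery"] * 2,
}

for product, (category, votes, total) in flag_products(results).items():
    print(f"{product}: top choice {category} with {votes}/{total} testers")
```

Flagged products are candidates for listing in more than one category or for renaming the categories involved.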
Quantitative Method | How does taxonomy “shape” match that of content?
Background:
Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas
Methodology:
25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
Counts of terms and documents summed within taxonomy hierarchy
Results:
Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
Mismatches between term% and document% flagged
Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 6.0
News Media | 0.6 | 11.5
Other | 7.3 | 3.6
Parents and Families | 2.8 | 0.2
Policymakers | 4.5 | 0.7
Researchers | 2.2 | 1.1
School Support Staff | 2.2 | 3.1
Student Financial Aid Providers | 1.7 | 2.0
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4
Source: Courtesy Keith Stubbs, U.S. Dept. of Education
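Flagging mismatches between term share and document share can be sketched as follows. The figures pair the % Terms and % Docs values from the slide with the term groups in listed order. The rule itself, flagging a group when one share is more than twice the other, is an assumption for illustration; the slides do not state the threshold used.

```python
# (term group, % of taxonomy terms, % of tagged documents), from the slide.
ROWS = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 6.0),
    ("News Media", 0.6, 11.5),
    ("Other", 7.3, 3.6),
    ("Parents and Families", 2.8, 0.2),
    ("Policymakers", 4.5, 0.7),
    ("Researchers", 2.2, 1.1),
    ("School Support Staff", 2.2, 3.1),
    ("Student Financial Aid Providers", 1.7, 2.0),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]


def flag_mismatches(rows, factor=2.0):
    """Split groups into those with too few terms and too many terms.

    `factor` is a hypothetical threshold: a group is flagged when its
    document share and term share differ by more than this ratio.
    """
    too_few_terms, too_many_terms = [], []
    for group, pct_terms, pct_docs in rows:
        ratio = pct_docs / pct_terms
        if ratio > factor:
            too_few_terms.append(group)   # lots of content, few terms
        elif ratio < 1 / factor:
            too_many_terms.append(group)  # many terms, little content
    return too_few_terms, too_many_terms


too_few, too_many = flag_mismatches(ROWS)
print("Consider adding terms:", too_few)
print("Consider pruning terms or adding content:", too_many)
```

Groups in the first list may need finer-grained terms; groups in the second may be over-built relative to the content actually available.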
Taxonomy ROI
What level of effort in taxonomy creation and maintenance is justified?
Fundamentals of Taxonomy ROI
Building and maintaining a taxonomy, and tagging data with it, are costs, not benefits.
There is no benefit without exposing the tagged data to users in some way that cuts costs or improves revenues.
Putting a new taxonomy into operation requires UI changes and/or backend system changes.
You need to determine those changes, and their costs, as part of the taxonomy ROI.
Common Taxonomy ROI Scenarios
Catalog site - ROI based on increased sales through improved
product findability
product cross-sells and up-sells
customer loyalty
Call center - ROI based on cutting costs through
fewer customer calls due to improved website self-service
faster, more accurate CSR responses through better information access
Knowledge worker productivity - ROI based on cutting costs through
less time searching for things
less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs
Executive mandate
No ROI at the start, just someone with a vision and the budget to make it happen.
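For the knowledge worker scenario, the arithmetic might look like the sketch below. Every number is hypothetical and the simple ROI formula is an assumption; the one point taken from the slides is that UI and backend changes belong on the cost side along with taxonomy building, maintenance, and tagging.

```python
def annual_search_savings(workers, hours_saved_per_week, hourly_cost, weeks=48):
    """Yearly value of time no longer spent searching (all inputs hypothetical)."""
    return workers * hours_saved_per_week * hourly_cost * weeks


def simple_roi(annual_benefit, annual_cost, upfront_cost, years):
    """(total benefit - total cost) / total cost over the period."""
    total_benefit = annual_benefit * years
    total_cost = upfront_cost + annual_cost * years
    return (total_benefit - total_cost) / total_cost


# Hypothetical inputs for illustration only.
benefit = annual_search_savings(workers=500, hours_saved_per_week=0.5, hourly_cost=50)
upfront = 250_000 + 150_000   # taxonomy development + UI/backend changes
yearly = 100_000              # maintenance and tagging staff time

print(f"annual benefit: {benefit:,.0f}")
print(f"3-year ROI: {simple_roi(benefit, yearly, upfront, years=3):.0%}")
```

The result is only as good as the estimate of hours saved, so that input is worth measuring with the task-completion tests described earlier.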
Tagging and Training
How are we going to populate metadata elements with complete and consistent values?
The tagging problem
How are we going to get people (and/or software) to assign consistent, and accurate, metadata to the content?
The tagger training problem
Taxonomy governance: Workflow-driven metadata tagging
[Diagram: workflow-driven metadata tagging. Steps shown: Compose in Template, Submit to CMS, Automatically fill in metadata, Approve/Edit metadata, Review content (Problem? Yes/No), Copy Edit content (Problem? Yes/No), with output to Web site and Hard Copy. Roles and tools shown: Tagging Tool, Analyst, Editor, Copywriter, Sys Admin. Callout: Tagging Process Doesn’t Stop Here!]
Training Taxonomy Editors and Tagging Staff
Staff will require training on
The structure of the taxonomy
The UI they use to tag the content
The rules to follow when deciding what codes to apply
The end-effect of the codes they apply – have a running prototype or QA environment.
Tagging examples come from samples tagged during taxonomy development.
Hardcopies of the taxonomy, and yellow highlighters, are helpful during training.
Indexing UI
Indexing rules
Rule | Description
Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.
Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe what the asset is about and why it is important. Storage is cheap. Re-creating content is expensive.
Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.
Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find. Remember that search engines can only operate on explicit information.
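The specificity rule works because a hierarchy lets software roll a specific term up to its broader terms, while nothing can recover a specific term from a generic tag. A minimal sketch, using a hypothetical three-level Subject hierarchy that is not taken from the slides:

```python
# Hypothetical child -> parent links; None marks a top-level term.
PARENT = {
    "Mortgage Rates": "Interest Rates",
    "Interest Rates": "Economy",
    "Economy": None,
}


def generalize(term, parent=PARENT):
    """Return the broader terms above `term`, nearest first."""
    chain = []
    current = parent.get(term)
    while current is not None:
        chain.append(current)
        current = parent.get(current)
    return chain


def expand_tags(tags, parent=PARENT):
    """Add every broader term so a search on a general term still matches."""
    expanded = set(tags)
    for tag in tags:
        expanded.update(generalize(tag, parent))
    return expanded


print(generalize("Mortgage Rates"))
print(sorted(expand_tags(["Mortgage Rates"])))
```

An asset tagged only with "Economy" gains nothing from this expansion, which is why taggers are asked to pick the most specific applicable term.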
Tagging tool example: Interwoven MetaTagger
Auto-categorization
Manual form fill-in w/ check boxes, pulldown lists, etc.
Parse & lookup (recognize names)
Auto keyword & summarization
Rules & pattern matching
Taxonomy Roadmap
How to plan for long-term taxonomy development projects?
Taxonomy Roadmap
Most organizations require a phased implementation of an Enterprise Taxonomy
A Taxonomy Roadmap defines the facets to be developed, their timing, and the reasons why
Factors to consider in prioritizing the facets include:
Immediacy of application – How will the taxonomy be put into use? A Search Engine? Portal Navigation? Other? How long will that take?
Impact – How many users will a facet help? How big of a help will it be?
Ease of development – Does the vocabulary exist, can it be bought, or must it be developed? How big and complex will it be? How often will it change? Are there tools to help manage taxonomy changes or must those be acquired too?
What data must be tagged for that? What are the requirements on the metadata’s density and accuracy? Can those be met with automatic methods, or will more extensive human involvement be needed?
Staff expertise and team experience.
Roadmap: Dependencies
A Roadmap requires that an organization plan its projects well in advance, so that upcoming projects can be influenced by the taxonomy.
Consequently, this is an advanced practice.
The Roadmap prioritizes vocabularies according to benefit, cost, and fit with projects.
The Governance Team is responsible for maintaining the Roadmap and the necessary outreach.
Roadmap: Facet Prioritization Matrix
Facet | Description | Impact | Effort to create/maintain CV | Effort to tag
Language* | Languages supported by portal | Medium (High impact for subset) | Done/Low | Low
Format | File format (PDF, doc, html, etc…) | Low | Low/Low | Low
Location* | Geo, region, country, site | Med-High | Done/Low | Medium
Content Type | Also referred to as genre (news, policy, checklist, form, etc…) | Medium | Medium/Low | Medium
Organization | Publishing organization that owns content; organization as audience | Medium | Medium/High | Medium
Subject | Also referred to as topic (benefits, travel, etc…) | High | High/High | Medium
Products & Services | Corporate product and service offerings | Medium | High/High | High
Role (level of responsibility)* | Manager, employee, nonemployee | High (In use on portal, but search has limited access to secure content) | Done/Low | High
Access Control | | Low | Medium/High | High
* Facets already in existence in client’s Intranet
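A matrix like this can be turned into a rough ranking. The sketch below is one possible scoring scheme, not the authors' method: it maps the Done/Low/Medium/High ratings to numbers and divides impact by total effort. The facet names and ratings are illustrative placeholders.

```python
# Hypothetical numeric scale for the matrix ratings.
LEVEL = {"Done": 0, "Low": 1, "Medium": 2, "High": 3}


def priority(impact, create, maintain, tag):
    """Higher score = more impact per unit of effort (illustrative formula)."""
    effort = LEVEL[create] + LEVEL[maintain] + LEVEL[tag]
    return LEVEL[impact] / (1 + effort)


def rank_facets(facets):
    """Sort facet names from highest to lowest priority score."""
    return sorted(facets, key=lambda name: priority(*facets[name]), reverse=True)


# Illustrative ratings: (impact, effort to create CV, effort to maintain CV, effort to tag).
facets = {
    "Facet A": ("Medium", "Done", "Low", "Low"),
    "Facet B": ("High", "High", "High", "Medium"),
    "Facet C": ("Low", "Low", "Low", "Low"),
}

for name in rank_facets(facets):
    print(f"{name}: {priority(*facets[name]):.2f}")
```

A score like this is only a starting point for discussion; immediacy of application and staff expertise still have to be weighed by the Governance Team.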
Roadmap: Timeline
Timeline lists the facets to be developed, and when those development efforts start and end.
Timeline shows what projects will make use of the facet, and how long that should take.
[Chart: roadmap timeline by quarter, FY04Q2 through FY05Q4. Facets shown: Language, Format, Content Type, Role, Organization, Location (Region), Subject, Location (Country), Products/Services, Access Control. Project labels on the bars: Search, Search & Org Chart UI, Search & Portal Nav, Search & Index, Search? Index, CM?. Tool projects: Auto-Classification Tool, Taxonomy Tool.]
Taxonomy Strategies LLC
Ron Daniel, Jr.
925-368-8371 rdaniel@taxonomystrategies.com
Joseph Busch
415-377-7912 jbusch@taxonomystrategies.com
May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.