Taxonomy Governance - Taxonomy Strategies

Taxonomy Strategies LLC

Taxonomy Governance

Ron Daniel, Jr. & Joseph A. Busch

Taxonomy Strategies LLC

May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.

Agenda

 1:30 Welcome & Introductions

 1:45 Exercise: Taxonomy Revisions

 2:15 Fundamental Processes

 2:30 Governance Team Roles and Structures

 3:00 Tools

 3:05 Break

 3:15 Exercise: Organizational Self-Assessment

 3:30 Maturity Model

 3:40 Designing and Building Maintainable Taxonomies & Metadata

 4:00 Additional Processes

 4:20 Q&A

 4:30 Adjourn

Taxonomy Strategies LLC The business of organized information

Who we are: Joseph Busch

 Over 25 years in the business of organized information

 Founder, Taxonomy Strategies

 Director, Solutions Architecture, Interwoven

 VP, Infoware, Metacode Technologies

 Program Manager, Getty Foundation

 Manager, Pricewaterhouse

 Metadata and taxonomies community leadership

 President, American Society for Information Science & Technology

 Director, Dublin Core Metadata Initiative

 Adviser, National Research Council Computer Science and Telecommunications Board

 Reviewer, National Science Foundation Division of Information and Intelligent Systems

 Founder, Networked Knowledge Organization Systems/Services

Who we are: Ron Daniel, Jr.

 Over 15 years in the business of metadata & automatic classification

 Principal, Taxonomy Strategies

 Standards Architect, Interwoven

 Senior Information Scientist, Metacode Technologies

 Technical Staff Member, Los Alamos National Laboratory

 Metadata and taxonomies community leadership

 Chair, PRISM (Publishers Requirements for Industry Standard Metadata) working group

 Acting chair: XML Linking working group

 Member: RDF working groups

 Co-editor: PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports.

Recent & current projects

Government

 Commodity Futures Trading Commission

Defense Intelligence Agency

ERIC

Federal Aviation Administration

Federal Reserve Bank of Atlanta

Forest Service

GSA Office of Citizen Services (www.firstgov.gov)

Head Start

Infocomm Development Authority of Singapore

NASA (nasataxonomy.jpl.nasa.gov)

Small Business Administration

Social Security Administration

USDA Economic Research Service

USDA e-Government Program (www.usda.gov)

Commercial

Allstate Insurance

Blue Shield of California

Debevoise & Plimpton

Halliburton

Hewlett Packard

Motorola

PeopleSoft

Pricewaterhouse Coopers

Siderean Software

Sprint

Time Inc.

 Commercial subcontracts

Agency.com – Top financial services

Critical Mass – Fortune 50 retailer

Deloitte Consulting – Big credit card

Gistics/OTB – Direct selling giant

 NGOs

 CEN

 IDEAlliance

IMF

OCLC

Participant Introductions

 Who are you?

 What do you do?

 What brings you here today?

Taxonomy Governance Overview

 Is “Taxonomy Governance” synonymous with “Taxonomy Maintenance”?

 What kinds of changes can be made, and what are their costs?

 What kinds of information are needed to determine the changes?

 What kind of group should maintain the taxonomy?

 What kinds of rules should the group follow to decide on changes?

 What should the group do beyond maintaining the taxonomy?

Exercise: Taxonomy Modifications

 Divide into small groups

 Review assigned sample taxonomy

 Discuss changes you would make

 In 10 minutes, a spokesperson will speak for the group and briefly:

 Tell us something good about the taxonomy

 Characterize the short-term changes your group would make

 Characterize the questions your group would like answered before making other changes

Exercise Notes

 Team Members:

 Something good about the taxonomy:

 Short term changes:

 Questions for other changes:

Group 1 Sample Taxonomy

Group 2 Sample Taxonomy

Top Level

Random Samples of Detailed Categories

Business / Accounting / Firms / Directories

Business / Biotechnology & Pharmaceuticals / Education & Training

Business / Employment / By Industry

Business / Healthcare / Employment / Regional

Business / Small Business / Finance / Accounting

Reference / Education / Colleges & Universities / North America / United States / Maryland / Columbia Union College / Athletics

Reference / Education / K-12 / Home Schooling / Unschooling / Chats and Forums

Regional / Europe / Ireland / Business & Economy / Employment / Health & Medical

Science / Math / Academic Departments / South America / Colombia

Science / Social Sciences / Linguistics / Translation / Associations

Society / People / Women / Science & Technology / Mathematics

Group 3 Sample Taxonomy

Top Level

Detail in Auto Products Category

Source: http://householdproducts.nlm.nih.gov/products.htm

Predictions

 Short-term changes will center on rules of style – ‘&’ vs. ‘and’, capitalization, plurals

 Faceted subdivision will only be suggested by experienced practitioners, by groups given low-level details of a taxonomy, or both.

 People will critique the UI presentation

 Questions for long-term changes will focus, in decreasing order, on:

 Who are the users and what are they doing?

 What is the content and how much is in the various categories?

 What kind of money depends on the taxonomy, and what kind of maintenance expenses are justified?

 How to put it into action?

 Anything else people want to cover?

Callouts: Editorial Rules; Metadata Specification, Design for maintainability; User Characterization; Content and Metadata Maintenance; ROI

Fundamental Processes

 What are the two fundamental processes every organization should implement to maintain its metadata and taxonomies?

 Query log / Click trail examination

 Tagging Error Correction

 What are the key outlooks a taxonomist should try to instill in their organization?

Fundamental Process #1 – Query Log Examination

 How can we characterize users and what they are looking for?

 Query Log & Click Trail Examination

 Sophisticated software available, but don’t wait.

 80/20 Rule – 80% of value from 20% of possible reports.

 Greatest value comes from:

 Identifying a person as responsible for search quality

 Starting a “Measure & Improve” mindset

 Greatest challenge:

 Getting a person assigned (≥ 10%)

 Getting logs turned back on

 What to do after the obvious fixes have been made

Click Trail Packages: iWebTrack, NetTracker, OptimalIQ, SiteCatalyst, Visitorville, WebTrends

UltraSeek Reporting

• Top queries

• Queries with no results

• Queries with no click-through

• Most requested documents

• Query trend analysis

• Complete server usage summary
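The first few reports in the list above need very little machinery, which is part of the argument for not waiting on sophisticated software. The following is a minimal sketch, assuming a hypothetical log of (query, result count, clicked) records; the field layout, the sample data, and the function names are illustrative assumptions, not the format of any product named on this slide.

```python
from collections import Counter

# Hypothetical log records: (query text, number of results, was any result clicked).
# This layout is an assumption for illustration; real search logs differ by product.
SAMPLE_LOG = [
    ("expense report", 12, True),
    ("Expense Report", 12, False),
    ("401k", 0, False),
    ("holiday schedule", 5, True),
    ("401k", 0, False),
    ("org chart", 3, False),
]


def normalize(query):
    """Lower-case and collapse whitespace so trivially different queries count together."""
    return " ".join(query.lower().split())


def query_reports(log, top_n=3):
    """Build three basic reports: top queries, no-result queries, no-click-through queries."""
    all_queries, no_results, no_click = Counter(), Counter(), Counter()
    for query, result_count, clicked in log:
        q = normalize(query)
        all_queries[q] += 1
        if result_count == 0:
            no_results[q] += 1      # candidate synonyms or missing content
        elif not clicked:
            no_click[q] += 1        # results were shown but none looked useful
    return {
        "top_queries": all_queries.most_common(top_n),
        "no_results": no_results.most_common(),
        "no_click_through": no_click.most_common(),
    }


reports = query_reports(SAMPLE_LOG)
for name, rows in reports.items():
    print(name, rows)
```

Queries that return nothing are natural candidates for new synonyms or terms, and queries with results but no clicks point at ranking or labeling problems; both feed the "Measure & Improve" loop.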

Fundamental Process #2 – Tagging Error Correction

 For the Taxonomy to be used, its values must be associated with content.

 We will refer to this as “Tagging”.

 Errors will happen, and some will be found. What are you going to do about them?

 Define an error correction process.

 Process will accommodate questions like:

 Is it an error? What is the cost to correct or not correct? Does the correction need to be scheduled? etc.

 Once an error is corrected, NEVER lose that fact.

 Manually reviewed pages are vital for training automatic classifiers.

 Has implications for metadata specification and review procedures.

 Over time, multiple error detection methods will be defined.

 e.g. Statistical sampling of newly added pages

 Gradually, additional error correction processes may be defined to deal with particular types of errors.
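One way to picture the process above is a record that remembers manual review, plus a random sample of newly added pages as a first error detection method. This is only a sketch under stated assumptions: the `TaggedPage` structure, the `reviewed` flag, and the function names are hypothetical and would map onto whatever the CMS or metadata store actually provides.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TaggedPage:
    url: str
    tags: dict                      # metadata element -> value
    reviewed: bool = False          # True once a person has checked the tags
    history: list = field(default_factory=list)  # (element, old, new); never discarded


def sample_for_review(pages, sample_size, seed=None):
    """Pick a simple random sample of not-yet-reviewed pages for manual checking."""
    candidates = [p for p in pages if not p.reviewed]
    rng = random.Random(seed)
    return rng.sample(candidates, min(sample_size, len(candidates)))


def record_review(page, corrections):
    """Apply corrections (possibly none) and keep the fact that a person reviewed the page."""
    for element, new_value in corrections.items():
        old_value = page.tags.get(element)
        if old_value != new_value:
            page.history.append((element, old_value, new_value))
            page.tags[element] = new_value
    page.reviewed = True


def estimated_error_rate(sample):
    """Share of reviewed sample pages that needed at least one correction."""
    reviewed = [p for p in sample if p.reviewed]
    if not reviewed:
        return 0.0
    return sum(1 for p in reviewed if p.history) / len(reviewed)


# Hypothetical data: ten new pages, all auto-tagged as "News".
pages = [TaggedPage(url=f"/page{i}", tags={"Content Type": "News"}) for i in range(10)]
sample = sample_for_review(pages, sample_size=4, seed=1)
record_review(sample[0], {"Content Type": "Honors & Awards"})
for page in sample[1:]:
    record_review(page, {})
rate = estimated_error_rate(sample)
print(f"estimated error rate: {rate:.0%}")
```

Pages with `reviewed` set are the manually verified examples that can later serve as training data for an automatic classifier, which is why the flag and the history are never cleared.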

Fundamental Outlooks

How are we going to build and maintain metadata structures and controlled vocabularies?

 The taxonomy problem

How are we going to populate metadata elements with complete and consistent values?

 The tagging problem

How are we then going to use metadata in applications and demonstrate benefits?

 The ROI problem

 Taxonomy Governance is a standards process.

 Take tips from other standards efforts

 Team, with comment-handling responsibilities and an appeals process

 Issue Logs

 Announcements

 Release Schedule

Must know this to address other problems!

 Foster a “Measure & Improve” Mindset

Taxonomy Business Processes

 Taxonomies must change, gradually, over time if they are to remain relevant

 Maintenance processes need to be specified so that the changes are based on rational cost/benefit decisions

 A team will need to maintain the taxonomy on a part-time basis

 Taxonomy team reports to some other steering committee

Definitions about the Controlled Vocabulary Governance Environment

1: Syndicated Terminologies change on their own schedule

2: CV Team decides when to update CVs

3: Team adds value via mappings, translations, synonyms, training materials, etc.

4: Updated versions of CVs published to consuming applications

[Diagram: Controlled Vocabulary Governance Environment. Labels: Syndicated Terminologies (ISO 3166-1, Other External); Vocabulary Management System (CVs, Other Controlled Items); Custodians; Change Requests & Responses; Notifications; Published CVs and STs; Consuming Applications (Web CMS, Archives, Intranet Search, Intranet Nav., ERMS, ERP, DAM, Other Internal)]

Other Controlled Items

 Taxonomy Team will have additional items to manage:

 Charter, Goals, Performance Measures

 Editorial rules

 Team processes

 Tagger training materials (manual and automatic)

 Outreach & ROI

 Communication plan

 Website

 Presentations

 Announcements

 Roadmap

Taxonomy governance | Generic team charter

 Taxonomy Team is responsible for maintaining:

 The Taxonomy, a multi-faceted classification scheme

 Associated taxonomy materials, such as:

Editorial Style Guide

Taxonomy Training Materials

Metadata Standard

Team rules and procedures (subject to CIO review)

 Team evaluates costs and benefits of suggested change

 Taxonomy Team will:

 Manage relationship between providers of source vocabularies and consumers of the Taxonomy

 Identify new opportunities for use of the Taxonomy across the Enterprise to improve information management practices

 Promote awareness and use of the Taxonomy

Editorial Rules

 To ensure consistent style, rules are needed

Issues commonly addressed in the rules:

Sources of Terms

Abbreviations

Ampersands

Capitalization

Continuations (More… or Other…)

Duplicate Terms

Hierarchy and Polyhierarchy

Languages and Character Sets

Length Limits

“Other” – Allowed or Forbidden?

Plural vs. Singular Forms

Relation Types and Limits

Scope Notes

Serial Comma

Spaces

Synonyms and Acronyms

Term Order (Alphabetic or …)

Term Label Order (Direct vs. Inverted)

 Must also address issue of what to do when rules conflict – which are more important?

Rule Name: Editorial Rule

Use Existing Vocabularies: Other things being equal, reusing an existing vocabulary is preferred to creating a new one.

Ampersands: The character ‘&’ is preferred to the word ‘and’ in Term Labels. Example: Use Type: “Manuals & Forms”, not “Manuals and Forms”.

Special Characters: Retain accented characters in Term Labels. Example: España

Serial comma: If a category name includes more than two items, separate the items by commas. The last item is separated by the character ‘&’, which IS NOT preceded by a comma. Example: “Education, Learning & Employment”, not “Education, Learning, & Employment”.

Capitalization: Use title case (where all words except articles are capitalized). Example: “Education, Learning & Employment”, NOT “Education, learning & employment”, NOT “EDUCATION, LEARNING & EMPLOYMENT”, NOT “education, learning & employment”.
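Rules like these are mechanical enough to check automatically before a term reaches review. The sketch below covers three of them (ampersands, serial comma, capitalization) exactly as worded above; the function name and the regular expressions are assumptions made for illustration, and the capitalization check is deliberately naive (it would flag acronyms such as "ERP").

```python
import re

ARTICLES = {"a", "an", "the"}


def check_label(label):
    """Return the names of the editorial rules that a term label appears to violate."""
    problems = []

    # Ampersands: '&' is preferred to the word 'and'.
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands")

    # Serial comma: the final '&' is NOT preceded by a comma.
    if re.search(r",\s*&", label):
        problems.append("Serial comma")

    # Capitalization: title case, all words except articles capitalized.
    # 'and' is skipped here because the Ampersands rule already reports it.
    for word in re.findall(r"[^\W\d_]+", label):
        lowered = word.lower()
        if lowered in ARTICLES or lowered == "and":
            continue
        if not (word[0].isupper() and word[1:] == word[1:].lower()):
            problems.append("Capitalization")
            break

    return problems


for label in [
    "Education, Learning & Employment",
    "Education, Learning, & Employment",
    "Manuals and Forms",
    "EDUCATION, LEARNING & EMPLOYMENT",
    "España",
]:
    print(label, "->", check_label(label) or "ok")
```

A checker like this only reports violations; deciding which rule wins when two conflict remains a job for the team.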

Roles in Two Taxonomy Governance Teams

Executive Sponsor

Advocate for the taxonomy team

 Business Lead

 Keeps team on track with larger business objectives

 Balances cost/benefit issues to decide appropriate levels of effort

 Specialists help in estimating costs

 Obtains needed resources if those in team can’t accomplish a particular task

 Technical Specialist

 Estimates costs of proposed changes in terms of amount of data to be retagged, additional storage and processing burden, software changes, etc.

 Helps obtain data from various systems

 Content Specialist

 Team’s liaison to content creators

 Estimates costs of proposed changes in terms of editorial process changes, additional or reduced workload, etc.

 Small-scale Metadata QA Responsibility

 Taxonomy Specialist

 Suggests potential taxonomy changes based on analysis of query logs, indexer feedback

 Makes edits to taxonomy, installs into system with aid of IT specialist

 Content Owner

 Reality check on process change suggestions

Team structure at a different org.

Business Lead

Custodians

 Responsible for content in a specific CV.

Training Representative

 Develops communications plan, training materials

Work Practices Representative

 Develops processes, monitors adherence

IT Representative

 Backups, admin of CV Tool

Info. Mgmt. Representative

 Provides CV expertise, tie-in with larger IM effort in the organization.

Taxonomy governance | Where changes come from

Recommendations by Editor

1. Small taxonomy changes (labels, synonyms)

2. Large taxonomy changes (retagging, application changes)

3. New “best bets” content

Team considerations (Taxonomy Team)

1. Business goals

2. Changes in user experience

3. Retagging cost

Other sources shown: Application Logic; Requests from other parts of the organization

Processes

 Different organizations will need to consider their own change processes.

 Organization 1: A custodian is responsible for the content, but checks facts with department heads before making changes.

 Organization 2: Analysts suggest changes, editors approve, copyeditors verify consistency.

 Change process MUST also consider cost of implementing the change

 Retagging data

Reconfiguring auto-classifier

Retraining staff

 Changes in user expectations

Taxonomy Change Cases

Case 1. Renaming a term

Case 2. Adding a new leaf term

Case 3. Inserting a new term

Case 4. Splitting a term

Case 5. Deleting a leaf term or subtree

Case 6. Deleting a term

Case 7. Moving a subtree

Case 8. Merging terms

Case 9. Adding a CV

Case 10. Deleting a CV
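The retagging cost of a change depends heavily on which case it is. As a rough sketch, assuming a hypothetical taxonomy in which content is tagged with stable term IDs rather than labels, a few of the cases can be costed by counting the documents that would need a new tagging decision. The data, the function names, and the cost rules below are all illustrative assumptions; staff retraining, auto-classifier reconfiguration, and changes in user expectations are not modeled.

```python
# Hypothetical taxonomy: term ID -> (label, parent ID). IDs are stable; labels may change.
TERMS = {
    "t1": ("Business", None),
    "t2": ("Accounting", "t1"),
    "t3": ("Employment", "t1"),
    "t4": ("By Industry", "t3"),
}

# Hypothetical counts of documents tagged directly with each term ID.
DOC_COUNTS = {"t1": 40, "t2": 250, "t3": 120, "t4": 75}


def subtree(term_id, terms):
    """Return term_id followed by all of its descendants."""
    result = [term_id]
    for tid, (_, parent) in terms.items():
        if parent == term_id:
            result.extend(subtree(tid, terms))
    return result


def retag_estimate(case, term_id, terms=TERMS, doc_counts=DOC_COUNTS):
    """Rough count of documents needing a new tagging decision for a change case."""
    if case in ("rename", "move_subtree"):
        # Assumption: tags reference IDs, so existing assignments stay valid.
        return 0
    if case == "split":
        # Every document on the term must be assigned to one of the new terms.
        return doc_counts.get(term_id, 0)
    if case == "delete_subtree":
        # Every document on the term or its descendants loses its tag.
        return sum(doc_counts.get(t, 0) for t in subtree(term_id, terms))
    raise ValueError(f"case not modeled in this sketch: {case}")


for case, term in [("rename", "t2"), ("split", "t2"), ("delete_subtree", "t3")]:
    print(case, TERMS[term][0], retag_estimate(case, term))
```

Even this crude count gives the team a number to weigh against the expected benefit of a proposed change.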

Taxonomy governance | Taxonomy maintenance workflow

[Workflow diagram: an Analyst suggests a new name/category in the Taxonomy Tool; an Editor reviews the new name; a Copywriter copy edits the new name; a Sys Admin adds it to the enterprise Taxonomy. Two “Problem?” decision points (Yes/No) sit between the steps.]

Taxonomy editing tools vendors

Most popular taxonomy editor? MS Excel

Widely used, cheap, single-user

High functionality, high cost ($100k!)

Immature industry – no vendors in upper-right quadrant!

[Quadrant chart: axis “Completeness of Vision”; quadrants include “Niche Players” and “Visionaries”]

Sample Taxonomy Editor Functionality

 Standard and Custom Fields

 Standard and Custom Relations

 Data Typing, Restrictions, and Inference

 Flexible Reporting

 Flexible Importing

 Multiple Vocabulary Support

 Inter-Vocabulary Relations

 Unique IDs

 ISO Codes not sufficient

 Workflow

 Voting

 Change Request Management

 Programmability

[Screenshot panels: Hierarchy Browser; Term Editing]
Where do I put the metadata?

 Where can I store metadata?

In the content – HTML Headers, File properties, etc.

In a centralized repository – Search index, MDDB, etc.

In multiple systems – Common case

 Where should I store metadata?

 Consultant’s answer – “It depends.”

 If you are moving files through a process, putting it in the file keeps it from getting dropped at system borders.

 If you are doing search across multiple documents, it has to be at least copied out of the files.

 If you make copies of files and modify them, consistent in-file metadata will be impossible.

 Real question is not where to STORE the metadata, it is how to MAINTAIN the metadata.

Web CMS as an example.

Central Metadata Database is a very advanced practice.

What Processes Should I Try to Institute?

 Processes will vary from one organization to another.

 Assessing the Organization’s state is the first step.

 Determining the ROI and potential resources follows.

 Plan on instituting processes over time, beginning with basic ones.

Search and Metadata Self-Assessment Form

Background

1) Rate your organization’s search & metadata maturity from 1 to 10.

2) What was the most recent change to your organization’s search & metadata processes?

3) What is the next step for your organization’s search & metadata processes?

Basic

4) Is there a process in place to examine query logs?

5) Is there an organization-wide metadata standard, such as an extension of the Dublin Core, for use by search tools, multiple repositories, etc.?

Intermediate

6) Does the search engine index more than 4 repositories around the organization?

7) Is there an ongoing data cleansing procedure to look for ROT (Redundant, Obsolete, Trivial content)? If so, describe briefly.

8) Are system features and metadata fields added based on cost/benefit analysis, rather than things that are easy to do with the current tools?

9) Are tools only acquired after requirements have been analyzed, or are major purchases sometimes made to use up year-end money?

10) Are there hiring and training practices especially for metadata and taxonomy positions? If so, describe briefly.

Advanced

11) Are there established qualitative and quantitative measures of metadata quality? If so, describe briefly.

12) Can the CEO explain the ROI for search and metadata?

Optional

13) Your name:

14) Organization:

15) E-mail:

Contact information will not be used for marketing purposes. It will only be used to follow up and clarify issues around the survey.

Metadata Maturity Model

 Taxonomy governance processes must fit the organization

 As consultants, we notice different levels of maturity in the business processes around Content Management, Taxonomy, and Metadata

 Honestly assess your organization’s metadata maturity in order to design appropriate governance processes

 We are starting to define a maturity model, similar to the CMMI model in the software world.

Metadata Maturity Model

Shameless Plug: Tomorrow Morning at 9:45

Call for Data: Leave Self-Assessments with us

Process Areas: Search Capabilities; Metadata and taxonomy standards; Tools and tool selection; Staff training and hiring; Data creation and QA; Project management; Executive support and ROI

Maturity Levels: Basic; Intermediate; Advanced; Bleeding Edge

Grid entries: Uniform Search Box; Query Log Exam.; System MD Stds.; Requirements, then Tools; Index Multiple; Best Bets; Simple Grouping; Organization MD Std.; Reuse ERP; Bakeoff Datasets; Intranet Facet Navigation; Improved Ranking; Multiple Repos. Comply; Taxonomy Roadmap; Budget for Bakeoffs; Highly Abstract Subject Taxonomies; Search Analyst Role; CM Introduced; Librarian Expertise; Pre-hire Testing; ROT-Elimination; Hybrid Creation Model; SME Catalogers; Adaptive Qualification; Quality Measures; Project Plan; External Search ROI; Std. Proj. Methodol.; X-Functional Teams; Communication Plan; Multi-Year Plan; Early Termination; Intranet ROI Model; CEO knows Search ROI

Limiting Processes: Unneeded Capabil.; Tools, then Reqs.; Use it or Lose It Budgets

Purpose of Maturity Model

 Estimating the maturity of an organization’s information management processes tells us:

 How involved the taxonomy development and maintenance process should be

 Overly sophisticated processes will fail

 What to recommend as next steps

 Maturity is not a goal, it is a characterization of an organization’s methods for achieving particular goals.

 Mature processes have expenses which must be justified by consequent cost savings or revenue gains.

 Metadata Maturity may not be core to your business.

Overview of Best Practices in Metadata and Taxonomy

 Avoid monolithic ‘subject’ taxonomies

 May have a browsing taxonomy constructed from combined facets.

 Use (or map to) Dublin Core for basic information.

 Extend with custom elements for specific facts.

 Use pre-existing, standard, vocabularies as much as possible.

 Validate author names with LDAP directory

 ISO country codes for locations

 Product & service info from ERP system

 Designate a team to manage the taxonomies and related materials

 Taxonomy Editorial Rules, Processes, Training materials, Outreach & ROI

 Design a Metadata QC Process

 Start with an error-correction process, then get more formal on error detection.

 In the future, large-scale ontologies like CYC may be valuable in automated error detection.
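Several of the practices above (Dublin Core as the base, custom extension elements, values validated against pre-existing vocabularies) can be combined in a small record check. The sketch below uses the fifteen elements of the Dublin Core Metadata Element Set and a handful of real ISO 3166-1 alpha-2 codes; the custom elements, the author list standing in for an LDAP directory lookup, and the choice of `coverage` for location are assumptions made for illustration.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Hypothetical custom extension elements for organization-specific facts.
CUSTOM_ELEMENTS = {"product", "audience"}

# A few ISO 3166-1 alpha-2 codes; a real deployment would load the full list.
COUNTRY_CODES = {"US", "ES", "IE", "SG", "CO"}

# Stand-in for an LDAP directory lookup (hypothetical user names).
KNOWN_AUTHORS = {"asmith", "bjones"}


def validate(record):
    """Return a list of problems found in one metadata record (empty if none)."""
    errors = []
    for element in record:
        if element not in DUBLIN_CORE and element not in CUSTOM_ELEMENTS:
            errors.append(f"unknown element: {element}")
    if not record.get("title"):
        errors.append("missing title")
    creator = record.get("creator")
    if creator is not None and creator not in KNOWN_AUTHORS:
        errors.append(f"creator not in directory: {creator}")
    coverage = record.get("coverage")
    if coverage is not None and coverage not in COUNTRY_CODES:
        errors.append(f"coverage is not a known country code: {coverage}")
    return errors


good = {"title": "Innovation Awards", "creator": "asmith", "coverage": "US", "type": "Honors & Awards"}
bad = {"title": "", "creator": "unknown", "coverage": "USA", "colour": "red"}
print(validate(good))
print(validate(bad))
```

Checks of this kind catch errors at tagging time, which is a natural starting point for the Metadata QC Process before more formal error detection is added.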

Factor “Subject” into smaller facets

Size

 DMOZ tries to organize all web content, has more than 600k categories!

 Difficulty in navigating, maintaining

 Hidden facet structure

“Classification Schemes” vs. “Taxonomies”
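The arithmetic behind factoring is simple: a monolithic subject tree needs a category for every combination it wants to express, while separate facets only need their own terms. The sketch below makes that concrete; the facet names and sizes are hypothetical numbers chosen for illustration, not measurements from DMOZ or any client.

```python
from math import prod

# Hypothetical facet sizes (number of terms in each facet).
FACET_SIZES = {"Topic": 50, "Location": 200, "Content Type": 20, "Audience": 10}


def facet_summary(facet_sizes):
    """Return (terms to maintain, combinations expressible) for a set of facets."""
    terms_to_maintain = sum(facet_sizes.values())
    combinations = prod(facet_sizes.values())
    return terms_to_maintain, combinations


terms, combinations = facet_summary(FACET_SIZES)
print(f"{terms} terms to maintain, {combinations:,} possible combinations")
```

Under these assumed sizes, a few hundred maintained terms cover a space that a single pre-combined hierarchy could only approach with an enormous number of categories.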

Sources for 7 common vocabularies

Organization
Definition: Organizational structure.
Potential Sources: FIPS 95-2, U.S. Government Manual, Your organizational structure, competitors, partners, regulators, etc.

Content Type
Definition: Structured list of the various types of content being managed or used.
Potential Sources: DC Types, AGLS Document Type, AAT Information Forms, Records management policy, etc.

Industry
Definition: Broad market categories such as lines of business, life events, or industry codes.
Potential Sources: FIPS 66, SIC, NAICS, etc.

Location
Definition: Place of operations or constituencies.
Potential Sources: FIPS 5-2, FIPS 55-3, ISO 3166, UN Statistics Div, US Postal Service, etc.

Topic
Definition: Business topics relevant to your mission and goals.
Potential Sources: Federal Register Thesaurus, NAL Agricultural Thesaurus, LCSH, etc.

Audience
Definition: Subset of constituents to whom a piece of content is directed or intended to be used.
Potential Sources: GEM, ERIC Thesaurus, IEEE LOM, etc.

Products and Services
Definition: Names of products/programs & services.
Potential Sources: ERP system, Your products and services, etc.

Function
Definition: Functions and processes performed to accomplish mission and goals.
Potential Sources: FEA Business Reference Model, Enterprise Ontology, AAT Functions, etc.

Facet Principles

 Basic facets with identified items – people, places, projects, instruments, missions, organizations, … Note that these are not subjective “subjects”, they are objective “objects”.

 Subjective views can be laid on top of the objective facts, but should be in a different namespace so they are clearly distinguishable.

 For example, labels like “Anarchist” or “Prime Minister” can be applied to the same person at different times (e.g. Nelson Mandela).

Iterative Development Vision (More participants and tagged content at each iteration)

Stages and participants: Plan & Prototype (Project Team); Alpha Dev & Test (Stakeholders and SMEs); Beta D&T (Friendly Users); Final D&T (Audiences)

Steps, with activities listed in stage order:

1 Identify Objectives: Interview core team and stakeholders; Review tagged samples, default procedures; Interview alpha users; Interview beta users

2 Inventory Content: ID sources, spider assets & extract metadata; Gather additional sources, if any; Gather additional sources, if any

3 Specify Metadata: Define fields & purpose; Revise if needed, bake into alpha CMS; Modify CMS for beta; Modify for 1.0

4 Model Content: Define content chunks & XML DTDs; Revise if needed, bake into alpha CMS; Modify CMS for beta; Modify for 1.0

5 Specify Vocabularies: Compile controlled vocabularies; Revise, use in alpha CMS; Revise, use in beta CMS; Revise using team procedure

6 Specify Procedures: Start with UI sketches, off-the-shelf rules; Tailor the default materials, alpha workflows in CMS; Modify & extend workflows; Finalize procedure materials

7 Train Staff: Manually tag small sample; Use alpha CMS to tag larger sample; Use beta CMS to tag larger sample; Finalize training materials & train staff

Planning for Taxonomy Changes

 Error Correction – What to do when end-users and tagging staff notice problems?

 Provide for it in the Error Correction Process

 Add Query Log Analysis to help detect user problems

 How to answer questions re. things to add, delete, or rearrange in the taxonomy?

 Keep a visible issue log

 Discuss with SMEs, tag samples, use other testing methods

 Per-facet changes:

 Corporate reorganizations, Product lineup changes, Country splits & merges, … will happen. Prepare for them when deploying those facets

 Long-term – what facets to create, when, and why

 See Taxonomy Roadmap section

Additional Processes

 Brief remarks on Measurements, ROI, Training, Roadmap
Measuring Metadata and Taxonomy Quality

 Taxonomy development is an iterative process

 Develop an organizational idea, then test it by tagging sample content

 Elicit feedback via walk-throughs and card sorting exercises

 Use both qualitative and quantitative methods

 Time, budget, and availability of tagged data will determine what methods are possible.

Taxonomy testing | Qualitative methods

Method: Walk-throughs
Process: Show and explain
Validation: Approach; Consistency to rules; Accuracy (SME Checking); Appropriateness to task

Method: Usability Testing
Process: Card sorting, Contextual analysis
Validation: Repeatability of user classification; Tasks are completed successfully; Time to complete task is reduced

Method: User Satisfaction
Process: Survey
Validation: Reaction to new interface; Reaction to search results

Method: Tagging samples
Process: Tag sample content with taxonomy
Validation: Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods

Include sample pages in walk-throughs, not just the hierarchy.


Tagged Samples

 The Taxonomy must fit the content.

 How to verify this? Tag samples!

 Spreadsheets are a convenient tool for this. URLs, drop-down choosers, text notes all allowed.

 Team can review tagged samples when reviewing taxonomy

 More sophisticated teams may test inter-cataloger agreement

 Samples should appear in training materials for tagging staff

 Show typical and unusual cases.

Sample tagging spreadsheet columns: DOCUMENT URL | FACET A | FACET B | FACET C | FACET D | MISSING IDEAS

Example tagged sample:

Metadata Element | Metadata Value

URL | sixbits.atl.frb.org/invoke.cfm?objectid=A01B30D1-10C2-11D6-981100508B104751&method=display

Headline | Innovation Awards

Organization | Federal Reserve Bank of Atlanta

Content Type | Honors & Awards

Subject | Salary & Compensation?

 Samples are used to define training sets for automatic classifiers.
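Inter-cataloger agreement on tagged samples can be measured in several ways. The sketch below uses Cohen's kappa for two catalogers who each assign one term per sample; the choice of kappa, the function name, and the sample tags are assumptions for illustration, not the method used in the workshop.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two catalogers who each assigned one term per sample."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two equal-length, non-empty tag lists")
    n = len(tags_a)
    # Share of samples where both catalogers picked the same term.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    counts_a, counts_b = Counter(tags_a), Counter(tags_b)
    # Agreement expected by chance, from each cataloger's own term frequencies.
    expected = sum(counts_a[t] * counts_b[t] for t in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both catalogers used one identical term throughout
    return (observed - expected) / (1 - expected)


# Hypothetical Content Type tags from two catalogers for the same five samples.
cataloger_1 = ["Honors & Awards", "News", "Policy", "News", "Form"]
cataloger_2 = ["Honors & Awards", "News", "Policy", "Form", "Form"]
print(round(cohen_kappa(cataloger_1, cataloger_2), 2))
```

A value near 1 suggests the facet is being applied consistently; a value near 0 suggests agreement is no better than chance, and the terms or indexing rules may need revision.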


Quantitative Method | How evenly does it divide the content?

Background:

Documents do not distribute uniformly across categories

 Zipf (1/x) distribution is expected behavior

 80/20 rule in action (actually 70/20 rule)

[Chart: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database. X-axis: Top 10 Content Types (labels recovered: Congresses, Biography, Periodicals, Maps, Fiction, Exhibitions, Juvenile literature, Bibliography, Statistics). Y-axis: 0 to 350,000. Two series: Series1, Series2.]

Methodology:

Part of alpha test of ‘content type’ for corporate intranet

 115 URLs selected at random from search index were manually categorized.

Inaccessible files and ‘junk’ were removed

Results:

Results were slightly more uniform than the Zipf distribution, which is better than expected

[Chart: Measured and Expected Distribution of Content Types in an Intranet. X-axis: Content Type. Y-axis: 0 to 25. Series: Measured, Expected.]
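The comparison against a Zipf (1/x) distribution can be sketched as follows. This is a minimal illustration: the function names and the measured counts are hypothetical and are not the data behind the charts on this slide.

```python
def zipf_expected(total_docs, num_categories):
    """Expected document counts per rank if counts fall off as 1/rank."""
    weights = [1 / rank for rank in range(1, num_categories + 1)]
    scale = total_docs / sum(weights)
    return [w * scale for w in weights]


def compare_to_zipf(measured_counts):
    """Pair each measured count (sorted descending) with its Zipf expectation."""
    ranked = sorted(measured_counts, reverse=True)
    expected = zipf_expected(sum(ranked), len(ranked))
    return list(zip(ranked, expected))


# Hypothetical counts for five content types (assumption, for illustration only).
measured = [30, 22, 18, 12, 8]
for rank, (got, want) in enumerate(compare_to_zipf(measured), start=1):
    print(f"rank {rank}: measured {got}, expected {want:.1f}")
```

If the top-ranked categories hold fewer documents than the 1/rank expectation, the facet divides the content more evenly than Zipf predicts.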


Quantitative Method | How intuitive (repeatable) are the categorizations?

 Methodology: Closed Card Sort

 For alpha test of a grocery site

 15 Testers put each of 100 bestselling products into one of 10 predefined categories

 Products where fewer than 14 of 15 testers chose the same category were flagged

 Results:

“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.

Testers in agreement | Cumulative % of Products

15/15 | 54%

14/15 | 70%

13/15 | 77%

12/15 | 83%

11/15 | 85%

<11/15 | 100%

In the trade, “Corn Tortillas” are a Dairy item!
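The flagging step of a closed card sort can be sketched as below. The data structure, function names, vote counts, and the "Whole Milk" product are hypothetical; only the 14-of-15 threshold comes from the slide.

```python
from collections import Counter


def agreement(choices):
    """Return (most common category, number of testers who chose it)."""
    category, votes = Counter(choices).most_common(1)[0]
    return category, votes


def flag_products(sort_results, min_votes):
    """List products whose top category got fewer than min_votes testers."""
    flagged = []
    for product, choices in sort_results.items():
        category, votes = agreement(choices)
        if votes < min_votes:
            flagged.append((product, category, votes, len(choices)))
    return flagged


# Hypothetical results from 15 testers (vote counts are invented).
results = {
    "Cocoa Drinks - Powder": ["Beverages"] * 8 + ["Grocery"] * 7,
    "Whole Milk": ["Dairy"] * 15,
}
print(flag_products(results, min_votes=14))
```

Flagged products are candidates for placement in more than one category, or for renaming the categories involved.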


Quantitative Method | How does taxonomy “shape” match that of content?

 Background:

 Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas

 Methodology:

 25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)

 Counts of terms and documents summed within taxonomy hierarchy

 Results:

 Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)

 Mismatches between term% and document% flagged

Term Group | % Terms | % Docs

Administrators | 7.8 | 15.8

Community Groups | 2.8 | 1.8

Counselors | 3.4 | 1.4

Federal Funds Recipients and Applicants | 9.5 | 34.4

Librarians | 2.8 | 6.0

News Media | 0.6 | 11.5

Other | 7.3 | 3.6

Parents and Families | 2.8 | 0.2

Policymakers | 4.5 | 0.7

Researchers | 2.2 | 1.1

School Support Staff | 2.2 | 3.1

Student Financial Aid Providers | 1.7 | 2.0

Students | 27.4 | 7.0

Teachers | 25.1 | 11.4

Source: Courtesy Keith Stubbs, U.S. Dept. of Education
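One way to flag mismatches between term% and document% is sketched below, using the Term Group percentages from this slide. The 10-percentage-point threshold and the function name are assumptions; the slide does not state what rule was used.

```python
# (term group, % of terms, % of documents), as given on the slide.
ROWS = [
    ("Administrators", 7.8, 15.8),
    ("Community Groups", 2.8, 1.8),
    ("Counselors", 3.4, 1.4),
    ("Federal Funds Recipients and Applicants", 9.5, 34.4),
    ("Librarians", 2.8, 6.0),
    ("News Media", 0.6, 11.5),
    ("Other", 7.3, 3.6),
    ("Parents and Families", 2.8, 0.2),
    ("Policymakers", 4.5, 0.7),
    ("Researchers", 2.2, 1.1),
    ("School Support Staff", 2.2, 3.1),
    ("Student Financial Aid Providers", 1.7, 2.0),
    ("Students", 27.4, 7.0),
    ("Teachers", 25.1, 11.4),
]


def flag_mismatches(rows, min_gap=10.0):
    """Return (group, gap) where |% docs - % terms| >= min_gap, largest first.

    min_gap is a hypothetical threshold in percentage points.
    """
    gaps = [(group, round(docs - terms, 1)) for group, terms, docs in rows]
    flagged = [(group, gap) for group, gap in gaps if abs(gap) >= min_gap]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


for group, gap in flag_mismatches(ROWS):
    print(f"{group}: {gap:+.1f} points")
```

A positive gap means an area has many documents but few terms (it may need more detail); a negative gap means many terms but few documents (it may be over-built).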


Taxonomy ROI

 What level of effort in taxonomy creation and maintenance is justified?


Fundamentals of Taxonomy ROI

 Building and maintaining a taxonomy, and tagging data with it, are costs, not benefits.

 There is no benefit without exposing the tagged data to users in some way that cuts costs or improves revenues.

 Putting a new taxonomy into operation requires UI changes and/or backend system changes.

 You need to determine those changes, and their costs, as part of the taxonomy ROI.
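A rough way to put these cost categories together is a simple undiscounted ROI calculation. The sketch below is an assumption about how one might do the arithmetic; the function, its parameters, and all dollar figures are hypothetical.

```python
def taxonomy_roi(annual_benefit, build_cost, system_change_cost,
                 annual_maintenance, annual_tagging, years):
    """Undiscounted ROI = (total benefit - total cost) / total cost."""
    # One-time costs: building the taxonomy plus UI/backend system changes.
    one_time = build_cost + system_change_cost
    # Recurring costs: maintaining the taxonomy and tagging content with it.
    recurring = years * (annual_maintenance + annual_tagging)
    total_cost = one_time + recurring
    total_benefit = years * annual_benefit
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures over a three-year horizon.
roi = taxonomy_roi(
    annual_benefit=200_000,
    build_cost=100_000,
    system_change_cost=50_000,
    annual_maintenance=20_000,
    annual_tagging=30_000,
    years=3,
)
print(f"ROI: {roi:.0%}")
```

Leaving out the system change cost would overstate the return, which is why it is a separate parameter here.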


Common Taxonomy ROI Scenarios

 Catalog site - ROI based on increased sales through improved

 product findability

 product cross-sells and up-sells

 customer loyalty

 Call center - ROI based on cutting costs through

 fewer customer calls due to improved website self-service

 faster, more accurate CSR responses through better information access

 Knowledge worker productivity - ROI based on cutting costs through

 less time searching for things

 less time recreating existing materials, with knock-on benefits of less confusion and reduced storage and backup costs

 Executive mandate

 No ROI at the start, just someone with a vision and the budget to make it happen.


Tagging and Training

 How are we going to populate metadata elements with complete and consistent values?

 The tagging problem

 How are we going to get people (and/or software) to assign consistent, and accurate, metadata to the content?

 The tagger training problem


Taxonomy governance: Workflow-driven metadata tagging

[Workflow diagram. Steps shown: Compose in Template; Submit to CMS; Automatically fill in metadata; Approve/Edit metadata; Review content; Copy Edit content. Two "Problem?" (Yes/No) decision points. Roles: Copywriter, Tagging Tool, Analyst, Editor, Sys Admin. Outputs: Hard Copy, Web site. Callout: Tagging Process Doesn’t Stop Here!]


Training Taxonomy Editors and Tagging Staff

 Staff will require training on

The structure of the taxonomy

The UI they use to tag the content

 The rules to follow when deciding what codes to apply

 The end-effect of the codes they apply – have a running prototype or QA environment.

 Tagging examples come from samples tagged during taxonomy development.

 Hardcopies of the taxonomy, and yellow highlighters, are helpful during training.

[Screenshot: Indexing UI]

Indexing rules

Rule | Description

Specificity rule | Apply the most specific terms when tagging assets. Specific terms can always be generalized, but generic terms cannot be specialized.

Repeatable rule | All attributes should be repeatable. Use as many terms as necessary to describe What the asset is about and Why it is important. Storage is cheap. Re-creating content is expensive.

Appropriateness rule | Not all attributes apply to all assets. Only supply values for attributes that make sense.

Usability rule | Anticipate how the asset will be searched for in the future, and how to make it easy to find. Remember that search engines can only operate on explicit information.
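The specificity rule works because a specific term can be rolled up to its broader terms automatically, while the reverse is not possible. A minimal sketch, assuming a hypothetical child-to-parent map with no cycles (the hierarchy shown is invented for illustration):

```python
# Hypothetical child -> parent links; a top-level term has no entry.
PARENT = {
    "Honors & Awards": "News & Events",
    "News & Events": "Content Type",
}


def generalize(term, parent=PARENT):
    """Return the term followed by all of its broader terms, nearest first."""
    chain = [term]
    # Walk upward until a term with no recorded parent is reached.
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


# An asset tagged with the specific term can be found under any broader term.
print(generalize("Honors & Awards"))
```

An asset tagged only with "Content Type" gives the system nothing to work with: there is no way to infer which narrower term was meant.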


Tagging tool example: Interwoven MetaTagger

[Screenshot with callouts:]

 Auto-categorization

 Manual form fill-in w/ check boxes, pulldown lists, etc.

 Parse & lookup (recognize names)

 Auto keyword & summarization

 Rules & pattern matching
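Rules and pattern matching can be illustrated in generic terms with regular expressions. The sketch below is not MetaTagger's API or rule syntax; the rule list, patterns, and function name are hypothetical.

```python
import re

# Hypothetical rules: (metadata element, value, pattern that triggers it).
RULES = [
    ("Content Type", "Honors & Awards",
     re.compile(r"\baward(s)?\b", re.IGNORECASE)),
    ("Organization", "Federal Reserve Bank of Atlanta",
     re.compile(r"\bFederal Reserve Bank of Atlanta\b")),
]


def suggest_tags(text, rules=RULES):
    """Return {element: [values]} for every rule whose pattern matches the text."""
    suggestions = {}
    for element, value, pattern in rules:
        if pattern.search(text):
            suggestions.setdefault(element, []).append(value)
    return suggestions


print(suggest_tags("Innovation Awards at the Federal Reserve Bank of Atlanta"))
```

Suggestions produced this way would still go to a person for approval or editing, in line with the workflow described earlier.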


Taxonomy Roadmap

 How to plan for long-term taxonomy development projects?


Taxonomy Roadmap

 Most organizations require a phased implementation of an Enterprise Taxonomy

 A Taxonomy Roadmap defines the facets to be developed, their timing, and the reasons why

 Factors to consider in prioritizing the facets include:

 Immediacy of application – how will the taxonomy be put into use? A Search Engine? Portal Navigation? Other? How long will that take?

 Impact – How many users will a facet help? How big of a help will it be?

 Ease of development – does the vocabulary exist, can it be bought, or must it be developed? How big and complex will it be? How often will it change? Are there tools to help manage taxonomy changes or must those be acquired too?

 What data must be tagged for that? What are the requirements on the metadata’s density and accuracy? Can those be met with automatic methods, or will more extensive human involvement be needed?

 Staff expertise and Team experience.


Roadmap: Dependencies

 A Roadmap requires that an organization plan its projects well in advance, so that upcoming projects can be influenced by the taxonomy

 Consequently, this is an advanced practice

 Roadmap prioritizes vocabularies according to benefit, cost, and fit with projects.

 Governance Team is responsible for maintaining the Roadmap and the necessary outreach.


Roadmap: Facet Prioritization Matrix

Facet | Description | Impact | Effort to create/maintain CV | Effort to tag

Language* | Languages supported by portal | Medium (High impact for subset) | Done/Low | Low

Format | File format (PDF, doc, html, etc…) | Low | Low/Low | Low

Location* | Geo, region, country, site | Med-High | Done/Low | Medium

Content Type | Also referred to as genre (news, policy, checklist, form, etc…) | Medium | Medium/Low | Medium

Organization | Publishing organization that owns content; organization as audience | Medium | Medium/High | Medium

Subject | Also referred to as topic (benefits, travel, etc…) | High | High/High | Medium

Products & Services | Corporate product and service offerings | Medium | High/High | High

Role (level of responsibility)* | Manager, employee, nonemployee | High (In use on portal, but search has limited access to secure content) | Done/Low | High

Access Control | | Low | Medium/High | High

* Facets already in existence in client’s Intranet
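A matrix like this can be turned into a rough ranking. The sketch below is one possible scoring, not the method used with the client: the numeric scale, the formula, and the row-by-row reading of the matrix values are all assumptions.

```python
# Hypothetical numeric scale for the matrix labels.
LEVEL = {"Done": 0, "Low": 1, "Medium": 2, "Med-High": 2.5, "High": 3}

# (facet, impact, effort to create CV, effort to maintain CV, effort to tag).
# Values are an assumed reading of the matrix; qualifiers such as
# "(High impact for subset)" are ignored.
FACETS = [
    ("Language", "Medium", "Done", "Low", "Low"),
    ("Format", "Low", "Low", "Low", "Low"),
    ("Location", "Med-High", "Done", "Low", "Medium"),
    ("Content Type", "Medium", "Medium", "Low", "Medium"),
    ("Organization", "Medium", "Medium", "High", "Medium"),
    ("Subject", "High", "High", "High", "Medium"),
    ("Products & Services", "Medium", "High", "High", "High"),
    ("Role", "High", "Done", "Low", "High"),
    ("Access Control", "Low", "Medium", "High", "High"),
]


def priority(impact, create, maintain, tag):
    """Impact divided by total effort (plus 1 to avoid dividing by zero)."""
    effort = LEVEL[create] + LEVEL[maintain] + LEVEL[tag]
    return LEVEL[impact] / (1 + effort)


def rank_facets(facets=FACETS):
    """Return facets sorted from highest to lowest priority score."""
    return sorted(facets, key=lambda f: priority(*f[1:]), reverse=True)


for name, *levels in rank_facets():
    print(f"{name}: {priority(*levels):.2f}")
```

Any such score is only a starting point for discussion; fit with upcoming projects still has to be weighed by the Governance Team.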


Roadmap: Timeline

Timeline lists the facets to be developed, and when those development efforts start and end.

Timeline shows what projects will make use of the facet, and how long that should take.

[Timeline chart with quarterly columns: FY04Q2, FY04Q3, FY04Q4, FY05Q1, FY05Q2, FY05Q3, FY05Q4. Facets: Language, Format, Content Type, Role, Organization, Location (Region), Location (Country), Subject, Products/Services, Access Control. Project labels on the bars: Search, Search & Org Chart UI, Search & Portal Nav, Search & Index, Search?, Index, CM?. Other labels: Auto-Classification Tool, Taxonomy Tool, Projects.]


Taxonomy Strategies LLC

Contact Info

Ron Daniel, Jr.

925-368-8371 rdaniel@taxonomystrategies.com

Joseph Busch

415-377-7912 jbusch@taxonomystrategies.com

May 16, 2005 Copyright 2005 Taxonomy Strategies LLC. All rights reserved.