Making the Business Case for Taxonomy

Taxonomy Strategies LLC
Testing & Usability: Making It Work
Joseph A. Busch & Ron Daniel, Jr.
September 28, 2005
Copyright 2005 Taxonomy Strategies LLC. All rights reserved.
Agenda
• Qualitative methods
• Quantitative methods
Taxonomy Strategies LLC The business of organized information
Qualitative taxonomy testing methods
Method: Walk-thru
  Process: Show & explain
  Who: Taxonomist; SME; Team
  Requires: Rough taxonomy
  Validation: Approach; Appropriateness to task

Method: Walk-thru
  Process: Check conformance to editorial rules
  Who: Taxonomist
  Requires: Draft taxonomy; Editorial Rules
  Validation: Consistent look and feel

Method: Usability Testing
  Process: Contextual analysis (card sorting, scenario testing, etc.)
  Who: Users
  Requires: Rough taxonomy; Tasks & Answers
  Validation: Tasks are completed successfully; Time to complete task is reduced

Method: User Satisfaction
  Process: Survey
  Who: Users
  Requires: Rough Taxonomy; UI Mockup; Search prototype
  Validation: Reaction to taxonomy; Reaction to new interface; Reaction to search results

Method: Tagging Samples
  Process: Tag sample content with taxonomy
  Who: Taxonomist; Team; Indexers
  Requires: Sample content; Rough taxonomy (or better)
  Validation: Content ‘fit’; Fills out content inventory; Training materials for people & algorithms; Basis for quantitative methods
Walk-through method: Show & explain
Public Utility XYZ

Audiences: General Audience; Business; Customer-Owners; Employees; Education; Finance; Job Seekers; Media; National; Power Industry; Recreation Interests; Regional; Regulators; Licensing & Compliance; Stakeholders; Retirees

Organizations: Administration; Finance & Technology; Distribution Services; Generation; Customer & Environmental Services; Corporate & Treasury Services

(Facet heading not recoverable): Careers; Commissioners; Customer Service; Distribution; Education; Environmental; Fish & Wildlife; Forestry & Tree Trimming; Hydro; Parks; Pwr Conservation; Pwr Industry Info; Power Mgmt; Procurement; Public Info; Recreation; Licensing & Compliance; Retiree Info; Safety; SNAP; Tours; Water/Wastewater; Wtr Conservation; Wholesale Fiber; Other Services

Facilities: Hydro Projects; Hatcheries; Parks; Water Wastewater; Fiber Networks; Distribution System; Substations & Switchyards; Transmission; Support Facilities; Communication Sites

Utility Systems: Communication Equipment; Conductors & Devices; Conduit; Electric Equipment Accessories; Equipment - Misc. by Service; Fiber Backbone; Fiber Customer Connections; Fiber Distribution; Fire Mains; Fisheries Equipment; Franchises & Consents; Fuel Tanks & Accessories; Generators, Turbines & Waterwheels; Hydrants; Laboratory Equipment; Land & Land Rights by Service; etc.

Content Types: General Information; Agenda; Annual report; Audio; Brochure; Budget; Contract; Correspondence; Directory; Drawing; Form; FAQ; Job Listing; Map; Memo; Minutes; Newsletter; Photo; Plan; News Release; Presentation; Procedure; Report; Schedule; Standard; Video
Walk-through method: Editorial rules consistency check

• Abbreviations
• Ampersands
• Capitalization
• General…, More…, Other…
• Languages & character sets
• Length limits
• Multiple parents
• Plural vs. singular form
• Scope notes
• Serial comma
• Sources of terms
• Spaces
• Synonyms & acronyms
• Term order (Alphabetic or …)
• Term label order (Direct vs. inverted)

Rule Name: Abbreviations
Editorial Rule: Abbreviations, other than colloquial terms and acronyms, shall not be used in term labels.
Example: Public Information
NOT: Public Info.

Rule Name: Ampersands
Editorial Rule: The ampersand [&] character shall be used instead of the word ‘and’.
Example: Licensing & Compliance
NOT: Licensing and Compliance

Rule Name: Capitalization
Editorial Rule: Title case capitalization shall be used.
Example: Customer Service
NOT: CUSTOMER SERVICE
NOT: Customer service
NOT: customer service

Rule Name: General…, More…, Other…
Editorial Rule: The term labels “General…”, “More…”, and “Other…” shall be used for categories which contain content items that are not further classifiable.
Example: “Other Property”; “Other Services”; “General Information”; “General Audience”

…
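A consistency check like this can be partly automated. The following is a minimal sketch, assuming term labels are available as plain strings; the function name `check_label`, the `ACRONYMS` and `MINOR_WORDS` allow-lists, and the trailing-period heuristic for spotting abbreviations are illustrative assumptions, not part of the editorial rules themselves.

```python
import re

# Hypothetical allow-lists; a real project would keep these with its editorial rules.
ACRONYMS = {"FAQ", "SNAP", "WIC"}
MINOR_WORDS = {"a", "an", "and", "by", "for", "in", "of", "or", "the", "to"}


def check_label(label):
    """Return the names of the editorial rules that a term label appears to break."""
    problems = []
    words = re.findall(r"[A-Za-z]+\.?", label)

    # Abbreviations: heuristic only, flags any word written with a trailing period.
    if any(word.endswith(".") for word in words):
        problems.append("Abbreviations")

    # Ampersands: the word "and" should be written as "&".
    if re.search(r"\band\b", label, flags=re.IGNORECASE):
        problems.append("Ampersands")

    # Capitalization: every significant word must be in title case.
    for word in words:
        word = word.rstrip(".")
        if word in ACRONYMS or word.lower() in MINOR_WORDS:
            continue
        if not (word[0].isupper() and (len(word) == 1 or word[1:].islower())):
            problems.append("Capitalization")
            break
    return problems


for label in ["Public Info.", "Licensing and Compliance", "Customer service"]:
    print(label, "->", check_label(label))
```

A script of this kind only catches mechanical violations; rules such as scope notes or sources of terms still need a taxonomist's review.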
Usability testing method: Task-based card sorting (1)
• 15 representative questions were selected
  – Perspective of various organizational units
  – Most frequent website searches
  – Most frequently accessed website content
• Correct answers to the questions were agreed on in advance by the team.
• 15 users were tested
  – Did not work for the organization
  – Represented target audiences
• Testers were asked “where would you look for …”
  – “under which facet… Topic, Commodity, or Geography?”
  – Then, “… under which category?”
  – Then, “…under which sub-category?”
• Tester choices were recorded
• Testers were asked to “think aloud”
• Notes were taken on what they said
• Pre- and post-test questions were asked
• Tester answers were recorded
Usability testing method: Task-based card sorting (2)

3. What is the average farm income level in your state?

1. Topics
2. Commodities
3. Geographic Coverage

1. Topics
1.1 Agricultural Economy
1.2 Agriculture-Related Policy
1.3 Diet, Health & Safety
1.4 Farm Financial Conditions
1.5 Farm Practices & Management
1.6 Food & Agricultural Industries
1.7 Food & Nutrition Assistance
1.8 Natural Resources & Environment
1.9 Rural Economy
1.10 Trade & International Markets

1.4 Farm Financial Conditions
1.4.1 Costs of Production
1.4.2 Commodity Outlook
1.4.3 Farm Financial Management & Performance
1.4.4 Farm Income
1.4.5 Farm Household Financial Well-being
1.4.6 Lenders & Financial Markets
1.4.7 Taxes
Analysis of task-based card sorting (1)
Find-it Tasks | User 1 | User 2 | User 3 | User 4 | User 5
1. Cotton | Cotton | Cotton | Asia | Cotton | Cotton
2. Mad cow | Cattle | Food Safety | Cattle | Cattle | Cattle
3. Farm income | Farm Income | Farm Income | US States | Farm Income | Farm Income
4. Fast food | Food Consumption | Diet Quality & Nutrition | Food Expenditures | Diet Quality & Nutrition | Diet Quality & Nutrition
5. WIC | WIC Program | WIC Program | WIC Program | WIC Program | WIC Program
6. GE Corn | Corn | Corn | Corn | Corn | Corn
7. Foodborne illness | Foodborne Disease | Foodborne Disease | Consumer Food Safety | Foodborne Disease | Foodborne Disease
8. Food costs | Retailing & Wholesaling | Food Prices | Market Structure | Market Analysis | Food Expenditures
9. Tobacco | Tobacco | Tobacco | Tobacco | Tobacco | Tobacco
10. Small Farms | Farm Structure | Farm Structure | Farm Structure | Farm Structure | Farm Structure
11. Traceability | Food System | Labeling Policy | Food Safety Innovations | Food Safety Policy | Food Prices
12. Hunger | Food Security | Food Security | Food Security | Food Security | Food Security
13. Trade balance | Commodity Trade | Trade & Intl Markets | Commodity Trade | Market Analysis | Commodity Trade
14. Conservation | Cropping Practices | Conservation Policy | Conservation Policy | Conservation Policy | Conservation Policy
15. Trade restrictions | Trade Policy | Food Safety & Trade | Market Analysis | Commodity Trade | WTO
Analysis of task-based card sorting (2)
• In 80% of the trials, users looked for information under the categories where we expected them to look.
• Breaking up topics into facets makes it easier to find information, especially information related to commodities.
Analysis of task-based card sorting (3)
Test Questions | % Correct | % Agree
1. Cotton | 91% | 82%
2. Mad cow | 73% | 64%
3. Farm income | 100% | 55%
4. Fast food | 91% | 73%
5. WIC | 100% | 100%
6. GE corn | 100% | 100%
7. Foodborne illness | 82% | 82%
8. Food costs | 55% | 27%
9. Tobacco | 100% | 100%
10. Small farms | 91% | 91%
11. Traceability | 36% | 18%
12. Hunger | 100% | 73%
13. Trade balance | 36% | 64%
14. Conservation | 91% | 91%
15. Trade restrictions | 55% | 36%

Legend: Possible change required. Change required.

Notes:
• 11. Traceability: Policy of “Traceability” needs to be clarified. Use quasi-synonyms.
• 8. Food costs and 15. Trade restrictions: On these trials, only 50% looked in the right category, & only 27-36% agreed on the category.
• 13. Trade balance: Possible error in categorization of this question because 64% thought the answer should be “Commodity Trade.”
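The two percentages in this analysis can be computed directly from the recorded tester choices. The sketch below assumes that "% Agree" means the share of testers who picked the single most popular category, which the slides do not define explicitly; the function name `score_task` and the expected answer used in the example are hypothetical.

```python
from collections import Counter


def score_task(choices, expected):
    """Score one find-it task.

    choices:  the category each tester picked
    expected: the answer the team agreed on in advance
    Returns (pct_correct, pct_agree, modal_category).
    """
    if not choices:
        raise ValueError("choices must not be empty")
    counts = Counter(choices)
    modal_category, modal_votes = counts.most_common(1)[0]
    pct_correct = 100.0 * counts[expected] / len(choices)
    # Assumed definition: agreement = share of testers choosing the modal category.
    pct_agree = 100.0 * modal_votes / len(choices)
    return pct_correct, pct_agree, modal_category


# Illustrative data only: five testers, with a hypothetical expected answer.
trade_balance = ["Commodity Trade", "Trade & Intl Markets", "Commodity Trade",
                 "Market Analysis", "Commodity Trade"]
print(score_task(trade_balance, expected="Trade & Intl Markets"))
```

A task where agreement is much higher than correctness, as in this example, suggests that the pre-agreed answer rather than the testers may be in the wrong place.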
User satisfaction method: Card Sort Questionnaire (1)
• Was it easy, medium or difficult to choose the appropriate Topic?
  – Easy
  – Medium
  – Difficult
• Was it easy, medium or difficult to choose the appropriate Commodity?
  – Easy
  – Medium
  – Difficult
• Was it easy, medium or difficult to choose the appropriate Geographic Coverage?
  – Easy
  – Medium
  – Difficult
User satisfaction method: Card Sort Questionnaire (2)
[Chart: difficulty rating by Facet (Topic, Commodity, Geography). Axis labeled from Easy to Difficult, with ticks from 0.50 to 2.00.]
User interface survey: Which search UI is ‘better’?
• Criteria
  – User satisfaction
  – Success completing tasks
  – Confidence in results
  – Fewer dead ends
• Methodology
  – Design tasks from specific to general
  – Time performance
  – Calculate success rates
  – Survey subjective criteria
  – Pay attention to survey hygiene: participant selection, counterbalancing, T-scores
Source: Yee, Swearingen, Li, & Hearst
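Counterbalancing means varying the order in which participants see the interfaces, so that learning effects do not favor whichever interface is shown second. A minimal sketch, assuming full counterbalancing over every possible order; the function name `counterbalance` and the participant labels are hypothetical, and this is not necessarily the scheme used in the cited study.

```python
from itertools import permutations


def counterbalance(participants, conditions):
    """Assign each participant an order of conditions, cycling through all orders."""
    orders = list(permutations(conditions))
    return {person: orders[index % len(orders)]
            for index, person in enumerate(participants)}


schedule = counterbalance(["P1", "P2", "P3", "P4"], ["Baseline", "Faceted"])
for person, order in schedule.items():
    print(person, "sees", " then ".join(order))
```

With two interfaces this alternates the order from one participant to the next; with more conditions the number of orders grows quickly and a partial design may be more practical.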
User interface survey: Results (1)
Which Interface would you rather use for these tasks?

Task | Google-like Baseline | Faceted Category
Find images of roses | 15 | 16
Find all works from a certain period | 2 | 30
Find pictures by 2 artists in the same media | 1 | 29
…

Overall assessment | Google-like Baseline | Faceted Category
More useful for your usual tasks | 4 | 28
Easiest to use | 8 | 23
Most flexible | 6 | 24
More likely to result in dead-ends | 28 | 3
Helped you learn more | 1 | 31
Overall preference | 2 | 29
…

Source: Yee, Swearingen, Li, & Hearst
User interface survey: Results (2)
[Bar chart: ratings (scale 0 to 9) of the Google-like Baseline and Faceted Category interfaces on eight attributes: Easy to Use, Simple, Flexible, Tedious, Interesting, Easy to Browse, Enjoyable, Overwhelming.]
Source: Yee, Swearingen, Li, & Hearst
Tagging samples: How many items?

Goal | Number of Items | Criteria
Illustrate metadata schema | 1-3 | Random (excluding junk)
Develop training documentation | 10-20 | Show typical & unusual cases
Qualitative test of small vocabulary (<100 categories) | 25-50 | Random (excluding junk)
Quantitative test of vocabularies | 3-10X number of categories | Use computer-assisted methods when more than 10-20 categories. Preexisting metadata is the most meaningful.

WARNING: Quantitative methods require large amounts of tagged content. This leads to having specialists, or software, do the tagging. The results may be very different from how users would categorize.
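The "3-10X number of categories" guideline and the "random (excluding junk)" criterion can be sketched as follows. The function names, the `is_junk` callback, and the fixed random seed are illustrative assumptions; what counts as junk is a judgment the team has to make.

```python
import random


def quantitative_sample_range(n_categories):
    """Return the (minimum, maximum) sample size for a quantitative test (3-10X rule)."""
    return 3 * n_categories, 10 * n_categories


def draw_sample(items, size, is_junk, seed=0):
    """Draw a random sample of up to `size` items after excluding junk."""
    candidates = [item for item in items if not is_junk(item)]
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(candidates, min(size, len(candidates)))


low, high = quantitative_sample_range(179)
print(f"A 179-term taxonomy calls for roughly {low} to {high} tagged items")
```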
Tagging samples: Manually tagged metadata sample

Attribute | Values
Title | Jupiter’s Ring System
URL | http://ringmaster.arc.nasa.gov/jupiter/
Description | Overview of the Jupiter ring system. Many images, animations and references are included for both the scientist and the public.
Content Types | Web Sites; Animations; Images; Reference Sources
Audiences | Educators; Students
Organizations | Ames Research Center
Missions & Projects | Voyager; Galileo; Cassini; Hubble Space Telescope
Locations | Jupiter
Business Functions | Scientific and Technical Information
Disciplines | Planetary and Lunar Science
Time Period | 1979-1999
Tagging samples: Spreadsheet for tagging 10s-100s of items
1) Clickable URLs for sample content
2) Review small sample and describe
3) Drop-down for tagging (including ‘Other’ entry for the unexpected)
4) Flag questions
Rough Bulk Tagging: Facet Demo (1)
• Collections: 4 content sources
  – NTRS, SIRTF, Webb, Lessons Learned
• Taxonomy
  – Converted MultiTes format into RDF for Seamark
• Metadata
  – Converted from existing metadata on web pages, or
  – Created using simple automatic classifier (string matching with terms & synonyms)
• 250k items, ~12 metadata fields, 1.5 weeks effort
• OOTB Seamark user interface, plus logo
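A simple automatic classifier based on string matching with terms and synonyms might look like the sketch below. This is not the code used for the demo; the function names and the synonym lists are hypothetical, and matching is done on whole words, ignoring case.

```python
import re


def build_matcher(taxonomy):
    """taxonomy maps each preferred term to a list of synonyms.

    Returns a list of (compiled pattern, preferred term) pairs.
    """
    patterns = []
    for term, synonyms in taxonomy.items():
        for phrase in [term, *synonyms]:
            pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
            patterns.append((pattern, term))
    return patterns


def tag_text(text, patterns):
    """Return the sorted preferred terms whose label or synonym occurs in the text."""
    return sorted({term for pattern, term in patterns if pattern.search(text)})


# Hypothetical taxonomy fragment; the synonyms are illustrative only.
taxonomy = {"Hubble Space Telescope": ["HST"], "Jupiter": ["Jovian"]}
patterns = build_matcher(taxonomy)
print(tag_text("New HST images of the Jovian rings", patterns))
```

String matching of this kind is fast enough for rough bulk tagging, but it cannot resolve ambiguous terms, so the results are best treated as a first pass.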
Rough Bulk Tagging: OOTB Facet Demo (2)
Agenda
• Qualitative methods
• Quantitative methods
Quantitative Method: How evenly does it divide the content?

Background:
• Documents will not distribute uniformly across categories
• Zipf (1/x) distribution is expected behavior
• 80/20 rule in action (actually 70/20 rule)

Methodology:
• Part of alpha test of ‘content type’ for corporate intranet
• 115 URLs selected at random from search index were manually categorized. Inaccessible files and ‘junk’ were removed.

Results:
• Results were slightly more uniform than the Zipf distribution, which is better than expected

[Chart: Measured and Expected Distribution of Top 10 Content Types in Library of Congress Database. Y-axis: Number of Records (0 to 350,000). X-axis: Top 10 Content Types; legible labels include Periodicals, Biography, Congresses, Maps, Fiction, Exhibitions, Juvenile literature, Statistics, and Bibliography.]

[Chart: Measured and Expected Distribution of Content Types in an Intranet. Y-axis: # Documents (0 to 25). X-axis: Content Type, with categories Programs, Proposals, Plans & Schedules; Other & Unclassified; Papers & Presentations; Regulations, Policies, Procedures & (label truncated); Marketing & Sales; News & Events; Manuals & Learning Materials; Operations & Internal Communications; People, Groups & Places. Series: Measured, Expected.]

Chart callouts: “Leading candidates for merging”; “Leading candidate for splitting”; “Above the curve is better than expected”.

Method warns you if something is strange. Seeing expected behavior does not mean the taxonomy is good.
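The expected Zipf (1/x) curve can be computed from the total number of tagged items and the number of categories, then set beside the measured counts. A minimal sketch, assuming the category of rank k is expected to receive a share proportional to 1/k; the function names and the counts in the example are hypothetical.

```python
def zipf_expected(total, n_categories):
    """Expected count per rank when shares are proportional to 1/rank."""
    harmonic = sum(1.0 / rank for rank in range(1, n_categories + 1))
    return [total / (rank * harmonic) for rank in range(1, n_categories + 1)]


def compare_to_zipf(counts):
    """Return (category, measured, expected) rows, largest category first."""
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    total = sum(count for _, count in ranked)
    expected = zipf_expected(total, len(ranked))
    return [(name, measured, exp) for (name, measured), exp in zip(ranked, expected)]


# Hypothetical counts for illustration only.
counts = {"Papers & Presentations": 22, "News & Events": 14,
          "Marketing & Sales": 9, "Other & Unclassified": 3}
for name, measured, expected in compare_to_zipf(counts):
    print(f"{name}: measured {measured}, expected {expected:.1f}")
```

Categories far above the expected curve are candidates for splitting, and those far below it are candidates for merging.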
Quantitative Method: How intuitive (repeatable) are the categorizations?

Methodology: Closed Card Sort
• For alpha test of a grocery site
• 15 Testers put each of 71 best-selling product types into one of 10 predefined categories
• Categories where fewer than 14 of 15 testers put product into same category were flagged

“Cocoa Drinks – Powder” is best categorized in both “Beverages” and “Grocery”.

Results:

% of Testers | Cumulative % of Products | With PolyHierarchy
15/15 | 54% | 69%
14/15 | 70% | 83%
13/15 | 77% | 93%
12/15 | 83% | 100%
11/15 | 85% | 100%
<11/15 | 100% | 100%

How to improve? Allow products in multiple categories. (Results are for minimum size = 4 votes)
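A cumulative agreement table of this kind can be derived from the raw card sort data. The sketch below is one interpretation, not the authors' procedure: without polyhierarchy, a product's agreement is the vote count of its most popular category; with polyhierarchy, every category that receives at least `min_votes` votes is assumed to be an accepted home for the product. The function names and the example data are hypothetical.

```python
from collections import Counter


def agreed_votes(choices, min_votes=None):
    """Number of testers who agree on where a product belongs."""
    counts = Counter(choices)
    top = counts.most_common(1)[0][1]
    if min_votes is None:
        return top
    # Assumed polyhierarchy rule: all categories with >= min_votes votes are accepted.
    accepted = sum(votes for votes in counts.values() if votes >= min_votes)
    return max(top, accepted)


def cumulative_agreement(sorts, levels, min_votes=None):
    """Percent of products whose agreement reaches each level (number of testers)."""
    votes = [agreed_votes(choices, min_votes) for choices in sorts.values()]
    return {level: 100.0 * sum(v >= level for v in votes) / len(votes)
            for level in levels}


# Illustrative data: five testers, two products.
sorts = {"Orange Juice": ["Beverages"] * 5,
         "Cocoa Drinks": ["Beverages"] * 3 + ["Grocery"] * 2}
print(cumulative_agreement(sorts, levels=[5, 4, 3]))
print(cumulative_agreement(sorts, levels=[5, 4, 3], min_votes=2))
```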
Quantitative Method: How does taxonomy “shape” match that of content?

Background:
• Hierarchical taxonomies allow comparison of “fit” between content and taxonomy areas

Methodology:
• 25,380 resources tagged with taxonomy of 179 terms. (Avg. of 2 terms per resource)
• Counts of terms and documents summed within taxonomy hierarchy

Results:
• Roughly Zipf distributed (top 20 terms: 79%; top 30 terms: 87%)
• Mismatches between term% and document% flagged

Term Group | % Terms | % Docs
Administrators | 7.8 | 15.8
Community Groups | 2.8 | 1.8
Counselors | 3.4 | 1.4
Federal Funds Recipients and Applicants | 9.5 | 34.4
Librarians | 2.8 | 1.1
News Media | 0.6 | 3.1
Other | 7.3 | 2.0
Parents and Families | 2.8 | 6.0
Policymakers | 4.5 | 11.5
Researchers | 2.2 | 3.6
School Support Staff | 2.2 | 0.2
Student Financial Aid Providers | 1.7 | 0.7
Students | 27.4 | 7.0
Teachers | 25.1 | 11.4

Source: Courtesy Keith Stubbs, U.S. Dept. of Ed.
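Flagging mismatches between term% and document% can be done with a simple ratio test. A minimal sketch, using four rows from the table above; the function name `flag_mismatches` and the factor-of-two threshold are assumptions, since the slides do not state how mismatches were flagged.

```python
def flag_mismatches(groups, ratio=2.0):
    """Flag term groups whose share of documents and share of terms differ widely.

    groups maps a term group name to (pct_terms, pct_docs).
    The default ratio of 2.0 is an assumed threshold, not taken from the source.
    """
    flagged = {}
    for name, (pct_terms, pct_docs) in groups.items():
        if pct_docs >= ratio * pct_terms:
            flagged[name] = "more content than terms"
        elif pct_docs * ratio <= pct_terms:
            flagged[name] = "more terms than content"
    return flagged


groups = {
    "Federal Funds Recipients and Applicants": (9.5, 34.4),
    "Researchers": (2.2, 3.6),
    "Students": (27.4, 7.0),
    "Teachers": (25.1, 11.4),
}
for name, reason in flag_mismatches(groups).items():
    print(f"{name}: {reason}")
```

A group with far more content than terms may need finer subdivision, while one with far more terms than content may be over-built for the material that actually exists.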
Pop Quiz
What is the #1 underused source of quantitative
information on how to improve your taxonomy?
Query Logs & Click Trails: Who are the users & what are they looking for?

Query Log & Click Trail Examination
• Only 30-40% of organizations regularly examine their logs*.
• Sophisticated software available, but don’t wait.
• 80% of value comes from basic reports

Governance Foreshadowing
• Start a “Measure & Improve” mindset
• Taxonomy changes do not stand alone
  – Search system improvements
  – Navigation improvements
  – Content improvements
  – Process improvements
  – …

UltraSeek Reporting
• Top queries
• Queries with no results
• Queries with no click-through
• Most requested documents
• Query trend analysis
• Complete server usage summary

Click Trail Packages
• iWebTrack
• NetTracker
• OptimalIQ
• SiteCatalyst
• Visitorville
• WebTrends

* Source: Daniel, ESS’05
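The basic reports do not require special software. The sketch below assumes the search log has already been parsed into (query, result count, clicked) tuples; that record layout and the function name `basic_reports` are hypothetical, and real log formats vary by search engine.

```python
from collections import Counter


def basic_reports(log, top_n=10):
    """Build three basic reports from (query, result_count, clicked) records."""
    # Normalize case and whitespace so trivially different queries are counted together.
    records = [(query.strip().lower(), results, clicked)
               for query, results, clicked in log]
    return {
        "top_queries": Counter(q for q, _, _ in records).most_common(top_n),
        "no_results": Counter(q for q, n, _ in records if n == 0).most_common(top_n),
        "no_click_through": Counter(
            q for q, n, clicked in records if n > 0 and not clicked
        ).most_common(top_n),
    }


# Illustrative log entries only.
log = [("Cotton", 12, True), ("cotton ", 12, False),
       ("mad cow", 0, False), ("WIC", 5, False)]
for name, rows in basic_reports(log).items():
    print(name, rows)
```

Queries with no results often point to missing synonyms in the taxonomy, and queries with results but no click-through can indicate poorly labeled or poorly ranked content.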
Taxonomy Strategies LLC
Questions
Joseph A. Busch
jbusch@taxonomystrategies.com
Ron Daniel, Jr.
rdaniel@taxonomystrategies.com
http://www.taxonomystrategies.com
Bibliography
K. Yee, K. Swearingen, K. Li, M. Hearst. "Searching and organizing: Faceted metadata for image search and browsing." Proceedings of the Conference on Human Factors in Computing Systems (April 2003). http://bailando.sims.berkeley.edu/papers/flamenco-chi03.pdf
R. Daniel and J. Busch. "Benchmarking Your Search Function: A Maturity Model." http://www.taxonomystrategies.com/presentations/maturity-200505-17%28as-presented%29.ppt