Taxonomy Development Workshop

Taxonomy Development
An Infrastructure Model
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
 Introduction
 Types of Taxonomies
 The Enterprise Context
– Making the Business Case
 Infrastructure Model of Taxonomy Development
– Taxonomy in 4 Contexts
• Content, People, Processes, Technology
 Infrastructure Solutions – the Elements
 Applying the Model – Practical Dimension
– Starting and Resources
 Conclusion
KAPS Group
 Knowledge Architecture Professional Services (KAPS)
 Consulting, strategy recommendations
 Knowledge architecture audits
 Partners – Convera, Inxight, FAST, and others
 Taxonomies: Enterprise, Marketing, Insurance, etc.
– Taxonomy customization
 Intellectual infrastructure for organizations
– Knowledge organization, technology, people and processes
– Search, content management, portals, collaboration, knowledge management, e-learning, etc.
Two Types of Taxonomies: Browse and Formal
Browse Taxonomy – Yahoo
Two Types of Taxonomies: Formal
Browse Taxonomies: Strengths and Weaknesses
 Strengths: Browse is better than search
– Context and discovery
– Browse by task, type, etc.
 Weaknesses:
– Mix of organization
• Catalogs, alphabetical listings, inventories
• Subject matter, functional, publisher, document type
– Vocabulary and nomenclature issues
– Problems with maintenance, new material
– Poor granularity and little relationship between parts
• Web site unit of organization
– No foundation for standards
Formal Taxonomies: Strengths and Weaknesses
 Strengths:
– Fixed Resource – little or no maintenance
– Communication Platform – share ideas, standards
– Infrastructure Resource
• Controlled vocabulary and keywords
• More depth, finer granularity
 Weaknesses:
– Difficult to develop and customize
– Don’t reflect users’ perspectives
• Users have to adapt to language
Facets and Dynamic Classification
 Facets are not categories
– Entities or concepts belong to a category
– Entities have facets
 Facets are metadata - properties or attributes
– Entities or concepts fit into one category
– All entities have all facets – defined by set of values
 Facets are orthogonal – mutually exclusive – dimensions
– An event is not a person is not a document is not a place.
 Facets – variety – of units, of structure
– Date or price – numerical range
– Location – big to small (partonomy)
– Winery – alphabetical
– Hierarchical - taxonomic
Faceted Navigation: Strengths and Weaknesses
 Strengths:
– More intuitive – easy to guess what is behind each door
• 20 questions – we know and use
– Dynamic selection of categories
• Allow multiple perspectives
– Trick users into “using” Advanced Search
• wine where color = red, price = x-y, etc.
 Weaknesses:
– Difficulty of expressing complex relationships
• Simplicity of internal organization
– Loss of Browse Context
• Difficult to grasp scope and relationships
– Limited Domain Applicability – type and size
• Entities not concepts, documents, web sites
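The "wine where color = red, price = x-y" idea can be sketched in a few lines of Python. The wine records, facet names, and helper functions below are hypothetical and made up for illustration. This is only a minimal sketch of selecting by facet values and counting what remains. It does not describe how any particular product implements faceted navigation.

```python
from collections import Counter

# Hypothetical catalog: every entity carries a value for every facet.
WINES = [
    {"name": "Wine A", "color": "red", "region": "Napa", "price": 18.0},
    {"name": "Wine B", "color": "red", "region": "Sonoma", "price": 42.0},
    {"name": "Wine C", "color": "white", "region": "Napa", "price": 15.0},
    {"name": "Wine D", "color": "red", "region": "Napa", "price": 27.0},
]


def select(items, color=None, region=None, price_range=None):
    """Keep items matching every facet value the user has picked so far."""
    result = []
    for item in items:
        if color is not None and item["color"] != color:
            continue
        if region is not None and item["region"] != region:
            continue
        if price_range is not None:
            low, high = price_range
            if not (low <= item["price"] <= high):
                continue
        result.append(item)
    return result


def facet_counts(items, facet):
    """Count the remaining items per facet value, to label the next choices."""
    return Counter(item[facet] for item in items)


reds = select(WINES, color="red", price_range=(10.0, 30.0))
print([wine["name"] for wine in reds])
print(dict(facet_counts(reds, "region")))
```

Each facet narrows the set independently, so the user effectively builds an advanced search one choice at a time, and the counts show what is behind each remaining door.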
Dynamic Classification / Faceted navigation
 Search and browse better than either alone
– Categorized search – context
– Browse as an advanced search
 Dynamic search and browse is best
– Can’t predict all the ways people think
• Advanced cognitive differences
• Panda, Monkey, Banana
– Can’t predict all the questions and activities
• Intersections of what users are looking for and what documents are often about
• China and Biotech
• Economics and Regulatory
Business Case for Taxonomies:
The Right Context
 Traditional Metrics
– Time Savings – 22 minutes per user per day = $1Mil a Year
– Apply to your organization – customer service, content creation, knowledge industry
– Cost of not-finding = re-creating content
 Research
– Advantages of Browsing – Marti Hearst, Chen and Dumais
– Nielsen – “Poor classification costs a 10,000 user organization $10M each year – about $1,000 per employee.”
 Stories
– Pain points, success and failure – in your corporate language
Business Case for Taxonomies:
IDC White Paper
 Information Tasks
– Email – 14.5 hours a week
– Create documents – 13.3 hours a week
– Search – 9.5 hours a week
– Gather information for documents – 8.3 hours a week
– Find and organize documents – 6.8 hours a week
 Gartner: “Business spend an estimated $750 Billion annually
seeking information necessary to do their job. 30-40% of a
knowledge worker’s time is spent managing documents.”
Business Case for Taxonomies:
IDC White Paper
 Time Wasted
– Reformat information - $5.7 million per 1,000 per year (400M)
– Not finding information - $5.3 million per 1,000 (370M)
– Recreating content - $4.5 million per 1,000 (315M)
 Small Percent Gain = large savings
– 1% - $10 million
– 5% - $50 million
– 10% - $100 million
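The arithmetic behind these figures can be checked with a short sketch. The per-1,000 costs come from the slide. The headcount of 70,000 is an assumption, inferred only because it reproduces the parenthetical totals (5.7 x 70 is about 400); it is not stated in the source. The function name is hypothetical.

```python
# Annual cost per 1,000 people, in millions of dollars (from the slide).
COST_PER_1000_MUSD = {
    "reformat information": 5.7,
    "not finding information": 5.3,
    "recreating content": 4.5,
}

# Assumption: inferred from the slide's totals, not stated in the source.
ASSUMED_HEADCOUNT = 70_000


def scaled_costs(headcount):
    """Scale each per-1,000 cost to an organization of `headcount` people."""
    factor = headcount / 1000
    return {task: cost * factor for task, cost in COST_PER_1000_MUSD.items()}


costs = scaled_costs(ASSUMED_HEADCOUNT)
total = sum(costs.values())

for task, cost in costs.items():
    print(f"{task}: about ${cost:.0f}M a year")
print(f"total: about ${total:.0f}M a year")

# Savings from a small percentage improvement across all three tasks.
for pct in (1, 5, 10):
    print(f"{pct}% gain: about ${total * pct / 100:.0f}M")
```

Under that assumed headcount the three costs come to roughly $399M, $371M, and $315M, about $1,085M in total, so a 1% gain is close to $11M. The slide's $10 million, $50 million, and $100 million appear to be the same figures rounded. Substituting your own headcount is the "apply to your organization" step.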
Business Case for Taxonomies:
The Right Context
 Justification
– Search Engine - $500K-$2Mil
– Content Management - $500K-$2Mil
– Portal - $500K-$2Mil
– Plus maintenance and employee costs
 Taxonomy
– Small comparative cost
– Needed to get full value from all the above
 ROI – asking the wrong question
– What is ROI for having an HR department?
– What is ROI for organizing your company?
Infrastructure Model of Taxonomy Development
Taxonomy in 4 Basic Contexts
 Ideas – Content Structure
– Language and Mind of your organization
– Applications - exchange meaning, not data
 People – Company Structure
– Communities, Users, Central Team
 Activities – Business processes and procedures
– Central team - establish standards, facilitate
 Technology / Things
– CMS, Search, portals, taxonomy tools
– Applications – BI, CI, Text Mining
Taxonomy in Context
Structuring Content
 All kinds of content and Content Structures
– Structured and unstructured, Internet and desktop
 Metadata standards – Dublin core+
– Keywords - poor performance
– Need controlled vocabulary, taxonomies, semantic network
 Other Metadata
– Document Type
• Form, policy, how-to, etc.
– Audience
• Role, function, expertise, information behaviors
– Best bets metadata
 Facets – entities and ideas
– Wine.com
Taxonomy in Context:
Structuring People
 Individual People
– Tacit knowledge, information behaviors
– Advanced personalization – category priority
• Sales: Forms → New Account Form
• Accountant: New Accounts → Forms
 Communities
– Variety of types – map of formal and informal
– Variety of subject matter – vaccines, research, scuba
– Variety of communication channels and information behaviors
– Community-specific vocabularies, need for inter-community communication (Cortical organization model)
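The "category priority" idea in the Sales and Accountant example can be sketched as below. It is a minimal sketch under stated assumptions: the facet names (`document_type`, `topic`), the role table, and the `browse_path` helper are hypothetical and only illustrate how one document could be reached by different paths depending on who is browsing.

```python
# Hypothetical facet values for a single document.
DOC = {
    "title": "New Account Form",
    "document_type": "Forms",
    "topic": "New Accounts",
}

# Assumption: each role ranks the facets by how it tends to look for things.
ROLE_FACET_PRIORITY = {
    "sales": ["document_type", "topic"],
    "accountant": ["topic", "document_type"],
}


def browse_path(doc, role):
    """Return the category path to `doc`, ordered by the role's facet priority."""
    path = [doc[facet] for facet in ROLE_FACET_PRIORITY[role]]
    return path + [doc["title"]]


for role in ("sales", "accountant"):
    print(role, "->", " > ".join(browse_path(DOC, role)))
```

The document and its metadata stay the same. Only the order in which the facets are presented changes with the role.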
Taxonomy in Context:
Structuring Processes and Technology
 Technology: infrastructure and applications
– Enterprise platforms: from creation to retrieval to application
– Taxonomy as the computer network
• Applications – integrated meaning, not just data
 Creation – content management, innovation, communities of
practice (CoPs)
– When, who, how, and how much structure to add
– Workflow with meaning, distributed subject matter experts (SMEs) and centralized teams
 Retrieval – standalone and embedded in applications and
business processes
– Portals, collaboration, text mining, business intelligence, CRM
Taxonomy in Context:
The Integrating Infrastructure
 Starting point: knowledge architecture audit, K-Map
– Social network analysis, information behaviors
 People – knowledge architecture team
– Infrastructure activities – taxonomies, analytics, best bets
– Facilitation – knowledge transfer, partner with SMEs
 “Taxonomies” of content, people, and activities
– Dynamic Dimension – complexity not chaos
– Analytics based on concepts, information behaviors
 Taxonomy as part of a foundation, not a project
– In an Infrastructure Context
Taxonomy in Context:
The Integrating Infrastructure
 Integrated Enterprise requires both an infrastructure team and
distributed expertise.
– Software and SMEs are not the answer - keywords
 Taxonomies do not stand alone
– Metadata, controlled vocabularies, synonyms, etc.
– Variety of taxonomies, plus categorization, classification, etc.
• Important to know the differences, when to use which
 Multiple Applications
– Search, browse, content management, portals, BI & CI, etc.
 Infrastructure as Operating System
– Word vs. WordPerfect
– Instead of sharing clipboard, share information and knowledge.
Infrastructure Solutions: The start and foundation
Knowledge Architecture Audit
 Knowledge Map - Understand what you have, what you
are, what you want
– The foundation of the foundation
 Contextual interviews, content analysis, surveys, focus
groups, ethnographic studies
 Category modeling – “Intertwingledness” – learning new categories is influenced by other, related categories
 Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness
 Living, breathing, evolving foundation is the goal
Infrastructure Solutions: Resources
People and Processes: Roles and Functions
 Knowledge Architect and learning object designers
 Knowledge engineers and cognitive anthropologists
 Knowledge facilitators and trainers and librarians
 Part Time
– Librarians and information architects
– Corporate communication editors and writers
 Partners
– IT, web developers, applications programmers
– Business analysts and project managers
Infrastructure Solutions: Resources
People and Processes: Central Team
 Central Team supported by software and offering services
– Creating, acquiring, evaluating taxonomies, metadata standards, vocabularies
– Input into technology decisions and design – content management, portals, search
– Socializing the benefits of metadata, creating a content culture
– Evaluating metadata quality, facilitating author metadata
– Analyzing the results of using metadata, how communities are using it
– Research metadata theory, user centric metadata
– Design content value structure – more nuanced than good / poor content.
Infrastructure Solutions: Resources
People and Processes: Facilitating Knowledge Transfer
 Need for Facilitators
– Amazon hiring humans to refine recommendations
– Google – humans answering queries
 Facilitate projects, KM project teams
– Facilitate knowledge capture in meetings, best practices
 Answering online questions, facilitating online discussions,
networking within a community
 Design and run KM forums, education and innovation fairs
 Work with content experts to develop training, incorporate
intelligence into applications
 Support innovation, knowledge creation in communities
Infrastructure Solutions: Resources
People and Processes: Location of Team
 KM/KA Dept. – Cross Organizational, Interdisciplinary
 Balance of dedicated and virtual, partners
– Library, Training, IT, HR, Corporate Communication
 Balance of central and distributed
 Industry variation
– Pharmaceutical – dedicated department, major place in the organization
– Insurance – Small central group with partners
– Beans – a librarian and part time functions
 Which design – knowledge architecture audit
Infrastructure Solutions: Resources
Technology
 Taxonomy Management
– Text and Visualization
 Entity and Fact Extraction
 Text Mining
 Search for professionals
– Different needs, different interfaces
 Integration Platform technology
– Enterprise Content Management
Taxonomy Development: Tips and Techniques
Stage One – How to Begin
 Step One: Strategic Questions – why, what value from the
taxonomy, how are you going to use it
– Variety of taxonomies – important to know the differences, when to use what.
 Step Two: Get a good taxonomist! (or learn)
– Library Science + Cognitive Science + Cognitive Anthropology
 Step Three: Software Shopping
– Automatic Software – Fun Diversion for a rainy day
• Uneven hierarchy, strange node names, weird clusters
– Taxonomy Management, Entity Extraction, Visualization
 Step Four: Get a good taxonomy!
– Glossary, Index, Pull from multiple sources
– Get a good document collection
Infrastructure Solutions: Taxonomy Development
Stage Two: Taxonomy Model
 Enterprise Taxonomy
– No single subject matter taxonomy
– Need an ontology of facets or domains
 Standards and Customization
– Balance of corporate communication and departmental specifics
– At what level are differences represented?
– Customize pre-defined taxonomy – additional structure, add
synonyms and acronyms and vocabulary
 Enterprise Facet Model:
– Actors, Events, Functions, Locations, Objects, Information Resources
– Combine and map to subject domains
Taxonomy Development: Tips and Techniques
Stage Three: Development and/or Customization
 Combination of top down and bottom up (and Essences)
– Top: Design an ontology, facet selection
– Bottom: Vocabulary extraction – documents, search logs, interview authors and users
– Develop essential examples (Prototypes)
• Most Intuitive Level – genus (oak, maple, rabbit)
• Quintessential Chair – all the essential characteristics, no more
– Work toward the prototype and out and up and down
– Repeat until dizzy or done
 Map the taxonomy to communities and activities
– Category differences
– Vocabulary differences
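The bottom-up step, vocabulary extraction from documents and search logs, can be sketched as a simple frequency count. Everything below is an illustrative assumption: the sample texts, the tiny stop list, and the `candidate_terms` helper are hypothetical, and counting words and word pairs is only one crude way to surface candidate terms for a taxonomist to review. It is not a substitute for the interviews mentioned in the slide.

```python
import re
from collections import Counter

# Hypothetical inputs standing in for a document collection and a search log.
DOCUMENTS = [
    "Submit the new account form to open a new account.",
    "The expense report policy covers travel and the expense report form.",
]
SEARCH_LOG = ["new account form", "expense report", "travel policy"]

# A deliberately tiny stop list, for illustration only.
STOP_WORDS = {"a", "and", "the", "to"}


def tokens(text):
    """Lowercase words with stop words removed."""
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]


def candidate_terms(texts):
    """Count single words and neighboring word pairs (after stop-word removal)."""
    counts = Counter()
    for text in texts:
        words = tokens(text)
        counts.update(words)
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts


counts = candidate_terms(DOCUMENTS + SEARCH_LOG)
for term, n in counts.most_common(8):
    print(n, term)
```

Terms that appear in both the documents and the search log, such as "new account" here, are reasonable first candidates because they reflect both what the content is about and what users ask for.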
Taxonomy Development: Tips and Techniques
Stage Four: Evaluate and Refine
 Formal Evaluation
– Quality of corpus – size, homogeneity, representative
– Breadth of coverage – main ideas, outlier ideas (see next)
– Structure – balance of depth and width
– Kill the verbs
– Evaluate speciation steps – understandable and systematic
• Person – Unwelcome person – Unpleasant person - Selfish person
– Avoid binary levels, duplication of contrasts
– Primary and secondary education, public and private
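Some of the structural checks above (balance of depth and width, binary levels, duplicated contrasts) can be approximated mechanically. The sketch below assumes a taxonomy stored as nested dictionaries. The sample tree and all function names are hypothetical, and the results are only prompts for a human reviewer, not a verdict on quality.

```python
from collections import Counter

# Hypothetical taxonomy fragment: label -> children.
TAXONOMY = {
    "Education": {
        "Primary": {"Public": {}, "Private": {}},
        "Secondary": {"Public": {}, "Private": {}},
    },
    "Research": {"Vaccines": {}, "Clinical trials": {}, "Biotech": {}},
}


def depth(tree):
    """Number of levels in the tree (0 for an empty set of children)."""
    return 1 + max(depth(child) for child in tree.values()) if tree else 0


def max_width(tree):
    """Largest number of children under any single node (or at the top)."""
    return max([len(tree)] + [max_width(child) for child in tree.values()])


def binary_nodes(tree, path=()):
    """Paths of nodes with exactly two children, a possible forced contrast."""
    found = []
    for label, children in tree.items():
        here = path + (label,)
        if len(children) == 2:
            found.append(" > ".join(here))
        found.extend(binary_nodes(children, here))
    return found


def repeated_labels(tree):
    """Labels used in more than one place, a hint of duplicated contrasts."""
    counts = Counter()

    def walk(subtree):
        for label, children in subtree.items():
            counts[label] += 1
            walk(children)

    walk(tree)
    return sorted(label for label, n in counts.items() if n > 1)


print("depth:", depth(TAXONOMY), "max width:", max_width(TAXONOMY))
print("binary nodes:", binary_nodes(TAXONOMY))
print("repeated labels:", repeated_labels(TAXONOMY))
```

In the sample tree the public/private contrast is repeated under both primary and secondary education, which is the kind of duplication the slide warns about. A facet for funding type would be one alternative worth considering.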
Taxonomy Development: Tips and Techniques
Stage Four: Evaluate and Refine
 Practical Evaluation
– Test in real life application
– Select representative users and documents
– Test node labels with Subject Matter Experts
• Balance of making sense and jargon
– Test with representative key concepts
– Test for un-representative strange little concepts that only mean something to a few people but the people and ideas are key and are normally impossible to find
Sources
 Books
– Women, Fire, and Dangerous Things
• What Categories Reveal about the Mind
• George Lakoff
– The Geography of Thought
• Richard E. Nisbett
 Software
– Convera RetrievalWare
– Inxight Smart Discovery – entity and fact extraction
 Courses
– Convera Taxonomy Certification
Conclusion
 Taxonomy development is not just a project
– It has no beginning and no end
 Taxonomy development is not an end in itself
– It enables the accomplishment of many ends
 Taxonomy development is not just about search or browse
– It is about language, cognition, and applied intelligence
 Strategic Vision (articulated by K Map) is important
– Even for your under the radar vocabulary project
 Paying attention to theory is practical
– So is adapting your language to business speak
Conclusion
 Taxonomies are part of your intellectual infrastructure
– Roads, transportation systems not cars or types of cars
 Taxonomies are part of creating smart organizations
– Self aware, capable of learning and evolving
 Think Big, Start Small, Scale Fast
 If we really are in a knowledge economy
 We need to pay attention to –
 Knowledge!
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com