Taxonomy and Knowledge Organization
Taxonomy in Context
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
Introduction: Time for Taxonomies
Taxonomy Types: Strengths and Weaknesses
  – Formal and Browse
Taxonomy in the Organization: Intellectual Infrastructure
  – Content, People, Activities
Taxonomy Tips and Techniques
  – Development Stages
  – Issues and Ideas
Future Directions
  – Building on the Intellectual Infrastructure

KAPS Group
Knowledge Architecture Professional Services (KAPS)
Consulting, strategy recommendations
Knowledge architecture audits
Partners
  – Convera and others
  – First Convera Certified Taxonomy Developers
Taxonomies: Enterprise, Marketing, Insurance, etc.
  – Taxonomy customization
Intellectual infrastructure for organizations
  – Knowledge organization, technology, people and processes
  – Search, content management, portals, collaboration, knowledge management, e-learning, etc.

Time for Taxonomies
Taxonomy Time: Technology is not delivering
  – Professionals spend more time looking for information than using it
  – 50% of them spend > 2 hours a day looking
Search not enough
  – Text strings vs. concepts
  – Relevance isn’t very relevant
Data mining misses 80% of significant content
  – Text mining needs more structure (taxonomies)
Surveys
  – 76% say taxonomies are important
  – 90% plan on a taxonomy strategy in 24 months

Time for Taxonomies: Word of Caution
Taxonomy is not the answer
  – Is this a taxonomy?
      • Inventories, catalogs, classifications, categorization schemas, thesauri, controlled vocabularies
  – Taxonomy not enough – need other structures
      • Metadata, facets
  – Taxonomies have to be used to be useful
How to fail:
  – Taxonomy as a project
  – Taxonomy as a search engine project afterthought

Two Types of Taxonomies: Browse and Formal
Browse Taxonomy – Yahoo

Browse Taxonomies: Strengths and Weaknesses
Strengths: Browse is better than search
  – Context and discovery
  – Browse by task, type, etc.
Weaknesses:
  – Mix of organization
      • Catalogs, alphabetical listings, inventories
      • Subject matter, functional, publisher, document type
  – Vocabulary and nomenclature issues
  – Problems with maintenance, new material
  – Poor granularity and little relationship between parts
      • Web site as the unit of organization
  – No foundation for standards

Formal Taxonomies: Strengths and Weaknesses
Strengths:
  – Fixed Resource – little or no maintenance
  – Communication Platform – share ideas, standards
  – Infrastructure Resource
      • Controlled vocabulary and keywords
      • More depth, finer granularity
Weaknesses:
  – Difficult to develop and customize
  – Don’t reflect users’ perspectives
      • Users have to adapt to the taxonomy’s language

Dynamic Classification: Best of Both Worlds
Search and browse better than either alone
  – Categorized search – context
  – Browse as an advanced search
Dynamic search and browse is best
  – Can’t predict all the ways people think
      • Advanced cognitive differences
      • Panda, Monkey, Banana
  – Can’t predict all the questions and activities
      • Intersections of what users are looking for and what documents are often about
      • China and Biotech
      • Economics and Regulatory
Facet Taxonomies
  – Actors, events, functions, geography

Taxonomy in Context: Intellectual Infrastructure
Three infrastructures: technology, organizational, intellectual
  – Technology – systems and applications, servers and desktops, programmers and help desks, etc.
  – Organizational – business units and project groups, policies and procedures, administrators and facilitators
  – Intellectual – information and knowledge, vocabularies and applications, authors, editors, and librarians
Taxonomy at the nexus of the three infrastructures
Taxonomy enables communication among people, content, and technology

Taxonomy in the Organization: Project Approach or Infrastructure Approach
Situation: Problem with access to information
  – Project Approach
      • Publish everything on the intranet
      • Buy a search engine
      • Do some keyword and usability tests
      • Buy a portal (or two)
      • Buy content management software
      • Try knowledge organization – taxonomy?
  – Infrastructure Approach
      • “The path up and down is one and the same.” (Heraclitus)

Taxonomy in the Organization: Why an Infrastructure Approach?
Immanuel Kant
  – “Concepts without percepts are empty.”
  – “Percepts without concepts are blind.”
Knowledge Management (KM) / Information Projects
  – KM without applications is empty
      • Strategy only, management fad
      • Elegant taxonomies – unused
  – Applications without knowledge architecture (KA) are blind
      • IT-based KM
      • Fragmented applications

Taxonomy in the Organization: Structuring Content
All kinds of content
  – Structured and unstructured, Internet and desktop
Metadata standards
  – Dublin Core+
  – Keywords – poor performance
  – Need controlled vocabulary, taxonomies, semantic network
Document Type
  – Form, policy, how-to, etc.
  – Dynamic classification with subject matter taxonomies
Audience
  – Role, function, expertise, information behaviors
  – Consistent across subject matter and people
Best bets metadata

Taxonomy in the Organization: Structuring People
Individual People
  – Tacit knowledge, information behaviors
  – Advanced personalization – category priority
      • Sales: Forms > New Account Form
      • Accountant: New Accounts > Forms
Communities
  – Variety of types – map of formal and informal
  – Variety of subject matter – vaccines, research, scuba
  – Variety of communication channels and information behaviors
  – Community-specific vocabularies, need for inter-community communication (Cortical organization model)

Taxonomy in the Organization: Structuring Processes and Technology
Technology: infrastructure and applications
  – Enterprise platforms: from creation to retrieval to application
  – Taxonomy as the computer network
      • Applications – integrated meaning, not just data
Creation – content management, innovation, communities of practice (CoPs)
  – When, who, how, and how much structure to add
  – Workflow with meaning, distributed subject matter experts (SMEs) and centralized teams
Retrieval – standalone and embedded in applications and business processes
  – Portals, collaboration, text mining, business intelligence, CRM

Taxonomy in the Organization: The Integrating Infrastructure
Starting point: knowledge architecture audit, K-Map
  – Social network analysis, information behaviors
People – knowledge architecture team
  – Infrastructure activities – taxonomies, analytics, best bets
  – Facilitation – knowledge transfer, partner with SMEs
“Taxonomies” of content, people, and activities
  – Dynamic Dimension – complexity not chaos
  – Analytics based on concepts, information behaviors
Taxonomy is the answer
  – In an Infrastructure Context

Taxonomy Development: Tips and Techniques
Stage One – How to Begin
Step One: Strategic Questions – why, what value from the taxonomy, how are you going to use it
  – Variety of taxonomies – important to know the differences, when to use what
Step Two: Get a good taxonomist! (or learn)
  – Library Science + Cognitive Science + Cognitive Anthropology
Step Three: Software Shopping
  – Automatic Software – Fun Diversion for a rainy day
      • Uneven hierarchy, strange node names, weird clusters
  – Taxonomy Management, Entity Extraction, Visualization
Step Four: Get a good taxonomy!
  – Glossary, Index, Pull from multiple sources
  – Get a good document collection

Taxonomy Development: Tips and Techniques
Stage Two: Development and/or Customization
Combination of top down and bottom up (and Essences)
  – Top: Design an ontology, facet selection
  – Bottom: Vocabulary extraction – documents, search logs, interview authors and users
  – Develop essential examples (Prototypes)
      • Most Intuitive Level – genus (oak, maple, rabbit)
      • Quintessential Chair – all the essential characteristics, no more
  – Work toward the prototype and out and up and down
  – Repeat until dizzy or done

Taxonomy Development: Tips and Techniques
Stage Three: Evaluate and Refine
Formal Evaluation
  – Quality of corpus – size, homogeneity, representativeness
  – Breadth of coverage – main ideas, outlier ideas (see next slide)
  – Structure – balance of depth and width
  – Kill the verbs
  – Evaluate speciation steps – understandable and systematic
      • Person > Unwelcome person > Unpleasant person > Selfish person
  – Avoid binary levels, duplication of contrasts
  – Primary and secondary education, public and private

Taxonomy Development: Tips and Techniques
Stage Three: Evaluate and Refine
Practical Evaluation
  – Test in real-life applications
  – Select representative users and documents
  – Test node labels with Subject Matter Experts
      • Balance of making sense and jargon
  – Test with representative key concepts
  – Test for unrepresentative, strange little concepts that mean something to only a few people, but whose people and ideas are key and normally impossible to find

Taxonomy Development: Tips and Techniques
Issues and Ideas
Complex Topics – intersection of subject domains and facets
  – What documents are often about is the intersection
  – Example – China and Biotech
Standards and Customization
  – Balance of corporate communication and departmental specifics
  – At what level are differences represented?
  – Customize pre-defined taxonomy – additional structure, add synonyms and acronyms and vocabulary

Taxonomy Development: Tips and Techniques
Issues and Ideas
Enterprise Taxonomy
  – No single subject matter taxonomy
  – Need an ontology of facets or domains
Enterprise Facet Model:
  – Actors, Events, Functions, Locations, Objects, Information Resources
  – Combine and map to subject domains

Future Directions: Knowledge Organization
New analytic methods
  – Cognitive anthropology, history of ideas, ESNA
New metadata schemas
  – SCORM, RDF and the Semantic Web
  – Learning and knowledge objects
New people models
  – Bloom’s Taxonomy, Gardner’s 7 Intelligences
Advanced personalization
  – Community-based, cognitive-based
  – Adaptive, dynamic presentation variations

Future Directions: Technology
Taxonomies within applications
  – Richer world knowledge and better learning
Entity extraction and fact extraction
Natural language processing (NLP) search – answers, not document lists
Integrated KM platform
  – Creation, structure, retrieval, application, measurement
  – Integrated KM/KA team
  – Contextualizing content: related content, best bets, expertise, communities

Future Directions: Well-Articulated Organization
Learning takes place throughout the system
  – Smart applications – adapt to users’ and communities’ activities
  – Just-in-time training and performance support
Combination of analytics and knowledge organization
  – Concept-level, not document-level
  – Taxonomy is the brain, analytics are the eyes
Self-knowledge – highest form of knowledge
  – “The unexamined life is not worth living.” (Plato)
  – The unexamined, inarticulate enterprise is not worth having

The Contextual Desktop: Document, List of Documents, Applications Screen
Before you view:
When you look for information
  – Agent keeps you up to date
  – Your connections to content and communities, your preferences
  – Your history and the history of other members of your communities
When you add/change content
  – Suggests categorization value, metadata values
  – Routes to appropriate content and communities
  – Prompt on unusual connections
      • Pre-existing content
      • Related content
      • Regulatory issues
      • Ask the question – route to experts?
Taxonomy-based dynamic browse
Entities
  – People, companies, wells
Related content
  – Regulatory, patents, BI-CI
  – Geological data
  – News stories
Dictionaries, USGS data, databases
Experts
  – Ask questions, chat
When you use information
  – Communities
      • Search, chat, email
  – Performance aids, classes
  – Stories

Sources
Books
  – Women, Fire, and Dangerous Things: What Categories Reveal about the Mind (George Lakoff)
  – The Geography of Thought (Richard E. Nisbett)
Software
  – Convera Retrievalware
  – Inxight Smart Discovery – entity and fact extraction
Courses
  – Convera Taxonomy Certification

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com