Taxonomy Development
An Infrastructure Model

Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
Introduction
Type of Taxonomies
The Enterprise Context – Making the Business Case
Infrastructure Model of Taxonomy Development
  – Taxonomy in 4 Contexts
    • Content, People, Processes, Technology
Infrastructure Solutions – the Elements
Applying the Model – Practical Dimension
  – Starting and Resources
Conclusion

KAPS Group
Knowledge Architecture Professional Services (KAPS)
Consulting, strategy recommendations
Knowledge architecture audits
Partners – Convera, Inxight, FAST, and others
Taxonomies: Enterprise, Marketing, Insurance, etc.
  – Taxonomy customization
Intellectual infrastructure for organizations
  – Knowledge organization, technology, people and processes
  – Search, content management, portals, collaboration, knowledge management, e-learning, etc.

Two Types of Taxonomies: Browse and Formal
Browse Taxonomy – Yahoo

Two Types of Taxonomies: Formal

Browse Taxonomies: Strengths and Weaknesses
Strengths: Browse is better than search
  – Context and discovery
  – Browse by task, type, etc.
Weaknesses:
  – Mix of organization
    • Catalogs, alphabetical listings, inventories
    • Subject matter, functional, publisher, document type
  – Vocabulary and nomenclature issues
  – Problems with maintenance, new material
  – Poor granularity and little relationship between parts.
    • Web site unit of organization
  – No foundation for standards

Formal Taxonomies: Strengths and Weaknesses
Strengths:
  – Fixed Resource – little or no maintenance
  – Communication Platform – share ideas, standards
  – Infrastructure Resource
    • Controlled vocabulary and keywords
    • More depth, finer granularity
Weaknesses:
  – Difficult to develop and customize
  – Don’t reflect users’ perspectives
    • Users have to adapt to language

Facets and Dynamic Classification
Facets are not categories
  – Entities or concepts belong to a category
  – Entities have facets
Facets are metadata - properties or attributes
  – Entities or concepts fit into one category
  – All entities have all facets – defined by set of values
Facets are orthogonal – mutually exclusive – dimensions
  – An event is not a person is not a document is not a place.
Facets – variety – of units, of structure
  – Date or price – numerical range
  – Location – big to small (partonomy)
  – Winery – alphabetical
  – Hierarchical - taxonomic

Faceted Navigation: Strengths and Weaknesses
Strengths:
  – More intuitive – easy to guess what is behind each door
    • 20 questions – we know and use
  – Dynamic selection of categories
    • Allow multiple perspectives
  – Trick Users into “using” Advanced Search
    • wine where color = red, price = x-y, etc.
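
The "wine where color = red, price = x-y" query in the Strengths list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Wine` record, the sample catalog, and the `facet_filter` and `facet_counts` helpers are all hypothetical names invented here, not part of any product mentioned in the deck. The point is that each facet click adds one more condition to what is, in effect, an advanced search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    """Hypothetical entity: every wine has a value for every facet."""
    name: str
    color: str
    price: float
    region: str


# Hypothetical sample catalog, invented for illustration only.
CATALOG = [
    Wine("Estate Cabernet", "red", 28.0, "Napa"),
    Wine("Old Vine Zinfandel", "red", 14.0, "Sonoma"),
    Wine("Reserve Chardonnay", "white", 22.0, "Sonoma"),
    Wine("Hillside Pinot Noir", "red", 45.0, "Willamette"),
]


def facet_filter(wines, color=None, region=None, price_range=None):
    """Return the wines that match every facet value that was supplied."""
    results = []
    for wine in wines:
        if color is not None and wine.color != color:
            continue
        if region is not None and wine.region != region:
            continue
        if price_range is not None:
            low, high = price_range
            if not (low <= wine.price <= high):
                continue
        results.append(wine)
    return results


def facet_counts(wines, facet):
    """Count how many wines carry each value of one facet (the numbers shown next to each link)."""
    counts = {}
    for wine in wines:
        value = getattr(wine, facet)
        counts[value] = counts.get(value, 0) + 1
    return counts


# "wine where color = red, price = 10-30"
reds_10_to_30 = facet_filter(CATALOG, color="red", price_range=(10, 30))
```

Calling `facet_counts(reds_10_to_30, "region")` on the filtered list gives the counts a faceted interface would display for the next refinement step, which is how the user narrows a result set without ever typing a query.
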
Weaknesses:
  – Difficulty of expressing complex relationships
    • Simplicity of internal organization
  – Loss of Browse Context
    • Difficult to grasp scope and relationships
  – Limited Domain Applicability – type and size
    • Entities not concepts, documents, web sites

Dynamic Classification / Faceted navigation
Search and browse better than either alone
  – Categorized search – context
  – Browse as an advanced search
Dynamic search and browse is best
  – Can’t predict all the ways people think
    • Advanced cognitive differences
    • Panda, Monkey, Banana
  – Can’t predict all the questions and activities
    • Intersections of what users are looking for and what documents are often about
    • China and Biotech
    • Economics and Regulatory

Business Case for Taxonomies: The Right Context
Traditional Metrics
  – Time Savings – 22 minutes per user per day = $1Mil a Year
  – Apply to your organization – customer service, content creation, knowledge industry
  – Cost of not-finding = re-creating content
Research
  – Advantages of Browsing – Marti Hearst, Chen and Dumais
  – Nielsen – “Poor classification costs a 10,000 user organization $10M each year – about $1,000 per employee.”
Stories
  – Pain points, success and failure – in your corporate language

Business Case for Taxonomies: IDC White Paper
Information Tasks
  – Email – 14.5 hours a week
  – Create documents – 13.3 hours a week
  – Search – 9.5 hours a week
  – Gather information for documents – 8.3 hours a week
  – Find and organize documents – 6.8 hours a week
Gartner: “Business spend an estimated $750 Billion annually seeking information necessary to do their job.
30-40% of a knowledge worker’s time is spent managing documents.”

Business Case for Taxonomies: IDC White Paper
Time Wasted
  – Reformat information - $5.7 million per 1,000 per year (400M)
  – Not finding information - $5.3 million per 1,000 (370M)
  – Recreating content - $4.5 Million per 1,000 (315M)
Small Percent Gain = large savings
  – 1% - $10 million
  – 5% - $50 million
  – 10% - $100 million

Business Case for Taxonomies: The Right Context
Justification
  – Search Engine - $500K-$2Mil
  – Content Management - $500K-$2Mil
  – Portal - $500K-$2Mil
  – Plus maintenance and employee costs
Taxonomy
  – Small comparative cost
  – Needed to get full value from all the above
ROI – asking the wrong question
  – What is ROI for having an HR department?
  – What is ROI for organizing your company?

Infrastructure Model of Taxonomy Development
Taxonomy in Basic 4 Contexts
Ideas – Content Structure
  – Language and Mind of your organization
  – Applications - exchange meaning, not data
People – Company Structure
  – Communities, Users, Central Team
Activities – Business processes and procedures
  – Central team - establish standards, facilitate
Technology / Things
  – CMS, Search, portals, taxonomy tools
  – Applications – BI, CI, Text Mining

Taxonomy in Context: Structuring Content
All kinds of content and Content Structures
  – Structured and unstructured, Internet and desktop
Metadata standards – Dublin Core+
  – Keywords - poor performance
  – Need controlled vocabulary, taxonomies, semantic network
Other Metadata
  – Document Type
    • Form, policy, how-to, etc.
  – Audience
    • Role, function, expertise, information behaviors
  – Best bets metadata
Facets – entities and ideas
  – Wine.com

Taxonomy in Context: Structuring People
Individual People
  – Tacit knowledge, information behaviors
  – Advanced personalization – category priority
    • Sales – forms ---- New Account Form
    • Accountant ---- New Accounts ---- Forms
Communities
  – Variety of types – map of formal and informal
  – Variety of subject matter – vaccines, research, scuba
  – Variety of communication channels and information behaviors
  – Community-specific vocabularies, need for inter-community communication (Cortical organization model)

Taxonomy in Context: Structuring Processes and Technology
Technology: infrastructure and applications
  – Enterprise platforms: from creation to retrieval to application
  – Taxonomy as the computer network
    • Applications – integrated meaning, not just data
Creation – content management, innovation, communities of practice (CoPs)
  – When, who, how, and how much structure to add
  – Workflow with meaning, distributed subject matter experts (SMEs) and centralized teams
Retrieval – standalone and embedded in applications and business processes
  – Portals, collaboration, text mining, business intelligence, CRM

Taxonomy in Context: The Integrating Infrastructure
Starting point: knowledge architecture audit, K-Map
  – Social network analysis, information behaviors
People – knowledge architecture team
  – Infrastructure activities – taxonomies, analytics, best bets
  – Facilitation – knowledge transfer, partner with SMEs
“Taxonomies” of content, people, and activities
  – Dynamic Dimension – complexity not chaos
  – Analytics based on concepts, information behaviors
Taxonomy as part of a foundation, not a project
  – In an Infrastructure Context

Taxonomy in Context: The Integrating Infrastructure
Integrated Enterprise requires both an infrastructure team and distributed expertise.
  – Software and SMEs are not the answer - keywords
Taxonomies do not stand alone
  – Metadata, controlled vocabularies, synonyms, etc.
  – Variety of taxonomies, plus categorization, classification, etc.
    • Important to know the differences, when to use which
Multiple Applications
  – Search, browse, content management, portals, BI & CI, etc.
Infrastructure as Operating System
  – Word vs. WordPerfect
  – Instead of sharing clipboard, share information and knowledge.

Infrastructure Solutions: The start and foundation
Knowledge Architecture Audit
Knowledge Map - Understand what you have, what you are, what you want
  – The foundation of the foundation
Contextual interviews, content analysis, surveys, focus groups, ethnographic studies
Category modeling – “Intertwingledness” - learning new categories influenced by other, related categories
Natural level categories mapped to communities, activities
    • Novices prefer higher levels
    • Balance of informativeness and distinctiveness
Living, breathing, evolving foundation is the goal

Infrastructure Solutions: Resources
People and Processes: Roles and Functions
Knowledge Architect and learning object designers
Knowledge engineers and cognitive anthropologists
Knowledge facilitators and trainers and librarians
Part Time
  – Librarians and information architects
  – Corporate communication editors and writers
Partners
  – IT, web developers, applications programmers
  – Business analysts and project managers

Infrastructure Solutions: Resources
People and Processes: Central Team
Central Team supported by software and offering services
  – Creating, acquiring, evaluating taxonomies, metadata standards, vocabularies
  – Input into technology decisions and design – content management, portals, search
  – Socializing the benefits of metadata, creating a content culture
  – Evaluating metadata quality, facilitating author metadata
  – Analyzing the results of using metadata, how communities are using it
  – Research metadata theory, user centric metadata
  – Design
    content value structure – more nuanced than good / poor content.

Infrastructure Solutions: Resources
People and Processes: Facilitating Knowledge Transfer
Need for Facilitators
  – Amazon hiring humans to refine recommendations
  – Google – humans answering queries
Facilitate projects, KM project teams
  – Facilitate knowledge capture in meetings, best practices
Answering online questions, facilitating online discussions, networking within a community
Design and run KM forums, education and innovation fairs
Work with content experts to develop training, incorporate intelligence into applications
Support innovation, knowledge creation in communities

Infrastructure Solutions: Resources
People and Processes: Location of Team
KM/KA Dept. – Cross Organizational, Interdisciplinary
Balance of dedicated and virtual, partners
  – Library, Training, IT, HR, Corporate Communication
Balance of central and distributed
Industry variation
  – Pharmaceutical – dedicated department, major place in the organization
  – Insurance – Small central group with partners
  – Beans – a librarian and part time functions
Which design – knowledge architecture audit

Infrastructure Solutions: Resources
Technology
Taxonomy Management
  – Text and Visualization
Entity and Fact Extraction
Text Mining
Search for professionals
  – Different needs, different interfaces
Integration Platform technology
  – Enterprise Content Management

Taxonomy Development: Tips and Techniques
Stage One – How to Begin
Step One: Strategic Questions – why, what value from the taxonomy, how are you going to use it
  – Variety of taxonomies – important to know the differences, when to use what.
Step Two: Get a good taxonomist! (or learn)
  – Library Science + Cognitive Science + Cognitive Anthropology
Step Three: Software Shopping
  – Automatic Software – Fun Diversion for a rainy day
    • Uneven hierarchy, strange node names, weird clusters
  – Taxonomy Management, Entity Extraction, Visualization
Step Four: Get a good taxonomy!
  – Glossary, Index, Pull from multiple sources
  – Get a good document collection

Infrastructure Solutions: Taxonomy Development
Stage Two: Taxonomy Model
Enterprise Taxonomy
  – No single subject matter taxonomy
  – Need an ontology of facets or domains
Standards and Customization
  – Balance of corporate communication and departmental specifics
  – At what level are differences represented?
  – Customize pre-defined taxonomy – additional structure, add synonyms and acronyms and vocabulary
Enterprise Facet Model:
  – Actors, Events, Functions, Locations, Objects, Information Resources
  – Combine and map to subject domains

Taxonomy Development: Tips and Techniques
Stage Three: Development and/or Customization
Combination of top down and bottom up (and Essences)
  – Top: Design an ontology, facet selection
  – Bottom: Vocabulary extraction – documents, search logs, interview authors and users
  – Develop essential examples (Prototypes)
    • Most Intuitive Level – genus (oak, maple, rabbit)
    • Quintessential Chair – all the essential characteristics, no more
  – Work toward the prototype and out and up and down
  – Repeat until dizzy or done
Map the taxonomy to communities and activities
  – Category differences
  – Vocabulary differences

Taxonomy Development: Tips and Techniques
Stage Four: Evaluate and Refine
Formal Evaluation
  – Quality of corpus – size, homogeneity, representative
  – Breadth of coverage – main ideas, outlier ideas (see next)
  – Structure – balance of depth and width
  – Kill the verbs
  – Evaluate speciation steps – understandable and systematic
    • Person – Unwelcome person – Unpleasant person - Selfish person
  – Avoid binary levels, duplication of contrasts
  – Primary and secondary education, public and private

Taxonomy Development: Tips and Techniques
Stage Four: Evaluate and Refine
Practical Evaluation
  – Test in real life application
  – Select representative users and documents
  – Test node labels with Subject Matter Experts
    • Balance of making sense and jargon
  – Test with
    representative key concepts
  – Test for un-representative strange little concepts that only mean something to a few people but the people and ideas are key and are normally impossible to find

Sources
Books
  – Women, Fire, and Dangerous Things
    • What Categories Reveal about the Mind
    • George Lakoff
  – The Geography of Thought
    • Richard E. Nisbett
Software
  – Convera Retrievalware
  – Inxight Smart Discovery – entity and fact extraction
Courses
  – Convera Taxonomy Certification

Conclusion
Taxonomy development is not just a project
  – It has no beginning and no end
Taxonomy development is not an end in itself
  – It enables the accomplishment of many ends
Taxonomy development is not just about search or browse
  – It is about language, cognition, and applied intelligence
Strategic Vision (articulated by K Map) is important
  – Even for your under-the-radar vocabulary project
Paying attention to theory is practical
  – So is adapting your language to business speak

Conclusion
Taxonomies are part of your intellectual infrastructure
  – Roads, transportation systems not cars or types of cars
Taxonomies are part of creating smart organizations
  – Self aware, capable of learning and evolving
Think Big, Start Small, Scale Fast
If we really are in a knowledge economy
We need to pay attention to – Knowledge!

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com