Text Analytics Workshop
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
Introduction – Elements & Infrastructure Platform
    – Semantics not technology
    – Infrastructure not project
    – Value of Text Analytics
Evaluating Software
    – Two Phase Process
    – Designing the Team and Content Structures
Development – Taxonomy, Categorization, Faceted Metadata
Text Analytics Applications
    – Integration with Search and ECM
    – Platform for Information Applications

KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
    – Taxonomy/Text Analytics development, consulting, customization
    – Technology Consulting – Search, CMS, Portals, etc.
    – Evaluation of Enterprise Search, Text Analytics
    – Metadata standards and implementation
    – Knowledge Management: Collaboration, Expertise, e-learning
    – Applied Theory – Faceted taxonomies, complexity theory, natural categories

Introduction to Text Analytics
Semantic Infrastructure - Elements
Taxonomy – Thesauri, Controlled Vocabulary
Metadata – Standard (Dublin Core) and Facets
Basic Text Analytics
    – Categorization – Document Topics – Aboutness
    – Entity Extraction – noun phrases, feed facets
    – Summarization – beyond snippets
Advanced Text Analytics
    – Fact extraction – ontologies
    – Sentiment Analysis – good, bad, and ugly
What is in a Name – text analytics or ?
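
As a rough illustration of two of the basic elements listed above, the sketch below shows term-rule categorization against a tiny taxonomy and catalog-based entity extraction that feeds facets. It is a minimal sketch under stated assumptions, not the workshop's method or any vendor's API: the category rules, the entity catalog, the facet names, and the function names are all hypothetical, and commercial text analytics tools use much richer rule languages.

```python
import re
from collections import defaultdict

# Hypothetical category rules: a document matches a category when every
# "all" term is present and no "none" term is present (simple Boolean logic).
CATEGORY_RULES = {
    "Enterprise Search": {"all": ["search"], "none": ["patent"]},
    "Content Management": {"all": ["content", "management"], "none": []},
}

# Hypothetical entity catalog: canonical name -> (facet, known variants).
ENTITY_CATALOG = {
    "Microsoft": ("Organization", ["Microsoft", "MSFT"]),
    "Dublin Core": ("Metadata Standard", ["Dublin Core", "DC metadata"]),
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def categorize(text):
    """Return the sorted names of all categories whose rule matches."""
    words = tokenize(text)
    return sorted(
        name
        for name, rule in CATEGORY_RULES.items()
        if all(term in words for term in rule["all"])
        and not any(term in words for term in rule["none"])
    )


def extract_entities(text):
    """Map each facet to the canonical entities whose variants appear."""
    facets = defaultdict(set)
    for canonical, (facet, variants) in ENTITY_CATALOG.items():
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                facets[facet].add(canonical)
                break  # one matching variant is enough
    return {facet: sorted(names) for facet, names in facets.items()}


doc = ("MSFT added Dublin Core fields to its enterprise search "
       "and content management tools.")
print(categorize(doc))
print(extract_entities(doc))
```

The variant list is what lets "MSFT" resolve to the canonical "Microsoft" value, so a faceted search interface would show one Organization entry rather than two.
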

Introduction to Text Analytics
Taxonomy
Thesauri, Controlled Vocabulary
    – Resources to build on
    – Indexing not categorization
Taxonomy
    – Foundation for Categorization
    – Browse – classification scheme
    – Formal – Is-Child-Of, Is-Part-Of
    – Large taxonomies - MeSH – indexing all topics
    – Small is better – for categorization and faceted navigation

Introduction to Text Analytics
Metadata
Metadata standards – Dublin Core - Mostly syntactic not semantic
Description – static or dynamic (summarization)
    – Semantic – keywords – very poor performance
    – Best Bets – high level categorization-search – Human judgments
Audience – mixed results
    – Role, function, expertise, information behaviors
Facets – classes of metadata
    – Standard - People, Organization, Document type-purpose
    – Specialized – methods, materials, products

Introduction to Text Analytics
Text Analytics
Categorization
    – Multiple techniques – examples, terms, Boolean
    – Built on a taxonomy
Entity Extraction
    – Catalogs with variants, rule based dynamic
Summarization
    – Rules – find sentences in a document
Fact Extraction
    – Relationships of entities – people-organizations-activities
Sentiment Analysis
    – Rules – adjectives & adverbs not nouns

Introduction to Text Analytics
Text Analytics
Why Text Analytics?
    – Enterprise search has failed to live up to its potential
    – Enterprise Content management has failed to live up to its potential
    – Taxonomy has failed to live up to its potential
    – Adding metadata, especially keywords has not worked
What is missing?
    – Intelligence – human level categorization, conceptualization
    – Infrastructure – Integrated solutions not technology, software
Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

Text Analytics Platform
4 Basic Contexts
Ideas – Content Structure
    – Language and Mind of your organization
    – Applications - exchange meaning, not data
People – Company Structure
    – Communities, Users
    – Central team - establish standards, facilitate
Activities – Business processes and procedures
Technology
    – CMS, Search, portals, taxonomy tools
    – Applications – BI, CI, Text Mining

Text Analytics Platform: The start and foundation
Knowledge Architecture Audit
Knowledge Map - Understand what you have, what you are, what you want
    – The foundation of the foundation
Contextual interviews, content analysis, surveys, focus groups, ethnographic studies
Category modeling – “Intertwingledness” - learning new categories influenced by other, related categories
Natural level categories mapped to communities, activities
    • Novices prefer higher levels
    • Balance of informativeness and distinctiveness
Living, breathing, evolving foundation is the goal

Text Analytics Platform – Benefits
IDC White Paper
Time Wasted
    – Reformat information - $5.7 million per 1,000 per year
    – Not finding information - $5.3 million per 1,000
    – Recreating content - $4.5 million per 1,000
Small Percent Gain = large savings
    – 1% - $10 million
    – 5% - $50 million
    – 10% - $100 million

Text Analytics Platform – Benefits
Findability within and outside the enterprise
    – Savings per year - $millions
Rescue enterprise search and ECM projects
    – Add semantics to search
Clean up enterprise content
    – Duplication and accurate categorization
Improve the quality of information access
    – Finding the right information can save millions
Build smarter applications
    – Social networking, locate expertise within the enterprise

Text Analytics Platform – Benefits
Understand your customers
    – What they are talking about and how they feel about it
Empower your employees
    – Not only more time, but they work smarter
Understand your competitors
    – What they are working on, talking about
    – Combine unstructured content and rich data sources – more intelligent analysis

Text Analytics Platform – Dangers
Text Analytics as a software project
Not enough resources – to develop, to maintain-refine
Wrong resources – SMEs, IT, Library
    – Need all of the above and taxonomists+
Bad Design:
    – Start with bad taxonomy
    – Wrong taxonomy – too big or too flat
Bad Categorization / Entity Extraction
    – Right kind of experience

Resources
Books
    – Women, Fire, and Dangerous Things
        • George Lakoff
    – Knowledge, Concepts, and Categories
        • Koen Lamberts and David Shanks
    – The Stuff of Thought – Steven Pinker
Web Sites
    – Text Analytics News http://social.textanalyticsnews.com/index.php
    – Text Analytics Wiki - http://textanalytics.wikidot.com/

Resources
Blogs
    – SAS - Manya Mayes – Chief Strategist http://blogs.sas.com/text-mining/
Web Sites
    – Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
    – Whitepaper – CM and Text Analytics http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com