Text Analytics and Text Mining: Best of Text and Data
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda
Text Analytics Capabilities
Text Analytics Applications
Text Mining and Text Analytics – Data and Unstructured Content
Case Study – Text Mining for Taxonomy Development
Conclusion

KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
– Text Analytics evaluation, development, consulting, customization
– Knowledge Representation – taxonomy, ontology, Prototype
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories

Introduction to Text Analytics
Text Analytics Features
Noun Phrase Extraction
– Catalogs with variants, rule based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
Summarization
– Customizable rules, map to different content
Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
Sentiment Analysis
– Statistical, rules – full categorization set of operators

Introduction to Text Analytics
Text Analytics Features
Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, url)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – NEAR (#), PARAGRAPH, SENTENCE
This is the most difficult capability to develop
Build on a Taxonomy
Combine with Extraction, Sentiment
Foundation for best text analytics & combination

Varieties of Taxonomy / Text Analytics Software
Taxonomy Management
– Synaptica, SchemaLogic
Full Platform
– SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
Content Management
– embedded
Embedded – Search
– FAST, Autonomy, Endeca, Exalead, etc.
Specialty
– Sentiment Analysis, VOC – Lexalytics, Attensity / Reports
– Ontology – extraction, plus ontology

Text Analytics Applications
Platform for Multiple Applications
Content Aggregation, Duplicate Documents – save millions!
Business intelligence, Customer Intelligence
Social Media – sentiment analysis, Voice of the Customer
Social – Hybrid folksonomy / taxonomy / auto-metadata
Social – expertise, categorize tweets and blogs, reputation
Ontology – travel assistant, semantic web, etc.
eDiscovery, Reputation management, Customer Experience
Expertise Location, Crowd sourcing
Technical support

Text Analytics Applications: Enterprise Search – Elements
Text Analytics can “solve” enterprise search
Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
People – tagging, evaluating tags, fine tuning rules and taxonomy
Rich Search Results – context and conversation
Platform for search based applications

Text Analytics and Text Mining
Data and Unstructured Content
80% of content is unstructured – adding it to the semantic web is major
Text Analytics – content into data – Big Data meets Big Content
Real integration of text and ontology
– Beyond “hasDescription”
– Improve accuracy of extracted entities, facts – disambiguation
  • Pipeline – oil & gas OR research / Ford
– Add Concepts, not just “Things” – 68% want this
Semantic Web + Text Analytics = real world value
Linked Data + Text Analytics – best of both worlds
Build superior foundation elements – taxonomies, categorization

Text Analytics and Text Mining and Data Mining
Vaccine Adverse Reaction
Combine with Data Mining
New sources of information
– News stories, medical records
– Blogs, social
Find new connections, sources of knowledge
Vaccine Adverse Effects – disease, symptoms, variables
Unstructured text into a data source
Some preliminary analysis, content structure
Find unknown adverse effects and prevalence
Drug Discovery + search / research – 5 year story

Text Analytics Applications
Example – Vaccine Adverse Effects

Text Analytics and Text Mining
Case Study – Taxonomy Development
Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects the change in the corpus
Text mining, entity extraction, categorization
Bottom Up – terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Time savings – only feasible way to scan the documents
Quality – important terms, co-occurring terms

Text Analytics and Text Mining
Case Study – Taxonomy Development
Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms
Add Data: PubDate, journalTitle, Taxonomy Node
Terms – Map to frequency, date, date ranges, Taxonomy Node
– New Terms, Trends
Relevance – frequency, Abstract, Title, human judgment
Entity Extraction – Authors, Organizations, Products
Categorization – build on clusters & taxonomy
Combination – reports, visualizations, interactive explorations

Conclusion
Text Analytics impact is huge – helps solve information overload
Enterprise Search and Search Based Applications: Save millions and enhance productivity
Combination of Text Analytics & Text Mining – unlimited range of applications
Mutual Enrichment – more data, add structure to unstructured content
Add Ontology = Richer Text Analytics – smarter, more useful
Text Analytics + Text Mining + Semantic Web
– Move from theory to new practical applications
The best is yet to come!

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
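The bottom-up step in the taxonomy development case study (terms mapped to frequency, dates and date ranges to surface new terms and trends, plus co-occurring terms as a quality signal) can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the tooling used in the case study: the toy corpus, the use of publication year as the date range, the regex tokenizer with its tiny stopword list, and the function names `term_frequency_by_period`, `new_terms` and `co_occurring_pairs` are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
import re

# Hypothetical toy corpus of (publication year, abstract text) pairs.
# Illustrative only; not data from the case study.
DOCS = [
    (2009, "Taxonomy design for enterprise search"),
    (2009, "Enterprise search relevance and taxonomy"),
    (2010, "Sentiment analysis of customer blogs"),
    (2010, "Sentiment analysis and enterprise search"),
]

# Assumed minimal stopword list; a real system would use proper term extraction.
STOPWORDS = {"and", "of", "for", "the", "a", "in"}


def terms(text):
    """Lowercase single-word tokens minus stopwords (a stand-in for noun phrase extraction)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def term_frequency_by_period(docs):
    """Map each period (here, the publication year) to a Counter of term frequencies."""
    by_period = defaultdict(Counter)
    for year, text in docs:
        by_period[year].update(terms(text))
    return by_period


def new_terms(by_period, earlier, later):
    """Terms that occur in the later period but never in the earlier one."""
    return sorted(set(by_period[later]) - set(by_period[earlier]))


def co_occurring_pairs(docs, min_count=2):
    """Count term pairs that share a document; keep pairs seen in at least min_count documents."""
    pairs = Counter()
    for _, text in docs:
        unique = sorted(set(terms(text)))  # sorted so each pair has one canonical order
        pairs.update(combinations(unique, 2))
    return {pair: n for pair, n in pairs.items() if n >= min_count}


if __name__ == "__main__":
    freq = term_frequency_by_period(DOCS)
    print(new_terms(freq, 2009, 2010))
    print(co_occurring_pairs(DOCS))
```

On this toy data, `new_terms(freq, 2009, 2010)` returns `['analysis', 'blogs', 'customer', 'sentiment']`, the kind of emerging vocabulary an editor would review as candidate taxonomy nodes. In practice the single-word tokens would be replaced by extracted noun phrases and stemmed terms, and the same per-period counts could feed the reports and visualizations listed in the case study.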