Taxonomy Development Workshop

Text Analytics And Text Mining
Best of Text and Data
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
 Text Analytics Capabilities
 Text Analytics Applications
 Text Mining and Text Analytics
– Data and Unstructured Content
 Case Study – Text Mining for Taxonomy Development
 Conclusion
KAPS Group: General
Knowledge Architecture Professional Services
Virtual Company: Network of consultants – 8-10
Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
Consulting, Strategy, Knowledge architecture audit
Services:
– Text Analytics evaluation, development, consulting, customization
– Knowledge Representation – taxonomy, ontology, Prototype
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction to Text Analytics
Text Analytics Features
 Noun Phrase Extraction
– Catalogs with variants, rule-based dynamic
– Multiple types, custom classes – entities, concepts, events
– Feeds facets
 Summarization
– Customizable rules, map to different content
 Fact Extraction
– Relationships of entities – people-organizations-activities
– Ontologies – triples, RDF, etc.
 Sentiment Analysis
– Statistical, rules – full categorization set of operators
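The "catalogs with variants" idea for extraction can be illustrated with a minimal sketch. It assumes a small hand-built catalog that maps a canonical entity to its type and known surface variants; the catalog entries and the function names (`build_patterns`, `extract_entities`) are hypothetical and not taken from any particular product.

```python
import re

# Hypothetical catalog: canonical entity -> (entity type, known surface variants).
CATALOG = {
    "International Business Machines": (
        "organization",
        ["IBM", "I.B.M.", "International Business Machines"],
    ),
    "SAS Institute": ("organization", ["SAS", "SAS Institute"]),
}


def build_patterns(catalog):
    """Compile one regex per canonical entity covering all of its variants."""
    patterns = []
    for canonical, (entity_type, variants) in catalog.items():
        # Longest variants first so a full name wins over its abbreviation.
        ordered = sorted(variants, key=len, reverse=True)
        alternatives = "|".join(re.escape(v) for v in ordered)
        # Lookarounds instead of \b, because some variants end in a period.
        regex = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
        patterns.append((canonical, entity_type, regex))
    return patterns


def extract_entities(text, patterns):
    """Return canonical entities found in text, with type and mention count."""
    found = {}
    for canonical, entity_type, regex in patterns:
        mentions = len(regex.findall(text))
        if mentions:
            found[canonical] = {"type": entity_type, "mentions": mentions}
    return found
```

Because every variant resolves to one canonical name, the output can feed a facet (for example, an "organization" facet) without splitting counts across spellings.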
Introduction to Text Analytics
Text Analytics Features
 Auto-categorization
– Training sets – Bayesian, Vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (Title, body, URL)
– Semantic Network – Predefined relationships, sets of rules
– Boolean – Full search syntax – AND, OR, NOT
– Advanced – NEAR (#), PARAGRAPH, SENTENCE
This is the most difficult to develop
Build on a Taxonomy
Combine with Extraction, Sentiment
Foundation for best text analytics & combination
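The Boolean and advanced operators listed above can be sketched as composable rule functions. This is an illustrative toy, not any vendor's rule syntax: the operator names mirror the slide, while the tokenizer, the naive sentence splitter, and the sample "vaccine safety" rule are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def has(term):
    """Rule: the literal term appears anywhere in the document."""
    return lambda text: term.lower() in tokenize(text)


def near(a, b, distance):
    """Rule: a and b occur within `distance` tokens of each other."""
    def rule(text):
        tokens = tokenize(text)
        pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
        pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
        return any(abs(i - j) <= distance for i in pos_a for j in pos_b)
    return rule


def sentence(a, b):
    """Rule: a and b occur in the same sentence (naive split on . ! ?)."""
    def rule(text):
        for chunk in re.split(r"[.!?]+", text):
            tokens = tokenize(chunk)
            if a.lower() in tokens and b.lower() in tokens:
                return True
        return False
    return rule


def AND(*rules):
    return lambda text: all(rule(text) for rule in rules)


def OR(*rules):
    return lambda text: any(rule(text) for rule in rules)


def NOT(rule):
    return lambda text: not rule(text)


# Hypothetical category rule built from the operators above.
vaccine_safety = AND(
    OR(has("vaccine"), has("vaccination")),
    near("adverse", "reaction", 3),
    NOT(has("veterinary")),
)
```

A document is assigned to the category when `vaccine_safety(text)` returns `True`. Real rule sets grow far larger than this, which is one reason the approach is the most difficult to develop.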
Varieties of Taxonomy/ Text Analytics Software
 Taxonomy Management
– Synaptica, SchemaLogic
 Full Platform
– SAS-Teragram, SAP-Inxight, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
 Content Management – embedded
 Embedded – Search
– FAST, Autonomy, Endeca, Exalead, etc.
 Specialty
– Sentiment Analysis, VOC – Lexalytics, Attensity / Reports
– Ontology – extraction, plus ontology
Text Analytics Applications
Platform for Multiple Applications
Content Aggregation, Duplicate Documents – save millions!
Business intelligence, Customer Intelligence
Social Media - sentiment analysis, Voice of the Customer
Social – Hybrid folksonomy / taxonomy / auto-metadata
Social – expertise, categorize tweets and blogs, reputation
Ontology – travel assistant, semantic web, etc.
eDiscovery, Reputation management, Customer Experience
Expertise Location, Crowd sourcing Technical support
Text Analytics Applications:
Enterprise Search - Elements
 Text Analytics can “solve” enterprise search
 Multiple Knowledge Structures
– Facet – orthogonal dimension of metadata
– Taxonomy – Subject matter / aboutness
 Software – Search, ECM, auto-categorization, entity extraction, Text Analytics and Text Mining
 People – tagging, evaluating tags, fine-tuning rules and taxonomy
 Rich Search Results – context and conversation
 Platform for search based applications
Text Analytics and Text Mining
Data and Unstructured Content
 80% of content is unstructured – adding to semantic web is major
 Text Analytics – content into data
– Big Data meets Big Content
 Real integration of text and ontology
– Beyond “hasDescription”
– Improve accuracy of extracted entities, facts – disambiguation
• Pipeline – oil & gas OR research / Ford
– Add Concepts, not just “Things” – 68% want this
 Semantic Web + Text Analytics = real world value
 Linked Data + Text Analytics – best of both worlds
 Build superior foundation elements – taxonomies, categorization
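The disambiguation point (is "pipeline" about oil & gas or about research?) can be sketched with a simple context-overlap score. The sense vocabularies, the window size, and the function name `disambiguate` below are all assumptions chosen for illustration; a production system would more likely draw its context terms from a taxonomy or ontology than from hand-typed sets.

```python
import re

# Hypothetical context vocabularies for each sense of an ambiguous term.
SENSES = {
    "pipeline": {
        "oil & gas": {"oil", "gas", "crude", "drilling", "refinery", "barrel"},
        "research": {"drug", "clinical", "trial", "candidate", "compound", "phase"},
    },
}


def disambiguate(term, text, senses=SENSES, window=10):
    """Pick the sense whose vocabulary overlaps most with the nearby tokens.

    Returns None when no sense vocabulary matches at all. Ties go to the
    sense listed first.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    scores = {sense: 0 for sense in senses[term]}
    for i, token in enumerate(tokens):
        if token != term:
            continue
        # Tokens within `window` positions on either side of the mention.
        context = set(tokens[max(0, i - window): i + window + 1])
        for sense, vocabulary in senses[term].items():
            scores[sense] += len(context & vocabulary)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

The chosen sense can then be attached to the extracted entity, so that facts about an oil pipeline and a drug pipeline are not merged.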
Text Analytics and Text Mining and Data Mining
Vaccine Adverse Reaction
 Combine with Data Mining
 New sources of information
 News stories, medical records
 Blogs, social
 Find new connections, sources of knowledge
 Vaccine Adverse Effects – disease, symptoms, variables
Unstructured text into a data source
Some preliminary analysis, content structure
Find unknown adverse effects and prevalence
Drug Discovery + search / research – 5 year story
Text Analytics Applications
Example – Vaccine Adverse Effects
Text Analytics and Text Mining
Case Study – Taxonomy Development
Problem – 200,000 new uncategorized documents
Old taxonomy – need one that reflects change in corpus
Text mining, entity extraction, categorization
Bottom up – terms in documents – frequency, date
Clustering – suggested categories
Clustering – chunking for editors
Time savings – only feasible way to scan documents
Quality – important terms, co-occurring terms
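One way to surface co-occurring terms for taxonomy editors is to count how often pairs of terms appear in the same document. The sketch below is a simplified stand-in for that step and is not the tooling used in the case study; the stopword list, the `min_count` threshold, and the function name `cooccurrences` are assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Small illustrative stopword list; a real project would use a fuller one.
STOPWORDS = frozenset({"a", "after", "and", "in", "of", "the"})


def cooccurrences(documents, min_count=2, stopwords=STOPWORDS):
    """Count term pairs that appear together in the same document.

    Returns (pair, count) tuples, most frequent first, keeping only pairs
    seen in at least `min_count` documents.
    """
    pair_counts = Counter()
    for doc in documents:
        # Use each term once per document; sort so pairs have a stable order.
        terms = sorted(set(re.findall(r"[a-z][a-z-]+", doc.lower())) - stopwords)
        pair_counts.update(combinations(terms, 2))
    return [(pair, n) for pair, n in pair_counts.most_common() if n >= min_count]
```

Pairs that recur across many documents are candidates for sibling or parent-child relationships, which an editor can then accept or reject.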
Text Analytics and Text Mining
Case Study – Taxonomy Development
 Text into Data: Article, Abstract, Title, Subtitle – fields & source of terms
 Add Data: PubDate, journalTitle, Taxonomy Node
 Terms – Map to frequency, date, date ranges, Taxonomy Node
– New Terms, Trends
 Relevance – frequency, Abstract, Title, human judgment
 Entity Extraction – Authors, Organizations, Products,
 Categorization – build on clusters & taxonomy
 Combination – reports, visualizations, interactive explorations
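The "text into data" step can be sketched as a term profile that maps each term to its frequency, publication years, and taxonomy nodes, from which new terms and trends fall out. The record layout below reuses the field names from the slide (`Title`, `Abstract`, `PubDate`), but the `TaxonomyNode` key, the ISO date format, the stopword list, and the function names are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = frozenset({"a", "and", "for", "in", "of", "on", "the", "to"})


def term_profile(records, stopwords=STOPWORDS):
    """Map each term to its total frequency, per-year counts, and node counts."""
    profile = defaultdict(
        lambda: {"frequency": 0, "years": Counter(), "nodes": Counter()}
    )
    for record in records:
        year = int(record["PubDate"][:4])  # assumes dates like "2010-06-15"
        text = f'{record["Title"]} {record["Abstract"]}'
        for term in re.findall(r"[a-z][a-z-]+", text.lower()):
            if term in stopwords:
                continue
            entry = profile[term]
            entry["frequency"] += 1
            entry["years"][year] += 1
            entry["nodes"][record["TaxonomyNode"]] += 1
    return profile


def new_terms(profile, since_year):
    """Terms whose earliest appearance is in or after `since_year`."""
    return sorted(
        term for term, entry in profile.items() if min(entry["years"]) >= since_year
    )
```

Terms returned by `new_terms` that carry no good existing taxonomy node are natural candidates for new categories, subject to human judgment on relevance.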
Conclusion
 Text Analytics impact is huge – solve information overload
 Enterprise Search and Search Based Applications: Save millions and enhance productivity
 Combination of Text Analytics & Text Mining – unlimited range of applications
 Mutual Enrichment – more data, add structure to unstructured
 Add Ontology = Richer Text Analytics – smarter, more useful
 Text Analytics + Text Mining + Semantic Web
– Move from theory to new practical applications
 The best is yet to come!
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com