Enterprise Search / Text Analytics Evaluation
Tom Reamy, Chief Knowledge Architect
KAPS Group – Knowledge Architecture Professional Services
http://www.kapsgroup.com

Agenda – Part II
– Introduction
  – Text analytics can rescue search
  – Elements of text analytics
– Evaluating text analytics software
  – Varieties of software – platform and embedded
– Three-phase process / two examples
  – Initial evaluation – 4 vendors
  – Demos – from 4 to 2
  – 4-6 week POC – 2 vendors
– Conclusions

KAPS Group: General
– Knowledge Architecture Professional Services
– Virtual company: network of 8-10 consultants
– Partners – SAS, SAP, FAST, Smart Logic, Concept Searching, etc.
– Consulting, strategy, knowledge architecture audit
– Services:
  – Taxonomy / text analytics development, consulting, customization
  – Technology consulting – search, CMS, portals, etc.
  – Evaluation of enterprise search and text analytics
  – Metadata standards and implementation
  – Knowledge management: collaboration, expertise, e-learning
– Applied theory – faceted taxonomies, complexity theory, natural categories

Introduction: Text Analytics to the Rescue
– Enterprise search is dead! Taxonomy is dead! Long live text analytics!
– ECM and enterprise search failed because search is about semantics and meaning, not technology
– Taxonomy failed because it is too rigid and too dumb
– Metadata failed because it is too hard for authors to do (not really)
– They all failed because search is a semantic infrastructure element, not a project

Introduction to Text Analytics: Features
– Noun phrase extraction
  – Catalogs with variants, rule-based and dynamic
  – Multiple types, custom classes – entities, concepts, events
  – Feeds facets
– Summarization
  – Customizable rules, mapped to different content
– Fact extraction
  – Relationships of entities – people-organizations-activities
  – Ontologies – triples, RDF, etc.
– Sentiment analysis
  – Rules – objects and phrases

Introduction to Text Analytics: Auto-Categorization
– Training sets – Bayesian, vector space
– Terms – literal strings, stemming, dictionary of related terms
– Rules – simple – position in text (title, body, URL)
– Semantic network – predefined relationships, sets of rules
– Boolean – full search syntax – AND, OR, NOT
  – Advanced – NEAR(#), PARAGRAPH, SENTENCE
– This is the most difficult capability to develop
  – Build on a taxonomy
  – Combine with extraction – e.g., fire a category if any entity from a list appears near other trigger words
– Two sketches below illustrate the rule-based and training-set approaches
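To make the rule-based approach concrete, here is a minimal Python sketch of a categorization rule that combines Boolean conditions with a simplified NEAR operator and an entity catalog. The operator semantics, category names, trigger words, and catalog are all invented for illustration; real engines (SAS-Teragram, Smart Logic, etc.) each have their own rule syntax.

```python
import re

def near(text, term_a, term_b, max_gap=10):
    """Simplified NEAR(#) operator: True if the two terms occur
    within max_gap words of each other."""
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a]
    pos_b = [i for i, w in enumerate(words) if w == term_b]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)

def categorize(text, entity_catalog):
    """Apply hand-built rules and return the matching category labels."""
    labels = []
    lower = text.lower()
    # Rule 1 – combine extraction with categorization: any catalog
    # entity NEAR a trigger word assigns the document to "Support Issue".
    if any(near(text, entity, "failure") for entity in entity_catalog):
        labels.append("Support Issue")
    # Rule 2 – plain Boolean AND / NOT over literal strings.
    if "budget" in lower and "draft" not in lower:
        labels.append("Finance")
    return labels

catalog = {"router", "firewall"}  # toy entity catalog
print(categorize("The router logged a failure after the update.", catalog))
# -> ['Support Issue']
```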
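For contrast, a sketch of the training-set approach (the "Bayesian, vector space" option above), assuming scikit-learn is available; the four training documents and two categories are likewise invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training set: two documents per category.
train_docs = [
    "quarterly revenue and earnings guidance",
    "balance sheet and annual budget figures",
    "server outage and failed login errors",
    "password reset request from a customer",
]
train_labels = ["finance", "finance", "support", "support"]

# Vector-space representation (TF-IDF) feeding a naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_docs, train_labels)

print(model.predict(["customer cannot log in after a password reset"]))
# -> ['support']
```

Either approach works on toy data; the POC question, as the later slides argue, is the effort each one takes to reach acceptable accuracy on your content.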
Varieties of Taxonomy / Text Analytics Software
– Taxonomy management – Synaptica, SchemaLogic
– Full platform – SAS-Teragram, SAP-Inxight, ClearForest, Smart Logic, Data Harmony, Concept Searching, Expert System, IBM, GATE
– Content management – Nstein, Interwoven, Documentum, etc.
– Embedded in search – FAST, Autonomy, Endeca, Exalead, etc.
– Specialty – sentiment analysis – Lexalytics

Evaluation Process & Methodology
– Start with self-knowledge – think big, start small, scale fast
– Eliminate the unfit:
  – Filter one – ask the experts – reputation, research – Gartner, etc.
    – Market strength of vendor, platforms, etc.
    – Feature scorecard – minimum and must-have features; filter to the top 3-6
  – Filter two – technology filter – match to your overall scope and capabilities – a filter, not a focus
  – Filter three – in-depth demos – 3-6 vendors
    – Get beyond "Yes, we have that feature."
– Deep POC with the final 2 – advanced features, integration, semantics
– Focus on the working relationship with the vendor

Evaluation Process & Methodology: Initial Evaluation – Basic Requirements
– Platform – range of capabilities – categorization, sentiment analysis, etc.
– Technical – search evaluation plus integration
  – APIs, Java-based, Linux runtime
  – Scalability – millions of documents a day
  – Import-export – XML, RDF
– Usability
– Multiple language support

Evaluating Text Analytics Software: Initial Evaluation – Usability
– Ease of use – copy, paste, rename, merge, etc.
– User documentation – manuals, on-line help, training and tutorials
– Visualization – file structure, tree; hierarchical and alphabetical views
– Automatic taxonomy, node, and rule generation
  – Nonsense for a whole taxonomy
  – Useful at the node level – suggestions for sub-categories and rules
– Variety of node relationships – child-parent, related

Initial Evaluation: Example Outcomes
– Filter one:
  – Companies A and B – sentiment analysis focus, weak categorization
  – Company C – lacked a full suite of text analytics
  – Company D – business concerns, support
  – Open source – license issues
  – Ontology vendors – missing categorization capabilities
– Four demos – saw a variety of different approaches, but:
  – Company X – lacking sentiment analysis, would require 2 vendors
  – Company Y – lack of language support, development cost

Evaluating Taxonomy Software: Proof of Concept (POC)
– Quality of results is the essential factor
– 4-6 week POC – a bake-off or a short pilot
– Real-life scenarios, categorization with your content
– Preparation:
  – Preliminary analysis of content and users' information needs
  – Set up the software in a lab – relatively easy
  – Train taxonomist(s) on the software
  – Develop a taxonomy if none is available
– 4-6 weeks, 2-3 rounds of develop, test, refine – not out of the box
– Need SMEs as test evaluators – and to do an initial categorization of the content

Evaluating Taxonomy Software: POC
– The majority of the time goes to auto-categorization
– Balance uniformity of results against vendor-unique capabilities – this has to be determined at POC time
– Risks – getting the software installed and working, getting the right content, the initial categorization of content
– Elements:
  – Content
  – Search terms / search scenarios
  – Training sets
  – Test sets of content
– Taxonomy developers – expert consultants plus internal taxonomists

Evaluating Taxonomy Software: POC Feature Test Cases
– Auto-categorization against the existing taxonomy – variety of content
– Clustering – automatic node generation
– Summarization
– Entity extraction – build a number of catalogs – people, products, etc.
– Sentiment analysis – products and categorization rules
– Evaluate usability in action by taxonomists
– Question – integration with ontologies?
– Technical / integration – output in XML, APIs
– Map the above to client use cases

POC Design Discussion: Evaluation Criteria
– Basic test design – categorize a test set
  – Score by file name, with human testers
  – Accuracy level – target 80-90%
  – Effort level per accuracy level
– Sentiment analysis
  – Accuracy level – target 80-90%
  – Effort level per accuracy level
– Quantify development time – main elements
– Comparison of two vendors – how to score? (a scoring sketch follows below)
  – Combination of scores and a report
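A minimal sketch of the basic test design described above: each vendor's auto-categorization output is compared, file by file, against SME-assigned gold labels, and the resulting accuracy is checked against the 80-90% target. The file names, labels, and vendor runs are invented for illustration.

```python
gold = {  # SME-assigned categories for the test set, keyed by file name
    "doc01.pdf": "audit",
    "doc02.pdf": "budget",
    "doc03.pdf": "audit",
    "doc04.pdf": "budget",
}

vendor_runs = {  # each vendor's auto-categorization of the same files
    "Vendor X": {"doc01.pdf": "audit", "doc02.pdf": "audit",
                 "doc03.pdf": "audit", "doc04.pdf": "budget"},
    "Vendor Y": {"doc01.pdf": "audit", "doc02.pdf": "budget",
                 "doc03.pdf": "audit", "doc04.pdf": "budget"},
}

def accuracy(run, gold):
    """Fraction of test files where the vendor matched the SME label."""
    hits = sum(run.get(name) == label for name, label in gold.items())
    return hits / len(gold)

for vendor, run in vendor_runs.items():
    score = accuracy(run, gold)
    verdict = "meets 80% target" if score >= 0.80 else "below 80% target"
    print(f"{vendor}: {score:.0%} – {verdict}")
```

Accuracy alone does not decide the comparison; as the criteria above note, it has to be weighed against the effort level each vendor required to reach it.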
Evaluating Taxonomy Software: POC Issues
– Quality of the content
– Quality of the initial human categorization
– Normalizing among different test evaluators
– Quality of the taxonomists – experience with text analytics software and/or with the content, information needs, and behaviors
– Quality of the taxonomy
  – General issues – structure (too flat or too deep)
  – Overlapping categories
  – Differences in use – browse, index, categorize
  – Foundation for development

Text Analytics Evaluation Is Context Dependent – A Tale of Two POCs
– Taxonomy
  – GAO – flat, not very orthogonal, political concerns
    – Department-centric, subject-oriented
  – Company A – flat, two taxonomies, technical – open to change?
    – Action-oriented – activities and events
– Content
  – GAO – giant 200-page formal PDF documents, variety
    – Start_200, Title
  – Company A – short, cryptic customer-support notes and social media
    – Creative spelling, a combination of formal and individual styles
– Culture / applications
  – GAO – formal federal – big applications, infrastructure
  – Company A – informal, technical, lots of small apps

Text Analytics Evaluation – Conclusions
– Text analytics with search and ECM can finally deliver on the promise
– Search / text analytics is not like other software – it is about meaning
– The only way to deal with meaning is in context – your context
  – Technical issues are filters, not features
– Enterprise text analytics as a platform is best
  – If you have to start with one application, plan for the platform
– A POC seems expensive, but it is cheaper than a dead-end choice
  – It is also the foundation of, and a head start on, development

Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group – Knowledge Architecture Professional Services
http://www.kapsgroup.com