Text Analytics Workshop

Tom Reamy
Chief Knowledge Architect
KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
 Introduction – State of Text Analytics
– Text Analytics Features
– Information / Knowledge Environment – Taxonomy, Metadata, Information Technology
– Value of Text Analytics
– Quick Start for Text Analytics
 Development – Taxonomy, Categorization, Faceted Metadata
 Text Analytics Applications
– Integration with Search and ECM
– Platform for Information Applications
 Questions / Discussions
2
Introduction: KAPS Group
 Knowledge Architecture Professional Services – Network of Consultants
 Applied Theory – Faceted taxonomies, complexity theory, natural
categories, emotion taxonomies
 Services:
– Strategy – IM & KM - Text Analytics, Social Media, Integration
– Taxonomy/Text Analytics development, consulting, customization
– Text Analytics Quick Start – Audit, Evaluation, Pilot
– Social Media: Text based applications – design & development
 Partners – Smart Logic, Expert Systems, SAS, SAP, IBM, FAST,
Concept Searching, Attensity, Clarabridge, Lexalytics
 Clients:
– Genentech, Novartis, Northwestern Mutual Life, Financial Times,
Hyatt, Home Depot, Harvard Business Library, British Parliament,
Battelle, Amdocs, FDA, GAO, World Bank, etc.
 Presentations, Articles, White Papers – www.kapsgroup.com
3
Text Analytics Workshop
Introduction: Text Analytics
 History – academic research, focus on NLP
 Inxight – out of Xerox PARC
– Moved TA from academic research and NLP to auto-categorization, entity extraction, and search metadata
 Explosion of companies – many based on Inxight extraction with
some analytical-visualization front ends
– Half of the companies from 2008 are gone – the lucky ones got bought
 Focus on enterprise text analytics – shift to sentiment analysis: easier to do, obvious payoff (customers, not employees)
– Backlash – real business value?
 Enterprise search down – 10 years of effort for what?
– Need Text Analytics to work
 Text Analytics is slowly growing – time for a jump?
4
Text Analytics Workshop
Current State of Text Analytics
 Current Market: 2012 – exceeds $1 billion for text analytics (10% of total analytics)
 Growing 20% a year
 Search is 33% of total market
 Other major areas:
– Sentiment and Social Media Analysis, Customer Intelligence
– Business Intelligence, range of text-based applications
 Fragmented marketplace – full platform, low level, specialty
– Embedded in content management and search; no clear leader
 Big Data – Big Text is bigger; text into data, data for text
– Watson – ensemble methods, pun module
5
Text Analytics Workshop
Current State of Text Analytics: Vendor Space
 Taxonomy Management – SchemaLogic, Pool Party
 From Taxonomy to Text Analytics
– Data Harmony, MultiTes
 Extraction and Analytics
– Linguamatics (Pharma), Temis, whole range of companies
 Business Intelligence – Clear Forest, Inxight
 Sentiment Analysis – Attensity, Lexalytics, Clarabridge
 Open Source – GATE
 Stand-alone text analytics platforms – IBM, SAS, SAP, Smart
Logic, Expert System, Basis, Open Text, Megaputer, Temis,
Concept Searching
 Embedded in Content Management, Search
– Autonomy, FAST, Endeca, Exalead, etc.
6
Future Directions: Survey Results
 Important Areas:
– Predictive Analytics & text mining – 90%
– Search & Search-based Apps – 86%
– Business Intelligence – 84%
– Voice of the Customer – 82%, Social Media – 75%
– Decision Support, KM – 81%
– Big Data – other – 70%, Finance – 61%
– Call Center, Tech Support – 63%
– Risk, Compliance, Governance – 61%
– Security, Fraud Detection – 54%
7
Future Directions: Survey Results
 28% just getting started, 11% not yet
 What factors are holding back adoption of TA?
– Lack of clarity about value of TA – 23.4%
– Lack of knowledge about TA – 17.0%
– Lack of senior management buy-in – 8.5%
– Don't believe TA has enough business value – 6.4%
 Other factors
– Financial Constraints – 14.9%
– Other priorities more important – 12.8%
 Lack of articulated strategic vision – by vendors, consultants,
advocates, etc.
8
Introduction: Future Directions
What is Text Analytics Good For?
9
Text Analytics Workshop
What is Text Analytics?
 Text Mining – NLP, statistical, predictive, machine learning
 Semantic Technology – ontology, fact extraction
 Extraction – entities – known and unknown, concepts, events
– Catalogs with variants, rule-based
 Sentiment Analysis – Objects / Products and phrases
– Statistics, catalogs, rules – positive and negative
 Auto-categorization
– Training sets, terms, semantic networks
– Rules: Boolean – AND, OR, NOT
– Advanced – DIST(#), ORDDIST(#), PARAGRAPH, SENTENCE
– Disambiguation – identification of objects, events, context
– Build rule-based categorization, not simply a bag of individual words (see the sketch below)
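
To make the rule operators above concrete, here is a minimal Python sketch of a Boolean/proximity categorization rule. The DIST-style proximity check follows the operators listed on this slide, but the tokenizer, the terms, and the "marine toxins" example rule are illustrative assumptions, not any vendor's syntax.

    import re

    def tokens(text):
        # crude tokenizer for illustration only
        return re.findall(r"[a-z']+", text.lower())

    def dist(text, term_a, term_b, max_gap):
        # DIST-style operator: True if the two terms occur within max_gap tokens
        toks = tokens(text)
        pos_a = [i for i, t in enumerate(toks) if t == term_a]
        pos_b = [i for i, t in enumerate(toks) if t == term_b]
        return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)

    def is_marine_toxins(text):
        # hypothetical rule: (AND, (OR, DIST_7(toxin, marine), DIST_7(toxin, ocean)), (NOT, advertisement))
        near = dist(text, "toxin", "marine", 7) or dist(text, "toxin", "ocean", 7)
        return near and "advertisement" not in tokens(text)

    print(is_marine_toxins("New study of marine algal toxin levels in shellfish"))  # True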
10
Case Study – Categorization & Sentiment (demo screenshots)
Case Study – Taxonomy Development (demo screenshots)
Text Analytics Workshop: Information Environment
Building an Infrastructure
 Semantic Layer = Taxonomies, Metadata, Vocabularies + Text
Analytics – adding cognitive science, structure to unstructured
 Modeling users/audiences
 Technology Layer
– Search, Content Management, SharePoint, Intranets
 Publishing process, multiple users & info needs
– SharePoint – supports taxonomies, but
• Folksonomies – still a bad idea
 Infrastructure – Not an Application
– Business / Library / KM / EA – not IT
 Building on the Foundation
– Info Apps (Search-based Applications)
 Foundation of foundation – Text Analytics
21
Text Analytics Workshop: Information Environment
TA & Taxonomy: Complementary Information Platform
 Taxonomy provides a consistent and common vocabulary
– Enterprise resource – integrated not centralized
 Text Analytics provides consistent tagging
– Human indexing is subject to inter- and intra-individual variation
 Taxonomy provides the basic structure for categorization
– And candidate terms
 Text Analytics provides the power to apply the taxonomy
– And metadata of all kinds
 Text Analytics and Taxonomy Together – Platform
– Consistent in every dimension
– Powerful and economic
22
Text Analytics Workshop: Information Environment
Metadata - Tagging
 How do you bridge the gap – taxonomy to documents?
 Tagging documents with taxonomy nodes is tough
– And expensive – central or distributed
 Library staff – experts in categorization, not subject matter
– Too limited, narrow bottleneck
– Often don't understand business processes and business uses
 Authors – experts in the subject matter, terrible at categorization
– Intra- and inter-individual inconsistency, "intertwingleness"
– Choosing tags from taxonomy – complex task
– Folksonomy – almost as complex, wildly inconsistent
– Resistance – not their job, cognitively difficult = non-compliance
 Text Analytics is the answer(s)!
23
Text Analytics Workshop: Information Environment
Mind the Gap – Manual-Automatic-Hybrid
 All require human effort – issue of where and how effective
 Manual – human effort is tagging (difficult, inconsistent)
– Small, high-value document collections, trained taggers
 Automatic – human effort is prior to tagging – auto-categorization rules and/or NLP algorithm effort
 Hybrid Model – before (like automatic) and after
– Build on expertise – librarians on categorization, SMEs on subject terms
 Facets – require a lot of metadata – entity extraction feeds facets – more automatic, feedback by design
 Manual – Hybrid – Automatic is a spectrum – depends on context
24
Text Analytics Workshop
Benefits of Text Analytics
 Why Text Analytics?
– Enterprise search has failed to live up to its potential
– Enterprise content management has failed to live up to its potential
– Taxonomy has failed to live up to its potential
– Adding metadata, especially keywords, has not worked
 What is missing?
– Intelligence – human-level categorization, conceptualization
– Infrastructure – integrated solutions, not just technology and software
 Text Analytics can be the foundation that (finally) drives success
– search, content management, and much more
25
Text Analytics Workshop
Costs and Benefits
 IDC study – quantify cost of bad search
 Three areas:
– Time spent searching
– Recreation of documents
– Bad decisions / poor quality work
 Costs
– 50% of search time is bad search = $2,500 per person per year
– Recreation of documents = $5,000 per person per year
– Bad quality (harder to measure) = $15,000 per person per year
 Per 1,000 people = $22.5 million a year
– 30% improvement = $6.75 million a year (see the arithmetic below)
– Add your own stories – especially the cost of bad information
– Human measures – number of FTEs, savings passed on to customers, etc.
26
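
The figures above reduce to simple arithmetic; here is a quick back-of-the-envelope version in Python, using only the numbers from the slide (illustration only).

    PER_PERSON = {
        "bad search time": 2_500,                    # $/person/year
        "recreating documents": 5_000,
        "poor quality work / bad decisions": 15_000,
    }
    PEOPLE = 1_000

    total = sum(PER_PERSON.values()) * PEOPLE        # $22,500,000 a year
    savings = 0.30 * total                           # $6,750,000 a year at 30% improvement
    print(f"Total cost: ${total:,.0f}  Savings at 30%: ${savings:,.0f}")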
Text Analytics Workshop
Need for a Quick Start
 Text Analytics is weird, a bit academic, and not very practical
• It involves language and thinking and really messy stuff
 On the other hand, it is really difficult to do right (Rocket Science)
 Organizations don’t know what text analytics is and what it is for
 TAW Survey shows the need for two things:
• Strategic vision of text analytics in the enterprise
• Business value, problems solved, information overload
• Text Analytics as platform for information access
• Real life functioning program showing value and demonstrating
an understanding of what it is and does
 Quick Start – Strategic Vision – Software Evaluation – POC / Pilot
27
Text Analytics Workshop
Text Analytics Vision & Strategy
 Strategic Questions – why, what value from text analytics, how are you going to use it?
– Platform or Applications?
 What are the basic capabilities of Text Analytics?
 What can Text Analytics do for Search?
– After 10 years of failure – get search to work?
 What can you do with smart search-based applications?
– RM, PII, Social
 ROI for effective search – difficulty of believing
– Problems with metadata, taxonomy
28
Text Analytics Workshop
Quick Start Step One- Knowledge Audit
 Ideas – Content and Content Structure
– Map of Content – tribal language silos
– Structure – articulate and integrate
– Taxonomic resources
 People – Producers & Consumers
– Communities, Users, Central Team
 Activities – Business processes and procedures
– Semantics, information needs and behaviors
– Information Governance Policy
 Technology
– CMS, Search, portals, text analytics
– Applications – BI, CI, Semantic Web, Text Mining
29
Text Analytics Workshop
Quick Start Step One- Knowledge Audit
 Info Problems – what, how severe
 Formal Process – Knowledge Audit
– Contextual & information interviews, content analysis, surveys, focus groups, ethnographic studies, text mining
 Informal for smaller organizations, specific application
 Category modeling – Cognitive Science – how people think
– Panda, Monkey, Banana
 Natural-level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness
 Strategic Vision – Text Analytics and Information/Knowledge
Environment
30
Quick Start Step Two - Software Evaluation
Varieties of Taxonomy/ Text Analytics Software
 Software is more important to text analytics
– No spreadsheets for semantics
 Taxonomy Management – extraction
 Full Platform
– SAS, SAP, Smart Logic, Concept Searching, Expert System, IBM, Linguamatics, GATE
 Embedded – Search or Content Management
– FAST, Autonomy, Endeca, Vivisimo, NLP, etc.
– Interwoven, Documentum, etc.
 Specialty / Ontology (other semantic)
– Sentiment Analysis – Attensity, Lexalytics, Clarabridge, and lots of others
– Ontology – extraction, plus ontology
31
Quick Start Step Two - Software Evaluation
A Different Kind of Software Evaluation
 Traditional Software Evaluation – Start
– Filter One – Ask Experts – reputation, research – Gartner, etc.
• Market strength of vendor, platforms, etc.
• Feature scorecard – minimum, must have, filter to top 6
– Filter Two – Technology Filter – match to your overall scope and capabilities – a filter, not a focus
– Filter Three – In-Depth Demo – 3-6 vendors
 Reduce to 1-3 vendors
 Vendors have different strengths in multiple environments
– Millions of short, badly typed documents, build application
– Library of 200-page PDFs, enterprise & public search
32
Quick Start Step Two - Software Evaluation
Design of the Text Analytics Selection Team
 IT – experience with software purchases, needs assessment, budget
– Search/categorization is unlike other software – needs a deeper look
 Business – understands the business, focus on business value
– They can get executive sponsorship, support, and budget
– But don't understand information behavior, semantic focus
 Library, KM – understand information structure
– Experts in search experience and categorization
– But don't understand business or technology
 Interdisciplinary Team, headed by Information Professionals
 Much more likely to make a good decision
 Create the foundation for implementation
33
Quick Start Step Three –
Proof of Concept / Pilot Project
 POC use cases – basic features needed for initial projects
 Design - Real life scenarios, categorization with your content
 Preparation:
– Preliminary analysis of content and users information needs
• Training & test sets of content, search terms & scenarios
– Train taxonomist(s) on software(s)
– Develop taxonomy if none available
 Four-week POC – 2 rounds of develop, test, refine / not out-of-the-box
 Need SMEs as test evaluators – also to do an initial categorization of content
 Majority of time is on auto-categorization
34
Text Analytics Workshop
POC Design: Evaluation Criteria & Issues
 Basic Test Design – categorize test set
– Score – by file name, human testers
 Categorization & Sentiment – Accuracy 80-90%
– Effort Level per accuracy level
 Combination of scores and report
 Operators (DIST, etc.), relevancy scores, markup
 Development Environment – Usability, Integration
 Issues:
– Quality of content & initial human categorization
– Normalize among different test evaluators
– Quality of taxonomy – structure, overlapping categories
35
Quick Start for Text Analytics
Proof of Concept -- Value of POC
 Selection of best product(s)
 Identification and development of infrastructure elements –
taxonomies, metadata – standards and publishing process
 Training by doing – SMEs learning categorization, library/taxonomists learning business language
 Understand effort level for categorization, application
 Test suitability of existing taxonomies for range of applications
 Explore application issues – example – how accurate does
categorization need to be for that application – 80-90%
 Develop resources – categorization taxonomies, entity extraction
catalogs/rules
36
Text Analytics Workshop
POC and Early Development: Risks and Issues
 CTO Problem – this is not a regular software process
 Semantics is messy, not just complex
– 30% accuracy isn't 30% done – it could be 90%
 Variability of human categorization
 Categorization is iterative, not "the program works"
– Need realistic budget and flexible project plan
 Anyone can do categorization
– Librarians often overdo it, SMEs often get lost (keywords)
 Meta-language issues – understanding the results
– Need to educate IT and business in their language
37
Development
38
Text Analytics Development: Categorization Process
Start with Taxonomy and Content
 Starter Taxonomy
– If no taxonomy, develop (steal) an initial high-level one
• Textbooks, glossaries, Intranet structure
• Organization Structure – facets, not taxonomy
 Analysis of taxonomy – suitable for categorization?
– Structure – not too flat, not too large
– Orthogonal categories
 Content Selection
– Map of all anticipated content
– Selection of training sets – if possible
– Automated selection of training sets – taxonomy nodes as first categorization rules – apply and get content
39
Text Analytics Workshop
Text Analytics Development: Categorization Process
 First Round of Categorization Rules
 Term building – from content – a basic set of terms that appear often / are important to the content
 Add terms to a rule, apply to a broader set of content
 Repeat for more terms – get recall-precision "scores" (see the scoring sketch below)
 Repeat, refine, repeat, refine, repeat
 Get SME feedback – formal process – scoring
 Get SME feedback – human judgments
 Test against more, new content
 Repeat until "done" – 90%?
40
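
A minimal sketch of the recall-precision scoring step, comparing a rule's hits against SME judgments for one category; the document IDs and sets are invented for illustration. Each develop-test-refine round re-runs this kind of scoring on new content.

    def precision_recall(predicted, actual):
        # predicted: docs the categorization rule tagged; actual: docs SMEs say belong
        true_positives = len(predicted & actual)
        precision = true_positives / len(predicted) if predicted else 0.0
        recall = true_positives / len(actual) if actual else 0.0
        return precision, recall

    predicted = {"doc01", "doc02", "doc05", "doc09"}
    actual = {"doc01", "doc02", "doc03", "doc09"}
    p, r = precision_recall(predicted, actual)
    print(f"precision={p:.2f} recall={r:.2f}")   # precision=0.75 recall=0.75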
Text Analytics Workshop
Text Analytics Development: Entity Extraction Process
 Facet Design – from Knowledge Audit, K Map
 Find and Convert catalogs (see the extraction sketch below):
– Organization – internal resources
– People – corporate yellow pages, HR
– Include variants
– Scripts to convert catalogs – programming resource
 Build initial rules – follow categorization process
– Differences – scale, threshold – application dependent
– Recall – Precision – balance set by application
– Issue – disambiguation – Ford the company, the person, the car
41
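
A minimal sketch of catalog-driven entity extraction with variants, assuming a tiny invented catalog; real catalogs would come from the internal sources above, and disambiguation (Ford the company vs. the person) would lean on the surrounding categorization.

    import re

    CATALOG = {
        "organization": {"world bank": "World Bank", "kaps group": "KAPS Group"},
        "person": {"henry ford": "Henry Ford", "mr. ford": "Henry Ford"},  # variants -> canonical form
    }

    def extract_entities(text):
        low = text.lower()
        found = []
        for facet, variants in CATALOG.items():
            for variant, canonical in variants.items():
                if re.search(r"\b" + re.escape(variant) + r"\b", low):
                    found.append((facet, canonical))
        return found

    print(extract_entities("Henry Ford met analysts from the World Bank."))
    # [('organization', 'World Bank'), ('person', 'Henry Ford')]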
Text Analytics Workshop
Text Analytics Development: Demo
 BioPharma – scientific vocabulary / articles
42
Text Analytics Workshop
Case Study - Background
 Inxight Smart Discovery
 Multiple Taxonomies
– Healthcare – first target
– Travel, Media, Education, Business, Consumer Goods
 Content – 800+ Internet news sources
– 5,000 stories a day
 Application – Newsletters
– Editors using categorized results
– Easier than full automation
43
Text Analytics Workshop
Case Study - Approach
 Initial High Level Taxonomy
– Auto-generation – very strange – not usable
– Editors' high level – sections of newsletters
– Editors & taxonomy pros – broad categories & refine
 Develop Categorization Rules
– Multiple test collections
– Good stories, bad stories – close misses – terms
 Recall and Precision Cycles
– Refine and test – taxonomists – many rounds
– Review – editors – 2-3 rounds
 Repeat – about 4 weeks
44
Text Analytics Workshop
Case Study – Issues & Lessons
 Taxonomy Structure: Aggregate vs. independent nodes
– Child nodes as a subset – rare
 Trade-off of depth of taxonomy and complexity of rules
 No best answer – taxonomy structure, format of rules
– Need custom development
– Recall more important than precision – editors' role
 Combination of SME and taxonomy pros
– Combination of features – entity extraction, terms, Boolean, filters, facts
 Training sets and "find similar" are the weakest features
 Plan for ongoing refinement
48
Text Analytics Workshop
Enterprise Environment – Case Studies
 A Tale of Two Taxonomies
– It was the best of times, it was the worst of times
 Basic Approach
– Initial meetings – project planning
– High-level K map – content, people, technology
– Contextual and Information Interviews
– Content Analysis
– Draft Taxonomy – validation interviews, refine
– Integration and Governance Plans
49
Text Analytics Workshop
Enterprise Environment – Case One – Taxonomy, 7 facets
 Taxonomy of Subjects / Disciplines:
– Science > Marine Science > Marine microbiology > Marine toxins
 Facets:
– Organization > Division > Group
– Clients > Federal > EPA
– Facilities > Division > Location > Building X
– Content Type – Knowledge Asset > Proposals
– Instruments > Environmental Testing > Ocean Analysis > Vehicle
– Methods > Social > Population Study
– Materials > Compounds > Chemicals
50
Text Analytics Workshop
Enterprise Environment – Case One – Taxonomy, 7 facets
 Project Owner – KM department – included RM, business
process
 Involvement of library - critical
 Realistic budget, flexible project plan
 Successful interviews – build on context
– Overall information strategy – where taxonomy fits
 Good draft taxonomy and extended refinement
– Software, process, team – train library staff
– Good selection and number of facets
 Developed broad categorization and one deep area – Chemistry
 Final plans and hand off to client
51
Text Analytics Workshop
Enterprise Environment – Case Two – Taxonomy, 4 facets
 Taxonomy of Subjects / Disciplines:
– Geology > Petrology
 Facets:
– Organization > Division > Group
– Process > Drill a Well > File Test Plan
– Assets > Platforms > Platform A
– Content Type > Communication > Presentations
52
Enterprise Environment – Case Two – Taxonomy, 4 facets
Environment & Project Issues
 Value of taxonomy understood, but not the complexity and scope
– Under-budgeted, under-staffed
 Location – not KM – tied to RM and software
– Solution looking for the right problem
 Importance of an internal library staff
– Difficulty of merging internal expertise and taxonomy
 Project mind set – not infrastructure
– Rushing to meet deadlines doesn’t work with semantics
 Importance of integration – with team, company
– Project plan more important than results
53
Enterprise Environment – Case Two – Taxonomy, 4 facets
Research and Design Issues
 Research Issues
– Not enough research – and wrong people
– Misunderstanding of research – wanted tinker toy connections
• Interview 1 leads to taxonomy node 2
 Design Issues
– Not enough facets
– Wrong set of facets – business not information
– Ill-defined facets – too complex internal structure
54
Enterprise Environment – Case Two – Taxonomy, 4 facets
Conclusion: Risk Factors
 Political-Cultural-Semantic Environment
– Not simple resistance – more subtle
• Re-interpretation of specific conclusions and the sequence of conclusions / relative importance of specific recommendations
 Access to content and people
– Enthusiastic access
 Importance of a unified project team
– Working communication as well as weekly meetings
55
Applications
56
Text Analytics Workshop
Building on the Foundation
 Text Analytics: Create the Platform – CM & Search
– New Electronic Publishing Process
• Use text analytics to tag, new hybrid workflow
– New Enterprise Search
• Build faceted navigation on metadata, extraction
 Enhance Information Access in the Enterprise – InfoApps
– Governance, Records Management, Doc duplication, Compliance
– Applications – Business Intelligence, CI, Behavior Prediction
– eDiscovery, litigation support, Fraud detection
– Productivity / Portals – spider and categorize, extract
57
Text Analytics Workshop
Information Platform: Content Management
 Hybrid Model – Internal Content Management (see the workflow sketch below)
– Publish document -> text analytics analysis -> suggestions for categorization, entities, metadata -> present to author
– Cognitive task is simple -> react to a suggestion instead of selecting from your head or a complex taxonomy
– Feedback – if the author overrides -> suggestion for a new category
 External Information - human effort is prior to tagging
– More automated, human input as specialized process –
periodic evaluations
– Precision usually more important
– Target usually more general
58
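
A minimal sketch of the hybrid publish workflow just described: the engine suggests tags, the author reacts, and overrides are logged as feedback for refining rules. The function and parameter names here are hypothetical placeholders, not a product API.

    def publish(document_text, suggest_tags, ask_author, log_override):
        # suggest_tags: auto-categorization step; ask_author: simple accept/reject prompt
        suggestions = suggest_tags(document_text)
        final_tags = []
        for tag in suggestions:
            if ask_author(tag):
                final_tags.append(tag)
            else:
                log_override(document_text, tag)   # feeds review of rules / new categories
        return final_tags

    tags = publish(
        "Quarterly marine toxin study for the EPA ...",
        suggest_tags=lambda text: ["Marine toxins", "Proposals"],
        ask_author=lambda tag: tag != "Proposals",
        log_override=lambda text, tag: print("author overrode:", tag),
    )
    print(tags)   # ['Marine toxins']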
Text Analytics and Search
Multi-dimensional and Smart
 Faceted Navigation has become the basic norm (see the facet sketch below)
– Facets require huge amounts of metadata
– Entity / noun phrase extraction is fundamental
– Automated with disambiguation (through categorization)
 Taxonomy – two roles – subject/topics and facet structure
– Complex facets and faceted taxonomies
 Clusters and Tag Clouds – discovery & exploration
 Auto-categorization – aboutness, subject facets
– This is still fundamental to the search experience
– InfoApps are only as good as the fundamentals of search
 People – tagging, evaluating tags, fine-tuning rules and taxonomy
59
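
A minimal sketch of faceted filtering over extracted metadata, with invented documents and facet values; the point is only the mechanics of counts and filters that the extracted entities feed.

    DOCS = [
        {"id": 1, "content_type": "Proposal", "organization": "EPA", "topic": "Marine toxins"},
        {"id": 2, "content_type": "Presentation", "organization": "EPA", "topic": "Petrology"},
        {"id": 3, "content_type": "Proposal", "organization": "FDA", "topic": "Marine toxins"},
    ]

    def facet_counts(docs, facet):
        # counts per facet value, as shown alongside each facet in the UI
        counts = {}
        for doc in docs:
            counts[doc[facet]] = counts.get(doc[facet], 0) + 1
        return counts

    def filter_docs(docs, **selections):
        # keep documents matching every selected facet value
        return [d for d in docs if all(d.get(k) == v for k, v in selections.items())]

    print(facet_counts(DOCS, "content_type"))                            # {'Proposal': 2, 'Presentation': 1}
    print(filter_docs(DOCS, topic="Marine toxins", organization="EPA"))  # doc 1 only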
Integrated Facet Application
Design Issues - General
 What is the right combination of elements?
– Dominant dimension or equal facets
– Browse topics and filter by facet, search box
– How many facets do you need?
 Scale requires more automated solutions
– More sophisticated rules
 Issue of disambiguation:
– Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
– Same word, different entity – Ford and Ford
 Number of entities and thresholds per results set / document
– Usability, audience needs
 Relevance Ranking – number of entities, rank of facets
62
Text Analytics Workshop : Applications
Text and Data: Two Way Street
 New types of applications
– New ways to make sense of data, enrich data
 Harvard – Analyzing Text as Data
– Detecting deception, Frame Analysis
 Narrative Science – take data (baseball statistics, financial data)
and turn into a story
 Political campaigns using Big Data, social media, and text
analytics
 Watson for healthcare – help doctors keep up with massive
information overload
63
Text Analytics Workshop : Applications
Social Media: Beyond Simple Sentiment
 Beyond Good and Evil (positive and negative)
– Social Media analysis is approaching the next stage (growing up)
– Where is the value? How to get better results?
 Importance of Context – around positive and negative words
– Rhetorical reversals – "I was expecting to love it"
– Issues of sarcasm ("Really Great Product"), slanguage
 Granularity of Application
– Early categorization – Politics or Sports
– Limited value of Positive and Negative
– Degrees of intensity, complexity of emotions and documents
 Addition of focus on behaviors – why someone calls a support center – and likely outcomes
64
Text Analytics Workshop : Applications
Social Media: Beyond Simple Sentiment
 Two basic approaches [limited accuracy, depth] – contrasted in the sketch below
– Statistical signature of a bag of words
– Dictionary of positive & negative words
 Essential – need full categorization and concept extraction
 New Taxonomies – Appraisal Groups – adjectives and modifiers – "not very good"
– Supports more subtle distinctions than positive or negative
 Emotion taxonomies – Joy, Sadness, Fear, Anger, Surprise, Disgust
– New complex emotions – pride, shame, confusion, skepticism
65
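
A minimal sketch contrasting a bare positive/negative word list with a simple check for the rhetorical reversals mentioned above; the lexicons and reversal phrases are tiny illustrative samples, not production resources.

    POSITIVE = {"love", "great", "good"}
    NEGATIVE = {"hate", "terrible", "bad"}
    REVERSALS = ("was expecting to", "i hoped to", "was supposed to")

    def naive_sentiment(text):
        # dictionary-of-words approach: count positive minus negative words
        words = text.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    def sentiment_with_reversal(text):
        # same score, but flip it negative when an expectation phrase appears
        score = naive_sentiment(text)
        if any(phrase in text.lower() for phrase in REVERSALS):
            score = -abs(score)   # "I was expecting to love it" reads as negative
        return score

    print(naive_sentiment("I was expecting to love it"))          #  1 (wrong direction)
    print(sentiment_with_reversal("I was expecting to love it"))  # -1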
Text Analytics Workshop: Applications
Expertise Analysis
 Expertise Analysis
– Experts think & write differently – process, chunks
 Expertise Characterization for individuals, communities, documents, and
sets of documents
 Applications:
– Business & Customer intelligence, Voice of the Customer
– Deeper understanding of communities, customers – better models
– Security, threat detection – behavior prediction, Are they experts?
– Expertise location – generate automatic expertise characterizations
 Crowd Sourcing – technical support to wikis
 Political – conservative and liberal minds/texts
– Disgust, shame, cooperation, openness
66
Text Analytics Workshop: Applications
Behavior Prediction – Telecom Customer Service
 Problem – distinguish customers likely to cancel from mere threats
 Basic Rule (re-expressed in the sketch following this slide)
– (START_20, (AND, (DIST_7, "[cancel]", "[cancel-what-cust]"),
(NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
 Examples:
– customer called to say he will cancell his account if the does not stop receiving
a call from the ad agency.
– cci and is upset that he has the asl charge and wants it off or her is going to
cancel his act
 More sophisticated analysis of text and context in text
 Combine text analytics with Predictive Analytics and traditional behavior
monitoring for new applications
67
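
An illustrative re-expression of the cancellation rule above in plain Python; the term lists standing in for [cancel], [cancel-what-cust], and the softening concepts are invented, and START_20 is approximated by looking only at the first 20 tokens.

    import re

    CANCEL = {"cancel", "cancell", "cancelling", "disconnect"}
    CANCEL_WHAT = {"account", "act", "service", "line"}
    SOFTENERS = {"if", "unless", "restore"}    # stands in for [if], [restore], [one-line]

    def likely_to_cancel(note, window=20):
        toks = re.findall(r"[a-z']+", note.lower())[:window]   # rough START_20
        cancel_pos = [i for i, t in enumerate(toks) if t in CANCEL]
        what_pos = [i for i, t in enumerate(toks) if t in CANCEL_WHAT]
        soft_pos = [i for i, t in enumerate(toks) if t in SOFTENERS]
        near_what = any(abs(c - w) <= 7 for c in cancel_pos for w in what_pos)    # DIST_7
        softened = any(abs(c - s) <= 10 for c in cancel_pos for s in soft_pos)    # NOT DIST_10
        return near_what and not softened

    print(likely_to_cancel("customer very upset, says he will cancel his account today"))  # True
    print(likely_to_cancel("will cancel his account if the charge is not removed"))        # False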
Text Analytics Workshop: Applications
Variety of New Applications
 Essay Evaluation Software – apply to expertise characterization
– Avoid gaming the system – multi-syllabic nonsense
• Model levels of chunking, procedure words over content
 Legal Review
– Significant trend – computer-assisted review (manual review = too many documents)
– TA – categorize and filter to a smaller, more relevant set
– Payoff is big – one firm with 1.6M docs saved $2M
 Financial Services
– Trend – using text analytics with predictive analytics – risk and fraud
– Combine unstructured text (why) and structured transaction data (what)
– Customer Relationship Management, Fraud Detection
– Stock Market Prediction – Twitter, impact articles
68
Text Analytics Workshop: Applications
Pronoun Analysis: Fraud Detection; Enron Emails
 Patterns of “Function” words reveal wide range of insights
 Function words = pronouns, articles, prepositions, conjunctions, etc.
– Used at a high rate, short and hard to detect, very social, processed
in the brain differently than content words
 Areas: sex, age, power-status, personality – individuals and groups
 Lying / Fraud detection: Documents with lies have
– Fewer and shorter words, fewer conjunctions, more positive emotion
words
– More use of “if, any, those, he, she, they, you”, less “I”
– More social and causal words, more discrepancy words
 Current research – 76% accuracy in some contexts
 Text Analytics can improve accuracy and utilize new sources
 Data analytics (standard AML) can improve accuracy
69
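
A minimal sketch of measuring function-word rate, the kind of signal the deception research above relies on; the word list is a small illustrative sample, not a validated lexicon.

    import re

    FUNCTION_WORDS = {"i", "you", "he", "she", "they", "if", "any", "those",
                      "the", "a", "an", "and", "but", "or", "of", "to", "in"}

    def function_word_rate(text):
        # share of tokens that are function words (pronouns, articles, etc.)
        toks = re.findall(r"[a-z']+", text.lower())
        if not toks:
            return 0.0
        return sum(t in FUNCTION_WORDS for t in toks) / len(toks)

    print(round(function_word_rate("If any of those reports are late, they should call you."), 2))  # ~0.55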
Text Analytics Workshop
Conclusions
 Text Analytics and Taxonomy are partners – enrich each other
 Text Analytics can mind the gap – between taxonomies and
documents
 Text Analytics needs a strategic vision and a quick start
– Need to approach it as a platform – deep context – understand the information environment
 Text Analytics is a platform for a huge range of applications:
– Search and Content Management and basic productivity apps
– New kinds of applications – social, data, Info Apps of all kinds
 Want to learn more – come to Text Analytics World in Boston in the fall!
– Early Bird Registration – www.textanalyticsworld.com
70
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Resources
 Books
– Women, Fire, and Dangerous Things
• George Lakoff
– Knowledge, Concepts, and Categories
• Koen Lamberts and David Shanks
– Formal Approaches in Categorization
• Ed. Emmanuel Pothos and Andy Wills
– The Mind
• Ed. John Brockman
• Good introduction to a variety of cognitive science theories, issues, and new ideas
– Any cognitive science book written after 2009
72
Resources
 Conferences – Web Sites
– Text Analytics World – all aspects of text analytics
• Oct 2-3, Boston
• http://www.textanalyticsworld.com
– Text Analytics Summit
• http://www.textanalyticsnews.com
– Semtech
• http://www.semanticweb.com
73
Resources
 Blogs
– SAS – http://blogs.sas.com/text-mining/
 Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– LinkedIn – Text Analytics Summit Group: http://www.LinkedIn.com
– Whitepaper – CM and Text Analytics: http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
– Whitepaper – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com
74
Resources
 Articles
– Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148.
– Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56.
– Shaver, P., J. Schwartz, D. Kirson, C. O'Connor 1987. Emotion knowledge: further explorations of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086.
– Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82.
75
Resources
 LinkedIn Groups:
– Text Analytics World
– Text Analytics Group
– Data and Text Professionals
– Sentiment Analysis
– Metadata Management
– Semantic Technologies
 Journals
– Academic – Cognitive Science, Linguistics, NLP
– Applied – Scientific American Mind, New Scientist
76