Taxonomy Development Workshop

Expertise Analysis
Sentiment Plus
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
 Introduction – Context
– Sentiment Analysis – Second Generation
– Categorization and Category Theory
 Basic Level Categories
– Features and Issues
 Basic Level Categories and Expertise
– Experts prefer lower levels
– Categorization of Expertise
 Applications
– Integration with Text Mining, Search, and ECM
– Platform for Information Applications
KAPS Group: General
 Knowledge Architecture Professional Services
 Virtual Company: Network of consultants – 8-10
 Partners – SAS, Smart Logic, Microsoft-FAST, Concept Searching, etc.
 Consulting, Strategy, Knowledge architecture audit
 Services:
– Text Analytics/Taxonomy development, consulting, customization
– Technology Consulting – Search, CMS, Portals, etc.
– Evaluation of Enterprise Search, Text Analytics
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Introduction – Sentiment Analysis
Sentiment & Categorization – Second Generation
 Emphasis on context around positive and negative words
– Issue of sarcasm, slanguage – “Really great product”
– Rules – not just statistical and terms
 Beyond Good and Evil (positive and negative)
– Taxonomy of Objects and Features to taxonomy of emotions
– Addition of focus on behaviors – why someone calls a support center – and likely outcomes
 Social Media Knowledge Base
– Wisdom of crowds, crowd-sourcing
Introduction – Sentiment Analysis
Sentiment & Categorization
 Essential – need full categorization and concept extraction to do sentiment analysis well
 Sentiment Analysis to Expertise Analysis
– Sentiment software plus cognitive science
– Develop expertise categorization rules
 Categorization
– Most basic to human cognition
– Most difficult to do with software
 No single correct categorization
– Women, Fire, and Dangerous Things
Introduction – Sentiment Analysis
Sentiment & Categorization
 Borges – Celestial Emporium of Benevolent Knowledge
– Those that belong to the Emperor
– Embalmed ones
– Those that are trained
– Suckling pigs
– Mermaids
– Fabulous ones
– Stray dogs
– Those that are included in this classification
– Those that tremble as if they were mad
– Innumerable ones
– Other
Basic Level Categories
Introduction: What are Basic Level Categories?
 Mid-level in a taxonomy / hierarchy
 Levels: Superordinate – Basic – Subordinate
– Mammal – Dog – Golden Retriever
– Furniture – chair – kitchen chair
 Basic in 4 dimensions
– Perception – overall perceived shape, single mental image, fast identification
– Function – general motor program
– Communication – shortest, most commonly used, neutral, first learned by children
– Knowledge Organization – most attributes are stored at this level
Basic Level Categories
Introduction: Other levels
 Subordinate – more informative but less distinctive
– Basic shape and function with additional details
• Ex – Chair – office chair, armchair
– Convention – people name objects by their basic category label, unless extra information in subordinate is useful
 Superordinate – Less informative but more distinctive
– All refer to varied collections – furniture
– Often mass nouns, not count nouns
– List abstract / functional properties
– Very hard for children to learn
Basic Level Categories
Introduction: How to recognize the basic level
 Short words – fewer noun phrases
 Kinds of attributes
– Superordinate – functional (keeps you warm, sit on it)
– Basic – Noun and adjectives – legs, belt loops, cloth
– Subordinate – adjectives – blue, tall
 Basic Level – similar movements, similar shapes
 More complex for non-object domains
 Issue – what is basic level is context dependent
Basic Level Categories
Introduction: How to recognize the basic level
 Cue Validity – probability that a particular object belongs to some category given that it has a particular feature (cue)
– X has wings – bird
– Superordinates have lower cue validity – fewer common attributes
– Subordinates have lower cue validity – share more attributes with other members at same level
 Category utility – frequency of a category + category validity + base rates of each of these features
 Issue – how to decide which features?
– Cat – “can be picked up”, is bigger than a beetle
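The cue validity definition above can be illustrated with a small frequency estimate. This is only a sketch: the observations, feature names, and the function `cue_validity` are invented for illustration and are not from the presentation.

```python
from collections import Counter

# Toy observations: (category, features). All data is invented for illustration.
observations = [
    ("bird", {"wings", "feathers", "lays_eggs"}),
    ("bird", {"wings", "feathers", "lays_eggs"}),
    ("bird", {"wings", "feathers", "lays_eggs"}),
    ("bat", {"wings", "fur"}),
    ("dog", {"fur", "four_legs"}),
]

def cue_validity(observations, feature, category):
    """Estimate P(category | feature) as a relative frequency."""
    with_feature = [cat for cat, feats in observations if feature in feats]
    if not with_feature:
        return 0.0  # feature never observed, so no evidence either way
    return Counter(with_feature)[category] / len(with_feature)

wings_bird = cue_validity(observations, "wings", "bird")        # 3 of 4 winged items are birds
feathers_bird = cue_validity(observations, "feathers", "bird")  # every feathered item is a bird
print(f"P(bird | wings)    = {wings_bird:.2f}")
print(f"P(bird | feathers) = {feathers_bird:.2f}")
```

In this toy data "feathers" is a stronger cue for "bird" than "wings", because bats also have wings. Which features to count remains the open issue the slide raises.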
Basic Level Categories and Expertise
 Experts prefer lower, subordinate levels
– In their domain, (almost) never use superordinate
– Novices prefer higher, superordinate levels
– General Populace prefers basic level
 Not just individuals but whole societies / communities differ in their preferred levels
 Develop expertise rules – similar to categorization rules
– Hybrid – all of the above – depending on context
– Use basic level for subject
– Superordinate for general, subordinate for expert
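One way to read the level preferences above is as a simple rule: count the taxonomy terms a text uses at each level and take the dominant level as a hint about the audience. The sketch below assumes a tiny hand-built taxonomy (`LEVELS`, using the example terms from the earlier slides) and hypothetical helper names; a real rule set would be far richer and context dependent.

```python
import re
from collections import Counter

# Hypothetical mini-taxonomy; the terms are the examples used in the slides.
LEVELS = {
    "mammal": "superordinate", "furniture": "superordinate",
    "dog": "basic", "chair": "basic",
    "golden retriever": "subordinate", "kitchen chair": "subordinate",
}
# Assumed mapping from dominant level to likely audience.
AUDIENCE = {"superordinate": "novice", "basic": "general", "subordinate": "expert"}

def level_profile(text):
    """Count taxonomy terms per level, matching longer terms first."""
    remaining = text.lower()
    counts = Counter()
    for term in sorted(LEVELS, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        # Blank out each match so "kitchen chair" is not also counted as "chair".
        remaining, hits = re.subn(pattern, " ", remaining)
        counts[LEVELS[term]] += hits
    return counts

def audience(text):
    """Return the audience suggested by the most frequent level, or 'unknown'."""
    counts = level_profile(text)
    if sum(counts.values()) == 0:
        return "unknown"
    return AUDIENCE[counts.most_common(1)[0][0]]

print(audience("The golden retriever sat by the kitchen chair near the dog."))
print(audience("My dog sat on a chair."))
```

Ties between levels are resolved arbitrarily here; the "hybrid" rule on the slide suggests that context should decide such cases.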
Expertise Analysis: Techniques
 Corpus context dependent
– Author748 – is general in scientific health care context, advanced in news health care context
 Need to generate overall expertise level for a corpus
 Also contextual rules
– “Tests” is general, high level
– “Predictive value of tests” is lower, more expert
 Categorization rule – SENT, DIST
– If same sentence, expert
 Demo – Sample Documents, Rules
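The contextual rule above ("tests" alone is general, "predictive value of tests" is expert) can be sketched with simplified stand-ins for same-sentence (SENT) and word-distance (DIST) operators. The functions `sent`, `dist`, and `expertise_level` are hypothetical illustrations, not the syntax of any particular categorization product, and the sentence splitter is deliberately naive.

```python
import re

def sentences(text):
    """Naive sentence splitter on '.', '!' and '?' (an assumption for this sketch)."""
    return [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]

def sent(text, *terms):
    """SENT-style check: all terms occur within one sentence."""
    return any(all(t in s for t in terms) for s in sentences(text))

def dist(text, max_gap, first, second):
    """DIST-style check: two single words occur within max_gap words of each other."""
    words = re.findall(r"[a-z']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    return any(abs(a - b) <= max_gap for a in pos_a for b in pos_b)

def expertise_level(text):
    """Expert if the qualifying phrase shares a sentence with 'tests', else general."""
    if sent(text, "predictive value", "tests") or dist(text, 3, "predictive", "tests"):
        return "expert"
    if "tests" in text.lower():
        return "general"
    return "unknown"

print(expertise_level("The predictive value of tests varies with prevalence."))
print(expertise_level("The doctor ordered more tests."))
```

When the two terms appear in different sentences and far apart, the rule falls back to "general", which matches the slide's "if same sentence, expert" condition.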
Education Terms
Expert                                      General
Research (context dependent)                Kid
Statistical                                 Pay
Program performance                         Classroom
Protocol                                    Fail
Adolescent Attitudes                        Attendance
Key academic outcomes                       School year
Job training program                        Closing
American Educational Research Association   Counselor
Graduate management education               Discipline
Healthcare Terms
Expert                    General
Mouse                     Cancer
Dose                      Scientific
Toxicity                  Physical
Diagnostic                Consumer
Mammography               Cigarette
Sampling                  Smoking
Inhibitor                 Weight gain
Edema                     Correct
Neoplasms                 Empirical
Isotretinoin              Drinking
Ethylene                  Testing
Significantly             Lesson
Population-base           Knowledge
Pharmacokinetic           Medicine
Metabolite                Sociology
Polymorphism              Theory
Subsyndromic              Experience
Radionuclide              Services
Etiology                  Hospital
Oxidase                   Social
Captopril                 Domestic
Pharmacological agents
Dermatotoxicity
Mammary cancer model
Biosynthesis
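Term lists like the ones above could feed a very simple document score. The sketch below uses a handful of the healthcare terms from the slide; the scoring formula and the name `expertise_score` are assumptions made for illustration, not the method used in the demo.

```python
import re

# A few terms taken from the healthcare lists above.
EXPERT_TERMS = {"dose", "toxicity", "edema", "etiology", "metabolite", "pharmacokinetic"}
GENERAL_TERMS = {"cancer", "smoking", "hospital", "medicine", "cigarette", "drinking"}

def expertise_score(text):
    """Return (expert hits - general hits) / total hits, in [-1, 1]; 0.0 if no hits."""
    words = re.findall(r"[a-z]+", text.lower())
    expert = sum(w in EXPERT_TERMS for w in words)
    general = sum(w in GENERAL_TERMS for w in words)
    total = expert + general
    return (expert - general) / total if total else 0.0

print(expertise_score("The metabolite showed toxicity at every dose."))
print(expertise_score("Smoking causes cancer, the hospital said."))
```

A score near 1 suggests expert vocabulary and a score near -1 suggests general vocabulary. As the slides stress, the same term can sit on either list depending on the corpus.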
Expertise Analysis: Application areas
 Text Mining
– Preprocessing of documents
– Expertise characterization of writer, corpus
– Best results with existing taxonomy(s)
• Can use a very general, high level taxonomy – superordinate and basic
 eCommerce
– Organization and Presentation of information – expert, novice
– How to determine?
• Search queries, profiles, buying patterns, specific products
Expertise Analysis: Application areas
 Search – enterprise and/or internet
– Query level
 Relevance ranking
– Adjust documents for novice and expert queries
 Information presentation
– Tag clouds – match novice and expert
 Clustering
– Incorporate into clustering algorithms
 Presentation – expose basic level & provide up and down browse
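Adjusting relevance ranking for novice and expert queries could look roughly like the following. This is a sketch under stated assumptions: the `Doc` fields, the 0-to-1 expertise scale, the `weight` parameter, and the mismatch penalty are all hypothetical choices, not a description of any search engine's API.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    relevance: float  # score from the search engine, assumed to lie in [0, 1]
    expertise: float  # 0.0 = novice-oriented, 1.0 = expert-oriented (assumed scale)

def rerank(docs, query_expertise, weight=0.5):
    """Sort by relevance minus a penalty for expertise mismatch with the query."""
    def adjusted(doc):
        return doc.relevance - weight * abs(doc.expertise - query_expertise)
    return sorted(docs, key=adjusted, reverse=True)

docs = [
    Doc("Pharmacokinetics of captopril", relevance=0.9, expertise=0.9),
    Doc("What is blood pressure medicine?", relevance=0.8, expertise=0.1),
]
novice_order = [d.title for d in rerank(docs, query_expertise=0.1)]
expert_order = [d.title for d in rerank(docs, query_expertise=0.9)]
print(novice_order[0])
print(expert_order[0])
```

The same two documents swap places depending on the estimated level of the query, which is the effect the slide describes.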
Expertise Analysis: Application areas
 Social Media - Community of Practice
– Characterize the level of expertise in the community
– Evaluate other communities’ expertise level
– Identify experts (and leaders) in the community
 Expertise location
– Generate automatic expertise characterization based on authored documents
 Expertise of people in a social network
– Terrorists and bomb-making
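Characterizing an author from authored documents depends on the corpus used as a baseline, as the Author748 example earlier in the deck notes. A minimal sketch, assuming each document already has an expertise score in [0, 1]; the scores and the function `relative_expertise` are invented for illustration.

```python
from statistics import mean

def relative_expertise(author_scores, corpus_scores):
    """Author's mean document score minus the corpus mean (positive = above corpus level)."""
    return mean(author_scores) - mean(corpus_scores)

# Invented document-level expertise scores in [0, 1].
author_docs = [0.5, 0.6, 0.7]
scientific_corpus = [0.8, 0.9, 0.85, 0.75]
news_corpus = [0.2, 0.3, 0.25, 0.35]

vs_scientific = relative_expertise(author_docs, scientific_corpus)
vs_news = relative_expertise(author_docs, news_corpus)
print(f"vs scientific corpus: {vs_scientific:+.3f}")
print(f"vs news corpus:       {vs_news:+.3f}")
```

The same author comes out below the scientific baseline and above the news baseline, so any expertise label needs to name the corpus it was measured against.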
Expertise Analysis: Application areas - Tags
 Basic Level
– Blog
– Software (Design)
– Web (Design)
– Linux
– Javascript
– Web2.0
– Google
– Css
– Flash
 Superordinate
– Music
– Photography
– News
– Education
– Business
– Technology
– Politics
– Science
– Culture
Expertise Analysis: Application areas
 Business & Customer intelligence
– General – characterize people’s expertise to add to evaluation of their comments
 Combine with VOC & sentiment analysis – finer evaluation
– what are experts saying, what are novices saying
– Deeper research into communities, customers
 Enterprise Content Management
– At publish time, software automatically gives an expertise level – present to author for validation
– Combine with categorization – offer tags at a suitable level of expertise
Expertise Analysis:
Future Directions
 Data mining + Text Mining + Expertise-Sentiment
– New applications
– Group Behavior – leaders, decisions
 Predictive Analytics
– Adding new dimensions
 Neuro-Marketing, Economics, Law, Intelligence
– Social forecasting – Twitter and Stock Market
 Language & category theory – Metaphor Analysis, etc.
 Need an emotion taxonomy?
Expertise Analysis:
Conclusions
 Expertise analysis adds a new dimension to Text Analysis and Sentiment Analysis
– Broad range of applications – personalization, customer depth, Social Media, enterprise text analytics
 Expertise analysis builds on Basic Level Categories
– Plus expertise categorization rules
 What is expert / basic level is context dependent
 Text & Expertise Analytics builds on Sentiment Analysis and Cognitive Science
– Not just library science or data modeling or ontology or sentiment or linguistics – all of the above
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Resources
 Books
– Affective Neuroscience: The Foundations of Human and Animal Emotions – Jaak Panksepp
– Decisions, Uncertainty, and the Brain: The Science of Neuroeconomics – Paul Glimcher
– Women, Fire, and Dangerous Things – George Lakoff
– Knowledge, Concepts, and Categories – Koen Lamberts and David Shanks
– The Tell-Tale Brain: A Neuroscientist’s Quest for What Makes Us Human – V. S. Ramachandran
Resources
 Web Sites
– Text Analytics News – http://social.textanalyticsnews.com/index.php
– Text Analytics Wiki – http://textanalytics.wikidot.com/
– Taxonomy Community of Practice – http://finance.groups.yahoo.com/group/TaxoCoP/
– LinkedIn – Text Analytics Summit Group – http://www.LinkedIn.com
Resources
 Blogs
– SAS – http://blogs.sas.com/text-mining/
 Web Sites
– Whitepaper – CM and Text Analytics – http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
– Whitepapers – Enterprise Content Categorization strategy and development – http://www.kapsgroup.com
Resources
 Articles
– Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148
– Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56
– Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086
– Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82
Basic Level Categories
Introduction: What are Basic Level Categories?
 Short and easy words
 Maximum distinctness and expressiveness
 Similarly perceived shapes
 Most commonly used labels
 Easiest and fastest to identify members
 First level named and understood by children
 Terms usually used in neutral contexts
 Level at which most of our knowledge is organized
 Objects – most studied, most pronounced effects
Basic Level Categories
Introduction: Basic Level Categories: Non-Object
 Basic level effects, but no widespread acceptance of categories and category names
 Thus a basic level in a category hierarchy but not the category hierarchy that people actually use in everyday life
 Not just IS-A relationship – messier – more like ontologies
 Examples:
– Scenes – indoors – school – elementary school
– Events – travel – highway travel – truck travel
– Emotions – positive emotion – joy – contentment
– Programming – Algorithm – sort – binary
Basic Level Categories and Expertise
 Experts chunk series of actions, ideas, etc.
– Novice – high level only
– Intermediate – steps in the series
– Expert – special language – based on deep connections
 Types of expert:
– Technical – lower level terms only
– Strategic – high level and lower level terms, special language
Expertise Analysis: Techniques
 What is basic level is context(s) dependent
 Documents / Tags – analyze in terms of levels of words
– Taxonomy for high level
– Length for basic – short
– Length for subordinate – long, special vocabulary
 Category Utility
 Develop expertise rules – similar to categorization rules
– Hybrid – all of the above – depending on context
– Use basic level for subject
– Superordinate for general, subordinate for expert
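Category utility is named above only as a technique. As a hedged illustration, the sketch below uses a common formulation from the categorization literature, P(c) multiplied by the sum over features of P(f|c)^2 - P(f)^2. That formula is an assumption here, since the slides describe category utility only informally, and the items and features are invented.

```python
def category_utility(items, category):
    """P(c) * sum over features of [P(f|c)^2 - P(f)^2] (assumed formulation)."""
    members = [feats for cat, feats in items if cat == category]
    if not members:
        return 0.0
    features = set().union(*(feats for _, feats in items))
    p_c = len(members) / len(items)
    total = 0.0
    for f in features:
        p_f = sum(f in feats for _, feats in items) / len(items)
        p_f_given_c = sum(f in feats for feats in members) / len(members)
        total += p_f_given_c ** 2 - p_f ** 2
    return p_c * total

# Invented items labeled at the basic level.
items = [
    ("chair", {"legs", "seat", "back"}),
    ("chair", {"legs", "seat", "back"}),
    ("table", {"legs", "top"}),
    ("table", {"legs", "top"}),
]
# The same items relabeled with a single superordinate category.
as_furniture = [("furniture", feats) for _, feats in items]

chair_cu = category_utility(items, "chair")
furniture_cu = category_utility(as_furniture, "furniture")
print(chair_cu, furniture_cu)
```

In this toy set the superordinate label covers every item, so knowing it predicts no feature better than the base rates do, while the basic-level label "chair" does. That is one reading of why the basic level is the most useful cut.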