Basic Level Categories

Basic Level Categories for Knowledge Representation
Tom Reamy
Chief Knowledge Architect
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com
Agenda
 Introduction – Context
– Category Theory – Cognitive Science
– Enterprise Text Analytics
 Basic Level Categories
– Features and Issues
 Basic Level Categories and Expertise
– Experts prefer lower levels
– Categorization of Expertise
 Applications
– Integration with Search and ECM
– Platform for Information Applications
KAPS Group: General
 Knowledge Architecture Professional Services
 Virtual Company: Network of consultants – 8-10
 Partners – SAS, SAP, Microsoft-FAST, Concept Searching, etc.
 Consulting, Strategy, Knowledge architecture audit
 Services:
– Taxonomy/Text Analytics development, consulting, customization
– Technology Consulting – Search, CMS, Portals, etc.
– Evaluation of Enterprise Search, Text Analytics
– Metadata standards and implementation
– Knowledge Management: Collaboration, Expertise, e-learning
– Applied Theory – Faceted taxonomies, complexity theory, natural categories
Basic Level Categories
Context
 Unstructured Content - Enterprise & External
 Preprocessing of documents and sets
– Includes categorization, information extraction
 Representation of Domain knowledge – taxonomy, ontology
 Presentation of results of search, text mining – and refinement
 Categorization
– Most basic to human cognition
– Most difficult to do with software
 No single correct categorization
– Women, Fire, and Dangerous Things
Basic Level Categories
Context
 Borges – Celestial Emporium of Benevolent Knowledge
– Those that belong to the Emperor
– Embalmed ones
– Those that are trained
– Suckling pigs
– Mermaids
– Fabulous ones
– Stray dogs
– Those that are included in this classification
– Those that tremble as if they were mad
– Innumerable ones
– Other
Basic Level Categories – software context
Enterprise Text Analytics (ETA)
 Enterprise Search – Faceted Navigation
– Categorization – Document Topics – Aboutness
– Entity Extraction – noun phrases, feed facets, ontologies
– Summarization – beyond snippets
 Enterprise Content Management
– Hybrid model of metadata
– Categorization – suggestions
– Entity, Noun phrase – facets need a lot of metadata
Basic Level Categories – software context
Enterprise Text Analytics (ETA)
 Advanced Text Analytics
– Fact extraction – ontologies
– Sentiment Analysis – good, bad, and ugly
– Expertise Analysis
 Enterprise Applications – Information Applications
– Text mining – alone or in conjunction with data mining
– Business & Customer intelligence
Basic Level Categories
Introduction: What are Basic Level Categories?
 Mid-level in a taxonomy / hierarchy
 Short and easy words
 Maximum distinctness and expressiveness
 Similarly perceived shapes
 Most commonly used labels
 Easiest and fastest to identify members
 First level named and understood by children
 Terms usually used in neutral contexts
 Level at which most of our knowledge is organized
Basic Level Categories
Introduction: What are Basic Level Categories?
 Objects – most studied, most pronounced effects
 Levels: Superordinate – Basic – Subordinate
– Mammal – Dog – Golden Retriever
– Furniture – chair – kitchen chair
 Basic in 4 dimensions
– Perception – overall perceived shape, single mental image, fast identification
– Function – general motor program
– Communication – shortest, most commonly used, neutral, first learned by children
– Knowledge Organization – most attributes are stored at this level
Basic Level Categories
Introduction: Basic Level Categories: Non-Object
 Basic level effects, but no widespread acceptance of categories and
category names
 Thus a basic level in a category hierarchy but not the category hierarchy
that people actually use in everyday life
 Not just IS-A relationship – messier – more like ontologies
 Examples:
– Scenes – indoors – school – elementary school
– Events – travel – highway travel – truck travel
– Emotions – positive emotion – joy – contentment
– Programming – Algorithm – sort – binary
Basic Level Categories
Introduction: Other levels
 Subordinate – more informative but less distinctive
– Basic shape and function with additional details
• Ex – Chair – office chair, armchair
– Convention – people name objects by their basic category label, unless extra information in subordinate is useful
 Superordinate – Less informative but more distinctive
– All refer to varied collections – furniture
– Often mass nouns, not count nouns
– List abstract / functional properties
– Very hard for children to learn
Basic Level Categories
Introduction: How to recognize the basic level
 Short words – noun phrase
– Selected list (extended stop words)
 Kinds of attributes
– Superordinate – functional (keeps you warm, sit on it)
– Basic – nouns and adjectives – legs, belt loops, cloth
– Subordinate – adjectives – blue, tall
 Basic Level – similar movements, similar shapes
 More complex for non-object domains
 Issue – what is basic level is context dependent
Basic Level Categories
Introduction: How to recognize the basic level
 Cue validity – probability that a particular object belongs to some category given that it has a particular feature (cue)
– Example: X has wings – bird
– Superordinates have lower cue validity – fewer common attributes
– Subordinates have lower cue validity – share more attributes with other members at same level
 Category utility – combines the frequency of a category, category validity, and the base rates of each of the features
 Issue – how to decide which features?
– Cat – “can be picked up”, “is bigger than a beetle”
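The two measures on this slide can be sketched on a toy feature table. This is a minimal illustration, not the method used in the talk: the items, feature names, and function names are made up, and category utility is written in the Gluck-Corter form, P(c) * sum over features of [P(f|c)^2 - P(f)^2], which is one common formalization and an assumption here, since the slide only names the ingredients.

```python
# Hypothetical toy data: (category, set of observed features) pairs.
ITEMS = [
    ("bird", {"wings", "feathers", "flies"}),
    ("bird", {"wings", "feathers"}),
    ("bat", {"wings", "fur", "flies"}),
    ("dog", {"fur", "four_legs"}),
]


def cue_validity(items, feature, category):
    """P(category | feature): share of items with the feature that are in the category."""
    with_feature = [cat for cat, feats in items if feature in feats]
    if not with_feature:
        return 0.0
    return with_feature.count(category) / len(with_feature)


def category_utility(items, category):
    """Assumed Gluck-Corter form: P(c) * sum_f [P(f|c)^2 - P(f)^2]."""
    n = len(items)
    members = [feats for cat, feats in items if cat == category]
    if not members:
        return 0.0
    all_features = set().union(*(feats for _, feats in items))
    total = 0.0
    for f in all_features:
        p_f = sum(f in feats for _, feats in items) / n            # base rate
        p_f_given_c = sum(f in feats for feats in members) / len(members)  # category validity
        total += p_f_given_c ** 2 - p_f ** 2
    return (len(members) / n) * total


if __name__ == "__main__":
    print(cue_validity(ITEMS, "wings", "bird"))     # wings also occur on bats
    print(cue_validity(ITEMS, "feathers", "bird"))  # feathers occur only on birds
    print(category_utility(ITEMS, "bird"))
```

In this toy table "wings" is a weaker cue for "bird" than "feathers" because bats share it, which is the sense in which the choice of features drives the result.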
Basic Level Categories and Expertise
 Experts prefer lower, subordinate levels
– In their domain, they (almost) never use superordinate terms
 Novices prefer higher, superordinate levels
 General populace prefers basic level
 Not just individuals but whole societies / communities differ in their preferred levels
 Issue – artificial languages – e.g., a scientific discipline
 Issue – difference between child and adult learning – adults start with high level
Basic Level Categories and Expertise
 Experts chunk series of actions, ideas, etc.
– Novice – high level only
– Intermediate – steps in the series
– Expert – special language – based on deep connections
 Expertise is a combination of knowledge and skill
– Everything from riding a bike to merging two companies
– No such thing as tacit knowledge – spectrum
 Types of expert:
– Technical – lower level terms only
– Strategic – high level and lower level terms, special language
Basic Level Categories
Analytical Techniques
 What is basic level is context(s) dependent
 Documents / Tags – analyze in terms of levels of words
– Taxonomy for high level
– Length for basic – short
– Length for subordinate – long, special vocabulary
 Category Utility
 Hybrid – simple high level taxonomy (superordinate), short words – basic, longer words – expert
 Plus – develop expertise rules – similar to categorization rules
– Use basic level for subject
– Superordinate for general, subordinate for expert
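The hybrid heuristic on this slide (taxonomy for the high level, word length for the rest) can be sketched roughly as follows. The taxonomy entries, the length cutoff, and the function names are illustrative assumptions, not values from the talk.

```python
# Hypothetical superordinate terms from a small, high-level taxonomy.
SUPERORDINATE_TERMS = {"furniture", "mammal", "vehicle", "medicine"}

# Assumed cutoff: single words up to this length are treated as basic level.
MAX_BASIC_LENGTH = 8


def term_level(term, superordinate=SUPERORDINATE_TERMS, max_basic=MAX_BASIC_LENGTH):
    """Guess the level of a term: taxonomy lookup first, then length."""
    normalized = term.strip().lower()
    if normalized in superordinate:
        return "superordinate"
    words = normalized.split()
    if len(words) == 1 and len(normalized) <= max_basic:
        return "basic"
    # Long words and multi-word phrases are treated as subordinate / expert.
    return "subordinate"


def level_profile(terms):
    """Count how many terms fall at each level, e.g. for a document's tags."""
    profile = {"superordinate": 0, "basic": 0, "subordinate": 0}
    for term in terms:
        profile[term_level(term)] += 1
    return profile


if __name__ == "__main__":
    print(level_profile(["dog", "mammal", "golden retriever", "pharmacokinetic"]))
```

A length cutoff is crude and, as the slide notes, what counts as basic is context dependent, so the threshold would need tuning per corpus.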
Basic Level Categories
Analytical Techniques
 Corpus context dependent
– Author748 – is general in scientific health care context, advanced in news health care context
 Need to generate overall expertise level for a corpus
 Also contextual rules
– “Tests” is general, high level
– “Predictive value of tests” is lower, more expert
 Categorization rule – SENT, DIST
– If same sentence, expert
 Demo – Sample Documents, Rules
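The contextual rule on this slide (a general term becomes expert when a qualifier appears in the same sentence) can be approximated in plain Python. This is only a sketch: SENT and DIST are the operator names given on the slide, but the code below is not the syntax of any particular categorization product, the sentence splitter is deliberately naive, and all function names are hypothetical.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def same_sentence(text, term_a, term_b):
    """SENT-like check: do both terms occur within one sentence?"""
    a, b = term_a.lower(), term_b.lower()
    return any(a in s.lower() and b in s.lower() for s in split_sentences(text))


def within_distance(text, term_a, term_b, max_words):
    """DIST-like check: do two single-word terms occur within max_words tokens?"""
    tokens = re.findall(r"\w+", text.lower())
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= max_words for i in pos_a for j in pos_b)


def expertise_of_mention(text, general_term="tests", qualifier="predictive value"):
    """Label text 'expert' if the qualifier and the general term share a sentence."""
    if same_sentence(text, general_term, qualifier):
        return "expert"
    if general_term.lower() in text.lower():
        return "general"
    return "none"


if __name__ == "__main__":
    print(expertise_of_mention("The predictive value of tests was low."))
    print(expertise_of_mention("The tests were done on Monday."))
```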
Education Terms
Expert                                       General
Research (context dependent)                 Kid
Statistical                                  Pay
Program performance                          Classroom
Protocol                                     Fail
Adolescent Attitudes                         Attendance
Key academic outcomes                        School year
Job training program                         Closing
American Educational Research Association    Counselor
Graduate management education                Discipline
Healthcare Terms
Expert                     General
Mouse                      Cancer
Dose                       Scientific
Toxicity                   Physical
Diagnostic                 Consumer
Mammography                Cigarette
Sampling                   Smoking
Inhibitor                  Weight gain
Edema                      Correct
Neoplasms                  Empirical
Isotretinoin               Drinking
Ethylene                   Testing
Significantly              Lesson
Population-base            Knowledge
Pharmacokinetic            Medicine
Metabolite                 Sociology
Polymorphism               Theory
Subsyndromic               Experience
Radionuclide               Services
Etiology                   Hospital
Oxidase                    Social
Captopril                  Domestic
Pharmacological agents
Dermatotoxicity
Mammary cancer model
Biosynthesis
Basic Level Categories
Expertise – application areas
 Taxonomy development /design – use basic level
 User contribution
– Card sorting – non-experts use superficial similarities
– Survey for attributes instead of card sorting, general structure
 Develop expert and general versions/sections/synonyms
– ID communities by their documents, tags
 Info presentation – combine superordinate and basic
– Similar to scientific – Genus – Species is official name
 Info presentation – document maps – expose basic level
Basic Level Categories
Expertise – application areas
 Ontology development / design
– Need more focus on who is intended audience
• Structure, nomenclature
– Defining classes & hierarchy – same as taxonomy
– Defining properties – Expert dependent
• Wine for snobs (experts) very different than Joe Sixpack
– Two approaches
• One ontology, classes and/or properties as expert
• Two ontologies – expert and novice
Basic Level Categories
Expertise – application areas
 Text Mining
– Preprocessing of documents
– Expertise characterization of writer
– Best results with existing taxonomy
• Can use a very general, high level taxonomy – superordinate and basic
• Can use existing large taxonomies – MeSH, etc.
 eCommerce
– Organization and Presentation of information – expert, novice
– How to determine?
• Search queries, profiles, buying patterns, specific products
Basic Level Categories
Expertise – application areas
 Search – enterprise and/or internet
– Query level
 Relevance ranking
– Adjust documents for novice and expert queries
 Information presentation
– Tag clouds – match novice and expert
 Clustering
– Incorporate into clustering algorithms
– Presentation – expose basic level & provide up and down browse
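One way to read "adjust documents for novice and expert queries" is to blend the engine's base relevance with how closely a document's expertise level matches the query's. The sketch below assumes both levels are already estimated on a 0 to 1 scale; the `Hit` class, the `weight` parameter, and the linear blend are illustrative assumptions, not something specified in the talk.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    relevance: float   # base score from the search engine (assumed 0..1)
    expertise: float   # assumed document expertise level: 0 = novice, 1 = expert


def rerank(hits, query_expertise, weight=0.3):
    """Blend base relevance with closeness of document and query expertise."""
    def adjusted(hit):
        match = 1.0 - abs(hit.expertise - query_expertise)
        return (1.0 - weight) * hit.relevance + weight * match
    return sorted(hits, key=adjusted, reverse=True)


if __name__ == "__main__":
    hits = [Hit("intro", 0.80, 0.1), Hit("paper", 0.75, 0.9)]
    print([h.doc_id for h in rerank(hits, query_expertise=0.9)])
    print([h.doc_id for h in rerank(hits, query_expertise=0.1)])
```

With `weight=0` the original ranking is unchanged, so the adjustment can be introduced gradually.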
Basic Level Categories
Expertise – application areas
 Social Media - Community of Practice
– Characterize the level of expertise in the community
– Evaluate other communities’ expertise level
– Personalize information presentation by expertise
 Expertise location
– Generate automatic expertise characterization based on authored documents
 Expertise of people in a social network
– Terrorists and bomb-making
 Issue of levels of expertise – how granular?
Basic Level Categories
Expertise – application areas – CoP
 Basic Level
– Blog
– Software (Design)
– Web (Design)
– Linux
– Javascript
– Web2.0
– Google
– Css
– Flash
 Superordinate
– Music
– Photography
– News
– Education
– Business
– Technology
– Politics
– Science
– Culture
Basic Level Categories
Expertise – Related Tags – Delicious
 CSS
– Web Design
– Design
– Css3
– Tutorial
– Webdev
– Javascript
– Web
– Development
– Html
– Jquery
– html5
 Education
– Technology
– Resources
– Teaching
– Learning
– Science
– Web20
– Games
– Interactive
– Research
– Tools
– reference
Basic Level Categories
Expertise – application areas
 Business & Customer intelligence
– General – characterize people’s expertise to add to evaluation of their comments
– Combine with sentiment analysis – finer evaluation – what are experts saying, what are novices saying
– Deeper research into communities, customers
 Enterprise Content Management
– At publish time, software automatically gives an expertise level – present to author for validation
– Combine with categorization – offer tags that are suitable level of expertise
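The publish-time step described above could be approximated by scoring a document against expert and general term lists and showing the suggested level to the author for validation. This is a rough sketch under stated assumptions: the term sets are a small sample of the healthcare terms listed earlier in this deck, and the scoring rule, threshold, and function names are hypothetical.

```python
import re

# Illustrative term lists, sampled from the healthcare terms in this deck.
EXPERT_TERMS = {"toxicity", "inhibitor", "edema", "metabolite", "etiology"}
GENERAL_TERMS = {"cancer", "smoking", "hospital", "medicine", "testing"}


def expertise_score(text, expert=EXPERT_TERMS, general=GENERAL_TERMS):
    """Return the share of matched terms that are expert terms, or None if nothing matched."""
    tokens = re.findall(r"[a-z]+", text.lower())
    n_expert = sum(t in expert for t in tokens)
    n_general = sum(t in general for t in tokens)
    matched = n_expert + n_general
    if matched == 0:
        return None
    return n_expert / matched


def suggest_level(text, threshold=0.5):
    """Suggest a label for the author to confirm; the threshold is an assumption."""
    score = expertise_score(text)
    if score is None:
        return "unknown"
    return "expert" if score >= threshold else "general"


if __name__ == "__main__":
    print(suggest_level("Edema and toxicity were linked to the inhibitor metabolite."))
    print(suggest_level("Smoking causes cancer, so visit the hospital."))
```

Because the result is only a suggestion, returning "unknown" when no listed term matches leaves the decision to the author rather than guessing.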
Basic Level Categories
Conclusions
 Basic Level Categories are fundamental to thought
 What is basic level is context dependent
 Basic level effect is most obvious with objects, more work for concepts
 Most domains need some taxonomy – need not be big
– Categorization-like rules
 This is exciting, but not a revolution
 Beware Egalitarian stance – People are different
 Text Analytics needs Cognitive Science
– Not just library science or data modeling or ontology
Resources
 Books
– Women, Fire, and Dangerous Things – George Lakoff
– Knowledge, Concepts, and Categories – Koen Lamberts and David Shanks
– The Stuff of Thought – Steven Pinker
 Web Sites
– Text Analytics News – http://social.textanalyticsnews.com/index.php
– Text Analytics Wiki – http://textanalytics.wikidot.com/
Resources
 Blogs
– SAS – Manya Mayes – Chief Strategist – http://blogs.sas.com/text-mining/
 Web Sites
– Taxonomy Community of Practice: http://finance.groups.yahoo.com/group/TaxoCoP/
– Whitepaper – CM and Text Analytics – http://www.textanalyticsnews.com/usa/contentmanagementmeetstextanalytics.pdf
– Whitepaper – Enterprise Content Categorization – coming soon
Resources
 Articles
– Malt, B. C. 1995. Category coherence in cross-cultural perspective. Cognitive Psychology 29, 85-148
– Rifkin, A. 1985. Evidence for a basic level in event taxonomies. Memory & Cognition 13, 538-56
– Shaver, P., J. Schwartz, D. Kirson, C. O’Connor 1987. Emotion knowledge: further exploration of a prototype approach. Journal of Personality and Social Psychology 52, 1061-1086
– Tanaka, J. W. & M. E. Taylor 1991. Object categories and expertise: is the basic level in the eye of the beholder? Cognitive Psychology 23, 457-82
Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com