The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?

Illustrations of different approaches
Peter Clark and John Thompson
Boeing Research
2004
Premise
• Intelligent machines need lots of knowledge, for
– question-answering
– intelligent search
– information integration
– natural language understanding
– decision support
– modeling
– etc.
• Much of this knowledge can be drawn from some
general repository of reusable knowledge
– e.g., WordNet
• How does one build such a repository?
“No-one considers hand-building a large KB to be a realistic
proposition these days” [paraphrase of Daphne Koller, 2004]
1. Build it by Hand
• “Let’s roll up our sleeves and
get on with it!”
• But: It’s a daunting task
– Our own work
• Cyc
+ Lots in it; (relatively) well-designed ontology
- 650 person-years of effort so far
- Still patchy coverage (why?)
- Difficult to use outside Cycorp
1. Build it by Hand (cont)
• WordNet
+ Easy to use
+ Comprehensive
- Little inference-supporting knowledge in it
- Ad hoc ontology
1. Build it by Hand (cont)
• The Component Library
Claim: can bound the required knowledge by working at a coarse-grained level
+ Large, more doable
- Hard to use, still very incomplete
2. Extract from Dictionaries
• MindNet
+ Automatically built
- Unusable?
• Extended WordNet
+ Won TREC competition
- Still somewhat incoherent
- Lots of manual labor
3. Corpus-based Text/Web Mining
• Schubert’s system
+ Automatic
+ Lots of knowledge
- Noisy
- No word senses
- Only grabs certain kinds of knowledge
30M entries…
3. Corpus-based Text/Web Mining (cont)
• KnowItAll (Etzioni)
+ Automatic
- Only factoids
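To make the "automatic, but only factoids" trade-off concrete, here is a toy sketch of pattern-based extraction from text. It is not the implementation of either system above: the single "such as" pattern, the function name `extract_isa`, and the example sentence are all assumptions made for illustration.

```python
import re

# One illustrative lexical pattern: "<class> such as <A>, <B> and <C>".
# Real text-mining systems use many patterns plus parsing and filtering.
PATTERN = re.compile(r"(\w+) such as ((?:[A-Z]\w+(?:, | and |, and )?)+)")

def extract_isa(text):
    """Return a set of (instance, class) pairs matched by the toy pattern."""
    facts = set()
    for match in PATTERN.finditer(text):
        cls = match.group(1).lower()
        # Split the matched list of capitalized names on its separators.
        for name in re.split(r", and |, | and ", match.group(2)):
            name = name.strip()
            if name:
                facts.add((name, cls))
    return facts

# Invented example sentence, not drawn from any real corpus.
corpus = "She visited cities such as Paris, London and Seattle last year."
facts = extract_isa(corpus)
print(sorted(facts))
```

The output is a set of isolated instance-of pairs with no word senses attached and no rules connecting them, which is the sense in which such harvesting yields factoids rather than inference-supporting knowledge.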
4. Community-Based Acquisition
• Knowledge entry by the masses
• OpenMind
+ Large
- Full of junk, unusable (?)
- Would this work with better acquisition tools?
(see next slide for illustration)
5. Use Existing Resources
• e.g.,
– databases
– CIA World Fact Book
– Web data/services
• e.g., SRI/ISI’s ARDA QA system
+ Syntactically simple
+ Available
- Largely limited to factoids
- Information integration is a major challenge
– different ontologies, contradictory data
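The integration problem can be sketched in a few lines, assuming facts arrive as (subject, predicate, value) triples. Everything here is hypothetical: the predicate mapping, the two sources, and the fictional country and its figures are invented for illustration and do not come from any of the resources named above.

```python
# Hypothetical mapping from each source's predicate names to a shared vocabulary.
PREDICATE_MAP = {
    "pop": "population",
    "inhabitants": "population",
    "capital_city": "capital",
}

def normalize(triples):
    """Rewrite source-specific predicate names into the shared vocabulary."""
    return [(s, PREDICATE_MAP.get(p, p), v) for s, p, v in triples]

def merge(*sources):
    """Merge triples from several sources and report contradictory values."""
    merged = {}
    for triples in sources:
        for s, p, v in normalize(triples):
            merged.setdefault((s, p), set()).add(v)
    # A (subject, predicate) pair with more than one value is a conflict.
    conflicts = {key: vals for key, vals in merged.items() if len(vals) > 1}
    return merged, conflicts

# Invented data about a fictional country.
source_a = [("Freedonia", "pop", 1000), ("Freedonia", "capital_city", "Fredville")]
source_b = [("Freedonia", "inhabitants", 1200), ("Freedonia", "capital", "Fredville")]

merged, conflicts = merge(source_a, source_b)
print(conflicts)
```

Aligning the vocabularies is the easy part once a mapping exists; the sketch only detects the contradiction and leaves open which source to trust.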
Where to?
• Can we bound the knowledge needed
– for a particular application
– for a useful, sharable, general resource?
• Which of these approaches seems most realistic?
– build by hand
– extract from dictionaries
– mine text corpora
– community knowledge entry
– use existing resources