The Knowledge Acquisition Bottleneck Revisited: How can we build large KBs?
Illustrations of different approaches
Peter Clark and John Thompson
Boeing Research
2004

Premise
• Intelligent machines need lots of knowledge, for
  – question-answering
  – intelligent search
  – information integration
  – natural language understanding
  – decision support
  – modeling
  – etc.
• Much of this knowledge can be drawn from some general repository of reusable knowledge
  – e.g., WordNet
• How does one build such a repository?

“No-one considers hand-building a large KB to be a realistic proposition these days” [paraphrase of Daphne Koller, 2004]

1. Build it by Hand
• “Let’s roll up our sleeves and get on with it!”
• But: It’s a daunting task
  – Our own work
• Cyc
  + Lots in it, (relatively) well-designed ontology
  - 650 person-years of effort so far
  - Still patchy coverage (why?)
  - Difficult to use outside Cycorp

1. Build it by Hand (cont)
• WordNet
  + Easy to use
  + Comprehensive
  - Little inference-supporting knowledge in it
  - Ad hoc ontology

1. Build it by Hand (cont)
• The Component Library
  Claim: can bound the required knowledge by working at a coarse-grained level
  + Large, more doable
  - Hard to use, still very incomplete

2. Extract from Dictionaries
• MindNet
  + Automatically built
  - Unusable?
• Extended WordNet
  + Won TREC competition
  - Still somewhat incoherent
  - Lot of manual labor

3. Corpus-based Text/Web Mining
• Schubert’s system
  + Automatic
  + Lots of knowledge
  - Noisy
  - No word senses
  - Only grabs certain kinds of knowledge
  30M entries…

3. Corpus-based Text/Web Mining (cont)
• KnowIt (Etzioni)
  + Automatic
  - Only factoids

4. Community-Based Acquisition
• Knowledge entry by the masses
• OpenMind
  + Large
  - Full of junk, unusable (?)
  - Would this work with better acquisition tools? (see next slide for illustration)

5. Use Existing Resources
• e.g.,
  – databases
  – CIA World Factbook
  – Web data/services
• e.g., SRI/ISI’s ARDA QA system
  + Syntactically simple
  + Available
  - Largely limited to factoids
  - Information integration is a major challenge
    - different ontologies, contradictory data

Where to?
• Can we bound the knowledge needed
  – for a particular application?
  – for a useful, sharable, general resource?
• Which of these approaches seems most realistic?
  – build by hand
  – extract from dictionaries
  – mine text corpora
  – community knowledge entry
  – use existing resources
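
Illustration (approach 5, information integration): the "different ontologies, contradictory data" problem can be made concrete with a minimal sketch. This is not taken from any of the systems named above. The two sources, the property names, the mapping table, and the `integrate` function are all hypothetical, and the entity and values are made up.

```python
# Hypothetical illustration: every source, property name, and value is invented.
from collections import defaultdict

# Two toy sources describing the same entity with different vocabularies.
SOURCE_A = {"Freedonia": {"pop": 1000, "capital_city": "Fredville"}}
SOURCE_B = {"Freedonia": {"population": 1200, "capital": "Fredville"}}

# Hand-written mapping from each source's property names to a shared vocabulary.
MAPPINGS = {
    "A": {"pop": "population", "capital_city": "capital"},
    "B": {"population": "population", "capital": "capital"},
}


def integrate(sources, mappings):
    """Merge facts under shared property names and report conflicting values."""
    merged = defaultdict(lambda: defaultdict(set))
    for source_name, records in sources.items():
        for entity, props in records.items():
            for prop, value in props.items():
                shared = mappings[source_name][prop]  # translate the property name
                merged[entity][shared].add(value)

    facts, conflicts = {}, {}
    for entity, props in merged.items():
        for prop, values in props.items():
            if len(values) == 1:
                facts[(entity, prop)] = next(iter(values))  # sources agree
            else:
                conflicts[(entity, prop)] = sorted(values)  # sources disagree
    return facts, conflicts


facts, conflicts = integrate({"A": SOURCE_A, "B": SOURCE_B}, MAPPINGS)
print(facts)
print(conflicts)
```

Even in this toy case the mapping table has to be written by hand and the conflict is only detected, not resolved, which is where the integration cost comes from.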