Semantic Wikis:
Fusing Two Strands of the Semantic Web
Dr. Mark Greaves
Vulcan Inc.
markg@vulcan.com
© 2008 Vulcan Inc.
Talk Outline
• The Argument for Semantic Wikis
  – Two Strands of the Semantic Web
  – Semantic Wikis: Bridging the Gap
  – Lessons from the Design of SMW+
• Semantic Wiki Experience with Vulcan’s Project Halo
  – Question Answering in Science
  – Wikis for Question Answering
Strand 1: The Semantic Strand of the Semantic Web
• Semantic Web as RDBMS Integration Technology
  – Semantic representation of schema relations
  – Centralized workflows for ontology/data definition and management
  – Powerful reasoning and inference
  – Enterprise-oriented
• Rooted in the original software/tools of the Semantic Web
  – Initial triplestores and authoring systems were (mostly) stand-alone or within the confines of a controlled data set
  – Early DARPA use cases were oriented around data integration
    • EII-style applications: BBN’s Foreign Clearance Guide for AMC
    • More XML-oriented than Web-oriented
• The primary commercial use of the Semantic Web for many years
  – Examples: Siderean Seamark, Oracle RDF
  – Still the most well-understood use cases for the Semantic Web
  – Still extremely important commercially
Strand 2: The Web Strand of the Semantic Web
• Semantic Web as a web-scale knowledge publishing technology
  – Uncontrolled data dynamics, imperfect and voluminous data
  – Anyone can publish with limited/no knowledge engineer involvement
  – A massive base of socially-curated semantic data
  – Balance between quantity and purity (issue with owl:sameAs links; see the sketch below)
  – Semantic data doesn’t have to be associated with HTML web text
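
A minimal sketch, in Python with rdflib, of the owl:sameAs quality issue noted above: one erroneous, socially-published sameAs link silently collapses two different resources. All URIs, property names, and values here are invented for illustration.

    # Requires: pip install rdflib
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.Paris_France, RDFS.label, Literal("Paris (French capital)")))
    g.add((EX.Paris_France, EX.population, Literal(2150000)))
    g.add((EX.Paris_Texas, RDFS.label, Literal("Paris, Texas")))
    # An erroneous link published somewhere on the open web:
    g.add((EX.Paris_Texas, OWL.sameAs, EX.Paris_France))

    # A naive consumer that treats sameAs as identity merges the two descriptions,
    # so the small Texas town ends up carrying the capital's label and population.
    canonical = {s: o for s, o in g.subject_objects(OWL.sameAs)}
    for s, p, o in g:
        if p != OWL.sameAs:
            print(canonical.get(s, s), p, o)
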
• Rooted in the original vision of the Semantic Web
  – Took several years to start to be realized
  – Difficulty conceiving of massive numbers of overlapping ontologies and class hierarchies, and uncoordinated data publishing
  – Hard problem is maintaining a set of informal, evolving, and partial agreements on vocabularies and ontologies
• An exciting and emerging data set
  – Examples: Yahoo!, Sindice, Linking Open Data
  – Fairly poorly understood use cases (especially commercially)
  – Web-oriented and web-scale is extremely attractive
What do Strand 2 Semantic Web Applications Do?
• Strand 1 semantic web applications have enterprise use cases
  – EII, E-science, Enterprise content management...
  – Success of use cases requires unified data models, familiar to DB thinking
• Strand 2 semantic web applications address a brand new use case type
  – “Semantic Web should allow people to have a better online experience” – Alex Iskold, CEO of AdaptiveBlue
  – Enhance the human activities of content creation, publishing, linking my data to other data, forming community, purchasing satisfying things, browsing, etc.
  – Strongly linked to Web 2.0 business models (such as they are)
    • Improve the effectiveness/targeting of advertising
    • Knowledge management tools for communities
• Strand 2 use cases still require Strand 1-style data consistency and vocabulary agreement

Can Strand 2 Semantic Web Applications Overcome the Data Chaos of the Emerging Semantic Web?
Semantic Wikis are in both Strands
• Wikis are tools for Publication and Consensus
• MediaWiki (software for Wikipedia, Wikimedia, Wikibooks, etc.)
  – Most successful Wiki software
    • High performance: 10K pages/sec served, scalability demonstrated
    • LAMP web server architecture, GPL license
  – Publication: simple distributed authoring model
    • Wikipedia: >2.5M English articles, >250M edits, >2.5M images, #8 Alexa traffic rank in August
  – Consensus achieved by global editing and rollback
    • Fixpoint hypothesis, although consensus is not static
    • Gardener/admin role for contentious cases
• Semantic Wikis apply the wiki idea to structured (typically RDFS) information
  – Authoring includes instances, data types, vocabularies, classes
  – Natural language text used for explanations
  – Automatic list generation from structured data, basic analytics, database imports (see the sketch below)
  – See e.g., http://wiki.ontoprise.com for one powerful semantic wiki

Semantic Wiki Hypotheses:
(1) Significant interesting non-RDBMS Semantic Data can be collected cheaply
(2) Wiki mechanisms can be used to maintain consensus on vocabularies and classes
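
To make “automatic list generation from structured data” concrete, here is a minimal Python sketch that pulls structured annotations out of a Semantic MediaWiki through the MediaWiki API’s "ask" module, using the same query syntax as an inline {{#ask: ...}} query. The wiki URL, category, and property names are hypothetical, and the exact parameters and JSON layout vary across SMW versions.

    # Requires: pip install requests
    import requests

    WIKI_API = "https://example-wiki.org/w/api.php"   # hypothetical SMW installation
    QUERY = "[[Category:Chemical element]]|?Atomic number"

    resp = requests.get(WIKI_API, params={
        "action": "ask",     # Semantic MediaWiki query module
        "query": QUERY,      # same syntax as an inline {{#ask: ...}} query
        "format": "json",
    })
    resp.raise_for_status()
    results = resp.json().get("query", {}).get("results", {})

    # Emit a simple generated list, analogous to what an inline query renders on a page.
    for page, data in results.items():
        print(page, data.get("printouts", {}).get("Atomic number", []))
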
Example: Semantic MediaWiki with Halo Extensions (SMW+)
• Knowledge Authoring Capabilities
  – Syntax highlighting when editing a page
  – Semantic toolbar in edit mode
    • Displays annotations present on the page that is edited
    • Allows changing annotation values without locating the annotation in the wiki text
  – Autocompletion for all instances, properties, categories and templates
  – Increased expressivity through n-ary relations (available with the SMW 1.0 release)
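
The n-ary relations mentioned above capture statements that do not fit a single (page, property, value) triple, e.g. “the boiling point of water is 100 °C at 1 atm”. One common way to encode this in plain RDF (shown here with rdflib; not necessarily SMW’s internal encoding) is an intermediate node that groups the participants; the vocabulary is invented for illustration.

    # Requires: pip install rdflib
    from rdflib import BNode, Graph, Literal, Namespace

    EX = Namespace("http://example.org/")
    g = Graph()

    bp = BNode()  # intermediate node standing for the whole n-ary statement
    g.add((EX.Water, EX.boilingPoint, bp))
    g.add((bp, EX.value, Literal(100)))
    g.add((bp, EX.unit, Literal("degrees Celsius")))
    g.add((bp, EX.pressure, Literal("1 atm")))

    print(g.serialize(format="turtle"))
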
Example: Semantic MediaWiki with Halo Extensions (SMW+)
• Semantic Navigation Capabilities
  – GUI-based ontology browser, enables browsing of the wiki's taxonomy and lookup of instance and property information
  – Linklist in edit mode, enables quick access to pages that are within the context of the page currently being edited
  – Search input field with autocompletion, to prevent typing errors and give a fast overview of relevant content
Example: Semantic MediaWiki with Halo Extensions (SMW+)
• Knowledge Retrieval Capabilities
  – Combined text-based and semantic search
  – Basic reasoning in queries, with sub-/super-category/-property reasoning and resolution of redirects (equality reasoning); see the sketch below
  – GUI-based query formulation interface
• Web service integration and import/export support for popular formats
• Rule system developed for OWL-DLP and most of OWL-R
• Fully open source under GPL, supported by Ontoprise
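
As an illustration (not SMW+’s implementation) of the sub-category reasoning mentioned above: a query for members of one category should also return pages filed under its subcategories. The sketch below models categories as an rdfs:subClassOf hierarchy in rdflib and walks it with a SPARQL property path; the biology names are invented for the example.

    # Requires: pip install rdflib
    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()
    g.add((EX.Eukaryotic_cell, RDFS.subClassOf, EX.Cell))
    g.add((EX.Prokaryotic_cell, RDFS.subClassOf, EX.Cell))
    g.add((EX.Neuron, RDF.type, EX.Eukaryotic_cell))
    g.add((EX.E_coli, RDF.type, EX.Prokaryotic_cell))

    # rdf:type/rdfs:subClassOf* walks up the category hierarchy, so members of
    # subcategories are returned when asking for everything under ex:Cell.
    q = """
        PREFIX ex:   <http://example.org/>
        PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?page WHERE {
            ?page rdf:type/rdfs:subClassOf* ex:Cell .
        }
    """
    for row in g.query(q):
        print(row.page)
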
Cool Idea... But Does it Work?
• User tests were performed in Chemistry
  – 20 graduate students were each paid for 20 hours (over 1 month) to collaborate on semantic annotation for chemistry
  – ~700 Wikipedia base articles
  – US high-school AP exams were provided as content guidance

[Chart: Gardening Statistics for Test Wiki]

• Initial Results (SMW+ 1.0)
  – Sparse: 1164 pages (entities), avg 5 assertions per entity
    • 226 Relations (1123 relation-statements) and 281 attributes (4721 attribute-statements)
  – Many bizarre attributes and relations
  – Very difficult to use with a reasoner
• User testing and quality results for (SMW+ 1.1) extensions
  – Initial SUS scoring (6 SMEs, AP science task) went from 43 to 61; final scores in the 70s
  – 3 sessions using the Intrinsic Motivation Inventory (interest/value/usefulness); up 14%
  – Aided by the consistency bot, users corrected 2072 errors (80% of those found) over 3 months
• We have continued to build on this framework
Some Lessons Learned from SMW+ (and Freebase)
• User Interface design matters
  – This is core to MediaWiki’s success
  – Formal usability testing with SMEs matters a lot
  – Zero-training matters a lot
• Gardening matters
  – Users need support for debugging
  – Gardeners can do large scale ontology editing
  – Supports “Schema Last” data engineering
• User-created ontologies are not always well-designed
  – Flatter than normal
  – Cheaper than normal
• Natural language is necessary to augment bare RDF(S) semantics
  – Supplemental semantics can be usefully carried in natural language
From Strand 2 Web to Strand 1 Semantics
• Well-designed semantic wikis make possible certain Strand 2 applications
  – They enable local consensus-building on socially-published data
  – They allow Strand 2 knowledge publication to go beyond search
• Strand 1 semantic data can certainly support Strand 2 applications
  – Example: use of other triplestore data in SMW+
• How can you use Strand 2-collected data to support Strand 1 applications?
  – Corporate uses of socially-curated data (Metaweb)
  – Project Halo: Scientific question-answering
Talk Outline
• The Argument for Semantic Wikis
  – Two Strands of the Semantic Web
  – Semantic Wikis: Bridging the Gap
  – Lessons from the Design of SMW+
• Semantic Wiki Experience with Vulcan’s Project Halo
  – Question Answering in Science
  – Wikis for Question Answering
Envisioning the Digital Aristotle for Scientific Knowledge
• Inspired by Dickson’s Final Encyclopedia, the HAL-9000, and the broad SF vision of computing
  – The “Big AI” Vision of computers that work with people
• The volume of scientific knowledge has outpaced our ability to manage it
  – This volume is too great for researchers in a given domain to keep abreast of all the developments
  – Research results may have cross-domain implications that are not apparent due to terminology and knowledge volume
• “Shallow” information retrieval and keyword indexing systems are not well suited to scientific knowledge management because they cannot reason about the subject matter
  – Example: “What are the reaction products if metallic copper is heated strongly with concentrated sulfuric acid?” (Answer: Cu2+, SO2(g), and H2O; balanced equation below)
• Response to a query should supply the answer (possibly coupled with conceptual navigation) rather than simply list 1000s of possibly relevant documents
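
For reference, one balanced equation consistent with the example answer above (the Cu2+ ends up as copper(II) sulfate in solution):

    Cu + 2 H2SO4 (hot, concentrated) → CuSO4 + SO2(g) + 2 H2O
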
The Halo Project in One Slide
• Project Halo: SME-based Authoring for scientific question-answering systems
• Project Halo Goal: To determine whether tools can be built to facilitate robust knowledge formulation, query and evaluation by domain experts, with ever-decreasing reliance on knowledge engineers
  – Can SMEs build robust question-answering systems that demonstrate excellent coverage of a given syllabus, the ability to answer novel questions, and produce readable, domain-appropriate justifications using reasonable computational resources?
  – Will SMEs be capable of posing questions and complex problems to these systems?
  – Do these systems address key failure, scalability and cost issues encountered in expert systems?
• Experimental Scope: Selected portions of the AP syllabi for chemistry, biology and physics
  – Example: Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination (worked example below)
    (a) C4H10 + O2 → CO2 + H2O
    (b) KClO3 → KCl + O2
    (c) CH3CH2OH + O2 → CO2 + H2O
    (d) P4 + O2 → P2O5
    (e) N2O5 + H2O → HNO3
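
As a worked instance of the task, reaction (a) balances as follows and is a combustion reaction:

    2 C4H10 + 13 O2 → 8 CO2 + 10 H2O (combustion)
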
AURA – Automated User-centered Reasoning and Acquisition System
• Aura is a tool to help users formalize AP-level scientific knowledge
• Aura can then reason with that knowledge
• So users can ask questions and understand the answers
2006 Experimental Results for the Aura System
SME Group:   Science grad student KBs; extensive natural lang; ~$100 per syllabus page
Pilot Group: Professional KE KBs; no natural language; ~$10K per syllabus page

SME results (percentage correct):
  Domain | Number of questions | SME1 | SME2 | Avg   | KE
  Bio    | 146                 | 52%  | 24%  | 38%   | 51%
  Chem   |  86                 | 42%  | 33%  | 37.5% |
  Phy    | 131                 | 16%  | 22%  | 19%   |

Pilot results:
  Halo Pilot System | Percent correct
  Cycorp            | 37%, 40%
  SRI               | 44%, 21%
  Ontoprise         | 47%

Knowledge Formulation
  – Time for KF:
    • Concept: ~20 mins for all SMEs
    • Equation: ~70 s (Chem) to ~120 s (Physics)
    • Table: ~10 mins (Chem)
    • Reaction: ~3.5 mins (Chem)
    • Constraint: 14 s (Bio); 88 s (Chem)
  – SME need for help: 68 requests over 480 person-hours (33%/55%/12%) = 1/day

VS.

Question Formulation
  – Avg time for SME to formulate a question: 2.5 min (Bio), 4 min (Chem), 6 min (Physics); avg 6 reformulation attempts
  – Usability: SMEs requested no significant help; pipelined errors dominated failure analysis

System Responsiveness
  – Biology: 90% answer < 10 sec; Chem: 60% answer < 10 sec; Physics: 45% answer < 10 sec
  Domain | Interpretation (Median/Max) | Answer (Median/Max)
  Bio    | 3s / 601s                   | 1s / 569s
  Chem   | 7s / 493s                   | 7s / 485s
  Phy    | 34s / 429s                  | 14s / 252s

How Can We Increase the Efficiency of SME Authoring?
Symbiosis Between Aura and SMW+
• Classical Knowledge Engineering
  – Expressive knowledge representation
  – Sophisticated testing and debugging
• Knowledge Engineering in Aura
  – Acquires knowledge for deductive Q/A that can be used for answering AP questions in sciences
    • Uses a DL-style class taxonomy, and logic-programming-style rules with many extensions
  – Requires 40 hours of training for knowledge formulation
• Semantic Web Knowledge Engineering
  – Simple knowledge representation
  – Quantity at some expense of quality
• Knowledge Engineering in SMW+
  – Tool for online authoring and consensus-building around semantic web content
  – Captures knowledge at the level of RDFS
  – Collective editing for quality control
  – Gardening appropriate for scientific knowledge
  – Almost a walk-up-and-use system
• Can we use Semantic MediaWiki to capture knowledge that could be used for Q/A in AURA? (See the ETL sketch below.)
  – Factual knowledge (e.g., atomic number for carbon is 6, solubility constraints, etc.)
  – Taxonomic knowledge (e.g., eukaryotic and prokaryotic are two types of cells)
• Knowledge creation would be faster, distributed, and cheaper
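
A minimal Python sketch of what such an ETL step could look like: pull (page, property, value) annotations out of the wiki and reshape them into simple assertions for a downstream Q/A knowledge base. The input rows, property names, and output format are all assumptions for illustration; this is not Aura’s actual import format, nor the Ontomap/FOAM mapping machinery.

    # Extract: structured annotations, e.g. exported from SMW+ (rows invented here).
    wiki_annotations = [
        ("Carbon", "Atomic number", "6"),
        ("Eukaryotic cell", "Subcategory of", "Cell"),
        ("Prokaryotic cell", "Subcategory of", "Cell"),
    ]

    # Transform: separate taxonomic knowledge from factual knowledge and normalize it.
    taxonomy, facts = [], []
    for page, prop, value in wiki_annotations:
        if prop == "Subcategory of":
            taxonomy.append((page, "is-a", value))
        else:
            facts.append((page, prop.lower().replace(" ", "-"), value))

    # Load: a real workflow would hand these to the target system's importer;
    # here we just print the assertions that would be mapped across.
    for assertion in taxonomy + facts:
        print(assertion)
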
Example: Wikipedia Article on Organelle
Source Text of Article on Organelle in SMW+
Fact Box Summarizing the Annotations in SMW+
Ontology Browser for Test Biology Data in SMW+
Aura/SMW+ Use Case
• Semantic Wiki includes relevant knowledge
• Aura knowledge formulation engineer searches for knowledge during knowledge formulation
• The KFE notices useful information in SMW+
• The KFE maps the knowledge into Aura
  – Currently uses a derivative of Ontomap
  – Experimenting with FOAM support
  – ETL workflow
• The knowledge is translated into Aura and available for querying
AURA User Searches for Information
Aura User Notices Useful Information in Wiki
Aura User Maps Wiki Knowledge into Aura KB
Wiki Knowledge Available in Aura for Question-Answering
Conclusions
• Two strands of semantic web applications
  – Strand 1: Structured, enterprise-quality semantic data
    • Designed for powerful analytics and easier data fusion
  – Strand 2: Lightweight web-scale semantic publishing
    • A revolution in AI if we can keep the quality up
• Semantic Wikis have features from both strands
  – Easy to see how semantic wikis can leverage Strand 1 data for Strand 2 support
  – Harder to see how semantic wikis can leverage Strand 2 data for Strand 1 support
• Vulcan’s Project Halo
  – Use of SMW+ to use web-collected data in a question-answering application
  – Addresses very hard AI problems in scaling up knowledge authoring
  – Full evaluation of SMW+ and Aura in early 2009
    • Is mapping easier than authoring?
Thank You
Disclaimer: The preceding slides represent the views of the author only.
All brands, logos and products are trademarks or registered trademarks of their respective companies.