New Directions In Semantic Interoperability

+ New Directions In Semantic Interoperability
Lucian Russell, PhD
Expert Reasoning & Decisions LLC
SICoP Special Conference 2
Building Knowledgebases for CrossDomain Semantic Interoperability
April 25th, 2007
Page 1
+ While you were not watching …
• In 2000 the Intelligence Community set in motion the Advanced Research
and Development Activity (ARDA).
• Some programs had no restrictions on the ability of their researchers to
publish.
• A multi-discipline activity was started in Information Exploitation.
• One major program was the AQUAINT program, established by Dr. John
Prange.
– This program, Advanced Question Answering for Intelligence, was bold, and its
goals of advancing the State of the Art seemed overly ambitious.
– Fortunately that was not the case – the program is now in its Third Phase.
– It remains unclassified, just not widely known.
• Another major program was NIMD. It looked at a number of issues
including reasoning. Its findings are generally labeled For Official Use Only,
but fortunately one no longer is: IKRIS.
– The Interoperable Knowledge Representation for Intelligence Support is a new
extension of logic, incorporating OWL, ISO Common Logic and other features.
– It is the new features that enable a breakthrough in Semantics.
Page 2
+ Why You Should Care
• Def: Semantic Interoperability is a state of an information system artifact.
When an Artifact A is semantically interoperable, a service which
wishes to discover the meaning of data associated with the artifact can do
so precisely.
• We do not have semantic interoperability today.
– XML is a message format
– UDDI and WSDL are means for pre-agreed data descriptions to be
communicated.
– OWL is a formalism to describe IS-A relationships which includes Functions
• Semantic Interoperability requires
– Computers that understand human language
– Schema descriptions that are precise.
• Prior to Info-X we had neither
• On April 19th 2006 it became possible to develop Semantic Interoperability
• WARNING: Computers cannot detect lies and miscommunication and
cannot compensate for incorrect or intentionally ambiguous language.
Page 3
+ Barriers to Interoperability: Texts and Schemas
• Ideally an English language document describing an Artifact should suffice
to describe it for SI purposes.
– Databases could be defined clearly as to the nature and purposes of their data
elements.
– Text documents could be read by the computer and described by summaries as
well as key concepts extracted.
• Barriers:
– Human language is ambiguous
▪ Google gets around it by using the “MySpace” model for Web Pages, a social engineering
construct, plus paid placements.
▪ Lacking URLs and reference frequencies one is left with pre-culled word lists reduced to
stems whose frequency is used as a surrogate for significance.
▪ Well-meaning attempts by non-linguists to create OWL Ontologies do not get at the real
problem of correctly specifying concepts.
– The Schema Mismatch problem
▪ Database schemas use names for Entities and Attributes that are too abbreviated to be,
by themselves, of use to computers, or even to computer professionals. Data
Dictionaries, though helpful, are rarely implemented.
▪ There are syntax mismatches (e.g. SSN), and an Entity in one schema can be an Attribute
or a Value in another
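The two kinds of mismatch named above can be sketched in a few lines of Python. This is only an illustration: the record layouts, field names and the SSN value are hypothetical, invented here, and do not come from any real schema.

```python
import re
from typing import Optional

# Hypothetical records from two databases describing the same person.
# Database A stores the SSN with dashes; database B stores digits only.
record_a = {"EMP_SSN": "123-45-6789", "DEPT": "FIN"}
record_b = {"ssn_no": "123456789", "dept_fin_flag": "Y"}


def normalize_ssn(raw: str) -> str:
    """Strip everything except digits so the two syntaxes compare equal."""
    return re.sub(r"\D", "", raw)


def department_from_b(record: dict) -> Optional[str]:
    """In B the department is encoded in an attribute *name* (dept_fin_flag),
    while in A it is a *value* of the DEPT attribute."""
    for name, value in record.items():
        match = re.fullmatch(r"dept_(\w+)_flag", name)
        if match and value == "Y":
            return match.group(1).upper()
    return None


same_person = normalize_ssn(record_a["EMP_SSN"]) == normalize_ssn(record_b["ssn_no"])
same_dept = record_a["DEPT"] == department_from_b(record_b)
```

Both comparisons succeed only because a person wrote the two adapter functions by hand; nothing in the abbreviated column names tells a computer that they are needed.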
Page 4
+ This has an impact on the DRM
• Version 2.0 had three sections: Data Description, Data Context and Data
Sharing. We can now see going from three sections to two: Data Sharing and
Intelligent Awareness.
• The key is unifying descriptions in language and logic.
– The English Language is now far better understood as a formal construct; it can
be used precisely to augment Data Modeling for fixed field databases.
– A Logic formalism is now available that unifies First Order Logic (and Description
Logic), some second order logic predicates and non-monotonic reasoning.
• The new results allow for the first time the chance to make progress on the
Schema Mismatch problem which has stymied Data Sharing of fixed field
databases since 1991.
• It is now cost-effective to build a set of data artifacts describing databases
because the new tools can process them and enable a new set of Data
Sharing Services.
• A starting point is the interwoven set of data descriptions and keywords in
the Global Change Master Directory, a multi-agency index to 18 Petabytes
of data maintained at NASA.
Page 5
+ The Data Reference Model 3.0, Web 3.0 & SOAs
[Figure 3-1: DRM standardization Areas. Diagram labels: dynamic, static, Language, Logic, Data Resource Awareness Agent, Data & Information & Knowledge Repository.]
Page 6
+ The First Building Block: WordNet
• WordNet is found at (http://wordnet.princeton.edu/)
• WordNet disambiguates the English language by listing all the senses of
the most common words in English.
– Synset: a set of words that can be considered synonyms; each has a number
▪ With nouns these are generally replaceable
▪ With verbs the situation is not so precise: there may be a shade of difference
▪ All entries for a word have an associated phrase – a gloss – where it is used.
– Four parts of speech are used: nouns, verbs, adjectives and adverbs
• WordNet started in 1990 – but has EVOLVED
– Although the project remains the same, the content of the system is very different.
– When the book on WordNet was published 10 years ago there was WordNet 1.6.
– The WordNet system and database is on Release 3.0 (free download).
– All glosses consist of words that are marked up with their synset numbers.
– Category words are distinguished from instances, e.g. “Atlantic” as a noun is an
instance – the Atlantic Ocean.
– WordNet might be more aptly named Wordnets
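The synset idea can be sketched with a tiny hand-made sense inventory in Python. The identifiers, lemmas and glosses below are hypothetical stand-ins written for this example; they are not real WordNet entries, and the real database is far larger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    synset_id: str             # hypothetical identifier, not a real WordNet number
    pos: str                   # "n", "v", "a" or "r"
    lemmas: tuple              # words that share this sense
    gloss: str                 # short phrase illustrating the sense
    is_instance: bool = False  # True for instances such as "Atlantic"


# A tiny hand-made inventory, for illustration only.
SYNSETS = [
    Synset("n-001", "n", ("bank", "depository"), "a financial institution"),
    Synset("n-002", "n", ("bank", "riverside"), "sloping land beside a river"),
    Synset("v-001", "v", ("bank",), "tip an aircraft laterally in a turn"),
    Synset("n-003", "n", ("Atlantic", "Atlantic Ocean"),
           "the ocean between the Americas and Europe and Africa", True),
]


def senses(word, pos=None):
    """Return every synset containing `word`, optionally filtered by part of speech."""
    return [s for s in SYNSETS if word in s.lemmas and (pos is None or s.pos == pos)]


def synonyms(word, synset_id):
    """Words interchangeable with `word` in one specific sense."""
    for s in senses(word):
        if s.synset_id == synset_id:
            return [lemma for lemma in s.lemmas if lemma != word]
    return []
```

The same text string "bank" maps to three senses here; picking one synset identifier is what disambiguates it.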
Page 7
+ In what ways are words networked?
• There are at least 10 in WordNet 3.0
– Nouns
▪ Hypernyms are more general terms and hyponyms the more specific ones
▪ Holonyms are higher level parts and meronyms are lower level parts
▪ Telic relationships: “A chicken is a bird” vs. “A chicken is a food” – short for “A chicken is
used as a food”. The latter is a telic relationship.
▪ Synonyms
▪ Antonyms
– Verbs
▪ Hypernyms exist but there are 4 types of “hyponyms”, different aspects (in time) of
entailment.
▪ Holonyms are higher level parts and meronyms are lower level parts of a process
• Coming Soon: others, e.g. noun to verb forms
• HOWEVER:
– A NOUN IS NOT A VERB
– A VERB IS NOT A NOUN
• Nouns have inheritance properties with hypernyms that differ from the
hypernyms of verbs. Do not put a verb in an OWL-DL Ontology.
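A small Python sketch of why the noun relations above are kept in separate networks. The relation tables are hypothetical toy data built from the chicken example; the point is only that IS-A inheritance follows hypernym links and must not follow telic ("is used as") links.

```python
# Hypothetical noun relations; each link type is stored in its own table.
HYPERNYM = {"chicken": "bird", "bird": "animal", "animal": "organism"}
MERONYM = {"bird": ["wing", "beak"]}   # parts of a whole
TELIC = {"chicken": "food"}            # "is used as", not "is a"


def hypernym_chain(noun):
    """Follow IS-A links upward until no more general term is recorded."""
    chain = []
    while noun in HYPERNYM:
        noun = HYPERNYM[noun]
        chain.append(noun)
    return chain


def is_a(noun, candidate):
    """True only for taxonomic inheritance, never for telic use."""
    return candidate in hypernym_chain(noun)


def inherited_parts(noun):
    """Parts recorded for the noun itself or for any of its hypernyms."""
    parts = []
    for term in [noun] + hypernym_chain(noun):
        parts.extend(MERONYM.get(term, []))
    return parts
```

With this toy data `is_a("chicken", "bird")` holds while `is_a("chicken", "food")` does not, even though both sentences read "A chicken is a ...". Verb entailment would need its own tables and its own traversal rules, which is not attempted here.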
Page 8
+ Document Reading and AQUAINT
• Given the new WordNet and the results from AQUAINT-funded projects
documents can now be read (i.e. the content understood).
– To answer a question posed by a user a system must be able to
▪ Understand the question
▪ Determine whether it entails a number of sub-questions and, if so, identify them
▪ Read each document to find whether it has the answers
▪ Evaluate the answer from each document
▪ Combine the results
▪ Provide the user with the reasoning behind the answer
– Obviously WordNet and sources like FrameNet, VerbNet and other reference
collections must be examined and their results combined.
• The key to understanding the content is two-fold
– WordNet to distinguish word meanings of the same text string
– A markup that correctly describes relationships in TIME, because
– AQUAINT has found that there are dozens of logical forms of sentences and
distinguishing them means understanding real world temporal relationships.
– AQUAINT has funded a project that has provided a new more powerful markup
language for time.
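The question-answering steps listed on this slide can be laid out as a skeleton in Python. This is a deliberately naive sketch: keyword overlap stands in for real language understanding, the documents are invented, and nothing here reflects how any AQUAINT-funded system actually works.

```python
# Toy document collection; contents are invented for illustration.
DOCUMENTS = {
    "doc1": "WordNet 3.0 was completed in 2006.",
    "doc2": "TimeML was developed at Brandeis.",
}

STOP_WORDS = {"what", "was", "is", "the", "in", "at", "who", "when", "where"}


def decompose(question):
    """Split a compound question into sub-questions (naive split on ' and ')."""
    return [part.strip(" ?") for part in question.split(" and ")]


def keywords(text):
    """Very rough stand-in for understanding: lowercase words minus stop words."""
    return {w.strip(".,?").lower() for w in text.split()} - STOP_WORDS


def answer(question):
    """Return (sub-question, best document id, overlap score) triples.

    The list doubles as a trace of the reasoning shown to the user."""
    trace = []
    for sub in decompose(question):
        wanted = keywords(sub)
        scored = [(len(wanted & keywords(body)), doc_id)
                  for doc_id, body in DOCUMENTS.items()]
        score, doc_id = max(scored)
        trace.append((sub, doc_id if score > 0 else None, score))
    return trace


result = answer("When was WordNet completed and where was TimeML developed?")
```

Each stage (decompose, read, evaluate, combine, explain) is a separate function or step so that a stronger component, such as sense disambiguation or temporal markup, could replace the naive one.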
Page 9
+ What this means to SI
• We now have a means of creating a text document that is precise
– All the word meanings are disambiguated
– All the time relationships are correctly stated
• For previously generated text documents a correct identification of concepts
is much more likely.
• For previously generated Relational or Object databases it is now worth the
effort to describe the data precisely
– Attributes’ full relationships to the Entities can be described.
– Inter-relationships among heretofore ambiguous dates can be clearly stated.
• Language Computer Corporation has a suite of tools that are State of the
Art in realizing this capability, and it works across languages.
• In other words: the schema mismatch problem can be addressed directly –
the semantics of such databases can be precisely specified and one can
reason about the different forms of the data.
• WHY? Because of IKRIS.
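To make the point about ambiguous dates concrete, here is a minimal Python sketch that states the relationship between two date ranges explicitly. The relation labels and the column names in the comments are assumptions made for this example; they are not the TimeML vocabulary and not IKL syntax.

```python
from datetime import date


def relation(a_start, a_end, b_start, b_end):
    """Classify how interval A relates to interval B (a small set of cases)."""
    if a_end < b_start:
        return "BEFORE"
    if b_end < a_start:
        return "AFTER"
    if a_start == b_start and a_end == b_end:
        return "SIMULTANEOUS"
    if b_start <= a_start and a_end <= b_end:
        return "DURING"
    if a_start <= b_start and b_end <= a_end:
        return "INCLUDES"
    return "OVERLAPS"


# Hypothetical columns: database A records an order period,
# database B records a shipping period for the same item.
order_period = (date(2006, 1, 1), date(2006, 1, 31))
ship_period = (date(2006, 3, 1), date(2006, 3, 31))
order_vs_ship = relation(*order_period, *ship_period)
```

Two columns that are both called "DATE" become comparable once each is described as the time of a specific event and the relation between the events is stated.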
Page 10
+ What is IKRIS and why is it a Breakthrough?
• IKRIS as a project has created the IKL, the IKRIS Knowledge Language
• Using IKL one can specify
– Any construct in any version of OWL; according to Dr. Chris Welty, IKRIS Co-PI, OWL is First Order Predicate Calculus without Variables
– Any construct in First Order Logic
– Certain expressions in Second Order Logic
– CONTEXT assumptions, which allow expressions in Non-Monotonic Logic
• How has this been used? One way is to show the interoperability among
different languages that specify processes. e.g.
– CYC-L, the language used by CYCORP for its massive Ontology
– PSL, the manufacturing process language developed at NIST
– SOA-S, the proposed Ontology for IT Services
• MORE IMPORTANT: Combine Context and Process specification to create
the Contrafactual Conditional!
Page 11
+ The What?
• The Contrafactual Conditional is a logical statement that is against fact. It is
used to specify scientific laws, e.g.
– “Glass is brittle” means “Were a pane of glass to be struck by a hammer, it would
shatter.”
– Question: “Is this pane of glass brittle?” We cannot tell, because it is intact!
• Using the CONTEXT clause we make a logical assertion:
– “At Time T the pane of glass G is “” where P is the process “hit by a hammer H”.
– Conclusion: G is shattered into a set of shards {Si}
• Reasoning:
– Goal: Find a process Q which keeps the glass G intact.
– Conclusion: do not select process P as an instance of Q.
What’s Important: Any model of the Real World
that can be described in a database can be
subjected to real-world reasoning by a computer
that has the relevant collection of Real-World laws!
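The glass example can be mimicked in plain Python as a rough analogy. This is not IKL and makes no claim about its syntax or semantics: the process names and the state dictionary are hypothetical, and the "context" is simply a copy of the state in which a process is supposed to have happened.

```python
# Hypothetical real-world laws: each process maps a state to the state
# that would result if the process took place.
def hit_by_hammer(state):
    """Law: a brittle object struck by a hammer shatters."""
    new_state = dict(state)
    if state["brittle"]:
        new_state["intact"] = False
    return new_state


def wrap_in_padding(state):
    """Law: padding does not change whether the object is intact."""
    return dict(state)


PROCESSES = {"hit_by_hammer": hit_by_hammer, "wrap_in_padding": wrap_in_padding}


def safe_processes(state, goal):
    """Apply each process to a supposed copy of the state and keep those whose
    outcome satisfies the goal; the actual state is never modified."""
    return [name for name, law in PROCESSES.items() if goal(law(state))]


glass = {"name": "G", "brittle": True, "intact": True}
keeps_intact = safe_processes(glass, lambda s: s["intact"])
```

The pane G is still intact after the reasoning, yet process P ("hit by a hammer") has been ruled out as a candidate for Q, which is the conclusion drawn on the slide.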
Page 12
+ So we can fully describe database schemas and …
• The Data Dictionary text can be implemented as texts that describe the
data in a relational database as a collection of information about static
states that are the result of real world processes.
– The processes that create the data can be described accurately.
– The processes that update the data can be described accurately.
– Therefore a Service can be created that reasons about whether the Real-world
processes that created database A are suited to the needs of the creators of
database B.
• Semantic Interoperability can now be realized.
• Further, there can be a real Service Oriented Architecture, not just
application programs repackaged as web-services.
• It also means that ill-formed OWL Ontologies, ones that do not correspond
to linguistic principles, can be replaced by Knowledge Representations
where accurate OWL-DL structures are specified on the one hand and the
processes that use them can be described separately using a more powerful
representation.
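A service of the kind described on this slide, one that checks whether the processes behind database A suit the needs of database B, might look roughly like the Python sketch below. The `ProcessDescription` fields, the column names and the three checks are all assumptions chosen for illustration; a real data dictionary would carry far richer process descriptions.

```python
from dataclasses import dataclass


@dataclass
class ProcessDescription:
    """Hypothetical data-dictionary entry for how a column's values arise."""
    column: str
    measured_quantity: str
    unit: str
    update_interval_days: int


def suits(source, needed):
    """Return the reasons the source fails the consumer's needs (empty = fit)."""
    problems = []
    if source.measured_quantity != needed.measured_quantity:
        problems.append("different quantity measured")
    if source.unit != needed.unit:
        problems.append(f"unit {source.unit} != {needed.unit}")
    if source.update_interval_days > needed.update_interval_days:
        problems.append("updated too rarely")
    return problems


# Database A measures in kelvin daily; the creators of B need Celsius weekly.
db_a = ProcessDescription("TEMP", "sea surface temperature", "K", 1)
need_b = ProcessDescription("sst_c", "sea surface temperature", "C", 7)
mismatches = suits(db_a, need_b)
```

The returned list is the explanation: here the column names differ completely, but the only real obstacle is the unit.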
Page 13
+ Build knowledge bases - NOW
• There are two types of Knowledge bases that are needed for Semantic
Interoperability, linguistic and Real-World.
• CYCORP has the capabilities that are used by IKL and has them now.
– This means that there is no reason not to use CYC for building knowledge bases
because its representation can be converted by IKL to any other suitably
powerful representation.
• It also means that whatever other tools are used, the knowledge bases
created by them can be shared using IKL as a translator.
Page 14
+ In Summary
• Prior to 2006 Semantic Interoperability was stalled
– The principles of Computer Science to do the job right were not present
– As usual people did the best that they could with the tools at hand.
– Many low-level computer processes were incorrectly named as “Semantic” when
they were not. They were “gilded farthings”.
• In 2006 there were four new developments
– IKRIS was specified
– TimeML was developed by James Pustejovsky at Brandeis
– WordNet 3.0 was completed
– The AQUAINT Phase II projects to understand language were completed.
• What could only be wished for was now possible
Note: The Computer Science Breakthroughs
were paid for by your tax dollars!
There IS a role for government funding.
Page 15