The Semantic Web
Barry Smith
http://ontologist.com
The problem of ontology
human beings can integrate highly
heterogeneous information
Consider how the human mind
copes with complex phenomena in the social
realm (e.g. speech acts of promising)
which involve:
– experiences (speaking, perceiving),
– intentions,
– language,
– action (and tendencies to action),
– deontic powers, obligations, claims, authority …
– background habits,
– mental competences,
– records and representations
understanding how computers can effect
the same sort of integration
is a difficult problem
A new silver bullet
The Semantic Web
designed to integrate the vast amounts of
heterogeneous online data and services
via dramatically better support at the level
of metadata designed to yield the ability to
query and integrate across different
conceptual systems
Tim Berners-Lee, inventor of the
World Wide Web
‘sees a more powerful Web emerging, one
where documents and data will be
annotated with special codes allowing
computers to search and analyze the Web
automatically. The codes … are designed
to add meaning to the global network in
ways that make sense to computers’
hyperlinked vocabularies, called
‘ontologies’ will be used by Web authors
‘to explicitly define their words and
concepts as they post their stuff online.
‘The idea is the codes would let software
"agents" analyze the Web on our behalf,
making smart inferences that go far
beyond the simple linguistic analyses
performed by today's search engines.’
Exploiting tools such as:
XML
OWL (Web Ontology Language)
RDF (Resource Description Framework)
DAML+OIL (DARPA Agent Markup
Language + Ontology Inference Layer)
(? confusing syntactic integration with
semantic integration)
University Ontology
Person*
Employee*
Faculty
Professor
AssistantProfessor
AssociateProfessor
FullProfessor
VisitingProfessor
Lecturer
PostDoc
Assistant
ResearchAssistant
TeachingAssistant
AdministrativeStaff
Director
Chair
Dean
ClericalStaff
SystemsStaff
Student
Undergraduate
GraduateStudent
Organization*
Department
Institute
Program
ResearchGroup
School
University
Publication*
Article*
BookArticle*
ConferencePaper*
JournalArticle*
WorkshopPaper*
Book*
Periodical*
Journal*
Magazine*
Proceedings*
Thesis*
University Ontology Relations
advisor(Student, Professor)
affiliateOf(Organization, Person)*
affiliatedOrganization(Organization, Organization)*
alumnus(Organization, Person)*
containedIn(Document, Document)*
doctoralDegreeFrom(Person, University)
emailAddress(Person, .STRING)*
head(Organization, Person)*
listedCourse(Schedule, Course)
mastersDegreeFrom(Person, University)
member(SocialGroup, Person)*
University Ontology Relations (continued)
offers(University, Course)
publicationAuthor(Document, Person)*
publicationDate(Document, .DATE)*
publicationOrg(Document, Organization)*
publicationResearch(Publication, Research)
publisher(Document, Organization)*
researchInterest(Person, Research)
researchProject(ResearchGroup, Research)
subOrganizationOf(Organization:"suborganization", Organization:"superorganization")*
takesCourse(Student, Course)
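Each relation above is a typed signature: a name plus the categories its two arguments must belong to. The following is a minimal sketch of how such signatures could be checked. The subclass links in PARENT are assumptions made for illustration, since the slide lists the categories without their nesting, and the helper names are hypothetical.

```python
# Hypothetical subclass links; the slide does not show the nesting.
PARENT = {
    "Professor": "Faculty",
    "Faculty": "Employee",
    "Employee": "Person",
    "Student": "Person",
    "University": "Organization",
}

# Relation signatures copied from the slide: name -> (first, second) argument category.
SIGNATURES = {
    "advisor": ("Student", "Professor"),
    "doctoralDegreeFrom": ("Person", "University"),
    "head": ("Organization", "Person"),
}


def is_a(category, ancestor):
    """True if `category` equals `ancestor` or lies below it in PARENT."""
    while category is not None:
        if category == ancestor:
            return True
        category = PARENT.get(category)
    return False


def well_typed(relation, first_type, second_type):
    """Check a pair of argument categories against the relation's signature."""
    expected_first, expected_second = SIGNATURES[relation]
    return is_a(first_type, expected_first) and is_a(second_type, expected_second)


print(well_typed("advisor", "Student", "Professor"))   # True
print(well_typed("advisor", "Professor", "Student"))   # False
```

A check of this kind only confirms that a statement is well formed. It says nothing about whether two sites mean the same thing by "Professor".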
Defining ‘gene’
GDB: a gene is a DNA fragment that can be
transcribed and translated into a protein
Genbank: a gene is a DNA region of
biological interest with a name and that
carries a genetic trait or phenotype
Example: The Enterprise Ontology
A Sale is an agreement between two Legal-Entities
for the exchange of a Product for a Sale-Price.
A Strategy is a Plan to Achieve a high-level
Purpose.
A Market is all Sales and Potential Sales within a
scope of interest.
Example: Statements of Accounts
Company Financial statements may be
prepared under either the (US) GAAP or
the (European) IASC standards
These allocate cost items to different
categories depending on the laws of the
countries involved.
Job:
to develop an algorithm for the automatic
conversion of income statements and balance
sheets between the two systems.
Not even this relatively simple problem has been
satisfactorily resolved
… why not?
Because the very same terms mean different
things
and are applied in different ways
in different cultures
Verizon
The promise of Web Services, augmented with the
Semantic Web, is to provide THE major solution for
integration, the largest IT cost / sector, at $ 500 BN/year.
The Web Services and Semantic Web trends are
heading for a major failure (i.e., the most recent Silver
Bullet).
In reality, Web Services, as a technology, is in its infancy.
... There is no technical solution (i.e., no basis) other
than fantasy for the rest of the Web Services story.
Analyst claims of maturity and adoption (...) are already
false. ...
Verizon must understand it so as not to invest too heavily
in technologies that will fail or that will not produce a
reasonable ROI.
Dr. Michael L. Brodie, Chief Scientist, Verizon IT
OntoWeb Meeting, Innsbruck, December 16-18, 2002
Assumptions
Communication / compatibility problems
should be solved automatically
(by machine)
Hence ontologies must be applications
running in real time
Application ontology:
Ontologies are inside the computer
thus subject to severe constraints on
expressive power
(effectively the expressive power of
Description Logic)
The Semantic Web Initiative
The Web is a vast edifice of
heterogeneous data sources
Needs the ability to query and integrate
across different conceptual systems
How to resolve incompatibilities?
enforce terminological compatibility via
standardized term hierarchies, with
standardized definitions of terms, which
1. satisfy the constraints of a description logic
(DL)
2. are applied as meta-tags to the content of
websites
Clay Shirky
The Semantic Web is a machine for creating
syllogisms.
Humans are mortal
Greeks are human
Therefore, Greeks are mortal
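The syllogism above amounts to following "every X is a Y" links transitively. A minimal sketch, assuming a toy dictionary that holds the two premises from the slide (the function name is hypothetical):

```python
# Each entry reads "every <key> is a <value>"; these are the two premises.
PREMISES = {
    "Greek": "human",
    "human": "mortal",
}


def entails(subject, predicate, premises=PREMISES):
    """Follow 'every X is a Y' links from subject; True if predicate is reached."""
    seen = set()
    current = subject
    while current in premises and current not in seen:
        seen.add(current)
        current = premises[current]
        if current == predicate:
            return True
    return False


print(entails("Greek", "mortal"))   # True
print(entails("mortal", "Greek"))   # False
```

The inference works only because both premises are stated without exceptions, which is the condition Shirky argues real-world data rarely meets.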
Lewis Carroll
- No interesting poems are unpopular among
people of real taste
- No modern poetry is free from affectation
- All your poems are on the subject of soap-bubbles
- No affected poetry is popular among people of
real taste
- No ancient poetry is on the subject of soap-bubbles
Therefore: All your poems are bad.
the promise of the Semantic Web
it will improve all the areas of your life where
you currently use syllogisms
most of the data we use is not
amenable to recombination in
syllogistic form
because it is partial, inconclusive, context-sensitive
So we guess, extrapolate, intuit, we do what
we did last time, we do what we think our
friends would do … but we almost never
use syllogistic logic.
We Describe the World in
Generalities
People who live in Brooklyn speak with a
Brooklyn accent
People who live in France speak French
Merging Databases
Merging databases simply becomes a matter of
recording in RDF somewhere that "Person
Name" in your database is equivalent to "Name"
in my database, and then throwing all of the
information together and getting a processor to
think about it. [http://infomesh.net/2001/swintro/]
Is your "Person Name = John Smith" the same
person as my "Name = John Q. Smith"? Who
knows? Not the Semantic Web
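The gap between matching field names and matching individuals can be shown in a few lines. This is a sketch under stated assumptions: the two one-record databases and the helper functions are made up for illustration, and a plain dictionary stands in for the RDF equivalence statement.

```python
# Toy records built from the names on the slide.
your_db = [{"Person Name": "John Smith"}]
my_db = [{"Name": "John Q. Smith"}]

# Stand-in for the RDF statement that the two field names are equivalent.
FIELD_EQUIVALENCE = {"Person Name": "Name"}


def normalise(record, mapping):
    """Rename fields according to the equivalence mapping."""
    return {mapping.get(field, field): value for field, value in record.items()}


def merge(*databases, mapping=FIELD_EQUIVALENCE):
    """Throw all records together under one shared field vocabulary."""
    merged = []
    for database in databases:
        for record in database:
            merged.append(normalise(record, mapping))
    return merged


merged = merge(your_db, my_db)
distinct_names = {record["Name"] for record in merged}
print(merged)               # [{'Name': 'John Smith'}, {'Name': 'John Q. Smith'}]
print(len(distinct_names))  # 2
```

The field names now line up, but nothing in the mapping decides whether the two records describe one person or two.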
XML-syntax does not help
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>+32(0)3.471.99.60</TEL>
<FAX>+32(0)3.891.99.65</FAX>
<GSM>+32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>
<ZIP>2630</ZIP>
<CITY>Aartselaar</CITY>
<COUNTRY>Belgium</COUNTRY>
</ADDRESS>
</BUSINESS-CARD>
and with correct XML-syntax:
<BUSINESS-CARD>
<FIRSTNAME>Jules</FIRSTNAME>
<LASTNAME>Deryck</LASTNAME>
<COMPANY>Newco</COMPANY>
<MEMBEROF>XTC Group</MEMBEROF>
<JOBTITLE>Business Manager</JOBTITLE>
<TEL>+32(0)3.471.99.60</TEL>
<FAX>+32(0)3.891.99.65</FAX>
<GSM>+32(0)465.23.04.34</GSM>
<WEBSITE>www.newco.com</WEBSITE>
<ADDRESS>
<STREET>Dendersesteenweg 17</STREET>
<ZIP>2630</ZIP>
<CITY>Aartselaar</CITY>
<COUNTRY>Belgium</COUNTRY>
</ADDRESS>
</BUSINESS-CARD>
Is "Jules" the first name of the person, or of the business-card?
Is Jules or Newco the member of XTC Group?
Do the phone numbers and address belong to Jules or to the business?
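A standard parser accepts the card without complaint, which makes the point concrete: well-formed syntax yields a tree of nested tags and nothing more. The sketch below uses Python's standard xml.etree.ElementTree module on an abridged copy of the card (several fields are left out to keep it short).

```python
import xml.etree.ElementTree as ET

# Abridged copy of the business card from the slide.
CARD = """
<BUSINESS-CARD>
  <FIRSTNAME>Jules</FIRSTNAME>
  <LASTNAME>Deryck</LASTNAME>
  <COMPANY>Newco</COMPANY>
  <MEMBEROF>XTC Group</MEMBEROF>
  <ADDRESS>
    <CITY>Aartselaar</CITY>
  </ADDRESS>
</BUSINESS-CARD>
"""

root = ET.fromstring(CARD)

# Everything the parser knows: which tags are children of which.
children = [child.tag for child in root]
print(root.tag)   # BUSINESS-CARD
print(children)   # ['FIRSTNAME', 'LASTNAME', 'COMPANY', 'MEMBEROF', 'ADDRESS']

# FIRSTNAME, COMPANY and MEMBEROF are siblings, so the tree alone cannot say
# whether MEMBEROF describes Jules or Newco.
member_of = root.find("MEMBEROF")
print(member_of.text)  # XTC Group
```

Answering the three questions requires knowledge of what the tags are meant to relate, and that knowledge is not in the markup.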
Metadata: the new Silver Bullet
agree on a metadata standard for
washing machines as concerns
size, price, etc.
create machine-readable
databases and put them on the net
consumers can query multiple
sites simultaneously
and search for highly specific,
reliable, context-sensitive results
Shirky:
The Semantic Web's philosophical
argument -- the world should make more
sense than it does -- is hard to argue with.
The Semantic Web, with its neat
ontologies and its syllogistic logic, is a nice
vision. However, like many visions that
project future benefits but ignore present
costs, it requires too much coordination
and too much energy to be effective in the
real world …
Shirky
Much of the proposed value of the Semantic
Web is coming, but it is not coming
because of the Semantic Web. The
amount of meta-data we generate is
increasing dramatically, and it is being
exposed for consumption by machines as
well as, or instead of, people. But it is
being designed a bit at a time, out of self-interest
and without regard for global ontology.
Semantic Web effort
thus far devoted primarily to developing
systems for standardized representation of
web pages and web processes
(= ontology of web typography)
not to the harder task of developing
ontologies (term hierarchies) for the
content of such web pages
Cory Doctorow
A world of exhaustive, reliable
metadata would be a utopia.
Problem 1: People lie
Meta-utopia is a world of reliable
metadata.
But poisoning the well can confer benefits
to the poisoners
Metadata exists in a competitive world.
Some people are crooks.
Some people are cranks.
Some people are French philosophers.
Practical problems
of the semantic web:
who will police the coding?
Problem 2: People are lazy
Half the pages on Geocities are called
“Please title this page”
Problem 3: People are stupid
The vast majority of the Internet's users
(even those who are native speakers of
English)
cannot spell or punctuate
Will internet users learn to accurately tag
their information with whatever DL-hierarchy
they're supposed to be using?
Problem 4: Multiple descriptions
“Requiring everyone to use the
same vocabulary denudes the
cognitive landscape, enforces
homogeneity in ideas.”
(Cory Doctorow)
Problem 5: Ontology Impedance
= semantic mismatch between ontologies
being merged
This problem recognized in Semantic Web
literature:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
Solution 1:
treat it as (inevitable)
‘impedance’
and learn to find ways to cope with the
disturbance which it brings
Suggested here:
http://ontoweb.aifb.uni-karlsruhe.de/About/Deliverables/ontoweb-del-7.6-swws1.pdf
Solution 2: resolve the impedance
problem on a case-by-case basis
Suppose two databases are put on the
web.
Someone notices that "where" in the
friends table and "zip" in the places table
mean the same thing.
http://www.w3.org/DesignIssues/Semantic.html
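The case-by-case fix described here is, in effect, a hand-written join condition. A minimal sketch follows, with invented rows for the two tables (only the table names and the "where" and "zip" columns come from the slide; the helper function is hypothetical):

```python
# Made-up rows for illustration.
friends = [
    {"name": "Ann", "where": "14260"},
    {"name": "Bob", "where": "3010"},
]
places = [
    {"zip": "14260", "city": "Buffalo"},
]


def join_on(left, left_key, right, right_key):
    """Join two lists of rows where left[left_key] equals right[right_key]."""
    index = {row[right_key]: row for row in right}
    return [
        {**left_row, **index[left_row[left_key]]}
        for left_row in left
        if left_row[left_key] in index
    ]


# The human observation "where means the same as zip" becomes the join keys.
joined = join_on(friends, "where", places, "zip")
print(joined)
```

The join itself is mechanical. The step that cannot be automated is the one where someone notices that the two columns mean the same thing.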
We can use the Semantic Web
to prove that Joe loves Mary
we found two documents on a trusted site, one of which
said that ":Joe :loves :MJS", and another of which said
that ":MJS daml:equivalentTo :Mary". We also got the
checksums of the files in person from the maintainer of
the site.
To check this information, we can list the checksums in a
local file, and then set up some FOPL rules that say "if
file 'a' contains the information Joe loves mary and has
the checksum md5:0qrhf8q3hfh, then record SuccessA",
"if file 'b' contains the information MJS is equivalent to
Mary, and has the checksum md5:0892t925h, then
record SuccessB", and "if SuccessA and SuccessB, then
Joe loves Mary". [http://infomesh.net/2001/swintro/]
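The quoted procedure can be sketched with Python's standard hashlib module. The assumptions are these: the two documents are held as in-memory byte strings, the first-order rules are modelled as ordinary Python conditions, and the trusted checksums are computed inside the sketch, because the values in the quotation are placeholders rather than real MD5 digests.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


# Stand-ins for the two documents found on the trusted site.
file_a = b":Joe :loves :MJS ."
file_b = b":MJS daml:equivalentTo :Mary ."

# Checksums obtained "in person from the maintainer"; computed here so the
# sketch is self-contained.
trusted = {"a": md5_hex(file_a), "b": md5_hex(file_b)}


def joe_loves_mary(doc_a: bytes, doc_b: bytes, checksums: dict) -> bool:
    """Record SuccessA and SuccessB, then conclude only if both hold."""
    success_a = (b":Joe :loves :MJS" in doc_a
                 and md5_hex(doc_a) == checksums["a"])
    success_b = (b":MJS daml:equivalentTo :Mary" in doc_b
                 and md5_hex(doc_b) == checksums["b"])
    return success_a and success_b


print(joe_loves_mary(file_a, file_b, trusted))          # True
print(joe_loves_mary(file_a + b" ", file_b, trusted))   # False
```

What the check establishes is that the files are unaltered and that the statements chain together. Whether the statements are true is left to whoever wrote them.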
Both solutions fail
1. treating mismatches as ‘impedance’
ignores the problem of error propagation
(and is inappropriate in an area like
medicine)
2. resolving impedance on a case-by-case
basis defeats the very purpose of the
Semantic Web
Clinicians
often do not use category systems at all –
they use unstructured text
from which usable data has to be extracted
in a further step
Why?
Because every case is different, much
patient data is context-dependent
Problem 5: Ontology Impedance
= semantic mismatch between ontologies
‘gene’ used in websites issued by
biotech companies involved in gene
patenting
medical researchers interested in role of
genes in predisposition to smoking
insurance companies
Other problems with DL-based
ontologies
DL poor when dealing with context-dependent information/usages of terms
e.g. Severe Acute Respiratory Syndrome
and when it comes to dealing with time
and when it comes to dealing with
information about instances (rather than
concepts or classes)
SARS
is NOT
Severe Acute Respiratory Syndrome
it is THIS collection of instances of
Severe Acute Respiratory Syndrome
associated with THIS coronavirus and ITS
mutations
Experience shows
that there can be no mechanical solution
to the problems of data integration
in domains like medicine or genetics,
or in the domain of really existing
commercial transactions
The problem in every case
is one of finding an overarching framework
for good definitions,
definitions which will be adequate to the
nuances of the domain under investigation
For DL
Ontologies are software tools
thus limited in their expressive power
and in their effectiveness as quality
controls
IFOMIS idea:
distinguish two separate tasks:
- developing computer applications capable
of running in real time
- developing an expressively rich ontology of
a sort which will allow sophisticated quality
control
Problem 4: Multiple descriptions
Requiring everyone to use the
same vocabulary to describe their
material is not always practicable
and this is especially so in the
medical domain
Basic Formal Ontology
BFO
The Vampire Slayer
BFO
ontology not the ‘standardization’ or
‘specification’ of concepts
(not a branch of knowledge or concept
engineering)
but an inventory of the types of entities
existing in reality
BFO goal:
to remove ontological
impedance by constraining
terminology systems with good
ontology
BFO not a computer application
but a reference ontology
in the sense of Aristotelian philosophy
-- it sacrifices tractability for the sake of
expressive power
Defining ‘gene’
GDB: a gene is a DNA fragment that can be
transcribed and translated into a protein
Genbank: a gene is a DNA region of
biological interest with a name and that
carries a genetic trait or phenotype
Ontology
‘fragment’, ‘region’, ‘name’, ‘carry’, ‘trait’,
‘type’
... ‘part’, ‘whole’, ‘function’, ‘inhere’,
‘substance’ …
are ontological terms in the sense of
traditional (philosophical) ontology
Two basic BFO oppositions
Granularity
(of molecules, genes, cells, organs,
organisms ...)
SNAP vs. SPAN
getting time right of crucial importance for
medical informatics
MedO: medical domain ontology
theory of granularity relations
between
– molecule ontology
– gene ontology
– cell ontology
– anatomical ontology
– etc.
Will serve as basis for new, validated Medical
WordNet
BFO
not just a system of categories
but a formal theory
with definitions, axioms, theorems
designed to provide formal resources for the
building of reference ontologies for specific
domains
the latter should be of sufficient richness that
terminological incompatibilities can be
resolved intelligently rather than by brute
force
The Reference Ontology
Community
IFOMIS (Leipzig)
Laboratories for Applied Ontology (Trento/Rome,
Turin)
Foundational Ontology Project (Leeds)
Ontology Works (Baltimore)
Ontek Corporation (Buffalo/Leeds)
Language and Computing (L&C)
(Belgium/Philadelphia)
Domains of Current Work
IFOMIS Leipzig: Medicine, Bioinformatics
Laboratories for Applied Ontology
Trento/Rome: Ontology of Cognition/Language
Turin: Law
Foundational Ontology Project: Space, Physics
Ontology Works: Genetics, Molecular Biology
Ontek Corporation: Biological Systematics
Language and Computing: Natural Language
Understanding
MOG (Melbourne Ontology Group)(?)
The End