Peer-Data Management Systems: Plumbing for ... Alon Y. Halevy

From: AAAI Technical Report SS-02-06. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved.
Peer-Data
Management Systems:
Plumbing for the Semantic Web
Alon Y. Halevy
University of Washington
alon @cs.washington.edu
The vision of the SemanticWeb[Berners-Leeet al.2001 ]
offers exciting challengesfor research in Artificial Intelligence. In a nutshell, the SemanticWebenvisions agents coordinating complextasks over the Intemet. The coordination
is facilitated by annotating the data and services on a web
destination with rich ontologies, henceenablingsemanticinteroperation.
In this talk I will discusssomeof the challengesraised by
the SemanticWeband highlight somedangerouspitfalls. I
will illustrate someof the challengesusing the Piazza PeerData Management
System[Gribble et a/.2001] (PDMS)being developedat the University of Washington.
At a highlevel, the Piazzaproject seeks to facilitate widescale data sharing amongcooperating communities.By data
sharing, we meanthat participants can export and share
schemas,base data, and data views constructed through integration over multiple distributed data sources. Topologically, we imagine the cooperating communitiesto be peeroriented, self-organizingstructures. That is, weassumethat
(!) each participant is both a client and a server, consuming and producingcomponentsof a distributed information
database, and that (2) participants join and leave the communities at will. The goal of our research is to design, implement, and deploythe infrastructure on whichsuch datasharing communitiescan form and grow.
The Piazza vision is the natural next beyondinformation
integration systems. Suchsystems permit queries over distributed heterogeneousdata sources, but they (1) assumethat
a set of data sourcesexist andare definedapriori, (2) present
a uniforminterface to the data via a mediator,and (3) employ
a single logical mediatedschemaat the mediatorto specify
the semantic mappingsbetweenthe mediated schemaand the
schemaof the data stored in the sources. In contrast, in Piazza all peers are equal, so a single logical mediatedschema
is undesirable, impractical, and perhapsevenimpossibleto
build.
Someof the challenges we face in Piazza include: (I)
managingschemaand data heterogeneity, (2) intelligent
placementof data on peers in the systemto improveperformanceand availability, (3) efficient propagationof updates
the data, and (4) storage andindexingof descriptionsof peer
contents. Weargue that a PDMS
is necessaryinfrastructure
Copyright(~) 2002,American
Associationfor Artificial Intelligence(www.aaai.org).
All rights reserved.
26
for building a robust SemanticWeb.
A crucial elementof a PDMS
is the ability to mapbetween
different modelsof the sameor related domains.A mapping
is set of formulaethat providethe semanticrelationships betweenthe conceptsin the models. It is rare that a global
ontology or schemacan be developedfor such a PDMS.In
practice, multiple ontologies and schemaswill be developed
by independententities, and coordination will require mapping betweenthe different models. There will always be
morethan one representation of any domainof discourse.
Hence,weargue that if knowledge
and data are to be shared,
the problemof mappingbetweenmodelsis as fundamental
as modelingitself.
Currently, whenevermappingsare needed (e.g., in information integration systems) they are provided manually in a labor-intensive and error-prone process, which
is a major bottleneck to scaling up systems to a
large number of sources. I will describe some recent
work on using Machine Learning techniques for semiautomatically constructing mappingsbetweendomainmodels. Thesetechniqueshavebeenappliedinitially to relational
schema [Doan, Domingos,& Halevy2001]and recently to
mappingbetweentaxonomies[Doanet al.2002].
References
[Berners-Leeet aL2001]Berners-Lee,T.; Hendler,J.; Lassila, O.; and Web,S. 2001. The semanticweb.Scientific
American.
[Doanetal.2002] Doan, A.; Madhavan,J.; Domingos,P.;
and Halevy, A. 2002. Learningto mapbetweenontologies
on the semantic web.In Proc. of the Int. WWW
Conf.
[Doan, Domingos,&Halevy2001]Doan, A.; Domingos,P.;
and Halevy, A. 2001. Reconciling schemasof disparate
data sources: a machinelearning approach. In Proc. of
SIGMOD.
[Gribble etal.2001] Gribble, S.; Halevy,A.; Ires, Z.; Rodrig, M.; and Suciu, D. 2001. Whatcan databases do for
peer-to-peer? In ACMSIGMOD
WebDBWorkshop2001.