From: AAAI Technical Report SS-02-06. Compilation copyright © 2002, AAAI (www.aaai.org). All rights reserved. Peer-Data Management Systems: Plumbing for the Semantic Web Alon Y. Halevy University of Washington alon @cs.washington.edu The vision of the SemanticWeb[Berners-Leeet al.2001 ] offers exciting challengesfor research in Artificial Intelligence. In a nutshell, the SemanticWebenvisions agents coordinating complextasks over the Intemet. The coordination is facilitated by annotating the data and services on a web destination with rich ontologies, henceenablingsemanticinteroperation. In this talk I will discusssomeof the challengesraised by the SemanticWeband highlight somedangerouspitfalls. I will illustrate someof the challengesusing the Piazza PeerData Management System[Gribble et a/.2001] (PDMS)being developedat the University of Washington. At a highlevel, the Piazzaproject seeks to facilitate widescale data sharing amongcooperating communities.By data sharing, we meanthat participants can export and share schemas,base data, and data views constructed through integration over multiple distributed data sources. Topologically, we imagine the cooperating communitiesto be peeroriented, self-organizingstructures. That is, weassumethat (!) each participant is both a client and a server, consuming and producingcomponentsof a distributed information database, and that (2) participants join and leave the communities at will. The goal of our research is to design, implement, and deploythe infrastructure on whichsuch datasharing communitiescan form and grow. The Piazza vision is the natural next beyondinformation integration systems. Suchsystems permit queries over distributed heterogeneousdata sources, but they (1) assumethat a set of data sourcesexist andare definedapriori, (2) present a uniforminterface to the data via a mediator,and (3) employ a single logical mediatedschemaat the mediatorto specify the semantic mappingsbetweenthe mediated schemaand the schemaof the data stored in the sources. In contrast, in Piazza all peers are equal, so a single logical mediatedschema is undesirable, impractical, and perhapsevenimpossibleto build. Someof the challenges we face in Piazza include: (I) managingschemaand data heterogeneity, (2) intelligent placementof data on peers in the systemto improveperformanceand availability, (3) efficient propagationof updates the data, and (4) storage andindexingof descriptionsof peer contents. Weargue that a PDMS is necessaryinfrastructure Copyright(~) 2002,American Associationfor Artificial Intelligence(www.aaai.org). All rights reserved. 26 for building a robust SemanticWeb. A crucial elementof a PDMS is the ability to mapbetween different modelsof the sameor related domains.A mapping is set of formulaethat providethe semanticrelationships betweenthe conceptsin the models. It is rare that a global ontology or schemacan be developedfor such a PDMS.In practice, multiple ontologies and schemaswill be developed by independententities, and coordination will require mapping betweenthe different models. There will always be morethan one representation of any domainof discourse. Hence,weargue that if knowledge and data are to be shared, the problemof mappingbetweenmodelsis as fundamental as modelingitself. Currently, whenevermappingsare needed (e.g., in information integration systems) they are provided manually in a labor-intensive and error-prone process, which is a major bottleneck to scaling up systems to a large number of sources. I will describe some recent work on using Machine Learning techniques for semiautomatically constructing mappingsbetweendomainmodels. Thesetechniqueshavebeenappliedinitially to relational schema [Doan, Domingos,& Halevy2001]and recently to mappingbetweentaxonomies[Doanet al.2002]. References [Berners-Leeet aL2001]Berners-Lee,T.; Hendler,J.; Lassila, O.; and Web,S. 2001. The semanticweb.Scientific American. [Doanetal.2002] Doan, A.; Madhavan,J.; Domingos,P.; and Halevy, A. 2002. Learningto mapbetweenontologies on the semantic web.In Proc. of the Int. WWW Conf. [Doan, Domingos,&Halevy2001]Doan, A.; Domingos,P.; and Halevy, A. 2001. Reconciling schemasof disparate data sources: a machinelearning approach. In Proc. of SIGMOD. [Gribble etal.2001] Gribble, S.; Halevy,A.; Ires, Z.; Rodrig, M.; and Suciu, D. 2001. Whatcan databases do for peer-to-peer? In ACMSIGMOD WebDBWorkshop2001.