Recap @ Science on the Semantic Web, Rutgers, October 2002

Invitational Workshop on Database and Information Systems Research for Semantic Web and Enterprises
Amit Sheth & Robert Meersman

NSF Information & Data Management PI’s Workshop
Amit Sheth & Isabel Cruz

“Ask not what the Semantic Web can do for you, ask what you can do for the Semantic Web”
Hans-Georg Stork, European Union

http://lsdis.cs.uga.edu/SemNSF

Context for Amicalola workshop
• Series of workshops and upcoming conferences: Lisbon (9/00), Hong Kong (5/01), Palo Alto (7/01), Amsterdam (12/01); since then WWW2002/ISWC
  – Observation: visible lack of DB/IS involvement
• “Semantic Web – The Road Ahead” [Decker, Hans-Georg Stork, Sheth, … SemWeb’2001 at WWW10, Hong Kong, May 1, 2001]
• Semantic Web: Rehash or Research Goldmine [Fensel, Mylopoulos, Meersman, Sheth (Chair), CoopIS’01]
• At Castel Pergine, Italy

Semantics & IDM – Brief History (partial)
• Semantic Data Modeling
  M. Hammer and D. McLeod: "The Semantic Data Model: A Modelling Mechanism for Data Base Applications"; Proc. ACM SIGMOD, 1978.
• Conceptual Modeling
  Michael Brodie, John Mylopoulos, and Joachim W. Schmidt. On Conceptual Modeling. Springer Verlag, New York, NY, 1984. A series of preceding workshops.
• Data Semantics: What, Where and How?
  - "Database Semantics", R.A. Meersman and T.B. Steel (eds), Proceedings of the IFIP DS-1 Conference, North-Holland (1985).
  - So Far (Schematically) yet So Near (Semantically) – Sheth, Keynote at DS-5
  - Meersman, Navathe, Rosenthal, Sheth (Chair); IFIP DS-6 Panel
• Semantic Interoperability on Web: many projects in 90s
  – 1994 CIKM paper on Semantic Information Brokering talked about query processing in a multi-ontology environment
• Domain Modeling, Metadata, Context, Ontologies, Semantic Interoperability, Semantics in Schema Integration, Semantic Information Brokering, Spatio-temporal-geographic-image-video-multimodal semantics
• All these involving Semantics, Databases, IS and even Web – before “Semantic Web” term is coined

Challenges – unique role of IDM
SCALE and PERFORMANCE
Acceptable computation (query/analysis) time when you have millions and billions of instances (documents, digital content) and metadata (annotation)
• locking for sharing/storage management
• Semantic similarity, mappings, interoperability (schema transformation/integration aka ontology mismatch)
• indexing for expediting computations
• workflow for Web Services-based processes

Organization/Output
• 20+ senior researchers/practitioners
• 2.5 days in Georgia Mountains
• Proceedings of position papers (also talks)
• Three workgroups: Application Pull (Brodie/Dayal), Ontology (Decker/Kashyap) and Web Services (Fensel/Singh)
• <SWIS WG at IDM PI’s meeting>
• Review at OntoWeb3 Panel
• Final Report
• SIGMOD Record special issue December 2002
Everything is at lsdis.cs.uga.edu/SemNSF/

Participants
Karl Aberer, LSIR, EPFL, Switzerland
Mike Brodie, Verizon
Isabel Cruz, The University of Illinois at Chicago
Umeshwar Dayal, Hewlett-Packard Labs
Stefan Decker, Stanford University
Max Egenhofer, University of Maine
Dieter Fensel, Vrije Universiteit Amsterdam
William Grosky, University of Michigan-Dearborn
Michael Huhns, University of South Carolina
Ramesh Jain, UC-San Diego, and Praja
Yahiko Kambayashi, Kyoto University
Vipul Kashyap, National Library of Medicine
Ling Liu, Georgia Institute of Technology
Frank Manola, The MITRE Corporation
Robert Meersman, Vrije Universiteit Brussel (VUB)
Amit Sheth, University of Georgia and Voquette
Munindar Singh, North Carolina State University
George Stork, EU
Rudi Studer, AIFB Universität Karlsruhe
Bhavani Thuraisingham, NSF-CISE-IIS
Michael Uschold, The Boeing Company

Medical metaphor
• Ontologies: anatomy
• Processes: physiology
• Applications: pathology

Application Pull …Agenda
• Premises
  – Every resource meaningfully available
  – Current & Planned Web Services
  – Beneficiaries and Requirements
• Potential Semantic Services
  – B2B, C2C, Intra-Enterprise
  – Example Semantic Web Services
• Challenges / Questions / Concepts
• What the Semantic Web Will Look Like

Application Pull …Scenarios
• Scenarios
  – Tax preparation (Individual)
  – Supply Chain (B2B)
  – Scientific Research
• Semantics will be added at three different levels in successive phases
  – Information
  – Transactions
  – Collaborations

Application Pull …Benefits / Requirements
• Lowering barriers to entry
  – Costs
  – Entrants
    • Consumers
    • Service providers
• Dynamic
  – Ability to adjust to rapidly changing circumstances
• Continuous
  – Continuous activity (e.g., taxes, financial activity) monitoring
  – Event Detection
  – Do taxes anytime, anywhere
• X-Internet
  – Executable
  – Extended
• Improved
  – Transparency
  – Timeliness
  – Accuracy
  – Optimization
  – Eliminate mundane tasks
• Additional services
• Reliability and trust
• Archiving
  – Data
  – Meta-data
  – Transaction histories

Application Pull …Challenges
• Upper ontologies
  – Entities
    • Personal
    • Organizations
  – Activities / Events
  – Processes
• Ontologies
  – Products
  – Services
  – Financial contracts
  – Business objects
  – Tax laws (all agencies)
  – Financial activities
  – Service providers
  – Financial planning
  – Supply chain processes
  – Activities (to be monitored)
• Ontology activities
  – Search
  – Select
  – Create, refine
  – Maintain, version
    • Local
    • Shared
    • Global
  – Mapping
• Ontology-based activities
  – Accountability
    • Arbitration
    • Trust
    • Tracing
• Engineering
  – Managing ontologies and mappings
  – Scalability, robustness,

[Figure: ontology lifecycle diagram. Labels: Ontology Search; Compare/Similarity; Requirements/Analysis; Ontology Learning; Merge/Refine/Assemble; Evaluation; Maintenance; Versioning; Creation/Change; Consistency Checking; Deployment (e.g., Hypothesis Generation, Query)]

DB Research in the Ontology LifeCycle
• Operations to compare Models/Ontologies
• Scalability/Storage Indexing of Ontologies
  – DB approaches data model specific
  – Need to support graph based data models
• Temporal Query Languages
Lots of work in Schema Integration/translation

Ontology WG: DB Research in the Ontology LifeCycle II
• Schema Mapping
  – Meta Model specific
  – Representation of exceptions, e.g., Tweety
  – Specification of Inexact Schema Correspondences
    • E.g., 40% of animals are 30% of humans
• Meta Model Transformations/Mappings (e.g., UML to RDF Schema)

Ontology WG: DB Research in the Ontology LifeCycle III
• Ontology Versioning
  – Collaborative editing
  – Meta Model specific versioning
  – Version of Schema/Meta Model Transformations

Ontology WG: DB Research & Semantic Interoperation
• Inference vs. Query Rewriting/Processing for Semantic Integration:
  • E.g., RichPerson = (AND Person (> Salary 100))
  • Can Query Processing/Concept Rewriting provide the same functionality as inferences? More efficiently?
• Distributed Inferences and Loss of Information
• Query Languages for combining metadata and data queries
• Graph-based data models and query languages
• Schema Correspondences/Mappings
• Intensional Answers (answers are descriptions, e.g. (AND Person (> Salary 100)), instead of a list of all rich people)
• Semantic Associations (identification of meaningful relationships between different documents and entities)

[Figure: Semantic WS scope chart. Labels: Semantic Index; Semantic WS Scope; Worth pursuing; Std; Program; All; Formally self-described; currency.com; Amazon; html; Self-described; Hard code; People]

Mike’s Humor
• Services vs. Ontologies
“Well done is better than well said.” Ben Franklin

Research Issues
• Environment: Scalable, openness, autonomy, heterogeneity, evolving
• Representation: Self-description, conversation, contracts, commitments, QoS
• Programming: Compose & customize, workflow, negotiation
• Interaction (system): Trust, security, compliance
• Architecture: P2P, privacy,
• Utilities: Discovery, binding, trust service

SWS – Fitting in and expanding IS/DB/DM: Or why Bhavani & George should care?
Data => services, similar yet more challenging:
  – Modeling <functional and operational>
  – Organizing collections
  – Discovery and comparison (reputation)
  – Distribution and replication
  – Access and fuse (composition)
  – Fulfillment
• Contracts, coordination versus transactions
• Quality: more general than correctness or precision
• Compliance – Dynamic, flexible information security and trust.

Research Issues
• Conversational (state-based, event-based, history-based)
• Interoperability of conversational services – compose, translate,
• Representations for services: programmatic self-description
• Commitments, contracts, negotiation, compliance, cooperation
• Discovery, location, binding
• Transactional workflow: rollback, roll-forward, semantic exception handling, recovery
• Trustworthy service (discovery, provisioning, composition, description)
• Security; privacy vs. personalization
• Quality-of-Service, w.r.t.
various aspects, negotiable

DB / IS subcommunity | How is it relevant to research on the SW | How may the SW stimulate research in this community
DB theory | Type theory, Complexity, theory of concurrency | Ontology axiomatics and theory; formal semantics; semantics for incomplete, inconsistent and evolving representations
Data(base) semantics | Everything; in particular ontology language development; constraints; data structures | Ontology modeling; formal semantics of web services
Normalization/design | Not specifically as such; some work on Non-First Normal Form | Requirement for formal properties for ontology organization; perhaps ontology design guidelines or “semantic normal forms”; conflict resolution; redundancy checks in general
Data modeling | reuse/extend/map DM formalisms, techniques and methods e.g. EER, ORM, UML for ontology (content) specification and design | semantic data modeling; ontology content creation techniques and methods; complex ontological relationships; domain models
View integration | Ontology alignment, translation, object identities, updateable views…; model mappings | see Federated DBs; ontology support for view and application integration; ontology composition and update
Schema integration | apply to autonomously designed schemas; global schemas as pre-ontologies? conflict detection | Ontology alignment; new kinds of models will pose new kinds of problems
Deductive DB/Datalog | Learn from its failure, processing and F-logic | how to handle different complexity levels efficiently
Multimedia DB | Image ontologies; semantic indexing; similarity-based search | Image-based ontologies?
Temporal/Spatial DB GIS semantics and archiving; histories data management; requirement to model temporal knowledge as first class citizen in ontologies; spatial, temporal modeling in upper ontologies; versioning of GIS becomes critical issue Document DB Digital libraries, unstructured data; standards for digital library resource descriptions to be used on the SW Lack of a priori global model presents a research challenge OO DB Object-oriented and object-based models for ontologies, extensible databases; modeling of object behavior; build OODB into Java management of large collections of object-, behavior- and resource identifiers Visual DB Visualization for the SW, queries; ontology visualization semantic upgrades of image databases to be used as visual ontologies query visual XML/Web DB Most relevant, caching Size and semantics; XML shortcomings for semantics definition Distributed DB everything Constraint DB Constraint enforcement as semantics mechanism; semantics-based query processing loosening of ACID properties trust/privacy/compliance issues in distributed DBMS; design/dynamic tailoring of DDBMS underlying web services Non-closed world assumption issues Transaction modeling Transaction processing limits of what can/must be transactional Mobile DB not directly; “mobile” platform issue Main memory DB Semantic caching is a Web services, Extended distributed transaction models; non-CWA issues; smart user profiling ACID properties of Web services; semantic support for very long transactions context-aware computing; device location-independent semantics; mobility issues raised/enabled by the (Semantic) Web possibly semantic caching i.e. using application semantics or context Parallel DB unclear at present; straightforward reuse/apply (e.g. parallel queries, transactions, …) in certain niches DB machines Not clear at present Web SoA; parallel architectures for ontology servers?
Not clear at present Web SoA DB security A lot, e.g., access control trust and privacy, QoS; dynamically changing and conflicting security requirements Federated DB Autonomy; approaches for integrating heterogeneous data sources, in particular web information sources; mediator/wrapper-based architectures www = huge federated DB; develop more powerful (scalable) approaches for ontology alignment and integration; heterogeneous sources may have different credibility; service composition Query processing high applicability; e.g. “smart” query enhancement Query optimization high applicability; e.g. use domain knowledge to optimize query execution and rewriting Information retrieval broad applicability of techniques and theory; DB interoperability DB versioning Everything; esp. see federated DBs; see schema integration Semantic aspects of interoperability; see federated DBs; quality of interoperation Link maintenance; versioning Annotations, ontology versioning of instance data modeling, Annotations, versioning modeling, ontology Metadata ontology Mediation/Middleware Web services will benefit P2P, collaboration, new mediating components DB warehousing DW architectures for decision support; improve e.g.
web service efficiency; see the (S)Web as a giant DW web mining; clustering; learning; information extraction profiles Smart data warehousing; share/compose application semantics; ontology behind “real” data DBMS (components) as web service(s); add semantics to every function/module in a DBMS’s architectures Ontology support in data dictionaries; new, more flexible DB architectures for better SW support and processing on the web Data(base) mining Database architectures and DBMS market for mining from text; exploit semantics in mining; derive semantics inductively from query results on “real” data including exceptions; machine learning Web-IS architectures fitting enterprise IS (components) into the SW; Web IS; also see DBMS architectures New architectures and design principles for Web IS Functional modeling design of web services; functional modeling that deals explicitly with a domain’s semantics Decomposition and composition of web services; event modeling IS in organizations looser coupling required, provide potential for organizations to morph into the SW; see also workflow modeling serving new organizations of business, community and government with emergent SW-based IS technology Web-IS applications IS workflow modeling exception handling in long (business) transactions; workflows as “the” paradigm for “programming” the SW IS methodologies ontology lifecycle issues; as IS components become more intelligent, work shifts to self-organization CASE tools ontology management systems smart (ontology-driven) SW portals and search engines (“Google++”-type); SW-based “direct marketing”-style systems; smart user profiling unreliability of components; unavailability of services New thinking required! E.g.
Web IS in enterprises; how must business processes change to deal with existence of the SW; develop/maintain SW-based systems for user community unknown a priori User interfaces new applications principles for GUIs of design DB application architectures AI-and-DB and Web application service knowledge inference representation, Uncharted territory 1 Uncharted territory 2 New and complex requirements methods, immersive environments Sensor input management and stream data

In general, most algorithms in DM are poor when they are applied to access, report, etc. on data on the web. Domain semantics in such requests need to be exploited; however, “centralized” solutions (where resources need to notify potential requestors) will not be scalable.