DYNAMIC NET DATA: THEORY AND EXPERIMENT

Philippa Gardner, Nobuko Yoshida, Sergio Maffeis, Alex Ahern
Department of Computing, Imperial College London

WHAT IS DYNAMIC NET DATA?

Web data, such as XML, plays a fundamental role in the exchange of information between globally distributed applications. These applications naturally follow a mediator approach: systems are divided into peers, with XML-based mechanisms for interaction between peers. Developing analysis techniques, languages and tools for web data is by no means straightforward. In particular, although web services allow for interaction between processes and data, direct interaction between processes is not well supported.

We model large-scale peer-to-peer systems for sharing dynamic data over the Net. In this setting, distribution is on a large scale, with many sites providing and consuming data through a standardised set of functionalities. The data itself is interlinked and dynamic, containing calls to web services, forms, scripted code, etc.

[Figure: Web technologies: semi-structured data (XML, XLink, …), SOAP, diverse languages (C#, Java, …), active pages, applets, Java/ECMAScript.]

MODELLING DYNAMIC NET DATA IN Xdπ

Peer-to-peer data management systems are decentralised distributed systems in which each component offers the same set of basic functionalities and acts both as a producer and as a consumer of information. We model systems where each peer consists of an XML data repository and a working space with active processes. Our processes can be regarded as agents with a simple set of functionalities: they communicate with each other, query and update the local repository, and migrate to other peers to continue execution. Process definitions can be included in documents and can be selected for execution by other processes.
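As a rough illustration of the peer model just described, the Python sketch below treats each peer as a repository, keeps all active processes in one queue tagged with the peer they currently run at, and writes each process as a generator that yields the action it wants to perform: query, update or go (migrate). It is a hypothetical simplification, not the Xdπ semantics: the repository is a flat dictionary from path strings to values rather than an XML tree, communication between processes is omitted, and the names Peer, Network and fetch are invented for the example. Scheduling is round-robin, one action per turn.

```python
from collections import deque


class Peer:
    """A peer: a local data repository plus a name other processes can migrate to.

    Simplification (assumption): the repository is a flat dict from path
    strings to values instead of an XML tree.
    """

    def __init__(self, name, repository):
        self.name = name
        self.repository = dict(repository)


class Network:
    """Runs active processes round-robin, one action per turn."""

    def __init__(self, *peers):
        self.peers = {p.name: p for p in peers}
        # All working spaces, flattened into one queue of
        # (peer name, process, value to resume the process with).
        self.ready = deque()

    def spawn(self, peer_name, process):
        self.ready.append((peer_name, process, None))

    def run(self):
        while self.ready:
            where, proc, value = self.ready.popleft()
            try:
                action = proc.send(value)  # resume until the next requested action
            except StopIteration:
                continue  # the process has terminated
            kind, *args = action
            repo = self.peers[where].repository
            result = None
            if kind == "query":            # read from the *local* repository
                result = repo.get(args[0])
            elif kind == "update":         # write to the local repository
                repo[args[0]] = args[1]
            elif kind == "go":             # migrate: continue execution at another peer
                where = args[0]
            else:
                raise ValueError(f"unknown action {kind!r}")
            self.ready.append((where, proc, result))


def fetch(src, dst, path, target):
    """Hypothetical agent: read `path` at peer `src`, store it at `target` on `dst`."""
    yield ("go", src)
    value = yield ("query", path)
    yield ("go", dst)
    yield ("update", target, value)


l1 = Peer("L1", {})
l2 = Peer("L2", {"a/c": "T"})
net = Network(l1, l2)
net.spawn("L1", fetch("L2", "L1", "a/c", "a/e"))
net.run()
print(l1.repository)  # {'a/e': 'T'}
```

Because a migrating process is simply re-queued under another peer's name, its local state (here the variable value) travels with it, which is one simple way to realise "migrate to other peers to continue execution".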
These functionalities are enough to express most of the dynamic behaviour found in web data, such as web services, distributed (and replicated) documents, distributed query patterns, hyperlinks, forms and scripting.

The Xdπ calculus provides a formal description of such systems. It is based on a network of locations (peers) containing a (semi-structured) data model, and π-like processes for modelling process interaction, process migration, and interaction with data. The data model consists of unordered labelled trees, with embedded processes for querying and updating data, and explicit pointers for referring to other parts of the network.

EXISTING MODELS

• Query languages for semi-structured data (XML) describe only data manipulation, without taking into account the distribution of information.
• Process calculi, whilst good for orchestrating data exchanges between peers and distributed infrastructure, tend to abstract away from the actual data.

Modelling dynamic Net data requires merging these approaches.

OUR MODEL: Xdπ

• XML-like dynamic-data repositories
• Explicit distribution
• Coordination based on the π-calculus

The model consists of a flat space of locations, where each location contains (XML) trees and coordination processes.

[Figure: a network of locations L1 to L4, each containing trees and processes; one tree contains the pointer @L1:a/c, referring to the data at path a/c in location L1.]

Data consists of unordered, edge-labelled trees, containing scripted processes and pointers to data in other locations.

AN EXAMPLE: WEB SERVICES

We embed in data the process representing a call to a web service get at site L2, with parameter a/c: a path expression representing a query on the data of L2. (In the diagrams, a stretchable band between processes denotes the sharing of a secret channel name.)
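The data model can be sketched in Python as follows, under stated assumptions: a tree is a list of (label, child) edges whose order is ignored, an embedded process is kept as opaque text (Script), and a pointer such as @L1:a/c is a (location, path) pair. The helpers select, deref and canonical, the label d, the placeholder process P and the example network are all hypothetical; path queries here only follow child labels, a small fragment of what XPath-style queries offer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pointer:
    """An explicit pointer to data at another location, e.g. @L1:a/c."""
    location: str
    path: str


@dataclass(frozen=True)
class Script:
    """An embedded process, kept here as opaque text (assumption)."""
    code: str


@dataclass
class Tree:
    """An edge-labelled tree: a list of (label, child) edges whose order is ignored."""
    edges: list = field(default_factory=list)


def select(node, path):
    """Return every node reached from `node` by following the labels in `path`."""
    current = [node]
    for label in path.split("/"):
        current = [child
                   for t in current if isinstance(t, Tree)
                   for (edge_label, child) in t.edges if edge_label == label]
    return current


def deref(network, pointer):
    """Follow a pointer into the tree stored at another location."""
    return select(network[pointer.location], pointer.path)


def canonical(node):
    """A form in which edge order does not matter, for comparing unordered trees."""
    if isinstance(node, Tree):
        return ("tree", tuple(sorted((lbl, canonical(c)) for lbl, c in node.edges)))
    return (type(node).__name__, repr(node))


network = {
    "L1": Tree([("a", Tree([("c", Tree([("d", Tree())])), ("b", Tree())]))]),
    "L2": Tree([("a", Tree([("c", Pointer("L1", "a/c")),
                            ("b", Script("P"))]))]),  # P: placeholder process
}

hits = select(network["L2"], "a/c")
print(hits)                     # [Pointer(location='L1', path='a/c')]
print(deref(network, hits[0]))  # [Tree(edges=[('d', Tree(edges=[]))])]
```

Since a label may occur on several edges, select returns a list of matches; canonical sorts edges so that two trees differing only in edge order compare equal, reflecting the unordered model.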
The service call, embedded in the data at L1:

$$(\mathsf{new}\ put)\big(\mathsf{go}\ L2.get\langle a/c, L1, put\rangle \mid put(x).\mathsf{paste}_{a/e}\langle x\rangle\big)$$

The web service at L2:

$$!\,get(x,y,w).\mathsf{copy}_{x}(z).\mathsf{go}\ y.w\langle z\rangle$$

[Figure: successive snapshots of locations L1 and L2 during the execution, with the data T copied from a/c at L2 to a/e at L1.]

We follow the execution. The first step is the migration of the service call $\mathsf{go}\ L2.get\langle a/c, L1, put\rangle$ to L2, followed by the invocation of the web service get, which spawns a new service instance $\mathsf{copy}_{a/c}(z).\mathsf{go}\ L1.put\langle z\rangle$. The execution continues with the service instance reading the data T from the tree at a/c, and then migrating to the peer L1 obtained as a parameter of the service invocation. At L1 the required data item is passed to the result-handling code $put(x).\mathsf{paste}_{a/e}\langle x\rangle$ on the secret channel, and the result data is inserted in the original tree at a/e.

THE UNIFIED FRAMEWORK

We reason about data and the distributed infrastructure in the same framework in order to:
• understand the system behaviour;
• control access to resources;
• give schemas/types to documents containing scripts;
• propose new optimisations.

IMPLEMENTATION

Alex Ahern, Philippa Gardner, Gurdish Gill, Ameet Shah.

Several prototypes of Xdπ have been constructed, in a variety of languages.

Java Abstract Machine
• Round-robin term evaluation.
• XML used for data structures.
• XPath used as the data query language.
• Migration by web service invocation, provided by Apache Axis.
• Browser-based user interface.
• An ongoing student project is building an API that combines Xdπ communication and thread migration with Java code.

PROCESS BEHAVIOUR

Philippa Gardner, Sergio Maffeis.

We define process equivalence so that, when equivalent processes are put in the same position in a network, the resulting networks are equivalent. This work involved adapting, in a non-trivial way, bisimulation techniques used for the higher-order π-calculus.
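To make the steps of the web-service example concrete, here is a toy Python interpreter that replays it. It is a sketch under several assumptions, not the Xdπ reduction semantics: processes are generators yielding actions (new, fork, go, call, copy, paste, send, recv), communication is asynchronous and local to a location, the replicated service !get is modelled as a factory that spawns one instance per call, and paste simply stores a value at a path of a flat dictionary rather than adding a subtree. The class Net, the action names and the channel naming scheme are invented for illustration.

```python
import itertools
from collections import deque


class Net:
    """Toy interpreter: locations hold flat trees (path -> data) and services."""

    def __init__(self, trees, services):
        self.trees = trees            # location -> {path: data}
        self.services = services      # location -> {service name: process factory}
        self.ready = deque()          # (location, process, resume value)
        self.waiting = {}             # (location, channel) -> deque of blocked receivers
        self.mailbox = {}             # (location, channel) -> deque of undelivered values
        self.fresh = itertools.count()

    def spawn(self, loc, proc, value=None):
        self.ready.append((loc, proc, value))

    def run(self):
        while self.ready:
            loc, proc, value = self.ready.popleft()
            try:
                op, *args = proc.send(value)
            except StopIteration:
                continue                          # process finished
            if op == "new":                       # (new c): a fresh, private channel name
                self.spawn(loc, proc, f"ch{next(self.fresh)}")
            elif op == "fork":                    # P | Q: run a second process here
                self.spawn(loc, args[0])
                self.spawn(loc, proc)
            elif op == "go":                      # go l.P: migrate to location l
                self.spawn(args[0], proc)
            elif op == "call":                    # invoke a replicated service here
                name, params = args
                self.spawn(loc, self.services[loc][name](*params))
                self.spawn(loc, proc)
            elif op == "copy":                    # copy_p(z): read the data at path p
                self.spawn(loc, proc, self.trees[loc][args[0]])
            elif op == "paste":                   # paste_p<v>: store v at path p
                self.trees[loc][args[0]] = args[1]
                self.spawn(loc, proc)
            elif op == "send":                    # c<v>: asynchronous, local output
                key = (loc, args[0])
                receivers = self.waiting.get(key)
                if receivers:
                    self.spawn(loc, receivers.popleft(), args[1])
                else:
                    self.mailbox.setdefault(key, deque()).append(args[1])
                self.spawn(loc, proc)
            elif op == "recv":                    # c(x).P: wait for a local output on c
                key = (loc, args[0])
                pending = self.mailbox.get(key)
                if pending:
                    self.spawn(loc, proc, pending.popleft())
                else:
                    self.waiting.setdefault(key, deque()).append(proc)
            else:
                raise ValueError(f"unknown action {op!r}")


def get_service(x, y, w):
    """!get(x,y,w).copy_x(z).go y.w<z>  (one instance per invocation)"""
    z = yield ("copy", x)
    yield ("go", y)
    yield ("send", w, z)


def result_handler(put):
    """put(x).paste_{a/e}<x>"""
    x = yield ("recv", put)
    yield ("paste", "a/e", x)


def service_call():
    """(new put)(go L2.get<a/c,L1,put> | put(x).paste_{a/e}<x>)"""
    put = yield ("new",)
    yield ("fork", result_handler(put))
    yield ("go", "L2")
    yield ("call", "get", ("a/c", "L1", put))


net = Net(trees={"L1": {}, "L2": {"a/c": "T"}},
          services={"L1": {}, "L2": {"get": get_service}})
net.spawn("L1", service_call())
net.run()
print(net.trees["L1"])  # {'a/e': 'T'}
```

The fresh channel returned by new is known only to the call and its handler until it is passed to get as the parameter w, which mirrors the role of the secret channel put in the example.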
Equational reasoning provides formal proofs of the (partial) correctness of implementations with respect to specifications. We have studied examples, including the transparent replication of web services.

SECURITY AND TYPES

Alex Ahern, Philippa Gardner, Jonathan Hayman, Sergio Maffeis, Nobuko Yoshida.

Security is a central concern for systems sharing dynamic data on the Net. We are currently studying:
• fine-grained access control for web services and documents, extending techniques studied for the distributed π-calculus, and using spatial logic formulae annotated with read and write capabilities to describe permissions on data;
• type systems for statically guaranteeing the structure of Xdπ data, adapting techniques from semi-structured data and π-processes.

O’Caml Abstract Machine
• Functional language.
• Custom data structures.
• Migration by object serialisation and transmission.
• Text-based user interface.

FUTURE WORK

We plan to develop a browser for dynamic XML documents. These documents will contain scripted processes which the browser can interpret and execute. Documents will be queried using XQuery and will be statically typed.
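The access-control idea can be approximated with a much simpler mechanism than spatial logic: a table of grants giving read (r) and write (w) capabilities on path prefixes. The Python sketch below checks the copy and paste actions of a process against such a table before the process runs. The names Grant, NEEDED, covers, allowed and check, the keying of the policy by (origin, host) location and the example paths are assumptions made for illustration; the formulae studied in the project are more expressive than path prefixes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Capabilities ("r" = read, "w" = write) on every path at or under `prefix`."""
    prefix: str
    caps: frozenset


# Which capability each data operation needs (assumption: copy reads, paste writes).
NEEDED = {"copy": "r", "paste": "w"}


def covers(prefix, path):
    """True if `path` lies at or below `prefix`, comparing whole labels."""
    p, q = prefix.split("/"), path.split("/")
    return q[:len(p)] == p


def allowed(grants, op, path):
    return any(covers(g.prefix, path) and NEEDED[op] in g.caps for g in grants)


def check(policy, origin, host, actions):
    """List the (op, path) actions a process from `origin` may not perform at `host`."""
    grants = policy.get((origin, host), [])
    return [(op, path) for op, path in actions if not allowed(grants, op, path)]


# Hypothetical policy: code originating at L1 may read (not write) under a/c at L2.
policy = {("L1", "L2"): [Grant("a/c", frozenset({"r"}))]}

violations = check(policy, "L1", "L2",
                   [("copy", "a/c"), ("copy", "a/b"), ("paste", "a/c/d")])
print(violations)  # [('copy', 'a/b'), ('paste', 'a/c/d')]
```

A check of this kind rejects a process before execution rather than trapping violations at run time, which is in the spirit of the static guarantees the project aims for; comparing whole labels keeps a grant on a/c from accidentally covering a sibling such as a/cd.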