Discovery Net

The Discovery Net project (www.discovery-on-the.net) is a UK e-Science Pilot Project funded by the EPSRC under the UK e-Science programme. Discovery Net is building the software infrastructure and tools for providing Knowledge Discovery Services that allow scientists to conduct and manage complex data analysis and knowledge discovery activities on the new generation Internet.

Discovery Net addresses the complexities faced by scientists from all information-intensive fields where:
• Modern high-throughput devices routinely generate and capture large amounts of data.
• Data sets are not analysed in isolation, but dynamically integrated during the analysis.
• New data analysis methods and software components are continually being developed.
• Knowledge discovery procedures are complex multi-step procedures conducted by interdisciplinary teams of scientists.

Project Goals

Discovery Net defines the standards, architecture and tools that:
• Allow scientists to plan, manage, share and execute complex knowledge discovery and data analysis procedures available as remote services.
• Allow service providers to publish and make available data mining and data analysis software components as services to be used in knowledge discovery procedures.
• Allow data owners to provide interfaces and access to scientific databases, data stores, sensors and experimental results as services so that they can be integrated in knowledge discovery processes.

Discovery Net Testbeds

Discovery Net is a multi-disciplinary project. In addition to developing the software infrastructure for Knowledge Discovery Services, the project is also developing a series of testbeds and demonstrators for using the technology in the areas of life sciences, environmental modelling and geo-hazard prediction.
Discovery Net Architecture and Knowledge Discovery Services

Discovery Net architecture provides open standards for specifying:

Knowledge Discovery Adapters: used to declare the properties of analytical software components and scientific data stores, including their input/output types, performance and accuracy characteristics.
Knowledge Discovery Services look-up and registration: allowing scientists to retrieve and compose Knowledge Discovery Services in their discovery procedures.
Integrated Scientific Database Access: allowing the integration of structured and semi-structured data from different data sources within a discovery procedure using XML schemas.
Knowledge Discovery Process Management: including DPML (Discovery Process Markup Language) as a standard specification language for constructing and managing knowledge discovery procedures, as well as recording their history.
Knowledge and Discovery Process Storage: allowing discovery procedures to be stored, shared and re-executed.
Knowledge Discovery Process Deployment: allowing users to deploy and publish their existing knowledge discovery procedures as new services.

[Figure: Application of Discovery Net in Life Sciences]

D-NET Grid-based Architecture

Discovery Net is based on an open architecture that re-uses common protocols and infrastructures such as the Globus Toolkit and the OGSI. It also defines its own protocol for workflows, the Discovery Process Markup Language (DPML), which allows the definition of data analysis tasks to be executed on distributed resources. Using the Discovery Net architecture, it is easy to build and deploy a variety of fully distributed data-intensive applications.
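As an illustration only, the idea of adapters that declare input/output types and are composed into a discovery procedure can be sketched in a few lines of Python. This is not DPML and not the Discovery Net API: the class names `Adapter` and `Procedure`, the type labels and the three example components are all hypothetical, chosen just to show how declared types could let a procedure be checked as it is composed.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Adapter:
    """Hypothetical stand-in for a Knowledge Discovery Adapter declaration."""
    name: str
    input_type: str   # type of data the component consumes
    output_type: str  # type of data the component produces


@dataclass
class Procedure:
    """A linear discovery procedure: an ordered chain of adapters."""
    steps: list = field(default_factory=list)

    def add(self, adapter: Adapter) -> "Procedure":
        # Reject a step whose declared input does not match the previous output.
        if self.steps and self.steps[-1].output_type != adapter.input_type:
            previous = self.steps[-1]
            raise TypeError(
                f"{adapter.name} expects {adapter.input_type}, "
                f"but {previous.name} produces {previous.output_type}"
            )
        self.steps.append(adapter)
        return self

    def describe(self) -> str:
        # Human-readable summary of the chain, in execution order.
        return " -> ".join(step.name for step in self.steps)


# Hypothetical components; names and type labels are assumptions.
load = Adapter("load_sequences", "none", "table")
cluster = Adapter("cluster", "table", "model")
report = Adapter("report", "model", "document")

procedure = Procedure().add(load).add(cluster).add(report)
print(procedure.describe())
```

In this sketch, composing `load` directly with `report` would raise a `TypeError`, because `report` declares a `model` input while `load` produces a `table`. A real workflow language would also need branching, remote execution and history recording, none of which are modelled here.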
The D-NET client can connect to multiple servers in order to build a real Grid application where data, functionalities and resources can be located and used on any D-NET server:
• Data: Large tables can be transferred across servers.
• Tasks: Tasks implemented in Java can migrate to other servers.
• Resources: A trusted client can use any D-NET server to execute part or all of the workflow.

D-NET takes advantage of the OGSA architecture to present its functionalities to other grid users and projects:
• Grid Service: The functionalities of D-NET can be accessed through a simple interface exposed as an OGSA-compliant Grid service, making all the workflows usable by other grid application builders.
• OGSI: D-NET uses the OGSI technology preview to deploy its grid service and to provide a simple front-end for using it.

[Figure: Discovery Net Grid-based Architecture]
[Figure: Discovery Net Geo-hazard Prediction Application]
[Figure: Discovery Net Air Pollution Modelling Application]

The Discovery Net project is conducted at Imperial College London.

Principal Investigator: Dr. Yike Guo (Dept. of Computing).

Discovery Net Team: Prof. Tony Cass (Dept. of Biological Sciences), Prof. John Darlington (Dept. of Computing), Dr. John Hassard (Dept. of Physics), Dr. Jian Guo Liu (Dept. of Earth Sciences), Dr. Daniel Ruckert (Dept. of Computing), Prof. Robert Spence (Dept. of Electrical Engineering).

Discovery Net Collaborations: Discovery Net is currently collaborating with the National Centre for Data Mining (NCDM) at the University of Illinois at Chicago on the creation of the Global Discovery Net project.

Contact Information: Dr. Moustafa M. Ghanem, Discovery Net Project Manager, Department of Computing, Imperial College, London SW7 2BZ, United Kingdom
Email: discoverynet@doc.ic.ac.uk
Phone: +44 (0) 20 7594 8357
Fax: +44 (0) 20 7594 8246
Web: www.discovery-on-the.net

London e-Science Centre
Department of Computing, South Kensington Campus, Imperial College London, SW7 2AZ, United Kingdom
tel: +44 (0)20 7594 8360
fax: +44 (0)20 7581 8024
web: www.lesc.imperial.ac.uk
email: lesc-admin@imperial.ac.uk