edikt: e-Science Data, Information and Knowledge Transformation
E-Science Centres of Excellence Meeting, 6 November 2003
Dr. Denise Ecklund, edikt technical architect

What is edikt?
[Diagram labels: Standards; Requirements analysis; Technology matchmaking; E-Science Apps; CS Research; Edikt project; Gap filling; Grid Services for e-Science Data Management; Rigorous engineering; Commercial SW components and skills]

www.edikt.org

The edikt team
The team
  – 8 professional software engineers
  – project manager
  – commercialisation manager
  – architect
  – support staff
  – Scientific Advisory Board (SAB)
SHEFC funded R&D grant
  – 3 years funding: May 2002 – 2005
  – +3 years funding upon successful review

Current activities
ELDAS
  – Enterprise level data access services
  – Core data services supporting e-Science virtual organisations
BinX and AstroBinX
  – Binary XML
  – Supports data interchange for astronomy and other applications
OSAGE
  – Ontology-based Species Atlas for Gene Expression
  – Defines a database schema for storing and annotating 3D anatomy and gene expression data for multiple species
Technology and research evaluations

Creating a Virtual Organization
[Diagram: three separate data sources: radio spectrum (DB2 DB), optical spectrum (Xindice DB), X-ray spectrum (MySQL DB)]

Creating a Virtual Organization
[Diagram: ELDAS + Grid Directory Services in front of the same radio, optical and X-ray spectrum sources (DB2 DB, Xindice DB, MySQL DB)]

ELDAS – Extensibility via DACs
Data Access Components (DACs) interface to distinct DBMSs
Multiple DB drivers can be supported
  – JDBC, ODBC for relational DBMSs
Plug-n-Play installation of ELDAS
[Diagram: User1, User2 and User3 use the reusable ELDAS Core; DACs connect the core to Xindice DB, MySQL DB, DB2 DB and Oracle 9i DB]

ELDAS – EJB Implementation
Using EJBs, ELDAS separates
  – data access
  – business logic
  – presentation layers
Java 2 Enterprise Edition implements basic server tasks
Java Beans container is used to implement the ELDAS core
[Diagram: Java Framework; ELDAS EJB - GDS; DACs connect to Xindice DB, MySQL DB, DB2 DB and Oracle 9i DB]

ELDAS – EJB Implementation
Suitable for grid users
[Diagram: Grid User1 and Grid User2 reach the ELDAS EJB through a Grid Proxy]

ELDAS – EJB Implementation
Suitable for grid users and web users
[Diagram: Grid User1 and Grid User2 through a Grid Proxy; Web User1 through a Web Servlet]

BinX – accessing legacy binary data simulations
The Problem:
  – Many binary data files
  – Applications must "know" the data format of each file
  – Binary data formats are machine-specific
[Diagram: several binary data files read directly by an e-Science application]

BinX – accessing legacy binary data simulations
The Solution:
  – Write a "stand-aside" format description in XML
  – Provide a library to
      interpret the description
      provide file access across different machines
  – Build higher-level services
[Diagram: a BinX file describes the binary file structure; the e-Science application reads the binary data files through the BinX Library]

AstroBinX – format transformation
Even when we try to agree, we disagree
Multiple data format standards require conversions
[Diagram: binary data file in FITS data format, Spectral Analysis Application; binary data file in VOTable data format, 3D Image Data Mining Application]

AstroBinX – format transformation
Data format transformations based on XML descriptions
Build AstroBinX services using the BinX library
[Diagram: each binary data file (FITS data format, VOTable data format) has a BinX description; the BinX Library and BinX Utilities sit between the files and the Spectral Analysis and 3D Image Data Mining applications]

OSAGE – Applying Computer Science
Extend the Edinburgh Mouse Atlas
  – Data model to describe multiple species
  – Support scientific collaboration via data sharing
Computer Science theory and best practice
  – Data modelling to efficiently relate images and text
  – Flexible data annotation and versioning with XML
[Diagram labels: CS theory; DB2 DB; Data Access Services]

The Future – bringing components together
BinX is an intelligent binary file data source
[Diagram: ELDAS over Xindice DB, MySQL DB, DB2 DB and, through the BinX Library, binary files]

The Future – bringing components together
Extended Grid Data Services for Virtual Organisations
  – Data Versioning Service
  – Constraint Mgmt Service
  – User Annotation Service
  – Data Archiving Service
  – ...
CS research results layered over basic ELDAS services
[Diagram: the extended services sit above ELDAS, which sits over Xindice DB, MySQL DB, DB2 DB and, through the BinX Library, binary files]

http://www.edikt.org
ELDAS: download the library and docs in January 2004
BinX: download the library, utilities, docs, and sample applications now
Thank you! Questions?
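
Supplementary sketch: DAC-style extensibility
The "Extensibility via DACs" slide describes a reusable core with one Data Access Component per DBMS. The Java sketch below illustrates that plug-in idea only. Every name in it (DacSketch, DataAccessComponent, Core, register, query) is hypothetical and invented for illustration; it is not the ELDAS API, and the in-memory lambdas stand in for real JDBC or XML-database drivers.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DacSketch {
    /** Hypothetical DAC contract: one implementation per kind of DBMS. */
    public interface DataAccessComponent {
        List<String> query(String request);
    }

    /** Stand-in for a reusable core that routes requests to registered DACs. */
    public static class Core {
        private final Map<String, DataAccessComponent> dacs = new LinkedHashMap<>();

        /** Plugging in a new data source means registering one more DAC. */
        public void register(String sourceName, DataAccessComponent dac) {
            dacs.put(sourceName, dac);
        }

        /** The caller names a source; the core never sees DBMS details. */
        public List<String> query(String sourceName, String request) {
            DataAccessComponent dac = dacs.get(sourceName);
            if (dac == null) {
                throw new IllegalArgumentException("No DAC registered for " + sourceName);
            }
            return dac.query(request);
        }

        public Set<String> sources() {
            return dacs.keySet();
        }
    }

    /** Builds a core with two fake sources (assumed names, no real databases). */
    public static Core demoCore() {
        Core core = new Core();
        core.register("radio", request -> List.of("radio:" + request));
        core.register("xray", request -> List.of("xray:" + request));
        return core;
    }

    public static void main(String[] args) {
        Core core = demoCore();
        System.out.println(core.sources());
        System.out.println(core.query("radio", "M31"));
    }
}
```

In this sketch, supporting another DBMS only requires a new DataAccessComponent implementation; the core and its callers stay unchanged.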
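
Supplementary sketch: a "stand-aside" format description
The BinX slides propose describing a binary file in a separate XML document so that a library, rather than each application, handles machine-specific layout. The Java sketch below shows the idea with a tiny description format invented for this example. The element and attribute names (layout, field, byteOrder, type) are assumptions and are not the BinX schema; only standard JDK classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class StandAsideSketch {
    // Hypothetical description format for this sketch; NOT the BinX schema.
    public static final String DESCRIPTION =
        "<layout byteOrder='bigEndian'>"
      + "<field name='count' type='int32'/>"
      + "<field name='flux' type='float64'/>"
      + "</layout>";

    /** Reads the fields named in the description, in order, from the raw bytes. */
    public static Map<String, Number> read(String descriptionXml, byte[] data) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(descriptionXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse description", e);
        }
        // The byte order comes from the description, not from the host machine.
        ByteOrder order = "littleEndian".equals(root.getAttribute("byteOrder"))
            ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buffer = ByteBuffer.wrap(data).order(order);
        Map<String, Number> values = new LinkedHashMap<>();
        NodeList fields = root.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String type = field.getAttribute("type");
            Number value;
            switch (type) {
                case "int32": value = buffer.getInt(); break;
                case "float64": value = buffer.getDouble(); break;
                default: throw new IllegalArgumentException("Unsupported type: " + type);
            }
            values.put(field.getAttribute("name"), value);
        }
        return values;
    }

    /** Builds 12 sample bytes (int 3, double 1.5) in the given byte order. */
    public static byte[] sample(ByteOrder order) {
        return ByteBuffer.allocate(12).order(order).putInt(3).putDouble(1.5).array();
    }

    public static void main(String[] args) {
        System.out.println(read(DESCRIPTION, sample(ByteOrder.BIG_ENDIAN)));
    }
}
```

Because the byte order lives in the description, a file written on a little-endian machine can be read the same way by changing only the description's byteOrder attribute, not the application code.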