edikt: e-Science Data, Information and Knowledge Transformation
NeSC Review, 30 September 2003
Dr. Denise Ecklund, edikt technical architect

What is edikt?
The edikt project draws on: e-Science applications, CS research, standards, and commercial software components and skills
Its activities: requirements analysis, technology matchmaking, gap filling, rigorous engineering
Its product: Grid services for e-Science data management
The team: 8 professional software engineers, an architect, a project manager, and support staff
SHEFC-funded research and development grant
– 3 years of funding: May 2002 to 2005
– a further 3 years of funding subject to project success and a successful review
www.edikt.org

Current activities
– ELDAS: Enterprise Level Data Access Services
  Core data services supporting e-Science virtual organisations
– BinX: Binary XML
  Supports data interchange for astronomy and other applications
– OSAGE: Ontology-based Species Atlas for Gene Expression
  Defines a database schema for storing and annotating 3D anatomy and gene expression data for multiple species
– Technology and research evaluations

Creating a Virtual Organisation
Cartoon: two scientists try to share data
"Let's share our data!" "It's at X. Get it with Y."
"Great! I can't read it!" "How do I get it?"
"Great! I can't find it!" "Where is it?"
Diagram components:
– Radio spectrum
– Optical spectrum
– ELDAS
– DB2 DB
– X-Ray spectrum
– Grid Directory Services
– Xindice DB
– MySQL DB

ELDAS – Extensibility via DACs
Diagram: User1, User2 and User3 work through a reusable ELDAS core, which reaches Xindice DB, MySQL DB, DB2 DB and Oracle 9i DB through one Data Access Component (DAC) per database.
– Data Access Components (DACs) interface to distinct DBMSs
– Multiple DB drivers can be supported, e.g. JDBC and ODBC for relational DBMSs
– Plug-and-play installation of ELDAS

ELDAS – EJB Implementation
Diagram: Grid User1 and Grid User2 connect through a Grid proxy, and Web User1 through a Web servlet, to the ELDAS EJB (GDS) in a Java framework; DACs connect it to Xindice DB, MySQL DB, DB2 DB and Oracle 9i DB.
– ELDAS runs anywhere: it is suitable for both Grid and Web access
– Java 2 Enterprise Edition (J2EE) implements the basic server tasks
– An Enterprise JavaBeans (EJB) container is used to implement the ELDAS core

BinX – accessing legacy binary data from simulations
The problem:
– Many binary data files
– Applications must "know" the data format
– Binary data formats are machine-specific
The solution:
– Write a "stand-aside" description of the format in XML
– Provide a library to interpret the description and provide file access across different machines
– Build higher-level services
Diagram: a BinX file describes the structure of each binary data file; the BinX library uses it to serve the data to an e-Science application.

BinX – format transformation
Even when we try to agree, we disagree: multiple data format standards require conversions.
– Data format transformations are based on XML descriptions
Diagram: BinX descriptions and binary data files, the BinX library and BinX utilities, FITS and VOTable data formats, a spectral analysis application, and a 3D image data mining application.

OSAGE – Applying Computer Science
Extend the Edinburgh Mouse Atlas:
– Data model to describe multiple species
– Support scientific collaboration via data sharing
Computer Science theory and best practice:
– Generic data model for species anatomy
– Flexible data annotation and versioning with XML
Diagram: CS theory applied to a DB2 database accessed through data access services.

The Future – bringing components together
Extended Grid Data Services for Virtual Organisations:
– Data Versioning Service, Constraint Management Service, User Annotation Service, Data Archiving Service, ...
– CS research results layered over basic ELDAS services
– BinX is an intelligent binary file data source
Diagram: ELDAS over Xindice DB, MySQL DB, DB2 DB and the BinX library, which gives access to binary files.

Thank you! Questions?
http://www.edikt.org
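The ELDAS extensibility slide describes a reusable core that reaches each DBMS through a plug-in Data Access Component. The following is a minimal Java sketch of that plug-in pattern, assuming hypothetical names (DataAccessComponent, EldasCore, InMemoryDac) that are not the ELDAS API. A real DAC would wrap a JDBC connection or a native XML database driver rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Demo entry point: registers two hypothetical in-memory DACs and routes queries through the core.
class EldasDacDemo {
    public static void main(String[] args) {
        EldasCore core = new EldasCore();
        core.register(new InMemoryDac("mysql-radio", List.of("radio-1", "radio-2")));
        core.register(new InMemoryDac("xindice-xray", List.of("xray-1")));
        System.out.println(core.query("mysql-radio", "radio")); // [radio-1, radio-2]
        System.out.println(core.availableSources());             // [mysql-radio, xindice-xray]
    }
}

// Hypothetical plug-in contract: one Data Access Component per DBMS.
interface DataAccessComponent {
    String sourceName();
    List<String> query(String expression);
}

// Reusable core: depends only on the DAC interface, never on a specific DBMS.
class EldasCore {
    private final Map<String, DataAccessComponent> dacs = new LinkedHashMap<>();

    // "Plug-and-play": adding a database means registering one more DAC.
    void register(DataAccessComponent dac) {
        dacs.put(dac.sourceName(), dac);
    }

    List<String> availableSources() {
        return new ArrayList<>(dacs.keySet());
    }

    List<String> query(String source, String expression) {
        DataAccessComponent dac = dacs.get(source);
        if (dac == null) {
            throw new IllegalArgumentException("No DAC registered for " + source);
        }
        return dac.query(expression);
    }
}

// Hypothetical stand-in for a driver-backed DAC: filters an in-memory list by substring.
class InMemoryDac implements DataAccessComponent {
    private final String name;
    private final List<String> records;

    InMemoryDac(String name, List<String> records) {
        this.name = name;
        this.records = records;
    }

    public String sourceName() {
        return name;
    }

    public List<String> query(String expression) {
        List<String> matches = new ArrayList<>();
        for (String record : records) {
            if (record.contains(expression)) {
                matches.add(record);
            }
        }
        return matches;
    }
}
```

Keeping the core ignorant of concrete databases is what lets the same core serve Xindice, MySQL, DB2 or Oracle; only the registered components differ.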
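The BinX slides propose a "stand-aside" XML description that a library interprets, so applications no longer hard-code the layout or byte order of a binary file. The following sketch illustrates that idea with the JDK's DOM parser and java.nio.ByteBuffer. The record/field element names, the byteOrder attribute values, the two supported types, and the DescribedReader class are a hypothetical simplification, not the actual BinX schema or library API.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Demo: writes one little-endian record, then decodes it using only the description.
class DescribedReaderDemo {
    public static void main(String[] args) {
        // Hypothetical, simplified description; NOT the real BinX schema.
        String description = "<record byteOrder='littleEndian'>"
                + "<field name='id' type='int32'/>"
                + "<field name='flux' type='float64'/>"
                + "</record>";
        ByteBuffer file = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
        file.putInt(7).putDouble(2.5);
        System.out.println(DescribedReader.readRecord(description, file.array())); // {id=7, flux=2.5}
    }
}

class DescribedReader {
    // Decodes one record from raw bytes, driven entirely by the XML description.
    static Map<String, Number> readRecord(String descriptionXml, byte[] data) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(descriptionXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Unreadable format description", e);
        }
        // The byte order comes from the description, not from the machine doing the reading.
        ByteOrder order = "littleEndian".equals(root.getAttribute("byteOrder"))
                ? ByteOrder.LITTLE_ENDIAN : ByteOrder.BIG_ENDIAN;
        ByteBuffer buf = ByteBuffer.wrap(data).order(order);
        Map<String, Number> values = new LinkedHashMap<>();
        NodeList fields = root.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String name = field.getAttribute("name");
            String type = field.getAttribute("type");
            switch (type) {
                case "int32": values.put(name, buf.getInt()); break;
                case "float64": values.put(name, buf.getDouble()); break;
                default: throw new IllegalArgumentException("Unsupported field type: " + type);
            }
        }
        return values;
    }
}
```

Changing only the byteOrder attribute lets the same reader handle a file produced on a machine with the opposite endianness, which is the portability problem the BinX slide describes.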
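The format-transformation slide lists VOTable as one of the target formats for data described with BinX. The following is a minimal sketch of that output step, assuming the records have already been decoded from the binary file. It emits the core VOTable nesting (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD) but omits the namespace, version attribute and other metadata that a conformant document would carry. VoTableWriter is a hypothetical name, not a BinX utility.

```java
import java.util.List;

// Demo: converts two already-decoded records into simplified VOTable XML.
class VoTableDemo {
    public static void main(String[] args) {
        String xml = VoTableWriter.write(
                List.of("id", "flux"),
                List.of("int", "double"),
                List.of(new Object[] {7, 2.5}, new Object[] {8, 3.0}));
        System.out.println(xml);
    }
}

class VoTableWriter {
    // Builds one FIELD per column, then one TR per record with one TD per value.
    static String write(List<String> names, List<String> datatypes, List<Object[]> rows) {
        if (names.size() != datatypes.size()) {
            throw new IllegalArgumentException("Each column needs exactly one datatype");
        }
        StringBuilder sb = new StringBuilder("<VOTABLE><RESOURCE><TABLE>");
        for (int i = 0; i < names.size(); i++) {
            sb.append("<FIELD name=\"").append(escape(names.get(i)))
              .append("\" datatype=\"").append(escape(datatypes.get(i))).append("\"/>");
        }
        sb.append("<DATA><TABLEDATA>");
        for (Object[] row : rows) {
            if (row.length != names.size()) {
                throw new IllegalArgumentException("Row width does not match column count");
            }
            sb.append("<TR>");
            for (Object cell : row) {
                sb.append("<TD>").append(escape(String.valueOf(cell))).append("</TD>");
            }
            sb.append("</TR>");
        }
        sb.append("</TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>");
        return sb.toString();
    }

    // Minimal XML escaping for attribute values and cell text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because the column names and types come from a description rather than from the writer's code, the same writer could serve any described input; producing FITS would need a separate, binary-oriented writer.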
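The Future slide layers CS research results, such as a Data Versioning Service, over the basic ELDAS services, and the OSAGE slide calls for data versioning. One way to express that layering is the decorator pattern, chosen here purely for illustration. The following minimal sketch assumes a hypothetical DataService interface and hypothetical BasicStore and VersioningService classes. It is not the edikt design.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Demo: a hypothetical versioning layer stacked on a basic data service.
class LayeredServiceDemo {
    public static void main(String[] args) {
        VersioningService versioned = new VersioningService(new BasicStore());
        versioned.put("heart", "annotation v1");
        versioned.put("heart", "annotation v2");
        System.out.println(versioned.get("heart"));      // annotation v2
        System.out.println(versioned.versions("heart")); // [annotation v1, annotation v2]
    }
}

// Hypothetical minimal contract shared by the basic service and every layer above it.
interface DataService {
    void put(String key, String value);
    String get(String key);
}

// Stand-in for a basic data service: keeps only the latest value per key.
class BasicStore implements DataService {
    private final Map<String, String> data = new HashMap<>();

    public void put(String key, String value) {
        data.put(key, value);
    }

    public String get(String key) {
        return data.get(key);
    }
}

// Decorator: records version history, then delegates to the wrapped service unchanged.
class VersioningService implements DataService {
    private final DataService inner;
    private final Map<String, List<String>> history = new HashMap<>();

    VersioningService(DataService inner) {
        this.inner = inner;
    }

    public void put(String key, String value) {
        history.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        inner.put(key, value);
    }

    public String get(String key) {
        return inner.get(key);
    }

    // Returns every value ever stored under the key, oldest first.
    List<String> versions(String key) {
        return List.copyOf(history.getOrDefault(key, List.of()));
    }
}
```

Because each layer implements the same interface as the service beneath it, further layers such as annotation or constraint checking could be stacked the same way without modifying the core.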