The STFC e-Science Centre
Grid Data Management Technologies – what might it mean for the A&H?
Shirley Crompton (with thanks to Kerstin Kleese van Dam and colleagues in the e-Science Centre + SRD: Andy Smith, Manolis Pantos)
02/07/2007

Presentation Outline
• Data Curation
• Data Management Services
• STFC Facilities
• CCLRC Metadata Model, Ontology, Metadata harvest
• e-Science Centre
• Archaeological Applications

Overview of STFC e-Science Centre
The STFC e-Science Centre is about using leading-edge IT to deliver e-Research:
– High-quality scientific computing services
– Management and exploitation of large-scale data
– Collaborative R&D
– Support for collaborative working
– Sharing expertise: technology transfer
Based on core strengths:
• Data analysis and computation
• Data storage
• Data management
• Collaborative environments

e-Infrastructure for Large Research Facilities
Daresbury synchrotron, Diamond synchrotron, ISIS neutron and muon facility, Vulcan laser facility.
The e-Science Centre was founded to develop, deploy and run services for experimental, computing and data facilities, in order to enhance the research carried out at these facilities:
• Understanding the research requirements through working with facility staff and their users.
• Creating a powerful, long-lasting knowledge resource for UK academia.
• Enabling users to get rapid access to their current and past data, related experiments, publications etc., leading to improved analysis through more complete information.
e-Infrastructure
[Diagram: components include STFC Active Directory, Diamond proposal web pages, DataPortal, People DB, DUO, DUO Desk, DLS ICAT, SRB, IKitten, data/metadata, DDH, StorageD, GDA, NeXus file & data, Atlas Data Store and High Performance Grid Computing; legend: Diamond; CICT; modified by e-Science.]

Probing the Past
DSIC Heritage Science Centre
Vegetable, Animal or Mineral: research in ancient materials and conservation
[Figure: target materials; PEY-XAS]

The alloy composition by XRF, XRD and neutron diffraction
Synchrotron X-ray diffraction and X-ray fluorescence, together with neutron diffraction, have answered the question of whether the repaired nose-guard is original: it is a modern replacement, made of a copper-zinc alloy, while the rest of the object is a copper-tin alloy with small amounts of lead and iron.
• The added nose-guard piece contains Zn; the head does not.
• Sn varies at different locations on the head.
• Fe content also differs.
• Rietveld fitting of the neutron diffraction data showed conclusively that the bulk compositions of the nose-guard and the head are very different.

What’s the secret, soldier? Time-of-flight diffraction at ROTAX

The “Sulfur problem” in archaeological marine timbers
Mary Rose [figure panels: sulfur, sulfate]
‘Sulfur and iron speciation in recently recovered timbers of the Mary Rose revealed by X-ray absorption spectroscopy’, K.M. Wetherall, R.M. Moss, A.M. Jones, A.D. Smith, T. Skinner, D.M. Pickup, S.W. Goatham, A.V. Chadwick, R.J. Newport (2007). Submitted to J. Arch. Sci.

Islamic & medieval lustreware: a historic nano-material
• Understand historic production techniques
• Understand changes in that production relating to place and time
• Interest in developing new non-linear optical surfaces
Studied historic fragments + laboratory reproductions; created reproductions in SR beams to study the processes.
Differences in the final visual effect arise from nano-particle type, size and density, and also from glaze type. Temperature and the reducing/oxidising protocol are very important.
‘Temperature resolved reproductions of medieval lustre’, T. Pradell, J. Molera, E. Pantos, A.D. Smith, C. Martin, A. Labrador (2007). Submitted to Appl. Phys. A.
‘The invention of lustre: Iraq 9th and 10th centuries AD’, T. Pradell, J. Molera, A.D. Smith, M.S. Tite (2007). Submitted to J. Arch. Sci.

THz for art conservation?
K. Fukunaga, Y. Ogawa, S. Hayashi, I. Hosako (2007), “Terahertz spectroscopy for art conservation”, IEICE Electronics Express 4, pp. 258–263.

Services: Petabyte Data Store
Hardware: tape library and drives
• Capacity 5 PB
• Bandwidth ~80 MB/sec per drive
• 0.5 PB commercial HSM system (DMF)
• Fast, safe file storage
Services for:
• STFC facilities and user communities
• Other Research Councils

Storing data in SRB (source: Roger Downing)
• The SRB client asks an SRB server: “Can I use the system?” The server checks: “Is the client authorized?” Yes!
• The client asks: “Store this data on the hexagonal server.”
• The SRB server redirects the client connection to the hexagonal server.
• Once the data is stored, the SRB server updates MCAT with its location and other information.

SRB Access Routes (source: Roger Downing)
User applications and project-specific catalogues reach the underlying storage (ADS, Unix file system, etc.) through SRB. (Based on the Data Management for Diamond document.)

Data Storage
132 user communities currently, including:
• Particle Physics Community (LHC: CMS, ATLAS, LHCb, …; also BaBar, H1)
• ISIS
• British Atmospheric Data Centre
• EISCAT (radar research)
• National Earth Observation Data Centre
• World Data Centre, CICT
• Central Laser Facility
• Diamond Light Source
• National Crystallography Service, Southampton University
• WASP, VIRGO Consortium
• BBSRC (BITS)
• Arts and Humanities Data Service
• Integrative Biology
• …
[Chart: SRB user data totals (GB) by user community]

Archival Services for BBSRC Institutes
[Diagram: institute sites connected to the archival service central site]
Archival services operate on an economy of scale and require expert staff to run them, so central services for larger campus grids make financial sense. We operate these services both on site within STFC and for external partners.
This example is for the BBSRC, which has about 16 sites across the UK. The sites operate on their own network, and only the main site is connected to JANET. Scheduled archival and restores are handled via this central site.

Arts and Humanities Data Service
A dark archive has been set up for the Arts and Humanities Data Service, based at King’s College. It is similar to the BBSRC solution and has been in operation since March 2007.
2.9 TB, 170,000 files

AHDS Dark Archive Architecture (Adil Hassan, Mark Hedges)
[Diagram: AHDS databases archived via an SRB client]

Computational science (Hartree Centre)
• First-principles simulation allows the prediction of the composition and structure of surfaces in external gaseous environments
• Single- and multi-component systems can be studied
• Predict reaction and diffusion barriers and pathways from first principles
• Supercomputing: HPCx (BlueGene), HECToR
‘Stability of the AlF3 (0112) surface in H2O and HF environments: an investigation using hybrid density functional theory and atomistic thermodynamics’, S. Mukhopadhyay, C.L. Bailey, A. Wander, B.G. Searle, C.A. Muryn, S.L.M. Schroeder, R. Lindsay, N. Weiher, N.M. Harrison (2007). Surf. Sci. (in press).

High Performance Computing (http://www.hpcx.ac.uk)
• Over 1500 registered users
• Hundreds of academic and government institutions
• More than 40 different applications
• Extensive coverage of the sciences, including bioinformatics, computational chemistry, plasma physics, astronomy and engineering
• 60+ papers every year

Visualization Services

Full Data Management Lifecycle Underpinning the Research Lifecycle

CCLRC Scientific Metadata Model (source: Shoaib Sufi)
[Diagram: entities Metadata Granule, Topic, Study, Access Conditions, Investigation, Data Holding, Data Collection, Data Object, Related Materials and Legal Note, linked by one-to-one (1) and one-to-many (M) relationships]

Metadata Models and Catalogues
STFC has also developed a number of community-based metadata models (e.g. ICAT) and data models, based on existing standards. STFC operates a wide range of metadata catalogues for its facilities, other Research Councils and dedicated user communities.
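The containment chain in the CCLRC Scientific Metadata Model diagram can be pictured as nested records. The sketch below is a minimal illustration under stated assumptions: all class and field names are hypothetical, and the cardinalities (one study with many investigations, each investigation with one data holding made of many collections of many data objects) are our reading of the diagram, not the published model definition or the ICAT schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


# Hypothetical names throughout: an illustrative reading of the diagram,
# not the published CCLRC model or the ICAT schema.
@dataclass
class DataObject:
    """A single stored item, e.g. a raw data file."""
    name: str
    size_bytes: int = 0


@dataclass
class DataCollection:
    """A named group of data objects (assumed one-to-many)."""
    name: str
    objects: List[DataObject] = field(default_factory=list)


@dataclass
class DataHolding:
    """All data produced by one investigation (assumed one per investigation)."""
    collections: List[DataCollection] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    holding: DataHolding = field(default_factory=DataHolding)


@dataclass
class Study:
    """Top-level record carrying the descriptive metadata categories."""
    topics: List[str]
    access_conditions: str
    investigations: List[Investigation] = field(default_factory=list)
    related_materials: List[str] = field(default_factory=list)
    legal_note: str = ""

    def data_objects(self) -> Iterator[DataObject]:
        """Walk Study -> Investigation -> DataHolding -> DataCollection -> DataObject."""
        for inv in self.investigations:
            for coll in inv.holding.collections:
                yield from coll.objects


# Usage with illustrative values only.
raw = DataObject("HRP00145.RAW", size_bytes=1024)
inv = Investigation("Hydrazinium", DataHolding([DataCollection("run-1", [raw])]))
study = Study(topics=["crystallography"], access_conditions="restricted", investigations=[inv])
print([obj.name for obj in study.data_objects()])  # ['HRP00145.RAW']
```

Keeping the descriptive categories (topic, access conditions, related materials, legal note) at study level and file-level detail at the bottom of the chain follows our reading of the diagram; a real catalogue would persist these records in a database rather than in memory.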
Data and Analysis Infrastructure
• Online Proposal System
• User Office System, incl.: User Database, Single Sign On, Account Creation and Management, Scheduling, Health and Safety, Proposal Management
• DataPortal
• Metadata Catalogue
• Data Acquisition System
• Storage Management System
• Data Analysis and Visualisation Interfaces: Annotation, Data Analysis, Simulation, Code Repository, XML Output

ISIS Facilities Ontology (source: Louisa Casley-Hayford)
[Diagram: classes ISISExperiment, CrystallographyGroupExperiment, InvestigationTitle, Year, Investigator, Instrument and DataFile. Example: an experiment titled “Hydrazinium” (Protein Crystallography Group Experiment) wasConductedIn 1986, hasInvestigator Pete Jones, hasUsedInstrument HRPD, hasDataFileName HRP00145.RAW.]

Data Portal & ICAT Architecture (source: Shoaib Sufi)

The e-Science Analysis Framework (source: Rob Allan)

AgentX Framework – Example (source: Phil Couch)
DL_POLY3 (CCP5) integrated with the CCP1 GUI.
[Diagram: DL_POLY3 files (CONTROL, CONFIG, REVCON) exchanged with the CCP1 GUI as CONFIG.xml and REVCON.xml through AgentX mappings and the AgentX core]
• AgentX core: core library written in C
• Wrappers for Python, Perl and Fortran
• Standard ontology and standard mappings
• Hides the complexities of dealing with XML
• Simple API
• Enables straightforward exchange of information

AgentX Application in the RMCS (source: Rik Tyler)
[Diagram: user desktop job submission via RGem and RCommands (parameter range selection); staging, job management and meta-scheduling of multiple simulations; XML data handled by AgentX and stored in a database]

Data Curation at STFC
• Membership of various standardisation bodies
• Influence on digital repositories and open access at both national and international level
• Long-term archival and preservation of scientific data for over 30 years for many different communities
• Founding member of the UK Digital Curation Centre
• Leader of the EU CASPAR project

CASPAR: Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
• To build a framework enabling long-term preservation of cultural, artistic and scientific knowledge.
• To generate, evaluate and develop the practices that will be needed in the future.

CASPAR Architecture
[Diagram: CASPAR information flow architecture]
See http://www.casparpreserves.eu

Questions?