The STFC e-Science Centre Grid Data Management Technologies – what might it mean for the A&H?
Shirley Crompton
(with thanks to Kerstin Kleese van Dam and colleagues in the e-Science Centre and SRD: Andy Smith, Manolis Pantos)
02/07/2007
Presentation Outline
• Data Curation
• Data Management Services
• STFC Facilities
• CCLRC Metadata Model, Ontology, Metadata harvest
• e-Science Centre
• Archaeological Applications
Overview of STFC e-Science Centre
STFC e-Science Centre is about using leading edge IT to deliver e-Research:
– High-quality scientific computing services
– Management and exploitation of large scale data
– Collaborative R&D
– Support for collaborative working
Sharing expertise - technology transfer
Based on core strengths:
• Data analysis and Computation
• Data storage
• Data management
• Collaborative environments
e-Infrastructure for Large Research Facilities
• Daresbury synchrotron
• Diamond synchrotron
• ISIS neutron and muon facility
• Vulcan laser facility
The e-Science Centre was founded to develop, deploy and run services for experimental, computing and data facilities to enhance the research carried out at these facilities:
• Understanding the research requirements through working with facilities staff and their users.
• Creating a powerful, long lasting knowledge resource for UK academia.
• Enabling users to get rapid access to their current and past data, related experiments, publications etc., leading to improved analysis through more complete information.
e-Infrastructure
[Diagram labels: STFC Active Directory; Diamond; Proposal Web pages; DataPortal; People DB; DUO; DUO Desk; DLS ICAT; SRB; IKitten; Data / metadata; DDH; StorageD; GDA; High Performance Grid Computing; Nexus File & Data; Atlas Data Store. Legend: Diamond, CICT; Modified by e-Science]
Probing the Past
DSIC Heritage Science Centre
Vegetable, Animal or Mineral
Research in ancient materials and conservation
Target Materials
PEY-XAS
The alloy composition by XRF, XRD and neutron diffraction
Synchrotron X-ray diffraction and X-ray fluorescence, together with neutron diffraction, have answered the question of whether the repaired nose-guard is original: it is a modern replacement, made of a copper-zinc alloy, while the rest of the object is a copper-tin alloy with small amounts of lead and iron.
The added nose-guard piece contains Zn; the head does not. Sn content varies between locations on the head, and Fe content also differs. Rietveld fitting of neutron diffraction data showed conclusively that the bulk compositions of the nose-guard and the head are very different.
What’s the secret, soldier?
Time-of-flight diffraction at ROTAX
The “Sulfur problem” in archaeological marine timbers
Mary Rose
[Figure labels: Sulfur; Sulfate]
‘Sulfur and iron speciation in recently recovered timbers of the Mary Rose revealed by X-ray absorption spectroscopy’ K.M. Wetherall, R.M. Moss, A.M. Jones, A.D. Smith, T. Skinner, D.M. Pickup, S.W. Goatham, A.V. Chadwick, R.J. Newport (2007) Submitted to J. Arch. Sci.
Islamic & medieval lustreware
A historic nano-material
• Understand historic production techniques
• Understand changes in that production relating to place and time
• Interest in developing new non-linear optical surfaces
• Studied historic fragments + laboratory reproductions
• Created reproductions in SR beams to study processes
• Differences in final visual effect arise from nano-particle type, size and density, and also from glaze type.
• Temperature and reduction/oxidation protocol are very important.
‘Temperature resolved reproductions of medieval lustre’ T. Pradell, J. Molera, E. Pantos, A.D. Smith, C. Martin, A. Labrador (2007) Submitted to App.Phys.A.
‘The invention of lustre: Iraq 9th and 10th centuries AD’ T. Pradell, J. Molera, A.D. Smith, M.S. Tite (2007) Submitted to J. Arch. Sci.
THz for art conservation?
K. Fukunaga, Y. Ogawa, S. Hayashi, I. Hosako (2007) “Terahertz spectroscopy for art conservation”, IEICE Electronics Express 4, pp. 258–263
Services - Petabyte Data Store
Hardware
– Tape library and drives:
• Capacity 5PB
• Bandwidth ~80MB/sec/drive
• 0.5 PB commercial HSM system (DMF)
• Fast, safe file storage
Services for
• STFC Facilities and User Communities
• Other Research Councils
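To put the tape figures in perspective, the short Python sketch below estimates idealised streaming times from the quoted per-drive bandwidth. It assumes decimal units (1 MB = 10^6 bytes, 1 PB = 10^15 bytes), a sustained ~80MB/sec on every drive and perfect scaling across drives. The function name and the drive count of ten are illustrative assumptions, not figures from the slides.

```python
MB = 10**6
TB = 10**12
PB = 10**15

# Approximate per-drive bandwidth quoted on the slide, in bytes per second.
DRIVE_BANDWIDTH = 80 * MB


def transfer_seconds(n_bytes, n_drives=1, bandwidth=DRIVE_BANDWIDTH):
    """Idealised time to stream n_bytes across n_drives working in parallel."""
    if n_drives < 1:
        raise ValueError("n_drives must be at least 1")
    return n_bytes / (n_drives * bandwidth)


if __name__ == "__main__":
    hours_per_tb = transfer_seconds(1 * TB) / 3600
    # Ten drives is a hypothetical number chosen only for illustration.
    days_for_store = transfer_seconds(5 * PB, n_drives=10) / 86400
    print(f"1 TB on one drive: about {hours_per_tb:.1f} hours")
    print(f"5 PB on ten drives: about {days_for_store:.0f} days")
```

Real tape throughput also depends on mount, seek and small-file overheads, so these numbers are best read as lower bounds.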
Storing data in SRB (source: Roger Downing)
[Diagram: SRB Client, SRB Servers, MCAT]
• Client: “Can I use the system?” “Yes!”
• SRB Client to SRB Server: “Store this data on the hexagonal server.”
• SRB Server to MCAT: “Is the client authorized?” “Yes!”
• SRB Server: redirect client connection to the hexagonal server
• Once data is stored, update MCAT with location and other info
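The store sequence on this slide (authorization check against the catalogue, redirection to the chosen storage server, then a catalogue update) can be imitated with a small in-memory model. The Python sketch below is only a toy illustration of that control flow: it is not the SRB client API, and every class, method and file name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Toy stand-in for MCAT: who may store data, and where each item lives."""
    authorized: set = field(default_factory=set)
    entries: dict = field(default_factory=dict)

    def is_authorized(self, user):
        return user in self.authorized

    def record(self, item, server, size):
        self.entries[item] = {"server": server, "size": size}


@dataclass
class StorageServer:
    """Toy storage resource that keeps payloads in memory."""
    name: str
    files: dict = field(default_factory=dict)

    def store(self, item, payload):
        self.files[item] = payload


class Broker:
    """Toy front-end server: checks the catalogue, then hands off to a resource."""

    def __init__(self, catalogue, servers):
        self.catalogue = catalogue
        self.servers = {server.name: server for server in servers}

    def put(self, user, item, payload, target):
        # Step 1: ask the catalogue whether the client is authorized.
        if not self.catalogue.is_authorized(user):
            raise PermissionError(f"{user} may not store data")
        # Step 2: "redirect" the client to the requested storage server.
        server = self.servers[target]
        server.store(item, payload)
        # Step 3: once stored, update the catalogue with location and size.
        self.catalogue.record(item, server.name, len(payload))
        return server.name


if __name__ == "__main__":
    mcat = Catalogue(authorized={"alice"})
    broker = Broker(mcat, [StorageServer("hexagonal"), StorageServer("square")])
    broker.put("alice", "scan-001.dat", b"raw detector data", target="hexagonal")
    print(mcat.entries["scan-001.dat"])
```

Keeping the location in the catalogue rather than in the client is what lets a user ask for data by name without knowing which server holds it.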
SRB Access Routes (source: Roger Downing)
User Applications
Project Specific Catalogues
SRB
ADS, Unix-file-system, etc
(Based on Data Management for Diamond Doc)
Data Storage - 132 user communities currently including:
• Particle Physics Community (LHC: CMS, Atlas, LHCb, ….)
• ISIS, British Atmospheric Data Centre
• EISCAT (Radar research)
• National Earth Observation Data Centre
• World Data Centre, CICT
• Central Laser Facility
• Diamond Light Source
• National Crystallography Service, Southampton University
• WASP, VIRGO Consortium
• BBSRC (BITS)
• Arts and Humanities Data Service
• Integrative Biology
• …
[Chart: User Data Totals (GB), by user community]
Archival Services for BBSRC Institutes
[Diagram labels: Institute Sites; Archival Service; Central Site]
Archival services rely on economies of scale and require expert staff to operate them, so central services for larger Campus Grids make financial sense.
We operate these services both ‘on site’ within STFC and for external partners.
This example is for the BBSRC, which has about 16 sites across the UK. They operate on their own network, and only their main site is connected to Janet. Scheduled archival and restores are handled via this central site.
Arts and Humanities Data Service
Dark Archive set up for the Arts and Humanities Data Service based at King's College, similar to the BBSRC solution, in operation since March 2007.
2.9 TB in 170,000 files
AHDS Dark Archive Architecture (Adil Hassan, Mark Hedges)
[Diagram labels: SRB Client; AHDS Databases]
Computational science (Hartree centre)
First principles simulation allows the prediction of composition and structure of surfaces in external gaseous environments
Single and multi-component systems can be studied
Predict reaction and diffusion barriers and pathways from first principles
Supercomputing – HPCx (BlueGene), HECToR
‘Stability of the AlF3 (0112) surface in H2O and HF environments: An investigation using hybrid density functional theory and atomistic thermodynamics’ S. Mukhopadhyay, C.L. Bailey, A. Wander, B.G. Searle, C.A. Muryn, S.L.M. Schroeder, R. Lindsay, N. Weiher, N.M. Harrison (2007) Surf.Sci. (in press)
http://www.hpcx.ac.uk
High Performance Computing
• Over 1500 registered users
• Hundreds of academic and government institutions
• More than 40 different applications
• Extensive coverage of the sciences, including bioinformatics, computational chemistry, plasma physics, astronomy and engineering
• 60+ papers every year
Visualization Services
Full Data Management Lifecycle
Underpinning the Research Lifecycle
CCLRC Scientific Metadata Model (Source: Shoaib Sufi)
[Entity-relationship diagram with one-to-many (1:M) links. Entities: Metadata Granule; Study; Topic; Access Conditions; Investigation; Data Holding; Data Collection; Data Object; Related Materials; Legal Note]
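One way to picture the entities named in this diagram is as a nested containment hierarchy. The Python sketch below assumes that a Study groups Investigations, each Investigation has one Data Holding, a Data Holding groups Data Collections, and a Data Collection groups Data Objects. That nesting, the field names and the example values are assumptions made for illustration; the exact cardinalities cannot be read from the extracted slide text.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class DataObject:
    name: str
    location: str  # hypothetical field: where the file is stored


@dataclass
class DataCollection:
    name: str
    objects: List[DataObject] = field(default_factory=list)


@dataclass
class DataHolding:
    collections: List[DataCollection] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    holding: DataHolding = field(default_factory=DataHolding)


@dataclass
class Study:
    name: str
    topics: List[str] = field(default_factory=list)
    investigations: List[Investigation] = field(default_factory=list)


def iter_data_objects(study: Study) -> Iterator[DataObject]:
    """Walk Study -> Investigation -> Data Holding -> Data Collection -> Data Object."""
    for investigation in study.investigations:
        for collection in investigation.holding.collections:
            yield from collection.objects


if __name__ == "__main__":
    raw = DataCollection("raw", [DataObject("run-1.raw", "store://example/run-1.raw")])
    study = Study(
        name="Example study",
        topics=["archaeometry"],
        investigations=[Investigation("Example investigation", DataHolding([raw]))],
    )
    print([obj.name for obj in iter_data_objects(study)])
```

A catalogue built on such a model can answer "which files belong to this study?" by walking the hierarchy instead of relying on directory naming conventions.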
Metadata Models and Catalogues
STFC has also developed a number of community-based metadata models, e.g. ICAT, and data models, based on existing standards.
STFC operates a wide range of metadata catalogues for its facilities, other Research Councils and dedicated user communities.
Data and Analysis Infrastructure
Online Proposal System
User Office System incl.:
User Database
Single Sign On
Account Creation and Management
Scheduling
Health and Safety
Proposal Management
DataPortal
Metadata Catalogue
Data Acquisition System
Storage Management System
Data Analysis and Visualisation Interfaces
Annotation
Data Analysis
Simulation
Code Repository
XML Output
ISIS Facilities Ontology (source: Louisa Casley-Hayford)
Classes: ISISExperiment, CrystallographyGroupExperiment, InvestigationTitle, Year, Investigator, Instrument, DataFile
Example instance: Protein Crystallography GroupExperiment
• hasTitle: Hydrazinium (InvestigationTitle)
• wasConductedIn: 1986 (Year)
• hasInvestigator: Pete Jones (Investigator)
• hasUsedInstrument: HRPD (Instrument)
• hasDataFileName: HRP00145.RAW (DataFile)
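The labels in this ontology diagram read naturally as subject-property-value statements. The Python sketch below stores them as plain tuples and answers a simple lookup. The pairing of each property with its value is inferred from the class names on the slide (for example, that 1986 is the Year), and the `isA` property and the helper function are hypothetical additions for illustration, not part of the ISIS ontology.

```python
EXPERIMENT = "Protein Crystallography GroupExperiment"

# (subject, property, value) statements; the pairings are inferred from the slide.
TRIPLES = [
    (EXPERIMENT, "isA", "CrystallographyGroupExperiment"),  # hypothetical property
    (EXPERIMENT, "hasTitle", "Hydrazinium"),
    (EXPERIMENT, "wasConductedIn", "1986"),
    (EXPERIMENT, "hasInvestigator", "Pete Jones"),
    (EXPERIMENT, "hasUsedInstrument", "HRPD"),
    (EXPERIMENT, "hasDataFileName", "HRP00145.RAW"),
]


def values_of(subject, prop, triples=TRIPLES):
    """Return every value recorded for the given subject and property."""
    return [value for s, p, value in triples if s == subject and p == prop]


def subjects_with(prop, value, triples=TRIPLES):
    """Return every subject that has the given property set to the given value."""
    return [s for s, p, v in triples if p == prop and v == value]


if __name__ == "__main__":
    print(values_of(EXPERIMENT, "hasUsedInstrument"))
    print(subjects_with("hasInvestigator", "Pete Jones"))
```

The same statements could be loaded into an RDF store; the tuple form is used here only to keep the example dependency-free.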
Data Portal & ICAT Architecture (source: Shoaib Sufi)
The eScience Analysis Framework (Source: Rob Allan)
AgentX Framework – Example (source: Phil Couch)
DL_POLY3 (CCP5) integrated with CCP1 GUI
[Diagram labels: DL_POLY3; CONTROL; CONFIG.xml; REVCON.xml; Mappings; AgentX; CCP1 GUI; AgentX core; Fortran wrapper; Python wrapper; Standard Ontology; Standard Mappings]
- Core library written in C
- Wrappers for Python, Perl and Fortran
- Hides the complexities of dealing with XML
- Simple API
- Enables straightforward exchange of information
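The idea of hiding XML details behind mappings can be illustrated with a small Python sketch: a shared concept name is mapped to a different location in each document format, so calling code asks for the concept and never handles paths directly. This is not the AgentX API. The two XML layouts, the mapping table and the function name are all hypothetical, and only the standard library `xml.etree.ElementTree` module is used.

```python
import xml.etree.ElementTree as ET

# Two hypothetical formats that encode the same quantity differently.
DOC_A = "<simulation><system><temperature>300.0</temperature></system></simulation>"
DOC_B = '<run><param name="temp" value="300.0"/></run>'

# Per-format mappings: concept name -> (path to the element, attribute or None).
MAPPINGS = {
    "format_a": {"Temperature": ("./system/temperature", None)},
    "format_b": {"Temperature": ("./param[@name='temp']", "value")},
}


def get_value(xml_text, fmt, concept):
    """Look up a concept in a document using that format's mapping."""
    path, attribute = MAPPINGS[fmt][concept]
    node = ET.fromstring(xml_text).find(path)
    if node is None:
        raise KeyError(f"{concept} not found in {fmt} document")
    raw = node.get(attribute) if attribute else node.text
    return float(raw)


if __name__ == "__main__":
    print(get_value(DOC_A, "format_a", "Temperature"))
    print(get_value(DOC_B, "format_b", "Temperature"))
```

Supporting another code's output then means adding one mapping entry rather than changing the calling application.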
Agent-X Application in the RMCS (source: Rik Tyler)
[Diagram labels: User Desktop: Job Submission; RCommands; Parameter Range Selection; Staging; Job Management; Meta Scheduling; Simulation (x4); AgentX; XML data; RGem; Database]
Data Curation at STFC
• Membership in various standardisation bodies
• Influence on Digital Repositories and Open Access at both national and international level
• Long-term archival and preservation of scientific data for over 30 years for many different communities
• Founding member of the UK Digital Curation Centre
• Leader of the EU CASPAR project
CASPAR
Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval
To build a framework for enabling long-term preservation of cultural, artistic, and scientific knowledge.
To generate, evaluate, and develop the practices that will be needed in the future.
CASPAR Architecture
[Figure: CASPAR information flow architecture]
See http://www.casparpreserves.eu
Questions?