Middleware services on the NGS

http://www.grid-support.ac.uk
http://www.ngs.ac.uk
http://www.nesc.ac.uk/
http://www.pparc.ac.uk/
http://www.eu-egee.org/
Acknowledgments
• Matt Ford, NGS Induction Workshop (Dec. 2004,
NeSC)
• Neil Chue Hong, OGSA-DAI Tutorial, GGF13
• OGSA-DAI website, www.ogsadai.org
Induction to Grid Computing and the National Grid Service
NGS software
• Computation services based on Globus Toolkit 2
– Use compute nodes for sequential or parallel jobs, primarily from
batch queues
– Can run multiple jobs concurrently
• Data services:
– Storage Resource Broker:
• Primarily for file storage and access
• Virtual filesystem with replicated files
– “OGSA-DAI”: Data Access and Integration
• Primarily for grid-enabling databases (relational, XML)
– NGS Oracle service
– GridFTP
• Portal to support collaboration and ease use
• Authorisation and Authentication using GSI
Globus Toolkit
• Command-line interface for job submission
– need to know the name of a Compute Element (batch queue)

    globus-job-submit grid-data.rl.ac.uk/jobmanager-pbs /bin/hostname -f
    https://grid-data.rl.ac.uk:64001/1415/1110129853/
    globus-job-status https://grid-data.rl.ac.uk:64001/1415/1110129853/
    DONE
    globus-job-get-output https://grid-data.rl.ac.uk:64001/1415/1110129853/
    grid-data12.rl.ac.uk
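The submit, poll and fetch cycle in the session above can be scripted. The following is a minimal sketch, assuming the three commands behave as in that session: submit prints a job contact URL, status prints a state such as DONE, and get-output prints the job's standard output. The function names, the injectable `run` parameter and the check for a FAILED state are assumptions made for illustration and do not appear on the slide.

```python
import subprocess
import time


def run_command(args):
    """Run a command and return its stripped standard output."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def submit_and_fetch(contact, command, run=run_command,
                     poll_seconds=5.0, max_polls=120):
    """Submit a job, poll until it is DONE, then return (job_url, output).

    contact : resource contact, e.g. "grid-data.rl.ac.uk/jobmanager-pbs"
    command : list holding the executable and its arguments
    run     : callable that executes an argument list (replaceable in tests)
    """
    job_url = run(["globus-job-submit", contact] + list(command))
    for _ in range(max_polls):
        state = run(["globus-job-status", job_url])
        if state == "DONE":
            return job_url, run(["globus-job-get-output", job_url])
        if state == "FAILED":  # assumed failure state; not shown on the slide
            raise RuntimeError("job failed: " + job_url)
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish: " + job_url)
```

On a UI machine with a valid proxy certificate, `submit_and_fetch("grid-data.rl.ac.uk/jobmanager-pbs", ["/bin/hostname", "-f"])` would be expected to follow the same three steps as the session above. Passing a different `run` callable makes it possible to try the logic without Globus installed.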
The “UI” machine
• The user’s interface to the grid
– Where you upload your certificate for your session
– Where you create proxy certificates
– Where you can run the various commands, including…
  • The clients and development tools from Globus Toolkit 2.4.3
  • GSI (Grid Security Infrastructure) enabled Secure Shell
  • Storage Resource Broker (more on this tomorrow)
  • OGSA-DAI (more on this tomorrow)
Our setup
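[Diagram: the tutorial room machines connect by ssh to the UI (pub-234.nesc.ed.ac.uk). From the UI, gsissh and Globus commands travel over the Internet to the NGS head nodes (e.g. grid-data.rl.ac.uk), which sit in front of the execute nodes.]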
Job submission : CLI
[Diagram: layered stack. The user's interface to the grid gives access to command-line interfaces, which sit on top of GLOBUS, etc.]
Application-specific tools
[Diagram: layered stack, all reached from the user's interface to the grid. Application-specific and/or higher-level generic tools (e.g. BRIDGES) sit on top of the command-line interfaces and the APIs (Java, C, …), which in turn sit on top of GLOBUS, etc.]
Secure shell access
[Diagram: code and data are copied from the UI to an NGS head node.]
• gsiscp: copies files, using the proxy certificate to allow authorisation and authentication (AA)
Open shell on NGS CN
[Diagram: gsissh opens a shell from the UI on an NGS node; code and data are held on both.]
• Compile, edit, recompile, build
• Can be an X Windows client
• SHORT interactive runs are OK (sequential)
• TotalView debugger
Run jobs from the UI
[Diagram: executables are launched from the UI on an NGS execute node; code and data are held on both.]
• globus-job-run
• or globus-job-submit / globus-job-get-output
• Can pass files with these commands, e.g. parameters for a job.
Non-communicating Processes
[Diagram: globus-job-submit sends jobs from the UI, over the Internet, to the head processors of clusters, which pass them to the worker processors of those clusters.]
Processes run without any communication between them.
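Because the processes never exchange messages, independent tasks can be spread over any number of clusters. A minimal sketch of that fan-out, assuming the `globus-job-submit <contact> <executable> <arguments>` form used earlier in these slides: it only builds the command lines and does not run them. The function name, the round-robin policy and the second contact string are hypothetical.

```python
def build_submit_commands(contacts, executable, argument_sets):
    """Return one globus-job-submit argument list per independent task.

    Tasks are dealt out to the resource contacts round-robin. Since the
    processes never talk to each other, any task may land on any cluster.
    """
    commands = []
    for index, arguments in enumerate(argument_sets):
        contact = contacts[index % len(contacts)]
        commands.append(["globus-job-submit", contact, executable] + list(arguments))
    return commands


if __name__ == "__main__":
    contacts = [
        "grid-data.rl.ac.uk/jobmanager-pbs",     # contact used earlier in the slides
        "cluster-b.example.org/jobmanager-pbs",  # hypothetical second cluster
    ]
    # Three runs of the same program with different parameters.
    for command in build_submit_commands(contacts, "/bin/echo", [["1"], ["2"], ["3"]]):
        print(" ".join(command))
```

Each resulting argument list could then be handed to a process runner on the UI machine. Communicating processes, by contrast, cannot be split across clusters in this way.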
Communicating Processes
[Diagram: globus-job-submit sends a job from the UI, over the Internet, to the head processor of one cluster, which starts it on that cluster's worker processors.]
Processes send messages to each other, so they must run on the same cluster.
Communicating Processes
[Diagram: the UI reaches the head processors of clusters over the Internet; MPI links the worker processors within one cluster.]
Processes send messages to each other, so they must run on the same cluster.
Available APIs
• C: http://www.globus.org/developer/apireference.html
• “Commodity Grid” (CoG) Kits: http://www.cogkit.org/
– Java, Python, Matlab
Data services
• OGSA-DAI: data access and integration
• GridFTP: a protocol for large file transfer
• The Storage Resource Broker
• But first….
Oracle and the NGS (1)
• The NGS core nodes, from the outset, have been
partitioned into compute and data clusters.
• As the NGS matures, the requirement for data hosting
will grow.
• Oracle database: for both users and services offered by
the NGS.
• The RAL and Manchester sites are designated as the
data clusters with each site having the ability to dedicate
up to eight nodes for use by Oracle.
Oracle and the NGS (2)
Support
• An additional application is needed after joining the NGS
• All enquiries and production support for the Oracle service are handled via the Grid Operations Support Centre (GOSC)
– 9-5 operational support (monitoring, notification, maintenance); at other times, support is on a best-endeavours basis.
Data services on NGS
Simple data files
• Middleware supporting
– Replica files
– Logical filenames
– Catalogue: maps logical name to physical storage device/file
– Virtual filesystems, POSIX-like I/O
• Storage Resource Broker
Structured data
– RDBMS, XML databases
• Require extendable middleware tools to support
– Move computation near to data
– Easy access, controlled by AA
– Integration and federation
• OGSA-DAI
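The catalogue idea for simple data files, a logical filename mapped to one or more physical replicas, can be illustrated with a toy model. This is a sketch only and not the SRB implementation: the class name, the resource names and the paths are all hypothetical.

```python
class ReplicaCatalogue:
    """Toy catalogue mapping a logical filename to its physical replicas."""

    def __init__(self):
        # logical name -> list of (storage resource, physical path)
        self._replicas = {}

    def register(self, logical_name, resource, physical_path):
        """Record one more physical copy of a logical file."""
        self._replicas.setdefault(logical_name, []).append((resource, physical_path))

    def locate(self, logical_name, available=None):
        """Return the first replica held on an available resource.

        available: optional set of resource names that are currently up;
        None means every resource is assumed to be reachable.
        """
        for resource, path in self._replicas.get(logical_name, []):
            if available is None or resource in available:
                return resource, path
        raise FileNotFoundError(logical_name)


if __name__ == "__main__":
    catalogue = ReplicaCatalogue()
    # Hypothetical logical name with two physical copies.
    catalogue.register("/demo/results.dat", "site-a-disk", "/vault1/0001.dat")
    catalogue.register("/demo/results.dat", "site-b-disk", "/vault7/9f3c.dat")
    # If site A is down, the same logical name resolves to the copy at site B.
    print(catalogue.locate("/demo/results.dat", available={"site-b-disk"}))
```

The user only ever names `/demo/results.dat`; which device actually serves the file is decided by the catalogue lookup.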
OGSA-DAI
www.ogsadai.org
What is OGSA-DAI?
• The Open Grid Services Architecture Data Access and
Integration project is concerned with constructing
middleware to assist with access and integration of data
from separate data sources via the grid.
• The project was conceived by the UK Database Task
Force and is working closely with the Global Grid Forum
DAIS-WG and the Globus team.
OGSA-DAI Motivation
• OGSA-DAI is motivated by the need to:
– Provide an extensible framework for easily integrating data
resources on to Grids.
– Provide for data discovery from previously unknown locations.
– Allow different types of data models from distributed data
resources to be easily integrated to Grid applications.
– Allow data to be accessed through uniform interfaces.
– Facilitate the integration of data from various sources to
obtain the required information.
– …
OGSA-DAI Provides
• Access to and updating of data resources
• Exposure of Data Resources to the Grid
• Additional data manipulation functionality at the
service level
• Uniform access to disparate, heterogeneous
data resources
– Does not hide underlying data model
• Data resources exposed through services
– Clients interact with these services
Interacting with Data Resources
• Activity: The data resource manipulation, data
transformation and delivery actions that the client wants
the service to perform.
– Think of sending the job to the data not the data to the job.
• Perform documents: Used by clients to specify to the
services the activities they want executed.
• Response documents: Used by the services to inform
clients as to the status of execution of their Perform
documents and, often, to also return data to a client.
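The exchange of perform and response documents can be sketched in a few lines. This is an illustration of the pattern only: the XML element names below are invented and are not the OGSA-DAI schema, the function names are hypothetical, and an in-memory SQLite database stands in for a grid-enabled data resource.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_perform_document(sql):
    """Client side: wrap one query activity in a perform document."""
    perform = ET.Element("perform")  # illustrative element names throughout
    activity = ET.SubElement(perform, "sqlQuery", name="q1")
    ET.SubElement(activity, "expression").text = sql
    return ET.tostring(perform, encoding="unicode")


def execute_perform_document(document, connection):
    """Service side: run each activity next to the data, build a response."""
    response = ET.Element("response")
    for activity in ET.fromstring(document):
        result = ET.SubElement(response, "result", name=activity.get("name"))
        try:
            rows = connection.execute(activity.findtext("expression")).fetchall()
            result.set("status", "COMPLETE")
            for row in rows:
                ET.SubElement(result, "row").text = ",".join(str(v) for v in row)
        except sqlite3.Error as error:
            # The response document reports the failure back to the client.
            result.set("status", "ERROR")
            result.text = str(error)
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE people (id INTEGER, name TEXT)")
    connection.execute("INSERT INTO people VALUES (1, 'Ada')")
    document = build_perform_document("SELECT id, name FROM people")
    print(execute_perform_document(document, connection))
```

The client ships a small description of the work to the service, and only the status and the selected rows come back, which is the "send the job to the data" idea in miniature.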
OGSA-DAI Deck of Activities
OGSA-DAI and the NGS
• The OGSA-DAI deployment on the NGS is being actively developed.
• Users should expect that procedures may change; this does not reflect on the commitment the NGS has to providing a service.
• Initially, the Manchester JISC data cluster has been charged with deploying the OGSA-DAI service.
Storage Resource Broker
SRB Projects
• Digital Libraries
– UCB, Umich, UCSB, Stanford, CDL
– NSF NSDL - UCAR / DLESE
• NASA Information Power Grid
• Astronomy
– National Virtual Observatory
– 2MASS Project (2 Micron All Sky Survey)
• Particle Physics
– Particle Physics Data Grid (DOE)
– GriPhyN
– SLAC Synchrotron Data Repository
• Medicine
– Digital Embryo (NLM)
• Earth Systems Sciences
– ESIPS
– LTER
• Persistent Archives
– NARA
– LOC
• Neuro Science & Molecular Science
– TeleScience/NCMIR, BIRN
– SLAC, AfCS, …
Over 90 terabytes in 16 million files
What is SRB?
• Storage Resource Broker (SRB) is a software product developed by the San Diego Supercomputer Center (SDSC).
• Allows users to access files and database objects across a distributed environment.
• The actual physical location of the data, and the way it is stored, are abstracted from the user.
• Allows the user to add user-defined metadata describing the scientific content of the information.
How SRB Works
• 4 major components:
– The Metadata Catalogue (MCAT)
– The MCAT-Enabled SRB Server
– The SRB Storage Server
– The SRB Client
[Diagram: a request path, labelled a to g, linking the SRB Client, the SRB A Server, the MCAT Server with its MCAT Database, and the SRB B Server.]
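One plausible reading of how the four components cooperate is that the client asks any SRB server for a logical name, that server consults MCAT to find where the data is held, and the data is then fetched from the holding server. The sketch below models that reading with plain Python objects. It is an assumption about the flow rather than a description of the SRB protocol, and every class, method and path name is hypothetical.

```python
class McatServer:
    """MCAT-enabled server: answers 'where does this logical name live?'."""

    def __init__(self, catalogue):
        # logical name -> (name of holding server, physical path)
        self._catalogue = catalogue

    def lookup(self, logical_name):
        return self._catalogue[logical_name]


class SrbServer:
    """Storage server that serves local data or fetches it from a peer."""

    def __init__(self, name, mcat, files=None):
        self.name = name
        self._mcat = mcat
        self._files = files or {}  # physical path -> bytes held locally
        self.peers = {}            # server name -> SrbServer

    def read_local(self, physical_path):
        return self._files[physical_path]

    def get(self, logical_name):
        """Resolve the logical name through MCAT, then fetch from the holder."""
        holder, path = self._mcat.lookup(logical_name)
        if holder == self.name:
            return self.read_local(path)
        return self.peers[holder].read_local(path)


if __name__ == "__main__":
    mcat = McatServer({"/home/demo/a.txt": ("B", "/vault/17")})
    server_a = SrbServer("A", mcat)
    server_b = SrbServer("B", mcat, {"/vault/17": b"hello"})
    server_a.peers["B"] = server_b
    # The client talks to server A and never learns that B holds the file.
    print(server_a.get("/home/demo/a.txt"))
```

The point of the broker is visible in the last line: the client names only the logical file and the server it happens to be connected to.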
SRB Client Tools
• Provide a user interface to send requests to the SRB server.
• 4 main interfaces:
– Command line (S-Commands)
– MS Windows (InQ)
– Web based (MySRB)
– Java (JARGON)
• Web Services (MATRIX)
Planned Deployment on NGS
[Diagram: the user connects to the MCAT Server @ Manchester (an SRB server with MCAT databases DB1 to DB n on the database servers @ Manchester). A second MCAT Server @ RAL has the same layout (an SRB server with MCAT databases DB1 to DB n on the database servers @ RAL). The two MCAT sites are joined by links labelled "Online Replication" and "Failover link". SRB servers @ Leeds, RAL, Oxford and HPCX each have a resource driver in front of a disk farm.]
GridFTP
What is GridFTP?
• A secure, robust, fast, efficient, standards-based, widely accepted data transfer protocol
• A protocol
– Multiple independent implementations can interoperate
• This works: both the Condor Project at the University of Wisconsin and Fermilab have home-grown servers that work with the Globus server.
• Many people have developed clients independently of the Globus Project.
• Globus also supplies a reference implementation:
– Server
– Client tools (globus-url-copy)
– Development libraries
Summary
• Computation services
– Globus Toolkit 2
• Data services
– Oracle
– SRB
– OGSA-DAI
– GridFTP
• Collaboration services
– the portal