Middleware emerging onto the NGS: Resource Broker
http://www.grid-support.ac.uk
http://www.ngs.ac.uk
Mike Mineter
mjm@nesc.ac.uk
http://www.nesc.ac.uk/
http://www.pparc.ac.uk/
http://www.eu-egee.org/
Outline
• NGS middleware: toolkits inviting development of
higher level services
– By projects – e.g. RealityGrid and BRIDGES
– For deployment as NGS services
• What is a Resource Broker?
• Where does it come from?
– LCG-2 (= EGEE-0)
– Providing production service for LCG-2
– Being configured for the NGS
• Current LCG-2 activity
Resource broker
• On the current NGS we have
– GRAM to submit jobs
– Information service to tell us what queues are busy
• The RB takes the work out of deciding where to
run a job
• First step: the LCG-2 RB is being added to the
NGS
(LCG = LHC Computing Grid; LHC = Large Hadron Collider)
Current production middleware: LCG-2
[Figure: layered architecture of LCG-2, from top to bottom]
• Applications
• Application level services: user interfaces, applications
• “Collective” services (EU DataGrid): application monitoring system, user access, data management, workload management
• “Basic” services (VDT: Condor, Globus, GLUE): information system, information schema, data transfer, security
• System software: operating system (RedHat Linux), file system (NFS, …), local scheduler (PBS, Condor, LSF,…)
• Hardware: computing cluster, network resources, data storage (HPSS, CASTOR…)
Major components
[Figure: the major components and the flows between them]
• Components: “User interface”, Resource Broker, Information Service, Replica Catalogue, Logging & Book-keeping, Authorization & Authentication, Computing Element, Storage Element
• Flows: input “sandbox”, output “sandbox”, job submit event, job query, job status, DataSets info, publish
• Shown on the following slides: UI, RB node, Network Server, Workload Manager, Job Controller, Replica Location Server, Information Service (characteristics and status), Computing Element, Storage Element
UI: allows users to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs)
Job status: submitted
edg-job-submit myjob.jdl
myjob.jdl:
JobType = "Normal";
Executable = "$(CMS)/exe/sum.exe";
InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
OutputSandbox = {"sim.err", "test.out", "sim.log"};
Requirements = other.GlueHostOperatingSystemName == "linux" &&
               other.GlueHostOperatingSystemRelease == "Red Hat 7.3" &&
               other.GlueCEPolicyMaxCPUTime > 10000;
Rank = other.GlueCEStateFreeCPUs;
Job Description Language (JDL) to specify job characteristics and requirements
Job status: submitted
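Because a JDL file is plain text, it can be generated by a script before calling edg-job-submit. The following is a minimal sketch in Python, not part of the EDG/LCG-2 tools: the helper names `jdl_value` and `render_jdl` are hypothetical, the quoting rules are simplified, and the Requirements expression is shortened. The attribute names and values are taken from the slide.

```python
def jdl_value(value):
    """Format a Python value as a JDL literal (simplified: strings and lists of strings)."""
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(item) for item in value) + "}"
    return '"' + str(value) + '"'


def render_jdl(attributes, expressions):
    """Render quoted attributes first, then expressions that are written as-is."""
    lines = [f"{name} = {jdl_value(value)};" for name, value in attributes.items()]
    lines += [f"{name} = {expr};" for name, expr in expressions.items()]
    return "\n".join(lines) + "\n"


# Values copied from the slide's example job description.
attributes = {
    "JobType": "Normal",
    "Executable": "$(CMS)/exe/sum.exe",
    "InputSandbox": ["/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"],
    "OutputSandbox": ["sim.err", "test.out", "sim.log"],
}
# Expressions are not quoted: they are evaluated against each CE's attributes.
expressions = {
    "Requirements": 'other.GlueHostOperatingSystemName == "linux" && '
                    "other.GlueCEPolicyMaxCPUTime > 10000",
    "Rank": "other.GlueCEStateFreeCPUs",
}

jdl_text = render_jdl(attributes, expressions)
print(jdl_text)
```

The text in `jdl_text` could then be written to a file such as myjob.jdl and submitted with the command shown on the slide.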
NS: network daemon responsible for accepting incoming requests
Input Sandbox files: from the UI to the RB storage
Job status: submitted, waiting
Job submission
WM: acts to satisfy the request
Job status: submitted, waiting
Job submission
MatchMaker/Broker: where must this job be executed?
Job status: submitted, waiting
Job submission
Matchmaker: responsible for finding the “best” CE for a job
Job status: submitted, waiting
Job submission
MatchMaker/Broker to Replica Location Server: where are the needed data (on which SEs)?
MatchMaker/Broker to Information Service: what is the status of the Grid?
Job status: submitted, waiting
Job submission
MatchMaker/Broker to Workload Manager: CE choice
Job status: submitted, waiting
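The matchmaking steps on the preceding slides (keep the CEs that satisfy Requirements, then pick the one with the highest Rank) can be sketched in a few lines of Python. This is an illustration only, not the Condor matchmaker that the RB uses: the CE records, their values and the function names are invented, and only the GLUE attribute names come from the JDL example.

```python
# Invented snapshot of CE attributes, as an information service might publish them.
computing_elements = [
    {"id": "ce-a", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 20000, "GlueCEStateFreeCPUs": 4},
    {"id": "ce-b", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 5000, "GlueCEStateFreeCPUs": 50},
    {"id": "ce-c", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 7.3",
     "GlueCEPolicyMaxCPUTime": 48000, "GlueCEStateFreeCPUs": 12},
    {"id": "ce-d", "GlueHostOperatingSystemName": "linux",
     "GlueHostOperatingSystemRelease": "Red Hat 9",
     "GlueCEPolicyMaxCPUTime": 48000, "GlueCEStateFreeCPUs": 30},
]


def requirements(ce):
    """Python equivalent of the Requirements expression in the JDL example."""
    return (ce["GlueHostOperatingSystemName"] == "linux"
            and ce["GlueHostOperatingSystemRelease"] == "Red Hat 7.3"
            and ce["GlueCEPolicyMaxCPUTime"] > 10000)


def rank(ce):
    """Python equivalent of Rank = other.GlueCEStateFreeCPUs."""
    return ce["GlueCEStateFreeCPUs"]


def choose_ce(ces, requirements, rank):
    """Keep the CEs that meet the requirements, then return the highest-ranked one."""
    candidates = [ce for ce in ces if requirements(ce)]
    return max(candidates, key=rank) if candidates else None


best = choose_ce(computing_elements, requirements, rank)
print(best["id"])
```

In this toy data, ce-b has the most free CPUs but is filtered out by the CPU time limit, and ce-d fails the operating system release check, so the ranking only compares ce-a and ce-c.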
Job submission
Job Adapter: responsible for the final “touches” to the job before performing submission (e.g. creation of wrapper script, PFN, etc.)
Job status: submitted, waiting
Job submission
Job Controller: responsible for the actual job management operations (done via CondorG)
Job status: submitted, waiting, ready
Job submission
Job: from the Job Controller (CondorG) to the Computing Element
Job status: submitted, waiting, ready, scheduled
“Compute element” – reminder!
• Grid gate node: Globus gatekeeper (job request, gridmapfile, logging) and information system (I.S.)
• Local resource management system: Condor / PBS / LSF master
• Homogeneous set of worker nodes
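The gridmapfile on the grid gate node is what lets the Globus gatekeeper map the certificate distinguished name (DN) of an incoming job request to a local account. A minimal Python sketch of reading such a file follows. It assumes the simple form of one quoted DN followed by one account name per line; the DNs and account names are invented, and `parse_gridmap` is a hypothetical helper, not a Globus function.

```python
import shlex


def parse_gridmap(text):
    """Return a dict mapping each quoted DN to the local account that follows it."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # shlex handles the double quotes around the DN, which contains spaces.
        dn, account = shlex.split(line)[:2]
        mapping[dn] = account
    return mapping


# Invented entries for illustration only.
example = '''
# distinguished name                          local account
"/C=UK/O=eScience/OU=Example/CN=jane doe" ngs0001
"/C=UK/O=eScience/OU=Example/CN=john smith" ngs0002
'''

accounts = parse_gridmap(example)
print(accounts["/C=UK/O=eScience/OU=Example/CN=jane doe"])
```

A request whose DN has no entry in the mapping would not be given a local account, which is why the file acts as a basic authorization list.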
Job submission
Input Sandbox files: from the RB storage to the Computing Element
“Grid enabled” data transfers/accesses: between the Computing Element and the Storage Element
Job status: submitted, waiting, ready, scheduled, running
Job submission
Output Sandbox files: from the Computing Element to the RB storage
Job status: submitted, waiting, ready, scheduled, running, done
Job submission
edg-job-get-output <dg-job-id>
Job status: submitted, waiting, ready, scheduled, running, done
Job submission
Output Sandbox files: from the RB storage to the UI
Job status: submitted, waiting, ready, scheduled, running, done, cleared
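The job states that accumulate across the submission slides form a simple linear sequence. A minimal Python sketch of that sequence is given here. It models only the successful path shown on the slides (failure or abort states are not covered), and the class and function names are illustrative rather than taken from the middleware.

```python
from enum import Enum


class JobStatus(Enum):
    """Job states in the order they appear on the slides."""
    SUBMITTED = "submitted"
    WAITING = "waiting"
    READY = "ready"
    SCHEDULED = "scheduled"
    RUNNING = "running"
    DONE = "done"
    CLEARED = "cleared"


ORDER = list(JobStatus)  # Enum iteration follows definition order


def next_status(status):
    """Return the state that follows `status`, or None after CLEARED."""
    position = ORDER.index(status)
    if position + 1 < len(ORDER):
        return ORDER[position + 1]
    return None


def lifecycle():
    """Walk the whole sequence from SUBMITTED to CLEARED."""
    path = [JobStatus.SUBMITTED]
    while next_status(path[-1]) is not None:
        path.append(next_status(path[-1]))
    return path


print(" -> ".join(state.value for state in lifecycle()))
```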
Job monitoring
edg-job-status <dg-job-id>
edg-job-get-logging-info <dg-job-id>
LB (Logging & Bookkeeping): receives and stores job events; processes corresponding job status
LM (Log Monitor): parses CondorG log file (where CondorG logs info about jobs) and notifies LB
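As a rough illustration of the Logging & Bookkeeping idea (store job events, derive the job status from them), here is a small in-memory sketch in Python. The event names, the event-to-status mapping, the job identifier and the class itself are all hypothetical; they do not reflect the real LB schema or API.

```python
from collections import defaultdict

# Hypothetical mapping from a logged event to the job status it implies.
EVENT_TO_STATUS = {
    "registered": "submitted",
    "accepted": "waiting",
    "matched": "ready",
    "transferred": "scheduled",
    "started": "running",
    "finished": "done",
    "output_retrieved": "cleared",
}


class LoggingAndBookkeeping:
    """Stores job events per job and derives the current status from them."""

    def __init__(self):
        self._events = defaultdict(list)

    def log_event(self, job_id, source, event):
        """Record that component `source` reported `event` for `job_id`."""
        self._events[job_id].append((source, event))

    def status(self, job_id):
        """Return the status implied by the latest known event, or None if no events."""
        status = None
        for _source, event in self._events.get(job_id, []):
            status = EVENT_TO_STATUS.get(event, status)
        return status

    def logging_info(self, job_id):
        """Return the full list of recorded events for `job_id`."""
        return list(self._events.get(job_id, []))


lb = LoggingAndBookkeeping()
lb.log_event("job-1", "Network Server", "registered")
lb.log_event("job-1", "Network Server", "accepted")
lb.log_event("job-1", "Workload Manager", "matched")
lb.log_event("job-1", "Log Monitor", "transferred")
lb.log_event("job-1", "Log Monitor", "started")
print(lb.status("job-1"))
```

In this sketch `status` plays the role of edg-job-status and `logging_info` the role of edg-job-get-logging-info, with the Log Monitor acting as one of the event sources.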
LCG-2 and NGS
• LCG-2 replica management:
– Logical file names, mapped by catalogue to
multiple physical files
• Storage element
– Corresponds to NGS data node (approx.)
• Compute element
– A batch queue – PBS or Condor for example
• Information service
– Same middleware and GLUE schema are used
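To make the replica management point concrete: a catalogue maps one logical file name to several physical copies held on different storage elements. A toy sketch in Python, in which the file names, the URL scheme and the storage element hosts are all invented for illustration:

```python
from urllib.parse import urlparse

# Invented catalogue: one logical file name, several physical replicas.
replica_catalogue = {
    "lfn:higgs-run42.dat": [
        "sfn://se1.example.org/data/higgs-run42.dat",
        "sfn://se2.example.org/store/higgs-run42.dat",
    ],
}


def physical_files(lfn):
    """Return every physical file name registered for a logical file name."""
    return list(replica_catalogue.get(lfn, []))


def storage_elements(lfn):
    """Return the hosts (storage elements) that hold a replica of the file."""
    return sorted({urlparse(pfn).hostname for pfn in physical_files(lfn)})


print(storage_elements("lfn:higgs-run42.dat"))
```

A lookup of this kind is what allows a broker to answer "on which SEs are the needed data?" when choosing where to run a job.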
More about the RB
• Developed by the European DataGrid project (EDG), then
“hardened” by LCG, and now one of the sources for the
EGEE middleware (next talk)
• Uses components of Condor
– matchmaker and Condor-G
• Try the GENIUS portal on GILDA
– GILDA is a dissemination grid running the LCG-2 middleware
– Demo site: https://grid-demo.ct.infn.it/
• And look at
http://lcg.web.cern.ch/LCG/
http://www.hep.ph.ic.ac.uk/escience/projects/demo/index.html
Implications for the NGS
• Are being worked out!
• Integration with NGS core nodes in progress
• “UI” requirements??:
– LCG user interface + OGSA-DAI + SRB client
– Lighter-weight alternatives?
• To packaging?
• For client software
Summary
• The resource broker receives a job description
in JDL
• It chooses a batch queue for job submission
• It's an example of the higher level services that will be
deployed for the NGS, built upon the current
toolkits