Workflow, Portals, Brokers & Schedulers:
ICENI: A Next Generation Grid Middleware
Steven Newhouse
Technical Director
London e-Science Centre,
Imperial College London, UK
Contents
• The Grid – A few definitions
• Enabling Applied Science
– Capturing Requirements
– Exploiting Services
• ICENI: An Integrated Grid Middleware
• Conclusions
What is the Grid?
“ Grid computing [is] distinguished from
conventional distributed computing by its
focus on large-scale resource sharing,
innovative applications, and, in some
cases, high-performance orientation...we
review the "Grid problem", which we
define as flexible, secure, coordinated
resource sharing among dynamic
collections of individuals, institutions, and
resources - what we refer to as virtual
organizations."
From "The Anatomy of the Grid: Enabling Scalable Virtual
Organizations" by Foster, Kesselman and Tuecke
Why Grids & Why Now?
• Large-scale science and engineering are done
through the interaction of people, heterogeneous
computing resources, information systems, and
instruments, all of which are geographically and
organizationally dispersed.
• The overall motivation for “Grids” is to facilitate the
routine interactions of these resources in order to
support large-scale science and engineering.
• Technology Drivers:
– CPU: doubling every 18 months
– Network: doubling every 9 months
– Result: ubiquitous universal connectivity
What is e-Science?
• Applied Scientists are becoming e-scientists
• Dependent on remote electronic services
• Utilising scarce expensive instruments
• Involvement in global collaborations
• Interaction through mobile devices
Urgent need for an integrated environment to support this activity
Enabling Applied Science
• Do what you do today… but better
• Transparently exploit available resources
• Pervasive and persistent environment
• In essence a two stage problem:
– Capture your requirements & intents
– Map these to the accessible services
Human Grid Interface - HGI
• Moving to portable mobile devices
– Phones, PDAs, Laptops
• More resources available for our use
– No longer a permanent shell to a specific resource
• Need more resources to do our work
– Multiple: data in, analyse & data out
• Rapidly moving beyond human comprehension
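Grid Architecture
[Figure: layered architecture. The User reaches the Grid through Scripting Languages, a Portal or a Problem Solving Environment, which feed a Workflow. Below the Workflow sit Brokers A … n, each above Schedulers that front Resources A, B, … m.]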
Portals
• Exploit ubiquitous web browser technology
– Well established client side standards (HTML)
– Different browsers with different look & feel
• Server side standardisation underway
– Standard Portal Specification
• Improve portability (cf. Java Beans)
– Strong Industry drive and support
• Examples:
– DataPortal, HPCPortal, EPIC, ICENI
– Recent NeSC workshop
Grid Engine Portal within EPIC
Workflow
• Capture user intentions & requirements
– Key interaction between applied & computing scientists
• Ought to be abstract to retain flexibility
– Specify service interface NOT location
• Needs to be complete
– Execute & forget (but monitor & report progress)
• Needs to be natural
– If it can’t be used it won’t be used
• See recent NeSC workshop
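The idea of an abstract workflow can be sketched in Python as follows. This is a minimal illustration, not ICENI's API: the `Step` class, the interface names and `execution_order` are all hypothetical. Each step names the service interface it needs and the steps it depends on, and never a host or location.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow step: names the interface required, never a location."""
    name: str
    interface: str            # hypothetical interface name, e.g. "LinearSolver"
    depends_on: tuple = ()    # names of steps that must finish first


def execution_order(steps):
    """Return step names in an order that respects the dependencies."""
    by_name = {s.name: s for s in steps}
    done, order = set(), []

    def visit(name, trail=()):
        if name in done:
            return
        if name in trail:
            raise ValueError(f"cycle through {name}")
        for dep in by_name[name].depends_on:
            visit(dep, trail + (name,))
        done.add(name)
        order.append(name)

    for s in steps:
        visit(s.name)
    return order


# A complete but location-free description: data in, analyse, data out.
workflow = [
    Step("view", "Visualiser", depends_on=("solve",)),
    Step("load", "DataSource"),
    Step("solve", "LinearSolver", depends_on=("load",)),
]
order = execution_order(workflow)
print(order)
```

Because no step carries a location, the same description can be handed to any broker that knows where compatible services currently live.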
Brokering
• Need to enact abstract workflow
– Discover compatible services
• Service selection against multiple criteria
– Performance
– Cost
• Examples:
– Performance: Manchester, Warwick & Imperial
– Cost: Imperial & Manchester
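Selection against multiple criteria might look like the sketch below. It assumes each discovered service advertises a predicted run time and a price; the `Offer` fields, the site names and the weighted-sum score are illustrative assumptions, not the method used by any of the projects listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    location: str
    interface: str
    est_hours: float   # predicted execution time (hypothetical figure)
    cost: float        # quoted price in arbitrary units (hypothetical figure)


def select(offers, interface, w_time=1.0, w_cost=1.0):
    """Pick the compatible offer with the lowest weighted score."""
    compatible = [o for o in offers if o.interface == interface]
    if not compatible:
        raise LookupError(f"no service implements {interface}")
    return min(compatible, key=lambda o: w_time * o.est_hours + w_cost * o.cost)


offers = [
    Offer("site-a", "LinearSolver", est_hours=2.0, cost=10.0),
    Offer("site-b", "LinearSolver", est_hours=5.0, cost=1.0),
    Offer("site-c", "Visualiser", est_hours=1.0, cost=1.0),
]
fastest = select(offers, "LinearSolver", w_time=1.0, w_cost=0.0)
cheapest = select(offers, "LinearSolver", w_time=0.0, w_cost=1.0)
print(fastest.location, cheapest.location)
```

Changing the weights moves the choice between a performance-driven and a cost-driven broker without touching the workflow itself.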
Scheduling
• Interaction with the low-level fabric
– Ensure requested action(s) takes place
• Need to schedule (reserve) across:
– Networks: Dedicated bandwidth
– Storage: Provide space for generated data
– Compute: Perform data analysis
– Visualisation: Stream results to local facility
• Active area within GGF
• Recent NeSC workshop
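One way to picture scheduling across several resource types is a search for the earliest time at which all of them are free together. The sketch below assumes each resource publishes a list of free windows in hours; the function name and the sample windows are made up for illustration and do not describe any GGF specification.

```python
def earliest_common_slot(free, duration):
    """Return the earliest start at which every resource is free for `duration`.

    free: dict mapping resource name -> list of (start, end) free windows.
    A common slot, if one exists, can always begin at some window's start,
    so only those instants need to be tried.
    """
    candidates = sorted(start for windows in free.values() for start, _ in windows)
    for t in candidates:
        if all(
            any(s <= t and t + duration <= e for s, e in windows)
            for windows in free.values()
        ):
            return t
    return None


# Hypothetical availability, in hours from now.
free = {
    "network": [(0, 4), (6, 12)],
    "storage": [(0, 24)],
    "compute": [(2, 5), (7, 10)],
    "visualisation": [(8, 12)],
}
slot = earliest_common_slot(free, duration=2)
print(slot)
```

A real scheduler would also have to hold each reservation once the slot is found; that step is omitted here.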
CERN’s Large Hadron Collider
1800 Physicists, 150 Institutes, 32 Countries
100 PB of data by 2010; 50,000 CPUs?
www.griphyn.org
www.ppdg.net
www.eu-datagrid.org
‘Simple’ LHC Analysis Problem
• Scientist wants to do an analysis:
– Move data to local compute facility
– Perform analysis on cluster local to data
– Move data & analysis to remote resources
• Data is replicated around the world
– Mapping between logical and real files
– Permanent and temporary data caches
• Information is key to decision making
– Location & availability of compute & network resources
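The mapping between logical and real files can be sketched as a small replica catalogue lookup. Everything named here (the catalogue layout, the site names, the paths and `choose_replica`) is hypothetical; the point is only that the scientist names a logical file and the system uses current availability information to pick a physical copy.

```python
def choose_replica(logical_name, catalogue, site_up, preferred_site=None):
    """Map a logical file name to one reachable physical replica."""
    live = [
        (site, path)
        for site, path in catalogue.get(logical_name, [])
        if site_up.get(site, False)
    ]
    if not live:
        raise LookupError(f"no reachable replica of {logical_name}")
    for site, path in live:
        if site == preferred_site:
            return site, path
    return live[0]   # fall back to the first reachable copy


# Hypothetical catalogue: one permanent copy and one temporary cache copy.
catalogue = {
    "lfn:run42/events": [
        ("cern", "/store/run42/events.dat"),
        ("london", "/cache/run42/events.dat"),
    ],
}
local = choose_replica(
    "lfn:run42/events", catalogue, {"cern": True, "london": True},
    preferred_site="london",
)
fallback = choose_replica(
    "lfn:run42/events", catalogue, {"cern": True, "london": False},
    preferred_site="london",
)
print(local, fallback)
```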
London e-Science Centre
‘Enabling the e-Scientist’
• Industrial Collaborations:
– Sun Centre of Excellence in e-Science
– Intel Virtual European Centre of Grid Computing
• Cross-campus collaborations:
– Bioinformatics
– High Energy Physics
– Computational Engineering
• Projects:
– e-Science Portal, Markets for Computational Services
– OGSA UK Grid, Climate Modelling, Protein Annotation
– Workflow for Grid Services, Materials Modelling, …
• Specialisation: Next Generation Grid Middleware
ICENI: Imperial College e-Science Network Infrastructure
• Integrated Grid Middleware Solution
• Interoperability between architectures, APIs
• Added value layer to other middleware
• Usability: Interactive Grid Workflows
• Deployment: Complete Install from Webstart
• Role and policy driven security
• Foundation for higher-level Services and Autonomous Composition
• ICENI Open Source licence (extended SISSL)
http://www.lesc.ic.ac.uk/iceni/
ICENI Release 1.0 available for download
ICENI Strands
• Service Oriented Architecture
• Workflow Guided Scheduling
• Role Based Access & Security
• Component Programming Model
• Semantic Adaptation
• Deployment
• Usability
The ICENI Stack
[Figure: stack diagram; labels in the order they appear on the slide]
• Higher Level Services
• Client Side Tools: Netbeans / Portal
• Runtime Component Framework, OGSA Gateway, Execution Mechanism
• Security Layer
• Service Oriented Architecture: Domain & Identity Management; Service API, Discovery API, Core API
• Jini, Jxta, OGSI
• Implementation Fabric
Focus on Deployment:
Installation Mechanism and Control Centre
Client Requirements:
• JRE 1.4.2
• Java Web Start (inc.)
• Internet Access
Centralised configuration and service execution
Focus on Usability:
ICENI Netbeans OGSA Service Browser
Focus on Usability:
ICENI Portal
Augmented Component Programming Model
• Meaning: How components can be linked together
– [Figure labels: Linear Solver, Jacobi and LU components with Matrix and Vector ports]
• Behaviour: How they interact with each other
– Pull Model, Push Model
• Implementation: How they will perform on different resources
– Parallel LU, Sequential LU
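The split between meaning and implementation can be illustrated with a short Python sketch. It assumes each implementation declares the port types it consumes and produces (its meaning) together with a rough performance model; the `Implementation` class, the port signature and the cost formulas are invented for this example and are not ICENI's.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Implementation:
    name: str
    inputs: tuple                           # meaning: port types consumed
    output: str                             # meaning: port type produced
    predict: Callable[[int, int], float]    # (problem size, cpus) -> relative time


def pick(impls, inputs, output, n, cpus):
    """Among implementations with the required meaning, pick the fastest predicted."""
    usable = [i for i in impls if i.inputs == inputs and i.output == output]
    if not usable:
        raise LookupError("no implementation with matching ports")
    return min(usable, key=lambda i: i.predict(n, cpus))


# Made-up performance models: the parallel version pays a per-CPU overhead.
impls = [
    Implementation("Sequential LU", ("Matrix", "Vector"), "Vector",
                   lambda n, cpus: n ** 3),
    Implementation("Parallel LU", ("Matrix", "Vector"), "Vector",
                   lambda n, cpus: n ** 3 / cpus + 1000 * cpus),
]
on_one_cpu = pick(impls, ("Matrix", "Vector"), "Vector", n=100, cpus=1)
on_eight_cpus = pick(impls, ("Matrix", "Vector"), "Vector", n=100, cpus=8)
print(on_one_cpu.name, on_eight_cpus.name)
```

The application builder only ever asks for "something that maps a Matrix and a Vector to a Vector"; which implementation runs is decided once the target resource is known.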
Dynamic Application Construction
[Figure labels: Data In, Data Out; User, System; Meaning, Behaviour, Implementation; platforms [Sparc, Solaris] and [RH8, Linux]]
Inferred temporal view of workflow
• Width: Resource Usage
• Length: Execution Time
Added Value:
Dynamic Discovery & Composition
[Figure: a deployed application (Application, Visualisation Server) with the following annotations]
• Register as running component services in the NetBeans user interface
• Add new advertised components
• Drag-and-drop running component
• Execute to create new component instances and connect to application
Collaborative Visualisation & Steering
integrated with ICENI driven Access Grid!
Service Oriented Architecture
[Figure labels: Application component; Visualisation server; Dataset A, Dataset B, Dataset A & B; Rendering engines 1 and 2; Visualisation clients 1 and 2; View of dataset A, View of dataset B; Streamed to Access Grid]
Focus on Deployment:
ICENI Role Management Utility
• Managing role details
– Use ICENI Role Management Utility
– Remote access through ICENI SOA
Job Proxies
• Job Proxy Certificates
– Valid only for the duration of a single job
– X.509 based: signed by user’s master cert.
– Increased security & flexibility
– Embedded policies
Job Proxy Certificate
Version: 3
S/N: XX-XX-XX-XX
Issuer: /C=UK/O=CA/OU=CA1/L=London/CN=jhc02
Issuer Signature: ………………………..
Validity Period: From: 00:01 01/01/00 To: 00:00 01/01/01
Subject DN: /C=UK/O=Org/OU=A/L=London/CN=jhc02,CN=34534534
Subject Public Key: …………………………………..
Embedded Access Policy:
<policy>
<allow><location name="vostock.doc.ic.ac.uk"/></allow>
</policy>
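How a resource might act on such a certificate can be sketched in Python. This is an assumption-laden illustration: it reads only the `<policy>`/`<allow>`/`<location name=…>` shape shown on the slide, interprets the validity period as 1 January 2000 to 1 January 2001, and skips signature verification entirely. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Policy text as shown on the slide.
POLICY = '<policy><allow><location name="vostock.doc.ic.ac.uk"/></allow></policy>'

# Assumed reading of "From: 00:01 01/01/00 To: 00:00 01/01/01".
NOT_BEFORE = datetime(2000, 1, 1, 0, 1)
NOT_AFTER = datetime(2001, 1, 1, 0, 0)


def allowed_locations(policy_xml):
    """Collect the location names listed under <allow> elements."""
    root = ET.fromstring(policy_xml)
    return {loc.get("name") for loc in root.findall("./allow/location")}


def may_run(policy_xml, host, now, not_before=NOT_BEFORE, not_after=NOT_AFTER):
    """True if the proxy is inside its validity period and the host is allowed."""
    in_period = not_before <= now <= not_after
    return in_period and host in allowed_locations(policy_xml)


ok = may_run(POLICY, "vostock.doc.ic.ac.uk", datetime(2000, 6, 1))
wrong_host = may_run(POLICY, "other.example.org", datetime(2000, 6, 1))
expired = may_run(POLICY, "vostock.doc.ic.ac.uk", datetime(2002, 1, 1))
print(ok, wrong_host, expired)
```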
Focus on Usability:
Job Proxies in ICENI
• Job Proxy use configured in Netbeans
Delivering e-Science:
Who is using ICENI?
LB3D – (Lattice-Boltzmann 3D)
ICENI provides collaborative visualisation and steering
across the Access Grid
GENIE: Analysing Thermohaline circulation
• Ocean transports heat through the “global conveyor belt.”
• Heat transport controls global climate.
• Wish to investigate strength of model ocean circulation as a function of two external parameters.
• Use GENIE-Trainer.
– Wish to perform 31×31 = 961 individual simulations.
– Each simulation takes ∼4 hours to execute on typical Intel P3/1GHz, 256MB RAM, machine ⇒ time taken for 961 sequential runs ≈ 163 days!!!
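The arithmetic behind the sequential estimate can be checked with a few lines of Python. The 4.0 hours used here is the slide's rounded "∼4 hours"; with exactly 4 hours the total comes to about 160 days, so the quoted 163 days suggests a per-run time slightly above 4 hours.

```python
runs = 31 * 31                 # one simulation per point on a 31 x 31 parameter grid
hours_per_run = 4.0            # the slide's approximate figure

sequential_days = runs * hours_per_run / 24
implied_hours = 163 * 24 / runs    # per-run time implied by the 163-day figure

print(runs)
print(round(sequential_days, 1))
print(round(implied_hours, 2))
```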
e-Science Portal at Imperial College (EPIC)
• EPIC: Centre project
• Leverage within the GENIE portal
• For an experiment
– Create
– Monitor
– Stop
– Retrieve results
Focus on Usability:
Netbeans Component Application Builder
Case Study: Parameter Sweep
The binary component is executed 10 times.
Other components, such as the argument constructor and the output and error consoles, are expanded automatically.
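The expansion step of a parameter sweep can be sketched as below: one binary plus a set of parameter values becomes one command line per combination. The function `expand_sweep`, the `./model` binary and the `--name=value` flag style are hypothetical stand-ins, not what the ICENI argument constructor actually emits.

```python
from itertools import product


def expand_sweep(binary, **params):
    """Build one argument list per combination of parameter values."""
    names = list(params)
    return [
        [binary] + [f"--{name}={value}" for name, value in zip(names, combo)]
        for combo in product(*(params[name] for name in names))
    ]


# Ten executions of one binary, as in the case study.
jobs = expand_sweep("./model", alpha=range(10))

# A two-parameter sweep over a 31 x 31 grid.
grid_jobs = expand_sweep("./model", x=range(31), y=range(31))

print(len(jobs), jobs[0])
print(len(grid_jobs))
```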
The Solution:
Delivering Grid Computing Resources
• Use flocked Condor pools between SReSC, DoC at Imperial College London, and LeSC (∼200 Linux and Solaris nodes).
time taken for 961 Condor runs ≈ 3 days!!!
• Advantages of Condor:
– simulations are nearly parallel.
– automatic checkpointing and job migration.
– Condor File Transfer Mechanism.
• Problems:
– Firewalls! Overcome by designating and utilising port ranges specified by the Condor and firewall admin.
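Scheduling jobs over Resources
[Figure: 961 jobs enter the Scheduler, which splits them into batches of 331 and 630 jobs; these are dispatched through a Condor Launcher and an SGE Launcher to Condor and a cluster.]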
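The slides do not say how the 331/630 split was chosen or which launcher received which batch. One plausible scheme, sketched here purely as an assumption, is to divide jobs in proportion to each back end's capacity. The weights 100 and 190 are made up so that the example reproduces the slide's totals, and the launcher names are neutral placeholders.

```python
def split_jobs(total, weights):
    """Split `total` jobs in proportion to integer weights (largest remainder)."""
    weight_sum = sum(weights.values())
    # Whole-number share for each launcher.
    shares = {name: total * w // weight_sum for name, w in weights.items()}
    leftover = total - sum(shares.values())
    # Hand any remaining jobs to the launchers with the largest remainders.
    by_remainder = sorted(
        weights, key=lambda name: (total * weights[name]) % weight_sum, reverse=True
    )
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares


# Hypothetical capacities, chosen only to illustrate the slide's numbers.
shares = split_jobs(961, {"launcher_a": 100, "launcher_b": 190})
print(shares)
```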
The Results:
Scientific Achievements
Intensity of the thermohaline circulation as
a function of freshwater flux between
Atlantic and Pacific oceans (DFWX), and
mid-Atlantic and North Atlantic (DFWY).
Surface air temperature difference
between extreme states (off - on) of the
thermohaline circulation.
North Atlantic 2°C colder when the
circulation is off.
Development Infrastructure
• Project Website & mailing lists
• Daily build
– Regression tests
– On success binaries updated
– Regenerated JavaDoc
– Deployment tests
• CVS
– Code split across multiple repositories & modules
• Documentation, manuals & user guides
• ICENI Open Source License (Extended SISSL)
Conclusions
• Mid point of e-science programme
– Emergence of usable grid middleware
– Demonstrable use by applied scientists
• CCP have second mover advantage
– Experience from applied & computer scientists
• Way ahead…
– Identify some ‘easy’ demonstrators
– Use cases to inform community
Acknowledgements
• Director: Professor John Darlington
• Technical Director: Dr Steven Newhouse
• Research Staff:
– Anthony Mayer, Nathalie Furmento, Stephen McGough
– William Lee, Marko Krznaric, Murtaza Gulamali
– Asif Saleem, Laurie Young, Gary Kong, Jeffrey Hau
– Angela O’Brien, Jeremy Cohen, Ali Afzal
• Support Staff:
– System:
• Keith Sephton, David McBride
– Operations:
• Susan Brookes, Oliver Jevons
• Contacts:
– E-mail: lesc@ic.ac.uk
– Web: http://www.lesc.imperial.ac.uk