Workflow, Portals, Brokers & Schedulers
ICENI: A Next Generation Grid Middleware

Steven Newhouse
Technical Director, London e-Science Centre, Imperial College London, UK

Contents
• The Grid – a few definitions
• Enabling Applied Science
  – Capturing requirements
  – Exploiting services
• ICENI: an integrated Grid middleware
• Conclusions

What is the Grid?
“Grid computing [is] distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation... we review the ‘Grid problem’, which we define as flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources – what we refer to as virtual organizations.”
– From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations” by Foster, Kesselman and Tuecke

Why Grids & Why Now?
• Large-scale science and engineering are done through the interaction of people, heterogeneous computing resources, information systems, and instruments, all of which are geographically and organizationally dispersed.
• The overall motivation for “Grids” is to facilitate the routine interaction of these resources in order to support large-scale science and engineering.
• Technology drivers:
  – CPU: doubling every 18 months
  – Network: doubling every 9 months
  – Result: ubiquitous universal connectivity

What is e-Science?
• Applied scientists are becoming e-scientists
• Dependent on remote electronic services
• Utilising scarce, expensive instruments
• Involvement in global collaborations
• Interaction through mobile devices
There is an urgent need for an integrated environment to support this activity.

Enabling Applied Science
• Do what you do today… but better
• Transparently exploit available resources
• Pervasive and persistent environment
• In essence a two-stage problem:
  – Capture your requirements & intents
  – Map these to the accessible services

Human Grid Interface – HGI
• Moving to portable mobile devices
  – Phones, PDAs, laptops
• More resources available for our use
  – No longer a permanent shell to a specific resource
• Need more resources to do our work
  – Multiple stages: data in, analyse & data out
• Rapidly moving beyond human comprehension

Grid Architecture
[Diagram: users reach the Grid through scripting languages, portals and problem solving environments; these produce a user workflow, which brokers (Broker A … Broker n) map onto schedulers, which in turn drive the underlying resources (Resource A, Resource B, … Resource m). A code sketch of this pipeline follows.]
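As a reading aid for the diagram above, a minimal Java sketch of the pipeline it depicts appears below. Every type name here is a hypothetical illustration, not ICENI's actual API: the point is only the flow from abstract workflow, through broker selection, to a scheduler reservation on a concrete resource.

```java
import java.util.List;

// Hypothetical types sketching the slide's pipeline: a user workflow is
// handed to a broker, the broker selects a scheduler, and the scheduler
// commits the work to a concrete resource.  This is NOT ICENI's real API.
interface Workflow { }

interface Resource {
    void execute(Workflow w);              // low-level fabric action
}

interface Scheduler {
    Resource reserve(Workflow w);          // reservation on the fabric
}

interface Broker {
    // Select among candidate schedulers using multiple criteria
    // (e.g. performance, cost), as the brokering slide below describes.
    Scheduler select(Workflow w, List<Scheduler> candidates);
}

final class Pipeline {
    static void run(Workflow w, Broker broker, List<Scheduler> schedulers) {
        Scheduler chosen = broker.select(w, schedulers);  // brokering
        Resource resource = chosen.reserve(w);            // scheduling
        resource.execute(w);                              // execution
    }
}
```

Keeping the workflow abstract at the top of this pipeline is what lets the broker defer the choice of scheduler and resource until enactment time.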
Portals
• Exploit ubiquitous web browser technology
  – Well-established client-side standards (HTML)
  – Different browsers with different look & feel
• Server-side standardisation underway
  – Standard portal specification
  – Improves portability (cf. Java Beans)
  – Strong industry drive and support
• Examples:
  – DataPortal, HPCPortal, EPIC, ICENI
  – Recent NeSC workshop
[Screenshot: Grid Engine Portal within EPIC]

Workflow
• Capture user intentions & requirements
  – Key interaction between applied & computing scientists
• Ought to be abstract to retain flexibility
  – Specify the service interface, NOT its location
• Needs to be complete
  – Execute & forget (but monitor & report progress)
• Needs to be natural
  – If it can’t be used it won’t be used
• See recent NeSC workshop

Brokering
• Need to enact the abstract workflow
  – Discover compatible services
• Service selection against multiple criteria
  – Performance
  – Cost
• Examples:
  – Performance: Manchester, Warwick & Imperial
  – Cost: Imperial & Manchester

Scheduling
• Interaction with the low-level fabric
  – Ensure the requested action(s) take place
• Need to schedule (reserve) across:
  – Networks: dedicated bandwidth
  – Storage: space for generated data
  – Compute: perform the data analysis
  – Visualisation: stream results to a local facility
• Active area within the GGF
• Recent NeSC workshop

CERN’s Large Hadron Collider
• 1800 physicists, 150 institutes, 32 countries
• 100 PB of data by 2010; 50,000 CPUs?
• www.griphyn.org, www.ppdg.net, www.eu-datagrid.org

‘Simple’ LHC Analysis Problem
• A scientist wants to do an analysis:
  – Move data to a local compute facility
  – Perform the analysis on a cluster local to the data
  – Move data & analysis to remote resources
• Data is replicated around the world
  – Mapping between logical and real files
  – Permanent and temporary data caches
• Information is key to decision making
  – Location & availability of compute & network resources

London e-Science Centre: ‘Enabling the e-Scientist’
• Industrial collaborations:
  – Sun Centre of Excellence in e-Science
  – Intel Virtual European Centre of Grid Computing
• Cross-campus collaborations:
  – Bioinformatics
  – High-energy physics
  – Computational engineering
• Projects:
  – e-Science Portal, Markets for Computational Services
  – OGSA UK Grid, Climate Modelling, Protein Annotation
  – Workflow for Grid Services, Materials Modelling, …
• Specialisation: next generation Grid middleware

ICENI: Imperial College e-Science Networked Infrastructure
• Integrated Grid middleware solution
• Interoperability between architectures & APIs
• Added-value layer to other middleware
• Usability: interactive Grid workflows
• Deployment: complete install from Web Start
• Role- and policy-driven security
• Foundation for higher-level services and autonomous composition
• ICENI open source licence (extended SISSL)
• http://www.lesc.ic.ac.uk/iceni/ – ICENI Release 1.0 available for download

ICENI Strands
• Service oriented architecture
• Workflow-guided scheduling
• Role-based access & security
• ICENI component programming model
• Semantic adaptation
• Deployment
• Usability

The ICENI Stack
[Diagram: higher-level services and client-side tools (NetBeans / portal) sit on a runtime component framework comprising an OGSA gateway, execution mechanism and security layer; beneath these, the service oriented architecture provides domain & identity management through Service, Discovery and Core APIs over pluggable Jini, JXTA and OGSI implementations, down to the fabric. A standalone Jini lookup sketch follows the deployment slide below.]

Focus on Deployment: Installation Mechanism and Control Centre
• Client requirements:
  – JRE 1.4.2
  – Java Web Start (included)
  – Internet access
• Centralised configuration and service execution
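ICENI's Service and Discovery APIs are its own, but the stack above names Jini as one of the pluggable implementations underneath them. As a flavour of what the Jini route involves, here is a minimal standalone Jini multicast lookup; the GridService interface is a placeholder, not an ICENI type, and everything else uses the standard Jini API.

```java
import java.rmi.RemoteException;
import net.jini.core.lookup.ServiceRegistrar;
import net.jini.core.lookup.ServiceTemplate;
import net.jini.discovery.DiscoveryEvent;
import net.jini.discovery.DiscoveryListener;
import net.jini.discovery.LookupDiscovery;

public class JiniLookupSketch {

    /** Placeholder service interface -- not a real ICENI type. */
    public interface GridService { }

    public static void main(String[] args) throws Exception {
        // Jini downloads proxy code, so a security manager is required.
        System.setSecurityManager(new SecurityManager());

        // Multicast discovery of lookup services in all groups.
        LookupDiscovery discovery =
                new LookupDiscovery(LookupDiscovery.ALL_GROUPS);

        discovery.addDiscoveryListener(new DiscoveryListener() {
            public void discovered(DiscoveryEvent ev) {
                // Match any registered service implementing GridService.
                ServiceTemplate template = new ServiceTemplate(
                        null, new Class[] { GridService.class }, null);
                ServiceRegistrar[] registrars = ev.getRegistrars();
                for (int i = 0; i < registrars.length; i++) {
                    try {
                        Object proxy = registrars[i].lookup(template);
                        if (proxy != null) {
                            System.out.println("Found service: " + proxy);
                        }
                    } catch (RemoteException e) {
                        // Lookup service unreachable; skip it in this sketch.
                    }
                }
            }
            public void discarded(DiscoveryEvent ev) { }
        });

        Thread.sleep(10000);    // allow time for multicast responses
        discovery.terminate();
    }
}
```

Hiding this detail behind the Discovery API is what allows ICENI to swap Jini for JXTA or OGSI without disturbing the layers above.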
Focus on Usability: ICENI NetBeans OGSA Service Browser
[Screenshot]

Focus on Usability: ICENI Portal
[Screenshot]

Augmented Component Programming Model
• Meaning: how components can be linked together
  – e.g. a Linear Solver takes a Matrix and a Vector and produces a Vector; Jacobi and LU are alternative solvers
• Behaviour: how they interact with each other
  – Pull model, push model
• Implementation: how they will perform on different resources
  – e.g. parallel LU vs. sequential LU

Dynamic Application Construction
[Diagram: the user supplies a data-in → data-out workflow at the meaning level; the system progressively refines it through behaviour to concrete implementations on specific resources, e.g. [Sparc, Solaris] and [RH8, Linux].]

Inferred Temporal View of Workflow
[Diagram: each component drawn as a block – width: resource usage; length: execution time.]

Added Value: Dynamic Discovery & Composition
• Deployed applications register as running component services in the NetBeans user interface
• Drag-and-drop a running component (e.g. an application visualisation server)
• Add newly advertised components
• Execute to create new component instances and connect them to the application

Collaborative Visualisation & Steering
• Integrated with an ICENI-driven Access Grid!
[Diagram: within the service oriented architecture, an application component feeds a visualisation server holding datasets A and B; rendering engine 1 streams a view of dataset A through visualisation client 1 onto the Access Grid, while rendering engine 2 serves a view of dataset B to visualisation client 2.]

Focus on Deployment: ICENI Role Management Utility
• Managing role details
  – Use the ICENI Role Management Utility
  – Remote access through the ICENI SOA

Job Proxies
• Job proxy certificates:
  – Valid only for the duration of a single job
  – X.509 based: signed by the user’s master certificate
  – Increased security & flexibility
  – Embedded policies

Job Proxy Certificate (example)
  Version: 3
  S/N: XX-XX-XX-XX
  Issuer: /C=UK/O=CA/OU=CA1/L=London/CN=jhc02
  Issuer Signature: …
  Validity Period: From: 00:01 01/01/00  To: 00:00 01/01/01
  Subject DN: /C=UK/O=Org/OU=A/L=London/CN=jhc02,CN=34534534
  Subject Public Key: …
  Embedded Access Policy:
    <policy>
      <allow><location name="vostock.doc.ic.ac.uk"/></allow>
    </policy>

Focus on Usability: Job Proxies in ICENI
• Job proxy use is configured in NetBeans

Delivering e-Science: Who is Using ICENI?
• LB3D (Lattice-Boltzmann 3D)
  – ICENI provides collaborative visualisation and steering across the Access Grid

GENIE: Analysing Thermohaline Circulation
• The ocean transports heat through the “global conveyor belt”; heat transport controls global climate.
• Wish to investigate the strength of the model ocean circulation as a function of two external parameters, using GENIE-Trainer.
• Wish to perform 31 × 31 = 961 individual simulations.
• Each simulation takes ∼4 hours on a typical Intel P3/1 GHz, 256 MB RAM machine ⇒ time taken for 961 sequential runs ≈ 163 days!!!

e-Science Portal at Imperial College (EPIC)
• EPIC: a Centre project
• Leveraged within the GENIE portal
• For an experiment:
  – Create
  – Monitor
  – Stop
  – Retrieve results

Focus on Usability: NetBeans Component Application Builder
[Screenshot]

Case Study: Parameter Sweep
• The binary component will be executed 10 times.
• Other components, such as the argument constructor or the output and error consoles, are automatically expanded.
• (A sketch of generating the GENIE-scale sweep follows.)
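To make the shape of such a sweep concrete before turning to how it was run, here is a minimal Java sketch that enumerates the 31 × 31 GENIE grid and hands each (DFWX, DFWY) pair to a submission hook. The parameter ranges, the submit() hook, and the genie-run command name are illustrative assumptions, not the project's real values; only the 961-job count comes from the slides.

```java
public class GenieSweep {
    static final int STEPS = 31;    // 31 x 31 = 961 runs

    public static void main(String[] args) {
        for (int i = 0; i < STEPS; i++) {
            for (int j = 0; j < STEPS; j++) {
                // Map grid indices onto the two external parameters.
                // The [0, 1] ranges here are purely illustrative.
                double dfwx = (double) i / (STEPS - 1);
                double dfwy = (double) j / (STEPS - 1);
                submit(dfwx, dfwy);
            }
        }
        System.out.println((STEPS * STEPS) + " jobs generated");
    }

    // Placeholder submission hook: in the real experiment each pair
    // became one job dispatched to the Grid (see the next slide).
    static void submit(double dfwx, double dfwy) {
        System.out.println("genie-run --dfwx " + dfwx + " --dfwy " + dfwy);
    }
}
```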
The Solution: Delivering Grid Computing Resources
• Use flocked Condor pools between SReSC, DoC at Imperial College London, and LeSC (∼200 Linux and Solaris nodes).
  – Time taken for 961 Condor runs ≈ 3 days!!!
• Advantages of Condor:
  – The 961 simulations are independent, so the sweep is naturally parallel.
  – Automatic checkpointing and job migration.
  – Condor file transfer mechanism.
• Problems:
  – Firewalls! Overcome by designating and utilising port ranges agreed between the Condor and firewall administrators.

Scheduling Jobs over Resources
[Diagram: the 961-job sweep enters the scheduler, which splits it across a Condor launcher (feeding the Condor cluster) and an SGE launcher – 331 jobs to one and 630 to the other.]

The Results: Scientific Achievements
[Figures: (1) intensity of the thermohaline circulation as a function of the freshwater flux between the Atlantic and Pacific oceans (DFWX), and between the mid-Atlantic and North Atlantic (DFWY); (2) surface air temperature difference between the extreme states (off − on) of the thermohaline circulation – the North Atlantic is 2 °C colder when the circulation is off.]

Development Infrastructure
• Project website & mailing lists
• Daily build:
  – Regression tests
  – On success, binaries updated
  – Regenerated Javadoc
  – Deployment tests
• CVS:
  – Code split across multiple repositories & modules
• Documentation, manuals & user guides
• ICENI open source licence (extended SISSL)

Conclusions
• Mid-point of the e-Science programme:
  – Emergence of usable Grid middleware
  – Demonstrable use by applied scientists
• CCPs have a second-mover advantage:
  – Experience from applied & computer scientists
• The way ahead…
  – Identify some ‘easy’ demonstrators
  – Use cases to inform the community

Acknowledgements
• Director: Professor John Darlington
• Technical Director: Dr Steven Newhouse
• Research staff:
  – Anthony Mayer, Nathalie Furmento, Stephen McGough
  – William Lee, Marko Krznaric, Murtaza Gulamali
  – Asif Saleem, Laurie Young, Gary Kong, Jeffrey Hau
  – Angela O’Brien, Jeremy Cohen, Ali Afzal
• Support staff:
  – Systems: Keith Sephton, David McBride
  – Operations: Susan Brookes, Oliver Jevons
• Contacts:
  – E-mail: lesc@ic.ac.uk
  – Web: http://www.lesc.imperial.ac.uk
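Backup: Condor submission shape for the sweep
Condor submissions for a sweep like GENIE's have a well-known general form. The sketch below writes a minimal submit description and hands it to condor_submit; the keywords (universe, executable, arguments, output, error, log, queue) and the $(Process) macro are standard Condor, while the executable name, file names, and the choice of the standard universe (which is what supplies the checkpointing and migration noted above) are illustrative assumptions rather than the project's actual script.

```java
import java.io.FileWriter;
import java.io.IOException;

public class CondorSweepSubmit {
    public static void main(String[] args)
            throws IOException, InterruptedException {
        // Minimal submit description: one cluster of 961 jobs, indexed by
        // the $(Process) macro (0..960), which each job maps onto (i, j).
        String submit =
              "universe   = standard\n"   // standard universe: checkpointing
            + "executable = genie-run\n"  // hypothetical sweep binary
            + "arguments  = $(Process)\n"
            + "output     = out.$(Process)\n"
            + "error      = err.$(Process)\n"
            + "log        = sweep.log\n"
            + "queue 961\n";

        FileWriter writer = new FileWriter("sweep.sub");
        writer.write(submit);
        writer.close();

        // Hand the description to the standard Condor CLI.
        Process p = new ProcessBuilder("condor_submit", "sweep.sub")
                .inheritIO()
                .start();
        System.exit(p.waitFor());
    }
}
```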