CampusGrid Workshop
Oxford University e-Science Centre
Paul Jeffreys and Peter Clarke
NeSC, 16/17 June 2005
paul.jeffreys@oucs.ox.ac.uk

Thank you
• To the large group who helped define the agenda – it might seem straightforward!
• To Peter for co-leading
• To NeSC for hosting, publicising and coordinating
• To the Core Programme for dinner
• To Peter Couvares and the Condor Team for their interest and participation
• To all the speakers, recruited at relatively short notice
• To you for attending – and for the contributions you will make
Looking forward to the next two days…

Framework & SR2004
• Funding – SR2004, SR2006 – fEC
• Extract from "Science & innovation investment framework 2004–2014":
  – 2.25 The Government will therefore work with interested funders and stakeholders to consider the national e-infrastructure (hardware, networks, communications technology) necessary to deliver an effective system.
  – Due to the potential importance of a national e-infrastructure to the needs of the research base and its supporting infrastructure in meeting the Government's broader science and innovation goals, as a first step OST will take a lead in taking forward discussion and development of proposals for action and funding, drawing in other funders and stakeholders as necessary.
• OST e-Infrastructure Group formed for SR2006

SR2004 and the Core Programme: Towards a Persistent e-Infrastructure
Tony Hey

e-Infrastructure Components
• Activity: National Grid Service and Operational Support
  – Deployed the National Grid Service (NGS) and established the Grid Operations Support Centre (GOSC)
  – Funding for this activity needs to extend beyond the present two years
  – The Core Programme should fund hardware renewal; JISC should fund support services
  – Scale of funding required: £8m

Personal View
• CampusGrids are where e-Science meets the University
• An essential component of University Computing Services…
  – as is e-Infrastructure in general
• We are reaching the point where we will be asked why we are not running a CampusGrid
• But precisely what a CampusGrid actually is, and the best way to provide the service, still needs:
  – refinement of ideas
  – technology development
  – cost analyses
  – security clarification
  – development of best practice
Campus Grid Workshop at NeSC
• Advertised on the NeSC web site: http://www.nesc.ac.uk/esi/events/556/
  "Meeting to explore the use of CampusGrids and to inform sites how to set up such CampusGrids for the first time. The aims of the meeting are: to find out which centres are working in this area, the technologies that are being used, what might be done to share best practice, to look for commonality where appropriate, to help new sites create their own CampusGrids, and to consider how the centres may be interconnected to each other and to the UK National Grid Service. The target audience will be mainly computing service departments and e-Scientists directly involved in the development and operation of CampusGrids."

Thursday objectives
• Review existing CampusGrid activities, determine best practice, understand security issues, explore full economic costing, outline the challenges in setting up a CampusGrid, discuss how to connect with the National Grid Service, and review general methods for the federation of resources.

Thursday 16 June
10.30  Coffee
11.00  Introduction, aims, deliverables – Paul Jeffreys
11.15  Overview of Condor and Globus – John Wakelin
12.00  Review of how to deal with firewalls within a CampusGrid environment – Bruce Beckles
12.30  Deploying Grids on Campus Networks – Andrew Cormack
13.00  Lunch
14.00  Local security issues – Bruce Beckles
14.30  Full Economic Costing on CampusGrids – Rhys Newman
15.00  Federation of resources and building more complex Grids – Steve Newhouse
15.45  Tea
16.00  How to connect CampusGrids to the NGS – Neil Geddes
16.30  Panel discussion, overview of CampusGrid activities across the UK, Q&A – facilitated by Peter Clarke
19.00  Dinner, funded by the Core Programme
Set of questions for discussion over dinner:
• What is the broad consensus on the definition of a CampusGrid?
• How different is a CampusGrid from a Condor pool?
• What are the next steps to realise the vision of full CampusGrids?

Friday objectives
• Review issues relating to sharing data within a CampusGrid environment, find consensus on what defines a CampusGrid (as preparation for the BOF at the All Hands Meeting), and review exciting new technologies which have potential for a second generation of CampusGrids.

Friday 17 June
09.00  Review of dealing with data in a CampusGrid environment – Kerstin Kleese
09.30  Condor and Stork – Peter Couvares
10.00  Review of the 'dinner questions' as input into the BOF – John Kewley / Bruce Beckles
10.30  Coffee
11.00  Overview of new technologies for building CampusGrids – Rhys Newman
11.30  Xen – Andrew Warfield
12.00  Inferno – Jon Blower
12.30  GridMP at ISIS – Tom Griffin
13.00  Lunch
14.00  Dynamic Service Deployment – Paul Watson
14.30  Virtualisation at Oxford – Rhys Newman
15.00  Tea
15.30  Conclusions, deliverables from the workshop, final discussion – Paul Jeffreys
16.00  End of workshop

Condor/Globus/Bristol
Condor:
• High-throughput computing system
• ClassAds – core to its operation
• Condor-G allows use of Condor commands in a Globus environment
• Can be used to perform matchmaking -> resource broker (see the sketch below)
Globus:
• Offers applications, protocols and APIs, plus GSI (certificates)
SRB:
• Interface to heterogeneous data storage resources
Bristol:
• Centralised access to disparate resources (including wider than the campus!)
• Dedicated departmental clusters (Windows Condor pools not requested – no libraries)
• Clusters sit within a single firewall
• Virtual Data Toolkit – bundles the tools
• Virtual Organisation Manager
• SGE and PBS used as schedulers on the clusters
• The resource broker has access to Bristol and the NGS
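To illustrate the matchmaking idea mentioned above, here is a minimal sketch in Python. It is not Condor's ClassAd language or API; the attribute names and values are hypothetical, chosen only to show how jobs advertising requirements can be paired with machines advertising their properties.

    # Illustrative sketch of ClassAd-style matchmaking (not Condor's actual API).
    # Jobs advertise requirements as predicates over machine attributes;
    # the matchmaker pairs each job with the first machine that satisfies them.

    job_ads = [
        {"Owner": "alice",
         "Requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 512},
        {"Owner": "bob",
         "Requirements": lambda m: m["OpSys"] == "WINDOWS" and m["Memory"] >= 1024},
    ]

    machine_ads = [
        {"Name": "node01.example.ac.uk", "OpSys": "LINUX", "Memory": 2048},
        {"Name": "pc17.example.ac.uk", "OpSys": "WINDOWS", "Memory": 512},
    ]

    def matchmake(jobs, machines):
        """Pair each job with the first free machine meeting its requirements."""
        free, matches = list(machines), []
        for job in jobs:
            for machine in free:
                if job["Requirements"](machine):
                    matches.append((job["Owner"], machine["Name"]))
                    free.remove(machine)
                    break
        return matches

    for owner, node in matchmake(job_ads, machine_ads):
        print(owner, "->", node)

In real Condor the requirements are ClassAd expressions evaluated against both the job and machine ads, with Rank expressing preference among matches; the sketch only captures the broker's pairing role.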
Firewalls and CampusGrids
Firewalls:
• Used to protect machines, prevent unwanted traffic, monitor traffic and control flow
• A threat-asset model is required
Typical firewall deployments:
• An institutional firewall around the perimeter of the institution, possibly with a DMZ
• There may be divisional firewalls in addition – a challenge for a CampusGrid…
Recommendations for CampusGrids:
• Provide external access, if any, via a small number of gateway machines
• To cross divisional boundaries: tunnel, use VLANs, or centralise job submission

Deploying Grids on Campus Networks
The campus network world:
• Service oriented
• On average, 28 minutes to be hacked
Security document – the UKERNA Grid Development Technical Guide:
• General principles; tools, techniques and resources
• Specific package issues: Globus, Condor
Making CampusGrids easier:
• Standard system image(s) – automatic updates, patches etc.
• Recognisable (private?) network addresses
  – for ease of configuration, not host security
• The NGS should have its own subnet
• Develop an incident detection plan
Advice requested on further audiences for hard copies of the "Deploying Grids" document

Local security issues in CampusGrid environments
A very helpful security checklist:
• Automate deployment, monitoring and security – it is the only way
• Use your existing authentication infrastructure
  – Avoid digital certificates and GSI, or else make them transparent to users (this was contentious – can you have one authentication scheme for the CampusGrid and another for external access?)
• Authorisation:
  – must scale well
  – decide between central control and delegated control
• Auditing is vital to support your security policy
  – Use the system's auditing facilities wherever possible
  – Intrusion Detection Systems can produce too many false positives
• Have as few access points as possible
• Centrally manage workstations; keep all software on workstations up to date
• Control the local environment as much as you can

fEC on CampusGrids
• Propose the GHz-hour (GHzHr) as the basic measure of cost
• Absolute rock-bottom cost: 0.12p per GHzHr
• A commercial provider (http://www.host-it.co.uk): 1.01p per GHzHr
• Best estimate of cost from the spreadsheet: 0.5p per GHzHr
  – includes all costs, sysadmin etc.
• The cost of a CampusGrid is about 30% of buying the equivalent machines
• Spreadsheet: http://e-science.ox.ac.uk/technical/documents/CampusGRIDCosting.xls
• (A worked example of the GHzHr measure follows below.)
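As a worked illustration of the GHz-hour measure, the short Python sketch below prices a hypothetical job at the rates quoted above (0.5p best estimate, 1.01p commercial). Only the pence-per-GHzHr figures come from the talk; the job size is invented for illustration.

    # Cost of a hypothetical job in GHz-hours at the rates quoted above.
    # Only the pence-per-GHzHr rates are from the talk; the job is made up.

    RATE_CAMPUSGRID = 0.5    # pence per GHzHr, best-estimate CampusGrid cost
    RATE_COMMERCIAL = 1.01   # pence per GHzHr, commercial provider

    def cost_pounds(cpus, ghz, hours, pence_per_ghzhr):
        """Cost in pounds of running `cpus` processors of `ghz` GHz for `hours` hours."""
        ghz_hours = cpus * ghz * hours
        return ghz_hours * pence_per_ghzhr / 100.0

    cpus, ghz, hours = 50, 2.4, 100   # 12,000 GHz-hours in total
    print("CampusGrid: £%.2f" % cost_pounds(cpus, ghz, hours, RATE_CAMPUSGRID))   # £60.00
    print("Commercial: £%.2f" % cost_pounds(cpus, ghz, hours, RATE_COMMERCIAL))   # £121.20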
Federation of resources – more complex Grids
Grid middleware: gLite, OMII, GT4, United Devices, GridSystems, CROWN, …
Requirements:
• research vs supporting research
• bleeding edge vs production
• commercial vs open source
ETF:
• Looking at an inter-institutional Condor pool
• UDDI registries to publish services
• GridSystems
• Need to wait for the ETF to reach its conclusions

Connecting CampusGrids to NGS
To join the NGS: ask, get assigned a buddy, install the minimum software stack, pass the monitoring test, agree to the security policy and AUP, define the service level, and be approved by the GOSC Board.
Why NGS?
• Common tools, procedures and interfaces
• Early-adopter system for UK research grids
Why join?
• Users – resources as services, common interfaces (incl. Hector)
• Leverage national expertise
Future:
• Common interface to a range of national facilities
• October 2006 – technology refresh for the core nodes
• A common interface to the NGS and CampusGrids is not yet fully in place

Panel: CampusGrids across the UK
• CamGrid Env 2 (Mark C)
  – 125 nodes (desktops and farms); each has a CUDN-only routable address
  – Authentication is not strong; handled at the submit machine
  – Vanilla and standard universe jobs; also a cluster for MPI universe jobs
• Reading (Jon Blower)
  – Condor under Linux and Windows (50–100 processors); vanilla and MPI
  – Utilising dual-boot workstations and 'diskless' clients (also coLinux)
  – Investigating the Inferno Grid
• Newcastle (Mark Hewitt)
  – Condor-based as a first step, 200 nodes, mainly Linux; running for 4 months
  – Focus on student clusters with 1,500 machines, then expand to 10,000 nodes
  – Aim to add databases, filestores and workflow enactors
• Liverpool (Cliff Addison)
  – Prototype running for 18 months, 900 processors; Condor and Condor-G

…How to connect CampusGrids to NGS
• UW-Madison Campus Grid
  – Computer Science department: 1,400 opportunistic (Windows) CPUs; GLOW: 1,500 dedicated (Linux) CPUs
• Cardiff
  – 900 processors, few users, greater aspirations
• How many system support / application support staff?
  – Cambridge: 1, plus local support from departments
  – Liverpool: 2 in total
  – Newcastle: 2 × 0.25 FTE plus a little extra support (3 for CCE)
  – Cardiff: 0.1 FTE
  – (There was more detail about staff effort from UCL?)
• 75% of HPC resources run HTC

Data in a CampusGrid Environment
• eMinerals given as an example: the scientist has a common interface across resources in six institutions through the minigrid
• mySRB; suitable metadata is needed -> the CCLRC Scientific Metadata Model (for general discovery):
  – keywords / provenance / conditions / description / location (SRB) / references
  – a metadata editor as well
• Data Portal
  – enables the scientist to search for studies and data from within the metadata
• (A sketch of such a discovery-level record follows below.)
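To make the discovery-level metadata concrete, here is a minimal sketch of a record carrying the fields listed above. It is plain Python; the field names follow the slide, but the structure and example values are hypothetical and are not the actual CCLRC schema.

    # Illustrative discovery-level metadata record using the fields listed above.
    # The layout and example values are hypothetical; the real CCLRC Scientific
    # Metadata Model defines its own schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class StudyMetadata:
        keywords: List[str]                   # discovery keywords
        provenance: str                       # who/what produced the data
        conditions: str                       # experimental or simulation conditions
        description: str                      # free-text description
        location: str                         # where the data lives, e.g. an SRB path
        references: List[str] = field(default_factory=list)   # related publications

    record = StudyMetadata(
        keywords=["molecular simulation", "minerals"],
        provenance="eMinerals minigrid run, June 2005 (example)",
        conditions="300 K, 1 atm (example values)",
        description="Example record for a simulation study",
        location="srb://srb.example.ac.uk/home/eminerals/study042",
        references=["doi:10.0000/example"],
    )

    def matches(rec, term):
        """Toy discovery query: does the record mention the search term?"""
        text = " ".join(rec.keywords + [rec.description])
        return term.lower() in text.lower()

    print(matches(record, "minerals"))   # True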
STORK and NeST
Stork:
• What Condor is for computational jobs, Stork is for data placement
• A scheduler for data placement activities on the Grid (the equivalent of a batch system)
• Accesses heterogeneous resources (SRB, GridFTP servers, …)
• Recovers from failures
• DAGMan is required to run Condor and Stork together
NeST:
• A lightweight, portable storage manager for data placement activities on the Grid
• Not a competitor to SRB!
• There are best-effort lots and guaranteed lots

Conclusions: CampusGrid / Compute Pool
CampusGrid:
• Heterogeneous non-dedicated (and dedicated) resources
• Cycle stealing of heterogeneous resources which already exist, using existing networks
• Enables resource sharing and, over time, cost saving
• A method for federating compute and data resources across a campus, abstracting the interface for users
Compute Pool:
• A dedicated resource with a specific purpose
A CampusGrid needs buy-in from Computing Services.
What next?
• CampusGrids must become easier for users to use
• CampusGrids must carry a smaller overhead to run securely (for admins and, especially, users)

Questions
• Are CampusGrids cycle stealing?
• Are turn-key solutions possible – or desirable?
• How does a CampusGrid support a VLE/VRE?

Conclusions
• A Compute Pool is a resource, while a CampusGrid is a (centrally) managed service
• Data storage, access and retrieval are important (not just cycle stealing)
• User interfaces must be simpler
• The cycles a CampusGrid provides are not free; a policy for funding is required
• Computing services should be first-class stakeholders in the definition and operation of the CampusGrid
• CampusGrids need to be set up so as to avoid impinging on the resources' original intended use

Overview of new technologies
• Virtualisation is the key to offering a CampusGrid which can be shared, is simple to administer, gives users absolute control of their environment, lets owners retain absolute control of their machines, and uses non-dedicated resources
• A virtual machine completely isolates the Grid process from the underlying OS
• VMware performance – about 97% of native performance
  – not marketed as a Grid technology
• Intel VT-x ("Vanderpool") – hardware support for virtualisation – to be released in 2006 (a quick way to check for this support is sketched after the Inferno slide below)
• Comment: the single-node problem is sorted, but many-node issues (DoS, nodes which die, etc.) still need to be addressed

Xen
• Hardware virtualisation treats the OS as a component
• Xen is a virtual machine monitor; an isolation kernel
• Recently included in mainline Linux
• Offers: consolidation, avoidance of downtime, workload re-balancing, enforced security
• Paravirtualisation – optimises the OS to run in a virtualised environment
• Not ideal for low-latency MPI-style communications

Inferno
• Inferno Grid – free to academia; Condor-like
• The Inferno Grid is based on the Inferno OS
  – Inferno OS – built for distributed computing, very lightweight
  – runs as an emulated application, identically, on multiple platforms
Pros:
• Easy to install and maintain; industrial quality
• Only one port needs to be open
• Ease of streaming; transparency of "data and compute" as data transfer
Cons:
• Hard to get into; poor documentation
• Does not have all of Condor's features: migration, MPI universe
• No Globus integration yet
• A question over scalability at 1,000 nodes or more
• Not really used in anger yet; proper users not yet let loose!
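A small practical footnote to the virtualisation slides above: on a Linux host, the hardware virtualisation support mentioned (Intel VT-x, or AMD's equivalent) shows up as CPU flags in /proc/cpuinfo. The sketch below checks for them; the flag names vmx and svm are the standard ones, everything else is illustrative.

    # Minimal sketch: check /proc/cpuinfo for hardware virtualisation flags.
    # "vmx" indicates Intel VT-x, "svm" indicates AMD-V; assumes a Linux host.

    def hw_virt_support(cpuinfo_path="/proc/cpuinfo"):
        with open(cpuinfo_path) as f:
            for line in f:
                if line.startswith("flags"):
                    flags = set(line.split(":", 1)[1].split())
                    if "vmx" in flags:
                        return "Intel VT-x"
                    if "svm" in flags:
                        return "AMD-V"
        return None

    print(hw_virt_support() or "no hardware virtualisation flags found")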
GridMP at ISIS
• Uses the United Devices commercial software
• Run by local staff, as opposed to computing services
• Very quick and easy to set up
• 80 licences cost £15k; applied for an additional 400 licences (£60k)
• Some changes have to be made to the applications (wrap them; the executable itself is not changed)
• The investment was worthwhile
• Mercifully few problems encountered

Dynasoar
• It is possible to build grid applications entirely from services
• Dynasoar can be built as a high-level infrastructure
  – domain specialists can build applications on top of a deployment environment
• Web service provider:
  – has services available to be dynamically deployed, if not already available
  – the cost of service deployment is shared across multiple requests
• Can have: web service providers <-> marketplace <-> host providers
• Can move the computation to the data – sometimes much more efficient
• Virtual machines -> the right approach (as presented earlier)
  – exploring virtual machines as a general service deployment mechanism
• Exploring a tripartite security model (different levels of trust)

OxGrid
• OxGrid – central management of virtualised resources around Oxford University
• All the key technologies and equipment are available in principle
• OxGrid Client:
  – assesses the impact of the Grid on donors
  – looks at CPU usage and storage
• A VM can be frozen on command, providing OS-level checkpointing
• With virtualisation, computing becomes a liquid commodity -> trading

Workshop Conclusions
• Clearer views on CampusGrid vs Compute Cluster – the way ahead
  – contribution to the BOF
• A collection of developments regarding new technologies
  – virtualisation agreed to be an important direction
  – roadmap…
Thank you to all speakers and all participants. See you at the AHM!