CampusGrid Workshop Paul Jeffreys and Peter Clarke NeSC

Oxford University e-Science Centre
CampusGrid Workshop
Paul Jeffreys and Peter Clarke
16/17 June 2005, NeSC
paul.jeffreys@oucs.ox.ac.uk

Thank you
• To the large group who helped define the agenda
• Might seem straightforward!
• To Peter for co-leading
• To NeSC for hosting, publicising and coordinating
• To the Core Programme for dinner
• To Peter Couvares and the Condor Team for their interest and participation
• To all the speakers – at relatively short notice
• To you for attending – and the contributions you will make
Looking forward to the next two days…

Framework & SR2004
• Funding
  – SR2004, SR2006
  – fEC
• Extract from Science & innovation investment framework 2004-2014
  – 2.25 The Government will therefore work with interested funders and stakeholders to consider the national e-infrastructure (hardware, networks, communications technology) necessary to deliver an effective system.
  – Due to the potential importance of a national e-infrastructure to the needs of the research base and its supporting infrastructure in meeting the Government's broader science and innovation goals, as a first step OST will take a lead in taking forward discussion and development of proposals for action and funding, drawing in other funders and stakeholders as necessary.
• OST e-Infrastructure Group formed for SR2006

SR2004 and the Core Programme: Towards a Persistent e-Infrastructure
Tony Hey

e-Infrastructure Components
• Activity: National Grid Service and Operational Support
  – Deployed the National Grid Service (NGS) and established the Grid Operations Support Centre (GOSC)
  – Need to extend funding for this activity beyond the present 2 years
  – CP should provide funding for hardware renewal; JISC should fund support services
  – Scale of funding required: £8m

Personal View
• CampusGrids are where e-Science meets the University
• An essential component of University Computing Services…
  – as is e-Infrastructure in general
• We are reaching the point where we will be asked why we are not running a CampusGrid
• But precisely what a CampusGrid actually is, and the best way to provide the service, still needs:
  – refinement of ideas
  – technology development
  – cost analyses
  – security clarification
  – development of best practice

Campus Grid Workshop at NeSC
• Advertised on the NeSC web site: http://www.nesc.ac.uk/esi/events/556/
  "Meeting to explore the use of CampusGrids and to inform sites how to set up such CampusGrids for the first time.
  The aims of the meeting are: to find out which centres are working in this area, the technologies that are being used, what might be done to share best practice, to look for commonality where appropriate, to help new sites create their own CampusGrids, and to consider how the centres may be interconnected to each other and to the UK National Grid Service.
  The target audience will be mainly computing service departments and e-Scientists directly involved in the development and operation of CampusGrids."

Thursday objectives
- Gain an overview of existing CampusGrid activities, determine best practice, understand security issues, explore full economic costing, outline the challenges in setting up a CampusGrid, discuss how to connect with the National Grid Service, and review general methods for the federation of resources.

Thursday 16 June
10.30  Coffee
11.00  Introduction, aims, deliverables (Paul Jeffreys)
11.15  Overview of Condor and Globus (John Wakelin)
12.00  Review of how to deal with Firewalls within a CampusGrid environment (Bruce Beckles)
12.30  Deploying Grids on Campus Networks (Andrew Cormack)
13.00  Lunch
14.00  Local Security issues (Bruce Beckles)
14.30  Full Economic Costing on CampusGrids (Rhys Newman)
15.00  Federation of Resources and building more Complex Grids (Steve Newhouse)
15.45  Tea
16.00  How to Connect CampusGrids to NGS (Neil Geddes)
16.30  Panel Discussion, overview of CampusGrid activities across the UK, Q&A (facilitated by Peter Clarke)
19.00  Dinner funded by the Core Programme

Set of questions for discussion over dinner:
- What is the broad consensus on the definition of a CampusGrid?
- How different is a CampusGrid from a Condor Pool?
- What are the next steps to realise the vision of full CampusGrids?

Friday objectives
- Review issues relating to sharing data within a CampusGrid environment, find consensus on what defines a CampusGrid (as preparation for the BOF at the All Hands Meeting), and review exciting new technologies which have potential for a second generation of CampusGrids.

Friday 17 June
09.00  Review of dealing with data in a CampusGrid environment (Kerstin Kleese)
09.30  Condor and Stork (Peter Couvares)
10.00  Review of 'Dinner questions' as input into the BOF (John Kewley/Bruce Beckles)
10.30  Coffee
11.00  Overview of New technologies for Building CampusGrids (Rhys Newman)
11.30  Xen (Andrew Warfield)
12.00  Inferno (Jon Blower)
12.30  GridMP at ISIS (Tom Griffin)
13.00  Lunch
14.00  Dynamic Service Deployment (Paul Watson)
14.30  Virtualisation at Oxford (Rhys Newman)
15.00  Tea
15.30  Conclusions, deliverables from the workshop, final discussion (Paul Jeffreys)
16.00  End of workshop

Condor/Globus/Bristol
Condor:
  A high-throughput computing system
  ClassAds are core to its operation
  Condor-G allows the use of Condor commands in a Globus environment
  Can be used to perform matchmaking -> a resource broker
Globus: offers applications, protocols, APIs and GSI (certificates)
SRB: an interface to heterogeneous data storage resources
Bristol:
  Centralised access to disparate resources (including wider than the Campus!)
  Dedicated departmental clusters (Windows Condor pools not requested – no libraries)
  Clusters sit within a single firewall
  Virtual Data Toolkit – bundles the tools
  Virtual Organisation Manager
  SGE and PBS are used as schedulers on the clusters
  The Resource Broker has access to Bristol and the NGS

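ClassAd matchmaking pairs a job's requirements with the adverts published by machines, then ranks the acceptable matches; that is what turns Condor-G into a resource broker. As a rough conceptual sketch only (plain Python, not Condor's actual ClassAd language or API; all names and numbers are invented):

```python
# Toy illustration of ClassAd-style matchmaking (not Condor's real engine).
# Each "ad" is a plain dict; requirements and rank are small callables.

machines = [
    {"Name": "desktop01",  "OpSys": "LINUX",   "Memory": 2048, "KFlops": 950_000},
    {"Name": "desktop02",  "OpSys": "WINDOWS", "Memory": 1024, "KFlops": 800_000},
    {"Name": "cluster-n1", "OpSys": "LINUX",   "Memory": 8192, "KFlops": 2_400_000},
]

job = {
    # Requirements: which machines are acceptable at all.
    "requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 1024,
    # Rank: preference among acceptable machines (higher is better).
    "rank": lambda m: m["KFlops"],
}

def matchmake(job, machines):
    """Return acceptable machines, best-ranked first: a resource broker in miniature."""
    acceptable = [m for m in machines if job["requirements"](m)]
    return sorted(acceptable, key=job["rank"], reverse=True)

if __name__ == "__main__":
    for m in matchmake(job, machines):
        print(m["Name"])   # cluster-n1 first, then desktop01; desktop02 is excluded
```
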
Firewalls and CampusGrids
Firewalls:
  Used to protect machines, prevent unwanted traffic, monitor traffic and control flow
  A threat-asset model is required
Typical firewall deployments:
  An institutional firewall around the perimeter of the institution; may have a DMZ
  There may be divisional firewalls in addition – a challenge for a CampusGrid…
Recommendations for CampusGrids:
  Provide external access, if any, via a small number of gateway machines
  To cross divisional boundaries: tunnel, use VLANs, or centralise job submission

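As a small operational illustration of the "few gateway machines" recommendation, a sketch that checks whether the designated gateway endpoints are reachable through the firewall. The hostnames are hypothetical and the ports are only examples of the kind of grid services a gateway might expose; substitute a real site's configuration.

```python
# Sketch: verify that the designated CampusGrid gateway endpoints are reachable
# through the firewall. Hostnames and ports below are illustrative examples only.
import socket

GATEWAYS = [
    ("grid-gateway1.example.ac.uk", 2811),   # e.g. a data-transfer gateway
    ("grid-gateway2.example.ac.uk", 2119),   # e.g. a job-submission gateway
]

def reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for host, port in GATEWAYS:
    print(f"{host}:{port} -> {'open' if reachable(host, port) else 'blocked or down'}")
```
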
Deploying Grids on Campus Networks
Campus network world: service oriented
  On average, 28 minutes to be hacked
Security document – UKERNA Grid Development Technical Guide:
  General principles; tools, techniques and resources
  Specific package issues: Globus, Condor
Making CampusGrids easier:
  Standard system image(s) – automatic updates, patches etc.
  Recognisable (private?) network addresses
    For ease of configuration – not host security
  The NGS should have its own subnet
  Develop an incident detection plan
Advice sought on further audiences for hardcopies of the "Deploying Grids" document

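To make the "recognisable (private) addresses" and "own subnet" points concrete, a minimal sketch that checks whether worker-node addresses fall inside a designated campus-grid range. The subnet and the addresses are invented for illustration; only the standard library is used.

```python
# Sketch: check that worker nodes sit inside the designated (private) grid subnet.
# The subnet and the addresses below are illustrative only.
import ipaddress

GRID_SUBNET = ipaddress.ip_network("10.142.0.0/16")   # hypothetical campus-grid range

workers = ["10.142.3.17", "10.142.8.201", "163.1.27.5"]  # the last is a public address

for addr in workers:
    ip = ipaddress.ip_address(addr)
    status = "ok" if ip in GRID_SUBNET else "outside grid subnet - check configuration"
    print(f"{addr}: {status} (private={ip.is_private})")
```
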
Local security issues in CampusGrid environments
A very helpful Security Checklist:
• Automate deployment, monitoring and security – the only way
• Use your existing authentication infrastructure
  – Avoid digital certificates and GSI; otherwise make them transparent to users
    (this was contentious – can you have authentication for both the CampusGrid and external access?)
• Authorisation:
  – must scale well
  – decide between central control and delegated control
• Auditing is vital to support your security policy
  – Use the system's auditing facilities wherever possible
  – Intrusion Detection Systems can produce too many false positives
• Have as few access points as possible
• Centrally manage workstations; keep all software on workstations up to date
• Control the local environment as much as you can

fEC on CampusGrids
Propose the GHz-hour (GHzHr) as the basic measure of cost
• Absolute rock-bottom cost: 0.12p per GHzHr
• Commercial provider (http://www.host-it.co.uk): 1.01p per GHzHr
• Best estimate of cost from the spreadsheet: 0.5p per GHzHr
  – includes all costs, sysadmin etc.
• Cost of the CampusGrid as a percentage of buying the machines: 30%
• http://e-science.ox.ac.uk/technical/documents/CampusGRIDCosting.xls

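As a worked example of the GHzHr measure using the rates quoted above (the job size of 100 nodes x 2.8 GHz x 12 hours is invented purely for illustration):

```python
# Worked example of GHz-hour (GHzHr) costing at the rates quoted on this slide.
# The job size (100 nodes x 2.8 GHz x 12 hours) is invented for illustration.

RATE_BEST_ESTIMATE = 0.5    # pence per GHzHr (spreadsheet estimate, all costs included)
RATE_COMMERCIAL    = 1.01   # pence per GHzHr (host-it.co.uk figure quoted above)

nodes, clock_ghz, hours = 100, 2.8, 12.0
ghz_hours = nodes * clock_ghz * hours                  # 3360 GHzHr

cost_campus     = ghz_hours * RATE_BEST_ESTIMATE / 100  # pence -> pounds
cost_commercial = ghz_hours * RATE_COMMERCIAL / 100

print(f"{ghz_hours:.0f} GHzHr: campus grid ~ £{cost_campus:.2f}, commercial ~ £{cost_commercial:.2f}")
# 3360 GHzHr: campus grid ~ £16.80, commercial ~ £33.94
```
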
Federation of resources – more complex Grids
Grid middleware: gLite, OMII, GT4, United Devices, GridSystems, CROWN, …
Requirements: research vs supporting research
  Bleeding edge vs production
  Commercial vs open source
ETF:
  Looking at an inter-institutional Condor pool
  UDDI registries to publish services
  GridSystems
Need to wait for the ETF to reach its conclusions

Connecting CampusGrids to NGS
To join the NGS:
  Ask, get assigned a buddy, install the minimum software stack, pass the monitoring test, agree to the security policy and AUP, define the service level, and gain approval from the GOSC Board
Why NGS?
  Common tools, procedures and interfaces
  An early-adopter system for UK research grids
Why join?
  Users get resources as services, with common interfaces (incl. HECToR)
  Leverage national expertise
Future:
  A common interface to a range of national facilities
  October 2006 – technology refresh for the core nodes
  A common interface to the NGS and CampusGrids is not yet fully in place

Panel: CampusGrids across the UK…
• CamGrid Env 2 (Mark C):
  – 125 nodes (desktops and farms); each has a CUDN-only routable address
  – Authentication is not strong; it is handled at the submit machine
  – Vanilla and standard universe jobs; also a cluster for MPI universe jobs
• Reading (Jon Blower):
  – Condor under Linux and Windows (50-100 processors); vanilla and MPI
  – Utilising dual-boot workstations and 'diskless' clients (also coLinux)
  – Investigating the Inferno Grid
• Newcastle (Mark Hewitt):
  – Condor-based as a first step, 200 nodes – mainly Linux; running for 4 months
  – Focus on student clusters with 1500 machines, then expand to 10,000 nodes
  – Aim to add databases/filestores/workflow enactors
• Liverpool (Cliff Addison):
  – Prototype running for 18 months, 900 processors; Condor and Condor-G

…How to connect CampusGrids to NGS
• UW-Madison Campus Grid
  – Computer Science department: 1400 opportunistic Windows CPUs; GLOW: 1500 dedicated Linux CPUs
• Cardiff
  – 900 processors
  – Few users
  – Greater aspirations
• How many system support staff / application support staff?
  – Cambridge: 1, plus local support from departments
  – Liverpool: 2 in total
  – Newcastle: 2 x 0.25 FTE plus a little support (3 for CCE)
  – Cardiff: 0.1 FTE
  – (There were more details about staff effort from UCL?)
• 75% of HPC resources run HTC

Data in a CampusGrid Environment
eMinerals given as an example:
  The scientist has a common interface across resources in 6 institutions through the minigrid
  mySRB; suitable metadata is needed ->
CCLRC Scientific Metadata Model (for general discovery):
  Keywords / provenance / conditions / description / location (SRB) / references
  A metadata editor as well
Data Portal:
  Enables the scientist to search for studies and data from within the metadata

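As a rough illustration of the kind of discovery record the listed categories imply, a sketch of one entry as it might be exported to a data portal. The field names and values are made up for illustration; this is not the actual CCLRC Scientific Metadata Model schema.

```python
# Illustrative discovery-metadata record loosely following the categories listed
# above (keywords / provenance / conditions / description / location / references).
# Field names and values are invented; this is not the actual CCLRC schema.
import json

record = {
    "keywords": ["molecular simulation", "mineral surfaces"],
    "provenance": {"creator": "A. Scientist", "institution": "Example University",
                   "created": "2005-06-16"},
    "conditions": {"access": "project members only"},
    "description": "Output of a surface-energy calculation run on the campus grid.",
    "location": {"type": "SRB", "path": "/srbVault/eMinerals/run042/output.dat"},
    "references": ["doi:10.0000/example"],
}

print(json.dumps(record, indent=2))   # the record as a portal might ingest it
```
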
Stork and NeST
Stork:
  What Condor is for computational jobs, Stork is for data placement
  A scheduler for data placement activities on the Grid (the equivalent of a batch system)
  Accesses heterogeneous resources (SRB, GridFTP servers, …)
  Recovers from failures
  DAGMan is required to run Condor and Stork jobs together
NeST:
  A lightweight, portable storage manager for data placement activities on the Grid
  Not a competitor for SRB!
  There are best-effort lots and guaranteed lots

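To make the "DAGMan ties Condor and Stork jobs together" point concrete, a conceptual sketch of the kind of workflow DAGMan coordinates: stage data in, compute, stage results out, executed in dependency order. This is plain Python with stand-in functions, not the DAGMan file format or any real submission interface.

```python
# Conceptual sketch of a DAGMan-coordinated workflow: data placement (Stork-style)
# before and after a compute (Condor-style) job. Plain Python stand-ins only.

def stage_in():   print("transfer input data to the execute site")   # data placement job
def compute():    print("run the computational job")                 # compute job
def stage_out():  print("transfer results back to storage")          # data placement job

# Parent -> child dependencies, as a DAG would express them.
dag = {
    "stage_in":  [],
    "compute":   ["stage_in"],
    "stage_out": ["compute"],
}
jobs = {"stage_in": stage_in, "compute": compute, "stage_out": stage_out}

done = set()
while len(done) < len(dag):
    for name, parents in dag.items():
        if name not in done and all(p in done for p in parents):
            jobs[name]()          # run once all parent jobs have completed
            done.add(name)
```
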
Conclusions: CampusGrid / Compute Pool
CampusGrid:
  Heterogeneous non-dedicated (and dedicated) resources
  Cycle stealing of heterogeneous resources which already exist, using existing networks
  Enables resource sharing and (over time) cost saving
  A method for federating compute and data resources across a campus, abstracting the interface for users
Compute Pool:
  A dedicated resource with a specific purpose
A CampusGrid needs buy-in from Computing Services
What next?
  CampusGrids must become easier for users to use
  CampusGrids must carry a smaller overhead to run securely (for admins and, especially, users)

Questions
• Are CampusGrids cycle stealing?
• Are turn-key solutions possible – or desirable?
• How does a CampusGrid support a VLE/VRE?

Conclusions
• A Compute Pool is a resource, while a CampusGrid is a (centrally) managed service
• Data storage, access and retrieval are important (not just cycle stealing)
• User interfaces must be simpler
• The cycles a CampusGrid provides are not free; a policy for funding is required
• Computing services should be first-class stakeholders in the definition and operation of the CampusGrid
• CampusGrids need to be set up so as to avoid impinging on the resources' original intended use

Overview of new technologies
Virtualisation is the key to offering a CampusGrid which can be shared, is simple to administer, gives users absolute control of their environment, lets owners retain absolute control of their machines, and uses non-dedicated resources
A virtual machine completely isolates the Grid process from the underlying OS
VMware performance: 97% of native performance
Not marketed as a Grid technology; a new release is expected in 2006
Intel VT-x (Vanderpool) – hardware support for virtualisation
Comment: the single-node problem is sorted, but the many-node issues are not:
  DoS, nodes which die, etc. still need to be addressed

Xen
Hardware virtualisation treats the OS as a component
Xen is a virtual machine monitor; an isolation kernel
Recently included in mainline Linux
Offers: consolidation / avoiding downtime / re-balancing workload / enforcing security
Paravirtualisation – the OS is optimised to run in a virtualised environment
Not ideal for low-latency MPI-style communications

Inferno
Inferno Grid – free to academia
Condor-like
The Inferno Grid is based on the Inferno OS
Inferno OS – built for distributed computing, very lightweight
Runs as an emulated application identically on multiple platforms
Pros:
  Easy to install and maintain; industry quality
  Only one port needs to be open
  Ease of streaming; transparency of "data and compute" as data transfer
Cons:
  Hard to get into; poor documentation
  Does not have all of Condor's features: migration, MPI universe
  No Globus integration yet
  Question over scalability at 1000 nodes or more
  Not really used in anger yet; proper users not yet let loose!

GridMP at ISIS
Uses United Devices' commercial software
Run by local staff rather than computing services
Very quick and easy to set up
80 licences: £15k
Applied for an additional 400 licences (£60k)
Have to make some changes to the applications (wrap them; no change to the executable)
Investment worthwhile
Mercifully few problems encountered

Dynasoar
It is possible to build grid applications entirely from services
Share the cost of service deployment over multiple requests
Dynasoar can be built as a high-level infrastructure
Domain specialists can build applications against a deployment environment
Web service provider:
  Has services available to be dynamically deployed – if not already available
  Cost of service deployment is shared across multiple requests
Can have: web service providers <-> marketplace <-> host providers
Can move computation to the data – sometimes much more efficient
Virtual machines -> the right approach (as presented earlier)
Exploring virtual machines as a general service deployment mechanism
Exploring a tripartite security model (different levels of trust)

OxGrid
OxGrid – central management of virtualised resources around Oxford University
All the key technologies and equipment are available in principle
OxGrid Client:
  Assesses the impact of the Grid on donor machines
  Looks at CPU usage and storage
A VM can be frozen on command to provide OS-level checkpointing
With virtualisation, computing becomes a liquid commodity -> trading

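A minimal sketch of the kind of measurement the OxGrid client is described as making: watching CPU usage and free storage on a donor machine so grid work can back off when the owner is busy. This is not the actual OxGrid client; it assumes the third-party psutil package, and the threshold is an arbitrary example.

```python
# Sketch of donor-impact sampling as described above: watch CPU usage and free
# storage on a donor machine. Not the actual OxGrid client; assumes psutil.
import time
import psutil

CPU_BUSY_THRESHOLD = 20.0   # percent; above this the owner is assumed busy (example value)

for _ in range(3):                             # a few samples for illustration
    cpu = psutil.cpu_percent(interval=1.0)     # % CPU used over the last second
    disk = psutil.disk_usage("/")              # storage on the root filesystem
    busy = cpu > CPU_BUSY_THRESHOLD
    print(f"cpu={cpu:.1f}% free_disk={disk.free / 2**30:.1f} GiB "
          f"{'back off grid work' if busy else 'ok to run grid work'}")
    time.sleep(1)
```
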
Workshop Conclusions
• Clearer views on CampusGrid vs Compute Cluster – the way ahead
  – A contribution to the BOF
• A collection of developments regarding new technologies
  – Virtualisation agreed to be an important direction
  – Roadmap…
Thank you to all speakers and all participants.
See you at the AHM!