e-Science Centre of Excellence
The Academic Service Partnership
Peter Dew & Joanna Schmidt
plus White Rose Grid Teams
Overview
• The White Rose Grid as an example of an inter-university collaboration providing an early production Grid service
• Computing Services roles
• Delivering the WRG
– Organisational structure
– Technical implementation
– User management
• Lessons learned
• Benefits
• Concluding remarks
About the WRG
• The White Rose Grid (WRG) works under the auspices of the White
Rose University Consortium (WRUC) – an association of the three
research Universities
– Leeds, York & Sheffield
– Employs complementary skill bases to support both larger
projects than any one University could deliver and a
broad research agenda
– WRUC was featured as a model of collaboration and enterprise in
the HEFCE White Paper
• Why Grid?
– Enhances the competitive position of the three Universities to
attract funding
– Enables inter-enterprise optimisation of computing resources and
more effective service delivery to researchers
WRG aims
• to strengthen e-Science research (using experience
gained from e-Science projects such as DAME,
HYDRA, or gViz)
– initial focus
• decision support (engineering, health, social science)
• scientific visualisation
• to support and enlarge new scientific communities
including bio-technology, aerospace, tissue
engineering and healthcare
• to assess and grow, in collaboration with YF,
regional demand for Grid technology
DAME
Grid Services
Commitment
• Senior staff from the three Universities (White
Rose Grid Executive: Chief Executive of the White Rose University
Consortium, M Doxey; P Dew and K Brodlie, Leeds;
J Austin, York; P Fleming, Sheffield)
• Senior Computing Services staff (C Cartledge,
Sheffield; S Chidlow, Leeds) & computing staff from the
Computer Science Department (A Turner, York)
• White Rose Grid staff
• WRG Project Teams (Computing Services staff &
Computer Science staff)
• IT Vendors – Esteem involving Sun & Streamline
Computing Services roles
• To provide a stable, well-managed and responsive HPC
service
• To promote effective use of HPC facilities across the
three Universities through a variety of activities, including
training, joint seminars and user group meetings
• To offer user support & training in basic HPC
techniques (e.g. parallel programming), Globus and
e-Science applications
• To support an early production Grid service under the
leadership of WRG technical staff
WRG staff responsibilities
• Technical developments (Leeds)
• Grid training (Sheffield)
• Coordination of joint activities (Leeds)
• Liaison with e-Science communities within the
WRG, UK e-Science and others (Leeds)
• Business outreach (York)
Setting up the WRG
• Four HPC nodes (nearly 500 CPUs in total) purchased
specifically for the WRG, with an investment of over £3M
• A heterogeneous facility comprising 3 clusters of Sun
shared-memory systems and 2 Intel processor-based
Beowulf clusters
• To offer both:
– local HPC services (75% of resources)
– the Grid infrastructure (25% of resources)
• Each node specialises in the provision of a
distinct service
[Figure: WRG resource allocation pie chart, 75% local HPC services / 25% Grid]
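The 75%/25% split can be made concrete with a little arithmetic. The sketch below is a hypothetical illustration only: it assumes a round total of 500 CPUs (the slide says "nearly 500") and applies the split to whole CPUs, whereas in practice the allocation may be made per node or in CPU time.

```python
# Hypothetical illustration of the WRG resource split.
# Assumption: a round total of 500 CPUs ("nearly 500" on the slide),
# divided by CPU count rather than by CPU time.
TOTAL_CPUS = 500
GRID_SHARE = 0.25  # 25% of resources for the Grid infrastructure


def split_allocation(total: int, grid_share: float = GRID_SHARE) -> tuple:
    """Return (local_cpus, grid_cpus), rounding the Grid share to whole CPUs."""
    grid = round(total * grid_share)
    return total - grid, grid


local, grid = split_allocation(TOTAL_CPUS)
print(f"local HPC services: {local} CPUs, Grid: {grid} CPUs")
# prints "local HPC services: 375 CPUs, Grid: 125 CPUs"
```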
The WRG architecture
[Figure: WRG architecture diagram showing four nodes: General Purpose HPC node, CFD node, Engineering Application node, Computer Science node]
WRG software & hardware stack
• Layers, from top to bottom:
– WRG Portal: GPDK Portal interface, Tomcat/Apache
– Global Grid Infrastructure: MyProxy, Globus Toolkit 2.4
– Campus Grid Infrastructure: Grid Engine Enterprise Edition,
Sun™ HPC Cluster Tools, Sun ONE Studio
– Solaris™ and Linux Operating Environments
– Sun Enterprise™ and Sun Fire™ Servers,
Sun StorEdge™ Systems, Intel thin servers
• Software stack composed
largely of open source
software
Delivering our Grid
• Procedures & resources:
– Strong organisational structure
– Computing infrastructure: computer systems; storage
(currently being expanded with a Storage Area Network, SAN);
networking infrastructure (YHMAN reprocurement under way,
with implementation due in November 2004); and software
(traditional HPC tools, Grid software such as the Globus
Toolkit, and portals)
– Mixture of experienced support staff and research
staff working in teams
WRG project teams
[Figure: WRG project teams supporting WRG users]
• Architecture Team: Globus, MyProxy, portals
• Authentication, Authorisation & Accounting Team: user management,
usage accounting, X.509 digital certificates
• Technical Team: stable service
• Training Team: HPC techniques, Grid access & applications
• Business Outreach Team: working with regional companies
& Universities
Access to the WRG
• Users were baffled by Globus (due to its novelty and
a lack of accessible documentation)
• Preferred way of access:
– using Grid portals developed by the WRG
– running Grid-enabled applications, e.g. the gViz
project has developed Grid-enabled IRIS
Explorer modules
• Portals need to be developed
The DAME XTO portal
Enables aeronautical
engineers to identify
abnormal behaviour in
aircraft engines by
performing DSP
analyses of vibration
data from onboard
sensors.
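To give a flavour of what a DSP analysis of vibration data involves, the sketch below estimates the dominant frequency of a sampled signal with a naive discrete Fourier transform and flags a trace whose dominant tone drifts from an expected value. This is a deliberately simple, hypothetical stand-in, not the analysis performed by the DAME XTO portal; the function names, the 256 Hz sample rate, the synthetic signals and the tolerance rule are all assumptions.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return (frequency_hz, amplitude) of the strongest non-DC bin via a naive DFT."""
    n = len(samples)
    best_k, best_mag = 1, 0.0
    for k in range(1, n // 2):  # skip DC (k = 0); consider bins below Nyquist
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        mag = 2 * abs(acc) / n  # single-sided amplitude estimate
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * sample_rate / n, best_mag


def is_abnormal(samples, sample_rate, expected_hz, tolerance_hz):
    """Hypothetical rule: flag a trace whose dominant tone is far from the expected frequency."""
    freq, _ = dominant_frequency(samples, sample_rate)
    return abs(freq - expected_hz) > tolerance_hz


rate = 256  # samples per second (assumed); one second of data gives 1 Hz bins
normal = [math.sin(2 * math.pi * 50 * t / rate) for t in range(rate)]
faulty = [math.sin(2 * math.pi * 80 * t / rate) for t in range(rate)]
print(is_abnormal(normal, rate, expected_hz=50, tolerance_hz=5))  # False
print(is_abnormal(faulty, rate, expected_hz=50, tolerance_hz=5))  # True
```

A production system would use an FFT library and far richer features; the naive DFT is used here only to keep the example dependency-free.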
User management
• Includes:
– user registration
– user authorisation for access to resources
– user validation & approval of requests for digital certificates
– accounting for resource usage
– documentation of procedures
• These schemes and user administration processes were developed
taking into account:
– the distributed nature of the WRG
– the cultural differences in registering and managing users at
the three sites
– the existence of two distinct classes of users, local and WRG, as
well as other academic and commercial partners
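Accounting has to distinguish the two classes of users listed above. The sketch below shows one possible way to total usage per site and user class; the `UsageRecord` fields, the CPU-hour unit and the example users are hypothetical, since the slides do not describe the WRG accounting data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    # Hypothetical record layout; real data would come from each site's batch system.
    site: str
    user: str
    user_class: str  # "local" or "WRG"
    cpu_hours: float


def usage_by_site_and_class(records):
    """Sum CPU hours for each (site, user class) pair."""
    totals = defaultdict(float)
    for record in records:
        totals[(record.site, record.user_class)] += record.cpu_hours
    return dict(totals)


records = [
    UsageRecord("Leeds", "alice", "local", 10.0),
    UsageRecord("Leeds", "bob", "WRG", 4.5),
    UsageRecord("York", "carol", "WRG", 2.5),
    UsageRecord("Leeds", "alice", "local", 1.5),
]
for (site, user_class), hours in sorted(usage_by_site_and_class(records).items()):
    print(f"{site:<8} {user_class:<6} {hours:6.1f} CPU hours")
```

Keying the totals on user class makes it straightforward to compare actual Grid usage with the share of resources reserved for it.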
Managing information
• New registration forms needed to be developed
• Documentation for users, e.g.
– how to register & obtain a digital certificate
– how to access the WRG systems
– further local user documentation
• Documentation for system administrators such as:
– registering users
– propagation of Grid distinguished names between
systems and mapping them to local UNIX user names in the
grid-map files
– producing usage accounting reports
• Development of Web pages
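A grid-map file maps each certificate distinguished name (DN), in double quotes, to a local UNIX user name, one entry per line. The sketch below parses and rewrites such a file and merges in newly propagated entries, refusing to silently remap an existing DN. The DNs, user names and helper names are made-up examples; this is not a WRG or Globus tool, and it drops comment lines when rewriting the file.

```python
import shlex


def parse_gridmap(text):
    """Parse grid-map lines into {dn: local_user}, skipping blank and comment lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # shlex handles the quoted DN, which usually contains spaces.
        # Assumption: one mapping field per line (a comma-separated list of
        # user names is kept as a single string).
        dn, user = shlex.split(line)
        entries[dn] = user
    return entries


def format_gridmap(entries):
    """Render {dn: local_user} as grid-map lines, sorted by DN."""
    return "".join(f'"{dn}" {user}\n' for dn, user in sorted(entries.items()))


def add_users(gridmap_text, new_entries):
    """Merge new DN mappings into the text of an existing grid-map file."""
    entries = parse_gridmap(gridmap_text)
    for dn, user in new_entries.items():
        if dn in entries and entries[dn] != user:
            raise ValueError(f"{dn} is already mapped to {entries[dn]}")
        entries[dn] = user
    return format_gridmap(entries)


existing = '"/C=UK/O=eScience/OU=Leeds/L=ISS/CN=jane smith" jsmith\n'
updated = add_users(existing, {"/C=UK/O=eScience/OU=York/L=CS/CN=raj patel": "wrg042"})
print(updated, end="")
```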
User registration & authorisation
[Figure: flowchart of WRG user registration and authorisation]
• Local user registration (University users; industrial partners)
– Completion of the Application Form for WRG Resources
– Approval of the new WRG project and the user by the PI
– Approval of the new WRG project and the user by the local WRG Executive member
– Validation of the new WRG user by Computing Services
– Allocation of local & remote WRG usernames
• Obtaining a digital certificate
– User requests a digital certificate
– User validation by the Computing Service
– Approval of a UK CA digital certificate
• Authorisation
– The local Computing Service sends a request for user registration, with the user registration details, to remote sites 1 and 2
– User registration at each remote site
– Email requesting an update of the grid-map files
– A centralised database of WRG users
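The registration steps above happen in sequence, each gated by an approval. The sketch below models that as a simple ordered checklist that refuses out-of-order steps. The step names paraphrase the flowchart, and both their exact ordering and the `Registration` class are hypothetical; obtaining the digital certificate is treated as a separate track and is not modelled.

```python
from dataclasses import dataclass, field

# Hypothetical step names paraphrasing the flowchart; the ordering is an assumption.
REGISTRATION_STEPS = (
    "application_form",
    "pi_approval",
    "wrg_executive_approval",
    "computing_services_validation",
    "usernames_allocated",
    "remote_sites_registered",
    "gridmap_updated",
)


@dataclass
class Registration:
    user: str
    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next outstanding step, or None when registration is complete."""
        for step in REGISTRATION_STEPS:
            if step not in self.completed:
                return step
        return None

    def complete(self, step):
        """Record a step, refusing to skip ahead of the defined order."""
        expected = self.next_step()
        if step != expected:
            raise ValueError(f"expected {expected!r}, got {step!r}")
        self.completed.append(step)


reg = Registration("jsmith")
reg.complete("application_form")
print(reg.next_step())  # pi_approval
```

Enforcing the order in one place makes it harder for a distributed team to create an account before the PI and the local Executive member have approved it.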
Digital certification
• Globus requires personal X.509v3 digital certificates
• WRG systems support certificates from the UK e-Science Grid Certification
Authority (CA)
• The e-Science CA at Rutherford Appleton Laboratory (RAL) is being run as
part of the Grid Support Centre
• Registration Authorities (RA) were established at the three Universities
• Training of User Administration staff
– courses available at RAL
– a short introduction to digital certificates issued by the UK e-Science CA,
available at: http://www.grid-support.ac.uk/ca/
• At present the RA interface to the CA system will only work reliably with
Netscape 4.79
• All certificates issued will expire after one year
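Because every certificate lapses after one year, user administrators need to track renewals. The sketch below computes an expiry date and a renewal warning; it approximates "one year" as 365 days and uses a hypothetical 30-day warning window. In practice the authoritative value is the expiry (notAfter) field recorded in each certificate.

```python
from datetime import date, timedelta

# Assumption: "one year" approximated as 365 days; the real expiry date
# is the notAfter field recorded in each certificate.
CERT_LIFETIME = timedelta(days=365)


def expiry_date(issued: date) -> date:
    """Return the approximate expiry date for a certificate issued on `issued`."""
    return issued + CERT_LIFETIME


def needs_renewal(issued: date, today: date, warning_days: int = 30) -> bool:
    """True once `today` is within `warning_days` of expiry, or past it."""
    return today >= expiry_date(issued) - timedelta(days=warning_days)


issued = date(2004, 1, 15)
print(expiry_date(issued))                        # 2005-01-14 (2004 is a leap year)
print(needs_renewal(issued, date(2004, 6, 1)))    # False
print(needs_renewal(issued, date(2004, 12, 20)))  # True
```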
Lessons learned
• New user management procedures must be
fully endorsed by Computing Services (e.g. the
WRG local identity had to be modified several
times to take this into account)
• Continuous staff training is required (due to
rapidly changing technology, e.g. Globus)
• The local system administrators must be
involved in installing Globus & other Grid
fabric on their own systems
Benefits to Computing Services
• A new dynamic in tackling support issues
• Larger support teams
• Broader knowledge through collaboration
with other sites
• More interesting job specifications for system
and user support staff (e.g. including a research
approach)
Overall lessons learned
• Complexity due to:
– geographically distributed support teams (lack of full understanding of
how the three sites work)
– large number of support staff involved (new issues may cause
confusion over who is doing what)
– innovative technology (lack of good understanding of new
implementations and software dependency/interoperability; lack of
good documentation)
– human interaction factors (e.g. misunderstandings)
– communication issues within a Virtual Organisation (VO) (due to its size)
– constantly posed questions of ownership and trust (due to crossing
organisational boundaries)
– distributed resource management (e.g. software revisions)
– software licensing issues (licences are needed that cover Grid use)
– increased exposure to security issues
– lack of a central Help Desk
Addressing issues within WRG
• Many problems resolved through:
– effective organisational structure led by the WRG
Executive
– Computing Services staff involvement and expertise
– research element (portal development, Globus
installations) led by Computer Science staff but
with the involvement of Computing Services’ practical
approach and their well-established support infrastructure
WRG Evolution
[Figure: WRG evolution diagram; labels: e-Science Grid, WRG, WUN Grid, companies (C), Academic Service Infrastructure (WRG and Computing Services), companies that “Buy Services”]
Concluding remarks
• The WRG serves as a test-bed Grid environment
• Addresses a wide variety of problems and issues,
including key sociological constraints (human
interactions, ownership, trust) that are also reflected in global Grids
• Computing Services staff expertise is vital to Grid
success
• Many gaps remain (see the e-Science Gap Analysis at
http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html)
• Need to continue working with Grid users, increase
the number of Grid applications and enhance outreach
References
• WRG web site: http://www.wrgrid.org.uk/
• DAME XTO portal: http://iri02.leeds.ac.uk:8080/damexto/damexto
• P M Dew, J G Schmidt, M Thompson, P Morris, The White Rose Grid:
practice and experience, in the Proceedings of the All Hands conference
• e-Science Gap Analysis:
http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html
Thank you for your attention