e-Science Centre of Excellence

The Academic Service Partnership
Peter Dew & Joanna Schmidt plus White Rose Grid Teams

Overview
• The White Rose Grid as an example of an inter-university collaboration providing an early production Grid service
• Computing Services roles
• Delivering the WRG
  – Organisational structure
  – Technical implementation
  – User management
• Lessons learned
• Benefits
• Concluding remarks

About the WRG
• The White Rose Grid (WRG) works under the auspices of the White Rose University Consortium (WRUC)
  – WRUC is an association of the three research Universities: Leeds, York & Sheffield
  – It employs complementary skill bases to support both larger projects than any one University can deliver and a broad research agenda
  – WRUC was featured as a model of collaboration and enterprise in the HEFCE White Paper
• Why Grid?
  – Enhances the competitive position of the three Universities in attracting funding
  – Enables optimisation of computing resources across enterprises and more effective service delivery to researchers

WRG aims
• to strengthen e-Science research (using experience gained from e-Science projects such as DAME, HYDRA, or gViz) – initial focus:
  – decision support (engineering, health, social science)
  – scientific visualisation
• to support and enlarge new scientific communities including bio-technology, aerospace, tissue engineering and healthcare
• to assess and grow, in collaboration with YF, regional demand for Grid technology

DAME Grid Services

Commitment
• Senior staff from the three Universities (White Rose Grid Executive: Chief Exec of White Rose Univ Consortium – M Doxey; P Dew and K Brodlie – Leeds; J Austin – York; P Fleming – Sheffield)
• Senior Computing Services staff (C Cartledge – Sheffield, S Chidlow – Leeds) & Computing staff from Comp Science Dept (A Turner – York)
• White Rose Grid staff
• WRG Project Teams (Computing Services staff & Computer Science staff)
• IT Vendors – Esteem involving Sun & Streamline

Computing Services roles
• To provide a stable, well-managed and responsive HPC service
• To promote effective use of HPC facilities across the three Universities through a variety of training, including joint seminars and user group meetings
• To offer user support & training in basic HPC techniques (e.g. parallel programming), Globus and e-Science applications
• To support an early production Grid service under the leadership of WRG technical staff

WRG staff responsibilities
• Technical developments (Leeds)
• Grid training (Sheffield)
• Coordination of joint activities (Leeds)
• Liaison with e-Science communities within the WRG, UK e-Science and others (Leeds)
• Business outreach (York)

Setting up the WRG
• 4 HPC nodes (nearly 500 CPUs in total), acquired specifically for the WRG with over £3M investment
• A heterogeneous facility comprising 3 clusters of Sun shared-memory systems and 2 Intel processor-based Beowulf clusters
• To offer both:
  – local HPC services (75% of resources)
  – the Grid infrastructure (25% of resources)
• Each node specialises in the provision of a distinct service
[Chart: resource allocation – 75% local HPC, 25% WRG]

The WRG architecture
[Diagram: the WRG nodes – General Purpose HPC node, CFD node, Engineering Application node, Computer Science node]

WRG software & hardware stack
• Software stack composed largely of open source software
• Stack layers, top to bottom:
  – Portal interface: WRG Portal, GPDK, Tomcat/Apache
  – Global Grid Infrastructure: MyProxy, Globus Toolkit 2.4
  – Campus Grid Infrastructure: Grid Engine Enterprise Edition
  – Sun™ HPC Cluster Tools, Sun ONE Studio
  – Solaris™ and Linux Operating Environments
  – Sun Enterprise™ and Sun Fire™ Servers, Sun StorEdge™ Systems, Intel thin servers
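As a rough worked example of the 75%/25% split described under "Setting up the WRG", the sketch below divides a CPU pool between local HPC services and the Grid infrastructure. It assumes the split is applied to CPU counts and uses 500 as a round stand-in for the "nearly 500 CPUs" quoted on the slide; the slides do not say whether the split is measured in CPUs, time or some other unit, and the function name `split_cpus` is purely illustrative.

```python
def split_cpus(total_cpus, grid_fraction=0.25):
    """Split a CPU pool into (local, grid) shares.

    grid_fraction is the share reserved for the Grid infrastructure;
    the remainder goes to local HPC services.
    """
    if not 0 <= grid_fraction <= 1:
        raise ValueError("grid_fraction must be between 0 and 1")
    grid = round(total_cpus * grid_fraction)
    return total_cpus - grid, grid


if __name__ == "__main__":
    # 500 is an assumed round figure, not an exact count from the slides.
    local_cpus, grid_cpus = split_cpus(500)
    print(f"local HPC: {local_cpus} CPUs, Grid: {grid_cpus} CPUs")
```

Under these assumptions roughly 125 CPUs would be available to Grid users, with about 375 remaining for local services.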
Delivering our Grid
• Procedures & resources:
  – Strong organisational structure
  – Computing infrastructure: computer systems; storage (currently being expanded with a Storage Area Network, SAN); networking infrastructure (YHMAN reprocurement underway, implementation due Nov '04); software (traditional HPC tools; Grid software, namely the Globus Toolkit; and portals)
  – Mixture of experienced support staff and research staff working in teams

WRG project teams
[Diagram: five teams serving the WRG USERS]
• Architecture Team – Globus, MyProxy, portals
• Authentication, Authorisation & Accounting Team – user management, usage accounting, X.509 digital certificates
• Technical Team – stable service
• Training Team – HPC techniques, Grid access & applications
• Business Outreach Team – working with regional companies & Universities

Access to the WRG
• Users find Globus baffling (because of its novelty and the lack of accessible documentation)
• Preferred ways of access:
  – using Grid portals developed by the WRG
  – running Grid-enabled applications, e.g. the gViz project has developed Grid-enabled IRIS Explorer modules
• Portals need to be developed

The DAME XTO portal
Enables aeronautical engineers to identify abnormal behaviour in aircraft engines by performing DSP analyses of vibration data from onboard sensors.
User management
• Includes:
  – user registration
  – user authorisation for access to resources
  – user validation & approval of requests for a digital certificate
  – accounting for resource usage
  – documentation of procedures
• These schemes and user administration processes were developed taking into consideration:
  – the distributed nature of the WRG
  – the cultural differences in registering and managing users at the three sites
  – the existence of two distinct classes of users (local and WRG), as well as other academic and commercial partners

Managing information
• New registration forms needed to be developed
• Documentation for users, e.g.
  – how to register & obtain a digital certificate
  – how to access the WRG systems
  – further local user documentation
• Documentation for system administrators, such as:
  – registering users
  – propagating Grid distinguished names between systems and mapping them to local UNIX user names in the grid-map files
  – producing usage accounting reports
• Development of Web pages

User registration & authorisation
[Flowchart. Box labels, in the order extracted from the slide: Local user registration; University users; Completion of the Application Form for WRG Resources; Industrial partners; Obtaining a digital certificate; Approval of the new WRG project and the user by PI; Approval of the new WRG project and the user by the local WRG Executive member; User requests a digital certificate; User validation by Computing Service; Authorisation; Approval of a UK CA digital certificate; Validation of the new WRG user by Comp Services; Allocation of local & remote WRG usernames; Local Computing Service; Email requesting update of grid-map files; User registration details; Request for user registration; User registration at remote site 2; User registration at remote site 1; A centralised database of WRG users]

Digital certification
• Globus requires personal X.509v3 digital certificates
• WRG systems support certificates from the UK e-Science Grid Certification Authority (CA)
• The e-Science CA at Rutherford Appleton Laboratory (RAL) is being run as part of the Grid Support Centre
• Registration Authorities (RA) were established at the three Universities
• Training of User Administration staff
  – courses available at RAL
  – a short introduction to digital certificates issued by the UK e-Science CA is available at: http://www.grid-support.ac.uk/ca/
• At present the RA interface to the CA system will only work reliably with Netscape 4.79
• All certificates issued will expire after one year

Lessons learned
• New user management procedures must be fully endorsed by Computing Services (e.g. the WRG local identity had to be modified several times to take this into account)
• Continuous staff training is required (due to rapidly changing technology, e.g. Globus)
• The local system administrators must be involved in installing Globus and other Grid fabric on their own systems

Benefits to Computing Services
• Added dynamic to support issues
• Enlarged support team memberships
• Broadened knowledge through working in collaboration with other sites
• More interesting job specifications for system and user support staff (i.e. they include a research approach)

Overall lessons learned
• Complexity due to:
  – geographically distributed support teams (lack of full understanding of how the three sites work)
  – large number of support staff involved (new issues may cause confusion over who is doing what)
  – innovative technology (lack of good understanding of new implementations and of software dependency/interoperability; lack of good documentation)
  – human interaction factor (caused by misunderstandings etc.)
  – communication issues within a Virtual Organisation (VO) (due to its size)
  – constantly posed questions of ownership and trust (due to crossing organisational boundaries)
  – distributed resource management (e.g. software revisions)
  – software licensing issues (licensing is needed for a Grid)
  – increased exposure to security issues
  – lack of a central Help Desk

Addressing issues within WRG
• Many problems resolved through:
  – effective organisational structure led by the WRG Executive
  – Computing Services staff involvement and expertise
  – research element (portal development, Globus installations) led by Computer Science staff, but drawing on Computing Services' practical approach and their well-established support infrastructure

WRG Evolution
[Diagram labels: e-Science Grid; WRG; C, C, C; WUN Grid; Academic Service Infrastructure; WRG ComServices; Companies "Buy Services"]

Concluding remarks
• The WRG serves as a test-bed Grid environment
• It addresses a large variety of problems and issues, including key sociological constraints (human interactions, ownership, trust) that are also reflected in global Grids
• Computing Services staff expertise is vital to Grid success
• Many gaps remain (see the e-Science Gap Analysis at http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html )
• Need to continue to work with Grid users, increase the number of Grid applications and enhance outreach

References
• WRG web site – http://www.wrgrid.org.uk/
• DAME XTO portal – http://iri02.leeds.ac.uk:8080/damexto/damexto
• P M Dew, J G Schmidt, M Thompson, P Morris, The White Rose Grid: practice and experience – in the proceedings of the All Hands conference
• e-Science Gap Analysis – http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html

Thank you for your attention
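Supplementary sketch for the "Managing information" slide, which mentions mapping Grid distinguished names to local UNIX user names in the grid-map files. The Python below is a minimal illustration of reading such a file, assuming the usual Globus layout of one entry per line: a quoted distinguished name followed by one or more comma-separated local user names. The function name `parse_grid_mapfile`, the distinguished names and the user names are all invented for illustration and are not taken from the WRG systems.

```python
import shlex


def parse_grid_mapfile(text):
    """Return {distinguished_name: [local_usernames]} from grid-map file text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # shlex handles the double-quoted DN, which may contain spaces.
        parts = shlex.split(line)
        if len(parts) != 2:
            raise ValueError(f"malformed grid-map entry: {raw!r}")
        dn, users = parts
        mapping[dn] = users.split(",")
    return mapping


# Hypothetical entries: these DNs and user names are made up.
SAMPLE = '''
# distinguished name                                  local user(s)
"/C=UK/O=eScience/OU=Leeds/L=ISS/CN=jane example" wrgjane
"/C=UK/O=eScience/OU=York/L=CS/CN=john example" wrgjohn,jx01
'''

if __name__ == "__main__":
    for dn, users in parse_grid_mapfile(SAMPLE).items():
        print(dn, "->", ", ".join(users))
```

A parser along these lines could help system administrators check that a distinguished name propagated from another site resolves to the intended local account before the grid-map file is updated.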