Experiences with Globus on DAS-2 in an educational setting
Herbert Bos & Lex Wolters
LIACS, Leiden University
{herbertb,llexx}@liacs.nl
DAS-2 workshop, June 6 2002

Seminar Grid Computing
• Fall 2001
• 11 students (year 3 or 4) started, 8 finished
• Once a week, 2 hours
• 13 classes
• Programming assignment

Goals
• Try to separate Grid hype from Grid reality
• Show the underlying technologies that are currently being developed and used to provide a 'pervasive computing grid'

Topics by lecturers
• What is Grid Computing?
• Requirements
• History
• Grid architecture
• Basic services
• Taxonomy of Grids
• QoS (final class)

Presentations by students
• Legion
• Globus
• Resource management: GRAM
• Scheduling: AppLeS
• Communication (Nexus, etc.)
• Information service: MDS, GRIS, GIIS
• Data access: GASS + RSL
• Security: GSS-API
• Language support

Programming assignment
Using Globus to implement a Grid application:
  – Computation is chopped up into subtasks, which are distributed to computational nodes
  – Final result is the combination of the results of the subtasks
  – Resource discovery
  – At least one of the following options:
    • Data is distributed in a secure fashion
    • Incorporate costs

Topics
• Willem de Bruijn: Distributed Evolutionary Algorithm
• Hongqin Chen: RSA Key Breaking
• Jeroen Laros: GridCrafty
• Hui Li: Parallel Fractal Image Generation
• Yafei Sun: Adaptive Quadrature
• Arjan Tijms & Shlomo Raikin: Parallel Genetic Algorithm

Situation
• Delivery of DAS-2 delayed
• Globus installation on SUN server
  – System managers unfamiliar with Globus
  – Incorrect installation, e.g.
    certificates
• Jan 21, 2002: DAS-2 operational
  – Platform for students
  – Focus on PBS and MPI, not on Globus

Distributed Evolutionary Algorithm
• Purpose: minimize an arbitrary function
• Strategy:
  – Self-adaptation, no distinction between worker and controller nodes
  – Predefined number of runs
• Language: C++
• Modules:
  – Communication: Globus IO
  – Resource management: GRAM
• Results:
  – A master/slave set-up gave the best results in the shortest time span
  – Other strategies increase self-adaptiveness, but give worse results in the current setting

Distr. Evolutionary Algorithm (cont’d)
• Problems:
  – Distinction between file server and compute nodes: starting up new processes
  – Wall-time value (60 s) of the scheduler cannot be altered (not even via maxTime in RSL): waiting processes are killed
• Suggestions for improvement:
  – Symbolic links to Globus libraries
  – Documentation on Globus:
    • Overall idea is neglected
    • Q&A forum, globus.org

RSA Key Breaking
• Purpose: factoring large numbers
• Strategy:
  – Pollard’s rho factoring algorithm
  – Master/slave framework
• Language: C
• Modules:
  – Communication: Nexus
  – Job allocation: GRAM and PBS
• Results:
  – Significant speed-ups, depending on workload/distribution

RSA Key Breaking (cont’d)
• Problems:
  – Start-up
    • Problems getting the correct certificate
    • Libraries were not installed correctly
    • Functions were not available
  – ‘Real’ problems
    • GRAM macro definitions not in the corresponding header file
  – Documentation
    • Lack of practical guidelines and examples

GridCrafty
• Purpose: shell script that parallelises the chess engine Crafty
• Strategy:
  – Master: all possible moves; workers: grade the moves
• Modules:
  – Storage access: GASS, globus_rcp, openssh
• Results:
  – Due to problems with the Globus implementation, Globus was also bypassed entirely, which led to a speed-up of 17.5 (theoretical: 22)

GridCrafty
(cont’d)
• Problems:
  – Start-up
    • GASS did not work properly
    • globus_rcp was not installed
    • openssh did not work
  – ‘Real’ problem
    • Scheduling of tasks takes a lot of time
• Final implementation:
  – Connect to all nodes; query load:
    • Static: host counts as free if its load is < 10%
    • Dynamic: clients check the load before starting intensive calculations
  – ssh implementation much faster than Globus (speed-ups of 17.5 versus 5-9)

Parallel Fractal Image Generation
• Purpose: see title
• Strategy:
  – Master distributes the work, collects the output and draws the image
  – Slaves calculate the points line by line
• Language: C and C++
• Modules:
  – Resource management: GRAM
  – Communication: MPI

Par. Fractal Image Generation (cont’d)
• Problem:
  – Conflict between the current MPI set-up and the GRAM job-submit script (temporarily fixed, on the UvA cluster only)
• Suggestions for improvement:
  – Installation of MPICH-G2
  – Where can one find good examples of exploiting Globus, to get started?

Adaptive Quadrature
• Purpose: calculate the quadrature (the area under the curve) of an arbitrary function
• Strategy:
  – Divide the curve into smaller segments
  – Ring of processes
  – Results via files
• Language: C (gcc)
• Modules:
  – Process control and allocation: DUROC
  – Communication: Nexus

Adaptive Quadrature (cont’d)
• Problems:
  – Start-up
    • Getting the correct certificate
    • Using the right RSL parameter (hostCount)
  – ‘Real’ problem
    • Conflict between duroc_runtime_barrier and PBS: fixed only on the UvA cluster
• Suggestions for improvement:
  – Info on the different communication techniques

Parallel Genetic Algorithm
• Purpose: improving the results of GAs
• Strategy:
  – Start independent searches at different locations of the solution landscape
  – Periodically exchange the fittest individuals
  – Init process: job dispatching and bootstrap communication set-up
  – Master process: relay for communications, synchronizes the start of worker processes, collects
    final results, and sets up a GUI for monitoring and progress display
  – Worker processes: each runs a single N-generation run

Parallel Genetic Algorithm (cont’d)
• Language: C and C++
• Modules:
  – Communication: Nexus
  – RPC
  – Job submission: GRAM
  – Thread creation: Globus_Common
• Preliminary results:
  – The parallel algorithm achieves results that are 8-17% better than the sequential algorithm

Parallel Genetic Algorithm (cont’d)
• Problems:
  – Start-up
    • Environment and path settings
    • Obtaining certificates
    • Who is responsible for Globus on DAS-2?
    • Different versions of Globus (1.1.3 versus 2.0 beta)
  – ‘Real’ problems
    • Shared libraries are not installed on the nodes
    • Delegating proxies
    • Information about resource availability is static or not present
    • Globus 2.0 is a beta version: things are not implemented or missing

Parallel Genetic Algorithm (cont’d)
• Suggestions for improvement:
  – Default Globus environment
  – Globus libraries on the nodes via
    • NFS partition
    • Symbolic links to the ‘strange’ globus-edg beta 2.1 names
  – ‘fork’ is the default jobmanager, which only ‘schedules’ jobs to the local file server (adding PBS makes the code dependent on this scheduler)
  – Installation of a cluster monitor better than beowulf
  – Examples and makefiles

Conclusions
• Seminar quite successful
• DAS-2
  – Great environment for teaching purposes
  – Start-up problems
  – Current setting not optimal
  – Who is responsible for DAS-2?
  – Who determines policies and implementations?
• Globus
  – Documentation, examples (probably better with the current training material on globus.org)
  – Installation not trivial
• IBM
  – Pre-sales OK, after-sales???

Thanks
Many thanks to David Groep, who helped our students many, many times without any hesitation! Great job!
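The RSA Key Breaking project combined Pollard’s rho with a master/slave framework, the same structure several of the other projects used. As an illustration only, here is a minimal Python sketch of that combination. It assumes local worker threads as stand-ins for compute nodes; on DAS-2 the workers would be jobs started through GRAM/PBS that talk to the master via Nexus, and no Globus API is used here. The function names `pollard_rho` and `master` are hypothetical. Each worker runs rho with a different constant c in f(x) = x² + c mod n, and the master returns the first non-trivial factor reported.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from math import gcd
from typing import Optional


def pollard_rho(n: int, c: int, max_steps: int = 100_000) -> Optional[int]:
    """Worker task (hypothetical name): one Pollard's rho run with f(x) = x^2 + c mod n.

    Uses Floyd cycle detection (tortoise x, hare y). Returns a non-trivial
    factor of n, or None if this choice of c fails or the step budget runs out.
    """
    x = y = 2
    d = 1
    for _ in range(max_steps):
        x = (x * x + c) % n          # tortoise: one step
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
        if d != 1:
            break                    # d == n means this c failed
    return d if 1 < d < n else None


def master(n: int, workers: int = 4) -> Optional[int]:
    """Master (hypothetical name): give each worker its own constant c, return the first factor."""
    if n % 2 == 0:
        return 2                     # rho is pointless for even n
    # Threads only stand in for Grid nodes here; they show the structure, not a speed-up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(pollard_rho, n, c) for c in range(1, workers + 1)]
        for fut in as_completed(futures):
            factor = fut.result()
            if factor is not None:
                return factor
    return None


if __name__ == "__main__":
    print(master(10403))             # 10403 = 101 * 103
```

Running independent rho instances with different constants is one common way to parallelise the method; the slides do not say how the student actually split the work. Because of CPython’s global interpreter lock, the threads here demonstrate the division of labour rather than deliver the speed-ups reported on the slide.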