Experiences with Globus on DAS-2 in an educational setting LIACS, Leiden University

Herbert Bos & Lex Wolters
LIACS, Leiden University
{herbertb,llexx}@liacs.nl
DAS-2 workshop, June 6 2002
Seminar Grid Computing
• Fall 2001
• 11 students (year 3 or 4) started, 8 finished
• Once a week, 2 hours
• 13 classes
• Programming assignment
Goals
• Try to separate Grid hype from Grid reality
• Show the underlying technologies that are
currently being developed and used to provide a
'pervasive computing grid'
Topics by lecturers
• What is Grid Computing?
• Requirements
• History
• Grid architecture
• Basic Services
• Taxonomy of Grids
• QoS (final class)
Presentations by students
• Legion
• Globus
• Resource management: GRAM
• Scheduling: AppLeS
• Communication (Nexus, etc.)
• Information service: MDS, GRIS, GIIS
• Data access: GASS + RSL
• Security: GSS-API
• Language support
Programming assignment
Using Globus to implement a Grid application:
– Computation chopped up in subtasks which are
distributed to computational nodes
– Final result is combination of results of subtasks
– Resource discovery
– At least one of the following options:
• Data is distributed in secure fashion
• Incorporate costs
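The first two requirements of the assignment (split the computation into subtasks, then combine the partial results) can be sketched without any Grid middleware. This is a minimal illustration only: threads stand in for compute nodes, the sum-of-squares workload is made up, and the names `split_range`, `subtask` and `run_master` are hypothetical rather than taken from any student project.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(stop - start, parts)
    chunks, lo = [], start
    for i in range(parts):
        hi = lo + size + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def subtask(chunk):
    """Work done by one 'node': here, a partial sum of squares (made-up workload)."""
    lo, hi = chunk
    return sum(i * i for i in range(lo, hi))


def run_master(start, stop, workers=4):
    """Master: distribute the chunks, then combine the partial results."""
    chunks = split_range(start, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(subtask, chunks))
    return sum(partials)
```

In a real Grid setting the `pool.map` call would be replaced by job submission to remote resources, which is where resource discovery, security and cost enter.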
Topics
• Willem de Bruijn: Distributed Evolutionary Algorithm
• Hongqin Chen: RSA Key Breaking
• Jeroen Laros: GridCrafty
• Hui Li: Parallel Fractal Image Generation
• Yafei Sun: Adaptive Quadrature
• Arjan Tijms & Shlomo Raikin: Parallel Genetic Algorithm
Situation
• Delivery of DAS-2 delayed
• Globus installation on SUN server
– System managers unfamiliar with Globus
– Incorrect installation, e.g. certificates
• Jan 21, 2002: DAS-2 operational
– Platform for students
– Focus on PBS, MPI; not Globus
Distributed Evolutionary Algorithm
• Purpose: minimize an arbitrary function
• Strategy:
– self-adaptation, no distinction between worker and controller nodes
– predefined number of runs
• Language: C++
• Modules:
– communication: Globus IO
– resource management: GRAM
• Results:
– master/slave set-up gives the best results in the shortest time span
– other strategies increase self-adaptiveness, but give worse results in the
current setting
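To make "self-adaptation" concrete, here is a minimal single-process sketch of an evolution strategy in which every individual carries its own mutation step size. It is not the student's C++ code: the elitist (mu + lambda) selection, the `sphere` test function and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def sphere(x):
    """Illustrative objective to minimize (an assumption, not from the project)."""
    return sum(v * v for v in x)


def evolve(f, dim=3, mu=5, lam=20, generations=50, seed=0):
    rng = random.Random(seed)
    tau = 1.0 / math.sqrt(dim)  # learning rate for the step sizes
    # Each individual is (vector, sigma): sigma is evolved along with the vector.
    pop = [([rng.uniform(-5, 5) for _ in range(dim)], 1.0) for _ in range(mu)]
    start_best = min(f(x) for x, _ in pop)
    for _ in range(generations):
        children = []
        for _ in range(lam):
            x, sigma = rng.choice(pop)
            new_sigma = sigma * math.exp(tau * rng.gauss(0, 1))  # self-adaptation
            child = [v + new_sigma * rng.gauss(0, 1) for v in x]
            children.append((child, new_sigma))
        # Elitist selection: parents compete with children, so the best never worsens.
        pop = sorted(pop + children, key=lambda ind: f(ind[0]))[:mu]
    best_x, _ = pop[0]
    return start_best, f(best_x), best_x
```

In the distributed version described above, each node would run such a loop and exchange individuals over the network instead of sharing one population.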
Distr. Evolutionary Algorithm (cont’d)
• Problems:
– Distinction between fileserver and compute node:
starting up new processes
– Wall-time limit (60 s) of the scheduler cannot be altered
(not even via maxTime in RSL): waiting processes are
killed
• Suggestions for improvement:
– Symbolic links to Globus libraries
– Documentation on Globus:
• Overall idea is neglected
• Q&A forum, globus.org
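For readers unfamiliar with RSL, the following sketch shows how a job description carrying the maxTime attribute mentioned above might be assembled as a string. `build_rsl` is a hypothetical helper, not part of Globus; only the attribute names (`executable`, `count`, `maxTime`) are GRAM RSL attributes, and the units of `maxTime` should be checked against the installed GRAM version.

```python
def build_rsl(executable, count=1, **attributes):
    """Render a GRAM RSL conjunction such as &(executable=...)(count=4).

    Hypothetical helper: values are inserted verbatim, so anything containing
    spaces or parentheses would still need RSL quoting by the caller.
    """
    pairs = [("executable", executable), ("count", count)]
    pairs.extend(attributes.items())
    return "&" + "".join(f"({name}={value})" for name, value in pairs)


# Example with a made-up path; the resulting string is what a submit tool would receive.
rsl = build_rsl("/home/student/ea_worker", count=4, maxTime=10)
```

As the slide notes, on DAS-2 the local scheduler's own limit took precedence, so setting maxTime in the RSL had no effect there.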
RSA Key Breaking
• Purpose: factoring large numbers
• Strategy:
– Pollard’s Rho factoring algorithm
– Master/slave framework
• Language: C
• Modules:
– Communication: Nexus
– Job allocation: GRAM and PBS
• Results:
– Significant speed-ups, depending on workload/distribution
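Pollard's Rho itself is short. The sketch below is a plain single-process Python version using Floyd cycle detection and the common polynomial x² + c; the student implementation was in C on top of Nexus and GRAM, and how it divided work between master and slaves is not stated in the slides. Trying a different constant `c` per attempt, as `find_factor` does, is one way such independent attempts could be handed to different workers (an assumption).

```python
from math import gcd


def pollard_rho(n, c=1, x0=2):
    """Return a non-trivial factor of composite n > 3, or None if this (c, x0) fails."""
    if n % 2 == 0:
        return 2
    x = y = x0
    d = 1
    while d == 1:
        x = (x * x + c) % n  # tortoise: one step
        y = (y * y + c) % n  # hare: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
    return d if d != n else None


def find_factor(n, max_c=20):
    """Retry with different constants; each attempt is independent of the others."""
    for c in range(1, max_c + 1):
        d = pollard_rho(n, c)
        if d is not None:
            return d
    return None
```

For example, `find_factor(8051)` yields a divisor of 8051 = 83 × 97.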
RSA Key Breaking (cont’d)
• Problems:
– Start-up
• Difficulty obtaining the correct certificate
• Libraries were not installed correctly
• Functions were not available
– ‘Real’ problems
• GRAM macro-definitions not in corresponding header-file
– Documentation
• Lack of practical guidelines and examples
GridCrafty
• Purpose: shell script which parallelises the chess
engine Crafty
• Strategy:
– Master: all possible moves; worker: grade moves
• Modules:
– Storage access: GASS, globus_rcp, openssh
• Results:
– Due to problems with the Globus implementation, Globus was
also bypassed entirely, which led to a speed-up of 17.5
(theoretical 22)
GridCrafty (cont’d)
• Problems:
– start-up
• GASS did not work properly
• Globus_rcp was not installed
• Openssh did not work
– ‘real’ problem
• Scheduling of tasks takes a lot of time
• Final implementation:
– connect to all nodes; query load:
• Static: < 10% host free
• Dynamic: client checks load before start of intensive
calculations
– ssh implementation much faster than Globus (speed-ups
17.5 versus 5-9)
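The final implementation above (query load, then hand moves to free hosts) can be sketched as follows. GridCrafty was a shell script, so this Python version is only an illustration; it assumes that "< 10%" means a host counts as free when its load is below 10%, and the host names, load values and function names are hypothetical.

```python
def pick_free_hosts(loads, threshold=0.10):
    """Static check: keep hosts whose reported load is below the threshold."""
    return sorted(host for host, load in loads.items() if load < threshold)


def assign_moves(moves, hosts):
    """Master step: deal candidate moves round-robin over the free hosts."""
    if not hosts:
        raise ValueError("no free hosts available")
    plan = {host: [] for host in hosts}
    for i, move in enumerate(moves):
        plan[hosts[i % len(hosts)]].append(move)
    return plan


# Made-up load figures for three nodes; node02 is busy and is skipped.
loads = {"node01": 0.02, "node02": 0.50, "node03": 0.07}
plan = assign_moves(["e2e4", "d2d4", "g1f3"], pick_free_hosts(loads))
```

Each worker would then grade its moves with Crafty and report the scores back to the master.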
Parallel Fractal Image Generation
• Purpose: see title
• Strategy
– Master distributes work, collects output, draws image
– Slaves calculate points line-wise
• Language: C and C++
• Modules
– Resource management: GRAM
– Communication: MPI
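The line-wise division of work can be illustrated with a small sketch. The slides do not say which fractal was generated, so the Mandelbrot set, the viewing window and the function names below are assumptions; the point is only that each image line is an independent task a slave can compute and return.

```python
def mandelbrot_row(y, width, height, max_iter=50):
    """Slave step: escape-time counts for every pixel of image line y."""
    row = []
    for x in range(width):
        # Map the pixel to the window [-2, 1] x [-1, 1] in the complex plane.
        c = complex(-2.0 + 3.0 * x / (width - 1), -1.0 + 2.0 * y / (height - 1))
        z, n = 0j, 0
        while abs(z) <= 2.0 and n < max_iter:
            z = z * z + c
            n += 1
        row.append(n)
    return row


def render(width=31, height=21, max_iter=50):
    """Master step: one task per line, rows collected in order to form the image."""
    return [mandelbrot_row(y, width, height, max_iter) for y in range(height)]
```

In the MPI version the list comprehension in `render` would become a send/receive loop, with the master handing out line numbers and collecting finished rows.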
Par. Fractal Image Generation (cont’d)
• Problem
– Conflict between current MPI set-up and
GRAM job submit script (temporarily fixed, and only
on the UvA cluster)
• Suggestions for improvement:
– Installation of MPICH-G2
– Where can one find good examples on
exploiting Globus to get started?
Adaptive Quadrature
• Purpose: compute the integral (area under the curve) of
an arbitrary function
• Strategy:
– Divide the curve into smaller segments
– Ring of processes
– Results via files
• Language: gcc
• Modules
– Process control and allocation: DUROC
– Communication: Nexus
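A minimal sketch of the strategy: the interval is cut into pieces (one per process in the distributed version) and each piece is integrated adaptively. The slides do not name the quadrature rule, so adaptive Simpson is an assumption here, as are the function names and tolerances.

```python
import math


def simpson(f, a, b):
    """Simpson's rule on a single interval."""
    return (b - a) / 6.0 * (f(a) + 4.0 * f((a + b) / 2.0) + f(b))


def adaptive(f, a, b, tol=1e-9, depth=40):
    """Refine only where the two-halves estimate disagrees with the whole."""
    m = (a + b) / 2.0
    whole = simpson(f, a, b)
    halves = simpson(f, a, m) + simpson(f, m, b)
    if depth == 0 or abs(halves - whole) <= 15.0 * tol:
        return halves + (halves - whole) / 15.0
    return (adaptive(f, a, m, tol / 2.0, depth - 1)
            + adaptive(f, m, b, tol / 2.0, depth - 1))


def integrate_in_pieces(f, a, b, pieces=4, tol=1e-9):
    """Each piece could be handled by a separate process; results are summed."""
    width = (b - a) / pieces
    return sum(adaptive(f, a + i * width, a + (i + 1) * width, tol / pieces)
               for i in range(pieces))
```

For instance, `integrate_in_pieces(math.sin, 0.0, math.pi)` approximates 2, the exact value of that integral.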
Adaptive Quadrature (cont’d)
• Problems
– Start-up
• Getting the correct certificate
• Using the right RSL parameter (hostCount)
– ‘Real’ problem
• Conflict between duroc_runtime_barrier and PBS:
fixed only on UvA-cluster
• Suggestions for improvement
– Info on different communication techniques
Parallel Genetic Algorithm
• Purpose: improving results of GAs
• Strategy
– Start independent searches at different locations of the solution
landscape
– Periodically exchange the fittest individuals
– Init process: job dispatching and bootstrap communication set-up
– Master process: relay for communications, synchronizes the start
of worker processes, collects final results, and sets up GUI for
monitoring and progress display
– Worker processes: each runs a single N-generation run
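The search strategy above (independent searches that periodically exchange their fittest individuals) can be sketched in one process. Everything concrete below is an assumption made for illustration: the bit-counting fitness function, the ring topology, the migration interval and the names. The actual project ran workers as separate processes communicating via a master relay.

```python
import random


def fitness(bits):
    """Illustrative fitness: number of 1-bits (not the project's objective)."""
    return sum(bits)


def step(pop, rng, rate=0.05):
    """One elitist generation: keep the best, refill by tournament + mutation."""
    nxt = [max(pop, key=fitness)]
    while len(nxt) < len(pop):
        a, b = rng.sample(pop, 2)
        parent = a if fitness(a) >= fitness(b) else b
        nxt.append([bit ^ (rng.random() < rate) for bit in parent])
    return nxt


def migrate(islands):
    """Ring exchange: each island's fittest replaces the next island's weakest."""
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = list(bests[i - 1])


def run_islands(islands=4, size=10, length=32, generations=40, interval=5, seed=1):
    rng = random.Random(seed)
    pops = [[[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
            for _ in range(islands)]
    start = max(fitness(ind) for pop in pops for ind in pop)
    for g in range(1, generations + 1):
        pops = [step(pop, rng) for pop in pops]  # each island evolves independently
        if g % interval == 0:
            migrate(pops)                        # periodic exchange of the fittest
    end = max(fitness(ind) for pop in pops for ind in pop)
    return start, end
```

Because each generation keeps the island's best individual, the overall best fitness never decreases between the start and end values returned by `run_islands`.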
Parallel Genetic Algorithm (cont’d)
• Language: C and C++
• Modules
– Communication: Nexus (RPC)
– Job submission: GRAM
– Thread creation: Globus_Common
• Preliminary results
– Parallel algorithm achieves results that are 8-17%
better than the sequential algorithm
Parallel Genetic Algorithm (cont’d)
• Problems
– Start-up
• Environment and path setting
• Obtaining certificates
• Who is responsible for Globus on DAS-2?
• Different versions of Globus (1.1.3 versus 2.0 beta)
– ‘Real’ problems
• Shared libraries are not installed at nodes
• Delegating proxies
• Information about resource availability static or not present
• Globus 2.0 is a beta version: things not implemented or missing
Parallel Genetic Algorithm (cont’d)
• Suggestions for improvement:
– Default Globus environment
– Globus libraries on nodes via
• NFS partition
• Symbolic link to the ‘strange’ globus-edg beta 2.1 names
– ‘fork’ is default jobmanager, which only ‘schedules’
jobs to local file server (adding PBS makes code
dependent on this scheduler)
– Installation of a cluster monitor better than beowulf
– Examples and makefiles
Conclusions
• Seminar quite successful
• DAS-2
– Great environment for teaching purposes
– Start-up problems
– Current setting not optimal
– Who is responsible for DAS-2?
– Who determines policies, implementations?
• Globus
– Documentation, examples (probably better with current
training material on globus.org)
– Installation not trivial
• IBM
– Pre-sales OK, after-sales???
Thanks
Many thanks to David Groep who helped
our students many, many times without any
hesitation! Great job!