NGS computation services: APIs, concurrent and parallel jobs
Mike Mineter, mjm@nesc.ac.uk
http://www.grid-support.ac.uk  http://www.ngs.ac.uk  http://www.nesc.ac.uk/  http://www.pparc.ac.uk/  http://www.eu-egee.org/

Overview
Three distinct techniques will be in the practical:
• APIs to the low-level tools
• Using multiple processors
  – Concurrent processing: for many jobs
  – Parallel processing: to speed execution of one job

Job submission so far
[Diagram: the user's interface to the grid – browser, portal and command-line interfaces, all layered over GLOBUS etc.]

Application-specific tools
[Diagram: as above, with application-specific portals and/or higher generic tools added above the command-line interfaces.]
[Diagram: APIs (Java, C, …) sit alongside the command-line interfaces as a further route from applications down to GLOBUS etc.]

Available APIs
• "Community Grid" CoG http://www.cogkit.org/
  – Java, Python, Matlab
  – (very limited functionality on Windows – no GSI)
• C http://www.globus.org/developer/apireference.html
  – Example in the practical

Using multiple processors

High performance computing
• If run-times are too long, you can
  – Optimise, re-code
  – Use HPC
• Parallel processing: multiple processors cooperate on one task – you run one executable; it is distributed for you
• Concurrent processing: multiple processors, multiple tasks – multiple executables distributed and run by a script you write
• High-throughput computing: multiple processors, multiple tasks using spare CPU cycles – CPU years in elapsed days, e.g. sensitivity analyses in modelling. Condor, for example.

High performance computing
NGS core nodes open these routes to you – but you have to do a bit of work! (Grid is not magic!)
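To give a flavour of the C route listed above, here is a sketch of submitting a job through the GT2 GRAM client API. This is illustrative only, not the practical's example: the function names are from the GRAM client API reference linked above, but signatures should be checked there, and the contact string is a hypothetical host.

```c
/* Sketch of job submission via the GT2 GRAM client C API.
 * The contact string is a hypothetical host; check the API reference
 * linked above for the exact signatures before relying on this. */
#include <stdio.h>
#include "globus_gram_client.h"

int main(void)
{
    char *contact = "grid-compute.example.ac.uk/jobmanager-pbs"; /* hypothetical */
    char *rsl     = "&(executable=/bin/hostname)";  /* RSL job description */
    char *job_contact = NULL;
    int   rc;

    globus_module_activate(GLOBUS_GRAM_CLIENT_MODULE);

    /* Submit without callbacks: NULL callback contact, state mask 0.
     * On success, job_contact holds the URI used to monitor the job. */
    rc = globus_gram_client_job_request(contact, rsl, 0, NULL, &job_contact);
    if (rc == GLOBUS_SUCCESS)
        printf("submitted: %s\n", job_contact);
    else
        printf("job request failed: %s\n",
               globus_gram_client_error_string(rc));

    globus_module_deactivate(GLOBUS_GRAM_CLIENT_MODULE);
    return rc;
}
```

This is what globus-job-submit does for you on the command line; the API form lets an application submit and monitor jobs itself.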
• Parallel processing: on NGS, you run one globus-job-submit command – but you need to code and build the program so it is parallelised
• Concurrent processing: on NGS, multiple executables are run from a script on the UI
• High-throughput computing: many independent runs using spare CPU cycles – CPU years in days. Watch for the addition of Condor pools on the NGS!

Concurrent processing
[Diagram: from the UI, globus-job-submit sends jobs over the Internet to the head processors of clusters, which pass them via PBS to worker processors. Processes run independently.]

Concurrent processing
• An approach:
  – Script 1: submit multiple jobs and gather their URIs from globus-job-submit
  – Script 2: test for completion and gather results
  – In the practical, the jobs are short, so these scripts are integrated

Parallel processing
• In the early 1990s parallel processing had some similarities to the current state of grid computing – if on a smaller scale:
  – National resources began to be available
  – Standards were needed
  – Lots of hype
  – Early adopters in physics
  – Inadequate integration with the desktop
• And from the jungle of conflicting approaches emerged…
• MPI: Message Passing Interface http://www-unix.mcs.anl.gov/mpi/mpich/

Parallel processing
[Diagram: from the UI, one globus-job-submit over the Internet to the head processor of a cluster; MPI then runs the job across the worker processors. Processes communicate and synchronise (the simpler the better!).]
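The two-script approach for concurrent processing can be sketched as a shell script. This is a sketch under assumptions, not the practical's code: the host name is hypothetical, and it assumes globus-job-submit prints a job-contact URI that globus-job-status and globus-job-get-output then accept.

```shell
#!/bin/sh
# Sketch of the two-script approach above.  Assumptions (not from the
# slides): the host name is hypothetical; globus-job-submit is taken to
# print a job-contact URI usable by globus-job-status.

HOST=grid-compute.example.ac.uk   # hypothetical NGS head node

# Script 1: submit several independent jobs, saving each job's URI
: > job.uris
for i in 1 2 3 4; do
    globus-job-submit "$HOST" /bin/hostname >> job.uris
done

# Script 2: poll each job until it reports DONE or FAILED, then
# retrieve its output
while read uri; do
    until globus-job-status "$uri" | grep -Eq 'DONE|FAILED'; do
        sleep 30
    done
    globus-job-get-output "$uri"
done < job.uris
```

In the practical the jobs are short, so as the slide says the two halves are merged into a single script.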
MPI concepts
• Each processor is assigned an ID in [0, P-1] for P processors
  – Used to send/receive messages and to determine the processing to be done
• Usually code so that processor 0 has the coordinating role:
  – Reading input files, broadcasting data to all other processors
  – Receiving results
• Point-to-point messages are used to exchange data
  – Either blocking (wait until both processors are ready to send/receive)
  – Or non-blocking: start the send/receive, continue, check later for send/receive completion
• Collective functions:
  – Broadcast: sends data to all processors
  – Scatter: sends array elements to different processors
  – Gather: collects different array elements from different processors
  – Reduce: collects data and performs an operation as data are received
• Processors can be grouped for specific purposes (e.g. functional parallelism)

MPI notes
• How does the task split into sub-tasks?
  – By functions that can run in parallel?
  – By sending different subsets of the data to different processes? This is more usual, but note the overheads of scatter and gather
• Need to design and code carefully: be alert to
  – sequential parts of your program (if half your runtime is sequential, speedup will never be more than 2)
  – how load can be balanced (64 processes with 65 tasks will achieve no speedup over 33 processes)
  – deadlock!
• MPI functions are usually invoked from C or Fortran programs, but also Java
• Several example patterns are given in the practical. Many MPI tutorials are on the web!

Summary
• You can build user applications and higher-level tools on GT2 via APIs
• NGS core nodes support
  – concurrent processing (if you do a bit of scripting / coding)
  – MPI for parallelism (if you parallelise your code)
• Remember the CSAR and HPCx services!
  – NGS is batch-oriented at present.

Practical
• http://homepages.nesc.ac.uk/~gcw/NGS/GRAM_II.html
• (That "II" on the end is letters, not numbers!)
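The coordinator pattern from the MPI concepts slide – rank 0 distributes the problem, every rank works on its share, and a collective gathers the result – can be sketched in a few lines of C. This is a minimal illustration, not the practical's code; the decomposition of the loop over ranks is one simple choice among many.

```c
/* Minimal MPI sketch of the pattern above: rank 0 coordinates,
 * a Broadcast shares the problem size, each process works on its
 * own subset, and a Reduce combines the partial results on rank 0. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this processor's ID, 0..P-1 */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* P, the number of processors */

    int n = 0;
    if (rank == 0)
        n = 100;                           /* rank 0 "reads the input" */

    /* Broadcast: every processor learns the problem size */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each process sums its own subset of 1..n (cyclic decomposition,
     * which also keeps the load balanced across ranks) */
    int local = 0, total = 0;
    for (int i = rank + 1; i <= n; i += size)
        local += i;

    /* Reduce: partial sums are combined on rank 0 */
    MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum 1..%d = %d\n", n, total);

    MPI_Finalize();
    return 0;
}
```

Build with mpicc and launch with mpirun; the same binary runs on every processor, with rank alone distinguishing the coordinator from the workers.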