Grid Computing (2)
(Special Topics in Computer Engineering)
Veera Muangsin
30 January 2004

Outline
• High-Performance Computing
• Grid Computing
• Grid Applications
• Grid Architecture
  – Parallel Computer Architectures
  – Cluster Architecture
  – Grid Architecture
• Grid Middleware
• Grid Services

Parallel Computer Architectures

Parallel Architecture Taxonomy
• Single Instruction Single Data (SISD)
• Multiple Instruction Single Data (MISD)
• Single Instruction Multiple Data (SIMD)
• Multiple Instruction Multiple Data (MIMD)
  – Shared Memory MIMD
  – Distributed Memory MIMD

SISD: A Conventional Computer
[Figure: Instructions and Data Input enter a single Processor, which produces Data Output.]
Speed is limited by the rate at which the computer can transfer information internally.
Ex: PC, Macintosh, workstations

The MISD Architecture
[Figure: Instruction Streams A, B and C drive Processors A, B and C, which share one Data Input Stream and produce one Data Output Stream.]
More of an intellectual exercise than a practical configuration. A few have been built, but none is commercially available.

SIMD Architecture
[Figure: a single Instruction Stream drives Processors A, B and C; each processor has its own Data Input stream and Data Output stream.]
Ci <= Ai * Bi
Ex: CRAY vector-processing machines

MIMD Architecture
[Figure: Instruction Streams A, B and C drive Processors A, B and C; each processor has its own Data Input stream and Data Output stream.]
Unlike SISD and MISD machines, a MIMD computer works asynchronously.
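
The difference between the SIMD and MIMD models can be illustrated with a small Python sketch. It is only a model of the programming style, not of real hardware: the function names `simd_multiply` and `mimd_run` are made up for this example, and Python threads are used merely to show independent instruction streams (they do not give true parallel speedup for CPU-bound work).

```python
from concurrent.futures import ThreadPoolExecutor


def simd_multiply(a, b):
    """SIMD style: one instruction (multiply) applied to every data element,
    i.e. Ci <= Ai * Bi for all i."""
    return [x * y for x, y in zip(a, b)]


def mimd_run(tasks):
    """MIMD style: each worker runs its own instruction stream (function)
    on its own data, with no lockstep between workers."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, data) for fn, data in tasks]
        # Results are collected in submission order, whatever the finish order.
        return [f.result() for f in futures]


a = [1, 2, 3]
b = [4, 5, 6]

# Same operation on three data streams.
c = simd_multiply(a, b)

# Three different operations on three different data streams.
results = mimd_run([(sum, a), (max, b), (sorted, [3, 1, 2])])

print(c)        # [4, 10, 18]
print(results)  # [6, 6, [1, 2, 3]]
```

In the SIMD function every element goes through the same multiply, while in the MIMD function each task is free to do something different and to finish at a different time.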
MIMD comes in two forms:
• Shared memory (tightly coupled) MIMD
• Distributed memory (loosely coupled) MIMD

Clusters
• Distributed Memory MIMD
• The most common architecture in the TOP500

Top 2-5 Clusters
• #2 LANL's ASCI Q
  – 13.88 TFlops
  – 8192-node cluster, HP AlphaServer 1.25 GHz
• #3 Virginia Tech's System X
  – 10.28 TFlops
  – 1,100-node cluster, Apple G5
• #4 NCSA's Tungsten
  – 9.81 TFlops
  – 1,450-node cluster, dual-processor Dell PowerEdge 1750
• #5 PNNL's MPP2
  – 8.63 TFlops
  – 980-node cluster, HP Longs Peak, dual Intel Itanium-2 1.5 GHz

Our Parallel Computers
Apollo Cluster
• 6-node cluster
• Athlon XP 2000+ processor, 512 MB memory
• Linux + MPI + PBS (batch scheduling system) + Globus (Grid middleware)
Zeus and Athena
• Two 4-processor Sun Enterprise 420R multiprocessor computers
• 450 MHz UltraSPARC II processors, 1 GB memory
• Solaris + Pthreads + MPI

Cluster Architecture

Cluster Middleware
• Resides between the OS and applications and offers an infrastructure for supporting:
  – Single System Image (SSI)
  – System Availability (SA)
• SSI makes the collection of nodes appear as a single machine
• SA: checkpointing and process migration

Single System Image Components
• NFS (Network File System)
• NIS (Network Information System)
• NTP (Network Time Protocol)
[Figure: one server and three clients.]

Programming Environments
• Threads (cluster of SMPs)
  – POSIX Threads
  – Java Threads
• Message Passing
  – MPI
  – PVM
• Virtual Shared Memory
• Batch Scheduling
  – PBS, Condor, etc.

Batch Scheduling
• Process distribution
• Load balancing
• Job scheduling
• PBS, Condor, Sun Grid Engine, IBM LoadLeveler, LSF, DQS, …

Cluster Applications
• Sequential
• Parallel / Distributed (cluster-aware applications)
  – Grand Challenge applications
    • Weather Forecasting
    • Quantum Chemistry
    • Molecular Biology Modeling
    • Engineering Analysis (CAD/CAM)
    • …
  – Web servers, data mining

Grid Architecture

What is a Grid?
• An infrastructure that dynamically couples
  – Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDAs, and so on)
  – Software (e.g., renting special-purpose applications on demand)
  – Databases (e.g., transparent access to the human genome database)
  – Special instruments (e.g., radio)
  – People
• across local/wide-area networks (enterprise, organisations, or the Internet) and presents them as a unified resource or problem-solving environment.

Grid Infrastructure

TeraGrid

Grid Applications
• Old and new applications are becoming Grid-enabled by coupling computers, databases, instruments, people, etc.:
  – (Distributed) supercomputing
  – Collaborative engineering
  – High-throughput computing
    • large-scale simulation and parameter studies
  – Remote software access / renting software
  – Data-intensive computing
  – On-demand computing

Conceptual view of the Grid

How can the Grid help me?
• Provide access to a global distributed computing environment
  – via authentication, authorisation, negotiation, security
• Identify and allocate appropriate resources
  – interrogate information services -> resource discovery
  – enquire current status/loading via monitoring tools
  – decide strategy, e.g. move data or move application
  – (co-)allocate resources -> process flow

How can the Grid help me? (2)
• Schedule tasks and analyse results
  – ensure the required application code is available on the remote machine
  – transfer or replicate data and update catalogues
  – monitor execution and resolve problems as they occur
  – retrieve and analyse results, e.g. using local visualization

To make this happen you need …
• agreed protocols (Grid protocols)
• defined application programming interfaces (APIs)
• distributed data management
• availability of current status of resources
• monitoring tools
• accepted authentication procedures and policies
• network traffic management

Grid Components
• Grid Apps: Applications and Portals
  – Scientific, Engineering, Collaboration, Prob. Solving Env., Web-enabled Apps, …
• Grid Tools: Development Environments and Tools
  – Languages, Libraries, Debuggers, Monitoring, Resource Brokers, Web tools, …
• Grid Middleware: Distributed Resources Coupling Services
  – Comm., Sign on & Security, Information, Process, Data Access, QoS, …
• Grid Fabric
  – Local Resource Managers: Operating Systems, Queuing Systems, Libraries & App Kernels, TCP/IP & UDP, …
  – Networked Resources across Organisations: Computers, Clusters, Storage Systems, Data Sources, Scientific Instruments, …

Before the Grid
[Figure: a User Application reaches Site A and Site B directly over the Network.]
The user is responsible for resolving the complexities of the environment.
• independent sites
• independent hardware and software
• independent user ids
• security policy requiring local connection to the machine
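
The resource-identification steps listed under "How can the Grid help me?" (discover resources, check their current load, decide whether to move data or application, then allocate) can be sketched as a toy broker. Everything here is an assumption made for illustration: the `Resource` record, the site names and the in-memory `catalogue` stand in for a real information service and monitoring tools, and no actual Grid middleware API is used.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str        # hypothetical site name
    free_cpus: int   # as reported by a (simulated) information service
    load: float      # current load from (simulated) monitoring, 0.0 to 1.0
    has_app: bool    # is the application code already installed there?


def discover(catalogue, min_cpus):
    """Resource discovery: keep resources that can satisfy the request."""
    return [r for r in catalogue if r.free_cpus >= min_cpus]


def select(candidates):
    """Pick the least-loaded candidate; None if nothing qualifies."""
    return min(candidates, key=lambda r: r.load, default=None)


def strategy(data_mb, app_mb):
    """Decide what to move: ship whichever of data or application is smaller."""
    return "move application" if app_mb <= data_mb else "move data"


catalogue = [
    Resource("site-a", free_cpus=4, load=0.80, has_app=True),
    Resource("site-b", free_cpus=16, load=0.35, has_app=False),
    Resource("site-c", free_cpus=2, load=0.10, has_app=True),
]

chosen = select(discover(catalogue, min_cpus=4))
plan = strategy(data_mb=5000, app_mb=20)
# If the code is not installed on the chosen site, it must be staged first.
needs_staging = not chosen.has_app

print(chosen.name, plan, needs_staging)  # site-b move application True
```

site-c has the lowest load but too few free CPUs, so it is filtered out during discovery; the choice is then made between site-a and site-b on load alone. A real broker would also weigh cost, data location and access policy.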

First Step to the Grid: Metacenter
[Figure: a User Application reaches Site A and Site B over the Network through a Centralized Scheduler and file staging.]
A layer of abstraction is added that hides some of the complexities associated with running jobs in a distributed computing environment; however, limitations exist.
• Two or more resources connected in a controlled user environment
Constraints
• common architecture
• single name space
• common scheduler

The Grid Today
1. Request info from the grid
2. Get response
3. Make selection and submit job
[Figure: a User Application reaches Site A and Site B over the Network through Grid Middleware (Infrastructure APIs).]
The underlying infrastructure is abstracted into defined APIs, thereby simplifying developer and user access to resources; however, this layer is not intelligent.
Common Middleware
  – abstracts independent hardware, software, and user ids into a service layer with defined APIs
  – comprehensive security
  – allows for site autonomy
  – provides a common infrastructure based on middleware

The Near Future Grid
[Figure: a User Application reaches Site A and Site B over the Network through Intelligent, Customized Middleware layered on Grid Middleware (service-oriented Infrastructure APIs).]
Resources are accessed via various intelligent services that access infrastructure APIs.
The result: the scientist and application developer can focus on science and not on systems management.
Customizable Grid Services built on defined Infrastructure APIs
• automatic selection of resources
• information products tailored to users
• accountless processing
• flexible interface: web based, command line, APIs

How the User Sees a Grid
• A set of grid functions that are available as
  – Application programmer interfaces (APIs)
  – Command-line functions
• After authentication, functions can be used to
  – Spawn jobs on different processors with a single command
  – Access data on remote systems
  – Move data from one processor to another
  – Support the communication between programs executing on different processors
  – Discover the properties of computational resources available on the grid using the grid information service
  – Use a broker to select the best place for a job to run and then negotiate the reservation and execution (coming soon)
Tom Hinke

Many GRID Projects and Initiatives
• Public Forums
  – Computing Portals
  – Grid Forum
  – European Grid Forum
  – IEEE TFCC!
  – GRID'2000
  – and more.
• Australia
  – Nimrod/G
  – EcoGrid and GRACE
  – DISCWorld
• Europe
  – UNICORE
  – MOL
  – METODIS
  – Globe
  – Poznan Metacomputing
  – CERN Data Grid
  – MetaMPI
  – DAS
  – JaWS
  – and many more...
• Public Grid Initiatives
  – Distributed.net
  – SETI@Home
  – Compute Power Grid
• USA
  – Globus
  – Legion
  – JAVELIN
  – AppLeS
  – NASA IPG
  – Condor
  – Harness
  – NetSolve
  – NCSA Workbench
  – WebFlow
  – EveryWhere
  – and many more...
• Japan
  – Ninf
  – Bricks
  – and many more...
http://www.gridcomputing.com/

NetSolve
Client/Server/Agent-Based Computing
Easy-to-use tool to provide efficient and uniform access to a variety of scientific packages on UNIX platforms
• Client-Server design
• Network-enabled solvers
• Seamless access to resources
• Non-hierarchical system
• Load Balancing
• Fault Tolerance
• Interfaces to Fortran, C, Java, Matlab, more
[Figure labels: NetSolve Client, NetSolve Agent, Network Resources, Software Repository; arrows: request, choice, reply.]
Software is available: www.cs.utk.edu/netsolve/

Nimrod - A Job Management System
http://www.dgs.monash.edu.au/~davida/nimrod.html

Job processing with Nimrod

Nimrod/G Architecture
[Figure labels: three Nimrod/G Clients; Nimrod Engine; Schedule Advisor; Trading Manager (TM); Persistent Store; Dispatcher; Grid Explorer (GE); Middleware Services; Grid Information Services (GIS); RM & TS at each of three resources; GUSTO Test Bed.]
RM: Local Resource Manager, TS: Trade Server

Compute Power Market
[Figure labels: User; Application; Resource Broker; Job Control Agent; Schedule Advisor; Grid Explorer; Trade Manager; Deployment Agent; Grid Information Server; Trading; Trade Server; Charging Alg.; Accounting; Resource Reservation; Resource Allocation; Other services; R1, R2, …, Rn; A Resource Domain.]

Globus Toolkit
• Grid computing middleware
  – Software between the hardware and high-level services
  – Basic libraries, services, command-line programs
• Most common middleware used in grids
• Integrated with Web Services

Globus Software Architecture
• Grid SSH
  – login
  – execute commands
  – copy files
• Monitoring and Discovery Service (MDS)
  – information about resources and services
  – LDAP distributed directory service
• Grid FTP
  – get and put files
  – 3rd party copy
  – interactive file management
  – parallel transfers
• Globus Resource Allocation Manager (GRAM)
  – execute remote applications
  – stage executable, stdin, stdout, stderr
  – job management systems: PBS, LSF, fork/exec
• Grid Security Infrastructure (GSI)
  – X.509 certificates, SSL/TLS
  – credentials for users, services, hosts
  – authentication
  – secure communication
  – single sign on
  – delegation of credentials
  – authorization

Globus Deployment Architecture
[Figure: a User works from a Globus client system, or through a Web portal application/tool, using the GRAM, Grid SSH, Grid FTP and MDS clients. An MDS server system runs the MDS GIIS. Each Globus server system runs a GRAM Server, Grid SSH Server, Grid FTP Server and MDS GRIS; one uses PBS and the other LSF.]
Clients are programs and libraries.

For More Information
• Globus Project™
  – www.globus.org
• Grid Forum
  – www.gridforum.org
• Book (Morgan Kaufmann)
  – www.mkp.com/grids