The Ibis Project: Simplifying Grid Programming & Deployment
Henri Bal (bal@cs.vu.nl), Vrije Universiteit Amsterdam

The 'Promise of the Grid'
● Efficient and transparent (i.e., easy-to-use) wall-socket computing over a distributed set of resources [Sunderam ICCS'2004, based on Foster/Kesselman]

Parallel computing on grids
● Mostly limited to:
  ● trivially parallel applications (parameter sweeps, master/worker)
  ● applications that run on one cluster at a time, using the grid to schedule the application on a suitable cluster
● Our goal: run real parallel applications on a large-scale grid, using co-allocated resources

Efficient wide-area algorithms
● Latency-tolerant algorithms with asynchronous communication:
  ● search algorithms (Awari solver [CCGrid'08])
  ● model checkers (DiVinE [PDMC'08])
● Algorithms with hierarchical communication:
  ● divide-and-conquer
  ● broadcast trees
  ● …

Reality: 'Problems of the Grid'
● Performance & scalability
● Heterogeneity
● Low-level & changing programming interfaces
● Connectivity issues
● Fault tolerance
● Malleability
In short: writing & deploying grid applications is hard.

The Ibis Project
● Goal: drastically simplify grid programming and deployment: 'write and go!'

Approach (1)
● Write & go: minimal assumptions about the execution environment
● Virtual machines (Java) deal with heterogeneity
● Middleware-independent APIs, mapped automatically onto the available middleware
● Different programming abstractions:
  ● low-level message passing
  ● high-level divide-and-conquer

Approach (2)
● Designed to run in a dynamic/hostile grid environment:
  ● handles fault tolerance and malleability
  ● solves connectivity problems automatically (SmartSockets)
● Modular and flexible: Ibis components can be replaced by external ones
  ● Scheduling: Zorilla P2P system or an external broker

Global picture / rest of talk
[Figure: applications run on top of Satin (divide & conquer) and the communication layer (IPL), supported by SmartSockets, JavaGAT, and the Zorilla P2P system]

Outline
● Grid programming: IPL, Satin, SmartSockets
● Grid deployment: JavaGAT, Zorilla
● Applications and experiments

Ibis Design
[Architecture figure]

Ibis Portability Layer (IPL)
● Java-centric 'run-anywhere' communication library
● Shipped along with the application (jar files)
● Point-to-point, multicast, streaming, …
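To give a flavor of IPL programming, here is a minimal sketch (not part of the original slides) in which one node wins a JEL election and receives a single message from another node. It is modeled on the ibis.ipl 'hello world' style; the capability constants and method names are assumptions based on that style, not a definitive API reference.

    import ibis.ipl.*;

    public class HelloIbis {
        public static void main(String[] args) throws Exception {
            // Capabilities requested at startup (see 'Efficient communication' below).
            IbisCapabilities caps = new IbisCapabilities(
                    IbisCapabilities.ELECTIONS_STRICT);

            // Port type: reliable one-to-one communication with explicit receive.
            PortType type = new PortType(
                    PortType.COMMUNICATION_RELIABLE,
                    PortType.SERIALIZATION_OBJECT,
                    PortType.RECEIVE_EXPLICIT,
                    PortType.CONNECTION_ONE_TO_ONE);

            Ibis ibis = IbisFactory.createIbis(caps, null, type);

            // JEL election: exactly one node becomes the server.
            IbisIdentifier server = ibis.registry().elect("Server");

            if (server.equals(ibis.identifier())) {
                // Server side: accept a connection and read one message.
                ReceivePort rp = ibis.createReceivePort(type, "hello");
                rp.enableConnections();
                ReadMessage m = rp.receive();
                System.out.println(m.readString());
                m.finish();
                rp.close();
            } else {
                // Client side: connect to the elected server and send one message.
                SendPort sp = ibis.createSendPort(type);
                sp.connect(server, "hello");
                WriteMessage m = sp.newMessage();
                m.writeString("hello from " + ibis.identifier());
                m.finish();
                sp.close();
            }
            ibis.end();
        }
    }

Note how the application only pays for the capabilities it asks for (reliability, explicit receive, one-to-one connections); this is the startup-time configuration the next slide refers to.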
Efficient communication
● Configured at startup, based on capabilities (multicast, ordering, reliability, callbacks)
● A bytecode rewriter avoids serialization overhead

Serialization
● Based on bytecode rewriting:
  ● adds (de)serialization code to serializable types
  ● prevents reflection overhead at runtime
[Figure: source → Java compiler → bytecode → bytecode rewriter → rewritten bytecode, loaded by the JVMs]

Membership Model
● JEL (Join-Elect-Leave) model
● Simple model for tracking resources; supports malleability & fault tolerance:
  ● notifications of nodes joining or leaving
  ● elections
● Supports all common programming models
● Centralized and distributed implementations (broadcast trees, gossiping)

Programming models
● Remote Method Invocation (RMI)
● Group Method Invocation (GMI)
● MPJ (MPI Java 'standard')
● Satin (divide & conquer)

Satin: divide-and-conquer
● Divide-and-conquer is inherently hierarchical
● More general than master/worker
● Cilk-like primitives (spawn/sync) in Java
● Supports malleability and fault tolerance
● Supports data sharing between different branches through Shared Objects
[Figure: the fib(5) spawn tree, with its subtrees executed on different CPUs]
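A minimal sketch of the classic Fibonacci example in Satin style (not taken from the slides): methods declared in an interface extending Spawnable are turned into spawns by the bytecode rewriter, and sync() blocks until the spawned results are available. The class and interface names follow Satin tutorial conventions and should be read as illustrative.

    import ibis.satin.SatinObject;
    import ibis.satin.Spawnable;

    // Methods declared here are rewritten into spawns by the bytecode rewriter.
    interface FibInter extends Spawnable {
        long fib(long n);
    }

    public class Fib extends SatinObject implements FibInter {
        public long fib(long n) {
            if (n < 2) return n;
            long x = fib(n - 1); // spawned: may be stolen by another node
            long y = fib(n - 2); // spawned
            sync();              // wait for the spawned children before using x and y
            return x + y;
        }

        public static void main(String[] args) {
            Fib f = new Fib();
            long result = f.fib(Long.parseLong(args[0]));
            f.sync();
            System.out.println("fib = " + result);
        }
    }

The same pattern (spawn via the interface, then sync) underlies the gene sequence and Barnes-Hut applications shown in the experiments later in the talk.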
Satin implementation
● Load balancing is done automatically:
  ● Cluster-aware Random Stealing (CRS)
  ● combines Cilk's random stealing with asynchronous wide-area steals
● Self-adaptive malleability and fault tolerance:
  ● add/remove machines on the fly
  ● survives crashes through efficient recomputation/checkpointing

Self-adaptation with Satin
● Adapt the number of CPUs to the level of parallelism
● Migrate work from overloaded to idle CPUs
● Remove CPUs with poor network connectivity
● Add CPUs dynamically when:
  ● the level of parallelism increases
  ● CPUs were removed or crashed
● Can also remove/add entire clusters, e.g. in case of network problems
[Wrzesinska et al., PPoPP'07]

Approach
● Weighted Average Efficiency (WAE):
  WAE = (1 / #CPUs) × Σi speedi × (1 − overheadi)
  where overheadi is the fraction of idle + communication time of CPUi, and speedi is the relative speed of CPUi (measured periodically)
● General idea: keep the WAE between Emin (30%) and Emax (50%)

Overloaded network link
● Uplink of one cluster reduced to 100 KB/s
● The badly connected cluster is removed and a new one is added
[Figure: iteration duration per iteration]

Connectivity Problems
● Firewalls & Network Address Translation (NAT) restrict incoming traffic
● Addressing problems:
  ● machines with more than one network interface (IP address)
  ● machines on a private network (e.g., behind NAT)
● No direct communication allowed, e.g. between compute nodes and the external world

SmartSockets library
● Detects connectivity problems
● Tries to solve them automatically, with as little help from the user as possible
● Integrates existing and several new solutions:
  ● reverse connection setup, STUN, TCP splicing, SSH tunneling, smart addressing, etc.
● Uses a network of hubs as a side channel
[Example figures: Maassen et al., HPDC'07]

Overview: grid deployment
● JavaGAT
● Zorilla P2P

JavaGAT
● GAT: Grid Application Toolkit
● Makes grid applications independent of the underlying grid infrastructure
● Used by applications to access grid services:
  ● file copying, resource discovery, job submission & monitoring, user authentication
● The API is currently being standardized (SAGA)
  ● SAGA is implemented on top of JavaGAT

Grid Applications with GAT
[Figure: a grid application calls the GAT API (File.copy(...), submitJob(...)); the GAT engine intelligently dispatches these calls to adaptors (GridLab, Globus/gridftp, Unicore, SSH, P2P, local) that provide remote files, monitoring, information services, and resource management]
[van Nieuwpoort et al., SC'07]
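A hedged sketch of what the File.copy(...) and submitJob(...) calls from the figure look like in application code. The org.gridlab.gat class names follow the JavaGAT distribution, but exact signatures may differ per version, and the host names and paths are hypothetical placeholders.

    import org.gridlab.gat.GAT;
    import org.gridlab.gat.GATContext;
    import org.gridlab.gat.URI;
    import org.gridlab.gat.io.File;
    import org.gridlab.gat.resources.Job;
    import org.gridlab.gat.resources.JobDescription;
    import org.gridlab.gat.resources.ResourceBroker;
    import org.gridlab.gat.resources.SoftwareDescription;

    public class GatExample {
        public static void main(String[] args) throws Exception {
            GATContext context = new GATContext();

            // File copy: the "any://" scheme lets the engine pick a suitable
            // adaptor (gridftp, ssh, local, ...). Host and paths are placeholders.
            File file = GAT.createFile(context,
                    new URI("any://example.host.org/data/input.dat"));
            file.copy(new URI("file:///tmp/input.dat"));

            // Job submission through whatever middleware is available.
            SoftwareDescription sd = new SoftwareDescription();
            sd.setExecutable("/bin/hostname");
            JobDescription jd = new JobDescription(sd);
            ResourceBroker broker = GAT.createResourceBroker(context,
                    new URI("any://example.host.org"));
            Job job = broker.submitJob(jd);
            System.out.println("submitted: " + job);

            GAT.end();
        }
    }

Because the URIs use the 'any' scheme, the GAT engine is free to dispatch each call to whichever adaptor works, which is exactly the intelligent dispatching shown in the figure above.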
Zorilla: Java P2P supercomputing middleware

Zorilla components
● Job management:
  ● handles malleability and crashes
● Robust random gossiping:
  ● periodic information exchange between nodes
  ● robust against firewalls, NATs, and failing nodes
● Clustering: nearest neighbour
● Flood scheduling:
  ● incrementally search for resources at more and more distant nodes
[Drost et al., HPDC'07]

Ibis applications
● e-Science (VL-e):
  ● brain MEG imaging
  ● mass spectroscopy
  ● multimedia content analysis
● Various parallel applications:
  ● SAT solver, N-body, grammar learning, …
● Other programming systems:
  ● workflow engine for astronomy (D-Grid), grid file system, ProActive, Jylab, …

Overview of experiments
● DAS-3: the Dutch Computer Science grid
● Satin applications on DAS-3
● Zorilla desktop grid experiment
● Multimedia content analysis
● High-resolution video processing

DAS-3
● 272 nodes (AMD Opterons), 792 cores, 1 TB memory in total
● LAN: Myrinet 10G and Gigabit Ethernet (the Delft cluster has no Myrinet)
● WAN (StarPlane): 20-40 Gb/s optical private network (OPN)
● Heterogeneous: 2.2-2.6 GHz, single/dual-core

Gene sequence comparison in Satin (on DAS-3)
[Figures: speedup on 1 cluster; run times on 5 clusters]
● Divide & conquer scales much better than master/worker
● 78% efficiency on 5 clusters (with 1462 WAN messages/sec)

Barnes-Hut (Satin) on DAS-3
[Figures: speedup on 1 cluster; run times on 5 clusters]
● The Shared Objects extension to the D&C model improves scalability
● 57% efficiency on 5 clusters (with 1371 WAN messages/sec)

Zorilla Desktop Grid Experiment
● Small experimental desktop grid setup:
  ● student PCs running Zorilla overnight
  ● PCs with 1 CPU, 1 GB memory, 1 Gb/s Ethernet
● Experiment: gene sequence application on
  ● 16 cores of DAS-3 with Globus
  ● a 16-core desktop grid with Zorilla
  ● the combination, using Ibis-Deploy

Ibis-Deploy deployment tool
[Figure: run times of 877 sec, 3574 sec, and 1099 sec for the three setups above]
● Easy deployment with Zorilla, JavaGAT & Ibis-Deploy

Multimedia content analysis (MMCA)
● Analyzes video streams to recognize objects
● Extracts feature vectors from images:
  ● describe properties (color, shape)
  ● data-parallel task, implemented in C++/MPI
● Computes on consecutive images:
  ● task parallelism on a grid

MMCA application
[Figure: a Java client on a local desktop machine connects through Ibis to a Java broker (on any machine worldwide) and to parallel Horus servers (C++) on the grid]

MMCA with Ibis
● The initial implementation with TCP was unstable
● Ibis simplifies communication and fault tolerance
● SmartSockets solves the connectivity problems
● Clickable deployment interface
● Demonstrated at many conferences (SC'07):
  ● 20 clusters on 3 continents, 500-800 cores
  ● frame rate increased from 1/30 to 15 frames/sec
[Seinstra et al., IEEE Multimedia'07]
MMCA received the 'Most Visionary Research' award at AAAI 2007 (Frank Seinstra et al.)

High Resolution Video Processing
● Real-time processing of CineGrid movie data:
  ● 3840x2160 (4x HD) @ 30 fps = 1424 MB/sec
● Multi-cluster processing pipeline:
  ● using DAS-3, StarPlane and Ibis

CineGrid with Ibis
● Using StarPlane requires no configuration:
  ● StarPlane is connected to the local Myrinet network
  ● detected & used automatically by SmartSockets
● Easy setup of the application pipeline:
  ● the connection administration of the application is simplified by the IPL election mechanism
● Simple multi-cluster deployment (Ibis-Deploy)
● Uses Ibis serialization for high throughput

Summary
● Goal: simplify grid programming and deployment
● Key ideas in Ibis:
  ● virtual machines (JVM) deal with heterogeneity
  ● high-level programming abstractions (Satin)
  ● fault tolerance, malleability, and connectivity problems are handled automatically (Satin, SmartSockets)
  ● middleware-independent APIs (JavaGAT)
  ● modular design

Acknowledgements
● Current and past members: Rob van Nieuwpoort, Jason Maassen, Thilo Kielmann, Frank Seinstra, Niels Drost, Ceriel Jacobs, Kees Verstoep, John Romein, Gosia Wrzesinska, Rutger Hofman, Maik Nijhuis, Olivier Aumage, Fabrice Huet, Alexandre Denis, Roelof Kemp, Kees van Reeuwijk

More information
● Ibis can be downloaded from http://www.cs.vu.nl/ibis
● Papers: Satin [PPoPP'07], SmartSockets [HPDC'07], Gossiping [HPDC'07], JavaGAT [SC'07], MMCA [IEEE Multimedia'07]
● Ibis tutorials: the next one is at CCGrid 2008 (19 May, Lyon)