Ibis: A Programming System for Real-World Distributed Computing
Henri Bal
bal@cs.vu.nl
Vrije Universiteit Amsterdam

Introduction
● Distributed systems continue to change
  ● Clusters, grids, clouds, mobile devices
● Distributed applications continue to change
  ● e-Science, web, pervasive applications
● Distributed programming continues to be notoriously difficult

Distributed Systems: 1980s
● Networks of Workstations (NOWs)
● Collections of Workstations (COWs)
● Processor pools
● Condor pools
● Clusters

Distributed Systems: 1990s
● Metacomputing (Smarr & Catlett, CACM)
● Flocking Condor (Epema)
● DAS (Distributed ASCI Supercomputer)
● Grid Blueprint (Foster & Kesselman)
● Desktop grids, SETI@home

Distributed Systems: 2000s
● Optical networks, light paths
● Sensor networks
● Distributed smart phones
● Cloud computing
(Figure label: IJKDIJK)

Clouds at Euro-Par 2009?
(Figure: View of Delft, Johannes Vermeer (1659))

Real-world distributed systems
(Figure: World wide testbed)

Problem
● How to write (high-performance) applications for real-world distributed systems?
● How to deal with:
  ● Performance: efficiency on wide-area system
  ● Heterogeneity: different systems & APIs
  ● Malleability: resources come and go
  ● Fault tolerance: crashes
  ● Connectivity: firewalls, NAT, etc.

Our approach
● Study fundamental underlying problems
● … hand-in-hand with realistic applications
● … integrate solutions in one system: Ibis
(Figure labels: User; Distributed Systems)

Outline rest of talk
● Distributed applications
● The Ibis distributed programming system
● Demo (movie)
● Distributed smart phones applications

Applications
● Scientific applications
  ● Imaging (VU Medical Center, AMOLF)
  ● Bioinformatics (sequence analysis)
  ● Astronomy (data analysis challenge)
● Multimedia content analysis
● Games and model checking
● Semantic web (distributed reasoning)

Multimedia content analysis
● Automatically extract information from images & video
  ● E.g., video archive, surveillance cameras
● Extract feature vectors from images
  ● Describe properties (color, shape)
  ● Data-parallel task on a cluster
● Compute on consecutive images
  ● Task-parallelism on a grid

Example: object recognition
● Analyze video stream from camera to learn and recognize every-day objects
● Representative for more serious applications
  ● Same algorithms used for surveillance cameras
  ● London Underground: >120.000 years of processing for >> 10.000’s CCTV cameras

Games and Model Checking
● Can solve entire Awari game on wide-area DAS-3 (889 B positions)
  ● Needs 10G private optical network [CCGrid’08]
● Distributed model checking has very similar communication pattern
  ● Search huge state spaces, random work distribution, bulk asynchronous transfers
  ● Can efficiently run DiVinE model checker on wide-area DAS-3, use up to 1 TB memory [IPDPS’09]

DAS-3

Required wide-area bandwidth

Distributed reasoning
● MaRVIN (Frank van Harmelen et al., VU):
  ● A distributed platform for massive RDF inferencing (deductive closure)
  ● “a brain the size of a planet”
  ● Uses Ibis to run on heterogeneous systems (clusters, desktop grids)
  ● Used for Billion Triple track of Semantic Web Challenge 2008
    ● Inputs 800M RDF triples, derives 29B triples

Awards
Astronomy, DACH 2008 – BS, DACH 2008 – FT (Cluster/Grid’08), SCALE 2008 (CCGrid’08), ISWC
2008, Multimedia Computing, AAAI-VC 2007, Semantic Web (van Harmelen et al.)

Outline rest of talk
● Distributed applications
● The Ibis distributed programming system
● Demo (movie)
● Distributed smart phones applications

Ibis Philosophy
● Real-world distributed applications should be developed and compiled on a local workstation, and simply be launched from there

Ibis Approach
● Virtual Machines (Java) deal with heterogeneity
● Provide range of programming abstractions
● Designed for dynamic/faulty environments
● Easy deployment through middleware-independent programming interfaces
● Modular and flexible: can replace Ibis components by external ones

Ibis Design
● Applications need functionality for
  ● Programming (as in programming languages)
  ● Deployment (as in operating systems)

Programming      Deployment
Logical          Practical
Likes math       Visual (GUI)

Ibis System

Ibis brains

Programming system

Programming models
● Message passing (IPL, RMI, MPJ)
● Satin:
  ● Fault-tolerant, malleable divide-and-conquer system [ACM TOPLAS 2009]
● Jorus:
  ● Transparent library with multimedia operations
● Maestro:
  ● Self-optimizing fault-tolerant dataflow framework [HPDC’09]
(Figure: fib(5) call tree, from fib(5) down to fib(1) and fib(0), with subtrees assigned to cpu 1, cpu 2 and cpu 3)

IPL (Ibis Portability Layer)
● Java-centric “run-anywhere” library
● Point-to-point, multicast, streaming
● Simple model for tracking resources
● Join-Elect-Leave
● Supports malleability & fault-tolerance

SmartSockets library
● Detects connectivity problems
● Tries to solve them automatically
  ● With as little help from the user as possible
● Integrates existing and several new solutions
  ● Reverse connection setup, STUN, TCP splicing, SSH tunneling, smart addressing, etc.
● Uses network of hubs as a side channel

SmartSockets

Ibis Deployment system

IbisDeploy GUI

JavaGAT
● GAT: Grid Application Toolkit
  ● Makes grid applications independent of the underlying grid infrastructure
● Used by applications to access grid services
  ● File copying, resource discovery, job submission & monitoring, user authentication
● Successor API is currently being standardized

Grid Applications with GAT
(Figure labels: Grid Application; File.copy(...); submitJob(...); GAT; Remote Files; Monitoring; Info service; Resource Management; GAT Engine; Intelligent dispatching; GridLab; Globus; Unicore; SSH; P2P; Local; gridftp; globus; Koala)

Zorilla: Java P2P supercomputing middleware

Ibis demo (movie)
(Figure labels: Object recognition; Client; Servers; Broker; Ibis (Java))
● Runs simultaneously on clusters (DAS-3, Japan, Australia), Desktop Grid, Amazon EC2 Cloud
● Connectivity problems solved automatically by Ibis SmartSockets

Ibis movie (part 1)

Performance on 1 DAS-3 cluster
● Relative speedups of Java/Ibis and C++/MPI
  ● Using TCP or Myricom’s MX protocol
● Sequential performance Java: 88% of C++

Speedup (wide-area)
● Homogeneous wide-area systems (DAS-3):
  ● Frame rate increases linearly with #clusters
● World-wide experiment:
  ● 24 frames per second (@ 640 x 480 resolution)
  ● Speed limited by camera, not computing infrastructure

Outline rest of talk
● Distributed applications
● The Ibis distributed programming system
● Demo (movie)
● Distributed smart phones applications

Smart Phones
● GSM + PC + GPS + camera + networks + …
● Will become ubiquitous (like GSMs)
● Our goal: study distributed applications running on (multiple) smart phones & other resources

Distributed smart phone applications
● Current model: client/server
  ● Client runs on the phone
  ● Server runs in a cloud provided by developer
● Disadvantages
  ● User depends on service provider
  ● Developer must deal with scalability, cost, etc.
(Figure labels: heavy weight app client; heavy weight server; market; cloud; smartphone)

Cyber Foraging
● “Dynamically augment the computing resources of a wireless mobile computer by exploiting wired hardware infrastructure (surrogates)”
  ● “Living off the land” [Satyanarayanan, IEEE Pers. Comm. 2001]
● Surrogates
  ● Any PC, cluster, grid, cloud …
  ● No pre-installed application code
  ● Can be used for different applications
● Requires deployment and communication systems → Ibis

Cyber Foraging with Ibis
● Implemented Ibis on Android
  ● Google’s open-source Java-based platform
● Ibis deployment system
  ● JavaGAT (SSH adaptor)
  ● IbisDeploy library + GUI
● Ibis programming system
  ● SmartSockets library
  ● IPL + Jorus multimedia library

Application: eyeDentify
● Object recognition on a G1 smartphone
● Smartphone is too limited for the application
● Can reduce accuracy parameters of the algorithm
● Can run only up to 128 x 96 pixels (memory bound)

eyeDentify with cyber foraging
● Ibis cyber foraging version [ISM’09]
  ● Deploys computation server (with high accuracy and large images) on a surrogate (DAS-3 cluster)
  ● Launched from IbisDeploy/eyeDentify client on phone

Comparison
● Response time for 64 x 48 pixels
  ● Standalone version: 32 sec
  ● Foraging: 0.54 sec (0.12 sec computation)
● Response time for 2048 x 1536 pixels
  ● Standalone: would take ~ 20 minutes with enough memory
  ● Foraging: 6.5 sec (4.9 sec computation)
● Foraging version is 40x more energy-efficient

Ibis movie (part 2)

Other distributed applications
● Disaster management (Katrina)
  ● Use ad-hoc Wifi network when GSM network fails
  ● Finding nearby people
    with certain skills
    ● Bus drivers, CPR
  ● Distributed decision support
    ● Moving people to shelters (logistics)
● Social networks
  ● Similar issues
  ● Find nearby friends, decide on restaurant

Another serious app
● Track position → automatic diary of your life
● Cross-comparisons between diaries
Haven’t we met before?
Yes, on 23 Oct 2010, 3.48 pm at N 52°22.688´ E 004°53.990´

Interdroid
(Figure labels: Novel Mobile Distributed Applications; Data Management; Distributed Communication; Context Sensitive Programming Models)

Summary
● Ibis provides integrated solutions for many hard problems
  ● performance, heterogeneity, malleability, fault tolerance, connectivity
● Used for many applications on real-world distributed systems
● Extends to the mobile world
● Download from http://www.cs.vu.nl/ibis/

Future work: DAS-4 (2010)
(Figure labels: Head node; GPU, Cell, FPGA nodes; Classic nodes; 10 Gb/s Ethernet switch; To local network; To SURFnet / other DAS sites; Tunable transponders to photonic network; SURFnet; By 10, 40 or 100 Gb/s lambdas)

Acknowledgements
Niels Drost
Ceriel Jacobs
Roelof Kemp
Timo van Kessel
Thilo Kielmann
Jason Maassen
Rob van Nieuwpoort
Nick Palmer
Kees van Reeuwijk
Frank J. Seinstra
Kees Verstoep
Gosia Wrzesinska

Questions?
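
Backup: divide-and-conquer sketch
● The fib(5) task tree shown for Satin under "Programming models" can be approximated on a single machine with plain Java fork/join.
● This is only a sketch under stated assumptions: it uses java.util.concurrent from the Java standard library, not the Satin API, and it does not cover Satin's wide-area execution, fault tolerance or malleability. The class name ForkJoinFib is made up for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative stand-in for Satin-style divide-and-conquer.
// Assumption: standard-library fork/join replaces Satin's spawn/sync.
public class ForkJoinFib extends RecursiveTask<Long> {
    private final int n;

    ForkJoinFib(int n) {
        this.n = n;
    }

    @Override
    protected Long compute() {
        if (n < 2) {
            return (long) n;               // leaves of the tree: fib(0), fib(1)
        }
        ForkJoinFib left = new ForkJoinFib(n - 1);
        left.fork();                       // "spawn": an idle worker may steal this subtree
        long right = new ForkJoinFib(n - 2).compute(); // other subtree runs in this thread
        return left.join() + right;        // "sync": wait for the forked child
    }

    // Hypothetical helper: runs the task tree on the JVM's common pool.
    static long fib(int n) {
        return ForkJoinPool.commonPool().invoke(new ForkJoinFib(n));
    }

    public static void main(String[] args) {
        System.out.println(fib(5));
    }
}
```

● Each fork() corresponds to one edge of the task tree; which worker ends up running which subtree is decided at run time by work stealing, much as the figure shows subtrees landing on different CPUs.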