Ibis: A Programming System for
Real-World Distributed Computing
Henri Bal
bal@cs.vu.nl
Vrije Universiteit Amsterdam
Introduction
● Distributed systems continue to change
  ● Clusters, grids, clouds, mobile devices
● Distributed applications continue to change
  ● e-Science, web, pervasive applications
● Distributed programming continues to be notoriously difficult
Distributed Systems: 1980s
● Networks of Workstations (NOWs)
● Collections of Workstations (COWs)
● Processor pools
● Condor pools
● Clusters
Distributed Systems: 1990s
● Metacomputing (Smarr & Catlett, CACM)
● Flocking Condor (Epema)
● DAS (Distributed ASCI Supercomputer)
● Grid Blueprint (Foster & Kesselman)
● Desktop grids, SETI@home
Distributed Systems: 2000s
● Optical networks, light paths
● Sensor networks
● Distributed smart phones
● Cloud computing
Clouds at Euro-Par 2009?
View of Delft, Johannes Vermeer (1659)
Real-world distributed systems
Worldwide testbed
Problem
● How to write (high-performance) applications for real-world distributed systems?
● How to deal with:
  ● Performance: efficiency on wide-area systems
  ● Heterogeneity: different systems & APIs
  ● Malleability: resources come and go
  ● Fault tolerance: crashes
  ● Connectivity: firewalls, NAT, etc.
Our approach
● Study fundamental underlying problems
● … hand-in-hand with realistic applications
● … integrate solutions in one system: Ibis
[Figure: User and Distributed Systems layers]
Outline rest of talk
● Distributed applications
● The Ibis distributed programming system
● Demo (movie)
● Distributed smart phone applications
Applications
● Scientific applications
  ● Imaging (VU Medical Center, AMOLF)
  ● Bioinformatics (sequence analysis)
  ● Astronomy (data analysis challenge)
● Multimedia content analysis
● Games and model checking
● Semantic web (distributed reasoning)
Multimedia content analysis
● Automatically extract information from images & video
  ● E.g., video archive, surveillance cameras
● Extract feature vectors from images
  ● Describe properties (color, shape)
  ● Data-parallel task on a cluster
● Compute on consecutive images
  ● Task-parallelism on a grid
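The task-parallel level of this pipeline can be sketched as follows. Each consecutive frame is turned into a feature vector independently, so the frames can be handed out as separate tasks. This is a minimal single-machine sketch under stated assumptions: the feature is a coarse color histogram over packed 0xRRGGBB pixels, and `FeatureExtraction`, `colorHistogram`, `extractAll`, and `BINS` are hypothetical names, not the Jorus library API. A parallel stream stands in for distributing frames over a grid. Splitting each frame's pixels across a cluster, the data-parallel level, is not shown.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: a coarse color histogram as a per-frame feature vector,
// with consecutive frames processed as independent parallel tasks.
class FeatureExtraction {
    // Bins per color channel (an illustrative choice, not taken from the talk).
    static final int BINS = 4;

    // Normalized BINS^3 color histogram of packed 0xRRGGBB pixels.
    static double[] colorHistogram(int[] pixels) {
        double[] hist = new double[BINS * BINS * BINS];
        if (pixels.length == 0) {
            return hist;
        }
        for (int p : pixels) {
            int r = (p >> 16) & 0xFF;
            int g = (p >> 8) & 0xFF;
            int b = p & 0xFF;
            int bin = (r * BINS / 256) * BINS * BINS + (g * BINS / 256) * BINS + (b * BINS / 256);
            hist[bin]++;
        }
        for (int i = 0; i < hist.length; i++) {
            hist[i] /= pixels.length;
        }
        return hist;
    }

    // Frames are independent, so each one can be a separate task;
    // a parallel stream spreads them over the cores of one machine.
    static List<double[]> extractAll(List<int[]> frames) {
        return frames.parallelStream()
                     .map(FeatureExtraction::colorHistogram)
                     .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] redFrame = new int[16];
        Arrays.fill(redFrame, 0xFF0000);
        int[] blueFrame = new int[16];
        Arrays.fill(blueFrame, 0x0000FF);
        List<double[]> features = extractAll(List.of(redFrame, blueFrame));
        System.out.println(features.get(0)[48] + " " + features.get(1)[3]); // prints 1.0 1.0
    }
}
```

Collecting from an ordered list keeps the feature vectors in frame order, even though the frames themselves are processed concurrently.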
Example: object recognition
● Analyze video stream from camera to learn and recognize everyday objects
● Representative of more serious applications
  ● Same algorithms used for surveillance cameras
  ● London Underground: >120,000 years of processing for >> 10,000s of CCTV cameras
Games and Model Checking
● Can solve entire Awari game on wide-area DAS-3 (889 billion positions)
  ● Needs 10G private optical network [CCGrid’08]
● Distributed model checking has very similar communication pattern
  ● Search huge state spaces, random work distribution, bulk asynchronous transfers
● Can efficiently run DiVinE model checker on wide-area DAS-3, use up to 1 TB memory [IPDPS’09]
[Figure: required wide-area bandwidth on DAS-3]
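The "random work distribution, bulk asynchronous transfers" pattern can be sketched under stated assumptions. Each state is assigned to an owner node by hashing it, which spreads states pseudo-randomly. Newly generated states are buffered per destination and sent as one batch once the buffer fills. `StateDistributor`, `owner`, `route`, `flush`, and the batch size are hypothetical names. The network send is replaced by a counter so the sketch runs in one process.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of hash-based ("random") work distribution with batched sends.
class StateDistributor {
    private final int numNodes;
    private final int batchSize;
    private final List<List<Long>> outBuffers = new ArrayList<>();
    // Stand-in for the network: number of batches "sent" to each node.
    final int[] batchesSent;

    StateDistributor(int numNodes, int batchSize) {
        this.numNodes = numNodes;
        this.batchSize = batchSize;
        this.batchesSent = new int[numNodes];
        for (int i = 0; i < numNodes; i++) {
            outBuffers.add(new ArrayList<>());
        }
    }

    // A hash of the state decides which node owns (stores and expands) it.
    int owner(long state) {
        return Math.floorMod(Long.hashCode(state * 0x9E3779B97F4A7C15L), numNodes);
    }

    // Buffer a newly generated state; ship the whole buffer once it is full.
    void route(long state) {
        int dest = owner(state);
        List<Long> buf = outBuffers.get(dest);
        buf.add(state);
        if (buf.size() >= batchSize) {
            flush(dest);
        }
    }

    // A real system would hand the batch to an asynchronous send; here we only count it.
    void flush(int dest) {
        List<Long> buf = outBuffers.get(dest);
        if (buf.isEmpty()) {
            return;
        }
        batchesSent[dest]++;
        buf.clear();
    }

    // Send any partially filled buffers, e.g. at the end of a search level.
    void flushAll() {
        for (int d = 0; d < numNodes; d++) {
            flush(d);
        }
    }
}
```

Batching trades a little latency for far fewer, larger messages. That matters on wide-area links, where per-message cost is high.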
Distributed reasoning
● MaRVIN (Frank van Harmelen et al., VU):
  ● A distributed platform for massive RDF inferencing (deductive closure)
  ● “a brain the size of a planet”
  ● Uses Ibis to run on heterogeneous systems (clusters, desktop grids)
  ● Used for Billion Triple track of Semantic Web Challenge 2008
● Inputs 800M RDF triples, derives 29B triples
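To show what "deductive closure" means, here is a deliberately naive, single-machine sketch. It applies two RDFS entailment rules repeatedly until no new triples appear: rdfs9 (type inheritance through subClassOf) and rdfs11 (subClassOf transitivity). This is not MaRVIN's distributed algorithm. The class name `RdfsClosure` and the representation of a triple as a three-element `List` are assumptions made for brevity.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch: naive forward chaining to a fixpoint over two RDFS rules.
// A triple is List.of(subject, predicate, object).
class RdfsClosure {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";

    static Set<List<String>> closure(Set<List<String>> input) {
        Set<List<String>> all = new HashSet<>(input);
        boolean changed = true;
        while (changed) {
            Set<List<String>> derived = new HashSet<>();
            for (List<String> t1 : all) {
                for (List<String> t2 : all) {
                    // rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
                    if (t1.get(1).equals(SUBCLASS) && t2.get(1).equals(SUBCLASS)
                            && t1.get(2).equals(t2.get(0))) {
                        derived.add(List.of(t1.get(0), SUBCLASS, t2.get(2)));
                    }
                    // rdfs9: (x type A), (A subClassOf B) => (x type B)
                    if (t1.get(1).equals(TYPE) && t2.get(1).equals(SUBCLASS)
                            && t1.get(2).equals(t2.get(0))) {
                        derived.add(List.of(t1.get(0), TYPE, t2.get(2)));
                    }
                }
            }
            // addAll returns true only if at least one derived triple was new.
            changed = all.addAll(derived);
        }
        return all;
    }
}
```

The nested loop is quadratic in the number of triples per round. This is exactly why closures over hundreds of millions of triples need a distributed, smarter approach.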
Awards
● Astronomy: DACH 2008 – BS, DACH 2008 – FT (Cluster/Grid’08)
● SCALE 2008 (CCGrid’08)
● Semantic Web (van Harmelen et al.): ISWC 2008
● Multimedia Computing: AAAI-VC 2007
Outline rest of talk
● Distributed applications
● The Ibis distributed programming system
● Demo (movie)
● Distributed smart phone applications
Ibis Philosophy
● Real-world distributed applications should be developed and compiled on a local workstation, and simply be launched from there
Ibis Approach
● Virtual machines (Java) deal with heterogeneity
● Provides a range of programming abstractions
● Designed for dynamic/faulty environments
● Easy deployment through middleware-independent programming interfaces
● Modular and flexible: Ibis components can be replaced with external ones
Ibis Design
● Applications need functionality for
  ● Programming (as in programming languages)
  ● Deployment (as in operating systems)
Programming: logical, likes math
Deployment: practical, visual (GUI)
Ibis System
Ibis brains
Programming system
Programming models
● Message passing (IPL, RMI, MPJ)
● Satin:
  ● Fault-tolerant, malleable divide-and-conquer system [ACM TOPLAS 2009]
  [Figure: recursive computation tree of fib(5), with subtrees spawned on cpu 1, cpu 2, and cpu 3]
● Jorus:
  ● Transparent library with multimedia operations
● Maestro:
  ● Self-optimizing fault-tolerant dataflow framework [HPDC’09]
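The divide-and-conquer style that Satin supports, pictured here as a fib(5) tree spread over several CPUs, can be approximated on a single JVM with the standard fork/join framework. This is a stand-in, not Satin's API. `fork()` plays the role of spawning a subproblem that an idle worker may steal, and `join()` waits for its result. Distribution across machines, fault tolerance, and malleability are not modeled.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Divide-and-conquer fib with fork/join: fork() "spawns" a subproblem,
// join() waits for the spawned result.
class Fib extends RecursiveTask<Long> {
    private final int n;

    Fib(int n) {
        this.n = n;
    }

    @Override
    protected Long compute() {
        if (n < 2) {
            return (long) n;                       // leaves: fib(0) = 0, fib(1) = 1
        }
        Fib left = new Fib(n - 1);
        left.fork();                               // may be stolen by another worker thread
        long right = new Fib(n - 2).compute();     // solve the other half locally
        return left.join() + right;                // wait for the spawned half
    }

    static long fib(int n) {
        return ForkJoinPool.commonPool().invoke(new Fib(n));
    }

    public static void main(String[] args) {
        System.out.println(fib(5)); // prints 5
    }
}
```

A practical version would switch to a sequential loop below some threshold, because spawning a task per tree node costs far more than the addition it performs.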
IPL (Ibis Portability Layer)
● Java-centric “run-anywhere” library
● Point-to-point, multicast, streaming
● Simple model for tracking resources
  ● Join-Elect-Leave
  ● Supports malleability & fault tolerance
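The Join-Elect-Leave idea can be illustrated with a small in-process sketch, assuming string member identifiers. Members join. One member wins each named election. A member that leaves, or is detected as crashed, is removed and loses any election it won, so a new winner can be chosen. `Membership` and its methods are hypothetical names, not the IPL registry API. A real implementation is distributed and must also detect failures.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical single-process sketch of a Join-Elect-Leave membership model.
class Membership {
    private final Set<String> members = new LinkedHashSet<>();
    private final Map<String, String> elections = new HashMap<>();
    // Join/leave notifications an application could react to (malleability).
    private final List<String> events = new ArrayList<>();

    synchronized void join(String id) {
        if (members.add(id)) {
            events.add("joined " + id);
        }
    }

    // The first live candidate to ask wins; later candidates learn the same winner.
    synchronized String elect(String election, String candidate) {
        if (!members.contains(candidate)) {
            throw new IllegalStateException(candidate + " has not joined");
        }
        return elections.computeIfAbsent(election, e -> candidate);
    }

    // Leaving (or crashing) removes the member and voids any election it had won.
    synchronized void leave(String id) {
        if (members.remove(id)) {
            events.add("left " + id);
            elections.values().removeIf(id::equals);
        }
    }

    synchronized Set<String> members() {
        return new LinkedHashSet<>(members);
    }

    synchronized List<String> events() {
        return new ArrayList<>(events);
    }
}
```

For example, if "a" wins the "master" election and then leaves, the next candidate to call `elect("master", ...)` becomes the new master. This is the kind of recovery that fault tolerance relies on.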
SmartSockets library
● Detects connectivity problems
● Tries to solve them automatically
  ● With as little help from the user as possible
● Integrates existing and several new solutions
  ● Reverse connection setup, STUN, TCP splicing, SSH tunneling, smart addressing, etc.
● Uses network of hubs as a side channel
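Trying several connection techniques automatically can be pictured as an ordered fallback. Each strategy is attempted in turn, and the first one that reaches the target is used. This is a hypothetical sketch. `ConnectionSetup`, `Strategy`, `connect`, the strategy names, and their order are illustrative, not the SmartSockets API or its actual selection logic. Network conditions are simulated with predicates so the example runs offline.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical sketch of "try each connection method until one works".
class ConnectionSetup {
    // One way of reaching a machine (direct TCP, reverse setup, splicing, hub relay, ...).
    static final class Strategy {
        final String name;
        final Predicate<String> attempt;   // returns true if the connection succeeds

        Strategy(String name, Predicate<String> attempt) {
            this.name = name;
            this.attempt = attempt;
        }
    }

    // Tries the strategies in order and reports which one succeeded, if any.
    static Optional<String> connect(String target, List<Strategy> strategies) {
        for (Strategy s : strategies) {
            if (s.attempt.test(target)) {
                return Optional.of(s.name);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Made-up environment in which only the hub-relayed route gets through.
        List<Strategy> order = List.of(
            new Strategy("direct", t -> false),
            new Strategy("reverse", t -> false),
            new Strategy("splice", t -> false),
            new Strategy("hub-relay", t -> true));
        System.out.println(connect("node42", order).orElse("unreachable")); // prints hub-relay
    }
}
```

Putting cheap, fast methods first and relaying through hubs last keeps the common case efficient while still reaching machines behind firewalls or NAT.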
SmartSockets
Ibis Deployment system
IbisDeploy GUI
JavaGAT
● GAT: Grid Application Toolkit
  ● Used by applications to access grid services
    ● File copying, resource discovery, job submission & monitoring, user authentication
● Makes grid applications independent of the underlying grid infrastructure
● Successor API is currently being standardized
Grid Applications with GAT
[Figure: a grid application calls GAT (e.g., File.copy(...), submitJob(...)) for remote files, monitoring, info, and resource management services; the GAT engine performs intelligent dispatching to adaptors such as GridLab, Globus, Unicore, SSH, P2P, local, gridftp, and Koala]
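The "intelligent dispatching" in the GAT engine can be pictured as forwarding one middleware-independent call to adaptors in turn until one succeeds. This is a hypothetical sketch. `GatEngineSketch`, `FileAdaptor`, and `simulated` are made-up names, not the JavaGAT API, and JavaGAT's real adaptor selection may differ. Adaptors are simulated so the example needs no grid middleware.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of adaptor-based dispatching behind a middleware-independent call.
class GatEngineSketch {
    interface FileAdaptor {
        String name();

        // Throws if this middleware cannot perform the copy.
        void copy(String source, String target) throws Exception;
    }

    private final List<FileAdaptor> adaptors;

    GatEngineSketch(List<FileAdaptor> adaptors) {
        this.adaptors = adaptors;
    }

    // Tries adaptors in order; returns the name of the one that performed the copy.
    String copy(String source, String target) {
        List<String> failures = new ArrayList<>();
        for (FileAdaptor a : adaptors) {
            try {
                a.copy(source, target);
                return a.name();
            } catch (Exception e) {
                failures.add(a.name() + ": " + e.getMessage());
            }
        }
        throw new IllegalStateException("no adaptor could copy " + source + " " + failures);
    }

    // Simulated adaptor that either always works or always fails.
    static FileAdaptor simulated(String name, boolean works) {
        return new FileAdaptor() {
            public String name() {
                return name;
            }

            public void copy(String source, String target) throws Exception {
                if (!works) {
                    throw new Exception("not available");
                }
            }
        };
    }
}
```

The application only ever calls `copy`. Which middleware actually moves the file is decided at run time, which is what makes the application independent of the grid infrastructure.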
Zorilla: Java P2P supercomputing middleware
Ibis demo (movie)
Object recognition
[Figure: client, broker, and servers running Ibis (Java)]
● Runs simultaneously on clusters (DAS-3, Japan, Australia), a desktop grid, and the Amazon EC2 cloud
● Connectivity problems solved automatically by Ibis SmartSockets
Ibis movie (part 1)
Performance on 1 DAS-3 cluster
● Relative speedups of Java/Ibis and C++/MPI
  ● Using TCP or Myricom’s MX protocol
● Sequential performance Java: 88% of C++
Speedup (wide-area)
● Homogeneous wide-area systems (DAS-3):
  ● Frame rate increases linearly with the number of clusters
● World-wide experiment:
  ● 24 frames per second (@ 640 x 480 resolution)
  ● Speed limited by camera, not computing infrastructure
Outline rest of talk
● Distributed applications
● The Ibis distributed programming system
● Demo (movie)
● Distributed smart phone applications
Smart Phones
● GSM + PC + GPS + camera + networks + …
● Will become ubiquitous (like ordinary GSM phones)
● Our goal: study distributed applications running on (multiple) smart phones & other resources
Distributed smart phone applications
● Current model: client/server
  ● Client runs on the phone
  ● Server runs in a cloud provided by the developer
● Disadvantages
  ● User depends on service provider
  ● Developer must deal with scalability, cost, etc.
[Figure: heavyweight app client on the smartphone, heavyweight server in the cloud, app market]
Cyber Foraging
● “Dynamically augment the computing resources of a wireless mobile computer by exploiting wired hardware infrastructure (surrogates)”
  ● “Living off the land” [Satyanarayanan, IEEE Pers. Comm. 2001]
● Surrogates
  ● Any PC, cluster, grid, cloud …
  ● No pre-installed application code
  ● Can be used for different applications
● Requires deployment and communication systems → Ibis
Cyber Foraging with Ibis
● Implemented Ibis on Android
  ● Google’s open-source Java-based platform
● Ibis deployment system
  ● JavaGAT (SSH adaptor)
  ● IbisDeploy library + GUI
● Ibis programming system
  ● SmartSockets library
  ● IPL + Jorus multimedia library
Application: eyeDentify
● Object recognition on a G1 smartphone
● Smartphone is too limited for the application
  ● Can reduce accuracy parameters of the algorithm
  ● Can run only up to 128 x 96 pixels (memory bound)
eyeDentify with cyber foraging
● Ibis cyber foraging version [ISM’09]
  ● Deploys computation server (with high accuracy and large images) on a surrogate (DAS-3 cluster)
  ● Launched from IbisDeploy/eyeDentify client on phone
Comparison
● Response time for 64 x 48 pixels
  ● Standalone version: 32 sec
  ● Foraging: 0.54 sec (0.12 sec computation)
● Response time for 2048 x 1536 pixels
  ● Standalone: would take ~20 minutes with enough memory
  ● Foraging: 6.5 sec (4.9 sec computation)
● Foraging version is 40x more energy-efficient
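The numbers above imply a simple offloading rule. Offload when remote computation plus communication and setup overhead is shorter than running locally. The rule and the names `OffloadDecision`, `shouldOffload`, and `speedup` are assumptions for illustration; eyeDentify may decide differently. For 64 x 48 pixels, 0.54 sec of foraging includes 0.12 sec of computation, leaving about 0.42 sec of overhead, which still beats 32 sec standalone by roughly 59x. Taking "~20 minutes" as 1200 sec, the 2048 x 1536 case gives roughly 185x.

```java
// Hypothetical sketch of the offloading decision behind cyber foraging.
class OffloadDecision {
    // Offload when remote computation plus communication/setup overhead beats local execution.
    static boolean shouldOffload(double localSec, double remoteComputeSec, double overheadSec) {
        return remoteComputeSec + overheadSec < localSec;
    }

    static double speedup(double localSec, double remoteTotalSec) {
        return localSec / remoteTotalSec;
    }

    public static void main(String[] args) {
        // 64 x 48 measurements: 32 s standalone; 0.54 s foraging, of which 0.12 s is computation.
        double overhead = 0.54 - 0.12;
        System.out.println(shouldOffload(32.0, 0.12, overhead));   // prints true
        System.out.printf("speedup %.0fx%n", speedup(32.0, 0.54)); // prints speedup 59x
    }
}
```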
Ibis movie (part 2)
Other distributed applications
● Disaster management (e.g., Hurricane Katrina)
  ● Use ad-hoc Wi-Fi network when GSM network fails
  ● Finding nearby people with certain skills
    ● Bus drivers, CPR
  ● Distributed decision support
    ● Moving people to shelters (logistics)
● Social networks
  ● Similar issues
  ● Find nearby friends, decide on restaurant
Another serious app
● Track position → automatic diary of your life
● Cross-comparisons between diaries
“Haven’t we met before?”
“Yes, on 23 Oct 2010, 3.48 pm at N 52°22.688´ E 004°53.990´”
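Cross-comparing two position diaries amounts to finding entries that are close in both time and space. This is a hypothetical sketch. `DiaryMatcher`, `Fix`, and `encounters` are made-up names. Distance uses the haversine formula on a spherical Earth with mean radius 6,371 km. The time and distance thresholds are caller-chosen assumptions. It also converts the degrees-and-decimal-minutes notation from the slide into decimal degrees.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: find moments when two position diaries place their owners together.
class DiaryMatcher {
    static final double EARTH_RADIUS_M = 6_371_000.0;   // mean Earth radius (spherical model)

    // One diary entry: a timestamp plus a position in decimal degrees.
    static final class Fix {
        final long epochSec;
        final double lat;
        final double lon;

        Fix(long epochSec, double lat, double lon) {
            this.epochSec = epochSec;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Degrees + decimal minutes (as in N 52°22.688´) to decimal degrees.
    static double toDecimal(int degrees, double minutes) {
        return degrees + minutes / 60.0;
    }

    // Great-circle distance in meters (haversine formula).
    static double distanceM(Fix a, Fix b) {
        double dLat = Math.toRadians(b.lat - a.lat);
        double dLon = Math.toRadians(b.lon - a.lon);
        double h = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(a.lat)) * Math.cos(Math.toRadians(b.lat))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(h)));
    }

    // All pairs of entries that are within maxSecApart seconds and maxMeters of each other.
    static List<Fix[]> encounters(List<Fix> mine, List<Fix> yours, long maxSecApart, double maxMeters) {
        List<Fix[]> result = new ArrayList<>();
        for (Fix a : mine) {
            for (Fix b : yours) {
                if (Math.abs(a.epochSec - b.epochSec) <= maxSecApart && distanceM(a, b) <= maxMeters) {
                    result.add(new Fix[] {a, b});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The position quoted on the slide, in decimal degrees.
        System.out.printf("%.6f, %.6f%n", toDecimal(52, 22.688), toDecimal(4, 53.990)); // prints 52.378133, 4.899833
    }
}
```

The all-pairs comparison is fine for a sketch. Diaries spanning years would first be sorted or bucketed by time.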
Interdroid
[Figure: Novel Mobile Distributed Applications; Data Management; Distributed Communication; Context Sensitive; Programming Models]
Summary
● Ibis provides integrated solutions for many hard problems
  ● Performance, heterogeneity, malleability, fault tolerance, connectivity
● Used for many applications on real-world distributed systems
● Extends to the mobile world
● Download from http://www.cs.vu.nl/ibis/
Future work: DAS-4 (2010)
[Figure: DAS-4 cluster design: head node, classic nodes, and GPU/Cell/FPGA nodes on a 10 Gb/s Ethernet switch, linked to the local network and to SURFnet/other DAS sites; tunable transponders connect to the SURFnet photonic network by 10, 40, or 100 Gb/s lambdas]
Acknowledgements
Niels Drost
Ceriel Jacobs
Roelof Kemp
Timo van Kessel
Thilo Kielmann
Jason Maassen
Rob van Nieuwpoort
Nick Palmer
Kees van Reeuwijk
Frank J. Seinstra
Kees Verstoep
Gosia Wrzesinska
Questions?