
Towards Jungle Computing with
Ibis/Constellation
Jason Maassen, Niels Drost
Henri Bal, Frank Seinstra
Department of Computer Science
VU University, Amsterdam, The Netherlands
Introduction
● HPC is entering many domains
  ● Not just: physics / chemistry / climate modelling
  ● Also: semantic web / medical / multimedia analysis / neuroinformatics / remote sensing / astronomy / ...
● HPC is becoming more complex
  ● Not just large SMP or clusters, instead:
    ● Clusters of SMPs / Grids / Clouds / Supers / ...
    ● Heterogeneous machines using GPU / Cell / FPGA
“It's a jungle out there”
3DAPAS Workshop 2011
Example Domain
Computational Astrophysics (amusecode.org)
Jungle Computing
● Worst case computing ... as required by users
● Arbitrary combination of distributed, hierarchical, and heterogeneous computing
Many Task Computing
According to Raicu, Foster, et al. [SC'08]:
“High-performance computations comprising multiple
distinct activities, coupled via file system operations or
message passing. Tasks may be small or large,
uni-processor or multi-processor, compute-intensive or
data-intensive. The set of tasks may be static or
dynamic, homogeneous or heterogeneous, loosely
coupled or tightly coupled. The aggregate number of
tasks, quantity of computing, and volumes of data may
be extremely large.”
● Applications are dynamic and heterogeneous workflows / DAGs of activities
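A DAG of activities can be sketched as follows. This is only an illustration and not Constellation code: the activity names are hypothetical, and the standard-library `graphlib` module handles a static graph, whereas the applications described here may grow dynamically.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is an activity, each value is the set of
# activities whose output it needs. All names are illustrative only.
workflow = {
    "detect": {"subtract"},
    "subtract": {"align"},
    "align": {"load_a", "load_b"},
    "load_a": set(),
    "load_b": set(),
}

def run_order(dag):
    """Return the activities in batches that may run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies satisfied
        batches.append(ready)
        sorter.done(*ready)                 # unlock the dependants
    return batches

batches = run_order(workflow)
```

Each batch holds activities with no unmet dependencies, so an MTC system is free to spread a batch over whatever resources are available.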
MTC in the Jungle
● MTC has advantages for Jungle Computing
  ● Many distinct activities
    ● Can be implemented independently, using the tools and targeting the HPC architecture that best suit them
  ● Reduced programming complexity
    ● Complete applications are constructed using sequences and combinations of activities
Constellation
● MTC system for Jungle Computing
● Model based on:
  ● activities (tasks)
  ● executors (resources)
  ● contexts (matchmaking)
  ● events (communication)
Constellation Model
Application
● Application: set of activities
  ● Distinct tasks
  ● Size and complexity may vary
  ● Targeted at specific HPC platform
  ● (Loosely) coupled using events
  ● Often wrapper around existing code
● Similar to workflow or DAG of tasks
  ● Dynamic and unlimited in size
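The idea of an activity as a wrapper around existing code might look like the sketch below. The class `WrappedActivity` and its fields are hypothetical and are not the Constellation API. A one-line Python program stands in for the existing executable so that the example is self-contained.

```python
import subprocess
import sys

class WrappedActivity:
    """Hypothetical activity that wraps an existing command-line program."""

    def __init__(self, command, label):
        self.command = command  # argv list of the existing code
        self.label = label      # context label, used for placement

    def run(self):
        """Run the wrapped program and return its standard output."""
        done = subprocess.run(self.command, capture_output=True,
                              text=True, check=True)
        return done.stdout.strip()

# Stand-in for a real executable: a one-line Python program.
activity = WrappedActivity(
    [sys.executable, "-c", "print('pair-0001 processed')"], label="cpu")
result = activity.run()
```

Because the wrapped program is untouched, the same activity could be retargeted simply by changing its label.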
Constellation Model
Hardware
● Hardware: set of executors
  ● Capable of running activities
  ● May represent anything from a single core to an entire cluster, a GPU, etc.
  ● May be application specific
● Provides an application specific heterogeneous resource pool
Constellation Model
Context
● Both activities and executors are tagged with a context
  ● Application defined label (+ rank)
  ● Used to define relationships between activities and executors, e.g.:
    ● Data dependencies, hardware requirements, ...
  ● May combine contexts
● Executors may have preference for label or rank
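A minimal sketch of context tagging, assuming that a combined context matches when the activity and the executor share at least one label. The `Context` class and the `matches` function are hypothetical names and are not taken from the Constellation API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    """Hypothetical context: an application-defined label plus a rank."""
    label: str
    rank: int = 0

def matches(activity_contexts, executor_labels):
    """Assumed rule: any label shared with the executor is a match."""
    return any(c.label in executor_labels for c in activity_contexts)

# An activity whose input data is stored on clusters "VU3" and "VU4";
# the rank could encode, for example, the size of the input.
activity = [Context("VU3", rank=50), Context("VU4", rank=50)]

local_executor = {"VU3"}     # runs on the cluster that holds the data
remote_executor = {"Leiden"} # hypothetical cluster without the data

is_local_match = matches(activity, local_executor)
is_remote_match = matches(activity, remote_executor)
```

Here the label expresses a data dependency; a hardware requirement could be expressed the same way with a label such as "gpu".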
Constellation Model
Matchmaking
● Runtime system (RTS) performs load-balancing and match-making
  ● Ensures activities are forwarded to a suitable executor
  ● Tries to keep all executors busy
  ● Uses context-aware work-stealing
● RTS also performs event routing
  ● Based on unique activity identifier
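Context-aware work stealing and identifier-based event routing can be illustrated roughly as below. This is a sketch under stated assumptions, not the actual runtime system: the `Executor` class, the queue layout, and the `owners` table are all hypothetical.

```python
from collections import deque

class Executor:
    """Hypothetical executor with a local queue and a set of accepted labels."""

    def __init__(self, name, labels):
        self.name = name
        self.labels = set(labels)
        self.queue = deque()  # entries are (activity_id, label)

    def steal_from(self, victim):
        """Take the oldest activity from victim that this executor can run."""
        match = next((a for a in victim.queue if a[1] in self.labels), None)
        if match is not None:
            victim.queue.remove(match)
            self.queue.append(match)
        return match

cpu = Executor("cpu-0", ["cpu"])
gpu = Executor("gpu-0", ["gpu"])
cpu.queue.extend([(1, "cpu"), (2, "gpu"), (3, "cpu")])

# Routing table: activity identifier -> name of the executor holding it.
owners = {activity_id: cpu.name for activity_id, _ in cpu.queue}

stolen = gpu.steal_from(cpu)      # the idle GPU executor skips "cpu" work
if stolen is not None:
    owners[stolen[0]] = gpu.name  # events for this id now go to gpu-0
```

The thief only takes work whose label it accepts, so stealing keeps executors busy without placing an activity on unsuitable hardware.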
ComplexHPC Spring School 2011
Constellation API
DACH 2008
Data Challenge in conjunction with IEEE Cluster/Grid 2008
● Supernova detection
  ● Analyse 1052 image pairs on 11 clusters (Intrigger)
  ● 'Sequential' executable provided
DACH 2008
Problem
● Main problems:
  ● Data distribution
  ● Heterogeneity of work and hardware
  ● Load balancing
DACH 2008
Workflow
● Winning approach in 2008:
  ● Parallelize workflow to improve hardware utilization
  ● Create hierarchical master worker framework
  ● Scheduling heuristics using data location and size
Constellation Version
Option 1: Monolithic
● Wrap entire application in a single activity
  ● One activity per image pair
● Wrap each machine in one executor
  ● Multiple cores per executor
● Use context to influence order and placement of each of the activities
Evaluation
● Intrigger not available
● Instead we use DAS3+DAS4
  ● 5+6 clusters in the Netherlands
  ● Mix of 2/4/8/12/48 core machines
  ● Various types of GPUs
● Three Scenarios
  ● Data locality
  ● (Executor granularity)
  ● Heterogeneous processing
Scenario 1
Data Locality
● Data distributed over 4 clusters of DAS3 + DAS4
● Use context to express data locality and preferred processing order
  ● Adapt context to tune application
  ● No change in application
Scenario 1
Results
Activity                   Executor                 Effect
"any"                      "any"                    Random order
"any", 50                  "any", biggest           Sorted by size
"VU3", "VU4", 50           "VU3", biggest           Local only; sorted by size
"VU3", "VU4", "any", 50    "VU3", "any", biggest    Preference for local; fallback to any; sorted by size
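The effect of the configurations in the table can be approximated with the sketch below. It assumes that an executor tries its labels in order of preference and, when it prefers "biggest", takes the matching activity with the highest rank. The function `pick` and the sample activities are hypothetical and are not taken from the Constellation API or from the experiment.

```python
def pick(executor_labels, prefer_biggest, activities):
    """Pick the next activity for an executor.

    executor_labels: labels in order of preference, e.g. ["VU3", "any"].
    activities: list of (name, labels, rank) tuples.
    Returns the chosen name, or None when nothing matches.
    """
    for label in executor_labels:  # most preferred label first
        candidates = [a for a in activities if label in a[1]]
        if candidates:
            if prefer_biggest:
                return max(candidates, key=lambda a: a[2])[0]
            # Without a rank preference the order is unspecified;
            # this sketch simply takes the first candidate.
            return candidates[0][0]
    return None

# Hypothetical activities: (name, labels, rank), with rank = input size.
activities = [
    ("a", ["VU3", "VU4", "any"], 50),
    ("b", ["VU4", "any"], 80),
    ("c", ["VU3", "any"], 70),
]

local_first = pick(["VU3", "any"], True, activities)   # prefers local data
anywhere = pick(["any"], True, activities)             # biggest overall
local_only = pick(["VU3"], True, [activities[1]])      # no local work left
fallback = pick(["VU3", "any"], True, [activities[1]]) # falls back to "any"
```

Only the label lists and the rank preference change between the four calls, which mirrors the point that the application itself stays the same.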
Constellation Version
Option 2: Workflow
● Wrap each stage in an activity
● Wrap each core in an executor
● Use context to influence order and placement of each of the jobs
Scenario 3:
Heterogeneous System
● 18 node GPU cluster
  ● 8 cores + 1 GPU per node
  ● Activity: single task
  ● Executor: 1 core (top), 1 core or GPU (bottom)
● Replaced activity 7.2 with GPU version.
  ● Label activities and executors accordingly
● Significant performance gain.
Conclusions
● We think Jungle Computing is a necessity for some application areas.
● Constellation offers a suitable model (MTC) to create such applications.
● Initial experiments show that Constellation works well for a wide range of hardware configurations
  ● Easy to reconfigure applications to match resources
  ● Allows integration of specialized accelerator codes
  ● Suitable basis for a Jungle Computing model
Future Work
● Application development
  ● AMUSE
  ● Remote Sensing
  ● Climate modelling
● Platform improvements
  ● Easier integration of existing codes
  ● Smart/automatic deployment/tuning of executors
  ● Improve data handling
  ● Better monitoring
Questions?
jason@cs.vu.nl
www.cs.vu.nl/ibis
Scenario 2
Executor Granularity
● 30 largest images only
● Single 48 core machine
  ● Activity: entire application (a-c), single task (d)
  ● Executor: [n]-cores
● No change in application for experiment (a-c)
  ● Only change executor config.
● Completely ported application in (d)
  ● Significant performance gain!