Copyright 1994 IEEE. Published in the Proceedings of MPCS '94, May 1994 at Ischia, ITALY. Personal use of this material is permitted. However,
permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or
lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions /
IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: + Intl. 732-562-3966.
Dynamic Load Distribution in Massively Parallel Architectures:
the Parallel Objects Example
Antonio Corradi, Letizia Leonardi, Franco Zambonelli
Dipartimento di Elettronica Informatica e Sistemistica - Università di Bologna
Viale Risorgimento 2, 40136 Bologna - Ph.: +39-51-6443001 - Fax: +39-51-6443073
E-mail: {antonio, letizia, franco}@deis33.cineca.it
This work has been partially supported by the "Progetto Finalizzato Sistemi Informatici e Calcolo Parallelo" of the Italian National Research Council (grant No.
89.00012.69) and by the "MURST" 40% Funding.
Abstract

The paper presents the mechanisms for dynamic load distribution implemented within the support for the Parallel Objects (PO for short) programming environment. PO applications evolve depending on their dynamic need of resources, and the support redistributes the load accordingly, enhancing application performance. The goal is to show how dynamic load distribution can be successfully applied on a massively parallel architecture.

1. Introduction
Building complete high-level programming environments is one of the challenges general-purpose parallel computing must face. Such environments will attract a large community of users only if they help in solving the complexities introduced by parallel computing. We claim that the presence of transparent allocation tools is essential: programmers should not have to worry about the mapping of their applications onto the machine, unless they want to. These characteristics are requirements of the Parallel Objects programming environment [1]. In addition, in massively parallel architectures, load balancing and locality of communications are also goals to be met for efficiency's sake [2].

Load balancing and locality can be obtained via a dynamic distribution of the load during execution, which, in the PO environment, is achieved in two ways. On the one hand, PO uses remote creation of newly needed objects [3]. On the other hand, PO provides migration of already allocated objects [4]. Even if migration can be more expensive than remote creation, it can be more effective both in achieving load balancing and in allowing efficient access to remote resources and communication partners.
The dynamic load distribution consists of different phases: a monitoring phase identifies the evolution of the application, a policy phase decides the needed actions and a mechanism phase applies the decisions taken.

The paper focuses on dynamic load distribution mechanisms rather than on policies and is organised as follows. Section 2 describes the PO model and its allocation issues. The implementation of the support for the PO environment is presented in section 3. The migration mechanisms are described in section 4 and their effectiveness is evaluated, via several application examples, in section 5.

2. The PO model
The PO environment is based on the active object model [5], which provides independent execution facilities internal to each object: at least one execution thread is associated with each object.
In PO, computation results from message passing between objects. When an object requires an external service, it sends a message to another object, specifying the service it needs. Asynchronous modes of communication lead to inter-object parallelism, decoupling sending and receiving objects. Intra-object parallelism is given by the presence of multiple execution threads within the same PO object. A parallel object can receive several requests at the same time: each one is served by an internal activity. The consistency of the object state concurrently accessed by these activities is guaranteed by a scheduling policy that can be user-defined and is subject to inheritance.
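To make the scheduling policy idea concrete, the following C++ sketch (ours, for illustration only; the names Request, SchedulingPolicy and ReadersWriterPolicy are hypothetical and not part of the PO support) shows how a user-defined, inheritable policy could decide which pending requests may be served concurrently.

```cpp
// Illustrative sketch only: a user-definable scheduling policy that decides
// which pending requests of a parallel object may be served concurrently.
// Names (Request, SchedulingPolicy, ReadersWriterPolicy) are hypothetical.
#include <deque>
#include <iostream>
#include <string>
#include <vector>

struct Request {
    std::string operation;
    bool writes_state;          // does the requested operation modify the object state?
};

// Base policy: subclasses (possibly inherited by user classes) refine it.
struct SchedulingPolicy {
    virtual ~SchedulingPolicy() = default;
    // May 'candidate' start while the 'running' requests are still being served?
    virtual bool may_start(const Request& candidate,
                           const std::vector<Request>& running) const {
        return running.empty();                 // default: one activity at a time
    }
};

// A common refinement: read-only requests run concurrently, writers run alone.
struct ReadersWriterPolicy : SchedulingPolicy {
    bool may_start(const Request& candidate,
                   const std::vector<Request>& running) const override {
        if (candidate.writes_state) return running.empty();
        for (const Request& r : running)
            if (r.writes_state) return false;
        return true;
    }
};

int main() {
    ReadersWriterPolicy policy;
    std::vector<Request> running = {{"query", false}};
    std::deque<Request> pending  = {{"query", false}, {"insert", true}};

    for (const Request& r : pending)
        std::cout << r.operation
                  << (policy.may_start(r, running) ? " can start now\n" : " must wait\n");
}
```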
Each PO object is composed of two logically separated parts: the parallel part and the non-parallel one. The structure of the non-parallel part is similar to that of a Smalltalk-80 object [6]: it contains the state and the operations (also called methods). The parallel part consists of management components that interface the object with the outside and co-ordinate the other internal components.
PO objects are always instances of a given class. PO classes describe both the non-parallel and the parallel part of an object: the interface, all the operations that can be requested of objects, the state variables of an object and the synchronisation scheduling. For the distributed implementation, PO adopts a solution with multiple copies of classes, one for each node where an instance of the class is executing.
Dynamic allocation issues, in PO, arise for object creation (remote creation) and for the dynamic movement of already existing objects (migration).
Any PO application, as it starts, is composed of a few objects, called the starter objects, in charge of booting the application. Every object but the starters is created at run-time: objects can explicitly request the creation of other objects and can subsequently request their services. Since objects can be created at any time by any creator, depending on the application execution, a static policy cannot decide their allocation a priori. A remote creation policy is capable of dynamically finding a node where the objects to be created can be usefully allocated [7]. Such a policy has to take into account several issues: the presence of the class is a necessary condition, but a new object should preferably be allocated on an under-loaded node and close to its communication partners.
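For illustration only, a remote creation choice of this kind could be sketched as follows; the NodeInfo fields and the weight given to distance are our own assumptions, not the actual PO policy.

```cpp
// Illustrative sketch of a remote-creation choice: among the nodes that hold a
// copy of the class, prefer under-loaded nodes that are close to the partners.
// The weight and the NodeInfo fields are assumptions for the example.
#include <iostream>
#include <vector>

struct NodeInfo {
    int  id;
    bool has_class_copy;   // necessary condition: the class must be present
    double load;           // normalised load reported by the monitoring manager
    int  hops_to_partners; // distance to the creator / communication partners
};

int choose_creation_node(const std::vector<NodeInfo>& nodes) {
    int best = -1;
    double best_score = 1e9;
    for (const NodeInfo& n : nodes) {
        if (!n.has_class_copy) continue;                   // cannot create here
        double score = n.load + 0.1 * n.hops_to_partners;  // lower is better
        if (score < best_score) { best_score = score; best = n.id; }
    }
    return best;   // -1 if no node holds the class
}

int main() {
    std::vector<NodeInfo> nodes = {{0, true, 0.9, 0}, {1, true, 0.2, 1}, {2, false, 0.0, 1}};
    std::cout << "create on node " << choose_creation_node(nodes) << "\n";
}
```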
The user is not aware of remote creation: the new object is automatically placed on a node chosen by the remote creation policy. However, the user can influence the policy by giving allocation hints: for example, a user can specify that the objects of a given class have to be created locally or created remotely on a neighbouring node.
Migration is the movement of an entity from one node (the sender) to another one (the receiver) [4]. The effectiveness of migrating processes has been questioned [8], on the grounds that the overhead due to the transfer is rarely compensated by performance improvements. We claim, instead, that migration of active objects is usually effective in PO because:
− many long-lived objects exist during the whole
execution of an application. For those objects, the
remote creation policy is not sufficient to balance the
load. Moreover, the cost of a migration is negligible
with respect to the object lifetime.
− the migration mechanisms implemented in the PO support (see section 4) are simpler than those of process migration, even if they involve several active entities.
Migration occurs transparently to the user, but objects can be qualified as "fixed" (i.e. not migratable) by the user with, again, an allocation hint. This avoids, for example, useless migrations of objects with a limited lifetime.
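Allocation hints of this kind could be represented along the following lines; the hint names echo the ones mentioned in the paper ("fixed", local or neighbouring creation), but the types and their use are a hypothetical sketch of ours.

```cpp
// Illustrative sketch of per-class allocation hints: the user does not place
// objects, but can bias the policy ("fixed", create locally, create on a
// neighbour). The enum and struct names are assumptions for the example.
#include <iostream>

enum class AllocationHint {
    None,              // fully automatic placement
    Fixed,             // never migrate this object
    CreateLocally,     // create on the creator's node
    CreateOnNeighbour  // create on a neighbouring node
};

struct ClassDescriptor {
    const char*    name;
    AllocationHint hint = AllocationHint::None;
};

bool migratable(const ClassDescriptor& c) { return c.hint != AllocationHint::Fixed; }

int main() {
    ClassDescriptor short_lived{"TemporaryWorker", AllocationHint::Fixed};
    std::cout << short_lived.name
              << (migratable(short_lived) ? " may migrate\n" : " is fixed\n");
}
```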
3. PO support implementation
The PO support is currently implemented on a
transputer-based architecture, the Meiko Computing
Surface, but its layered and modular structure - the
support is based on encapsulated primitives - makes the
user unaware of any characteristic of the physical
architecture. The support consists of two levels: a support
for the object functions, present in each object, and a
support to handle object dynamicity.
The object support is composed of a set of threads that realise the execution management of each PO object: the Object Manager (OM for short) and one or more State Managers (SMs).
The OM is the receiver of the service requests arriving at the object and it is the only component entitled to access and modify the object execution state (represented by the status of the pending requests and of the executing activities). The OM shares no memory with the other object components, neither the activities nor the SMs: it interacts with them by message exchange and it can be allocated independently of them.
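For illustration, such a message-driven manager can be pictured as a loop over a private mailbox, with nothing shared with activities or SMs; the message tags and types below are assumptions, not the PO support interface.

```cpp
// Illustrative sketch of an Object Manager: it owns the execution state
// (pending requests, running activities) and interacts only by messages.
// Message tags and types are assumptions, not the PO support API.
#include <algorithm>
#include <deque>
#include <iostream>
#include <variant>
#include <vector>

struct ServiceRequest { int request_id; };
struct ActivityDone   { int request_id; };
using Message = std::variant<ServiceRequest, ActivityDone>;

struct ObjectManager {
    std::deque<int>  pending;   // execution state: requests not yet served
    std::vector<int> running;   // execution state: requests being served

    void handle(const Message& m) {
        if (auto* req = std::get_if<ServiceRequest>(&m)) {
            pending.push_back(req->request_id);   // queue the request
            // (a scheduling policy would decide when to spawn an activity)
        } else if (auto* done = std::get_if<ActivityDone>(&m)) {
            // an activity has terminated: drop it from the running set
            running.erase(std::remove(running.begin(), running.end(),
                                      done->request_id), running.end());
        }
    }
};

int main() {
    ObjectManager om;
    om.handle(ServiceRequest{1});
    om.handle(ServiceRequest{2});
    std::cout << om.pending.size() << " pending requests\n";
}
```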
The state of any PO object can be split into partitions: one SM is associated with each partition. When an object is distributed onto different nodes (we will see in the following that this occurs during migration), the SMs allow activities to remotely access the part of the state they manage. An access to a non-local state partition is transparently translated into a message for the corresponding SM. However, when an activity is co-resident with the state partition of interest, it bypasses the SM service and directly accesses the local memory for that state part.
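The transparent state access could be sketched as follows: a direct read when the partition is co-resident, a message to the owning SM otherwise; all the names in this sketch are ours, not the PO support interface.

```cpp
// Illustrative sketch of transparent access to a state partition: a co-resident
// activity reads the memory directly, a remote one asks the State Manager that
// owns the partition. All names here are hypothetical.
#include <iostream>
#include <string>

struct StatePartition {
    int node;            // node where the partition currently resides
    std::string value;   // the managed piece of object state
};

std::string ask_state_manager(const StatePartition& p) {
    // Stand-in for a message exchange with the SM on p.node.
    return "value obtained from the SM on node " + std::to_string(p.node);
}

std::string read_state(const StatePartition& p, int my_node) {
    if (p.node == my_node)           // co-resident: bypass the SM, read local memory
        return p.value;
    return ask_state_manager(p);     // otherwise the access becomes a message
}

int main() {
    StatePartition part{2, "local copy"};
    std::cout << read_state(part, 2) << "\n";   // direct access
    std::cout << read_state(part, 0) << "\n";   // mediated by the SM
}
```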
The PO support for object dynamicity is composed of a set of entities, present on each node and called system managers: they implement the dynamic management of an application at the architecture level.
The monitoring manager periodically measures the
application behaviour to detect its evolution.
The allocation manager is in charge of the allocation policies. It chooses the allocation of new entities and, when needed, it decides migrations. The allocation manager takes its decisions on the basis of the load information provided by the monitoring manager of its own node and of the load information of the other nodes in the system. The migration and remote creation policies we have experimented with are mostly based on locality, for scalability's sake [9].
The creation manager implements the decisions taken by the allocation manager: it receives orders to create entities on its node. A creation can involve whole objects (either a "new" object or a migrated one, whose internal information comes from a previous execution), activities and SMs. The creation manager needs access to the class data structures to handle creations correctly.
The router manager is in charge of delivering all the messages that flow between application objects (both inter-object and intra-object communications) and between the managers of nodes that are not physically connected. Routers manage these communications transparently and independently of the allocation. In the current implementation, the router is static and provided by the CSTools programming environment of the Meiko Computing Surface.
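To fix ideas, the cooperation between monitoring and allocation managers might be caricatured as below: the monitoring manager samples the node load and the allocation manager compares it with the loads of neighbouring nodes before deciding a migration. The 30% margin and every name are assumptions made for the example; the actual PO policies are those discussed in [9].

```cpp
// Illustrative sketch of the per-node managers' cooperation: the monitoring
// manager samples the local load, the allocation manager compares it with the
// loads of neighbouring nodes and decides whether a migration is worthwhile.
// The 30% margin and all names are assumptions for the example.
#include <iostream>
#include <string>
#include <vector>

struct MonitoringManager {
    double sample_load() const { return 0.8; }   // stand-in for a real measurement
};

struct AllocationManager {
    // Returns the neighbour to off-load to, or -1 if no migration is needed.
    int decide_migration(double local_load,
                         const std::vector<double>& neighbour_loads) const {
        int target = -1;
        double lowest = local_load;
        for (std::size_t i = 0; i < neighbour_loads.size(); ++i)
            if (neighbour_loads[i] < lowest) {
                lowest = neighbour_loads[i];
                target = static_cast<int>(i);
            }
        // migrate only if the imbalance is worth the transfer cost
        return (target >= 0 && local_load - lowest > 0.3) ? target : -1;
    }
};

int main() {
    MonitoringManager monitor;
    AllocationManager allocator;
    std::vector<double> neighbours = {0.7, 0.3, 0.6};   // loads of adjacent nodes
    int target = allocator.decide_migration(monitor.sample_load(), neighbours);
    std::cout << (target < 0 ? std::string("no migration")
                             : "migrate towards neighbour " + std::to_string(target))
              << "\n";
}
```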
4. Migration mechanisms in the PO support
The migration mechanisms implemented in the PO support are characterised by an implementation simpler than that of process migration [4].

The PO support deals with object migration by using two separate mechanisms: the migration of the
management part of an object (the OM) and the
migration of its state (the SMs). The OM and the SMs
can migrate from one node to another one because they
are cyclic threads without any internal execution status.
Therefore, a manager can be blocked at any time. The migration of Object and State Managers, even if it involves active entities, is similar to data migration. The data structures managed by the OM and by the SMs (i.e., the execution state and the object state) are transferred from one node to another. The manager of those data is not really moved: it simply asks another, newly created, manager on the receiver node to take charge of the management.
The PO support rules out activity migration, avoiding the costs of process migration. When an object migrates, its activities can go on executing up to their completion on the creation site; only the new activities are allocated to the new site. Not moving the executing activities of an object could be criticised; we claim, instead, that it is effective. Let us consider, for example, the case of an object that migrates from an over-loaded sender node to an under-loaded receiver one:
− its OM and its SMs are migrated with a minimal
effort.
− already executing activities do not move from the
sender node (the object is temporarily distributed).
However, future creation of activities loads the
receiver node;
− the sender node load is going to gradually diminish
when the activities belonging to the migrated object
complete.
SMs either follow the OM in its migration (as in the above example) or move later, when the accesses to the state coming from the receiver node overcome those coming from the sender node. The general scenario is one where OMs follow the load evolution and attract activities and SMs.
Any object migration incurs the problem of re-qualifying object references [10]. In the PO support, this problem is currently solved without leaving any dependency on the sender node: every object is identified and referred to by a unique name for its whole life. This name is associated with the current OM address (the receiver of any message directed to the object), which changes when the allocation of the OM changes. When an object has migrated, any communication addressed to its old location fails: if this happens, the new name-address association is automatically found by the support before retrying the communication.
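A retry scheme of this kind can be sketched as: send to the cached OM address and, on failure, refresh the name-to-address association before retrying; the functions below are stand-ins, not the PO support interface.

```cpp
// Illustrative sketch of location-independent naming: a message is sent to the
// cached address of the object's OM; if the OM has migrated the send fails,
// the name-to-address association is refreshed and the send is retried.
// All the functions below are stand-ins for the example.
#include <iostream>
#include <map>
#include <string>

std::map<std::string, int> name_server = {{"objectA", 3}};  // name -> node of the OM

bool raw_send(const std::string& name, int node, const std::string& msg) {
    (void)msg;
    return name_server.at(name) == node;   // fails if the OM is no longer there
}

int resolve(const std::string& name) { return name_server.at(name); }

void send(const std::string& name, int cached_node, const std::string& msg) {
    if (!raw_send(name, cached_node, msg)) {   // OM migrated in the meantime
        int fresh = resolve(name);             // refresh the association
        raw_send(name, fresh, msg);            // retry transparently
        std::cout << "resent to node " << fresh << "\n";
    } else {
        std::cout << "delivered to node " << cached_node << "\n";
    }
}

int main() {
    send("objectA", 1, "request");   // stale address: resolved and retried
    send("objectA", 3, "request");   // up-to-date address: delivered at once
}
```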
The following sub-sections describe the mechanisms
for the migration of OM and SMs, with their costs. Since
any external access to the object structures should be
viewed as a deviation from the Object-Oriented
paradigm, migration is managed by the object itself at the
reception of a request (OM or SM migration message).
4.1. Object Manager migration
OM migration means relocating the OM from a sender node to a receiver one. The OM migration of an object O is composed of the following steps (figure 1):
1. a migration message is sent to the object O by the allocation manager;
2. O's OM receives the migration command and then:
2a. it sends an object creation message to the creation manager of the receiver node. Once the new object (O1) is created on the receiver node, its OM recognises that it is a migrated object; O1 must then receive the data structures of object O before starting its normal execution cycle;
2b. O stops receiving external requests;
2c. O packs its execution state in a message and sends it to the new object O1 on the receiver node;
3. O's OM de-allocates itself.
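For illustration only, the steps above can be rendered in code as follows; all the types and functions are stand-ins rather than the actual PO support, and the creation of the new object on the receiver node (step 2a) is represented by an object constructed beforehand.

```cpp
// Illustrative sketch of the OM migration protocol of section 4.1: the new
// object is created on the receiver node, the old OM stops accepting requests,
// packs and ships its execution state, then de-allocates itself.
// The types and functions are stand-ins, not the actual PO support code.
#include <deque>
#include <iostream>

struct ExecutionState {
    std::deque<int> pending_requests;   // status of requests not yet served
    int running_activities = 0;         // status of executing activities
};

struct NewObject {                      // the object created on the receiver node
    ExecutionState state;
    bool ready = false;
    void restore(const ExecutionState& s) { state = s; ready = true; }  // arrival of step 2c
};

struct ObjectManager {
    ExecutionState state;
    bool accepting = true;

    void migrate_to(NewObject& receiver_side) {
        // 2a. (done in parallel) the creation manager of the receiver node has
        //     already created 'receiver_side', which waits for the state.
        accepting = false;               // 2b. stop receiving external requests
        receiver_side.restore(state);    // 2c. pack and send the execution state
        // 3. the old OM de-allocates itself (here: nothing left to do)
    }
};

int main() {
    ObjectManager old_om;
    old_om.state.pending_requests = {1, 2};
    NewObject migrated;
    old_om.migrate_to(migrated);
    std::cout << "migrated object ready: " << migrated.ready
              << ", pending requests carried over: "
              << migrated.state.pending_requests.size() << "\n";
}
```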
OM migration does not require the object activities to
be informed about it: activities always refer to their OM
by sending it a message (i.e. the termination message),
and never by a direct access to its data. When the OM is
migrated, the communication from an activity to its OM
fails: the new OM address is found and the
communication retried. Moreover, it is important to note that the OM migration does not suspend the executing activities inside the object: the only effect of this migration is a delay in the service of waiting or incoming requests. Therefore, the cost of this migration can be measured as the time during which an object is unable to serve external requests. This cost has been kept very low (from 25 to 30 ms) by exploiting parallelism: when the object receives the migration command and stops serving requests, the new object has already been created in parallel on the receiver node and just waits for the execution state before restarting its service [4]. We emphasise that this time is the worst-case delay a request to the migrating object can suffer because of a migration.
[Figure 1 appears here: a diagram of the Object Manager migration, showing the allocation manager's migration message, the object creation message to the creation manager of the receiver node, the stopping of request reception, the packing and sending of the execution state, the restoration of the execution state and reactivation of the new object, and the de-allocation of the old one.]

Figure 1. The Object Manager migration
4.2. State Manager migration
The migration mechanism of the SM follows the same guidelines as above. It consists of creating a new SM on the receiver node and sending the state to it; finally, the old SM is de-allocated.

However, a relevant difference from OM migration stands out. Let us recall that activities co-resident with the state can access it directly, without any SM mediation. While the SM is under migration, and even afterwards, an activity could access a part of memory that is no longer part of the object state: the executing activities that need to access the state under migration must then be suspended for correctness' sake. Thus, the SM migration mechanism for an object O requires sending a suspension message to the activities. The "state migrating" message is sent to all activities, but they receive it only when they try to access the state. Then, activities suspend their execution until a "state migration completed" message is received. Activities that do not need to access the state during its migration are not involved in the migration at all.

The cost of a state migration can be measured by the period an activity could be suspended while waiting to access the state. This cost varies from 16 ms up to 20 ms. The dominant factor is the number of executing activities in the object: the number of messages to be sent to the activities increases the cost. Because of the high bandwidth (20 Mbit/s) of the interconnection network, the cost of migration is not influenced by the size of the state, as long as it is less than 10 Kbyte. The above cost evaluation considers the worst case: activities are suspended for the whole time only if they try to access the state at the very beginning of the migration; in all other cases, the suspension is shorter.
5. Application examples

Several applications have been developed as testbeds for the above described mechanisms: these applications aim to test the use of the migration mechanisms to achieve load balancing and locality of communications.

5.1. Load balancing: the Mandelbrot fractal
The computation of the Mandelbrot fractal figure [11] shows how the migration mechanisms can be applied to achieve load balancing.

This application is equally partitioned into several objects of the same class: each PO object is devoted to the computation of a strip of the figure, with no interaction
among these objects. The same number of objects is
assigned to each node: when the application starts, the
load is balanced. The calculation finishes earlier for some objects and later for others: this unbalances the load of the system. When that occurs, object migrations
shift load from overloaded nodes to underloaded ones:
decisions are taken by following a local load balancing
policy [9].
By averaging the data of several experiments with different numbers of nodes, the speedup is over 30%. In particular, we notice improvements of over 60% for very unbalanced experiments, while very little overhead (0.3%) was imposed on the application by the load balancing mechanisms when the load remained balanced for the whole calculation.
5.2. Locality of communications
Another application aims to test migration effects to
improve locality of inter-object communications. The
application consists of a multitude of objects belonging to
a Client class and a Database class. Objects of the Client class request services (such as insertion, extraction and query) from objects of the Database class.
We measure the average cost of a service as a function of the distance between the nodes where the Client and Database objects reside. This cost can be lowered in the presence of an allocation policy that migrates Client objects requesting services close to the Database server. Even if, in the latter case, we also have to take the migration cost into account, it can be counterbalanced by the number of services the Client requires.
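The trade-off explored by the following measurements can be made explicit with a simple break-even computation (ours, with hypothetical costs): migration pays off as soon as the accumulated saving per service exceeds the one-off migration cost.

```cpp
// Illustrative break-even computation for section 5.2 (our simplification with
// made-up numbers, not the measured PO costs): migrating the Client pays off
// once n * (cost_before - cost_after) exceeds the migration cost.
#include <cmath>
#include <iostream>

int services_to_outweigh_migration(double migration_cost_ms,
                                   double service_cost_before_ms,
                                   double service_cost_after_ms) {
    double saving_per_service = service_cost_before_ms - service_cost_after_ms;
    if (saving_per_service <= 0) return -1;   // migration never pays off
    return static_cast<int>(std::ceil(migration_cost_ms / saving_per_service));
}

int main() {
    // Hypothetical figures: 20 ms to migrate, 0.30 ms per service before the
    // migration and 0.23 ms per service afterwards.
    std::cout << services_to_outweigh_migration(20.0, 0.30, 0.23)
              << " services needed to outweigh the migration\n";
}
```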
When the system is well balanced in load, a large
number of service requests is needed to outweigh the cost
of migration. If the Client and the Database are initially
at distance 2, over 1000 service requests counterbalance a
migration that moves them to distance 1. If the
Client and the Database objects are initially at distance
10, 280 service requests are needed.
If the system load is not balanced and the Client and
the Database are, respectively, in an overloaded part of
the system and in an underloaded one, a migration that
moves the Client toward its Database produces better
results. Table 1 shows the average number of service
requests needed to outweigh migration depending on the
system load unbalance and for different initial distances
between client and server. This number lowers dramatically as soon as the system becomes unbalanced. For
an unbalance (measured by the standard deviation σ) of
50%, the Client can effectively migrate to its Database at
distance 10 even if it requests only 20 services.
These results show how load balancing and locality of
communication are strictly connected. This can also
apply to one-to-many or many-to-many relationships.
However, the implementation of policies for managing
those communication patterns is not trivial: which client must be moved close to which server?
Distance   σ = 0%   σ = 25%   σ = 50%   σ = 75%   σ = 100%
   2        1100      160       51        15         7
   6         496      100       31         9         4
  10         280       60       20         6         3
  14         173       35       14         5         3
  20         130       24       12         5         3
  ...        ...      ...      ...        ...       ...
  40          98       17       10         4         3

Table 1. Communication locality: number of services to outweigh migration
6. Conclusions and future work
The paper presents the support for the PO object-oriented programming environment. It provides a transparent and dynamic distribution of the system load via remote creation and migration of objects. Its effectiveness derives from the assumptions of the object-oriented framework: cases of profitable migration are frequent in an object scenario and the degree of intrusion of the mechanisms is low. The implemented mechanisms have been evaluated on several testbed applications: they have both achieved load balancing and improved performance.

The paper does not focus on policies, but shows the necessity of working in that direction, in particular to take communication costs into account. To solve the complexities introduced by communication, our perspective is that allocation policies cannot be completely automated, but need also to be directed by the user. In the current implementation, the user can specify only a few allocation hints, such as "object X is fixed" or "object Y is close to its creator" (see section 3). More flexibility will stem from expressing in these hints the execution and communication needs of the whole object and of its components.
References
1. M. Boari et al., "A Programming Environment Based on
Parallel Objects for Transputer Architectures", in "Models and
Tools for Massively Parallel Architectures", CNR, Napoli, June
1993.
2. W. C. Hsieh, P. Wang, W. E. Weihl, "Computation
Migration: Enhancing Locality for Distributed Memory Parallel
System", ACM SIGPLAN Notices, Vol. 28, No. 7, July 1993.
3. D. L. Eager, E. D. Lazowska, J. Zahorjan, "Adaptive Load
Sharing in Homogeneous Distributed Systems", IEEE
Transactions on Software Engineering, May 1986.
4. J. M. Smith, "A Survey of Process Migration Mechanisms",
Operating Systems Review, ACM, July 1988.
5. R.S. Chin, S.T. Chanson, "Distributed Object-Based
Programming Systems", ACM Computing Surveys, Vol. 23,
No. 1, March 1991.
6. A. Goldberg, D. Robson, "Smalltalk-80: the Language and its
Implementation", Addison-Wesley, 1983.
7. N. G. Shivaratri, P. Krueger, M. Singhal, "Load Distributing
for Locally Distributed System", IEEE Computer, Vol. 25, No.
12, Dec. 1992.
8. D. L. Eager, E. D. Lazowska, J. Zahorjan, "The Limited
Performance Benefits of Migrating Active Processes for Load
Sharing", ACM SIGMETRICS Conf. on Modelling of
Computer System, 1988.
9. A. Corradi, L. Leonardi, F. Zambonelli, "Load Balancing
Strategies for Massively Parallel Architectures", Parallel
Processing Letters, Vol. 2, No. 2 & 3, Sept. 1992.
10. Y. Artsy, R. Finkel, "Designing a Process Migration
Facility: the Charlotte Experience", IEEE Computer, v. 22, n. 9,
Sept. 1989.
11. B. B. Mandelbrot, "The Fractal Geometry of Nature", W. H. Freeman, San Francisco, 1982.