Introduction to Distributed Systems

Chapter Two
Architecture
Introduction
Distributed systems are often complex pieces of software whose components are by definition
dispersed across multiple machines. To master their complexity, it is crucial that these systems
are properly organized. There are different ways to view the organization of a distributed
system, but an obvious one is to make a distinction between the logical organization of the
collection of software components on the one hand and the actual physical realization on the
other.
The organization of distributed systems is mostly about the software components that constitute
the system. These software architectures tell us how the various software components are to be
organized and how they should interact.
The actual realization of a distributed system requires that we instantiate and place software
components on real machines. There are many different choices that can be made in doing so.
The final instantiation of a software architecture is also referred to as a system architecture.
Centralized architectures
A single server implements most of the software components (and thus functionality), while
remote clients can access that server using simple communication means.
The main problem with the centralized model is that it is not easily scalable. There is a limit to
the number of CPUs in a system and eventually the entire system needs to be upgraded or
replaced.
Fig 2: sample centralized system
ITEC 551
Compiled by: Miraf Belyu
Page 1
Decentralized architectures
Two or more machines play more or less equal roles; hybrid organizations combine features of
both centralized and decentralized designs.
Fig 3: sample decentralized system
Architectural style
Architectural style is formulated in terms of components, the way that components are connected
to each other, the data exchanged between components, and finally how these elements are
jointly configured into a system.
Several styles have by now been identified, of which the most important ones for distributed
systems are:
1. Layered architectures
2. Object-based architectures
3. Data-centered architectures
4. Event-based architectures
Layered architectures
The basic idea for the layered style is simple: components are organized in a layered fashion
where a component at layer Li is allowed to call components at the underlying layer Li-1, but not
the other way around, as shown in the following figure.
Fig 4: layered architecture
This model has been widely adopted by the networking community; a key observation is that
control generally flows from layer to layer: requests go down the hierarchy whereas the results
flow upward.
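The downward-only calling rule can be sketched in a few lines of Python; the three layer names and the stored data below are illustrative assumptions, not part of any particular system:

```python
# Each layer holds a reference only to the layer directly beneath it, so
# requests can only go down the hierarchy and results flow back up.

class DataLayer:
    def fetch(self, key):
        # Lowest layer: stands in for actual storage access.
        return {"greeting": "hello"}.get(key)

class ProcessingLayer:
    def __init__(self, lower):
        self.lower = lower                      # layer Li-1

    def handle(self, key):
        value = self.lower.fetch(key)           # request goes down...
        return value.upper() if value else None # ...result flows upward

class InterfaceLayer:
    def __init__(self, lower):
        self.lower = lower

    def request(self, key):
        return self.lower.handle(key)

ui = InterfaceLayer(ProcessingLayer(DataLayer()))
print(ui.request("greeting"))   # HELLO
```

Note that no layer ever calls upward: the interface layer knows nothing about the data layer, only about the processing layer directly below it.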
Object-based architectures
A far looser organization is followed in object-based architectures. In essence, each object
corresponds to what is called a component, and these components are connected through a
(remote) procedure call mechanism. Not surprisingly, this software architecture matches the
client-server system architecture. The layered and object-based architectures still form the most
important styles for large software systems.
Fig 5: sample object-based architecture
Data-centered architectures
Data-centered architectures revolve around the idea that processes communicate through a
common (passive or active) repository. It can be argued that for distributed systems these
architectures are as important as the layered and object-based architectures. For example, a
wealth of networked applications has been developed that rely on a shared distributed file system
in which virtually all communication takes place through files. Likewise, Web-based distributed
systems are largely data-centric: processes communicate through the use of shared Web-based
data services.
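As a minimal sketch of this style, the following Python fragment lets two components communicate only through a shared repository; the in-memory dict stands in for a shared file system or data service (an illustrative simplification):

```python
# The producer and consumer never call each other directly; all
# communication goes through the common repository.

repository = {}

def producer(repo):
    repo["report.txt"] = "quarterly figures"   # write to the shared store

def consumer(repo):
    return repo.get("report.txt")              # read from the shared store

producer(repository)
print(consumer(repository))   # quarterly figures
```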
Event-based architectures
In event-based architectures, processes essentially communicate through the propagation of
events, which optionally also carry data.
For distributed systems, event propagation has generally been associated with what are known as
publish/subscribe systems. The basic idea is that processes publish events after which the
middleware ensures that only those processes that subscribed to those events will receive them.
The main advantage of event-based systems is that processes are loosely coupled. In principle,
they need not explicitly refer to each other. This is also referred to as being decoupled in space,
or referentially decoupled.
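A minimal publish/subscribe sketch in Python can make this decoupling concrete; the EventBus class and the event names below are illustrative assumptions, not a real middleware API:

```python
from collections import defaultdict

# The "middleware" maps event types to subscriber callbacks, so publishers
# and subscribers never refer to each other (referential decoupling).
class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, callback):
        self.subscribers[event_type].append(callback)

    def publish(self, event_type, data=None):
        # Only processes subscribed to this event type receive it.
        for callback in self.subscribers[event_type]:
            callback(data)

bus = EventBus()
received = []
bus.subscribe("order_placed", lambda data: received.append(data))
bus.publish("order_placed", {"id": 42})
bus.publish("order_cancelled", {"id": 7})   # no subscriber: silently dropped
print(received)   # [{'id': 42}]
```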
Fig 6: sample event-based architecture
System architectures
Now that we have briefly discussed some common architectural styles, let us take a look at how
many distributed systems are actually organized by considering where software components are
placed. Deciding on software components, their interaction, and their placement leads to an
instance of software architecture, also called system architecture. We will discuss centralized
and decentralized organizations, as well as various hybrid forms.
Centralized architectures
In the basic client-server model, processes in a distributed system are divided into two (possibly
overlapping) groups. A server is a process implementing a specific service, for example, a file
system service or a database service. A client is a process that requests a service from a server by
sending it a request and subsequently waiting for the server's reply; this client-server
interaction is also known as request-reply.
Fig 7: general interaction between client and server
Communication between a client and a server can be implemented by means of a simple
connectionless protocol when the underlying network is fairly reliable as in many local-area
networks. In these cases, when a client requests a service, it simply packages a message for the
server, identifying the service it wants, along with the necessary input data. The message is then
sent to the server. The latter, in turn, will always wait for an incoming request, subsequently
process it, and package the results in a reply message that is then sent to the client.
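The connectionless exchange just described can be sketched with UDP sockets in Python. Running client and server in a single process, and the message layout (an operation name plus input data), are illustrative simplifications:

```python
import socket

# Server side: bind a datagram socket and wait for incoming requests.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))              # OS picks a free port
server_addr = server.getsockname()

# Client side: package a message identifying the service plus input data.
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.sendto(b"BALANCE my-account", server_addr)

# The server waits for a request, processes it, and packages a reply.
request, client_addr = server.recvfrom(1024)
op, _, arg = request.decode().partition(" ")
reply = b"10000 birr" if op == "BALANCE" else b"ERROR: unknown operation"
server.sendto(reply, client_addr)

response = client.recvfrom(1024)[0]
print(response)                            # b'10000 birr'
client.close()
server.close()
```

Note that no connection is ever set up: each request and each reply is a single self-contained datagram, which is exactly why the scheme is efficient but also why lost messages go undetected.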
Using a connectionless protocol has the obvious advantage of being efficient. As long as
messages do not get lost or corrupted, the request/reply protocol just sketched works fine.
Unfortunately, making the protocol resistant to occasional transmission failures is not trivial. The
only thing we can do is possibly let the client resend the request when no reply message comes
in. The problem, however, is that the client cannot detect whether the original request message
was lost, or that transmission of the reply failed. If the reply was lost, then resending a request
may result in performing the operation twice. If the operation was something like "transfer
10,000 birr from my bank account," then clearly, it would have been better that we simply
reported an error instead. On the other hand, if the operation was "tell me how much money I
have left," it would be perfectly acceptable to resend the request. When an operation can be
repeated multiple times without harm, it is said to be idempotent. Since some requests are
idempotent and others are not it should be clear that there is no single solution for dealing with
lost messages.
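The difference can be made concrete with a small Python sketch; the account structure and amounts below are illustrative:

```python
# Repeating an idempotent query is harmless; repeating a transfer is not.

account = {"balance": 50000}

def get_balance(acct):            # idempotent: no state change
    return acct["balance"]

def transfer_out(acct, amount):   # NOT idempotent: each call changes state
    acct["balance"] -= amount
    return acct["balance"]

# A client resends because the reply seemed lost:
get_balance(account); get_balance(account)
assert account["balance"] == 50000            # query repeated: no harm

transfer_out(account, 10000); transfer_out(account, 10000)
print(account["balance"])   # 30000, not 40000 -- money withdrawn twice
```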
As an alternative, many client-server systems use a reliable connection-oriented protocol.
Although this solution is not entirely appropriate in a local-area network due to relatively low
performance, it works perfectly fine in wide-area systems in which communication is inherently
unreliable. For example, virtually all Internet application protocols are based on reliable TCP/IP
connections.
In this case, whenever a client requests a service, it first sets up a connection to the server before
sending the request. The server generally uses that same connection to send the reply message,
after which the connection is torn down. The trouble is that setting up and tearing down a
connection is relatively costly, especially when the request and reply messages are small.
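A corresponding connection-oriented sketch with TCP sockets looks as follows; the single-request server thread is an illustrative simplification:

```python
import socket
import threading

def serve_once(listener):
    conn, _ = listener.accept()            # connection setup completes here
    request = conn.recv(1024)
    conn.sendall(b"reply:" + request)      # reply over the same connection
    conn.close()                           # connection is torn down

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

client = socket.socket()
client.connect(listener.getsockname())     # the relatively costly setup step
client.sendall(b"how much money do I have left?")
response = client.recv(1024)
client.close()
worker.join()
listener.close()
print(response)   # b'reply:how much money do I have left?'
```

The connect/close pair around a single small request-reply exchange illustrates exactly the overhead the text mentions: reliability is gained, but at the cost of connection setup and teardown.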
Application layering
The client-server model has been subject to many debates and controversies over the years. One
of the main issues was how to draw a clear distinction between a client and a server. Not
surprisingly, there is often no clear distinction. For example, a server for a distributed database
may continuously act as a client because it is forwarding requests to different file servers
responsible for implementing the database tables. In such a case, the database server itself
essentially does no more than process queries.
However, considering that many client-server applications are targeted toward supporting user
access to databases, many people have advocated a distinction between the following three
levels, essentially following the layered architectural style we discussed previously:
1. The user-interface level
2. The processing level
3. The data level
The user-interface level contains all that is necessary to directly interface with the user, such as
display management. Clients typically implement the user-interface level. This level consists of
the programs that allow end users to interact with applications. There is a considerable difference
in how sophisticated user-interface programs are.
The simplest user-interface program is nothing more than a character-based screen. Such an
interface has been typically used in mainframe environments. In those cases where the
mainframe controls all interaction, including the keyboard and monitor, one can hardly speak of
a client-server environment. However, in many cases, the user's terminal does some local
processing such as echoing typed keystrokes, or supporting form-like interfaces in which a
complete entry is to be edited before sending it to the main computer. Nowadays, even in
mainframe environments, we see more advanced user interfaces. Typically, the client machine
offers at least a graphical display in which pop-up or pull-down menus are used, and of which
many of the screen controls are handled through a mouse instead of the keyboard. Typical
examples of such interfaces include the X-Windows interfaces as used in many UNIX
environments, and earlier interfaces developed for MS-DOS PCs and Apple Macintoshes.
Modern user interfaces offer considerably more functionality by allowing applications to share a
single graphical window, and to use that window to exchange data through user actions. For
example, to delete a file, it is usually possible to move the icon representing that file to an icon
representing a trash can. Likewise, many word processors allow a user to move text in a
document to another position by using only the mouse.
Many client-server applications can be constructed from roughly three different pieces:
 A part that handles interaction with a user
 A part that operates on a database or file system and
 A middle part that generally contains the core functionality of an application. This middle
part is logically placed at the processing level. In contrast to user interfaces and
databases, there are not many aspects common to the processing level.
For example, consider an Internet search engine. Ignoring all the animated banners, images,
and other fancy window dressing, the user interface of a search engine is very simple: a user
types in a string of keywords and is subsequently presented with a list of titles of Webpages. The
back end is formed by a huge database of Webpages that have been pre-fetched and indexed.
The core of the search engine is a program that transforms the user's string of keywords into one
or more database queries. It subsequently ranks the results into a list, and transforms that list into
a series of HTML pages. Within the client-server model, this information retrieval part is
typically placed at the processing level.
Fig 8: three different layers of a search engine
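The three levels of the search-engine example can be sketched as follows; the hard-coded page index and the ranking-by-match-count rule are illustrative assumptions:

```python
# Data level: an index of pre-fetched Webpages (hard-coded here).
PAGE_INDEX = {
    "distributed": ["Intro to Distributed Systems", "DHT Explained"],
    "systems":     ["Intro to Distributed Systems", "Operating Systems"],
}

def query_database(keyword):         # data-level access
    return PAGE_INDEX.get(keyword, [])

def process_keywords(keywords):      # processing level: queries + ranking
    hits = {}
    for kw in keywords.split():
        for title in query_database(kw):
            hits[title] = hits.get(title, 0) + 1
    # Rank titles by how many keywords they matched.
    return sorted(hits, key=hits.get, reverse=True)

def user_interface(keywords):        # user-interface level
    return "\n".join(process_keywords(keywords))

# The page matching both keywords is ranked first.
print(user_interface("distributed systems"))
```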
Decentralized architectures
Multi-tiered client-server architectures are a direct consequence of dividing applications into a
user-interface, processing components, and a data level. The different tiers correspond directly
with the logical organization of applications. In many business environments, distributed
processing is equivalent to organizing a client-server application as a multi-tiered architecture.
We refer to this type of distribution as vertical distribution. The characteristic feature of
vertical distribution is that it is achieved by placing logically different components on different
machines.
Having a vertical distribution can help: functions are logically and physically split across
multiple machines, where each machine is tailored to a specific group of functions.
However, vertical distribution is only one way of organizing client-server applications. In
modern architectures, it is often the distribution of the clients and the servers that counts, which
we refer to as horizontal distribution. In this type of distribution, a client or server may be
physically split up into logically equivalent parts, but each part is operating on its own
share of the complete data set, thus balancing the load.
Structured peer-to-peer architecture
In a structured peer-to-peer architecture, the overlay network is constructed using a deterministic
procedure. By far the most used procedure is to organize the processes through a distributed
hash table (DHT). In a DHT based system, data items are assigned a random key from a large
identifier space, such as a 128-bit or 160-bit identifier. Likewise, nodes in the system are also
assigned a random number from the same identifier space. The crux of every DHT-based system
is then to implement an efficient and deterministic scheme that uniquely maps the key of a data
item to the identifier of a node based on some distance metric. Most importantly, when looking
up a data item, the network address of the node responsible for that data item is returned.
Effectively, this is accomplished by routing a request for a data item to the responsible node.
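The key-to-node mapping can be sketched with a Chord-style successor rule; the 16-bit identifier space and the eight hypothetical nodes are illustrative (real systems use 128-bit or 160-bit identifiers):

```python
import hashlib
from bisect import bisect_right

SPACE = 2 ** 16                    # shared identifier space (illustrative)

def ident(name):
    # Hash keys and node names into the same identifier space.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % SPACE

node_ids = sorted(ident(f"node-{i}") for i in range(8))

def responsible_node(key):
    # Deterministic rule: the key belongs to the first node whose
    # identifier follows the key's identifier on the ring.
    key_id = ident(key)
    idx = bisect_right(node_ids, key_id) % len(node_ids)  # wrap the ring
    return node_ids[idx]

# A lookup always yields the identifier of exactly one node:
print(responsible_node("report.txt") in node_ids)   # True
```

Because the mapping is a pure function of the key, every node that applies it reaches the same answer, which is what makes routing a lookup to the responsible node possible.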
Unstructured peer-to-peer architecture
Unstructured peer-to-peer systems largely rely on randomized algorithms for constructing an
overlay network. The main idea is that each node maintains a list of neighbors, but that this list is
constructed in a more or less random way. Likewise, data items are assumed to be randomly
placed on nodes. As a consequence, when a node needs to locate a specific data item, the only
thing it can effectively do is flood the network with a search query.
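A flooding lookup can be sketched as follows; the randomly generated overlay and the time-to-live (TTL) bound are illustrative assumptions:

```python
import random

random.seed(1)                             # fixed overlay for illustration
nodes = list(range(10))
neighbors = {n: random.sample([m for m in nodes if m != n], 3)
             for n in nodes}
data_location = 7                          # the node holding the wanted item

def flood(start, ttl):
    # Forward the query to all neighbors, for at most ttl hops.
    visited, frontier = {start}, [start]
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            if node == data_location:      # the query reached the data item
                return node
            for nb in neighbors[node]:
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return None                            # TTL expired without a hit

print(flood(0, ttl=5))
```

The TTL bound keeps the query from circulating forever, but it also means a lookup can fail even though the data item exists somewhere in the network, which is exactly the scalability weakness discussed next.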
Super peers
Notably in unstructured peer-to-peer systems, locating relevant data items can become
problematic as the network grows. The reason for this scalability problem is simple: as there is
no deterministic way of routing a lookup request to a specific data item, essentially the only
technique a node can resort to is flooding the request. As an alternative, many peer-to-peer
systems make use of special nodes that maintain an index of data items; such nodes are known as
super peers.
Consider a collaboration of nodes that offer resources to each other. For example, in a
collaborative content delivery network (CDN), nodes may offer storage for hosting copies of
Webpages allowing Web clients to access pages nearby, and thus to access them quickly. In this
case a node P may need to seek for resources in a specific part of the network. In that case,
making use of a broker that collects resource usage for a number of nodes that are in each other's
proximity makes it possible to quickly select a node with sufficient resources. Nodes such as those
maintaining an index or acting as a broker are generally referred to as super peers. As their name
suggests, super peers are often also organized in a peer-to-peer network, leading to a hierarchical
organization. A simple example of such an organization is shown in Fig.9. In this organization,
every regular peer is connected as a client to a super peer. All communication, from and to a
regular peer, proceeds through that peer's associated super peer.
Fig. 9 Hierarchical organization of nodes into a super peer network
In many cases, the client-super peer relation is fixed: whenever a regular peer joins the
network, it attaches to one of the super peers and remains attached until it leaves the network.
Obviously, it is expected that super peers are long-lived processes with a high availability. To
compensate for potential unstable behavior of a super peer, backup schemes can be deployed,
such as pairing every super peer with another one and requiring clients to attach to both.
Having a fixed association with a super peer may not always be the best solution. For example,
in the case of file-sharing networks, it may be better for a client to attach to a super peer that
maintains an index of files that the client is generally interested in. In that case, chances are
bigger that when a client is looking for a specific file, its super peer will know where to find it.
Garbacki et al. describe a relatively simple scheme in which the client-super peer relation can
change as clients discover better super peers to associate with. In particular, a super peer
returning the result of a lookup operation is given preference over other super-peers.
As we have seen, peer-to-peer networks offer a flexible means for nodes to join and leave the
network. However, with super peer networks a new problem is introduced, namely how to select
the nodes that are eligible to become super peers. This problem is closely related to the
leader-election problem.
Hybrid architectures
So far, we have focused on client-server architectures and a number of peer-to-peer architectures.
Many distributed systems combine architectural features, as we already came across in super
peer networks. In this section we take a look at some specific classes of distributed systems in
which client-server solutions are combined with decentralized architectures.
 Edge-Server Systems
 Collaborative Distributed Systems
Edge-Server Systems
An important class of distributed systems that is organized according to hybrid architecture is
formed by edge-server systems. These systems are deployed on the Internet where servers are
placed "at the edge" of the network. This edge is formed by the boundary between enterprise
networks and the actual Internet, for example, as provided by an Internet Service Provider (ISP).
Likewise, where end users at home connect to the Internet through their ISP, the ISP can be
considered as residing at the edge of the Internet. This leads to a general organization as shown
below.
Fig 10: Viewing the Internet as consisting of a collection of edge servers
End users or clients in general, connect to the Internet by means of an edge server. The edge
server's main purpose is to serve content, possibly after applying filtering and transcoding
functions. More interesting is the fact that a collection of edge servers can be used to optimize
content and application distribution. The basic model is that for a specific organization, one edge
server acts as an origin server from which all content originates. That server can use other edge
servers for replicating Webpages and such.
Collaborative Distributed Systems
Hybrid structures are notably deployed in collaborative distributed systems. The main issue in
many of these systems is how to get started, for which a traditional client-server scheme is often
deployed. Once a node has joined the system, it can use a fully decentralized scheme for
collaboration.
Let us first consider the BitTorrent file-sharing system. BitTorrent is a peer-to-peer file
downloading system. Its principal working is shown in Fig.11. The basic idea is that when an
end user is looking for a file, she downloads chunks of the file from other users until the
downloaded chunks can be assembled together yielding the complete file. An important design
goal was to ensure collaboration. In most file-sharing systems, a significant fraction of
participants merely download files but otherwise contribute close to nothing. To this end, a file
can be downloaded only when the downloading client is providing content to someone else.
Fig 11: The principal working of BitTorrent
To download a file, a user needs to access a global directory, which is just one of a few
well-known Websites. Such a directory contains references to what are called .torrent files. A .torrent
file contains the information that is needed to download a specific file. In particular, it refers to
what is known as a tracker, which is a server that is keeping an accurate account of active nodes
that have (chunks) of the requested file. An active node is one that is currently downloading
another file. Obviously, there will be many different trackers, although there will generally be
only a single tracker per file (or collection of files).
Once the nodes have been identified from where chunks can be downloaded, the downloading
node effectively becomes active. At that point, it will be forced to help others, for example by
providing chunks of the file it is downloading that others do not yet have. This enforcement
comes from a very simple rule: if node P notices that node Q is downloading more than it is
uploading, P can decide to decrease the rate at which it sends data to Q. This scheme works well
provided P has something to download from Q. For this reason, nodes are often supplied with
references to many other nodes putting them in a better position to trade data.
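The rate-adjustment rule just described can be sketched as follows; the threshold and the halving policy are illustrative assumptions, not BitTorrent's actual choking algorithm:

```python
# Node P throttles a peer that takes far more than it gives back.
# uploaded_to_peer: bytes P has sent to the peer (the peer's downloads);
# downloaded_from_peer: bytes P has received from the peer in return.

def adjust_rate(current_rate, uploaded_to_peer, downloaded_from_peer):
    if downloaded_from_peer == 0 or uploaded_to_peer / downloaded_from_peer > 2:
        return current_rate // 2      # peer is mostly taking: slow it down
    return current_rate               # peer reciprocates: keep the rate

# Peer Q took 100 units from P but gave only 10 back, so P halves Q's rate:
assert adjust_rate(1000, uploaded_to_peer=100, downloaded_from_peer=10) == 500
# Peer R trades roughly evenly, so P leaves its rate alone:
assert adjust_rate(1000, uploaded_to_peer=50, downloaded_from_peer=60) == 1000
```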
System Models
Systems that are intended for use in real world environments should be designed to function
correctly in the widest possible range of circumstances and in the face of many possible
difficulties and threats. Each type of model is intended to provide an abstract, simplified but
consistent description of a relevant aspect of distributed system design:
 Physical models are the most explicit way in which to describe a system; they capture
the hardware composition of a system in terms of the computers (and other devices, such
as mobile phones) and their interconnecting networks.
 Architectural models describe a system in terms of the computational and
communication tasks performed by its computational elements; the computational
elements being individual computers or aggregates of them supported by appropriate
network interconnections.
 Fundamental models take an abstract perspective in order to examine individual aspects
of a distributed system. In this section we introduce fundamental models that examine
three important aspects of distributed systems: interaction models, which consider the
structure and sequencing of the communication between the elements of the system;
failure models, which consider the ways in which a system may fail to operate correctly
and; security models, which consider how the system is protected against attempts to
interfere with its correct operation or to steal its data.
Physical Models
A physical model is a representation of the underlying hardware elements of a distributed system
that abstracts away from specific details of the computer and networking technologies employed.
A distributed system was defined in Chapter 1 as one in which hardware or software components
located at networked computers communicate and coordinate their actions only by passing
messages. This leads to a minimal physical model of a distributed system as an extensible set of
computer nodes interconnected by a computer network for the required passing of messages.
Beyond this baseline model, we can usefully identify three generations of distributed systems.
 Early distributed systems: Such systems emerged in the late 1970s and early 1980s in
response to the emergence of local area networking technology, usually Ethernet. These
systems typically consisted of between 10 and 100 nodes interconnected by a local area
network, with limited Internet connectivity and supported a small range of services such
as shared local printers and file servers as well as email and file transfer across the
Internet. Individual systems were largely homogeneous and openness was not a primary
concern. Providing quality of service was still very much in its infancy and was a focal
point for much of the research around such early systems.
 Internet-scale distributed systems: Building on this foundation, larger-scale distributed
systems started to emerge in the 1990s in response to the dramatic growth of the Internet
during this time (for example, the Google search engine was first launched in 1996). In
such systems, the underlying physical infrastructure consists of a physical model that is,
an extensible set of nodes interconnected by a network of networks (the Internet). Such
systems exploit the infrastructure offered by the Internet to become truly global. They
incorporate large numbers of nodes and provide distributed system services for global
organizations and across organizational boundaries. The level of heterogeneity in such
systems is significant in terms of networks, computer architecture, operating systems,
languages employed and the development teams involved. This has led to an increasing
emphasis on open standards and associated middleware technologies such as CORBA
and more recently, web services. Additional services were employed to provide end-to-end
quality of service properties in such global systems.
 Contemporary distributed systems: In the above systems, nodes were typically desktop
computers and therefore relatively static (that is, remaining in one physical location for
extended periods), discrete (not embedded within other physical entities) and autonomous
(to a large extent independent of other computers in terms of their physical
infrastructure).
 The emergence of mobile computing has led to physical models where nodes such
as laptops or smart phones may move from location to location in a distributed
system, leading to the need for added capabilities such as service discovery and
support for spontaneous interoperation.
 The emergence of ubiquitous computing has led to a move from discrete nodes to
architectures where computers are embedded in everyday objects and in the
surrounding environment (for example, in washing machines or in smart homes
more generally).
 The emergence of cloud computing and, in particular, cluster architectures has led
to a move from autonomous nodes performing a given role to pools of nodes that
together provide a given service (for example, a search service as offered by
Google).
Fig 12: Generation of Distributed Systems
Architectural Models
The architecture of a system is its structure in terms of separately specified components and their
interrelationships. The overall goal is to ensure that the structure will meet present and likely
future demands on it. Major concerns are to make the system reliable, manageable, adaptable and
cost-effective. The architectural design of a building has similar aspects – it determines not only
its appearance but also its general structure and architectural style (gothic, neo-classical, modern)
and provides a consistent frame of reference for the design.
Architectural elements
To understand the fundamental building blocks of a distributed system, it is necessary to
consider four key questions:
1. What are the entities that are communicating in the distributed system?
2. How do they communicate, or, more specifically, what communication paradigm is used?
3. What (potentially changing) roles and responsibilities do they have in the overall
architecture?
4. How are they mapped on to the physical distributed infrastructure (what is their
placement)?
Communicating Entities
The first two questions above are absolutely central to an understanding of distributed systems;
what is communicating and how those entities communicate together define a rich design space
for the distributed systems developer to consider. It is helpful to address the first question from a
system-oriented and a problem-oriented perspective.
From a system perspective, the answer is normally very clear in that the entities that
communicate in a distributed system are typically processes, leading to the prevailing view of a
distributed system as processes coupled with appropriate inter process communication paradigms
with two caveats:
 In some primitive environments, such as sensor networks, the underlying operating
systems may not support process abstractions (or indeed any form of isolation), and
hence the entities that communicate in such systems are nodes.
 In most distributed system environments, processes are supplemented by threads, so,
strictly speaking, it is threads that are the endpoints of communication.
Objects
Objects have been introduced to enable and encourage the use of object-oriented approaches in
distributed systems (including both object-oriented design and object-oriented programming
languages). In distributed object-based approaches, a computation consists of a number of
interacting objects representing natural units of decomposition for the given problem domain.
Objects are accessed via interfaces, with an associated interface definition language (or IDL)
providing a specification of the methods defined on an object.
Components
Since their introduction a number of significant problems have been identified with distributed
objects, and the use of component technology has emerged as a direct response to such
weaknesses. Components resemble objects in that they offer problem-oriented abstractions for
building distributed systems and are also accessed through interfaces. The key difference is that
components specify not only their (provided) interfaces but also the assumptions they make in
terms of other components/interfaces that must be present for a component to fulfill its function;
in other words, making all dependencies explicit and providing a more complete contract for
system construction.
Web services
Web services represent the third important paradigm for the development of distributed systems.
Web services are closely related to objects and components, again taking an approach based on
encapsulation of behavior and access through interfaces. In contrast, however, web services are
intrinsically integrated into the World Wide Web, using web standards to represent and discover
services.
Communication paradigms
Communication paradigms define how entities communicate in a distributed system. We consider
three types of communication paradigm:
a. Inter process communication
b. Remote invocation
c. Indirect communication.
Inter process communication refers to the relatively low-level support for communication
between processes in distributed systems, including message-passing primitives, direct access to
the API offered by Internet protocols (socket programming) and support for multicast
communication.
Remote invocation represents the most common communication paradigm in distributed
systems, covering a range of techniques based on a two-way exchange between communicating
entities in a distributed system and resulting in the calling of a remote operation, procedure or
method.
 Request-reply protocols: Request-reply protocols are effectively a pattern imposed on
an underlying message-passing service to support client-server computing. In particular,
such protocols typically involve a pairwise exchange of messages from client to server
and then from server back to client, with the first message containing an encoding of the
operation to be executed at the server and also an array of bytes holding associated
arguments and the second message containing any results of the operation, again encoded
as an array of bytes. This paradigm is rather primitive and only really used in embedded
systems where performance is paramount. The approach is also used in the HTTP
protocol. Most distributed systems will elect to use remote procedure calls or remote
method invocation, as discussed below, but note that both approaches are supported by
underlying request-reply exchanges.
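The pairwise exchange described above can be sketched as follows. The wire layout here (a 2-byte length field, the operation name, then raw argument bytes) and the `upper` operation are invented for illustration:

```python
import struct

# Request layout: 2-byte big-endian op-name length, op name, argument bytes.
def encode_request(op: str, args: bytes) -> bytes:
    name = op.encode()
    return struct.pack("!H", len(name)) + name + args

def decode_request(msg: bytes):
    (n,) = struct.unpack("!H", msg[:2])
    return msg[2:2 + n].decode(), msg[2 + n:]

# A toy server dispatch: decodes the operation and works on raw bytes.
def server_handle(msg: bytes) -> bytes:
    op, args = decode_request(msg)
    if op == "upper":
        return args.upper()
    return b"error: unknown op"

reply = server_handle(encode_request("upper", b"hello"))
print(reply)  # b'HELLO'
```

The client and server agree only on the byte layout; there is no notion of typed parameters or return values, which is what makes the paradigm feel primitive compared with RPC.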
 Remote procedure calls: In RPC, procedures in processes on remote computers
can be called as if they are procedures in the local address space. The underlying
RPC system then hides important aspects of distribution, including the encoding
and decoding of parameters and results, the passing of messages and the
preserving of the required semantics for the procedure call. This approach directly
and elegantly supports client-server computing with servers offering a set of
operations through a service interface and clients calling these operations directly
as if they were available locally. RPC systems therefore offer (at a minimum)
access and location transparency.
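A toy sketch of the stub idea, assuming a single in-process "server" and JSON as a hypothetical encoding; in a real RPC system the call to `server_dispatch` would be a network exchange:

```python
import json

# Server side: a service interface offering a set of operations.
SERVICE = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}

def server_dispatch(request: bytes) -> bytes:
    call = json.loads(request)                  # decode parameters
    result = SERVICE[call["op"]](*call["args"])
    return json.dumps({"result": result}).encode()

# Client-side stub: callers see an ordinary local procedure; encoding,
# the message exchange, and decoding are all hidden inside.
def remote_call(op, *args):
    request = json.dumps({"op": op, "args": args}).encode()
    reply = server_dispatch(request)            # stands in for a network hop
    return json.loads(reply)["result"]

print(remote_call("add", 2, 3))  # 5
```

From the caller's point of view, `remote_call("add", 2, 3)` is indistinguishable from a local procedure call, which is the access transparency the paragraph describes.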
 Remote method invocation: Remote method invocation (RMI) strongly
resembles remote procedure calls but in a world of distributed objects. With this
approach, a calling object can invoke a method in a remote object. As with RPC,
the underlying details are generally hidden from the user. RMI implementations
may, though, go further by supporting object identity and the associated ability to
pass object identifiers as parameters in remote calls.
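The extra step RMI takes, object identity and the ability to pass object identifiers, can be sketched with a client-side proxy that holds only an identifier and forwards invocations; the registry and `Account` class are illustrative:

```python
# Object table mapping remote object identifiers to live objects.
registry = {}

class Account:
    def __init__(self):
        self.balance = 0
    def deposit(self, amount):
        self.balance += amount
        return self.balance

class Proxy:
    """Client-side stand-in: holds only the remote object's identifier
    and forwards method calls, so identifiers can be passed around freely."""
    def __init__(self, oid):
        self.oid = oid
    def __getattr__(self, name):
        def invoke(*args):
            # In a real system this forwarding step is a network exchange.
            return getattr(registry[self.oid], name)(*args)
        return invoke

registry["acct-1"] = Account()
remote = Proxy("acct-1")
print(remote.deposit(50))  # 50
```

Because the proxy carries only `"acct-1"`, it can itself be passed as a parameter in other remote calls, giving the object-identity semantics that distinguish RMI from plain RPC.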
Indirect communication covers techniques in which senders and receivers interact through an
intermediary rather than directly. Key techniques for indirect communication include:
 Group communication: concerned with the delivery of messages to a set of recipients and
hence is a multiparty communication paradigm supporting one-to-many communication.
Group communication relies on the abstraction of a group which is represented in the
system by a group identifier. Recipients elect to receive messages sent to a group by
joining the group. Senders then send messages to the group via the group identifier, and
hence do not need to know the recipients of the message. Groups typically also maintain
group membership and include mechanisms to deal with failure of group members.
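A minimal sketch of the group abstraction, with lists standing in for member inboxes and a string as the group identifier (all names are illustrative; real group communication systems also handle ordering and failure):

```python
# Groups are named by an identifier; senders address the group,
# never the individual members.
groups = {}  # group id -> list of member inboxes

def join(group_id, inbox):
    groups.setdefault(group_id, []).append(inbox)

def send(group_id, message):
    # One-to-many delivery: the sender need not know the recipients.
    for inbox in groups.get(group_id, []):
        inbox.append(message)

alice, bob = [], []
join("news", alice)
join("news", bob)
send("news", "hello group")
print(alice, bob)
```

The sender's only handle on the recipients is the group identifier `"news"`, which is the decoupling the paragraph describes.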
 Publish-subscribe systems: many systems can be classified as information-dissemination
systems wherein a large number of producers (or publishers) distribute information items
of interest (events) to a similarly large number of consumers (or subscribers). It would be
complicated and inefficient to employ any of the core communication paradigms
discussed above for this purpose and hence publish-subscribe systems (sometimes also
called distributed event-based systems) have emerged to meet this important need.
Publish-subscribe systems all share the crucial feature of providing an intermediary
service that efficiently ensures information generated by producers is routed to
consumers who desire this information.
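The intermediary role can be sketched with an in-process broker that routes events by topic; the topic names and callbacks here are invented for illustration:

```python
# The broker is the intermediary: publishers and subscribers
# never learn of one another directly.
subscriptions = {}  # topic -> list of subscriber callbacks

def subscribe(topic, callback):
    subscriptions.setdefault(topic, []).append(callback)

def publish(topic, event):
    # Route the event to every subscriber interested in this topic.
    for deliver in subscriptions.get(topic, []):
        deliver(event)

received = []
subscribe("sensor/temp", received.append)
publish("sensor/temp", 21.5)
publish("sensor/humidity", 60)  # no subscriber: silently dropped
print(received)  # [21.5]
```

Subscribers express interest (here, by topic name) and the broker does the matching; neither side holds a reference to the other.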
 Message queues: whereas publish-subscribe systems offer a one-to-many style of
communication, message queues offer a point-to-point service whereby producer
processes can send messages to a specified queue and consumer processes can receive
messages from the queue or be notified of the arrival of new messages in the queue.
Queues therefore offer an indirection between the producer and consumer processes.
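Python's standard `queue.Queue` can illustrate the point-to-point pattern, with a consumer thread receiving exactly the messages a producer sends:

```python
import queue
import threading

# One queue decouples a producer from a consumer: point-to-point,
# each message is received by exactly one consumer.
q = queue.Queue()
results = []

def consumer():
    for _ in range(3):
        results.append(q.get())  # blocks until a message arrives
        q.task_done()

t = threading.Thread(target=consumer)
t.start()
for i in range(3):
    q.put(i)                     # producer need not wait for the consumer
t.join()
print(results)  # [0, 1, 2]
```

The producer and consumer share only the queue object; neither needs to know the other's identity or be running at the moment the other acts on the queue.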
 Tuple spaces: tuple spaces offer a further indirect communication service by supporting a
model whereby processes can place arbitrary items of structured data, called tuples, in a
persistent tuple space and other processes can either read or remove such tuples from the
tuple space by specifying patterns of interest. Since the tuple space is persistent, readers
and writers do not need to exist at the same time. This style of programming is otherwise
known as generative communication.
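A tuple space with wildcard matching can be sketched in a few lines; here `None` plays the role of a wildcard in patterns (a toy version of the model popularized by Linda):

```python
# A tuple space: processes write tuples; readers match by pattern,
# where None in the pattern acts as a wildcard.
space = []

def write(tup):
    space.append(tup)

def matches(pattern, tup):
    return len(pattern) == len(tup) and all(
        p is None or p == v for p, v in zip(pattern, tup))

def take(pattern):
    """Remove and return the first matching tuple (a destructive read)."""
    for tup in space:
        if matches(pattern, tup):
            space.remove(tup)
            return tup
    return None

write(("temp", "room1", 21))
write(("temp", "room2", 19))
print(take(("temp", "room2", None)))  # ('temp', 'room2', 19)
```

Because tuples stay in the space until taken, the writer may have terminated long before the reader issues its matching `take`, which is the temporal decoupling the paragraph describes.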
 Distributed shared memory: Distributed shared memory (DSM) systems provide an
abstraction for sharing data between processes that do not share physical memory. The
underlying infrastructure must ensure a copy is provided in a timely manner and also deal
with issues relating to synchronization and consistency of data.
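As a toy model of the consistency problem, the following sketch implements a write-invalidate scheme: each simulated node caches pages, and a write invalidates the other nodes' cached copies (real DSM systems do this at the memory-page level, transparently to the program):

```python
# A toy write-invalidate scheme: each node caches pages; a write
# updates backing memory and invalidates the other nodes' copies.
class Node:
    def __init__(self, dsm):
        self.dsm = dsm
        self.cache = {}

    def read(self, page):
        if page not in self.cache:            # miss: fetch a fresh copy
            self.cache[page] = self.dsm.memory.get(page)
        return self.cache[page]

    def write(self, page, value):
        self.dsm.memory[page] = value
        for node in self.dsm.nodes:           # keep replicas consistent
            if node is not self:
                node.cache.pop(page, None)    # invalidate stale copies
        self.cache[page] = value

class DSM:
    def __init__(self):
        self.memory = {}
        self.nodes = []

    def attach(self):
        node = Node(self)
        self.nodes.append(node)
        return node

dsm = DSM()
a, b = dsm.attach(), dsm.attach()
a.write("x", 1)
print(b.read("x"))  # 1
a.write("x", 2)     # b's cached copy is invalidated
print(b.read("x"))  # 2
```

Without the invalidation step, `b` would keep returning its stale cached value after `a`'s second write; this is exactly the synchronization and consistency burden the paragraph attributes to the underlying infrastructure.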